Tuesday, 8 May 2018

Programming Portfolio

Software Ray Tracer: 
Link: https://ihorszlachtycz.blogspot.ca/2018/05/software-ray-tracer-pixel-scratcher.html
Source Link: https://github.com/Ihaa21/SoftwareRayTracer

Familiarized myself with current trends in game lighting and global illumination by coding, from scratch, a C++ 3D software ray tracer with support for indirect light bounces and transparent materials.

OpenGL Deferred Renderer:
Link: https://ihorszlachtycz.blogspot.ca/2018/05/deferred-renderer-and-model-loader-coco.html
Source Link: https://github.com/Ihaa21/Coco-Deferred-Renderer

Built a program using C++ and OpenGL that renders 3D geometry from .obj files with support for texture mapping and deferred shading in GLSL shaders. This project was driven by my interest in 3D rendering and shader programming.

C Compiler:

Link: https://ihorszlachtycz.blogspot.ca/2018/05/c-compiler-nutella.html
Source Link: https://github.com/Ihaa21/nutella_compiler

Gained an understanding of how code gets translated and executed by a CPU by building a compiler that translates a subset of C to x86 assembly. The compiler supports compile time execution of code and was built from scratch in C++.

Independent Video Game:


Link: https://ihorszlachtycz.blogspot.ca/2018/05/independent-video-game-rosemary.html

Worked independently and full time for 9 months on a multi-platform video game for Android and iOS, written completely in C++ with custom memory management, profiling tools, and an OpenGL ES renderer. The project is entirely self-driven and built without any external libraries.

Software Rasterizer:
Link: https://ihorszlachtycz.blogspot.ca/2018/05/software-triangle-rasterizer-kiwi.html
Source Link: https://github.com/Ihaa21/Kiwi-Software-Rasterizer

Implemented a 3D software renderer in C++ that renders transformed, textured triangles with depth buffering and a top-left fill rule. By doing so, I gained a greater understanding of the perspective transform and the work a GPU must perform in order to produce images on our screens.

Realtime Software Octree Renderer:
Link: http://ihorszlachtycz.blogspot.com/2018/06/sparse-voxel-octree-renderer-fluffy.html
Source Link: https://github.com/Ihaa21/Fluffy

Implemented a realtime octree software renderer in C++ which can render highly detailed objects on a single core. The renderer implements a level-of-detail algorithm that renders only per-pixel detail, making the cost of rendering complex models dependent on screen size instead of model complexity.

C++ Neural Network:

Link: https://ihorszlachtycz.blogspot.ca/2018/05/neural-network-cognition.html
Source Link: https://github.com/Ihaa21/Cognition-NeuralNetwork-

Coded a neural network that recognizes and reconstructs handwritten digits in images from scratch in C++. This project was inspired by my current research in the field of artificial intelligence (AI) and image recognition.

C Compiler (Nutella)

Nutella is a compiler that transforms a subset of C to x86 assembly. It supports integers, floats, pointers, and structs, as well as if and while statements. It supports the math operations (+, -, *, /), the comparison operations (==, !=, >=, <=, >, <), and the pointer operations (*, &). The compiler also supports compile time code execution, including running math operations at compile time. The code for this project can be found at https://github.com/Ihaa21/nutella_compiler.

This project started off as a school project where we were supposed to work in groups to build a compiler. I was fascinated by the idea of building a compiler, so I decided to build one on my own without external tools to help parse code. Below, I detail how the compiler converts a subset of C code to x86 assembly:

Parsing:

The compiler starts off by parsing the code into tokens, and while parsing it builds a type list, a function list, and an IR graph (intermediate representation). The IR graph has nodes that represent variable definitions and assignments, as well as all the operations supported by the compiler such as +, -, etc.

Intermediate Representation:

The IR graph handles if statements by storing a node for the if statement itself, a pointer to a subgraph to execute if the branch is taken, and a pointer to the first line of code after the if statement. While loops are stored in a similar fashion. This layout is useful for the compiler backend, which needs to know where to insert jump instructions and which lines those jumps target. It is also useful for the compiler's compile time code execution feature.
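A node layout like the one described above might be sketched as follows (the names and node kinds here are illustrative, not Nutella's actual definitions):

```cpp
#include <string>
#include <vector>

// Hypothetical IR node kinds; the real compiler's set will differ.
enum class IrOp { Define, Assign, Add, Sub, Mul, Div, If, While };

struct IrNode {
    IrOp Op;
    std::string Dest;           // variable being defined or assigned
    std::vector<IrNode> Body;   // subgraph executed when the branch is taken
    int NextIndex = -1;         // index of the first node after the if/while
};
```

With this shape, the backend can walk `Body` to emit the taken branch and use `NextIndex` to know where the jump over the branch must land.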


Compile Time Code Execution:

Compile time code execution was inspired by Jonathan Blow's language, where you can put a #run statement next to any statement of code, and that statement will be executed at compile time of the program. This avoids the need for things like template metaprogramming when we want to pre-compute constants for our code, and it can be extended to build look-up tables and do static code analysis. In Nutella, the #run directive works the same way, and it supports precomputation of constants in the code as well as function calls.
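A usage of the directive might look something like this (illustrative syntax only; Nutella's exact grammar may differ):

```
int table_size = 256;

// Executed by the compiler's interpreter while compiling;
// only the resulting constant ends up in the emitted assembly:
int scale = #run 3 * table_size + 1;
```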

x86 Backend:

After the IR graph is built, the program executes any #run statements through an interpreter that understands how to execute the IR graph. Once the resulting constants are placed into the code, we invoke the backend to convert the IR to x86 assembly.

The backend steps through the graph and calculates the stack offsets of each local variable in the program. Afterwards, it steps through the graph again and converts every node to the x86 assembly instructions that represent it. The backend tries to keep as few registers in use at any time as possible: for every instruction, it loads the needed values from the stack into registers, operates on them, and pushes the result back to the stack.
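A minimal sketch of what emitting one such node could look like (hypothetical; Nutella's actual backend and register choices will differ), for a node computing `dest = a + b` where all three variables live at known ebp-relative stack offsets:

```cpp
#include <string>

// Emit 32-bit x86 text for "dest = a + b": load both operands from the
// stack, add them, and store the result back to the stack, leaving no
// registers live across the instruction.
std::string EmitAdd(int DestOffset, int AOffset, int BOffset) {
    std::string Asm;
    Asm += "mov eax, [ebp - " + std::to_string(AOffset) + "]\n";    // load a
    Asm += "mov ebx, [ebp - " + std::to_string(BOffset) + "]\n";    // load b
    Asm += "add eax, ebx\n";                                        // a + b
    Asm += "mov [ebp - " + std::to_string(DestOffset) + "], eax\n"; // store
    return Asm;
}
```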

Working on this project was one of the coolest things I've ever coded. It opened my eyes to how code optimization occurs and the kind of information interpreters have to process, and it also taught me a lot about parsing strings. Building the compiler from scratch was also useful because I got to see how a compiler has to handle various error cases and output the correct error messages. If you'd like to check the code for this project, it can be found at https://github.com/Ihaa21/nutella_compiler.

Software Triangle Rasterizer (Kiwi)


Kiwi is a software triangle rasterizer I built in C++. The rasterizer supports textured triangles with depth interpolation and a top-left fill rule. The code for this project can be found at https://github.com/Ihaa21/Kiwi-Software-Rasterizer/tree/master/code. The Kiwi renderer works in two stages:

Vertex Processing:

The renderer begins with a vertex shader stage where every vertex gets transformed into pixel coordinates. At this stage, data such as Z and UV are set up to be interpolated during rasterization. The vertices are also sorted into counter-clockwise winding.

Rasterization and Fragment Processing:

Afterwards, Kiwi loops through every triangle and performs triangle discarding tests, as well as screen clipping. I calculate a minimum bounding box around the triangle and begin rasterizing it. Once the program encounters a pixel that is within the triangle, Kiwi performs a depth test, interpolates the triangle's Z and UV values, samples the triangle's texture, and renders the pixel onto the screen. The results of this method can be seen below:
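The inside test at the heart of that loop is typically done with edge functions; a minimal sketch (not Kiwi's exact code, and assuming counter-clockwise winding with y pointing down) looks like this:

```cpp
struct P2 { float X, Y; };

// Signed area of the parallelogram spanned by AB and AC; its sign tells us
// which side of edge AB the point C is on.
float EdgeFunction(P2 A, P2 B, P2 C) {
    return (B.X - A.X) * (C.Y - A.Y) - (B.Y - A.Y) * (C.X - A.X);
}

// A pixel is inside the triangle if it lies on the inner side of all three
// edges. The same edge values, divided by the triangle's total area, give
// the barycentric weights used to interpolate Z and UV.
bool InsideTriangle(P2 A, P2 B, P2 C, P2 Pixel) {
    return EdgeFunction(A, B, Pixel) >= 0 &&
           EdgeFunction(B, C, Pixel) >= 0 &&
           EdgeFunction(C, A, Pixel) >= 0;
}
```

The top-left fill rule mentioned above is then applied by tightening `>= 0` to `> 0` on edges that are not top or left edges, so shared edges are rasterized exactly once.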




Independent Video Game (Rosemary)

Rosemary is a tower defense game I've been working on independently for iOS and Android. The game is written completely in C++ and uses no external libraries. There are over 50 different types of unique monsters, each with their own abilities and behaviors. I've written every component of Rosemary myself and have been coding/designing the game on my own for almost a year. Below, I go over how the systems work and some of the problems I faced during development:

Debugging/Profiling

Rosemary has a built-in profiler that measures how long function calls take and graphs them relative to the frame's duration. The profiler logs function timings via calls to TIMED_FUNC() at the beginning of a function definition. After the timings are collected, they are processed and transformed to be displayed on screen. Below is an example of the profiler's output:
The multi-colored blocks represent the time span a function took to run (the function's name can be seen by hovering over the block). The top blocks are the first measured function calls; the lower blocks are function calls made from inside the higher blocks. The black bar at the end of the screen marks the 16.6ms target. If we pass that bar, the game is running slower than the target frame rate.

Rosemary stores 256 frames of debug data, and by clicking on the frame bar at the top of the screen, we can pause the profiling to view the function call times for a different frame. Rosemary can also display a Top Clock function list, which is a list of the functions that took the most CPU time to run.
Runtime Compilation

Rosemary supports runtime compilation of C++ code. This is achieved by having a platform-specific DLL (Win32, Android, etc.) and a game DLL. The platform DLL stores only the platform-specific code that the game needs to run (creating a window, talking with the sound driver, etc.). The game DLL stores all the game code, from rendering to AI. When I recompile the game DLL, the platform layer notices that the game DLL has changed, so it swaps the old DLL in memory with the new DLL and updates any function pointers that it relies on. Doing so allows me to recompile the game at runtime and see the changes occur in real time:
Path finding

Since Rosemary is a tower defense game where the player builds mazes, I need a system that determines where the monsters have to go to get across the maze. To do so, the game divides the world into tiles labelled as empty or taken, and performs a breadth-first search to figure out how monsters get across the maze. Unfortunately, a regular BFS sometimes causes monsters to walk in "L" motions when they should be walking diagonally. To solve this, I had the algorithm alternate which neighbors get added to the BFS queue depending on the position of the cell in the world. This gave a zigzag movement that looked a lot more natural and exposed the monsters to less damage from the towers.
To make the system even more visually pleasing, I wanted to get rid of the zigzagging and have the monsters move diagonally when possible. To do so, once a monster reaches a tile and searches for where to go next, the game finds the next tile and the tile after that. It then casts a ray to see if moving directly to the farther tile is allowed, and if so, the monster moves in that direction. This got rid of the zigzag motion and made the monsters' movement look a lot more appealing.
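The BFS itself can be sketched as a distance-field flood fill from the goal tile (a simplified illustration, not Rosemary's actual code); each monster then steps toward any neighbor with a smaller distance value:

```cpp
#include <queue>
#include <vector>

// Flood outwards from the goal over walkable tiles. Walls[i] != 0 marks a
// taken tile; the result holds the step count to the goal, or -1 if a tile
// is unreachable.
std::vector<int> BuildDistanceField(const std::vector<int>& Walls,
                                    int W, int H, int GoalX, int GoalY) {
    std::vector<int> Dist(W * H, -1);
    std::queue<int> Frontier;
    Dist[GoalY * W + GoalX] = 0;
    Frontier.push(GoalY * W + GoalX);
    int Dx[] = {1, -1, 0, 0};
    int Dy[] = {0, 0, 1, -1};
    while (!Frontier.empty()) {
        int Curr = Frontier.front(); Frontier.pop();
        int X = Curr % W, Y = Curr / W;
        for (int I = 0; I < 4; ++I) {
            int Nx = X + Dx[I], Ny = Y + Dy[I];
            if (Nx < 0 || Nx >= W || Ny < 0 || Ny >= H) continue;
            int N = Ny * W + Nx;
            if (Walls[N] || Dist[N] != -1) continue; // taken or visited
            Dist[N] = Dist[Curr] + 1;
            Frontier.push(N);
        }
    }
    return Dist;
}
```

The neighbor-alternation trick described above corresponds to varying the order of the `Dx`/`Dy` arrays per cell, which changes which of several equal-length paths the monsters prefer.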
Sound Mixing

The game loads WAV sound files and stores stereo sample buffers in the asset file for loading at runtime. The platform layer notifies the game of the number of samples it should write and how many samples have elapsed since the previous frame. The game then loops through all the sounds that are currently playing and mixes their samples together into the output buffer. Before mixing, the game applies any pitch changes that the sound's metadata requests. Usually, the game writes an extra 2 seconds of sound to prevent audio cutting out during dropped frames.
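The mixing step itself is conceptually a per-sample sum with clamping; a minimal sketch (assumed, not Rosemary's exact code) of mixing 16-bit samples:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sum every playing sound's samples into the output buffer, clamping so
// that overlapping loud sounds don't wrap around and crackle.
void MixSounds(const std::vector<std::vector<int16_t>>& Playing,
               std::vector<int16_t>& Out) {
    for (size_t I = 0; I < Out.size(); ++I) {
        int Sum = 0; // mix in a wider type so the sum can exceed 16 bits
        for (const std::vector<int16_t>& Sound : Playing)
            if (I < Sound.size()) Sum += Sound[I];
        Out[I] = (int16_t)std::min(32767, std::max(-32768, Sum));
    }
}
```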

Entity System

Towers are non-moving and all the same size; they have AI that looks for targets to attack and can have more than one ability to cast. Projectiles always move towards a target and have logic for what to do when the target dies and on impact. Projectiles are usually created by abilities that a monster or tower is casting, so they have different AI states to handle those abilities. An example of this is a spike tower that lays spikes within a radius around it. The spikes themselves are projectiles in an inactive state, so when a monster gets near one, it takes damage and the spike is destroyed.
Monsters are the most complex entities in the game, with various types of AI such as flying, walking, swimming, and burrowing. They can have multiple abilities and therefore use AI functions similar to the towers' for casting. There is also a meta entity called a monster group, used for bosses that are made of multiple monsters but still function as a single entity. The Centipede monster, for example, is made up of multiple smaller monsters chained together in a line. The monster group stores pointers to all the monsters it controls and has a separate update loop for them. It then sets their state so that the regular monster AI loop simulates them the way the monster group needs.
Memory Management

Rosemary is built to only ever perform one allocation from the OS in the entire program's life cycle. At startup, the game requests a large block of memory from the OS. Once it receives that memory, the game partitions it so that every system has access to as much memory as it needs. For regular allocations, Rosemary mostly uses linear allocators and free-list allocators. For example, I use a free-list allocator for all the entities in the game, but a linear allocator for partitioning the game memory across systems and for temporary single-frame allocations.
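A linear allocator like the ones described above is very small; here is a hedged sketch (illustrative names, not Rosemary's actual code) of carving allocations out of one big block by bumping an offset:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All allocations come out of one pre-reserved block; freeing happens all
// at once by resetting the offset (e.g. at frame end or on level exit).
struct LinearArena {
    uint8_t* Base;
    size_t Size;
    size_t Used = 0;
};

void* PushSize(LinearArena* Arena, size_t RequestSize) {
    assert(Arena->Used + RequestSize <= Arena->Size); // out of memory
    void* Result = Arena->Base + Arena->Used;
    Arena->Used += RequestSize;
    return Result;
}

void Clear(LinearArena* Arena) { Arena->Used = 0; }
```

A real version would also align each allocation; this sketch omits that for brevity.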

For asset memory management, every level requests the set of assets from the asset file that are used in that level, and on level start-up those assets are loaded. Once we quit the level, the assets' memory is marked as usable and gets overwritten by the next level we play. This way, the game only needs enough asset memory for a single level, which cuts down memory usage.

Rendering

Rosemary has an OpenGL/OpenGL ES renderer that supports instancing, sRGB-correct images, and premultiplied alpha. The renderer has a command buffer that the game logic populates as it simulates a frame. This buffer contains any command the game pushes to render; an example of a command is to render a sprite. While receiving commands, the command buffer batches together as many commands as it can, based on the texture used, to lower driver overhead later when rendering.

Deferred Renderer and Model Loader (Coco)


Coco is a C++ OpenGL forward/deferred renderer I wrote with support for point lights and directional lights. The program also has a multi-threaded asset builder, where the asset file is built by jobs from a lockless multi-producer multi-consumer queue. The code for this project can be found at https://github.com/Ihaa21/Coco-Deferred-Renderer.

This project originally started as an attempt to write my own .obj and .mtl loader that could load Sponza and store it inside my asset file format. The asset file stores a header, which itself stores offsets into the file for arrays of asset metadata. A piece of asset metadata can be a texture struct that stores a width, a height, and an offset into the asset file for the texel array.


For model assets, I had an array of model metadata, where every index stores the number of vertices for that mesh and an offset in the asset file for the vertex array.
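The layout described so far might be declared like this (field names and widths are illustrative, not the actual Coco format):

```cpp
#include <cstdint>

// Hypothetical on-disk layout: the header points at metadata arrays, which
// in turn point at the raw data stored later in the file. Packed so the
// structs can be read/written from the file verbatim.
#pragma pack(push, 1)
struct AssetFileHeader {
    uint32_t TextureCount;
    uint64_t TextureMetaOffset; // file offset of TextureMeta[TextureCount]
    uint32_t ModelCount;
    uint64_t ModelMetaOffset;   // file offset of ModelMeta[ModelCount]
};

struct TextureMeta {
    uint32_t Width, Height;
    uint64_t TexelOffset; // file offset of Width*Height texels
};

struct ModelMeta {
    uint32_t VertexCount;
    uint64_t VertexOffset; // file offset of the vertex array
};
#pragma pack(pop)
```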


In my effort to load Sponza, I added support for mesh groups in obj files. Obj files can store multiple individual meshes which, when rendered together, form the whole object. Sponza uses this, so to add support, I reworked the layout of meshes in my asset file. Instead of having a model store an offset to an array of vertices, I had the model store an offset to an array of meshes, each of which individually stores offsets into separate vertex arrays.

Afterwards, I added mtl file loading to texture my obj files. The mtl files provide names for materials, which the obj file references for each mesh. I had the program first load the mtl file, store all of its textures in the asset file, and then load the obj file. To accommodate textures, I added a separate texture array for each model and had meshes reference this array to assign textures to themselves. By doing so, I was able to correctly load Sponza with textures and render it in OpenGL.


Once Sponza loaded correctly, I decided to multi-thread the program so the asset loader could load different assets on different cores, increasing the speed of the program. I achieved this by building a lock-free multi-producer multi-consumer queue. The queue stores a ring buffer of jobs along with indices to the job currently being read and the last job written. I have a separate queue for writing data into the asset file, executed by a dedicated thread, since only one thread can write to the asset file at any given time.
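A bounded lock-free MPMC ring buffer of this kind is often built in the style of Dmitry Vyukov's design, where each cell carries a sequence number telling producers and consumers whether the slot is ready for them. Below is a hedged sketch of that idea (Coco's actual queue may differ; capacity must be a power of two, and wraparound of the 64-bit counters is ignored):

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

template <typename T>
class MpmcQueue {
    struct Cell { std::atomic<size_t> Seq; T Data; };
    std::vector<Cell> Cells;
    size_t Mask;
    std::atomic<size_t> EnqueuePos{0}, DequeuePos{0};
public:
    explicit MpmcQueue(size_t Capacity) : Cells(Capacity), Mask(Capacity - 1) {
        for (size_t I = 0; I < Capacity; ++I) Cells[I].Seq.store(I);
    }
    bool Push(T Value) {
        size_t Pos = EnqueuePos.load(std::memory_order_relaxed);
        for (;;) {
            Cell& C = Cells[Pos & Mask];
            size_t Seq = C.Seq.load(std::memory_order_acquire);
            if (Seq == Pos) {
                // Slot is free; claim it by advancing the enqueue index.
                if (EnqueuePos.compare_exchange_weak(Pos, Pos + 1)) {
                    C.Data = Value;
                    C.Seq.store(Pos + 1, std::memory_order_release);
                    return true;
                }
            } else if (Seq < Pos) {
                return false; // queue is full
            } else {
                Pos = EnqueuePos.load(std::memory_order_relaxed);
            }
        }
    }
    bool Pop(T* Out) {
        size_t Pos = DequeuePos.load(std::memory_order_relaxed);
        for (;;) {
            Cell& C = Cells[Pos & Mask];
            size_t Seq = C.Seq.load(std::memory_order_acquire);
            if (Seq == Pos + 1) {
                // Slot holds a finished value; claim it.
                if (DequeuePos.compare_exchange_weak(Pos, Pos + 1)) {
                    *Out = C.Data;
                    C.Seq.store(Pos + Mask + 1, std::memory_order_release);
                    return true;
                }
            } else if (Seq < Pos + 1) {
                return false; // queue is empty
            } else {
                Pos = DequeuePos.load(std::memory_order_relaxed);
            }
        }
    }
};
```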

Since obj files with textures require loading many separate files, I added a job which loads the requested mtl file and all of its associated textures. Each texture load is itself a separate job, so to guarantee that an obj load job executes only after its textures load, I added a separate high-priority job queue. This queue is always checked first for remaining jobs, and a job executes only once all of its dependent jobs have finished processing. With this in place, I successfully loaded Sponza into the asset file using multiple threads.

After multi-threading the asset build system, I implemented a deferred rendering pipeline in OpenGL. All the geometry of the scene is rendered into a G-buffer containing world positions, normals, diffuse texels, and specular lighting data. Afterwards, the deferred shader renders every point light using a stencil pass and a lighting pass. The stencil pass increments the stencil buffer for every front face of the point light sphere that isn't occluded, and decrements it for every back face that isn't occluded. Then, during the lighting pass, the point light is culled wherever the stencil buffer's value isn't positive, removing unnecessary lighting calculations. This let me render scenes with hundreds of lights, as can be seen below:
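For the stencil pass to work, each point light needs a bounding sphere sized to where its contribution becomes negligible. Assuming the common constant/linear/quadratic attenuation model (an assumption on my part; Coco's actual model may differ), the radius comes from solving a quadratic:

```cpp
#include <cmath>

// Attenuation model assumed: I(d) = Imax / (Kc + Kl*d + Kq*d*d).
// Setting I(d) = Cutoff gives Kq*d^2 + Kl*d + (Kc - Imax/Cutoff) = 0,
// whose positive root is the light volume radius.
float LightVolumeRadius(float Imax, float Kc, float Kl, float Kq, float Cutoff) {
    float C = Kc - Imax / Cutoff;
    float Disc = Kl * Kl - 4.0f * Kq * C;
    return (-Kl + std::sqrt(Disc)) / (2.0f * Kq);
}
```

Beyond this radius the light contributes less than `Cutoff`, so fragments outside the sphere can safely skip its lighting math.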




Software Ray Tracer (Pixel Scratcher)


PixelScratcher is a real-time software ray tracer written in C++ for Windows. Currently it ray traces spheres, boxes, and planes, with support for diffuse and specular lighting, reflections, and refractions. The code can be found at https://github.com/Ihaa21/SoftwareRayTracer. Below, I detail the process of creating the ray tracer.

I began the project during university after I saw lots of incredible research results on real-time ray tracers, and I found tutorials at scratchapixel.com detailing how to implement the Phong rendering equation. I tried my best to use those tutorials only as references and to code as much as I could on my own.

I used my code from Handmade Hero (a tutorial on making games from scratch) and took out all the program-specific code in it. I wanted to use the Handmade Hero code because it supported runtime compilation of code by swapping DLLs. Doing so helped me test the project quicker, since I could change my algorithms in code and recompile right away to see the changes in the app without having to close everything down and reopen it.

After I had the app code set up, I initialized a buffer that stores the ray directions in camera space, and I rendered it to the screen to see if the patterns looked correct.
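Generating those per-pixel directions can be sketched as follows (an illustration under assumed conventions, not the project's exact code: the camera looks down +Z with roughly a 90-degree horizontal field of view):

```cpp
#include <cmath>

struct V3f { float X, Y, Z; };

V3f RayDirection(int PixelX, int PixelY, int W, int H) {
    // Map pixel centers to [-1, 1] on both axes (y flipped so up is +Y).
    float X = ((PixelX + 0.5f) / W) * 2.0f - 1.0f;
    float Y = 1.0f - ((PixelY + 0.5f) / H) * 2.0f;
    V3f Dir = {X, Y * (float)H / (float)W, 1.0f}; // aspect-correct y
    float Len = std::sqrt(Dir.X * Dir.X + Dir.Y * Dir.Y + Dir.Z * Dir.Z);
    return {Dir.X / Len, Dir.Y / Len, Dir.Z / Len};
}
```

Visualizing these directions as colors is a quick sanity check: the buffer should show a smooth gradient centered on the screen.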


Once I got the ray directions set up correctly, I derived a sphere/ray intersection formula and implemented it to render the spheres in my scene. Using that, I was able to draw my first sphere. I then visualized the normals of the sphere and checked that they were visually consistent.
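The standard derivation substitutes the ray P = O + t*D into the sphere equation |P - C|^2 = R^2 and solves the resulting quadratic for t; a sketch of that (not the project's exact code):

```cpp
#include <cmath>

struct Vec3 { float X, Y, Z; };

static float Dot(Vec3 A, Vec3 B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }
static Vec3 Sub(Vec3 A, Vec3 B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }

// Returns the nearest positive hit distance, or -1 if the ray misses.
float RaySphere(Vec3 O, Vec3 D, Vec3 Center, float R) {
    Vec3 Oc = Sub(O, Center);
    float A = Dot(D, D);                 // 1 if D is normalized
    float B = 2.0f * Dot(Oc, D);
    float C = Dot(Oc, Oc) - R * R;
    float Disc = B * B - 4.0f * A * C;
    if (Disc < 0.0f) return -1.0f;       // no real roots: miss
    float T = (-B - std::sqrt(Disc)) / (2.0f * A);          // nearer root
    if (T < 0.0f) T = (-B + std::sqrt(Disc)) / (2.0f * A);  // ray starts inside
    return T >= 0.0f ? T : -1.0f;
}
```

The hit point's normal is then just the hit position minus the sphere center, normalized, which is what gets visualized in the image below.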


Afterwards, I added my first lights into the scene. Since I didn't have a lighting model yet, nor did the geometry have any color, I made every ray intersection shoot a shadow ray that would try to hit the light in the scene. If the shadow ray hit a light, it set the color of the surface it came from to the light's color. I then moved the light around to see if the bouncing code was working correctly. This can be seen below:


I turned to scratchapixel's tutorial on the diffuse + specular lighting model and implemented it for my spheres. Since I was already shooting shadow rays in the above example, adding a floor beneath the spheres generated shadows with the same implementation.


To capture indirect lighting, I added the ability for rays to bounce multiple times across the scene. I also added support for reflective materials, which better tested how well the indirect lighting was working.



Finally, I finished the Phong model by implementing refraction using an index that controls how much a light ray bends once it hits a surface. Calculating refraction was tricky because I had to check whether a ray is inside a sphere and detect when it exits. I followed scratchapixel's tutorial on the lighting model and the Fresnel equations to implement the desired refraction effect, which can be seen below:
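The bending itself follows Snell's law; a sketch of the standard refraction formula (as derived on scratchapixel, though this is my own simplified version, not the project's code):

```cpp
#include <cmath>

struct V3r { float X, Y, Z; };

static float Dot3(V3r A, V3r B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }

// Eta is the ratio of refractive indices n1/n2. I and N are assumed
// normalized, with N pointing against I. Returns false on total internal
// reflection (when the ray would exit at too shallow an angle).
bool Refract(V3r I, V3r N, float Eta, V3r* Out) {
    float CosI = -Dot3(I, N);
    float Sin2T = Eta * Eta * (1.0f - CosI * CosI); // Snell: sin(t) = eta*sin(i)
    if (Sin2T > 1.0f) return false; // total internal reflection
    float CosT = std::sqrt(1.0f - Sin2T);
    float K = Eta * CosI - CosT;
    *Out = {Eta * I.X + K * N.X, Eta * I.Y + K * N.Y, Eta * I.Z + K * N.Z};
    return true;
}
```

Whether the ray is entering or exiting the sphere decides which way to flip the normal and whether Eta is n1/n2 or n2/n1, which is exactly the bookkeeping that made this step tricky.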


This program was my first attempt at building a ray tracer of any kind, and I learnt a lot about how the lighting model works, as well as the various issues that arise when implementing indirect lighting. The code for this project can be found at https://github.com/Ihaa21/SoftwareRayTracer, so feel free to check it out!

Neural Network (Cognition)

Cognition is a neural network I built in C++ for Windows without any external libraries. The program supports arbitrarily deep networks of feed-forward layers, with backpropagation + momentum as the learning algorithm. In the code, you can find a 3-layer classifier network that classifies MNIST digits with over 90% training accuracy, and a 3-layer autoencoder that encodes digits into a 32-dimensional vector with 90% training decoding accuracy. The code can be found at https://github.com/Ihaa21/Cognition-NeuralNetwork-.

I began this project after taking a course on neural networks and exposing myself to the state of the art research in the field. Cognition was my attempt at building a neural network from scratch which I built to fully grasp the concepts in neural networks.

The networks start with their connections initialized to random values, and using backpropagation (the learning rule), the connection weights are adjusted to minimize the error for the particular task being executed.


We can see this in the autoencoder image on the left, where, shortly after initialization, the network reconstructs very blurry images of the digits. The image on the right is from after the network trained on the data set, when it is much more accurate at reconstructing the input digits.