Everything posted by Vilem Otte

  1. Vilem Otte


    GROOM

    GROOM is my entry into the DOOM challenge here on GameDev.net. It is a simple first-person shooter with many similarities to the original DOOM, and it is in some sense meant to re-create the DOOM experience (which was different for every single one of us). The game itself is very simple and small - I was inspired by a simple map from DOOM 2 (Dead Simple) and made very basic art for it, including a skeletally animated enemy.

    TECHNOLOGY

    Most of us remember DOOM as one of the first 3D games, which used ray casting for rendering. To recreate a similar experience, GROOM uses a custom path tracing renderer (the actual rendering happens on the GPU, but some preparation - like building acceleration structures - happens on the CPU). Because of this, the game may look noisy and tends to be quite heavy on the hardware (although it can run on non-RTX GPUs, as the whole renderer is written in OpenCL). The scene acceleration structure (a multi-level one) gets rebuilt every single frame, including some of the bottom-level trees, which contain characters with skeletal animation. The art is quite programmer-ish in style and was put together within the last 48 hours before release. If you are able to run the project, I'd be glad to hear any feedback about the renderer itself - and hopefully it runs for you and you enjoy it!

    SCREENSHOTS

    Fig. 01 - Real time path traced game level
    Fig. 02 - Real time path traced enemies!
    Fig. 03 - Main menu

    DOWNLOAD

    GRDOOM.zip (Alternative Link) https://www.otte.cz/GRDOOM.zip

    CREDITS

    MUSIC: Royalty free music from https://www.fesliyanstudios.com
    SOUND: This game uses many sounds from freesound (InspectorJ, thecheeseman)
  2. Vilem Otte

    DOOM: Post Mortem

    For the past 2 months there was a challenge running here on GameDev.net - this time I finally decided to participate.

    INTRODUCTION

    I, like many others, remember DOOM as one of the first (if not the first) 3D games I played. GROOM is inspired by DOOM and built upon a custom in-house ray tracing library; it lets you experience a small map rendered with a path tracer. The single level's design is inspired by a level from DOOM 2 - Dead Simple. This is not the only similarity with DOOM: having a half-robot, half-demon spider-like enemy is another, and the AI and behavior are also similar (enemies need to be 'activated' by shooting, otherwise they are not aggressive).

    WHAT WENT RIGHT?

    My whole idea was to make a real time path traced game (as DOOM was essentially a game using ray casting). This had to begin with building a ray tracer - or, to be precise, a ray tracing library fast enough to render complex environments at interactive frame rates. For this I put together something I will refer to as "OpenTracer" (name subject to change), which ended up being an OpenCL based library for ray tracing. If something went right, it is indeed this ray tracing library. It ended up having quite interesting features, like:

    - Support for BVH ray tracing (built using SAH, HLBVH, or multi-level construction)
    - Multi-level BVH being the most useful, allowing not just dynamic scenes but also instancing
    - BVHs (even bottom-level ones) can be dynamically rebuilt (for characters with skeletal animation)
    - Semi-programmable pipeline (would need a bit of work, but still - possible)
    - Support for textures (building a big texture atlas, which is then used during rendering)
    - Somewhat useful library interface (building 'data buffers', requesting BVHs built for those, etc.)
    - Due to the library design, it is technically re-usable
    - Etc.

    I went a bit ahead and wanted to try rendering some nice pictures, so let me show an example here (of Crytek's Sponza). The following image was rendered in under 1 second on an AMD Radeon RX 590 (interesting fact - it can run on an Intel HD 620):

    Fig. 01 - Sponza path traced (note, I have placed the enemy from DOOM into it)

    Other than that, what went right was actually finishing the project. With longer competitions (like 2-month ones) the original interest goes away after a short time, and you need to keep working on the project to actually finish it.

    WHAT WENT WRONG?

    If I had to state which parts of the project I'm not satisfied with, the list would be quite long. First and foremost it is the scale of the level, and having a lot more programmer's art than I wanted. The actual art for the game was made within the last 72 hours before the final deadline, not sooner. Luckily the first thing I did was block out the design for the level:

    Fig. 02 - Block out design of the level

    Of course, good art requires time - so I had to make as much as possible, as fast as possible - and most of the models ended up being blocky. Additionally, I was probably saving way too much geometry in order to end up with high performance (the actual performance hit from having more triangles isn't really a problem... as there is a BVH acceleration structure - theoretically, doubling the geometry adds just a single additional level to it). In the end I was able to build most of the map, and an enemy (which has a hand-made skeletal animation), although not nearly at the quality I wanted.

    SOMETHING I LEARNED

    I hit the typical problem of most challenges or jams, and that is that time management is critical (I literally planned what to have finished by when, to actually be able to finish the game). Some deadlines were met long before, some a few days after. But mostly I kept to the written features/dates, and that helped me finish the project.

    Unlike Ludum Dares (in which I participate regularly), time management for a longer project is different (additionally you also have to fit work/business and family into it... and yes, even the recent Ludum Dare was put into the plans). Apart from that, I found out that handling dynamic scenes with a ray tracer is far harder compared to standard renderers like Direct3D ones (or even software rasterization ones). While I worked with ray tracing a lot during my studies, I had actually never worked with fully dynamic scenes or dynamic deformable objects in it.

    CONCLUSION

    The challenge was actually quite interesting, and I enjoyed it a lot (especially due to the fact that I could work with ray tracing). If you want to try it - go ahead and try playing the game.
  3. A (very simple and short) DOOM shooter-like experience (named GROOM), which isn't using ray casting, but real time path tracing. A post mortem and gameplay video will be added shortly. I hope that anyone playing it will be able to run it (a very good GPU is required!), and hopefully it will be at least a bit of fun for them. And yes, programmer's art included! EDIT: I'm not sure if the post mortem should be posted here or not; anyway, here it is -
  4. Vilem Otte

    Bare bones AAA team

    From experience - performance isn't really that big a problem anymore (or at least not what it used to be). Off-the-shelf assets have a different, huge problem: composing scenes out of them tends to be hard - they often don't match each other, or simply don't fit together - and you may end up reworking some of those assets. In an ideal scenario you would have one artist (or generally, the fewer the better) who would make you the whole world, as that would keep the art consistent throughout that world (of course that is not really feasible for AAA... but in indie studios this is much more common, and it often ends up producing more consistent art than AAA games). This issue happens quite regularly even without off-the-shelf assets and within AAA studios - more often than you would expect. Do you remember The Elder Scrolls IV: Oblivion? Not sure if you played it - but going around the world you can recognize that certain parts were made by different graphics people, and the art isn't as consistent as one would expect (of course, due to the scale of the game, it is far beyond the scope of a single artist to create it within a short time... if you have never played it - compare the environment around Anvil and around the Imperial City).
  5. Vilem Otte


    And here it is - the gameplay, rendering and many other features are pretty much ready. I believe that at this point the game actually satisfies everything the challenge requests. Including a somewhat dark but still complete main menu:

    Fig. 01 - Main menu with fonts and some effects

    And the rendering was finally changed to something better than the colored preview... so naturally the game is fully path traced (it is noisy, it is slow, but it is... AWESOME). I will probably also provide a non-path-traced build, as playing it in full HD is pretty much not possible (doing a few samples per pixel is a bit of overkill, even for such a small scene).

    Fig. 02 - DOOM brought us ray casting and some interesting new effects; GROOM follows the spirit of DOOM and implements fully path traced rendering

    And yes, I've actually named it GROOM (no idea why - it just sounded good to me). There will be a few more updates improving the playable area slightly, but generally this is it - I know it is small, but pulling off a GPU ray tracer for dynamic scenes that isn't too slow isn't as easy as it sounds. It was a nice ride - and once it is released (which will happen within the next 24 hours), you can try it yourself to see how slow path tracing really is.
  6. Vilem Otte

    DOOM: Placeholders

    Currently, the only "playable" level is still full of placeholders - I'm still hoping to have the game finished on the 18th, but we will see. In the worst case there might be some placeholders left (while the game would still be playable). In the meantime I have added:

    - Full physics (based on Bullet) - not just character controllers; even rigid bodies are technically supported (I'm still not sure whether I will do anything with them)
    - The ability to control the behavior of entities (doors, elevators and such are possible - but I don't think I will have enough time to make art for those, so we might be limited here)
    - Improved the speed of dynamic BVH building by about a factor of 10+ (by simply going through my code and removing dumb things)
    - Got rid of most of the limitations (e.g. the total texture size was originally limited to just 8192x8192 texels; technically this is much higher now)
    - Finished some sort of player movement I'm kind of satisfied with

    I'm realistic - the competition ends in about 2 days (and like most of us, I still have to be at work during those). I personally will be glad if I manage to hit the scope I wanted for a proof-of-concept demo, which includes:

    - Simple menu (New Game/Exit)
    - Single level
    - Single type of enemy
    - Single weapon
    - Player can die in the game and player can win the game
    - Some sort of Doom-ish UI
    - Some advanced ray tracing effect or two

    I know this isn't much (and it most likely barely covers the scope of the challenge). In the meantime - behold - the placeholders (and no, that 0.44 GRay/s is not a joke - I'm currently limited by the performance of actually displaying the buffer, not by ray tracing nor BVH updates at all)!
  7. Vilem Otte

    Bare bones AAA team

    I will just give salary figures for my country (the Czech Republic, next to Poland) for software developers, so you get the picture. For senior developers the statistics say that on average they require $40,000 per year; for junior developers it is $18,000 per year. These are net figures, as the salaries are reported by the employees. (Source: Startupjobs.cz) So, with our taxation at around 45%, your budgeted cost would be about $73,000 for seniors and $32,000 for juniors. As a senior we generally count somebody with at least 5, or better 10, years of experience - often including a few years working for a foreign company. Of course the salary varies heavily based on the branch in which you work. You can also try your luck in India, which may be even cheaper - allowing an even bigger team. The problem is, bigger doesn't necessarily mean better. And when you have to aim at qualified people (often with university degrees and experience) who know what they are doing, the cost difference tends to decrease.
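The net-to-gross conversion above can be sanity-checked with a quick sketch (the 45% effective rate is the rough figure from the post, not an exact tax formula):

```python
def employer_cost(net_salary, effective_tax_rate=0.45):
    """Rough gross cost to the employer, assuming the quoted net salary
    is what remains after an effective tax/levy rate is deducted."""
    return net_salary / (1.0 - effective_tax_rate)

senior = employer_cost(40_000)  # close to the ~$73,000 quoted above
junior = employer_cost(18_000)  # close to the ~$32,000 quoted above
print(round(senior), round(junior))
```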
  8. Vilem Otte

    Bare bones AAA team

    The Witcher 3's development costs were 306 million PLN - converted into USD that is about $81 million. The Witcher 3 is one of the largest open-world RPGs I have played in the past few years; I would say it was a AAA experience. The Witcher 2's development costs were around 40 million PLN - about $10 million. I would dare to say that The Witcher 2 was also a AAA experience, despite being much shorter and not open world like its successor. It is considered one of the cheaper AAA games. These games were developed in Poland, where the costs for employees and living are slightly lower compared to Western Europe, and a lot lower than in, e.g., California. You also need to take this into account.
  9. Vilem Otte

    Ludum Dare 45

    Introduction

    Yet another Ludum Dare event came around, and I found time to participate this time again (it is actually the 10th Ludum Dare I've participated in so far). So, first of all - if you are interested, just scroll down the post and play the game. A short gameplay video:

    Fig. 01 - A short gameplay video.

    What went right?

    This time around, we decided to dive into the wonderful world of procedural generation quite fast; the actual work on the game started early Saturday morning. Being heavily inspired by Angband, the decision was clear - our dungeon levels had to be generated. Procedural generation of such dungeons is not really that hard; I started off with a simple algorithm and worked from there:

    1. Generate a 2D axis-aligned BSP tree, randomly terminating after at most N levels or when a criterion is met
    2. Shrink the rooms represented by the leaves of the BSP tree
    3. Make connections in bottom-up order (first between leaves, then between interior nodes and leaves - always picking the closest room to the currently processed one)

    The results were quite promising, producing blocktober-worthy images like:

    Fig. 02 - Randomly generated dungeon level

    At this stage the entrance and exit were placed, followed by the monsters and power-ups which you can pick up in the dungeon. The last things to solve in level design were the first and last levels. As our levels during the generation phase are defined as byte arrays (containing actual chars defining what is on a given tile), we simply used a file and injected it into the process. Our first level looks like this:

       ###
    ###X###
    #.#...#
    #....##
    ##....#
    #...#.#
    #######

    Where:

    # - represents a wall
    . - represents empty space
    X - represents the exit out of the level

    Apart from the actual procedural generation, development went quite smoothly. I enjoyed making non-realistic cartoon-ish models, with skeletal animation enabled just for the character's hands. The models ended up being quite cute in the end:

    Fig. 03 - Mushroom in the Unity engine editor, version without hands

    The game mechanics also felt quite comfortable overall (finished enough to be playable, I'd say), compared to some of my previous entries. Picking a smaller-scope game definitely played in favor of this.

    What went wrong?

    As usual, time. In each game jam (and it doesn't matter whether it is Ludum Dare or anything else), it comes to the point where you have to start cutting features to make it on time - because at the end of the day, in a game jam you either finish a game or you don't. We had to cut various features that would have made the game a LOT better, but the most important of those was audio. Sound effects and music tend to improve the atmosphere a lot - or to be precise, audio is one of the most underestimated features of a game, with which the atmosphere rises or completely falls. Due to our time constraints we ended up completely without audio in the game.

    The second item we left for the last moment of development was the user interface, and we ended up having it very simple. While it is informative enough for the game, making it graphically more attractive would improve the gameplay a lot (compared to having just default text giving you all the information you need).

    Next time around, I'd like to dedicate at least a few hours to these two topics, even sacrificing some of the visual effects or details in world generation, to increase the atmosphere of the game with audio and UI.

    Conclusion

    As said, this was my 10th Ludum Dare, and I enjoyed working on it very much. I'm quite satisfied with the game we ended up with. Next time, assuming I have a free weekend, I'd like to participate again. Also, if any of you (who read this article) have participated in Ludum Dare and would like me to play your game - please leave a comment here, and I will definitely play your game during the following weeks.
    Goodies

    So, for the curious ones (I'm intentionally leaving just the Itch.io link to the game, as I never know when I will remove it from my server - which would make that link inactive):

    Source code is available at: GitHub
    Game is available at: Itch.io
    Ludum Dare page is at: LDJAM
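The first two dungeon-generation steps described in the post can be sketched roughly like this (a minimal illustration, not the actual jam code - room shrinking is simplified to a 1-tile margin and corridor carving is omitted):

```python
import random

def split_space(x, y, w, h, depth, max_depth=4, min_size=8):
    """Step 1: axis-aligned BSP split, terminating at max depth or when
    the region is too small to split further."""
    if depth >= max_depth or (w < 2 * min_size and h < 2 * min_size):
        return {"rect": (x, y, w, h), "children": None}
    if w >= h:  # split along the longer axis
        cut = random.randint(min_size, w - min_size)
        kids = [split_space(x, y, cut, h, depth + 1, max_depth, min_size),
                split_space(x + cut, y, w - cut, h, depth + 1, max_depth, min_size)]
    else:
        cut = random.randint(min_size, h - min_size)
        kids = [split_space(x, y, w, cut, depth + 1, max_depth, min_size),
                split_space(x, y + cut, w, h - cut, depth + 1, max_depth, min_size)]
    return {"rect": (x, y, w, h), "children": kids}

def make_rooms(node, rooms):
    """Step 2: shrink each leaf into a room with a 1-tile margin."""
    if node["children"] is None:
        x, y, w, h = node["rect"]
        rooms.append((x + 1, y + 1, w - 2, h - 2))
    else:
        for child in node["children"]:
            make_rooms(child, rooms)
    return rooms

random.seed(42)
tree = split_space(0, 0, 64, 64, depth=0)
rooms = make_rooms(tree, [])
# every shrunken room must stay inside the 64x64 map
assert all(x >= 0 and y >= 0 and x + w <= 64 and y + h <= 64 for x, y, w, h in rooms)
print(len(rooms), "rooms")
```

Step 3 (connecting leaves bottom-up by closest room) would then walk the tree back up, linking the nearest pair of rooms from each sibling subtree.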
  10. Vilem Otte

    DOOM: Skeletal Animations & Dynamic BVHs

    @lawnjelly If it goes well, I'm considering either:

    - Changing the repository status to public on my server (maybe forking it to GitHub)
    - Publishing that ray tracing library as a library (with a few examples)
    - Writing some articles about ray tracing dynamic scenes

    The library I'm using (a custom one) is an OpenCL based renderer - it doesn't have hard syntax (it might need a review or two - and maybe an interface update after the project, as I'm adding classes/methods whenever I need them for this project). I'm currently busy making and animating a demon from hell. 100% programmer's art.
  11. A small update I tackled today - skeletal animation. Now, doing anything animated with ray tracers is quite a pain...

    - Static scenes - you simply build the acceleration structure once... and then you can move the camera around, and everything tends to be nice and easy
    - Scenes with dynamic objects - multi-level BVHs can rescue you here: you build bottom-level BVHs and then rebuild the top-level BVH; this also enables instancing as a side product
    - Scenes with dynamic geometry - you may go a bit nuts here, as you NEED to rebuild the BVH for the given geometry

    So, the current strategy for the DOOM project is working, more or less: the scene allows you to add render nodes, which can be defined as either STATIC or DYNAMIC. The STATIC ones have their BVH built once in the beginning and never rebuilt again; the DYNAMIC ones have their BVH rebuilt every single frame. All of these bottom-level BVHs are then leaves of a top-level BVH that is rebuilt every single frame. My apologies for giving this in the form of a small post and not a full-featured article (once the Challenge is over - I already have quite a pile of topics I'd absolutely like to write about, most of them related to ray tracing). In the meantime, I can at least feed you with a simple sample...
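The STATIC/DYNAMIC node strategy described above can be sketched as follows (a minimal illustration of the rebuild scheduling only - the node and scene types are hypothetical stand-ins, not the actual OpenTracer API):

```python
from dataclasses import dataclass, field

STATIC, DYNAMIC = "STATIC", "DYNAMIC"

@dataclass
class RenderNode:
    kind: str           # STATIC or DYNAMIC
    bvh_builds: int = 0 # how many times the bottom-level BVH was built

    def build_bottom_level_bvh(self):
        self.bvh_builds += 1  # stands in for a real SAH/HLBVH build

@dataclass
class Scene:
    nodes: list = field(default_factory=list)
    top_level_builds: int = 0

    def add(self, node):
        node.build_bottom_level_bvh()  # every node gets an initial BVH
        self.nodes.append(node)

    def render_frame(self):
        # DYNAMIC nodes (skeletal animation etc.) are rebuilt every frame;
        # STATIC ones keep the BVH from scene creation.
        for node in self.nodes:
            if node.kind == DYNAMIC:
                node.build_bottom_level_bvh()
        # The top-level BVH over all node bounds is rebuilt every frame.
        self.top_level_builds += 1

scene = Scene()
scene.add(RenderNode(STATIC))   # level geometry
scene.add(RenderNode(DYNAMIC))  # animated enemy
for _ in range(3):
    scene.render_frame()
print(scene.nodes[0].bvh_builds, scene.nodes[1].bvh_builds)  # 1 4
```

Instancing falls out of the same structure: several top-level leaves can point at the same bottom-level BVH.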
  12. Vilem Otte

    DOOM: Multi Level BVHs

    @LandonJerre I'm glad to hear it is running on an RTX 2080! Thanks for testing. I've figured out one application crash (the BVH had a memory leak - and as it is rebuilt every frame, you can run out of memory quite fast); this might have been the cause of it. I wasn't able to reproduce the vertical black line on my GPU either - I will look at it eventually (added to the TODO list). It seems like the RTX 2080 gives the highest performance so far. For later releases, I may want to add a way to send results to a database (with hardware information and resolution) - that way we could compare how fast each GPU is, and I could put together a minimal and recommended configuration (plus what resolution you should run at) once the DOOM game is more game-like.
  13. Vilem Otte

    DOOM: Multi Level BVHs

    @dmatter Wow, I really didn't expect it to run on a Radeon HD 7790 (it is, after all, a somewhat older GPU). I can't test it on any NVidia card directly (I only have a friend who has one), which isn't really good - so I'll probably get a 2070 before the end of September, so I can debug & run on it.
  14. Vilem Otte

    DOOM: Multi Level BVHs

    @JoeJ Thanks. Sounds good! @Rutin I'll try to add a bit more statistics and logging, to see why (it could be that I'm using too big a surface for the texture atlas, or too large a workgroup).
  15. Vilem Otte

    DOOM: Multi Level BVHs

    One of the major challenges in real time ray tracing is dynamic scenes. It is not hard to pre-compute BVHs and then just render the scene, but working with dynamic objects and geometry is challenging. To handle this scenario I've implemented multi-level BVHs, where the top-level BVH is rebuilt every single frame; additionally, for fully dynamic geometry I've added HLBVH (which is extremely fast to build). The first test I have (which you can try) is the Sponza atrium, which creates every single object in it as a node within a multi-level BVH. Each node has a high quality SAH BVH pre-computed on start, and then a top-level BVH is built every single frame. This also immediately supports instancing, which is a great feature! The geometry also has a material (with a diffuse map) assigned. To run the demo, OpenCL 1.2 compatible hardware is required (I've tested this on an AMD RX 380, AMD RX 590, NVidia 1070 and Intel HD Graphics 620 - where it runs quite slow, but still - runs). Raytracer Test No. 1 I don't like to promise future features, but if everything goes well - I'd like to have a skeletally animated model in soon, along with, finally, some of the models that I'm working on for the actual game!
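HLBVH gets its build speed from sorting primitives by Morton code - interleaving the bits of quantized centroid coordinates so that spatially close primitives end up adjacent in memory, after which the tree can be emitted by splitting on code bits. A minimal sketch of that key step (illustrative only, not the OpenCL kernel):

```python
def expand_bits(v):
    """Spread the lower 10 bits of v so there are two zero bits
    between each original bit (for 30-bit 3D Morton codes)."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code for a point with coordinates in [0, 1)."""
    xi = min(max(int(x * 1024.0), 0), 1023)
    yi = min(max(int(y * 1024.0), 0), 1023)
    zi = min(max(int(z * 1024.0), 0), 1023)
    return (expand_bits(xi) << 2) | (expand_bits(yi) << 1) | expand_bits(zi)

# HLBVH-style ordering: sort primitive centroids by Morton code,
# the two nearby centroids then sort next to each other
centroids = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.12, 0.1, 0.11)]
order = sorted(range(len(centroids)), key=lambda i: morton3d(*centroids[i]))
print(order)
```

On the GPU the same sort is done with a parallel radix sort, which is what makes HLBVH fast enough to rebuild per frame.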
  16. Vilem Otte

    DOOM: Challenge accepted!

    I had been deciding whether to participate in the Challenges for quite a long time, mainly because I personally wanted to. And when DOOM was voted in, I decided I had to participate. Of course, I have to start somewhere (I made a road map, which is something I'm trying to hold to - throughout the challenge I'm going to switch between working on art and working on source code). So let's consider this the first post, and let's see what I manage to finish in the end. A little sneak peek into how the project looks at the current stage. Some of you may have recognized that this is the Sponza Atrium model (it is not easy to recognize, though - as the image is actually showing barycentric coordinates in the red/green channels and distance traveled along the ray in the blue one), which I'm using for testing, as I'm working on using a custom GPU real time ray tracer as the renderer (it is DOOM after all, which originally used a ray caster - so for me, it was natural to use a ray tracer).
  17. @Kylotan I was pointing more directly at the Funnel algorithm in the case of navmesh pathfinding (which is technically a post-processing step). My problem at the time was that I had a planet with features on it (in reality it could be any geometry - toroids, various space stations, etc.). By voxelization (voxelize all walkables, then subtract all non-walkables) and then applying the marching cubes algorithm, I built a navigation mesh - which I also used to create a graph (i.e. triangles being nodes, and the edges of the graph connecting neighboring triangles). Finding the list of nodes to traverse when going from point A to point B was done with A* - so again, straightforward. The problem was path optimization - making the path look nice - i.e. a string pulling algorithm. The pseudo-code in articles for those is mostly done in 2D, assuming a planar-like map with a single upwards direction. There were multiple solutions I tried back then:

    - Treat it as spherical geometry (which works very well for such planet-like shapes, as long as you move on the surface) - spherical geometry is not that hard once you get used to it
    - Another option is to use a common 'upward direction' for each 2 adjacent triangles you walk over, which is e.g. normalize(N1 + N2), where N1 and N2 are the triangles' normals. Project both triangles along this direction and you have your 2D coordinates for the current step of the Funnel algorithm

    I finished it back then and was quite satisfied with the results. While that personal project never got finished, I still use the pathfinding algorithm and navmesh generation in my source code. Note: If I'm not mistaken (correct me if I'm wrong), most commercial engines do not support navmeshes like these (and I know this is a bit of a corner-case scenario). I.e. try generating a navmesh in the engine of your choice for a sphere on which characters walk as if on a planet (i.e. the center of gravity is in its center) - I don't think it is supported at all (at least it wasn't the last time I checked in Unity).
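The second option above - projecting two adjacent triangles into 2D along the averaged normal - can be sketched like this (a minimal illustration with hypothetical helper names; a real implementation would also keep the 2D frame consistent from one funnel step to the next):

```python
import math

def normalize(v):
    l = math.sqrt(sum(c * c for c in v))
    return tuple(c / l for c in v)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_to_2d(points, n1, n2):
    """Project 3D points into 2D using up = normalize(N1 + N2),
    as described for one step of the Funnel algorithm."""
    up = normalize(tuple(a + b for a, b in zip(n1, n2)))
    # build any tangent frame perpendicular to 'up'
    helper = (1.0, 0.0, 0.0) if abs(up[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(up, helper))
    b = cross(up, t)
    return [(dot(p, t), dot(p, b)) for p in points]

# two triangles share the edge (0,0,0)-(1,0,0) and are slightly folded
# along it; n1 and n2 are their normals
n1 = (0.0, 1.0, 0.0)
n2 = normalize((0.0, 1.0, 0.2))
shared_edge = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
p1, p2 = project_to_2d(shared_edge, n1, n2)
# the shared edge is perpendicular to the averaged up direction,
# so it keeps its length after projection
print(round(math.dist(p1, p2), 6))
```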
  18. @jb-dev and @thecheeselover I see, thanks. In terms of navmesh generation I have tried:

    1. Navmeshes directly created by artists (these tend to be bad, as artists generally don't understand navmeshes that well - but they asked whether they could have control... this didn't work in the end)
    2. Navmeshes generated from artist markup, by setting walkable/obstacle flags on the surface - the geometry then gets voxelized (with varying voxel size - this can be used to simulate navmesh agent size... first voxelize the walkables, then 'subtract' the obstacles), and then triangulated. This works quite well (although it is computationally quite heavy - but it can be preprocessed)
    3. The same as 2, but the geometry doesn't get voxelized; instead it gets clipped (supporting just bounding boxes for non-walkables for now) - generally, clip your walkable geometry with the non-walkables, and then triangulate the result. Varying agent size could then be solved similarly to what you have described.

    Currently I'm still sticking with the middle one.
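The "voxelize walkables, then subtract obstacles" idea from option 2 can be sketched on a 2D grid (a toy illustration only - a real implementation works in 3D and can also grow the obstacles by the agent radius before subtracting):

```python
def voxelize_walkable(width, height, walkable_rects, obstacle_rects):
    """Mark cells covered by walkable rects, then clear cells covered
    by obstacle rects. Rects are (x, y, w, h) in cell units."""
    grid = [[False] * width for _ in range(height)]

    def cells(rect):
        x, y, w, h = rect
        for cy in range(max(0, y), min(height, y + h)):
            for cx in range(max(0, x), min(width, x + w)):
                yield cx, cy

    for rect in walkable_rects:            # voxelize walkables
        for cx, cy in cells(rect):
            grid[cy][cx] = True
    for rect in obstacle_rects:            # subtract obstacles
        for cx, cy in cells(rect):
            grid[cy][cx] = False
    return grid

grid = voxelize_walkable(8, 4,
                         walkable_rects=[(0, 0, 8, 4)],
                         obstacle_rects=[(3, 1, 2, 2)])
walkable_cells = sum(row.count(True) for row in grid)
print(walkable_cells)  # 32 cells minus the 2x2 obstacle = 28
```

Triangulating the remaining True region (e.g. via marching squares/cubes, as in the original post) then yields the navmesh.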
  19. Nice article! As I've implemented navmeshes myself - in most online resources I've missed some further description (the majority of articles mention the Funnel algorithm, but very few actually describe it in pseudo-code - that is the one thing I'm missing in this article, although from my own experience I know it takes a lot of time to write articles, and they simply can't contain everything). Now, as for the navmesh itself (and my personal curiosity!) - how do you generate navmeshes? Do you take agent sizes into account (and how)?
  20. Vilem Otte

    Engine: Virtual Textures

    Implementing terrain tools is somewhat time consuming (in addition to me being quite busy in the past 2 months - on topics I can't write too much about). Anyway, the terrain system and authoring tools in my engine are starting to look good; here is a small sneak peek: The circular thing is a terrain. Some interesting features:

    - The height map is a virtual texture of unlimited size (those who know how virtual textures work know that there are some limitations - related to your page table size, page indirection size, etc.)
    - The virtual texture is editable by the user with base functions (I'm currently working on placing 'stamps')
    - The terrain geometry uses seamless LOD based on tile-based binary triangle trees
    - Calculation of terrain LOD is based on a specific position
    - It allows specifying the quality of rendering for shadow maps/voxelization/etc.
    - Etc.

    The interesting part is selecting which tiles, and at what quality, are going to be loaded/rendered. I'm not using the approach RAGE used (i.e. no feedback buffer); instead I'm using the selected LOD for a given terrain tile (currently without frustum culling - which is still on the todo list). Originally I wanted to have basic terrain support, but with me thinking up more and more features and adding them, it is becoming quite big. The source still needs a bit of cleanup (before I write something about terrain), and I haven't connected it with the vegetation or physics systems yet - these are both on the todo list for now.
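The page table limitation mentioned above comes from the basic virtual texture addressing, which can be sketched as follows (an illustrative model only - the page size, table layout and residency handling are placeholder assumptions, not this engine's actual scheme):

```python
PAGE_SIZE = 128   # texels per page side (assumed)
TABLE_SIZE = 64   # page table is TABLE_SIZE x TABLE_SIZE entries (assumed)

# page table maps virtual page coords -> physical page coords in the cache,
# or None when the page is not resident
page_table = {}

def lookup(u, v):
    """Translate virtual texel coords to (physical page, in-page offset)."""
    page = (u // PAGE_SIZE, v // PAGE_SIZE)
    if page[0] >= TABLE_SIZE or page[1] >= TABLE_SIZE:
        raise ValueError("beyond addressable virtual texture size")
    physical = page_table.get(page)   # indirection step
    offset = (u % PAGE_SIZE, v % PAGE_SIZE)
    return physical, offset           # physical is None -> page fault, load it

# the addressable virtual texture is bounded by table size * page size:
max_texels_per_side = TABLE_SIZE * PAGE_SIZE  # 8192 here
page_table[(2, 3)] = (0, 0)   # pretend page (2,3) is resident in cache slot (0,0)
print(lookup(300, 400), max_texels_per_side)
```

"Unlimited size" in practice means growing the indirection (or making it hierarchical/sparse) rather than removing this bound.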
  21. Welcome to the first part of a series of articles about soft shadows. In recent days I've been working on area light support in my own game engine, which is critical for one of the game concepts I'd eventually like to do (if time allows me to do so). For each area light, it is crucial to have proper soft shadows with a proper penumbra. For motivation, let's have the following screenshot with 3 area lights of various sizes:

    Fig. 01 - PCSS variant that allows for perfectly smooth, large-area light shadows

    Let's start the article with a comparison of the following 2 screenshots - one with shadows and one without:

    Fig. 02 - Scene from the default viewpoint lit with a light without any shadows (left) and with shadows (right)

    This is the scene we're going to work with, and for the sake of simplicity, all of the comparison screenshots will be from this exact same viewpoint with 2 different scene configurations. Let's start with the definition of how shadows are created, given a scene and a light which we're viewing:

    - The shadow umbra is present at each position where there is no direct visibility between the given position and any existing point on the light.
    - The shadow penumbra is present at each position where there is visibility of some point on the light, yet not all of them.
    - There is no shadow wherever there is full direct visibility between each point on the light and the position.
    Most games tend to simplify: instead of defining a light as an area or volume, it gets defined as an infinitely small point, which gives us a few advantages:

    - For a single point, it is possible to define visibility in a binary way - either in shadow or not in shadow
    - From a single point, a projection of the scene can easily be constructed in such a way that the definition of shadow becomes trivial (either the position is occluded by other objects in the scene from the light's point of view, or it isn't)

    From here, one can follow into the idea of shadow mapping - which is the basic technique for all the others used here.

    Standard Shadow Mapping

    Trivial, yet it should be mentioned here:

    inline float ShadowMap(Texture2D<float2> shadowMap, SamplerState shadowSamplerState, float3 coord)
    {
        return shadowMap.SampleLevel(shadowSamplerState, coord.xy, 0.0f).x < coord.z ? 0.0f : 1.0f;
    }

    Fig. 03 - Code snippet for standard shadow mapping, where the depth map (the stored 'distance' from the light's point of view) is compared against the calculated 'distance' between the point we're computing right now and the given light position. The word 'distance' may either mean actual distance, or more likely just the value on the z-axis in the light's point-of-view basis.

    This is well known to everyone here, giving us the basic results that we all know well, like:

    Fig. 04 - Standard Shadow Mapping

    This can be simply explained with the following image:

    Fig. 05 - Each rendered pixel calculates whether its 'depth' from the light's point of view is greater than what is written in the 'depth' map from the light's point of view (represented as a yellow dot); the white lines represent the computation for each pixel.

    Percentage-Closer Filtering (PCF)

    To make the shadow more visually appealing, adding a soft edge is a must. This is done by simply performing NxN tests with offsets. For the sake of improved visual quality I've used shadow mapping with a bilinear filter (which requires resolving 4 samples), along with 5x5 PCF filtering:

    Fig. 06 - Percentage-closer filtering (PCF) results in nice soft-edged shadows; sadly the shadow is uniformly soft everywhere

    Clearly, neither of the above techniques does any penumbra/umbra calculation, and therefore they're not really useful for area lights. For the sake of completeness, I'm adding the basic PCF source code (feel free to optimize and improve it for your own uses):

    inline float ShadowMapPCF(Texture2D<float2> tex, SamplerState state, float3 projCoord, float resolution, float pixelSize, int filterSize)
    {
        float shadow = 0.0f;
        float2 grad = frac(projCoord.xy * resolution + 0.5f);

        for (int i = -filterSize; i <= filterSize; i++)
        {
            for (int j = -filterSize; j <= filterSize; j++)
            {
                float4 tmp = tex.Gather(state, projCoord.xy + float2(i, j) * float2(pixelSize, pixelSize));
                tmp.x = tmp.x < projCoord.z ? 0.0f : 1.0f;
                tmp.y = tmp.y < projCoord.z ? 0.0f : 1.0f;
                tmp.z = tmp.z < projCoord.z ? 0.0f : 1.0f;
                tmp.w = tmp.w < projCoord.z ? 0.0f : 1.0f;
                shadow += lerp(lerp(tmp.w, tmp.z, grad.x), lerp(tmp.x, tmp.y, grad.x), grad.y);
            }
        }

        return shadow / (float)((2 * filterSize + 1) * (2 * filterSize + 1));
    }

    Fig. 07 - PCF filtering source code

    Representing this with an image:

    Fig. 08 - Image representing PCF; specifically, the pixel with the straight line and star at the end also calculates shadow in neighboring pixels (i.e. performing additional samples). The resulting shadow is then the weighted sum of the results of all the samples for a given pixel.

    While the idea is quite basic, it is clear that using larger kernels would end up in slow computation. There are ways to perform separable filtering of shadow maps using a different approach to resolve where the shadow is (Variance Shadow Mapping, for example). They do introduce additional problems, though.

    Percentage-Closer Soft Shadows

    To understand the problem with both previous techniques, let's replace the point light with an area light in our sketch image:

    Fig. 09 - Using an area light introduces penumbra and umbra.
The size of the penumbra depends on multiple factors - the distance between receiver and light, the distance between blocker and light, and the light's size (and shape). To calculate plausible shadows like in the schematic image, we need to calculate the distance between receiver and blocker, and the distance between receiver and light. PCSS is a 2-pass algorithm: it calculates the average blocker distance as the first step, uses this value to calculate the penumbra size, and then performs some kind of filtering (often PCF, or jittered PCF, for example). In short, the PCSS computation looks similar to this:

float ShadowMapPCSS(...)
{
    // x = average blocker depth, y = number of blockers found
    float2 blockers = PCSS_BlockerDistance(...);

    // If there is no blocker at all, the receiver is fully lit
    if (blockers.y < 1.0f)
    {
        return 1.0f;
    }
    else
    {
        float penumbraSize = estimatePenumbraSize(blockers.x, ...);
        float shadow = ShadowPCF(..., penumbraSize);
        return shadow;
    }
}

Fig. 10 - Pseudo-code of PCSS shadow mapping The first problem is determining the average blocker distance correctly - and as we want to limit the blocker search area, we simply pass in an additional parameter that determines the search size. The average blocker depth is calculated from those shadow-map samples whose depth value is smaller than the receiver's.
In my case I used the following estimation of blocker distance:

// Input parameters are:
// tex - Input shadow depth map
// state - Sampler state for shadow depth map
// projCoord - holds projection UV coordinates, and depth for receiver (~further compared against shadow depth map)
// searchUV - input size for blocker search
// rotationTrig - input parameter for random rotation of kernel samples
inline float2 PCSS_BlockerDistance(Texture2D<float2> tex, SamplerState state, float3 projCoord, float searchUV, float2 rotationTrig)
{
    // Perform N samples with pre-defined offset and random rotation, scale by input search size
    int blockers = 0;
    float avgBlocker = 0.0f;
    for (int i = 0; i < (int)PCSS_SampleCount; i++)
    {
        // Calculate sample offset (technically anything can be used here - standard NxN kernel, random samples with scale, etc.)
        float2 offset = PCSS_Samples[i] * searchUV;
        offset = PCSS_Rotate(offset, rotationTrig);

        // Compare given sample depth with receiver depth; if it puts the receiver into shadow, this sample is a blocker
        float z = tex.SampleLevel(state, projCoord.xy + offset, 0.0f).x;
        if (z < projCoord.z)
        {
            blockers++;
            avgBlocker += z;
        }
    }

    // Calculate average blocker depth (guarding against the no-blocker case)
    if (blockers > 0)
    {
        avgBlocker /= blockers;
    }

    // To solve cases where there are no blockers - we output 2 values - average blocker depth and no. of blockers
    return float2(avgBlocker, (float)blockers);
}

Fig. 11 - Average blocker estimation for PCSS shadow mapping For the penumbra size calculation we first assume that blocker and receiver are planar and parallel. The penumbra size is then based on similar triangles, determined as: penumbraSize = lightSize * (receiverDepth - averageBlockerDepth) / averageBlockerDepth This size is then used as the input kernel size for the PCF (or similar) filter. In my case I again used rotated kernel samples. Note: depending on the sample positioning, one can achieve different area light shapes.
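The blocker search and the similar-triangles penumbra estimate can also be sketched outside of HLSL. Here is a minimal Python illustration - the function names and the sample values are made up for the example, not part of the engine code:

```python
def average_blocker_depth(samples, receiver_depth):
    # A sample counts as a blocker if its stored depth is closer to the light
    # than the receiver's depth.
    blockers = [d for d in samples if d < receiver_depth]
    if not blockers:
        return None  # no blocker at all -> receiver is fully lit
    return sum(blockers) / len(blockers)

def penumbra_size(light_size, receiver_depth, avg_blocker_depth):
    # Similar triangles, assuming a planar blocker parallel to the receiver.
    return light_size * (receiver_depth - avg_blocker_depth) / avg_blocker_depth

# Hypothetical depth-map samples inside the search region:
samples = [0.2, 0.9, 0.2, 0.2]
avg = average_blocker_depth(samples, receiver_depth=0.8)  # ~0.2 (0.9 is not a blocker)
print(penumbra_size(light_size=0.05, receiver_depth=0.8, avg_blocker_depth=avg))  # ~0.15
```

Note how a blocker that is closer to the receiver (larger average depth) shrinks the penumbra, while a blocker near the light widens it - exactly the contact-hardening behavior PCSS is after.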
The result gives quite plausible shadows, with the downside of requiring a lot of processing power for noise-free results (a lot of samples) and large kernel sizes (which also require a large blocker search area). Generally this is a very good technique for small to mid-sized area lights, yet large area lights will cause problems. Fig. 12 - PCSS shadow mapping in practice As the article is already quite large, I will leave the two other techniques available in my current game engine build (the first of them a variant of PCSS that utilizes mip maps and allows for a slightly larger light size without impacting performance that much, and the second of them a sort of back-projection technique) for another article, which may eventually come out. Anyway, allow me to at least show a short video of the first technique in action: Note: This article was originally published as a blog entry right here at GameDev.net, and has been reproduced here as a featured article with the kind permission of the author. You might also be interested in our recently featured article on Contact-hardening Soft Shadows Made Fast.
  22. For one of the upcoming projects (which will follow in some of the following posts), and as it had to fit the lore of the game, a black hole was necessary. Before going forward, let me add an image of the result I want to achieve: Artist conception of a black hole - NASA/JPL-Caltech While the image is similar to the final effect I wanted to achieve, I made some changes to the effect to be more colorful and bright - but the original idea came from this image. The effect is actually separated into 3 major parts - Disk, Streaks and Post-processing. Of course there is also the core of the black hole (which is just a small sphere with black color). Disk The disk around the black hole is actually just matter that rotates around the core. The average density increases near the event horizon and decreases further from it. Near the core the density can be so high that the matter may eventually reach a temperature close to that of a star - therefore there may be high emissive energy, and therefore light. Also, due to time dilation (and therefore light having a hard time escaping near the event horizon), the emissivity drops again very close to the event horizon. Anything beyond the event horizon is invisible from the outside, because gravity there is so strong that not even photons can escape. At least that is what I understand of the topic from a physics point of view. This actually explains what can be seen in the image and what is going on graphically, that is: The disk rotates around the core The density of the disk decreases further from the core The emissive light decreases further from the core, and therefore some (outer) parts of the disk will be lit by the inner part ... although the inner part right around the core has to be somewhat darker Which can be solved with simple texturing and some basic lighting of the result. Using a whirl-like texture as a basis proved to be a good start for me.
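The radial behavior described above can be sketched as a simple 1D profile. The following Python snippet is purely illustrative - the radii and the ramp width are made-up parameters, not values taken from the actual effect:

```python
def disk_emissivity(r, r_horizon=0.2, r_outer=1.0):
    """Illustrative radial emissivity of the accretion disk:
    - zero at/inside the event horizon and beyond the disk's outer edge,
    - ramps up just outside the horizon (time-dilation dimming),
    - falls off toward the outer edge (colder, thinner matter)."""
    if r <= r_horizon or r >= r_outer:
        return 0.0
    # Linear falloff with distance from the horizon toward the outer edge
    falloff = 1.0 - (r - r_horizon) / (r_outer - r_horizon)
    # Short ramp right outside the horizon so we don't jump straight to bright
    ramp = min((r - r_horizon) / (0.5 * r_horizon), 1.0)
    return falloff * ramp
```

In the actual effect this profile is baked into the 1D gradient texture described below, indexed by the distance from the center.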
I started off by creating a whirl-like texture that would define density in various parts of the disk, which resulted in this: Generating a normal map for lighting from this is actually quite straightforward (and easy in Substance Designer, for example) - and after a short time I also had a normal map: Putting just these together with basic diffuse lighting (standard N.L) from the center (slightly above the plane) gives us some basic results: The next thing is defining emissivity. This is done simply by using a 1D gradient texture whose coordinate is the distance from the center. The gradient I came up with is: Notice the left part, which is near the event horizon - it will give us a similar feeling to the reference image, as we're not jumping straight to a bright value. Applying the emissive value (both as a multiplier for the color and as emission) gives us this look: Which looks good enough already - I personally played a bit with the values (mainly with contrast and other multiplication factors, e.g. for the alpha channel/transparency), and ended up with this result: The resulting pixel shader is as simple as:

fixed4 frag (v2f i) : SV_Target
{
    // Calculate texture coordinate for gradient
    float2 centric = i.uv * 2.0f - 1.0f;
    float dist = min(sqrt(centric.x * centric.x + centric.y * centric.y), 1.0f);

    // Lookup gradient
    float3 gradient = tex2D(_GradientTex, float2(dist, 0.0f)).xyz;

    // Light direction (hack - simulates light approx. in the middle, slightly pushed up)
    float3 lightDir = normalize(float3(centric.x, -centric.y, -0.5f));

    // Use normals from normal map
    float3 normals = normalize(tex2D(_NormalsTex, i.uv).xyz * 2.0f - 1.0f);

    // Simple N.L is enough for lighting
    float bump = max(dot(-lightDir, normals), 0.0f);

    // Alpha texture
    float alpha = tex2D(_AlphaTex, i.uv).x;

    // Mix colors (note: a contrast increase is required for both lighting and alpha)
    return fixed4((gradient * bump * bump * bump + gradient) * 0.75f, min(alpha * alpha * 6.0f, 1.0f));
}

Streaks There are 2 streaks, directed upwards and downwards from the core. My intention was to make them bright compared to the core, and blue-ish - to keep the background more colorful in the end. Each streak is composed of 2 objects: a very bright white sphere (which takes advantage of the post-processing effects used to feel bright), and a geometry for the streaks (instead of using particles). The geometry is quite simple - it looks a bit like a rotated and cut hyperbola; notice the UV map on the left (it is important for understanding the next part): This geometry is there 4 times for each direction of the streak, rotated around the origin by 90, 180 and 270 degrees. The actual idea for the streaks was simple - have a simple geometry of a cut surface and roll a texture over it. Multiplying by the correct color and by the distance from the beginning of the streak adds a color effect that nicely fades into the background. To create a particle-like texture that varies in intensity I used Substance Designer again and came up with: By simply applying this texture as alpha and moving the X texture coordinate, the streak is animated: Multiplying by the wanted color gives us: And multiplying by a factor given by the distance from the origin of the streak results in: Which is actually quite acceptable for me.
For the sake of completeness, here is the full pixel shader:

fixed4 frag (v2f i) : SV_Target
{
    // Texture coordinates, offset based on external value (animates streaks)
    float2 uv = i.uv.xy + float2(_Offset, 0.0f);

    // Alpha texture for streaks
    fixed alpha = tex2D(_AlphaTex, uv);

    // Distance-from-origin factor (calculated from texture coordinates of streaks)
    float factor = pow(1.0f - i.uv.x, 4.0f);

    // Multiplication factor (to 'overbright' the effect - so that it 'blooms properly' when applying post-process)
    float exposure = 6.0f;

    // Apply resulting color
    return fixed4(exposure * 51.0 / 255.0, exposure * 110.0 / 255.0, exposure * 150.0 / 255.0, alpha * factor);
}

Putting the effects together ends up in: Post-Processing By using a simple bloom effect we can achieve the resulting final effect as shown in the video; it improves this kind of effect a lot. I've also added a lens dirt texture to the bloom. We need to be careful with the actual core, as that needs to stay black (I intentionally let it stay black even through the bloom). You can do this either by using a floating-point render target before the bloom and writing some low value instead of black (careful with tone mapping though - you might even want to go for negative numbers), or by simply rendering the core after the bloom effect. The resulting effect looks like: And as promised - a video showing the effect:
  23. Welcome to the second article on the topic of Automata, Virtual Machines and Compilers. This time around I'm going to show and explain how to extend the source code from the first article by adding variables, and of course the operators bound to them. Please read the first article if you have not already, otherwise this one may not make much sense to you. As for the required knowledge, understanding the previous article and the topics in it is most likely enough. Definitions Before I jump into code examples and pseudo-code, I should introduce what is added and how it improves the language defined in the previous article. While the previous article defined a simple calculator, adding variables allows the user to also use memory in a persistent way during program execution, not only as temporary space for intermediate results during a computation. Grammar Each time somebody wants to extend an existing language, the grammar has to be extended. In this case the grammar used in the previous article is the starting line. That grammar defined just a simple calculator:

<integer> ::= [0..9]+
<add_operation> ::= + | -
<mul_operation> ::= * | /
<factor> ::= (<expression>) | <integer>
<term> ::= <factor> [<mul_operation> <factor>]*
<expression> ::= <term> [<add_operation> <term>]*

To extend the previous grammar with variables, some more expressions need to be added: Identifier - each language allows variables to be declared or used via a variable name. It is up to the grammar to solve this problem, therefore an identifier expression needs to be added. I did so using the rule: <identifier> ::= [A..z _][A..z 0..9 _]* Such a rule allows for variables beginning with an alphabetic character (both lower and upper case) or underscore, followed by any number of alphanumeric characters or underscores. Assignment operator - there has to be a way to assign the result of an expression to a variable. Therefore an assignment operator is introduced. I used the simple '=' symbol, although you are free to use others (another common choice is ':='). Type - a variable has to be declared.
I decided to go the C-like way of variable declaration, which means the type of the variable is put first. Even though the language does not support different types as of now, I decided to go with the standard word 'int' for this one. Statement exit (punctuation mark) - adding variables without the ability to use multiple statements in the language would be boring. For separating the statements a statement exit rule (also denoted as a punctuation mark) was introduced - the semicolon ';'. These are the bottom-level rules, i.e. the rules that resolve directly to sequences of characters. By adding those we also have to update the rest (i.e. the rules that resolve to other rules) and add some new rules. Going from the top level: Statement - the top-level one, nothing exciting about this, as the rule is straightforward: <statement> ::= <expression> <statement_exit> I.e. an expression followed by a semicolon. Expression - in contrast to the previous grammar the addition rule was renamed; the expression in the new grammar resolves to either the declaration or the assignment rule: <expression> ::= <declaration> | <assignment> Declaration - a declaration rule always declares one variable. There can be only one declaration per statement (compared to C, where declarations can be sequenced using the sequence operator), always beginning with the type rule, followed by the identifier rule. It also has to be possible to initialize the declared variable: <declaration> ::= <type> <identifier> [<assignment_operator> <sub>]^ Assignment - most likely the most complex rule so far. The assignment itself might not sound complex, yet there are two problems around it (both are also related to the previous rule). Starting with examples for the first problem: x = 3; y = x = 7; y = x + 7; y = x + 7 = 3; Let us go one by one. In the first case it is straightforward - the value 3 is stored in x. In the second case the value 7 is stored in both x and y. In the third case, logically one would state that the value of x is added to 7 and stored in y. Wrong! How does the compiler know whether it is going to write a value into x or read from it?
And what should even happen in the last case? To overcome this, an l-value principle has to be introduced. The value on the right side (where values are read) is written into the left side. The only exception is example number 2, where it is possible to consider both x and y to be l-values. The fourth example is therefore invalid. Such a rule is going to be: <assignment> ::= [<identifier> <assignment_operator>]* [<sub>]^ | <sub> So the Assignment rule is either a Sub rule (which is our former expression, i.e. an arithmetic expression), or it contains identifiers in the form of l-values (linked with the assignment operator), with or without one assignment of an arithmetic expression at the end. The '^' symbol stands for 'zero-or-one' times. Sub rule - the former arithmetic rule; other than the name itself, nothing has changed for it. Factor - while the Term rule stayed, the Factor rule is extended. Inside parentheses an Assignment is allowed (i.e. there can again be an l-value in parentheses with an expression on the right side of the assignment operator); otherwise a Factor is an integer or an identifier. This concludes all the changes to the grammar. I recommend picking up a paper (or writing a short script) and attempting to generate some programs using this grammar. Actually the grammar itself does not generate programs unless you add one more rule that allows generating zero or more statements, i.e.: <program> ::= [<statement>]* This way the grammar is able to generate various applications. For convenience I'm also attaching the full grammar here:

<program> ::= [<statement>]*
<statement> ::= <expression> <statement_exit>
<expression> ::= <declaration> | <assignment>
<declaration> ::= <type> <identifier> [<assignment_operator> <sub>]^
<assignment> ::= [<identifier> <assignment_operator>]* [<sub>]^ | <sub>
<sub> ::= <term> [<add_operation> <term>]*
<term> ::= <factor> [<mul_operation> <factor>]*
<factor> ::= (<assignment>) | <integer> | <identifier>
<statement_exit> ::= ;
<type> ::= 'int'
<identifier> ::= [A..z _][A..z 0..9 _]*
<assignment_operator> ::= =
<add_operation> ::= + | -
<mul_operation> ::= * | /
<integer> ::= [0..9]+

Assignment - Second problem Did I mention that Assignment (and also Declaration) have two problems? By introducing the l-value it is possible to determine whether a variable is going to be read or written into. But what about the second one?
The second problem is compiler-related: before actually setting the value into a variable, the right part of the rule has to be processed (most of the operators can be solved left-to-right, while assignment is to be solved right-to-left). It is not a big deal, as the actual code for the store can be printed into the assembly after the compiler has processed the rest of the rule. Compilation Variable Handling Before going forward describing some of the changes I did (you can see all the changes in the source code accompanying the article), a discussion about where to store variables is necessary. The obvious answer here should be "in memory", which is partially correct, followed by a question: where in memory do we store variables? The program splits memory into two blocks once executed - the block where instructions are stored, where variables can't be stored, and the block where stack (temporary) variables are stored. Naturally, all permanent variables are stored in the same block as the temporary ones. Temporary variables on the stack are always pushed during an arithmetic computation, but once that is finished all of them are popped. Therefore all the variables can be stored there, beginning with an offset equal to 0.
The following pseudo-code shows the handling of an identifier:

Ident(boolean declare, boolean lvalue)
{
    // If the identifier is declared (which means it is preceded by a 'type' token)
    if (declare is true)
    {
        // Store variable in a dictionary (key is variable name, value is offset from base stack address)
        // GetIdent() returns the name of the parsed variable
        // stackOffset is a value initialized to 0 in the beginning
        variablesDictionary.insert(key = GetIdent(), value = stackOffset);

        // All we need to do is push the value in the first register, as the first register always
        // stores the value after a computation
        printLine("push.i32 r0");

        // And offset the stack offset (so we can store more variables)
        stackOffset += 4;
    }
    // When we are not doing the declaration
    else
    {
        // If the variable is an l-value, the value is stored in it (and not read), although we still need
        // to keep its value in the first register (in case there is a statement like 'x = y = 1;')
        if (lvalue is true)
        {
            string ident = GetIdent();

            // If the variable name is not inside the variable dictionary, there is an undeclared
            // identifier error
            if (variablesDictionary.contains(ident) is false)
            {
                Expected("Undeclared Identifier");
            }

            // Print the assembly command; note that '[' and ']' are used to denote an address. The
            // register SP is used as a reference and an offset is added.
            // Due to push/pop instructions, the actual offset at the moment can be different; determining
            // the correct offset and updating that value is left to the disassembly stage
            printLine("mov.mem.reg.i32 [sp+" + variablesDictionary.getValueAtKey(ident) + "] r0");
        }
        else
        {
            // Almost the same as the previous branch, but with a different assembly command (not to store,
            // but to read the value from memory into a register)
            string ident = GetIdent();
            if (variablesDictionary.contains(ident) is false)
            {
                Expected("Undeclared Identifier");
            }
            printLine("mov.reg.mem.i32 r0 [sp+" + variablesDictionary.getValueAtKey(ident) + "]");
        }
    }
}

As mentioned in the example code, the actual offset from the stack pointer at a given moment can differ (due to using variables in the middle of a computation), although when going through the assembly it is possible to count the offset and update the value (these values are going to differ only during a single statement's computation - therefore this approach can also be used when introducing more complex control flow and function calling). This task is left to be performed by the disassembler. Multiple statements and statements Implementing multiple statements in a row, and statements themselves, is straightforward. For the sake of completeness the pseudo-code for those is added:

// Statement itself compiles a single expression and always matches the punctuation symbol (semicolon)
// at the end.
Statement()
{
    Expression();
    Match(PUNCTUATION_SYMBOL);
}

// Base compilation function; as long as there are statements in the source code, it compiles them.
Program()
{
    while (Look())
    {
        Statement();
    }
}

Right-to-Left One of the mentioned problems was execution order. While in most cases (addition-subtraction and multiplication-division) the computation is performed left-to-right, in the case of assignment it is right-to-left. While this problem seems hard to solve, let me demonstrate an example.
Assume we have the code: x = 4 + 7; Using left-to-right compilation, the produced assembly would be:

mov.mem.reg.i32 [sp+0] r0
mov.reg.i32 r0 4
push.i32 r0
mov.reg.i32 r0 7
pop.i32 r1
add.i32 r0 r1

Where the first line is generated by the assignment (assuming variable x is at base stack address + 0 byte offset), and the rest is generated by the right side of the statement. If we want to switch the order - what about putting the left side into a buffer and printing it after the right side has been resolved and printed? That works, although a single buffer is not enough - a LIFO stack of such buffers is required (due to multiple assignments in a single statement). The actual resolution might be confusing if put here in the form of pseudo-code; it can be seen in the reference implementation done in C++. The following example shows the impact of such a change:

// Statement: x = (y = 4) + 7;
// Variable locations:
// x -> [sp+0]
// y -> [sp+4]

// First the expression in parentheses is solved:
buffer   -> 'mov.mem.reg.i32 [sp+4] r0'
assembly -> 'mov.reg.i32 r0 4'

// Printing the buffer after what is in assembly yields:
mov.reg.i32 r0 4
mov.mem.reg.i32 [sp+4] r0

// The expression outside is solved next:
buffer   -> 'mov.mem.reg.i32 [sp+0] r0' // Assignment into buffer (as it is an l-value)
assembly -> 'push.i32 r0'               // Left operand of addition is pushed (result of parenthesis expression)
assembly -> 'mov.reg.i32 r0 7'          // Move second argument to register 0
assembly -> 'pop.i32 r1'                // Pop from stack to register 1
assembly -> 'add.i32 r0 r1'             // Add those together

// Left side is in the buffer, right side in assembly; printing the buffer after it yields (with previous code):
'mov.reg.i32 r0 4'
'mov.mem.reg.i32 [sp+4] r0'
'push.i32 r0'
'mov.reg.i32 r0 7'
'pop.i32 r1'
'add.i32 r0 r1'
'mov.mem.reg.i32 [sp+0] r0'

Which is the correct assembly for the given example.
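The buffer-stack trick can be sketched in a few lines. This is a hypothetical Python illustration of the idea only - the Emitter class and its method names are made up, not part of the reference C++ implementation:

```python
class Emitter:
    # Deferred emission: l-value stores are buffered on a LIFO stack and
    # flushed only after the right-hand side has been emitted.
    def __init__(self):
        self.lines = []    # final instruction stream
        self.pending = []  # LIFO stack of deferred store instructions

    def emit(self, line):
        self.lines.append(line)

    def defer_store(self, offset):
        self.pending.append("mov.mem.reg.i32 [sp+%d] r0" % offset)

    def flush_store(self):
        self.lines.append(self.pending.pop())

# Statement: x = (y = 4) + 7;   with x -> [sp+0], y -> [sp+4]
em = Emitter()
em.defer_store(0)            # l-value x: store deferred
em.defer_store(4)            # l-value y (inside parentheses): store deferred
em.emit("mov.reg.i32 r0 4")
em.flush_store()             # y's store goes out right after its right-hand side
em.emit("push.i32 r0")
em.emit("mov.reg.i32 r0 7")
em.emit("pop.i32 r1")
em.emit("add.i32 r0 r1")
em.flush_store()             # x's store goes out last
print("\n".join(em.lines))
```

Running this reproduces the seven-instruction listing from the example above, with both stores landing after their respective right-hand sides.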
Obviously after executing this code we end up with the value 4 in variable y and the value 11 in variable x - which is the correct state after processing the given statement. Side note: as mentioned in one of the comments, generating an abstract syntax tree (a tree of structures) and then choosing to process the left or the right subtree first is equivalent to this; what is actually done here is traversing such an abstract syntax tree (represented by the containers in the application) and switching the execution order on a given sub-tree using a stack. Disassembly While the disassembler in the previous article just replaced instructions with numerical values, possibly storing arguments as further values, variables complicate it a little bit. First of all it is required to add functions to parse a register and the offset applied to it - although that is straightforward. What is more interesting is calculating the actual offset. Imagine the following example:

int x = 3;
int y = 5;
x = 4 + y * 3;

The assembly generated from it is:

// Variable locations:
// x -> [sp+0]
// y -> [sp+4]
'mov.reg.i32 r0 3'
'push.i32 r0'
'mov.reg.i32 r0 5'
'push.i32 r0'
'mov.reg.i32 r0 4'
'push.i32 r0'
'mov.reg.mem.i32 r0 [sp+4]'
'push.i32 r0'
'mov.reg.i32 r0 3'
'pop.i32 r1'
'mul.i32 r0 r1'
'pop.i32 r1'
'add.i32 r0 r1'
'mov.mem.reg.i32 [sp+0] r0'

The line where we read from variable y states a read from [sp+4], basically the stack base position incremented by 4 bytes. The problem is that the stack base position is not known at runtime - only the stack pointer is. The stack pointer is offset by each push and pop (either increased or decreased; as the operations are only on 32-bit integers, it is always increased or decreased by 4 bytes), therefore a variable counting this offset has to be introduced, allowing the disassembler to compensate for it. Note that this is one way local variables can be stored.
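One possible way to do that compensation is to walk the listing while counting push/pop instructions and rewriting every [sp+N] operand. The sketch below is an assumption-laden Python illustration, not the article's disassembler: it assumes a stack that grows upward by 4 bytes per push, so a variable at base offset N sits at [sp + (N - depth)] at any given instruction; the real convention depends on the virtual machine.

```python
import re

def compensate_sp_offsets(lines, word_size=4):
    # depth = bytes pushed since the stack base; each [sp+N] operand is
    # rewritten so it still addresses base + N at that point in the code.
    depth = 0
    fixed = []
    for line in lines:
        def rewrite(match):
            # N was emitted relative to the stack base; re-express it
            # relative to the current stack pointer.
            return "[sp%+d]" % (int(match.group(1)) - depth)
        fixed.append(re.sub(r"\[sp\+(\d+)\]", rewrite, line))
        if line.startswith("push."):
            depth += word_size
        elif line.startswith("pop."):
            depth -= word_size
    return fixed
```

For example, after two pushes (depth = 8), a read of `[sp+4]` would be rewritten to `[sp-4]`; operands in lines with no pushes before them are left untouched.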
On the x86 architecture a base pointer also exists (and in future articles on this topic I would like to introduce procedures and their calling, which will lead to introducing a base pointer register); such a pointer points to the beginning of the current stack frame. Virtual machine The virtual machine extension reflects the modifications done to the disassembler, as two new instructions are introduced:

'mov.mem.reg.i32'
'mov.reg.mem.i32'

And their respective handling with them. Their arguments are: mov.mem.reg.i32 - moves the value stored in a register into memory at the specified address. Integer representing a register (a pointer to memory, where 0x0 is the start of the application's memory) (32-bit integer type) Integer offset from that register (32-bit integer type) Integer representing a register (32-bit integer type) mov.reg.mem.i32 - moves a value from memory into a register. Integer representing a register (32-bit integer type) Integer representing a register (a pointer to memory, where 0x0 is the start of the application's memory) (32-bit integer type) Integer offset from that register (32-bit integer type) For a better understanding of the virtual machine executing the code, I recommend downloading the attached source and looking at the Main.cpp file. Conclusion While the language supports variable declaration at this point, there is still one more thing missing to make the language full-featured - a way to control the flow of the application: loops and branches, also known as control structures. Adding those would make the language full-featured, meaning capable of expressing literally any application.
As good as this sounds, making the language usable needs a few more steps - while the features may vary based on personal preference, in general I assume the list is going to look like this (it can also be taken as an approximate schema for the following articles): Control structures (branching, loops) Procedures and their calling Floating point types Pointer types Global variables Communicating with the game engine Linking Custom data types (structures) Also, if you made it here, thank you for reading, and don't forget to check out the accompanying source and executable (x64, Windows binary + Visual Studio 2015 project). 24 Sep 2016: Ready for Peer Review 24 Sep 2016: Initial release
  24. Greetings! I'm about to step into a somewhat larger topic that does not seem to be related to games directly, but it actually is. The whole topic is about automata, virtual machines and compilers (and not just about them!). The topic is large, which is why it will not fit whole inside a single article; after each article I would like to provide code that one can run and that is easy to understand. What knowledge do you need to understand this article? You definitely need to know something about programming. Some knowledge of automata and assembler is also welcome. And of course you need your favourite programming language to work in. In the first article I would like to demonstrate the basics behind simple math expressions, very basic theory behind compilers, disassemblers and computation machines. Formal Definitions Language Definition In the beginning we need to define our language. For a start it is going to be fairly simple (hopefully we are going to extend it further) and our expressions will only consist of numbers (integer values), operators (for the beginning we're fine with +, -, * and /) and parentheses. Backus-Naur Form For the language definition we are going to use the so-called BNF (Backus-Naur Form), which is a method to describe a context-free grammar. The form is actually a set of simple rules written as: <symbol> ::= expression Where the value on the left side is a non-terminal symbol, and the right side is a sequence of terminal symbols (if any) and non-terminal symbols (if any). On the right side we can have multiple expressions separated using | (which means we can replace the non-terminal on the left with one of the expressions on the right).
Parts of the expression can be enclosed in '[' and ']' brackets followed by '*' (which means the expression is repeated N times, where N >= 0), or '+' (repeated N times, where N >= 1), or '^N' where N is a number (repeated exactly N times). An example language is:

<integer> ::= [0..9]+
<add_operation> ::= + | -
<mul_operation> ::= * | /
<factor> ::= (<expression>) | <integer>
<term> ::= <factor> [<mul_operation> <factor>]*
<expression> ::= <term> [<add_operation> <term>]*

Automata With this grammar we can generate any expression that is valid in the language, which is great, but... we do not need to generate all valid inputs of our language - that is left to the user. We need to accept the language, and during that process generate code that is simple enough for some sort of virtual machine to execute. As the language is context-free, it is possible to use a recursive descent parser to accept (i.e. it is a valid expression) or reject (i.e. it is an invalid expression) the input. I'm not going to write too much about parsers in this series - I challenge readers to try writing them. Of course you are free to use Flex or any other generator (but I do not recommend this unless you know exactly how you would parse something - these production tools are awesome, but only when you know how the actual parsing works). What is a recursive descent parser? It is a top-down parser built from mutually recursive procedures. Each of these procedures often implements one (and sometimes more) of the productions of the grammar it recognizes. The exciting thing is that this closely resembles BNF! Compilation and Execution This is the process of producing machine code from some other language. Let me present a simple abstraction here; let us have a 'program': "Compute three plus four" Such a sentence is, well... not really useful; we need to translate it into something we can work with. One such formal language is math.
If we translate such a sentence into: 3+4 We know a way to compute it (there are multiple computation models for this); let me present one:

MOVE reg0 3
MOVE reg1 4
ADD reg0 reg1
PRINT reg0

Having a machine with 2 registers, allowing for 3 instructions, where: Instruction MOVE takes the 2nd argument (a value) and moves it into the register given by the 1st argument Instruction ADD takes the value stored in the register given by the 2nd argument and adds it to the value stored in the register given by the 1st argument Instruction PRINT prints the value in the register given by the 1st argument onto the screen The execution of such a program works as follows:

INSTRUCTION   | REGISTERS [] | SCREEN []
              | [-, -]       | []
MOVE reg0 3   | [3, -]       | []
MOVE reg1 4   | [3, 4]       | []
ADD reg0 reg1 | [7, 4]       | []
PRINT reg0    | [7, 4]       | [7]

Such an approach is known as imperative execution (we are giving the computing machine orders of what to do). This paradigm has one large advantage over other variants of execution: it is rather simple to build a machine that executes an imperative language. Simple calculator Let us begin with something simple - and what is the most simple language we can imagine? A calculator, of course! Our goal is to be able to handle 4 operators (+, -, * and /), parentheses and integer numbers. So let us formally define the language in BNF:

<integer> ::= [0..9]+
<add_operation> ::= + | -
<mul_operation> ::= * | /
<factor> ::= (<expression>) | <integer>
<term> ::= <factor> [<mul_operation> <factor>]*
<expression> ::= <term> [<add_operation> <term>]*

Note: this language is exactly the one in the previous example. Compiler These 6 rules are to be rewritten into the source; I'm going to provide pseudo-code here (you can see full working code in the accompanying project). Let us start with Integer. What do we want to do when we hit an integer? That is simple - we want to put its value into a register (i.e. somewhere we can further work with it). Note that in our assembly we move the right value or register into the left one. Also, after each operation the result is stored in the left register.
```
Integer()
{
    printLine("mov.reg.i32 r0 %s", GetValue());
}
```

Where GetValue is just a function that takes the next token from the stream and validates that it is an integer (containing only the digits 0..9). Such a function can look like the following:

```
GetValue()
{
    string output;
    int i = 0;
    string token = GetNextToken(); // Gets current token and moves to next

    do
    {
        if (token[i] in [0..9])
        {
            output += token[i];
        }
        i++;
    } while (i < token.length);

    Match();
    return output;
}
```

Note: for '+' (i.e. at least one repetition) we use a do-while construct.

I'm intentionally going to skip the add_operation and mul_operation rules and jump over to Factor:

```
Factor()
{
    if (NextToken is LEFT-PARENTHESIS)
    {
        Match(LEFT-PARENTHESIS); // Eats current token, matches it against argument and goes to next token
        Expression();
        Match(RIGHT-PARENTHESIS);
    }
    else
    {
        Integer();
    }
}
```

This was quite obvious - inside parentheses we always have another expression, outside we have just an integer. The following two, Term and Expression, are the interesting ones:

```
Term()
{
    Factor();

    while (NextToken is MULTIPLICATION | DIVISION)
    {
        printLine("push.i32 r0");

        if (NextToken is MULTIPLICATION)
        {
            Match(MULTIPLICATION); // Eats current token, matches it against argument and goes to next token
            Factor();
            printLine("pop.i32 r1");
            printLine("mul.i32 r0 r1");
        }
        else if (NextToken is DIVISION)
        {
            Match(DIVISION);
            Factor();
            printLine("pop.i32 r1");
            printLine("div.i32 r1 r0");
            printLine("mov.reg.reg r0 r1");
        }
        else if (NextToken is NULL) // If there is no next token
        {
            // Crash compilation and print error
            Expected("Expected multiplication or division operation");
        }
    }
}

Expression()
{
    Term();

    while (NextToken is ADDITION | SUBTRACTION)
    {
        printLine("push.i32 r0");

        if (NextToken is ADDITION)
        {
            Match(ADDITION); // Eats current token, matches it against argument and goes to next token
            Term();
            printLine("pop.i32 r1");
            printLine("add.i32 r0 r1");
        }
        else if (NextToken is SUBTRACTION)
        {
            Match(SUBTRACTION);
            Term();
            printLine("pop.i32 r1");
            printLine("sub.i32 r0 r1");
            printLine("neg.i32 r0");
        }
        else if (NextToken is NULL) // If there is no next token
        {
            // Crash compilation and print error
            Expected("Expected addition or subtraction operation");
        }
    }
}
```

What happens here? I know it can seem confusing at first, but we are doing exactly what the BNF rules tell us. Note that we handle add_operation/mul_operation here in the while condition, and based upon which operation we encounter we do different things. We have also started handling operator precedence (that is why we always push the register value onto the stack and pop it before working with it); the rest should be clear. The addition and subtraction instructions live inside Expression because they have lower priority than the multiplication and division handled in Term (in the recursion, the deeper we are, the higher the operator precedence) - so resolving parentheses and actual values has the highest precedence, multiplication and division sit in the middle, and addition and subtraction have the lowest precedence. I know this is not the easiest thing to grasp, and I highly encourage you to try running the compiler in debug mode so you can actually see what is happening (and that precedence really is resolved correctly this way).
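The same precedence-by-recursion scheme can be sketched compactly in Python. This is not the article's compiler (that one emits assembly; full code is in the accompanying project) - it is a minimal sketch that evaluates directly, which makes it easy to verify that the Factor/Term/Expression nesting really does produce the right operator precedence. All names here are illustrative.

```python
# Minimal recursive-descent evaluator mirroring the Factor/Term/Expression
# productions above. Evaluates instead of emitting assembly; illustrative only.

def tokenize(src):
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(src[i:j])  # whole integer literal as one token
            i = j
        else:
            tokens.append(c)  # one of: + - * / ( )
            i += 1
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def factor(self):  # <factor> ::= ( <expression> ) | <integer>
        if self.peek() == '(':
            self.match('(')
            value = self.expression()
            self.match(')')
            return value
        return int(self.match())

    def term(self):    # <term> ::= <factor> [<mul_operation> <factor>]*
        value = self.factor()
        while self.peek() in ('*', '/'):
            if self.match() == '*':
                value *= self.factor()
            else:
                value //= self.factor()  # integer division, as in the VM
        return value

    def expression(self):  # <expression> ::= <term> [<add_operation> <term>]*
        value = self.term()
        while self.peek() in ('+', '-'):
            if self.match() == '+':
                value += self.term()
            else:
                value -= self.term()
        return value

def evaluate(src):
    return Parser(tokenize(src)).expression()

print(evaluate("(3 + 4 * (7 - 2)) / 2"))  # 11
```

Because Expression calls Term, and Term calls Factor, the deeper productions bind tighter - exactly the precedence argument made above.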
When implementing the compiler like this and running it on an input like:

```
(3 + 4 * (7 - 2)) / 2
```

we obtain the assembly:

```
mov.reg.i32 r0 3
push.i32 r0
mov.reg.i32 r0 4
push.i32 r0
mov.reg.i32 r0 7
push.i32 r0
mov.reg.i32 r0 2
pop.i32 r1
sub.i32 r0 r1
neg.i32 r0
pop.i32 r1
mul.i32 r0 r1
pop.i32 r1
add.i32 r0 r1
push.i32 r0
mov.reg.i32 r0 2
pop.i32 r1
div.i32 r1 r0
mov.reg.reg r0 r1
```

Let us demonstrate what the execution would look like (by stack we mean a LIFO stack):

```
mov.reg.i32 r0 3   // Registers [ 3,  -] Stack []
push.i32 r0        // Registers [ 3,  -] Stack [ 3]
mov.reg.i32 r0 4   // Registers [ 4,  -] Stack [ 3]
push.i32 r0        // Registers [ 4,  -] Stack [ 3, 4]
mov.reg.i32 r0 7   // Registers [ 7,  -] Stack [ 3, 4]
push.i32 r0        // Registers [ 7,  -] Stack [ 3, 4, 7]
mov.reg.i32 r0 2   // Registers [ 2,  -] Stack [ 3, 4, 7]
pop.i32 r1         // Registers [ 2,  7] Stack [ 3, 4]
sub.i32 r0 r1      // Registers [-5,  7] Stack [ 3, 4]
neg.i32 r0         // Registers [ 5,  7] Stack [ 3, 4]
pop.i32 r1         // Registers [ 5,  4] Stack [ 3]
mul.i32 r0 r1      // Registers [20,  4] Stack [ 3]
pop.i32 r1         // Registers [20,  3] Stack []
add.i32 r0 r1      // Registers [23,  3] Stack []
push.i32 r0        // Registers [23,  3] Stack [23]
mov.reg.i32 r0 2   // Registers [ 2,  3] Stack [23]
pop.i32 r1         // Registers [ 2, 23] Stack []
div.i32 r1 r0      // Registers [ 2, 11] Stack []
mov.reg.reg r0 r1  // Registers [11, 11] Stack []
```

I recommend checking a few more examples and hand-computing the register and stack values as you go. It will definitely help you see that operator precedence works here and how the calculator operates.

Disassembler

While I already mentioned how the computation works, I don't really recommend writing a virtual machine that processes strings. We can do better! Performing a very simple disassembly and storing everything in binary form is a far better way - all you need is to think of opcodes for your instructions, assign IDs to your registers, and store all these numbers in binary form.
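As an illustration of such an encoding, here is a small Python sketch. The opcode values and the layout (one opcode byte, one byte per register ID, four little-endian bytes per immediate) are my own choices for the example, not something fixed by the article - any consistent scheme works.

```python
# Sketch of encoding the toy assembly into binary form: assign each
# mnemonic an opcode byte and each register an ID, then pack operands.
import struct

OPCODES = {
    'mov.reg.i32': 0x01, 'mov.reg.reg': 0x02,
    'push.i32': 0x03, 'pop.i32': 0x04,
    'add.i32': 0x05, 'sub.i32': 0x06,
    'mul.i32': 0x07, 'div.i32': 0x08, 'neg.i32': 0x09,
}
REGISTERS = {'r0': 0, 'r1': 1}

def assemble(lines):
    code = bytearray()
    for line in lines:
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic])          # 1 opcode byte
        for op in operands:
            if op in REGISTERS:
                code.append(REGISTERS[op])      # 1 byte register ID
            else:
                code += struct.pack('<i', int(op))  # 32-bit immediate
    return bytes(code)

program = assemble([
    "mov.reg.i32 r0 3",
    "push.i32 r0",
    "mov.reg.i32 r0 4",
    "pop.i32 r1",
    "add.i32 r0 r1",
])
print(len(program), program.hex())  # 19 bytes of bytecode
```

Decoding (the actual "disassembly") is the inverse walk over the bytes: read an opcode, look up how many and what kind of operands it takes, and advance accordingly.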
Note that this way you could actually perform the translation into x86 or x64 if you wished, but that is not necessary. Implementing a disassembler is very straightforward and I recommend looking into the accompanying source code; even though the implementation there is not highly efficient, it is easy to understand.

Virtual Machine

Up to here we could only execute the code "by hand", but that ends now! What does our virtual machine need? It needs some memory where we can store the stack and the actual code to be executed, and at least 4 registers - our code works with 2 (r0 and r1), plus 2 more: one for the stack pointer (as we need the stack as temporary memory for our computation) and one for the instruction pointer (so we know what we are executing right now).

Now, we read our application (in binary) into memory, place the instruction pointer at the address where our application begins, and the stack pointer right after the end of our application in memory. E.g.:

```
|O|U|R|_|A|P|P|L|I|C|A|T|I|O|N|_|S|O|U|R|C|E|_|C|O|D|E| | | | | | | | | | | | | |
 ^                                                      ^
 IP                                                     SP
```

Where IP represents the memory address the instruction pointer points to, and SP represents the memory address the stack pointer points to - in this case IP=0 and SP=27.

As each sub-computation stores its result in r0 (register 0), the result of the whole computation will also end up in register 0. Note that I actually print both registers in the accompanying source code once the computation finishes.

Conclusion

This article showed basic principles of automata theory, formal languages and compilers. I know it wasn't much, and creating something useful out of this might take a lot more effort and time. Compilers aren't easy stuff, and there are not many recent articles about them. If time allows, I'd be very glad to continue adding more and more features into this little compiler step by step, eventually making it useful. For next time, I promise to talk about types!
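To round out the picture, the fetch-decode-execute loop described in the Virtual Machine section can be sketched as follows. For readability this sketch interprets the textual assembly rather than the binary form, and since the calculator emits no jumps, the instruction pointer simply advances one instruction at a time; everything here is a simplified illustration, not the accompanying project's VM.

```python
# Minimal virtual machine for the toy assembly: a register file plus an
# explicit stack, executing one instruction per step.

def run(listing):
    regs = {'r0': 0, 'r1': 0}
    stack = []
    for line in listing:  # instruction pointer is implicit (no jumps yet)
        op, *args = line.split()
        if op == 'mov.reg.i32':
            regs[args[0]] = int(args[1])
        elif op == 'mov.reg.reg':
            regs[args[0]] = regs[args[1]]
        elif op == 'push.i32':
            stack.append(regs[args[0]])
        elif op == 'pop.i32':
            regs[args[0]] = stack.pop()
        elif op == 'add.i32':
            regs[args[0]] += regs[args[1]]
        elif op == 'sub.i32':
            regs[args[0]] -= regs[args[1]]
        elif op == 'mul.i32':
            regs[args[0]] *= regs[args[1]]
        elif op == 'div.i32':
            regs[args[0]] //= regs[args[1]]
        elif op == 'neg.i32':
            regs[args[0]] = -regs[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs['r0']  # each sub-computation leaves its result in r0

listing = """mov.reg.i32 r0 3
push.i32 r0
mov.reg.i32 r0 4
push.i32 r0
mov.reg.i32 r0 7
push.i32 r0
mov.reg.i32 r0 2
pop.i32 r1
sub.i32 r0 r1
neg.i32 r0
pop.i32 r1
mul.i32 r0 r1
pop.i32 r1
add.i32 r0 r1
push.i32 r0
mov.reg.i32 r0 2
pop.i32 r1
div.i32 r1 r0
mov.reg.reg r0 r1""".splitlines()

print(run(listing))  # 11, matching the hand-traced execution
```

Running the compiler's output for (3 + 4 * (7 - 2)) / 2 through this loop reproduces the register/stack trace shown earlier step for step.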
Accompanying source code demonstrating sample implementation.