
Vilem Otte

Member Since 11 May 2006

#5286559 Area Lights with Deferred Renderer?

Posted by Vilem Otte on 12 April 2016 - 08:32 PM

I actually implemented Arkano's method a long time ago (around the time he posted it) and extended it further.

 

The original idea is to compute attenuation based on the distance from the given object (he used only planes as far as I remember; this can easily be extended to spheres, tubes, or triangles). Specular lighting can be implemented in a nice (Phong-like) way by using a single ray to perform a "real" reflection off the geometric object. Diffuse still looks good when using just a single point (e.g. the center of the object) - unless the light is too big. Use this with textures, ideally projected along with the diffuse (+ mip maps to blur them), and you've got an awesome lighting system.

 

If you want a more physically based solution, you should look at how it is done in physically based renderers (read: path tracers). What we do is sample the light and calculate the diffuse term for each of the samples (this can also be done for the previously mentioned approach - calculating Lambert N times is still really cheap). Specular lighting could use the same trick, although the reflection mentioned above just looks better - unless you have a lot of samples you will get noise, which is a problem.
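To make the sampling idea concrete, here is a minimal C++ sketch (my own illustration, not the original code): Lambert is evaluated against an N×N grid of points on a rectangular light given by a corner C and edge vectors U, V, with inverse-square attenuation per sample. The Vec3 type and parameterization are assumptions.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// P = shaded point, N = its (normalized) surface normal.
float AreaLightDiffuse(Vec3 P, Vec3 N, Vec3 C, Vec3 U, Vec3 V, int samples)
{
    float sum = 0.0f;
    for (int i = 0; i < samples; ++i)
    {
        for (int j = 0; j < samples; ++j)
        {
            // Sample point on the light rectangle (stratified grid).
            Vec3 Lpos = C + U * ((i + 0.5f) / samples) + V * ((j + 0.5f) / samples);
            Vec3 L = Lpos - P;
            float dist2 = std::max(dot(L, L), 1e-4f);
            L = L * (1.0f / std::sqrt(dist2));
            sum += std::max(dot(N, L), 0.0f) / dist2;   // Lambert * 1/d^2 per sample
        }
    }
    return sum / float(samples * samples);
}
```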

 

The real problem here is shadows. Of course ray tracing is the ultimate answer (yet I assume your scene data is not in a form where real-time ray tracing is possible), so you have to stick with shadow maps - actually omnidirectional shadow maps (cube shadow maps). For me, percentage-closer soft shadows (PCSS) worked really nicely; again, it is not physically correct (though you can get quite close by modifying it), but it achieves nice and plausible shadows for the objects. Other than that, I've seen people get really good soft shadows using the shadow volumes approach with penumbra wedges, but it was extremely expensive even for less complex scenes.

 

 

Example:

[Image: gpuray.png - interactively path-traced scene with an area light]

 

This is an interactively path-traced scene (at around 30-50 spp per second on GPU) with an area light. Note the shadows (they include "caustics" here, though not very strong, as the material is not clear and the IoR is not high enough). You should be able to see that even for a quite small light the shadows are fully blurred in the parts that are closer to the light, which is a problem for basically any fast shadow map algorithm these days.




#5275598 DirectX 12 few notes and questions

Posted by Vilem Otte on 13 February 2016 - 10:17 PM

So, I've finally found time to start playing around with DirectX and Direct3D 12 - I've gone through the documentation, the basic samples and even a simple engine sample. As I've been an OpenCL/OpenGL guy most of the time, I don't have that much experience with DirectX (the last version I used was Direct3D 9). I'd be glad if somebody would be kind enough to read through my thoughts and tell me whether I'm thinking about this right or wrong.

 

I will not go into the basic samples, as I've already gone through them - I'm now trying to add a Direct3D 12 renderer to my framework, which still has OpenGL support (which I'm trying to replace). First of all, let me roughly describe how my Scene works. I have a graph structure (scene-graph like) holding entities, some of which point to one or more Model class instances (along with pointing to a Transformation - which is actually just a 4x4 matrix). A Model points to a set of Mesh instances (plus it holds additional things like a bounding box, etc.). A Mesh points to a Material, a VertexBuffer/IndexBuffer and a Texture; it contains a descriptor for the VertexBuffer (describing how the data are interleaved) and for the Texture (describing what each texture represents), and ShaderParams - which is a constant buffer (along with pipeline state items). A Material points only to a Shader.

 

Note: I know I could move ShaderParams up to the Entity, and I most likely will at some point, but I wanted to keep the description simple to start with.

 

So, when I'm actually rendering with a camera, I obtain a set of pointers to the Mesh instances that are going to be rendered (of course with matrices). Right now this is all processed outside of the actual OpenGL rendering. For D3D12 I'm still thinking about the structures and how to work with them. Let's jump ahead: I have this set of pointers (with some additional info for each record). In OpenGL, when I don't do any sorting (most basic rendering), I just loop through all of them, set the pipeline state, attach the shader, bind textures, set shader parameters, draw, and continue the loop.

 

In D3D12 this will be a lot different though:

 

1.) I am not doing any immediate changes like in the OpenGL case - once I start the application, I create a CommandQueue, which I fill with CommandLists every frame and let it draw (I can either pre-build the CommandLists or build them each frame, when necessary). Anyway, I think I understand the command queue and command list concept - so the problem is not here.

 

2.) Each of my Mesh instances should contain a PipelineStateObject, which will be attached during rendering. This completely describes which shader is attached, plus the part of the ShaderParams that holds info about e.g. blending.

 

3.) I don't fully understand the RootSignature - as far as I get it, it is an object describing the parameters I'm sending to the shader. So technically for each shader I should have a RootSignature (or alternatively have a single one and change its contents).

 

4.) How do I pass uniforms into a shader - only through the RootSignature? (Assuming I don't want to use a constant buffer.)
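For illustration of points 3 and 4, here is a minimal, hedged sketch of a root signature whose single parameter is a block of 32-bit root constants - one possible way to pass small uniform-like data without a constant buffer. The register, visibility and values are arbitrary assumptions, and error handling is omitted.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12RootSignature> CreateRootConstantSignature(ID3D12Device* device)
{
    // One root parameter: 4 x 32-bit constants visible to the vertex shader, bound to register b0.
    D3D12_ROOT_PARAMETER param = {};
    param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_32BIT_CONSTANTS;
    param.Constants.ShaderRegister = 0;
    param.Constants.RegisterSpace = 0;
    param.Constants.Num32BitValues = 4;
    param.ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX;

    D3D12_ROOT_SIGNATURE_DESC desc = {};
    desc.NumParameters = 1;
    desc.pParameters = &param;
    desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

    ComPtr<ID3DBlob> blob, error;
    D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, &error);

    ComPtr<ID3D12RootSignature> rootSignature;
    device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(),
                                IID_PPV_ARGS(&rootSignature));
    return rootSignature;
}

// At draw time the constants are pushed through the command list:
//   commandList->SetGraphicsRootSignature(rootSignature.Get());
//   float tint[4] = { 1.0f, 0.5f, 0.25f, 1.0f };
//   commandList->SetGraphicsRoot32BitConstants(0, 4, tint, 0);
```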

 

If you got this far and have answers to my questions, I'd be glad. Thank you!




#5269839 Reflections from area light sources

Posted by Vilem Otte on 07 January 2016 - 08:51 AM

I haven't read that paper so far, but I do have an implementation of "area lights" (which has solid performance, yet is definitely not 100% accurate).

 

The basic idea goes as follows - every area light (whatever its shape) can be decomposed into some basic primitive (for me rectangles, i.e. quads, were enough; although I've decided to rework some parts of this and support triangles directly). The ideas were implemented as follows:

 

Implementing a single rectangular untextured light - there are two light components that need to be calculated: diffuse and specular.

For specular, I've used simple math - as I use a deferred renderer, I have the position and normal of each pixel in the scene; from there I calculate the reflection vector and do a ray cast. If I get an intersection, I can calculate the color using a BRDF-like function. Of course this yields a hard reflection, but one can always use multiple samples and randomness (based on surface roughness) to achieve rough surfaces. Nevertheless, with normal mapping it really looks good.
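As an illustration of that reflection ray cast (my own sketch, not the post's code): reflect the view direction around the pixel's normal and intersect the resulting ray with the light rectangle, given here as a corner plus two edge vectors. All names and the Vec3 helpers are assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }

// Reflect incident direction I around normal N (both assumed normalized).
static Vec3 reflect(Vec3 I, Vec3 N) { return I - N * (2.0f * dot(I, N)); }

// Returns true if the ray (origin O, direction D) hits the rectangle
// (corner C, edges U and V); outputs the hit's rectangle UVs for texturing.
bool RayHitsRectangle(Vec3 O, Vec3 D, Vec3 C, Vec3 U, Vec3 V, float& u, float& v)
{
    Vec3 N = cross(U, V);                           // rectangle plane normal (unnormalized)
    float denom = dot(N, D);
    if (std::fabs(denom) < 1e-6f) return false;     // ray parallel to the plane
    float t = dot(N, C - O) / denom;
    if (t < 0.0f) return false;                     // rectangle is behind the ray
    Vec3 local = (O + D * t) - C;                   // hit point relative to the corner
    u = dot(local, U) / dot(U, U);                  // project onto the edges
    v = dot(local, V) / dot(V, V);
    return u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f;
}

// Usage for the specular term (view = camera-to-pixel direction, N = pixel normal):
//   Vec3 R = reflect(view, N);
//   float u, v;
//   bool hit = RayHitsRectangle(pixelPos, R, lightCorner, lightU, lightV, u, v);
```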

 

For diffuse, it has been a bit more tricky - for small area lights you can technically calculate Lambert against the center point of the light, and as attenuation use the distance to the closest point on the rectangle (which amounts to projecting the point onto the plane and then checking whether you are within the bounds of the rectangle or not). For larger area lights, again, multiple points (i.e. sampling) work well.

 

Now, there were three real challenges - changing the shape of the light, adding a projective texture for the light, and using multiple lights.

 

The first two can be handled quite easily - you can project each point onto the light's plane using some projection and use that for projective texturing (with clever mip-map selection based on distance it looks really cool); changing the shape of the light is then straightforward - just change the texture (I use the alpha channel, but you could also use a color key). Dang, two things handled simply.

 

The last one is tricky - if your number of lights is quite low, you can brute-force it. For a high number of lights I go with a BVH approach (i.e. build a bounding volume hierarchy on top of the lights to increase ray cast performance).

 

There is one more thing to solve: your light (specular and diffuse) is visible through walls, etc. But with a good shadowing algorithm this problem can be dealt with quickly.

 

If you are particularly interested in some cases, I could provide some math with explanation (and possibly pseudo-code/code in case you want to implement it).




#5264445 Rendering UI with OpenGL with C++

Posted by Vilem Otte on 01 December 2015 - 12:01 PM

I have a custom UI implemented inside my software, let me try to explain how I work with it...

 

My UI builder basically takes some kind of file and builds a user interface out of it (windows with buttons, etc.). This is all stored only in memory; along with that it has a process function that is called on each input event and processes the whole active user interface. None of this is rendered, and therefore it can be processed in a separate thread (of course some synchronization is done, as we need to be thread-safe).

 

The process function is straightforward - each UI is some kind of graph (in my case it is actually always a tree) - you have a 'root' node and nodes under it. On each event (mouse move, mouse click, etc.) you propagate the event through the graph and process each node with it. Upon meeting some condition you do something. All the processing is therefore done in a separate thread (but beware, sometimes you need to mutex-lock some data to avoid race conditions).
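A minimal sketch of that propagation (hypothetical types, not the actual UI code; the thread synchronization mentioned above is omitted here):

```cpp
#include <memory>
#include <vector>

struct UIEvent { enum Type { MouseMove, MouseClick } type; float x, y; };

class Widget
{
public:
    void AddChild(std::unique_ptr<Widget> child) { mChildren.push_back(std::move(child)); }

    // Propagate the event depth-first through the tree; a node that fully
    // consumes the event stops further propagation by returning true.
    bool Process(const UIEvent& e)
    {
        if (!Contains(e.x, e.y))
            return false;
        if (OnEvent(e))
            return true;
        for (auto& child : mChildren)
            if (child->Process(e))
                return true;
        return false;
    }

protected:
    virtual bool Contains(float /*x*/, float /*y*/) const { return true; }
    virtual bool OnEvent(const UIEvent& /*e*/) { return false; }  // e.g. a Button reacts to MouseClick here

private:
    std::vector<std::unique_ptr<Widget>> mChildren;
};
```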

 

During UI initialization (or a UI update - in the sense that some widget and its children become visible) the widgets are inserted into a UI scene instance (so the UI scene is rebuilt). A scene on my side is something that holds entities (an entity is, for example, a Light, Camera, Static Mesh, 2D Widget Rectangle, etc.). This scene therefore contains a single camera and the UI widgets -> it is rendered into a frame buffer and composited with the other outputs (note that it actually needs to be re-rendered only when something changes; I'm currently rendering it each frame, but technically I don't need to - I could keep it in a texture).




#5255933 Need help with GLSL and Raytrace

Posted by Vilem Otte on 06 October 2015 - 08:06 PM

I will do shameless self-promotion.

 

If I recall correctly, they are using a ray-sphere algorithm derived not from the analytic view but from the geometric view (it ends up with code that has fewer instructions in total and is more precise than a naive implementation of the analytic derivation; note that the two are mathematically equivalent).
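For reference, a minimal sketch of what such a geometric ray-sphere test typically looks like (assumed names; the ray direction is expected to be normalized - see the article linked below for the full derivation):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and the nearest positive hit distance t, or false if missed.
bool RaySphere(Vec3 origin, Vec3 dir, Vec3 center, float radius, float& t)
{
    Vec3 L = center - origin;            // vector from ray origin to sphere center
    float tca = dot(L, dir);             // projection of L onto the ray
    float d2 = dot(L, L) - tca * tca;    // squared distance from the center to the ray
    float r2 = radius * radius;
    if (d2 > r2)
        return false;                    // the ray passes beside the sphere
    float thc = std::sqrt(r2 - d2);      // half-chord length
    t = tca - thc;                       // near intersection
    if (t < 0.0f)
        t = tca + thc;                   // origin inside the sphere -> take the far hit
    return t >= 0.0f;
}
```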

 

Now for the self-promotion, check out the article http://www.gamedev.net/page/resources/_/technical/math-and-physics/intersection-math-algorithms-learn-to-derive-r3033

 

I did the derivation of the geometry-based version in there.




#5254754 What is the best indirect lighting technique for a game?

Posted by Vilem Otte on 30 September 2015 - 04:42 AM

If you're asking about the most scalable, robust technique with fast computation times - I can recommend only one thing, and that is path tracing (to be precise, bi-directional path tracing with multiple importance sampling). It is scalable, robust, physically correct and also fast (actually one of the fastest ways to correctly compute GI), yet getting it real-time without noise is close to impossible (with today's hardware).

 

Now, there are a few solutions that are fast enough and give you a quite cool effect (closely resembling what GI should look like) at solid speed. For a fully dynamic scene and lighting I've so far used a solution similar to reflective shadow maps. For each light in the scene you cast rays that hit the surface at some position - this is a position where a virtual point light (VPL) will be placed.

After this step you generate literally a TON of virtual point lights, so some algorithm is used to merge neighbors into one (you can place them into a grid, average their color and intensity per cell, and use one light per cell, e.g. at the cell center). Then you pick some (let's say N) of those VPLs (based on distance from the camera, intensity, and in general how much effect they will have on the final image), and generate a small shadow map for each of them (either using a ray tracer or rasterization). This map is used to generate secondary shadows (it doesn't need a high resolution; blurry is good here) - I've used VSM to keep them nice and blurry. To accelerate this process, simplified scene geometry can be used.
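A sketch of that grid-merging step (hypothetical types and cell size, not the original implementation): VPLs landing in the same cell are collapsed into one representative light with averaged position/color and summed intensity.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct VPL { float pos[3]; float color[3]; float intensity; };

std::vector<VPL> MergeVPLsIntoGrid(const std::vector<VPL>& vpls, float cellSize)
{
    struct Accum { VPL sum = {}; int count = 0; };
    std::unordered_map<uint64_t, Accum> cells;

    for (const VPL& v : vpls)
    {
        // Hash the integer cell coordinates into a single key.
        int64_t cx = (int64_t)std::floor(v.pos[0] / cellSize);
        int64_t cy = (int64_t)std::floor(v.pos[1] / cellSize);
        int64_t cz = (int64_t)std::floor(v.pos[2] / cellSize);
        uint64_t key = (uint64_t)(cx * 73856093ll ^ cy * 19349663ll ^ cz * 83492791ll);

        Accum& a = cells[key];
        for (int i = 0; i < 3; ++i) { a.sum.pos[i] += v.pos[i]; a.sum.color[i] += v.color[i]; }
        a.sum.intensity += v.intensity;   // intensities add up; positions/colors get averaged below
        a.count++;
    }

    std::vector<VPL> merged;
    for (auto& kv : cells)
    {
        VPL out = kv.second.sum;
        for (int i = 0; i < 3; ++i) { out.pos[i] /= kv.second.count; out.color[i] /= kv.second.count; }
        merged.push_back(out);
    }
    return merged;
}
```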

 

Each VPL shadow map can be stored for the next frame (unless something dynamic moved within its range); this can also be added to the VPL's importance - so in the end you will quickly have shadow casting on all VPLs (though with quite a large overdraw in general). This handles diffuse-only global illumination (sorry, no caustics - there are other solutions for those, mostly precomputed).

 

Advantages - no need for voxel (or SVO) generation, better secondary shadows compared to SVO, supports fully dynamic scenes, shadow maps can be stored and precomputed for non-dynamic parts.

Disadvantages - large overdraw, needs fast generation of shadow maps, shadow map storage (I've used a 'texture atlas of shadow maps'), might need 2 scene representations (you can use only the more complex one, but your shadow map generation phase will be slow).




#5229787 win32 cpu render bottleneck

Posted by Vilem Otte on 19 May 2015 - 05:06 AM


The only reason one writes a software rasterizer is:
#1: Learning.
#2: Mega-advanced occlusion culling.
#3: It is the year 1993 and hardware-accelerated graphics aren’t mainstream yet.

 

#4: You're doing it in OpenCL/CUDA in a massively parallel way as proof-of-concept for some epic new hi-tech technique (does that count as software rasterizer?)




#5229548 How to create an MSAA shader? (For deferred rendering)

Posted by Vilem Otte on 18 May 2015 - 01:47 AM

Actually, in my game engine I've implemented full MSAA for the deferred renderer. In OpenGL the simple approach works as follows:

 

  • Render your scene into the G-Buffer, storing everything in multisampled textures with n samples
  • In the shading phase, perform the shading per sample and then resolve

Your G-Buffer shader will still look the same; you just write the output into a texture created with glTexImage2DMultisample, and you might also want to use multisampled renderbuffer storage for your render buffer. This modification is fairly simple.
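A minimal sketch of creating one such attachment (assuming a GL 3.2+ context with a loader such as GLEW already initialized; the format and parameters are placeholders):

```cpp
#include <GL/glew.h>

GLuint CreateMultisampledAttachment(GLsizei width, GLsizei height, GLsizei samples)
{
    GLuint texture = 0;
    glGenTextures(1, &texture);
    glBindTexture(GL_TEXTURE_2D_MULTISAMPLE, texture);
    // Allocate a multisampled texture instead of a regular glTexImage2D one.
    glTexImage2DMultisample(GL_TEXTURE_2D_MULTISAMPLE, samples, GL_RGBA16F,
                            width, height, GL_TRUE);
    return texture;
}

// Attaching it to the G-Buffer FBO works the same way as a regular texture,
// just with the multisample target:
//   glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
//                          GL_TEXTURE_2D_MULTISAMPLE, texture, 0);
// A multisampled depth renderbuffer can be created with
// glRenderbufferStorageMultisample in the same spirit.
```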

 

In the shading phase you pass in the multisampled texture(s) - they are read using sampler2DMS instead of sampler2D, and you have to fetch samples explicitly using texelFetch, where you also pass which sample you want to read.

 

This should help you start with the basics.

 

 

 

Just a note about one serious problem - when to resolve? The problem is that if you render a multisampled G-Buffer and resolve during shading, tone mapping (or basically any other post-processing effect) can ruin the anti-aliasing. The solution is quite simple - you render the multisampled G-Buffer, shade per sample writing into a multisampled buffer, apply each post-processing effect on a multisampled buffer producing a multisampled buffer (including tone mapping at the end), and only then resolve. This can be a bit of a problem, as it needs a lot more computation power and memory.

 

A lot of game engines, though, resolve during shading, apply post-processing on the already resolved buffer and then use the FXAA hack to smooth out edges where aliasing appeared (because it is a lot faster and they think gamers won't notice). (I personally don't like FXAA - in my opinion it blurs the whole image and degrades the final image quality; high-quality MSAA with lots of samples really looks better.)




#5229545 no Vsync? why should you

Posted by Vilem Otte on 18 May 2015 - 01:28 AM

Because in many games I never hit monitor refresh rate.

My monitor is 60 hz. My framerate is usually somewhere between 40 and 50. Vsync enabled would reduce me to 30, possibly even less if stop-n-wait happens.



Keep in mind that there are some incredibly low spec machines. Such as Atoms. Vidcards with 64 bit DDR3 buffers for 30 bucks are still on the shelves and they do sell. Dudes on budget running Intel GMA.

 

I'd also note that low performance doesn't necessarily mean running on a budget. This is more important in other software businesses than in games in general - but I've already met conditions where we used a low-power-consumption machine (without any active cooler!) that was quite expensive ~ and it was a lot slower than same-price machines with active cooling and over 10 times the power consumption.

 

Although I'm not really sure whether low-power machines count in PC gaming today (for mobile platforms the situation is different!).




#5229294 VSM depth squared

Posted by Vilem Otte on 16 May 2015 - 05:21 AM

I bet you haven't heard about Chebyshev's inequality so far - that is the part you are missing (and that is why you don't see why the depth is squared). Let me try to explain:

 

Start with the definitions:

 

Mean (Expected Value)

is by definition:

$$\mu = E[X] = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$$

Note that X is our random variable taking x1 with probability p1, x2 with probability p2 and so on.

 

Standard Deviation

is by definition:

$$\sigma = \sqrt{E[(X - \mu)^2]} = \dots = \sqrt{E[X^2] - (E[X])^2}$$

It measures the amount of dispersion around the mean: a low standard deviation means the random values lie close to the mean, a high one means they are spread further out.

 

Variance

is by definition

$$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2 = \sigma^2$$

Variance measures how far the values of X spread out from μ.

 

Chebyshev's Inequality

the equation used in VSM (there are more equivalent variants of this inequality) is:

$$P(x \geq t) \leq \frac{\sigma^2}{\sigma^2 + (t - E[X])^2}$$

when t > E[X]

 

Applying this to shadow mapping:

Let our E[X] be the average depth value over the filter region

Let our E[X²] be the average squared depth value over the filter region

Then σ² = E[X²] - (E[X])² is the variance

And our t is the receiver's real distance from the light

 

We can use this to compute (a bound on) the probability that we are lit, which gives smooth shadow edges. However, the bound only holds for t > E[X]; for fully lit areas (t ≤ E[X]) we simply want a value of 1 -> that is why we actually compute:

$$\mathcal{P} = \max\left(\frac{\sigma^2}{\sigma^2 + (t - E[X])^2},\; \begin{cases}1 & t \leq E[X]\\ 0 & \text{otherwise}\end{cases}\right)$$
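The formula above as a small C++ helper (a sketch; in practice this runs in the shading language on the two moments fetched from the variance shadow map):

```cpp
#include <algorithm>

// moment1 = E[X] (average depth), moment2 = E[X^2] (average squared depth),
// t = receiver depth. Returns the upper bound on the lit fraction.
float ChebyshevUpperBound(float moment1, float moment2, float t, float minVariance)
{
    if (t <= moment1)
        return 1.0f;                                   // fully lit case
    float variance = std::max(moment2 - moment1 * moment1, minVariance);
    float d = t - moment1;
    return variance / (variance + d * d);              // the bound from the inequality
}
```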

 

So - the reason why we use the squared depth should be clear now (it has its roots in probability theory). I haven't checked (mathematically) whether linear z and raw z both work; it would need a mathematical proof to be useful (or, in case you tried it with some success, please share the results).

 

EDIT: I'd embed those links as actual equations (images), but I'm unable to do so for some reason




#5228670 VSM depth squared

Posted by Vilem Otte on 12 May 2015 - 05:05 PM

Alright, let me try to explain the purpose of VSM (and why it was created). With standard shadow mapping you have pretty much everything you need to cast curved shadows on curved surfaces - although there is still a bit of a problem: the percentage-closer filter (used to smooth out the shadow result) is quite expensive - it isn't separable.

 

Now, what PCF tries to do is the following (see the sketch after the list):

  • For each pixel take N×N samples (with the currently computed pixel at the center of this area)
  • For each of these samples determine whether its depth is farther away than the single depth value at the center; mark which samples are farther and which are not
  • The percentage of samples (out of the total number of samples) that are farther determines the shadow value at the given pixel (i.e. it "blurs" the shadow at the edges)
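A sketch of that idea, written CPU-side over a plain depth array for clarity (in practice this runs in the fragment shader against the shadow map; all names are illustrative):

```cpp
#include <algorithm>
#include <vector>

// Returns the fraction of samples in the (2*radius+1)^2 window that are NOT
// closer to the light than the receiver, i.e. the lit factor in [0, 1].
float PCF(const std::vector<float>& shadowMap, int width, int height,
          int x, int y, float receiverDepth, int radius, float bias)
{
    int lit = 0, total = 0;
    for (int dy = -radius; dy <= radius; ++dy)
    {
        for (int dx = -radius; dx <= radius; ++dx)
        {
            int sx = std::min(std::max(x + dx, 0), width - 1);   // clamp to the map
            int sy = std::min(std::max(y + dy, 0), height - 1);
            float stored = shadowMap[sy * width + sx];
            if (receiverDepth - bias <= stored)                  // no closer occluder -> lit
                ++lit;
            ++total;
        }
    }
    return (float)lit / (float)total;
}
```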

VSM takes a very similar approach: since we don't care about the individual samples, just about the percentage that is farther than a single depth value, we can estimate this using Chebyshev's inequality from the expected value and the expected squared value (averaged over the filter region) ~ actually it gives us a bound, not the actual percentage (as opposed to real PCF).

 

The advantage of VSM is the use of hardware 2x2 bilinear filtering and the possibility of a separable Gaussian blur (instead of doing non-separable PCF). Yet, since we are just calculating the bound, not the actual value, it can produce artifacts (light bleeding ~ it is solvable, but solving it degrades shadow quality a little bit, at least in my opinion).

 

 

 

 

I recommend reading the original paper; there is also a short NVIDIA paper (just google them).

 

 

 

All in all, I've tried VSM (and also some others like ESM/EVSM) ~ and so far I think that bilinear PCF still looks the best for the generic case* (note that it is quite hard to do plausible, "real-like" shadows for area-like lights, where you have hard shadows where the object touches the ground and they get blurrier farther away ... this should also apply to sunlight and such). VSM doesn't look so good when your scene geometry is complex (light bleeding ... and the darkening fix makes your shadow edges look ugly instead of giving a nice blur); I don't really recommend it for sunlight (and we're also not using it for sunlight here).

 

For "point lights" (actually we use area lights only) the situation is different, performing good bilinear PCF on cubemap is quite expensive, so I'm just fine with using VSM there and a bit of darkening (actually I'm still trying to figure out how to make area lights in interior look even better). PCSS (Percentage Closer Soft Shadows) work awesome for them, yet they are quite computation heavy.

 

*By the generic case I mean fairly complex scenes with a lot of objects, where VSM in general creates a lot of light bleeding ~ which describes the scenes we use here.




#5222578 First person wall sliding, what am I missing?

Posted by Vilem Otte on 11 April 2015 - 05:42 AM

I see that you're working in 2D, so in the following equations I will work in 2D space only (denoting the axes X (horizontal) and Y (vertical), per the usual math convention). This technique can be extended to 3D and still work properly.

 

Let us define a scene - we have a moving object (henceforth the player) and a static object (henceforth the wall). The player is bounded by a circle of radius r and center C. The wall is defined by 2 points (A and B). The player moves with some speed in a direction defined by the vector S (the length of the vector is the actual speed).

 

Let us start by defining the line-circle collision.

 

Given the equation of the line (note that we first express the direction of the line, then express the line using an origin and direction, and define the range for which the segment exists) and of the circle:

 

$$\textbf{V} = \textbf{B} - \textbf{A}$$

$$\boldsymbol{P_{line}} = \textbf{A} + t\cdot \textbf{V}, t \in [0, 1]$$

$$r^2 = (x - C_x)^2 + (y - C_y)^2$$

 

Given these equations, we are searching for the set of t for which the line and circle equations are equal. Note that the first equation actually represents two equations (one per axis) ... i.e. we are searching for the hit point. The equations are put together and solved:

 

$$r^2 = ((A_x + t \cdot V_x) - C_x)^2 + ((A_y + t \cdot V_y) - C_y)^2$$

 

This ultimately leads to a quadratic equation; we return true when t (to be precise, either of the two t values - each quadratic equation is either unsolvable = no intersection, or returns two values ... in the real number domain) lies between 0.0 and 1.0, false otherwise. Note that we're not handling the situation where our circle is larger than the wall and contains it (which can be handled too - one t would be negative and the other positive).
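The quadratic above, solved directly (a 2D sketch with assumed parameter names; A and B are the wall endpoints, C the circle center, r its radius):

```cpp
#include <cmath>

struct Vec2 { float x, y; };
static Vec2  operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// Returns true if the segment AB intersects the circle (C, r); t0/t1 are the
// roots of the quadratic, meaningful where they fall inside [0, 1].
bool SegmentCircle(Vec2 A, Vec2 B, Vec2 C, float r, float& t0, float& t1)
{
    Vec2 V = B - A;               // P(t) = A + t*V, t in [0, 1]
    Vec2 M = A - C;
    float a = dot(V, V);
    float b = 2.0f * dot(M, V);
    float c = dot(M, M) - r * r;

    float disc = b * b - 4.0f * a * c;
    if (disc < 0.0f)
        return false;             // no real roots -> no intersection with the full line

    float s = std::sqrt(disc);
    t0 = (-b - s) / (2.0f * a);
    t1 = (-b + s) / (2.0f * a);
    // Intersection with the segment itself requires a root inside [0, 1].
    return (t0 >= 0.0f && t0 <= 1.0f) || (t1 >= 0.0f && t1 <= 1.0f);
}
```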

 

Of course we can also compute how much our circle penetrates the line (actually, for testing whether there is any collision this is enough - although you have to compare the distance to the radius) - simply by computing the distance between the player's center and the wall.

 

$$\textbf{F} = \textbf{B} - \textbf{C}$$

$$d = {{|V_x F_y - V_y F_x|}\over{|\textbf{V}|}}$$

 

One could get the wrong idea that by just pushing the sphere backwards by this penetration, along the vector orthogonal to the line and pointing towards the sphere center, we would get perfect sliding. Actually, for small penetration values d this is roughly correct, where:

 

$$d < r$$

 

Assuming our tests have absolute (or at least very good) numeric precision! Which is sadly not the case (seriously, we are taking square roots and using divisions - precision hurts here). What can accidentally happen? Our C can, in a single computation step, end up behind the wall (and effectively be pushed in the wrong direction), or it can even jump through the wall without any collision being detected (this can especially happen when our time step is non-constant!).

 

An idea for a solution - given a timestep t, why not compute two steps using half the timestep? This helps, but the problem is still there (it is just pushed a bit further away). Okay, so instead of 2 samples, let's do 1000. That will be okay for almost all scenarios (and brutally inefficient), but the problem is still there. To get rid of it entirely, we would need an infinite number of samples.

 

First simplification: we treat the sphere as its center only; if the distance between this point and the line is less than r, we have a collision. Now we just need to do infinitely many point-vs-line tests (simpler, yet still unsolvable).

 

Second simplification: the infinitely many points starting at C and moving in the constant direction S form a line segment (henceforth the player-line)! So all we need is to test two line segments for their distance; if it is less than the radius r, we find the two closest points, one on line AB and one on the player-line. We put another circle C' (of radius r) onto the closest point on line AB and test this circle against the player-line. That way we get the point X where circle C hits line AB.

 

So, how do we work from there - we move our player to the point X where we know it collides with the wall. Now we know how much of the "time between frames we have already used" and also "how much time we have left". So we just move along the wall (in the direction orthogonal to the wall normal that continues the direction of movement, i.e. it doesn't go against the movement) for the rest of the time between frames.
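A sketch of that "use the remaining time to slide" step with assumed names: X is the contact position, V = B - A the wall direction, S the frame's movement vector, and tHit in [0, 1] the fraction of the frame already used.

```cpp
struct Vec2 { float x, y; };
static Vec2  operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
static Vec2  operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }
static float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

Vec2 SlideAlongWall(Vec2 X, Vec2 V, Vec2 S, float tHit)
{
    float remaining = 1.0f - tHit;                 // unused fraction of the frame
    // Project the movement onto the wall direction -> keep only the tangential
    // component, so the slide continues the movement instead of reversing it.
    Vec2 tangential = V * (dot(S, V) / dot(V, V));
    return X + tangential * remaining;
}
```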

 

This is much more precise and properly handles the cases where the player would otherwise jump through the wall.

 

 

 

Of course this approach extends to 3D; most physics engines use such swept intersection tests so they can properly model the collision response.

 

EDIT: Just a note I needed to add, never undo movement - always use swept tests, they are more precise (especially when using divisions and square roots (or inverse square roots) for calculating distances - and you have to use them).




#5222383 A practical real time radiosity for dynamic scene

Posted by Vilem Otte on 10 April 2015 - 04:31 AM

 

And unfortunately to do it "right" is hard, as in probably provably NP Hard if you want classical computing terms for such.

 

Actually it is not that hard - there are "two" common unbiased rendering algorithms - path tracing (and its variants) and progressive photon mapping. Both are, well, quite simple in principle.

 

I'm not sure whether the problem is NP-hard in terms of classical computation (but you are probably right); the thing is that these problems are solved using randomized algorithms. They take a Monte Carlo approach - their probabilistic error is O(1/sqrt(N)), or O((log N)^k / N) for quasi-Monte Carlo methods.




#5221056 GCC -malign-double

Posted by Vilem Otte on 02 April 2015 - 09:29 PM

Hello,

 

I'm not sure why you need that setting with physics, but let me explain what it does:

 

When you allocate memory, it is always at some virtual address (which is backed by some physical memory page while you're working with it; it can be swapped in and out between physical memory and the hard drive) ... these virtual pages tend to have a size of 4 KiB (because 4 KiB is the base physical page size on x86), or basically a multiple of 4 KiB - that is just FYI.

 

Now, every modern CPU has a so-called FPU & SIMD unit - Floating Point Unit & Single Instruction, Multiple Data ~ you have multiple values in a single register and perform a single operation over all of them (4x float in SSE is the well-known case ~ 4x float ~ a 4D vector).

 

Let's continue (I will describe just the details for SIMD, as I don't remember the FPU specifics by heart). When you're moving data between memory and a register, there is an instruction that quickly loads data from memory into a SIMD register, and one that stores it back - opcodes 0F 28 and 0F 29, written in assembly as MOVAPS. These instructions perform a fast load or store of 16 bytes from/to a memory address (or another register); there is just one condition - the memory address must be aligned on a 16-byte boundary.

 

This may seem problematic when one uses virtual memory, but there is one nice property - each virtual page begins on a page boundary and maps onto a physical page that also begins on a page boundary, and page sizes are multiples of 16 bytes. So when a virtual address X modulo 16 equals 0, the corresponding physical address used during the computation is also 16-byte aligned.

 

Now, what you as a programmer need to know - all allocations & deallocations (both stack and heap based) of such data must be performed on 16-byte boundaries ~ i.e. each such address must be equal to 0 after a 'mod 16' operation. There are OS-specific functions to handle heap allocation correctly (_aligned_malloc under Windows, posix_memalign under POSIX-based OSes, etc.); stack-based alignment must be specified to the compiler by hand (using __declspec(align(16)) under MSVC or __attribute__((aligned(16))) under GCC).
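A sketch of the two tools mentioned above (illustrative only, shown with GCC/POSIX syntax since the question is about GCC): a 16-byte-aligned type for stack variables and members, and an aligned heap allocation.

```cpp
#include <cstdlib>

// Stack / member alignment: the attribute forces 16-byte alignment.
struct Vector4
{
    float v[4];
} __attribute__((aligned(16)));
// (Under MSVC the equivalent would be __declspec(align(16)) struct Vector4 { ... };)

int main()
{
    Vector4 onStack;                       // guaranteed 16-byte aligned
    (void)onStack;

    // Heap alignment: posix_memalign on POSIX systems (_aligned_malloc on Windows).
    void* block = nullptr;
    if (posix_memalign(&block, 16, 1024) != 0)
        return 1;                          // allocation failed
    // ... use 'block' with aligned SSE loads/stores ...
    free(block);                           // posix_memalign memory is freed with free()
    return 0;
}
```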

 

The previous also applies to doubles & long longs (although they go on an 8-byte boundary, not a 16-byte one ... FYI, there are also 32-byte registers, and on some CPU architectures even larger ones). The mentioned compiler flag -malign-double forces all doubles and long longs to be aligned on an 8-byte boundary (so the compiler can use the aligned variants of the load/store instructions, resulting in better performance).

 

Nothing is free though - if you have a structure with an 8-byte double (aligned on an 8-byte boundary) and 1 extra byte, you have to add 7 unused bytes of padding (i.e. in general your memory usage can increase).

 

My apologies if I went a bit too deep into the hardware ~ but I wanted to share the concepts behind why it is like this.

 

EDIT: So in general it isn't that bad (it can actually be good and yield better performance), yet there can be trouble (and crashes) when using memory alignment without knowing what is behind it.




#5220881 Is my triangle intersection code correct?

Posted by Vilem Otte on 02 April 2015 - 03:57 AM

In fact there are a lot of ray-triangle intersection algorithms; out of all of them, the most interesting are:

 

Moller-Trumbore http://www.google.cz/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCwQFjAC&url=http%3A%2F%2Fpublic-digital-library.googlecode.com%2Fsvn%2Ftrunk%2FCG%2FGlobal%2520Illumination%2FRay%2520Tracing%2FFast%2520Minimum%2520Storage%2520Ray-Triangle%2520Intersection.pdf&ei=dg8dVeauKIKAUYb6gKgI&usg=AFQjCNF0PrfmXHhQeqCGY63JhQ56bY4DSw&bvm=bv.89744112,d.d24&cad=rja

 

which is just a standard single-sided or double-sided barycentric test. It doesn't need any additional storage and in general is very fast on both GPUs and CPUs.
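A compact sketch of the double-sided variant (assumed types and names; see the paper linked above for the derivation and the one-sided version):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }

// Ray: origin O, direction D. Triangle: vertices V0, V1, V2.
// Outputs distance t and barycentric coordinates u, v on a hit.
bool MollerTrumbore(Vec3 O, Vec3 D, Vec3 V0, Vec3 V1, Vec3 V2,
                    float& t, float& u, float& v)
{
    Vec3 e1 = V1 - V0;
    Vec3 e2 = V2 - V0;
    Vec3 p = cross(D, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;   // ray parallel to the triangle plane
    float invDet = 1.0f / det;

    Vec3 s = O - V0;
    u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;

    Vec3 q = cross(s, e1);
    v = dot(D, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;

    t = dot(e2, q) * invDet;
    return t > 0.0f;                            // hit in front of the ray origin
}
```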

 

Woop http://jcgt.org/published/0002/01/05/paper.pdf

 

I find it very fast on the CPU. It needs more storage compared to the MT test (plus for shading you will still need the original vertex coordinates), so I find it behaves a bit worse (on some GPUs) compared to MT.

 

Among the others - the Plücker test is one of the most robust, the Badouel (projection) test is very fast with SIMD SSE on the CPU, and so on. I'm also linking a (somewhat older) good summary of algorithms and their comparisons in terms of speed - http://gggj.ujaen.es/docs/grapp14_jimenez.pdf

 

I hope this helps. I could link you to my implementation of a GPU ray tracer on GitHub, but that repo is heavily WIP and I haven't pushed for a few weeks (I have modifications locally), plus the setup is a bit harder as of now (but I'm working on it!). https://github.com/Zgragselus/OpenTracer ... there is obsolete CPU code with a variation of the barycentric test, and a Woop test in OpenCL (which is used at runtime).





