Ohforf sake

Member Since 04 Mar 2008
Offline Last Active Today, 05:20 AM
-----

#5169531 What is a browser?

Posted by Ohforf sake on Yesterday, 10:52 AM

Note that you don't need OpenGL/DirectX to make a fully functional browser: http://en.wikipedia.org/wiki/Lynx_(web_browser)


#5169526 Ambient Occlusion that doesn't use normals

Posted by Ohforf sake on Yesterday, 10:33 AM

There are three distinct questions here, I think:
1. Compute a value per vertex, or for each pixel of a low-resolution AO texture? You should probably go with the latter; vertex-baked results usually look ugly.
2. How to cast rays to compute occlusion? The approach you described uses a rasterizer as a cheap way to cast rays, but you can just as well implement a cheap raytracer. The latter has the benefit that, for one spot, you can shoot rays in every direction and are not restricted to a field of view below 180°. If you want to stick with the rasterization approach, you can take 6 pictures with a 90° fov for each spot, similar to how you would render a cube map. That would allow you to shoot rays in every direction while sticking with rasterization instead of raytracing.
3. Finally, do you have to shoot the rays in the general direction of the normal? Usually no. Just shoot rays in every direction, evenly distributed: if about half of them don't hit anything, the spot is completely unoccluded; if none miss, it's completely occluded.
For every spot on a flat surface, about 50% of the rays will always hit that surface, so they are wasted computational effort. If you have the normal of the surface, you can skip every direction that points backwards into the surface, but you don't strictly need it.
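For illustration, here is a minimal sketch (my own, not from this thread) of point 3: shoot uniformly distributed rays and count the misses. castRay stands in for whatever scene intersection you end up using, rasterized or raytraced.

#include <cmath>
#include <cstdlib>

struct Vec3 { float x, y, z; };

// Uniformly distributed direction on the unit sphere.
static Vec3 randomDirection()
{
    float z   = 2.0f * std::rand() / RAND_MAX - 1.0f;          // cos(theta) in [-1, 1]
    float phi = 2.0f * 3.14159265f * std::rand() / RAND_MAX;   // angle around the axis
    float r   = std::sqrt(1.0f - z * z);
    return { r * std::cos(phi), r * std::sin(phi), z };
}

// castRay(origin, direction) must return true if the ray hits any geometry.
template <class RayCaster>
float ambientLight(const Vec3 &spot, int numRays, RayCaster castRay)
{
    int misses = 0;
    for (int i = 0; i < numRays; ++i)
        if (!castRay(spot, randomDirection()))
            ++misses;
    // On a flat, unoccluded surface about half the rays hit the surface itself,
    // so ~50% misses already means "completely unoccluded".
    float unoccluded = 2.0f * misses / numRays;
    return unoccluded > 1.0f ? 1.0f : unoccluded;
}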


#5169351 Declaring temporary variable to save 1 multiply?

Posted by Ohforf sake on 26 July 2014 - 12:42 PM

The mechanism that Erik referred to is called common subexpression elimination, and pretty much every compiler supports it to some degree.

If you don't care about the exact order of the operations, you can turn on unsafe math optimizations (-ffast-math for GCC), which allows the compiler to optimize more aggressively, possibly pulling out subexpressions even if this changes the order of additions within a sum or the order of multiplications within a product.
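To illustrate the idea with a tiny hand-written sketch (names are made up):

// Without CSE the product a * b would be evaluated twice:
//   sum  = a * b + c;
//   diff = a * b - c;
// With CSE the compiler (or you, by hand) hoists it into a temporary:
void sumAndDiff(float a, float b, float c, float &sum, float &diff)
{
    float ab = a * b;   // common subexpression, computed once
    sum  = ab + c;
    diff = ab - c;
}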


#5166550 BSP Tree Question

Posted by Ohforf sake on 13 July 2014 - 05:48 AM

Short answer: yes

Long answer: Consider a rectangle in 2D (works the same, but has fewer sides).

Every node splits the space into two halves. A leaf node labels its half of the space as empty or solid.

So for a rectangle
   +-------+
   |       |
   |       |
   |       |
   +-------+
you could start by choosing the left side as the first splitting plane (in 2D a splitting line):
  A | B
    |
    |
    +-------+
    |       |
    |       |
    |       |
    +-------+
    |
    |
So your root node now has a splitting plane and two children A and B. A will remain a leaf node and label the left hand side as empty.
B we could now split further by, for example, the top edge:
  A |
    |
    | [B] C
    +-------+------------
    |       | [B] D
    |       |
    |       |
    +-------+
    |
    |
So node B now has two children, C and D, with C again labeling its space as empty. After splitting D, for example along the right-hand side, we get:
   A |
     |
     | [B] C
     +-------+------------
     |       |
     |       |
     |       |
     +-------+
     |       |
     |       |
     |       |
     |[B,D] F|[B,D] E
and there you have it: one splitting plane on the left, and further down the tree one on the right. E, which is a child node of D, which again is a child node of B, denotes the right hand side as empty. Now we can finish things up by splitting F
   A |
     |
     | [B] C
     +-------+------------
     |       |
     |       |
     |B,D,F]H|
     +-------+
     |B,D,F]G|
     |       |
     |       |
     |       |[B,D] E
and labeling G as empty and H as solid.

And there you have it, a BSP tree with a total of 4 splits, one for each side.
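In code, the tree from the example could look roughly like this (a minimal sketch of one possible layout, not taken from any particular engine; in 2D the splitting "plane" is the line nx*x + ny*y = d):

struct BspNode {
    float nx, ny, d;              // splitting line: nx*x + ny*y = d
    BspNode *front = nullptr;     // subtree on the side the normal points to
    BspNode *back  = nullptr;     // subtree on the other side
    bool solid = false;           // only used by leaves (front == back == nullptr)
};

// Classify a point by walking down to the leaf that contains it.
bool isSolid(const BspNode *node, float x, float y)
{
    while (node->front && node->back)
        node = (node->nx * x + node->ny * y >= node->d) ? node->front : node->back;
    return node->solid;
}

For the rectangle above: the root splits along the left edge, A is an empty leaf, B splits along the top edge, and so on down to the solid leaf H.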


#5166384 hyphotetical raw gpu programming

Posted by Ohforf sake on 12 July 2014 - 05:17 AM

I mean, I can describe how a traditional CPU works down to the NAND gate level (and possibly further), but I'd be interested in learning about GPU internals more.

Phantom pretty much described how it works (in the current generation), but to give a very basic comparison to CPUs:

Take your i7 CPU: it has (amongst other things) various caches, scalar and vectorized 8-wide ALUs, 4 cores, and SMT (Intel calls it "hyperthreading") that allows for 2 threads per core.
Now strip out the scalar ALUs, widen the vectorized ALUs from 8-wide to 32-wide and increase their number, allow the SMT to run 64 instead of 2 "threads"/warps/wavefronts per core (note that on GPUs, every SIMD lane is called a thread), and put in 8 of those cores instead of just 4. Then increase all ALU latencies by a factor of about 3, all cache and memory latencies by a factor of about 10, and also increase memory throughput by a significant factor (I don't have a number, sorry).
Add some nice stuff like texture samplers, shared memory (== local data store) and some hardware support for divergent control flows, and you arrive more or less at an NVidia GPU.

Again, Phantom's description is way more accurate, but if you think in CPU terms, those are probably the key differences.


#5165836 Precompiled shaders

Posted by Ohforf sake on 09 July 2014 - 09:39 AM

I personally have much, much more experience in OpenGL than DX, and in OpenGL you should never use precompiled shaders. IMHO


The OP was referring to the intermediate representation (bytecode, as alessio called it) of the shader, not the "actual" binary that is running on the GPU. OpenGL has nothing similar whatsoever. There simply is no IR in OpenGL; you have to build it yourself.


#5164848 FBO Questions

Posted by Ohforf sake on 05 July 2014 - 02:30 AM

The hardware depth test has certain optimizations in place which can significantly speed up the rendering of occluded fragments. However, those optimizations require additional memory, which is why depth attachments are more than a simple texture. You can't just use them as a regular render target, write to them, and still expect that additional memory for the optimizations to be consistent.
OpenGL actually has a method for aliasing the format of textures, called TextureView, and as far as I'm aware, even TextureViews can't change a depth texture into a regular one.

I think the best option is to allocate 3 buffers: the depth attachment into which you render, the intermediate texture, and the final texture. Note that the first 2 can be reused by other lights after the filtering, as long as the other shadow maps have the same or a smaller size.
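A minimal allocation sketch of that layout (my own, with placeholder names; I'm assuming a square map and a two-channel variance-style filter output, so adjust the formats to whatever your filter actually produces):

const GLsizei size = 1024;   // placeholder shadow map resolution
GLuint depthTex, intermediateTex, finalTex;

glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, size, size, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, NULL);

glGenTextures(1, &intermediateTex);
glBindTexture(GL_TEXTURE_2D, intermediateTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RG32F, size, size, 0, GL_RG, GL_FLOAT, NULL);

glGenTextures(1, &finalTex);
glBindTexture(GL_TEXTURE_2D, finalTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RG32F, size, size, 0, GL_RG, GL_FLOAT, NULL);

// Render the shadow casters into depthTex (attached as GL_DEPTH_ATTACHMENT),
// then filter depthTex into intermediateTex and intermediateTex into finalTex.
// depthTex and intermediateTex can afterwards be reused for the next light.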


#5164736 FBO Questions

Posted by Ohforf sake on 04 July 2014 - 07:58 AM

@Question 1: I presume you want to reuse the memory of the shadow map? Because otherwise there is no reason to use a DepthComponent texture as the final target.

@Question 2: The results of doing that are usually "undefined", which means anything can happen (including the intended result), but the behavior can differ between vendors, driver versions, GPU loads, ... In short: don't do it.
The texture cache, through which you read, is not kept coherent with the video memory, so writing a pixel does not affect the copy of that pixel in the texture cache, which is probably what you are seeing here. In most cases you can read and write the same texture if you only read one pixel per thread and it's the very pixel you write, but in your case you are reading more than one pixel.


#5164496 Problem with Deffered Rendering

Posted by Ohforf sake on 02 July 2014 - 11:57 PM

I took the liberty to look the format up in his repo:
for (unsigned int i = 0 ; i < GBUFFER_NUM_TEXTURES ; i++) {
    glBindTexture(GL_TEXTURE_2D, m_textures[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F, windowWidth, windowHeight, 0, GL_RGB, GL_FLOAT, NULL);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, m_textures[i], 0);
}
 
glBindTexture(GL_TEXTURE_2D, m_depthTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32F, windowWidth, windowHeight, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, m_depthTexture, 0);
Now I'm pretty sure there are no real GL_RGB formats in hardware, only GL_R, GL_RG and GL_RGBA, but OpenGL should extend that automatically. However, note that the textures are incomplete: there are no calls to glTexParameter{i,f}. I would expect something like this:
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, 0);

    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
Regarding the CodeXL images, can you hover over the pixels with your mouse and look at the numeric content? Maybe the colors are just the result of some weird banding filter that CodeXL automatically applies to all float buffers.


Also note that your GBuffer is extremely big: you currently have 64 bytes per pixel (presumably four GL_RGB32F attachments, each padded to 16 bytes), not counting the depth buffer. Once you have everything running correctly, you might want to decrease that a bit.


#5164289 Multitexturing not working

Posted by Ohforf sake on 02 July 2014 - 06:16 AM

Pretty sure this:
glUniform1i(glGetUniformLocation(m_shader->GetProgramID(), "gTex0"), GL_TEXTURE0);
has to be
glUniform1i(glGetUniformLocation(m_shader->GetProgramID(), "gTex0"), 0);
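The reason: the sampler uniform takes the texture unit index (0, 1, 2, ...), not the GL_TEXTUREi enum value, which is a much larger number. A minimal sketch of the usual pattern (textureID is a placeholder for your texture handle):

glUseProgram(m_shader->GetProgramID());
glActiveTexture(GL_TEXTURE0);                                            // select texture unit 0
glBindTexture(GL_TEXTURE_2D, textureID);                                 // bind the texture to that unit
glUniform1i(glGetUniformLocation(m_shader->GetProgramID(), "gTex0"), 0); // the sampler reads from unit 0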


#5164060 How do I generate mipmaps using GL Image?

Posted by Ohforf sake on 01 July 2014 - 10:20 AM

From a cursory glance it seems like you can load DDS files into an image set and load that into a texture. DDS files support pre-baked mipmap levels that are stored alongside the full-resolution image. By the looks of it, the createTexture method just reacts to the presence (or absence) of mipmap levels in the image set.


#5164038 Cry Engine or Unity?

Posted by Ohforf sake on 01 July 2014 - 07:36 AM

To my knowledge, neither CryEngine nor Unity has the supporting infrastructure for MMORPGs out of the box, if that is what you are looking for. I have never attempted to write an MMORPG, so I don't know if there are stock frameworks that can be bought for it. You might have to develop the infrastructure yourself.

For prototyping (testing gameplay ideas, etc.) Unity is usually considered to be the better choice.

As game development goes, I think most would agree that MMORPGs are the hardest to pull off. Apart from experience you need a large team (you need to create a lot of stuff) and a lot of money. I believe the best course of action to develop (or be part of the development of) a successful MMORPG is to acquire the necessary skills and then join a company like Blizzard that is already working on one.


#5164025 How to monitor video card memory

Posted by Ohforf sake on 01 July 2014 - 06:11 AM

AFAIK, nothing you can do in a normal user space process or in/with XNA/DirectX should be able to crash the system, only your own program. In CUDA, with messed up pointer arithmetic, you can crash the display driver, but HLSL doesn't expose any pointers.

Can you build a minimal working (== system crashing) example program?


#5163937 sse-alignment troubles

Posted by Ohforf sake on 30 June 2014 - 03:48 PM

Sadly the other thread with the weird stack alignment bug got closed, so I can't post there anymore. But it seems like fir in his foresight created enough of them for everyone, so I'm just gonna post here.

For those of you, other than fir, who might have stumbled (or will at some point stumble) across the same problem (compiler-generated aligned loads from the stack ending up at unaligned addresses): do not despair. There is another tool, next to the debugger, that fir also doesn't need, but which is very handy in this case and which helped me a lot when I had to solve the very same problem (and I'm only sharing this because it was a real WTF? moment for me). This tool is called Google.

Just google for "GCC windows stack alignment" and pick (for example) the 3rd link that comes up, and you will get a nice explanation of the problem, alongside the solution.

Or, you can post a disassembly dump in the nearest internet community and wait for someone to figure it out, while you chill and play tetris *ducks and runs for the door*.


#5163613 encounter weird problem when turn on the color blending.

Posted by Ohforf sake on 29 June 2014 - 07:07 AM

The following is not meant to turn you away, but since you described yourself as a "rookie in OpenGL", I think it should be pointed out to you to prevent any misconceptions:

OpenGL is a thin (most will argue still too thick) API towards the GPU, providing you with the most basic interface to render and shade triangles. You may have noticed that it doesn't provide any means to load models or textures. The newer versions of OpenGL don't even support lighting out of the box. The idea is that you implement those things on top of OpenGL. This holds for transparency as well: transparency is not as simple as enabling blending; you have to implement some form of algorithm for it on top of OpenGL. "Depth peeling", as suggested by L.Spiro, is one of those techniques. Splitting the model into parts whose rendering order is determined by the camera position (what haegarr suggested) is another. There are quite a few more.

Using OpenGL (or, for that matter, OpenGL ES, Direct3D, Metal, Mantle, ...) means that you will have to write a lot of code around it, as these are not intended to be full-fledged rendering engines.
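For illustration, a minimal sketch (my own, not from this thread) of the simplest of those techniques, per-object back-to-front sorting; Drawable and draw() are placeholders for whatever your engine uses:

#include <algorithm>
#include <vector>
// The GL calls assume an OpenGL context and headers, as in the rest of this thread.

struct Vec3 { float x, y, z; };
struct Drawable { Vec3 position; /* mesh, material, ... */ };

static float distSq(const Vec3 &a, const Vec3 &b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

void drawTransparent(std::vector<Drawable> &objects, const Vec3 &cameraPos)
{
    // Farthest object first, so nearer surfaces blend over the ones behind them.
    std::sort(objects.begin(), objects.end(),
              [&](const Drawable &a, const Drawable &b) {
                  return distSq(a.position, cameraPos) > distSq(b.position, cameraPos);
              });

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDepthMask(GL_FALSE);   // keep depth testing against opaque geometry, but don't write depth
    // for (const Drawable &d : objects) draw(d);   // draw() is a placeholder for your render call
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}

Per-object sorting breaks down as soon as transparent objects overlap or interpenetrate, which is exactly where depth peeling or splitting the model into sortable parts comes in.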



