# B_old

2. ## "Random Digit Scrambling" for quasi-Monte Carlo sampling

In the paper Efficient Multidimensional Sampling, the authors describe a technique they call Random Digit Scrambling (RDS). Does anyone have more implementation details on this algorithm? The paper gives source code for three different radical inverse functions that can be used to implement RDS, but I don't understand how they should be applied. In particular, I don't understand how they get from Hammersley samples to RDS.

```go
func Hammersley(i, numSamples uint32) Vector2 {
	return MakeVector2(float32(i)/float32(numSamples), radicalInverse_vdC(i))
}
```

Can the above snippet easily be changed to something that generates RDS?
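A minimal sketch of what the question asks for, assuming the base-2 form of scrambling: random digit scrambling in base 2 amounts to XOR-ing the reversed bits with one random 32-bit value `r` that is shared by all points of a set (one value per dimension). `Vector2`, `scrambledRadicalInverse` and `ScrambledHammersley` are hypothetical names standing in for the poster's own types and functions, not code from the paper.

```go
package main

import "math/bits"

// Vector2 stands in for the poster's own vector type (hypothetical).
type Vector2 struct{ X, Y float32 }

// scrambledRadicalInverse is the base-2 radical inverse of i with random
// digit scrambling: XOR-ing the reversed bits with r flips the same digit
// positions for every point of the set. r = 0 gives plain van der Corput.
func scrambledRadicalInverse(i, r uint32) float32 {
	x := bits.Reverse32(i) ^ r
	// Keep the top 24 bits so the float32 result is exact and stays below 1.
	return float32(x>>8) / (1 << 24)
}

// ScrambledHammersley returns point i of an n-point Hammersley set whose
// second coordinate is scrambled with r.
func ScrambledHammersley(i, n, r uint32) Vector2 {
	return Vector2{float32(i) / float32(n), scrambledRadicalInverse(i, r)}
}
```

Draw a fresh `r` per point set, for example with `rand.Uint32()` from `math/rand`; `r = 0` reproduces the unscrambled Hammersley set.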
3. ## "Random Digit Scrambling" for quasi-Monte Carlo sampling

Hammersley sequences and the like are a higher-level construct than radical inverse functions (in fact the "end product" of the whole algorithm), not a low-level primitive.

I read the paper and don't understand how they combine the routines to get their RDS. The scrambled part still makes sense to me:

```go
func ScrambledHammersley(i, numSamples, r uint32) Vector2 {
	return MakeVector2(float32(i)/float32(numSamples), ScrambledRadicalInverse_vdC(i, r))
}
```

But apparently they combine all three radical inverse functions to get some result.
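A sketch of how the three routines can be read, assuming they are base-2 radical inverses that differ only in their generator matrix: van der Corput reverses the bits, while the Sobol' and Larcher-Pillichshammer variants XOR in one matrix column per set bit of `i`, and the extra argument `r` is the scramble value. Pairing van der Corput with the Sobol' variant gives the first two dimensions of the Sobol' sequence, and each dimension gets its own independent `r`. The function names are hypothetical Go ports, not verbatim code from the paper.

```go
package main

import "math/bits"

// toUnit maps a 32-bit binary fraction to [0, 1), keeping the top 24 bits
// so the float32 result never rounds up to 1.
func toUnit(x uint32) float32 { return float32(x>>8) / (1 << 24) }

// vdC is the scrambled base-2 van der Corput radical inverse.
func vdC(i, r uint32) float32 { return toUnit(bits.Reverse32(i) ^ r) }

// sobol2 is the second Sobol' coordinate: every set bit of i XORs in one
// generator-matrix column v; v ^= v>>1 steps to the next column.
func sobol2(i, r uint32) float32 {
	for v := uint32(1) << 31; i != 0; i, v = i>>1, v^(v>>1) {
		if i&1 != 0 {
			r ^= v
		}
	}
	return toUnit(r)
}

// larcherPillichshammer has the same structure, but the next column is v | v>>1.
func larcherPillichshammer(i, r uint32) float32 {
	for v := uint32(1) << 31; i != 0; i, v = i>>1, v|(v>>1) {
		if i&1 != 0 {
			r ^= v
		}
	}
	return toUnit(r)
}

// SobolPoint2D returns point i of the first two Sobol' dimensions, each
// randomized with its own scramble value.
func SobolPoint2D(i, r1, r2 uint32) [2]float32 {
	return [2]float32{vdC(i, r1), sobol2(i, r2)}
}
```

Because XOR with a fixed `r` flips the same digit in every point, the stratification survives scrambling: the first four points still land in four different quadrants for any `r1` and `r2`. A Hammersley-style set would instead use `i/n` as the first coordinate together with one of these functions.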
4. ## Decouple simulation and rendering: Interpolate particles

If simulation and rendering run at different frequencies, it can be useful to interpolate between two simulation steps during rendering to get smooth animation. For moving meshes I simply interpolate the transformation on the CPU before sending it to the GPU.

For particles, which I simulate entirely on the CPU, I'm less sure about a good strategy. Currently I keep the particle array from the previous simulation frame around and send both arrays to the GPU, where I do the interpolation. I figured doing this on the GPU is faster, even though I now send twice the data. Does this make sense, or would you do the interpolation on the CPU as well?

I have two arrays of particle structs, one for the previous and one for the current frame. Before each simulation frame I just copy the array. I send them to the GPU as two separate buffers. Would it be smarter to store them as one interleaved array?

Particle rendering is currently not a bottleneck for the scenes I have (at least not the number of particles), but I would like to set it up sensibly. How would you handle this?
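A sketch of the interleaved variant on the CPU side, assuming a fixed-timestep loop where `alpha` is the leftover accumulator time divided by the step length (a common setup, not necessarily the poster's). Keeping `Prev` and `Curr` next to each other in one struct replaces the per-frame array copy with a per-particle shift, and the same struct array could be uploaded as one buffer if the blend stays on the GPU. `Vec3`, `Particle`, `Step` and `InterpolatePositions` are hypothetical names.

```go
package main

// Vec3 is a minimal 3D vector for this sketch.
type Vec3 struct{ X, Y, Z float32 }

func lerp(a, b Vec3, t float32) Vec3 {
	return Vec3{a.X + (b.X-a.X)*t, a.Y + (b.Y-a.Y)*t, a.Z + (b.Z-a.Z)*t}
}

// Particle keeps the previous and current simulated positions side by side
// (an interleaved layout), so one array holds both states.
type Particle struct {
	Prev, Curr Vec3
}

// Step shifts Curr into Prev before storing the newly simulated position.
func (p *Particle) Step(newPos Vec3) {
	p.Prev = p.Curr
	p.Curr = newPos
}

// InterpolatePositions writes render-time positions into out.
// alpha = accumulator / fixedDt, expected in [0, 1].
func InterpolatePositions(ps []Particle, alpha float32, out []Vec3) {
	for i := range ps {
		out[i] = lerp(ps[i].Prev, ps[i].Curr, alpha)
	}
}
```

If the interpolation stays on the GPU, the vertex shader would perform the same `mix(prev, curr, alpha)` with `alpha` passed as a uniform; the CPU version shown here uploads only the blended positions, half the data of sending both states.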
5. ## Decouple simulation and rendering: Interpolate particles

I see. Thanks for the input!
6. ## OpenGL 4.4 render to SNORM

Yes, it returned GL_FRAMEBUFFER_COMPLETE.   Maybe the easiest thing is to render to UNORM and map [-1, 1] to [0, 1] in the shader. Not really a big deal, but I wanted to find out what the problem is.
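The remapping mentioned above, as a small sketch: encode before writing to the UNORM target and decode after sampling it. `quantize16` is a hypothetical helper that only simulates storage in a 16-bit UNORM channel to show the round-trip error.

```go
package main

// encodeUnorm maps a value from [-1, 1] to [0, 1] so it survives a UNORM target.
func encodeUnorm(x float32) float32 { return x*0.5 + 0.5 }

// decodeUnorm undoes encodeUnorm after sampling the texture.
func decodeUnorm(u float32) float32 { return u*2 - 1 }

// quantize16 simulates storing u in a 16-bit UNORM channel (round to nearest).
func quantize16(u float32) float32 {
	return float32(uint16(u*65535+0.5)) / 65535
}
```

With 16 bits the step in [-1, 1] is 2/65535, roughly the same as the 1/32767 step of RG16_SNORM, so little precision is lost by the remap.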
7. ## OpenGL OpenGL 4.4 render to SNORM

Hi,

is it possible to render to a SNORM texture in OpenGL 4.4? Apparently SNORM formats are not required to be supported as color targets in 4.2.

I want to render to an RG16_SNORM target to store normals in octahedron format. The linked paper contains code that expects and outputs data in the [-1, 1] range, and I was just assuming that it would automatically work with SNORM textures.

The output seems to get clamped to [0, 1], though. I checked with a floating-point render target and got the expected results, so I don't think it is an issue with the code.

Should this work? Am I maybe doing something wrong when creating the texture?

EDIT:

D3D11 hardware supports SNORM render targets, so I guess I'm doing something wrong.
8. ## OpenGL 4.4 render to SNORM

Yes, I'm checking for completeness. The behavior I get is completely identical to just using a UNORM target, so it seems to be silently converting to that. I also don't get any debug output from OpenGL.   This is with a GeForce GTX 570, driver 331.38 on Ubuntu.   Do you have experience with rendering to this format?
9. ## OpenGL 4.4 render to SNORM

I don't know. It says:

> We are silent about requiring R, RG, RBA and RGBA rendering. This is an implementation choice.

As the hardware seems to be perfectly capable of rendering to SNORM, I expect that it is implemented in all drivers. Has anyone here successfully rendered to SNORM with OpenGL?

EDIT:

This OpenGL 4.4 core spec document also does not mark the SNORM formats as something that must be supported for a color target. Maybe rendering to SNORM is really not supported. Can anybody confirm this?
10. ## Registering an uninstantiable reference type: Identifier 'XXX' is not a data type

I have a C++ type that I don't want to instantiate from scripts; I only want to use it through handles, preferably by passing it to the constructor of script classes that should be able to interact with it.

I followed the documentation, used asOBJ_NOCOUNT and did not provide a factory behavior. Actually, this is all I did (never mind the wrapper):

```cpp
engine.register_object_type("Scene", 0, asOBJ_REF | asOBJ_NOCOUNT);
engine.register_object_method("Scene", "const string &name() const",
    asMETHODPR(scene::Scene, name, () const, const std::string&));
```

When I try to load a script that looks like this:

```angelscript
class Scene_test
{
    Scene_test(Scene@ scene)
    {
        scene_ = scene;
    }

    void on_scene_loaded()
    {
        print("Hello World");
    }

    Scene@ scene_;
};
```

I get an error stating "Identifier 'Scene' is not a data type".

Any idea what I'm missing?
11. ## Registering an uninstantiable reference type: Identifier 'XXX' is not a data type

Oh, the namespace was indeed a problem, sorry. The default namespace was set to "scene", but writing scene::Scene in the script did not change the error message. As a test I changed the namespace to "", after which I got "There is no copy operator for the type 'Scene' available."
12. ## OpenGL Uniform buffer updates

I use uniform buffers to set constants in the shaders. Currently each uniform block is backed by a uniform buffer of the appropriate size; glBindBufferBase is called once per frame, and glNamedBufferSubDataEXT is called for every object without orphaning.

I tried to optimize this by using a larger uniform buffer, calling glBindBufferRange and updating consecutive regions of the buffer, and this turned out to be significantly slower. After looking around I found this and similar threads that talk about the same problem. The suggestion seems to be to use one large uniform buffer for all objects, update it only once with the data for all objects, and call glBindBufferRange for every draw call.

Is this the definitive way to go in OpenGL, regardless of whether BufferSubData or MapBufferRange is used? In one place it was suggested that for small amounts of data the glUniform*fv functions are the fastest choice. It would be nice to achieve comparable performance with uniform buffers.

What is your experience with updating shader uniforms in OpenGL?
13. ## Uniform buffer updates

OK, I think I understand. If we are thinking about the same thing, calls to glBindBufferRange actually need an offset that is a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which is 256 on my machine (and, I think, on many other cards).

Just now I tried to evaluate whether the uniform updates are a bottleneck in my case. For this test I stripped down the rendering pipeline as much as I could with regard to OpenGL interaction. I simulated the performance of the "optimized" uniform updates by replacing glBufferSubData(float4x4) with glBindBufferRange(). I compared the two approaches with 1K and 4K draw calls for very simple geometry (the same vertex buffer for every draw call) and could not see any noticeable difference.

I concluded that the optimized version could not possibly be faster than just calling glBindBufferRange() for every differently transformed object, which in turn means this is not my bottleneck.

So has the driver situation improved, or is my test/conclusion flawed?
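Worked through with the 256-byte GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT above, and assuming one 64-byte float4x4 per object, each object occupies a full 256-byte slot:

$$
\begin{aligned}
\text{stride} &= \left\lceil \tfrac{64}{256} \right\rceil \cdot 256 = 256\ \text{bytes},\\
\text{offset}_k &= k \cdot \text{stride} = 256\,k,\\
\text{size}_{4096} &= 4096 \cdot 256 = 1\,048\,576\ \text{bytes} = 1\ \text{MiB}.
\end{aligned}
$$

So for the 4K-draw test, one buffer for all objects is about 1 MiB, of which three quarters is padding unless more per-object data shares each slot.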
14. ## Uniform buffer updates

Thanks, that answers my question! Have you tried this architecture, which works well for OpenGL, with a D3D backend? Will it also work well there? At least that would make it less annoying that this is apparently a driver weakness.

I'm indeed already iterating over each object twice, but I'm wondering whether I now have to do it a third time, because the first iteration is followed by a sort which could tell me, for instance, when I don't need to update the per-material buffers.

I'm also wondering about another thing. Is it very important that the uniform buffer is large enough to fit every single object drawn per frame, or can you achieve good performance with a buffer that is only large enough to hold the data of some objects before it has to be refilled? To me it sounds like that should already help, but then again the handling of uniform buffers shouldn't be so hard in the first place.
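A sketch of the middle ground asked about in the last paragraph: a fixed-capacity buffer filled in batches, where each batch is uploaded once and then drawn with one range bind per object. The capacity in the usage example is arbitrary, and whether refilling between batches stalls depends on the driver and on how the upload is done (orphaning, mapping), so this only plans the batches. `PlanBatches` and `Batch` are hypothetical names.

```go
package main

// alignUp rounds n up to the next multiple of align (align must be > 0).
func alignUp(n, align int) int {
	return (n + align - 1) / align * align
}

// Batch is a run of objects whose data fits in one fill of the uniform buffer.
type Batch struct {
	First, Count int
}

// PlanBatches splits numObjects into batches that each fit in a buffer of
// capacity bytes, given the per-object block size and offset alignment.
func PlanBatches(numObjects, blockSize, align, capacity int) []Batch {
	stride := alignUp(blockSize, align)
	perBatch := capacity / stride
	if perBatch == 0 {
		return nil // buffer too small for even one object
	}
	var batches []Batch
	for first := 0; first < numObjects; first += perBatch {
		count := perBatch
		if numObjects-first < count {
			count = numObjects - first
		}
		batches = append(batches, Batch{first, count})
	}
	return batches
}
```

For example, `PlanBatches(1000, 64, 256, 65536)` gives four batches of 256, 256, 256 and 232 objects, since each 64-byte block occupies a 256-byte slot.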
15. ## Uniform buffer updates

What I currently do:

```
bindBufferBase(smallUniformBuffer);
for (o : objects) {
    bufferSubData(smallUniformBuffer, o.transformation);
    draw(o.vertices);
}
```

What I think I should be doing:

```
offset = 0;
for (o : objects) {
    memory[offset] = o.transformation;
    ++offset;
}
bufferData(hugeBuffer, memory);

offset = 0;
for (o : objects) {
    bindBufferRange(hugeBuffer, offset);
    draw(o.vertices);
    ++offset;
}
```

At first I was a bit frustrated because I am used to the Effects framework of the D3D SDK, but after reading the presentation about batched buffer updates it seems a D3D application can also benefit from doing it this way. So the architecture can be the same for both APIs.

@richardurich:

> Were updates far enough apart to avoid conflicts (can be perf hit if call A writes to first half of block A, and call B writes to second half of block A)?

Can you explain this a bit more? Are you saying that it is not good to write to the first half of the buffer and then to the second half, even though the ranges don't intersect?
16. ## Question about code in "Stupid Spherical Harmonics (SH) Tricks"

The Stupid Spherical Harmonics (SH) Tricks article contains several code snippets, one of which is about irradiance environment maps.

The code is split into a CPU part and a GPU part, and I am a bit confused about what happens where. I interpret it to mean that the SH coefficients are preprocessed before being placed in shader constants, so that the code on the GPU can be somewhat simpler. Is that correct?

AFAIK you can interpolate two sets of SH coefficients to get a blended result. Is this still possible with the "processed" SH coefficients used by this shader code, or would the interpolation have to be done beforehand?

Microsoft provides a small API that can project a cube map into spherical harmonics. I found the code really helpful, but unfortunately I couldn't find any concise information about how to "sample" the SH coefficients with a direction vector.
17. ## glGetTexImage/glGetTextureImageEXT with render to texture and FBO

I render to the individual faces of a cubemap by attaching different views of the cubemap texture to the bound FBO. After rendering is done I want to download the texture data using glGetTextureImageEXT. I read somewhere that I should unbind the FBO before calling glGetTextureImageEXT. If I do that, the first call to glGetTextureImageEXT works as expected, but the second call returns all zeros. However, if I bind the FBO with the different texture views attached before calling glGetTextureImageEXT, I get exactly the data I expect. The most confusing thing is that it works once when no FBO is bound. (I tried it with glGetTexImage and it behaves exactly the same.) Any idea how this is supposed to behave?
18. ## glGetTexImage/glGetTextureImageEXT with render to texture and FBO

OK, it seems I provided a bad description.

What I am doing is rendering to a texture, and then I want to download the data to the CPU. As far as I know, PBOs are only a means of making the download asynchronous, which can be a performance win if handled smartly.

Anyway, glGetTextureImageEXT should download the texture data to system memory, but it only works the "first" time for me, unless I bind the FBO that I used to render to the texture in the first place. This contradicts some information I found on this topic.
19. ## Efficient Hammersley distribution on the GPU

OK, that's good to know.
20. ## Efficient Hammersley distribution on the GPU

I'm looking for an efficient way to generate Hammersley points on the GPU. I found this page with some GLSL code that relies on bit shifts. Is this efficient on the GPU? Are you aware of a more efficient implementation? Is it better to create a look-up table?
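A Go port of the shift-and-mask bit reversal that such GLSL snippets typically use, to show what it costs: five swap steps of a few integer operations each. GLSL 4.00 and later also provide `bitfieldReverse()`, which may map to a single instruction on some hardware. Whether a lookup table is faster depends on the GPU; a texture or buffer fetch is usually more expensive than a handful of integer ops, so profiling is the reliable answer. The function names are illustrative.

```go
package main

// reverseBits reverses a 32-bit integer by swapping halves, then bytes,
// nibbles, bit pairs and single bits (the order does not matter).
func reverseBits(b uint32) uint32 {
	b = b<<16 | b>>16
	b = (b&0x55555555)<<1 | (b&0xAAAAAAAA)>>1
	b = (b&0x33333333)<<2 | (b&0xCCCCCCCC)>>2
	b = (b&0x0F0F0F0F)<<4 | (b&0xF0F0F0F0)>>4
	b = (b&0x00FF00FF)<<8 | (b&0xFF00FF00)>>8
	return b
}

// radicalInverseVdC scales the reversed bits into [0, 1).
func radicalInverseVdC(i uint32) float64 {
	return float64(reverseBits(i)) * 2.3283064365386963e-10 // 1 / 2^32
}
```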
21. ## Integrate BRDF from UE4 presentation

I'm trying to implement the image-based lighting described in the UE4 presentation.

They have a function called IntegrateBRDF() which is supposed to return two values (a, b) that are both somewhere between 0 and 1. In my tests, however, b seems to be in the [0, 0.15] range. Has anyone played with this function as well? Do I probably have an implementation error, or is it supposed to be like that?
22. ## OpenGL OpenGL equivalent of D3D staging textures

I'm reading about texture readback in OpenGL and, as usual, I find many different possibilities.

From D3D I'm used to copying the texture I want to read back into a staging texture, which I then map in order to read the pixels. Should I basically do the same in OpenGL, or is there a preferred way?

There are both glReadPixels() and glGetTexImage(), and they seem to do basically the same thing. Pixel buffer objects are often mentioned in this context and seem to be a means of making the previous calls asynchronous until you map the PBO. Is that correct?

So is glGetTexImage() plus a PBO a good approach, or should I still copy to a different texture first, so that I can continue rendering to the texture I want to read back?
23. ## What's your preferred DDS reader?

I gave GLI another try and have to admit that the mipmap problem was entirely my mistake. I was able to solve the problem with BC5-encoded files saved from GIMP, and it seems to be fixed in the next version. All in all, I'm quite content with the library now.
24. ## OpenGL What's your preferred DDS reader?

Since doing more Linux/OpenGL work I have been wondering what people use to load DDS files. FreeImage, for instance, decompresses the data automatically, which is not what I want in this case. I then tried GLI, but unfortunately it fails to read mipmaps, at least for files generated with the DDS plugin for GIMP, and it seems to have even more trouble with DXT5-compressed files.

Is there a library that is more robust? Do people generally implement their own readers? Or compress textures on the fly? Maybe I should take a look at DevIL, although it is no longer maintained. Can you recommend it?
25. ## OpenGL Odd behavior with arbitrarily sized textures

I stumbled over this while porting font printing code from D3D to OpenGL. I place all glyphs I potentially want to draw in an atlas, which can have an arbitrary size depending on the resolution of the glyphs. In the case I tested, I ended up with a 243x211 atlas. In OpenGL the atlas looks "skewed" and I can't figure out why. The problem does not appear if the atlas has a size of 256x201, for instance, without changing anything else. So I think the problem is directly related to the texture size.

The data used to create the texture is allocated like this: `data = new unsigned char[x * y];` So I'm not explicitly controlling the alignment or anything.

Any idea what could be going on here? Do I need glPixelStorei() in some form? NPOT textures are supported by the hardware.
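A sketch of the likely cause, assuming the atlas stores one byte per texel (as `new unsigned char[x * y]` suggests) and is uploaded with OpenGL's default GL_UNPACK_ALIGNMENT of 4: OpenGL then assumes each row starts on a 4-byte boundary. A 243-byte row is read as 244 bytes, so every row starts one byte later than it should and the error accumulates into a diagonal skew, while 256-byte rows are already aligned. The helper names are hypothetical.

```go
package main

// unpackRowStride returns the byte distance between rows that OpenGL assumes
// when reading client memory: the row size rounded up to the unpack alignment.
func unpackRowStride(width, bytesPerPixel, alignment int) int {
	row := width * bytesPerPixel
	return (row + alignment - 1) / alignment * alignment
}

// rowShift is how many bytes too far into a tightly packed array OpenGL
// starts reading row r.
func rowShift(r, width, bytesPerPixel, alignment int) int {
	return r * (unpackRowStride(width, bytesPerPixel, alignment) - width*bytesPerPixel)
}
```

If that is the cause, calling `glPixelStorei(GL_UNPACK_ALIGNMENT, 1)` before the `glTexImage2D` upload should remove the skew; D3D typically avoids the issue because the row pitch is passed explicitly.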