
2. "Random Digit Scrambling" for quasi-Monte Carlo sampling

Hammersley sequences and the like are a higher-level construct than radical inverse functions (in fact the "end product" of the whole algorithm), not a low-level primitive. I read the paper but don't understand how they combine the routines to get their RDS. The scrambled part still makes sense to me:

func ScrambledHammersley(i, numSamples, r uint32) Vector2 {
	return MakeVector2(float32(i)/float32(numSamples), ScrambledRadicalInverse_vdC(i, r))
}

But apparently they are combining all three radical inverse functions to get some result.
3. "Random Digit Scrambling" for quasi-Monte Carlo sampling

In the paper Efficient Multidimensional Sampling the authors describe a technique they call Random Digit Scrambling (RDS). Does someone have more implementation details on this algorithm? The paper gives source code for three different radical inverse functions that can be used to implement RDS, but I don't understand how they should be applied. I especially don't understand how they jump from Hammersley samples to RDS.

func Hammersley(i, numSamples uint32) Vector2 {
	return MakeVector2(float32(i)/float32(numSamples), radicalInverse_vdC(i))
}

Can the above snippet easily be changed to something that generates RDS?
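For reference, here is a minimal, self-contained Go sketch of the base-2 radical inverse and its digit-scrambled variant as they commonly appear in implementations following this paper; the function names are mine, not the paper's. In base 2, "digit scrambling" reduces to a single XOR with a random word per pixel:

```go
package main

import "fmt"

// radicalInverseVdC reverses the 32 bits of i and interprets the result
// as a binary fraction in [0, 1): bit k of i becomes digit k after the
// binary point (the base-2 van der Corput sequence).
func radicalInverseVdC(bits uint32) float32 {
	bits = (bits << 16) | (bits >> 16)
	bits = ((bits & 0x55555555) << 1) | ((bits & 0xAAAAAAAA) >> 1)
	bits = ((bits & 0x33333333) << 2) | ((bits & 0xCCCCCCCC) >> 2)
	bits = ((bits & 0x0F0F0F0F) << 4) | ((bits & 0xF0F0F0F0) >> 4)
	bits = ((bits & 0x00FF00FF) << 8) | ((bits & 0xFF00FF00) >> 8)
	return float32(bits) * 2.3283064365386963e-10 // 1 / 2^32
}

// scrambledRadicalInverseVdC XORs the reversed bits with a random word
// r before scaling. The XOR permutes every binary digit at once, which
// is exactly "random digit scrambling" in base 2; r == 0 yields the
// unscrambled sequence.
func scrambledRadicalInverseVdC(i, r uint32) float32 {
	bits := i
	bits = (bits << 16) | (bits >> 16)
	bits = ((bits & 0x55555555) << 1) | ((bits & 0xAAAAAAAA) >> 1)
	bits = ((bits & 0x33333333) << 2) | ((bits & 0xCCCCCCCC) >> 2)
	bits = ((bits & 0x0F0F0F0F) << 4) | ((bits & 0xF0F0F0F0) >> 4)
	bits = ((bits & 0x00FF00FF) << 8) | ((bits & 0xFF00FF00) >> 8)
	return float32(bits^r) * 2.3283064365386963e-10 // 1 / 2^32
}

func main() {
	// First van der Corput values: 0, 0.5, 0.25, 0.75, ...
	for i := uint32(0); i < 4; i++ {
		fmt.Println(radicalInverseVdC(i))
	}
}
```

With that, the ScrambledHammersley function from the reply above would produce a differently scrambled but equally well-distributed point set per pixel when each pixel supplies its own random word r.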
4. Decouple simulation and rendering: Interpolate particles

I see. Thanks for the input!
5. Decouple simulation and rendering: Interpolate particles

If simulation and rendering run at different frequencies, it can be useful to interpolate between two simulation steps during rendering for smooth animations. For moving meshes I simply interpolate the transformation on the CPU before sending it to the GPU.

For particles, which I simulate entirely on the CPU, I'm less sure about a good strategy. Currently I keep the particle array from the previous simulation frame around and send both to the GPU, where I do the interpolation. I figured doing this on the GPU is faster even though I'm now sending twice the data. Does this make sense, or would you do the interpolation on the CPU as well?

I have two arrays of particle structs, one for the previous and one for the current frame. Before each simulation frame I just copy the array. I send them to the GPU as two separate buffers. Would it be smarter to store them as one interleaved array?

Particle rendering is currently not a bottleneck for the scenes I have (at least not the number of particles), but I would like to set it up somewhat sanely. How would you handle this?
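One way to frame the interleaving question: with a single array of structs that holds both states, the copy before each simulation step and the buffer upload both touch one contiguous block, and the vertex shader reads both endpoints from one stream. A minimal sketch of that layout (all names and the constant-velocity "simulation" are illustrative, not from the original post):

```go
package main

import "fmt"

// Particle keeps the previous and current simulation state side by
// side (interleaved), so one upload carries both interpolation
// endpoints.
type Particle struct {
	PrevX, PrevY, PrevZ float32
	CurX, CurY, CurZ    float32
}

// step advances one simulation tick: the current state becomes the
// previous one, then a new current state is computed (a trivial
// constant-velocity update stands in for the real simulation).
func step(p *Particle, vx, vy, vz, dt float32) {
	p.PrevX, p.PrevY, p.PrevZ = p.CurX, p.CurY, p.CurZ
	p.CurX += vx * dt
	p.CurY += vy * dt
	p.CurZ += vz * dt
}

// renderPosition is the blend the vertex shader would compute: alpha
// is the fraction of the simulation step elapsed at render time, so
// alpha = 0 shows the previous state and alpha = 1 the current one.
func renderPosition(p Particle, alpha float32) (x, y, z float32) {
	x = p.PrevX + (p.CurX-p.PrevX)*alpha
	y = p.PrevY + (p.CurY-p.PrevY)*alpha
	z = p.PrevZ + (p.CurZ-p.PrevZ)*alpha
	return
}

func main() {
	var p Particle
	step(&p, 2, 0, 0, 1) // prev (0,0,0) -> cur (2,0,0)
	x, _, _ := renderPosition(p, 0.5)
	fmt.Println(x) // halfway between the two states
}
```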
6. OpenGL 4.4 render to SNORM

Yes, it returned GL_FRAMEBUFFER_COMPLETE.   Maybe the easiest thing is to render to UNORM and map [-1, 1] to [0, 1] in the shader. Not really a big deal, but I wanted to find out what the problem is.
7. OpenGL 4.4 render to SNORM

Yes, I'm checking for completeness. The behavior I get is completely identical to just using a UNORM target, so it seems to be silently converting to that. I also don't get any debug output from OpenGL.   This is with a GeForce GTX 570, driver 331.38 on Ubuntu.   Do you have experience with rendering to this format?
8. OpenGL 4.4 render to SNORM

I don't know. It says:

"We are silent about requiring R, RG, RBA and RGBA rendering. This is an implementation choice."

As the hardware seems to be perfectly capable of rendering to SNORM, I expect that it is implemented in all drivers. Has someone here successfully rendered to SNORM with OpenGL?

EDIT:

This OpenGL 4.4 core spec document also does not mark the SNORM formats as something that must be supported for a color target. Maybe rendering to SNORM is really not supported. Can anybody confirm this?
9. OpenGL 4.4 render to SNORM

Hi,

is it possible to render to a SNORM texture in OpenGL 4.4? Apparently they are not a required format for color targets in 4.2.

I want to render to an RG16_SNORM target to store normals in octahedron format. The linked paper contains code that expects and outputs data in the [-1, 1] range, and I was just assuming that it would automatically work with SNORM textures.

The output seems to get clamped to [0, 1], though. I checked with a floating-point render target and got the expected results, so I don't think it is an issue with the code.

Should this work? Am I maybe doing something wrong when creating the texture?

EDIT:

D3D11 hardware supports SNORM render targets, so I guess I'm doing something wrong.
10. Registering an uninstantiable reference type: Identifier 'XXX' is not a data type

Oh, the namespace was indeed a problem. Sorry. The default namespace was set to "scene", but writing scene::Scene in the script did not change the error message. As a test I changed the namespace to "", after which I get "There is no copy operator for the type 'Scene' available."
11. Registering an uninstantiable reference type: Identifier 'XXX' is not a data type

I have a C++ type that I don't want to instantiate from scripts and only use via handles, preferably by passing it to the constructor of script classes that should be able to interact with it.

I followed the documentation, used asOBJ_NOCOUNT, and did not provide a factory behavior. Actually, this is all I did (never mind the wrapper):

engine.register_object_type("Scene", 0, asOBJ_REF | asOBJ_NOCOUNT);
engine.register_object_method("Scene", "const string &name() const",
    asMETHODPR(scene::Scene, name, () const, const std::string&));

When I try to load a script that looks like this:

class Scene_test {
	Scene_test(Scene@ scene) {
		scene_ = scene;
	}

	void on_scene_loaded() {
		print("Hello World");
	}

	Scene@ scene_;
};

I get an error stating "Identifier 'Scene' is not a data type".

Any idea what I'm missing?

OK, I think I understand. If we are thinking about the same thing, calls to glBindBufferRange actually have to use an offset that is a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which on my machine (and I think on many other cards) is 256.

Just now I tried to evaluate whether the uniform updates are a bottleneck in my case. For this test I stripped down the rendering pipeline as much as I could with regard to OpenGL interaction. I simulated the performance of the "optimized" uniform updates by replacing glBufferSubData(float4x4) with glBindBufferRange(). I compared the two approaches with 1K and 4K draw calls for very simple geometry (same vertex buffer for every draw call) and could not see any noticeable difference.

I concluded that the optimized version could not possibly be faster than just calling glBindBufferRange() for every differently transformed object, which in turn means this is not my bottleneck.

So has the driver situation improved, or is my test/conclusion flawed?
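As a side note, the offset constraint can be handled with a small rounding helper when laying objects out in the big buffer. A sketch (the 256 is the value reported above; real code should query the actual limit with glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, ...)):

```go
package main

import "fmt"

// alignUp rounds offset up to the next multiple of alignment, which
// must be a power of two (GL guarantees the uniform-buffer offset
// alignment is one).
func alignUp(offset, alignment int) int {
	return (offset + alignment - 1) &^ (alignment - 1)
}

func main() {
	// With a 64-byte float4x4 per object and a 256-byte alignment,
	// each object still consumes a full 256-byte slot in the buffer.
	const alignment = 256
	offset := 0
	for i := 0; i < 3; i++ {
		fmt.Printf("object %d at offset %d\n", i, offset)
		offset = alignUp(offset+64, alignment)
	}
}
```

This is also why packing several objects' constants into one aligned block can pay off: the padding between 64-byte matrices and 256-byte slots is otherwise wasted.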

Thanks, that answers my question! Have you tried this architecture, which works well for OpenGL, with a D3D backend? Will it also work well there? At least that would make this apparent driver weakness less annoying.

I'm indeed already iterating over each object twice, but I'm wondering if I don't have to do it a third time now, because the first iteration is followed by a sort which could tell me when I don't need to update the per-material buffers, for instance.

I'm also wondering about another thing. Is it very important that the uniform buffer is large enough to fit every single object drawn per frame, or can you achieve good performance with a buffer that is just large enough to hold some object data before the data has to be changed? To me it sounds like that should already help, but then again the handling of uniform buffers shouldn't be so hard in the first place.

What I currently do:

bindBufferBase(smallUniformBuffer);
for (o : objects) {
	bufferSubData(smallUniformBuffer, o.transformation);
	draw(o.vertices);
}

What I think I should be doing:

offset = 0;
for (o : objects) {
	memory[offset] = o.transformation;
	++offset;
}
bufferData(hugeBuffer, memory);

offset = 0;
for (o : objects) {
	bindBufferRange(hugeBuffer, offset);
	draw(o.vertices);
}

At first I was a bit frustrated because I am used to the Effects framework of the D3D SDK, but after reading the presentation about batched buffer updates it seems a D3D application can also benefit from doing it this way. So the architecture can be the same for both APIs.

@richardurich: "Were updates far enough apart to avoid conflicts (can be perf hit if call A writes to first half of block A, and call B writes to second half of block A)?"

Can you explain this a bit more? Are you saying that it is not good to write to the first half of the buffer and then to the second half, even though the ranges don't intersect?