Community Reputation

122 Neutral

About teichgraf
  1. Hello, is it possible with the fixed function pipeline to use a floating point texture (GL_ARB_texture_float), render some overlapping quads with this texture to an FBO, then render the FBO to a fullscreen quad and perform an alpha test on the rendered FBO? I think this should be no problem with a shader, but I haven't used GLSL before, and I would like to avoid the complexity of shaders if it is possible without them.
  2. Efficient 2D Metaballs

    Quote:Original post by ElectroDruid
    ...with some kind of post-process - anti-aliasing, or a small amount of motion blur...

    That's what I thought before. If I try multisampling / blurring and leave the artifacts as they are, maybe it will look really interesting, like a viscid fluid.

    Quote:Original post by ElectroDruid
    ...or maybe even just rendering to a texture that's bigger than your window so that it smooths over the edges a bit when you draw it on a window-sized quad.

    I also thought of that, but I don't think it would work, because the render-to-texture step copies the back buffer to a texture, which needs to be 2^n. So the texture has to be the same size as the frame. Or am I wrong?

    Quote:Original post by ElectroDruid
    You say you're generating the sprite texture for your metaballs in code. How does your code generate the alpha channel for the sprite - what kind of falloff function are you using?

    I used the squared distance to the center of the texture. Fitting all of this into 255 discrete steps is too coarse, as you and I suspected. Although it looks good and it is the right function, it generates these artifacts, caused by the low value range [0, 255]. Now I use the square root (length), which gives a radial gradient (it looks like the texture you mentioned). This texture function doesn't generate these artifacts. So it all depends on the texture. Maybe I will try another function for generating the texture and/or use floating point textures, once I find out how. Thanks for your help!
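The two falloff variants discussed in this post can be compared with a small sketch. It is written in C++ rather than the C# of the demo, and the exact formulas (1 - d² and 1 - d, with d the normalized distance to the texture center) as well as the names `Falloff` and `makeFalloffAlpha` are assumptions made for illustration, not the code from the demo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names: which falloff curve to bake into the alpha channel.
enum class Falloff { SquaredDistance, LinearDistance };

// Builds a size x size alpha map quantized to 8 bits (0..255).
// d is the distance to the texture center, normalized so that d = 1
// is reached half a texture width away from the center.
std::vector<std::uint8_t> makeFalloffAlpha(int size, Falloff kind) {
    assert(size > 0);
    std::vector<std::uint8_t> alpha(static_cast<std::size_t>(size) * size);
    const float center = (size - 1) * 0.5f;
    const float radius = size * 0.5f;
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const float dx = (x - center) / radius;
            const float dy = (y - center) / radius;
            const float d2 = dx * dx + dy * dy;
            // Assumed curves: 1 - d^2 versus 1 - d.
            const float t = (kind == Falloff::SquaredDistance) ? d2 : std::sqrt(d2);
            const float a = std::max(0.0f, std::min(1.0f, 1.0f - t));
            // This rounding is the step that limits the texture to 256 levels.
            alpha[static_cast<std::size_t>(y) * size + x] =
                static_cast<std::uint8_t>(std::lround(a * 255.0f));
        }
    }
    return alpha;
}
```

Keeping `a` as a float instead of rounding it to a byte is, roughly, what a floating point texture would store; whether that removes the artifacts in the demo would have to be tested.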
  3. Efficient 2D Metaballs

    Thanks for your reply. The texture that I use is generated in code with size = 2 * metaball.radiusSquared, and it is rendered to a quad of the same size. So no resizing should actually be necessary? But if I disable mip-mapping I only see blue quads without any alpha blending. Could you give me some hints on the GL settings, which texture you are using and at which size it is rendered?
  4. Efficient 2D Metaballs

    Quote:Original post by ElectroDruid
    That worked a treat! I'm very pleased with the result :)

    At the moment I am writing a demo using the same approach for meta-circles (texture rendering and alpha testing). But I get ugly artifacts at the borders of the meta-circles. I think this is caused by the 2^8 alpha channel size. It looks like this:

    For attenuation I use this generated texture (64x64):

    Even a larger texture doesn't look better. I also have mipmaps enabled for the render texture and the attenuation texture:

    GL.Hint(HintTarget.GenerateMipmapHint, HintMode.Nicest);
    GL.TexParameter(TextureTarget.Texture2d, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
    GL.TexParameter(TextureTarget.Texture2d, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);

    Do you encounter similar problems? How can one avoid these artifacts?
  5. Fast Vector Math library for .Net

    Quote:cignox1

    Yes, that's what I was referring to. I also pointed this out in a previous post:

    Quote:Original post by teichgraf
    The nicest would be if I have a VectorN class with all operators and Vector3 / Vector2 which are derived from VectorN. So I don't have to implement the operators twice. But I think this is also slow because of the virtual calls and for-loops in the operators.

    But I have another question:

    Quote:Original post by Fiddler
    OpenTK also comes with a fairly full-featured math library (Vectors, Matrices, Quaternions). As a bonus, it contains both single precision and double precision structs, which will come in handy in the future when video cards will support 64bit precision.

    I could not find both double and float structs. There are only float structs in the current implementation. And how can one write generics in .Net 2.0 for double and float, such as Vector3<float>? Thanks in advance.
  6. Fast Vector Math library for .Net

    I ran some performance tests on three Vector3 float structs: OpenTK, Sharp3D and the Vector struct from Richard Potter's Codeproject article. For the test I used my own build of OpenTK, my own build of the Sharp3D DLL and a separate DLL project for Potter's Vector3 class, which I converted to float. Here are the results for the release build:

    Testing 30.000.000 iterations.

    OpenTK Add Func - 00:00:00.3910775
    OpenTK Add Op - 00:00:01.3609497
    Sharp3 Add Func - 00:00:01.0324446
    Sharp3 Add Op - 00:00:01.6581686
    Potter Add - 00:00:01.6894548

    OpenTK Sub Func - 00:00:00.3754344
    OpenTK Sub Op - 00:00:01.3609497
    Sharp3 Sub Func - 00:00:01.0480877
    Sharp3 Sub Op - 00:00:01.6738117
    Potter Sub - 00:00:01.6581686

    OpenTK Mul Scalar Func - 00:00:00.3597913
    OpenTK Mul Scalar Op - 00:00:01.2983773
    Sharp3 Mul Scalar Func - 00:00:01.0637308
    Sharp3 Mul Scalar Op - 00:00:01.7676703
    Potter Mul Scalar - 00:00:01.5643100

    OpenTK Div Scalar Func - 00:00:00.5005792
    OpenTK Div Scalar Op - 00:00:01.2670911
    Sharp3 Div Scalar Func - 00:00:01.7676703
    Sharp3 Div Scalar Op - 00:00:02.3621081
    Potter Div Scalar - 00:00:02.3777512

    OpenTK Dot - 00:00:00.9855153
    Sharp3 Dot - 00:00:01.0011584
    Potter Dot - 00:00:01.2201618

    OpenTK Cross Copy - 00:00:01.5799531
    OpenTK Cross Ref - 00:00:00.4380068
    Sharp3 Cross - 00:00:01.7363841
    Potter Cross - 00:00:01.8615289

    OpenTK Length - 00:00:00.7821550
    Sharp3 Length - 00:00:00.7821550
    Potter Length - 00:00:04.2082447

    OpenTK Length Squared - 00:00:00.3754632
    Sharp3 Length Squared - 00:00:00.3598189

    OpenTK Normalize - 00:00:02.0024704
    Sharp3 Normalize - 00:00:02.7533968
    Potter Normalize - 00:00:10.3252380

    OpenTK Normalize Fast - 00:00:02.1589134

    It seems that the OpenTK methods are the fastest. It is hardly surprising that passing the arguments by ref is significantly faster than the usual pass by copy.
    Here is the complete source code for my performance test:

    using System;
    using System.Collections.Generic;
    using System.Text;

    namespace PerfTest_Vector3
    {
        class Program
        {
            private delegate void TestDel();

            static void Main(string[] args)
            {
                // Init.
                Random rand = new Random();
                float scalar = (float)rand.NextDouble();
                float x1 = (float)rand.NextDouble();
                float y1 = (float)rand.NextDouble();
                float z1 = (float)rand.NextDouble();
                float x2 = (float)rand.NextDouble();
                float y2 = (float)rand.NextDouble();
                float z2 = (float)rand.NextDouble();

                OpenTK.Math.Vector3 tk1 = new OpenTK.Math.Vector3(x1, y1, z1);
                OpenTK.Math.Vector3 tk2 = new OpenTK.Math.Vector3(x2, y2, z2);
                OpenTK.Math.Vector3 tk3 = new OpenTK.Math.Vector3();

                Sharp3D.Math.Core.Vector3F sh1 = new Sharp3D.Math.Core.Vector3F(x1, y1, z1);
                Sharp3D.Math.Core.Vector3F sh2 = new Sharp3D.Math.Core.Vector3F(x2, y2, z2);
                Sharp3D.Math.Core.Vector3F sh3 = new Sharp3D.Math.Core.Vector3F();

                Vector3F rp1 = new Vector3F(x1, y1, z1);
                Vector3F rp2 = new Vector3F(x2, y2, z2);
                Vector3F rp3 = new Vector3F();

                const int iters = 30000000;

                // Test
                Console.WriteLine("Testing {0:n} iterations.", iters);
                Console.WriteLine();

                // Test Add
                Test("OpenTK Add Func ", iters, delegate() { OpenTK.Math.Vector3.Add(ref tk1, ref tk2, out tk3); });
                Test("OpenTK Add Op ", iters, delegate() { tk3 = tk1 + tk2; });
                Test("Sharp3 Add Func ", iters, delegate() { Sharp3D.Math.Core.Vector3F.Add(sh1, sh2, ref sh3); });
                Test("Sharp3 Add Op ", iters, delegate() { sh3 = sh1 + sh2; });
                Test("Potter Add ", iters, delegate() { rp3 = rp1 + rp2; });
                Console.WriteLine();

                // Test Sub
                Test("OpenTK Sub Func ", iters, delegate() { OpenTK.Math.Vector3.Sub(ref tk1, ref tk2, out tk3); });
                Test("OpenTK Sub Op ", iters, delegate() { tk3 = tk1 - tk2; });
                Test("Sharp3 Sub Func ", iters, delegate() { Sharp3D.Math.Core.Vector3F.Subtract(sh1, sh2, ref sh3); });
                Test("Sharp3 Sub Op ", iters, delegate() { sh3 = sh1 - sh2; });
                Test("Potter Sub ", iters, delegate() { rp3 = rp1 - rp2; });
                Console.WriteLine();

                // Test Mul Scalar
                Test("OpenTK Mul Scalar Func ", iters, delegate() { OpenTK.Math.Vector3.Mult(ref tk1, scalar, out tk3); });
                Test("OpenTK Mul Scalar Op ", iters, delegate() { tk3 = tk1 * scalar; });
                Test("Sharp3 Mul Scalar Func ", iters, delegate() { Sharp3D.Math.Core.Vector3F.Multiply(sh1, scalar, ref sh3); });
                Test("Sharp3 Mul Scalar Op ", iters, delegate() { sh3 = sh1 * scalar; });
                Test("Potter Mul Scalar ", iters, delegate() { rp3 = rp1 * scalar; });
                Console.WriteLine();

                // Test Div Scalar
                Test("OpenTK Div Scalar Func ", iters, delegate() { OpenTK.Math.Vector3.Div(ref tk1, scalar, out tk3); });
                Test("OpenTK Div Scalar Op ", iters, delegate() { tk3 = tk1 / scalar; });
                Test("Sharp3 Div Scalar Func ", iters, delegate() { Sharp3D.Math.Core.Vector3F.Divide(sh1, scalar, ref sh3); });
                Test("Sharp3 Div Scalar Op ", iters, delegate() { sh3 = sh1 / scalar; });
                Test("Potter Div Scalar ", iters, delegate() { rp3 = rp1 / scalar; });
                Console.WriteLine();

                // Test Dot
                Test("OpenTK Dot ", iters, delegate() { scalar = OpenTK.Math.Vector3.Dot(tk1, tk2); });
                Test("Sharp3 Dot ", iters, delegate() { scalar = Sharp3D.Math.Core.Vector3F.DotProduct(sh1, sh2); });
                Test("Potter Dot ", iters, delegate() { scalar = rp1.DotProduct(rp2); });
                Console.WriteLine();

                // Test Cross
                Test("OpenTK Cross Copy ", iters, delegate() { tk3 = OpenTK.Math.Vector3.Cross(tk1, tk2); });
                Test("OpenTK Cross Ref ", iters, delegate() { OpenTK.Math.Vector3.Cross(ref tk1, ref tk2, out tk3); });
                Test("Sharp3 Cross ", iters, delegate() { sh3 = Sharp3D.Math.Core.Vector3F.CrossProduct(sh1, sh2); });
                Test("Potter Cross ", iters, delegate() { rp3 = rp1.CrossProduct(rp2); });
                Console.WriteLine();

                // Test Length
                Test("OpenTK Length ", iters, delegate() { scalar = tk1.Length; });
                Test("Sharp3 Length ", iters, delegate() { scalar = sh1.GetLength(); });
                Test("Potter Length ", iters, delegate() { scalar = rp1.Magnitude; });
                Console.WriteLine();

                // Test Length Squared
                Test("OpenTK Length Squared ", iters, delegate() { scalar = tk1.LengthSquared; });
                Test("Sharp3 Length Squared ", iters, delegate() { scalar = sh1.GetLengthSquared(); });
                Console.WriteLine();

                // Test Normalize
                Test("OpenTK Normalize ", iters, delegate() { tk1.Normalize(); });
                Test("Sharp3 Normalize ", iters, delegate() { sh1.Normalize(); });
                Test("Potter Normalize ", iters, delegate() { rp1.Normalize(); });
                Console.WriteLine();

                // Test Normalize Fast
                Test("OpenTK Normalize Fast ", iters, delegate() { tk1.NormalizeFast(); });
                Console.WriteLine();

                Console.WriteLine();
                Console.WriteLine("Finished");
                Console.ReadLine();
            }

            private static void Test(string prefix, int iters, TestDel testFunc)
            {
                DateTime start = DateTime.Now;
                for (int i = 0; i < iters; i++)
                {
                    testFunc();
                }
                TimeSpan span = DateTime.Now - start;
                Console.WriteLine("{0} - {1}", prefix, span);
            }
        }
    }
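The Test helper above times each loop with DateTime.Now, which is a wall-clock value with fairly coarse resolution; a monotonic high-resolution timer is the usual choice for this kind of measurement. The following is a minimal sketch of the same timing loop in C++ (not the C# of the test program), using `std::chrono::steady_clock`. The name `timeIterations` is made up for this example.

```cpp
#include <cassert>
#include <chrono>

// Runs func() iters times and returns the elapsed time in seconds,
// measured with a monotonic clock (hypothetical helper, not from the post).
template <typename Func>
std::chrono::duration<double> timeIterations(int iters, Func&& func) {
    assert(iters >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        func();
    }
    return std::chrono::steady_clock::now() - start;
}
```

A call such as `timeIterations(30000000, [&] { c = a + b; })` mirrors one line of the test above. An optimizing compiler may remove a loop body whose result is never used, so the measured operation should feed into something observable.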
  7. Fast Vector Math library for .Net

    Thanks for the info. @Geoff C: I would also prefer a managed version. I have also found the Sharp3D lib at Codeplex and many others. But I think the project is dead (see the source code commits)? The Codeproject implementation looks nice.
  8. Fast Vector Math library for .Net

    Thanks again for the answers. OpenTK sounds interesting. But I miss a Matrix33 and a VectorN class. Maybe I could extend the OpenTK lib. At the moment I use Tao.OpenGl for rendering, so it should be no problem to switch to OpenTK. By the way, how could one use SIMD instructions in C#? unsafe { __asm { } } ??
  9. Fast Vector Math library for .Net

    Thanks for the replies! I suspected that I would have to implement it myself. :-( Fortunately I don't need a full-featured math lib. So what would be the fastest way to code a vector3d / vector2d class with a nice interface? The nicest would be if I have a VectorN class with all operators and Vector3 / Vector2 which are derived from VectorN. So I don't have to implement the operators twice. But I think this is also slow because of the virtual calls and for-loops in the operators. Maybe it would be fastest to implement Vector3 / Vector2 separately as structs or sealed classes? How could I implement them for float and double without having separate classes? I don't know how to do this with C# generics the way it is done in C++. I would appreciate some advice on this. Thanks in advance.
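For reference, the C++ approach this post alludes to, one definition that serves both float and double, can be sketched with a template. This only illustrates the C++ side of the comparison; the names `Vector3`, `Vector3f`, `Vector3d`, `dot` and `length` are chosen for this example and are not taken from OpenTK, Sharp3D or any other library mentioned in the thread.

```cpp
#include <cassert>
#include <cmath>

// One struct template covers every scalar type; no virtual calls, no loops.
template <typename T>
struct Vector3 {
    T x, y, z;
};

template <typename T>
Vector3<T> operator+(const Vector3<T>& a, const Vector3<T>& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

template <typename T>
Vector3<T> operator-(const Vector3<T>& a, const Vector3<T>& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

template <typename T>
Vector3<T> operator*(const Vector3<T>& v, T s) {
    return {v.x * s, v.y * s, v.z * s};
}

template <typename T>
T dot(const Vector3<T>& a, const Vector3<T>& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

template <typename T>
T length(const Vector3<T>& v) {
    return std::sqrt(dot(v, v));
}

template <typename T>
Vector3<T> normalized(const Vector3<T>& v) {
    const T len = length(v);
    assert(len > T(0));  // a zero vector has no direction
    return {v.x / len, v.y / len, v.z / len};
}

// Convenience aliases for the two precisions asked about.
using Vector3f = Vector3<float>;
using Vector3d = Vector3<double>;
```

The compiler generates a separate, fully typed copy of each operator per scalar type, which is why this costs nothing at run time in C++.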
  10. Hello, I am searching for a non-commercial, fast vector algebra library which has bindings for .Net and implements some object-oriented features like operator overloading. It would also be good if it used SIMD instructions. All .Net math libraries that I found are either commercial or don't offer a nice interface. Maybe I have to port a C/C++ lib on my own. But I don't want to reinvent the wheel. :( Thanks in advance. [Edited by - teichgraf on February 29, 2008 4:05:41 AM]
  11. VBO Performance with different cards

    Thanks for the answer. I will try it. For further discussion, please use the same post on: http://www.opengl.org/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=3&t=014685#000000 Thanks!
  12. Hello, I am using VBOs for large triangle data with GL_STATIC_DRAW. The vertex array is interleaved with normal and position: NX,NY,NZ X,Y,Z. The strange thing is that I get very different performance on two computers:

    Computer A: Intel Pentium 4 2,8 GHz, 512 MB RAM, NVIDIA Quadro4 750 XGL 128 MB, Driver: 85.96, Windows XP SP2
    Computer B: AMD Athlon 64 FX-60 Dual-Core 2,61 GHz, 3072 MB RAM, NVIDIA GeForce 7900 GTX 512 MB, Driver: 85.96, Windows XP SP2

    The rendered scene has 1.789.819 triangles, which are drawn indexed with glDrawElements. Number of patches/VBOs: 9168. Total VBO size: 26,24 MB. I also tried a scene with 384.219 triangles and 70 VBOs with a total size of 4,81 MB.

    On computer A the speed-up gained from VBOs is 250% compared to vertex arrays. The performance gain on computer B is just 10%, or in some cases 0%! I checked for errors with NV PerfKit 2.0 but found nothing. The new driver version 91.31 (?) has no effect. :-( NV PerfKit also shows that the VBOs on computer B are stored in video memory, so there should be a huge performance gain. I also stored the triangle index array in a VBO, without any effect.

    Is the PCI Express bus of computer B so fast that VBOs in video memory bring no performance gain? No: another test showed that another PC with the same Quadro card as computer A and the newest driver has the same frame rates as computer A, with a 250% gain. On a PC with a GeForce 5200 FX and a 6X.XX driver the performance gain is also very good. But a GeForce 5950 Ultra with the 84.21 driver does not render faster with VBOs. None of these machines use PCI-E. So it has to be the card? What is wrong with the VBO implementation on the different cards? Or what else could be the reason?
    My VBO code does the following:

    Init:
    glGenBuffersARB(1, &name);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, name);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, nVertz*vertexSize, pVertz, GL_STATIC_DRAW_ARB);

    Render:
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, name);
    glInterleavedArrays(GL_N3F_V3F, stride, (CHAR*)NULL);
    glDrawElements(GL_TRIANGLES, nIndices, GL_UNSIGNED_INT, pIndices);

    Any suggestions or comments on this problem? Thanks in advance!
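The interleaved layout described in this post (NX,NY,NZ followed by X,Y,Z, which is what GL_N3F_V3F expects) can be written down as a plain struct, which makes the stride and buffer sizes easy to check on the CPU side. This is a sketch without any OpenGL calls; the names `VertexN3V3`, `vertexBufferBytes` and `indexBufferBytes` are hypothetical and do not appear in the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One interleaved vertex: normal first, then position, three floats each.
struct VertexN3V3 {
    float nx, ny, nz;
    float x, y, z;
};

// The struct must be tightly packed for the interleaved format to match.
static_assert(sizeof(VertexN3V3) == 6 * sizeof(float), "unexpected padding");
static_assert(offsetof(VertexN3V3, x) == 3 * sizeof(float), "position must follow normal");

// Bytes needed for a vertex buffer holding vertexCount interleaved vertices.
inline std::size_t vertexBufferBytes(std::size_t vertexCount) {
    assert(vertexCount <= SIZE_MAX / sizeof(VertexN3V3));  // guard against overflow
    return vertexCount * sizeof(VertexN3V3);
}

// Bytes needed for the indices of triangleCount triangles (GL_UNSIGNED_INT).
inline std::size_t indexBufferBytes(std::size_t triangleCount) {
    assert(triangleCount <= SIZE_MAX / (3 * sizeof(unsigned int)));
    return triangleCount * 3 * sizeof(unsigned int);
}
```

With this struct, `sizeof(VertexN3V3)` plays the role of `vertexSize` and `stride` in the snippet above, assuming those variables describe the same six-float layout.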
  13. OpenGL How to parallelize OpenGL

    Thanks for the replies. For now it seems that I misunderstood this NVidia presentation, or the presentation is not correct.

    Quote:Original post by ronnybrendel
    my approach would be to try to keep one thread busy with calculations(and whatever you have to do) and doing blocking rendering routines on the other thread?! im an ogl beginner tho i dont know much about cad, but if you have a huge amount of data coming from RAM, maybe the bottleneck is the bus? not the cpu ...

    The data is already on the GPU using VBOs.

    Quote:Original post by ronnybrendel
    its not possible to parallelize opengl itself, but you can parallelize your calculations ( but they have to be resource-unrelated, or syncronized well )

    That is what I am trying to do now. By the way, I already use OpenMP for parallelizing, but the final result should always be the same as with library-based threading. OpenMP is just compiler-based and simpler to use. :-)
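The split suggested in the quotes, CPU-side calculations spread over several threads while a single thread keeps the render context, might look roughly like the sketch below. It uses standard C++ threads instead of OpenMP and contains no OpenGL calls; the `Patch` record, the distance test and the function names are placeholders invented for this example, standing in for whatever per-patch work the application does before drawing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-patch record; 'distance' stands in for any CPU-side criterion.
struct Patch {
    int id;
    float distance;
};

// CPU-only work on one slice of the patch list: collect ids that pass the test.
static void selectRange(const std::vector<Patch>& patches, std::size_t begin,
                        std::size_t end, float maxDistance, std::vector<int>& out) {
    for (std::size_t i = begin; i < end; ++i) {
        if (patches[i].distance <= maxDistance) {
            out.push_back(patches[i].id);
        }
    }
}

// Splits the selection across workerCount threads and merges the partial
// results in the original order. Only the calling thread, which would own
// the render context, issues draw calls for the returned ids afterwards.
std::vector<int> selectVisibleParallel(const std::vector<Patch>& patches,
                                       float maxDistance, unsigned workerCount) {
    assert(workerCount > 0);
    std::vector<std::vector<int>> partial(workerCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (patches.size() + workerCount - 1) / workerCount;
    for (unsigned w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(patches.size(), w * chunk);
        const std::size_t end = std::min(patches.size(), begin + chunk);
        workers.emplace_back([&patches, &partial, begin, end, maxDistance, w] {
            selectRange(patches, begin, end, maxDistance, partial[w]);
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    std::vector<int> result;
    for (const std::vector<int>& p : partial) {
        result.insert(result.end(), p.begin(), p.end());
    }
    return result;
}
```

Each worker writes only to its own output vector, so no locking is needed, and no context switch occurs because the workers never touch the render context.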
  14. Hello, my task is to parallelize an existing OpenGL-based visualization of CAD data. It should use the capabilities of modern dual-core consumer CPUs to speed up the drawing. So I "only" need to create 2 threads (one thread per core) and render. But this is not possible, because every thread needs its own render context (RC) or has to make the main RC its own with wglMakeCurrent(). This context switch consumes a lot of time, and the parallelized version is even slower than the sequential one. I read in an NVidia GDC06 presentation (http://developer.nvidia.com/object/multi-thread-gdc-2006.html) that the new driver supports multithreading and that producer/consumer threads could dispatch commands to the GPU. But it does not work without an RC switch. I also tried having each thread render half of the data to its own RC; after rendering is finished, the data of the 2 rendered buffers is copied into memory, composed together and then drawn to the front buffer. But this also results in poor runtime behaviour. After that, I simply started a simple OGL demo twice on the dual-core machine. The FPS of the first demo was halved as soon as the second demo was started. This is why I think that it is not possible to speed up an OGL application by simply parallelizing it on a multicore machine, because there is only one GPU after all. Do you have any other ideas, suggestions, information or related links for me? If parallelizing OGL is not possible, where could I find some proof of or information about that? The NVidia presentation shows some interesting benchmarks on this. How do they get those speed-ups? Please help. Thanks in advance!
  15. I got it:

    if(FAILED(hr = D3DXComputeTangent(g_pSphereMesh, 0, 0, D3DX_DEFAULT, TRUE, NULL)) )

    The usage index was wrong with the new vertex declaration. Thanks anyway!