About todderod
  1. The reason he creates a depth renderbuffer and attaches it to the FBO in the article is so that OpenGL can perform depth testing on everything rendered there, just like it does on a normal framebuffer (read: the screen). You should be able to see the similarities between normal framebuffer rendering (rendering to the screen), pbuffer rendering, and framebuffer object (FBO) rendering. They are all part of the same idea: rendering things with OpenGL. For framebuffer rendering you set a pixel format, i.e. how many depth bits/stencil bits and whatnot you want to use, and you can read back from all the separate buffers (colour, depth, stencil, ...) using glReadPixels etc. With pbuffer rendering you do the same: you set a format on it and it has the same properties as a normal framebuffer - colour buffers, depth buffer, and so on - which you can read back using glReadPixels, but you can now also bind the colour buffer to a texture to use it in later rendering stages. FBO rendering is just the same as the two previous methods, except that everything has been streamlined for rendering colour output into a texture. Hopefully you will see that an FBO is no different from how OpenGL operates normally when rendering to the screen, using its depth buffer algorithm to determine whether or not a pixel will be written. So no, not all buffers have to be of any real use to YOU in the sense of reading them back; OpenGL can make good use of them even if there isn't a quick way for you to get one into a texture. Also, if you do need depth information, you can for instance use a shader that writes world/object/depth/whatever values to the colour output and use that as a texture, instead of the actual near/far-clipped Z values in the depth buffer. Hope it cleared some things up.
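    As a toy illustration of what the depth attachment buys you - not real OpenGL code, just the per-fragment accept/reject decision the hardware makes, with all names made up:

```cpp
#include <cassert>
#include <vector>

// Toy software depth buffer: mimics the GL_LESS test that a depth
// renderbuffer attached to an FBO lets the hardware perform for you.
struct DepthBuffer {
    int w, h;
    std::vector<float> depth;   // window-space depth in [0,1]

    DepthBuffer(int w_, int h_)
        : w(w_), h(h_), depth(w_ * h_, 1.0f) {}  // cleared to the far plane

    // Returns true if the fragment passes the depth test; on a pass the
    // stored depth is updated, on a fail the fragment is discarded.
    bool testAndWrite(int x, int y, float z) {
        float& stored = depth[y * w + x];
        if (z < stored) { stored = z; return true; }
        return false;
    }
};
```

    A nearer fragment replaces the stored value; a farther one is rejected without you ever reading the buffer back - which is the point of the post above.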
  2. Can't write depth values in shader

    That is because of how the depth buffer is specified: 0 is the near clip plane and 1 the far plane, and any value outside that range has no meaning. Remember that the actual value stored in the depth buffer is a non-linear transformation of an object's distance from the near plane; if you need more info about that, Google is your friend. The only "buffer" where OpenGL will not enforce the [0..1] range (as the specification requires it to elsewhere) is a float colour buffer format.
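    To make the non-linearity concrete, here is a small sketch (my own helper, not from the post) of the window-space depth produced by a standard perspective projection, as a function of eye-space distance d from the camera:

```cpp
#include <cassert>
#include <cmath>

// Window-space depth for a standard perspective projection.
// d is the eye-space distance to the fragment, n/f the near/far planes.
// d == n maps to 0, d == f maps to 1, and most of the range is spent near 0.
float windowDepth(float d, float n, float f) {
    // NDC depth from the usual perspective matrix, then [-1,1] -> [0,1]
    float zNdc = (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * d);
    return 0.5f * zNdc + 0.5f;
}
```

    With n = 1 and f = 100, the geometric midpoint d = 50.5 already maps to a depth above 0.98, which is exactly the non-linear behaviour the post refers to.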
  3. GLSL newbie

    2) I'm sure that the 2900 MPixels/sec fillrate is measured with a "fragment program" that consists of nothing but copying the interpolated colour value from the vertex program to the framebuffer. The more instructions your fragment program contains, the longer it takes before the next pixel/fragment can be computed, and thus the fillrate drops. 5) Not that I know of, but look at NVShaderPerf. Given a shader it will show you what their compiler does with it and print performance statistics as well: the scheduled number of cycles it should take, the number (and kind) of instructions used, etc. It also estimates a pixel fillrate for your fragment programs.
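    As a back-of-the-envelope sketch of why instruction count lowers fillrate (illustrative model and numbers only, not vendor specs):

```cpp
#include <cassert>
#include <cmath>

// Rough throughput model: peak fillrate divided by the number of cycles
// each fragment occupies a pixel pipe. Longer programs -> more cycles
// per fragment -> lower fillrate, as described above.
double fillrateMPixels(double clockMHz, int pipes, double cyclesPerFragment) {
    return clockMHz * pipes / cyclesPerFragment;
}
```

    For example, a hypothetical 400 MHz part with 16 pipes does 6400 MP/s at one cycle per fragment, but only a quarter of that if the fragment program takes four cycles.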
  4. What defines a Pass?

    Quote: Original post by Yann L Quote: Original post by Name_Unknown Quake3 did something like up to 12 scene passes and could do multiple passes per face(s) as well. You mean Doom3. Please, do not take Doom3 as an example - it's the perfect example of how NOT to do it. Its insane number of passes is a direct result of a design mistake Carmack made (and he later admitted it, yet it was too late to correct): stencil shadows. Using better suited shadow algorithms (shadow map based; today that would probably be something like PSM/TSM), you can easily get Doom3-style graphics with 2 to 4 passes per frame. He was not that far off though; the Q3 engine could do up to 10 "passes" per frame (in quotes here since people seem to be hunting definitions and I sure don't feel like defining a "pass" in this general context :P). According to this article, Brian Hook (id Software) explained Q3's different passes at a SIGGRAPH '98 course. Quote:
    * (passes 1 - 4: accumulate bump map)
    * pass 5: diffuse lighting
    * pass 6: base texture (with specular component)
    * (pass 7: specular lighting)
    * (pass 8: emissive lighting)
    * (pass 9: volumetric/atmospheric effects)
    * (pass 10: screen flashes)
    Only on the fastest machines can up to 10 passes be done to render a single frame. If the graphics accelerator cannot maintain a reasonable framerate, the various passes in parentheses can be eliminated. But it is quite rare for modern engines to do such a humongous number of passes. In the good old times(tm) there was so little you could do at once that you had to resort to repeating; nowadays you can often just push everything out at once/in parallel. The most passes I have ever had to use myself is 3, for a realtime GPU water caustics algorithm, and even then the third pass could be skipped, as it was only used to fake some eye-water refractions. If you have to do *a lot* of passes you should really be asking yourself: am I doing the right thing(tm)?
  5. Quote: Original post by justo edit: err, don't know if you know this, but the GPL is in there, maybe it got stuck in with the Linux port... a bit stricter than what you seem to want. Eeek, that is my mistake. GNU Autotools put that file there by itself; I meant to change it to a (for now) empty file. Same with a few other temporary files, I now see - such sloppy work on my part. Ignore the "COPYING" file and its contents, it is there by mistake.
  6. Quote: Original post by JavaCoolDude thanks for porting the GUI to Linux, I'll get in touch with you as soon as I get a new build up and running. I fixed the widget problem, so it is now working perfectly under Linux, except that the ./configure script is somewhat flawed when it comes to detecting the glpng and pthread libs. I emailed you the source tree, and suggest you use that as the base for further updates; I don't feel like redoing all those modifications :)
  7. After a lot of brutal messing around I got it running on Linux. Not until it was too late did I realise how to use GNU Autotools in a good way(tm) with the source structure JavaCoolDude used, so I will need to redo it once I get some more free time (I flattened it and placed all .cpp and .h files in a src directory). And hmm, I need to fix the mouse events, because at the moment they are not being sent to the widgets properly, so you can't really use any of the widgets - bargh. That should not be too hard to fix though; I just wanted to show off a screenshot running 2 instances of the GLUT test app on Linux before I went to bed. Edit: My gosh I spell bad.
  8. OpenGL Console Window and OpenGL

    Hmm, yes, as phantom said, you would have to do it yourself; OpenGL is kind of a rendering API :D Anyway, assuming you meant a standard Windows(tm) console, that was fairly easy. I added one to my own OpenGL window class, and I roughly chopped the implementation of the console window out here in case it gives you any pointers on what to do (Window:: is my own homebrewed OpenGL-enabled window class):

```cpp
#include <windows.h>
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
#include <iostream>
#include <fstream>

static const WORD MAX_CONSOLE_LINES = 500;  // scrollback buffer height

void Window::RedirectIOToConsole()
{
    CONSOLE_SCREEN_BUFFER_INFO coninfo;
    FILE *fp;
    long  lStdHandle;
    int   hConHandle;

    // allocate a console for this app
    AllocConsole();

    // set the screen buffer to be big enough to let us scroll text
    GetConsoleScreenBufferInfo(GetStdHandle(STD_OUTPUT_HANDLE), &coninfo);
    coninfo.dwSize.Y = MAX_CONSOLE_LINES;
    SetConsoleScreenBufferSize(GetStdHandle(STD_OUTPUT_HANDLE), coninfo.dwSize);

    // redirect unbuffered STDOUT to the console
    lStdHandle = (long)GetStdHandle(STD_OUTPUT_HANDLE);
    hConHandle = _open_osfhandle(lStdHandle, _O_TEXT);
    fp = _fdopen(hConHandle, "w");
    *stdout = *fp;
    setvbuf(stdout, NULL, _IONBF, 0);

    // redirect unbuffered STDIN to the console
    lStdHandle = (long)GetStdHandle(STD_INPUT_HANDLE);
    hConHandle = _open_osfhandle(lStdHandle, _O_TEXT);
    fp = _fdopen(hConHandle, "r");
    *stdin = *fp;
    setvbuf(stdin, NULL, _IONBF, 0);

    // redirect unbuffered STDERR to the console
    lStdHandle = (long)GetStdHandle(STD_ERROR_HANDLE);
    hConHandle = _open_osfhandle(lStdHandle, _O_TEXT);
    fp = _fdopen(hConHandle, "w");
    *stderr = *fp;
    setvbuf(stderr, NULL, _IONBF, 0);

    // make cout, wcout, cin, wcin, wcerr, cerr, wclog and clog
    // point to the console as well
    std::ios::sync_with_stdio();
}
```

    (I might have ripped this off someone else and modified it to my own needs in the first place, btw.) I don't know if some definitions are missing, but I am short on time right now. Provided as-is. Edit: one of many spelling errors corrected.
  9. Pipeline optimisations are among the hardest nuts to crack. I don't have much to add to what has already been said here; in essence you will have to test what actually works for you and your application, there is no other way to know. A few general pointers, though. *Roughly* drawing in front-to-back order will indeed, as you say, help rendering times due to early z-buffer rejection - but in practice only if it occludes something that would have been expensive to draw. Your front-to-back order does not have to be exact either; if you can save a lot of sorting time by keeping it rough, that usually works out for the best. Not all objects have to be sorted either: for small, cheap objects that are unlikely to occlude anything, sorting would normally just be a waste of time. What you really want to render first are large objects very close to the camera, occluding as much as possible. However, what often yields huge leaps in performance is better use of hardware resources. Most state changes in OpenGL are extremely expensive, and the most misused state change is probably the texture bind. Changing a texture is often very expensive! So if you use a lot of textures, make sure you are not changing textures any more than you absolutely have to. Sort your objects by texture usage (this might sound like it goes against sorting front to back, but try both, and then try a hybrid approach!). Also try merging textures: if you have four 64x64 textures, perhaps they can be put into one 128x128 texture, essentially saving three texture binds (due to filtering it may not always be possible, but I know someone who got an order-of-magnitude performance increase in a terrain engine from texture merging and texture sorting).
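    As a sketch of the hybrid ordering suggested above (the DrawItem type and field names are hypothetical; sort primarily by texture to cut binds, then roughly front to back to help early-z):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical render-queue entry: which texture the object needs and a
// rough camera distance. Neither name comes from the post.
struct DrawItem {
    unsigned texture;   // GL texture object id
    float    depth;     // approximate distance from the camera
};

// Batch by texture first (fewer expensive binds), then near-to-far within
// each batch (better early-z rejection). One of several orderings to try.
void sortQueue(std::vector<DrawItem>& q) {
    std::sort(q.begin(), q.end(), [](const DrawItem& a, const DrawItem& b) {
        if (a.texture != b.texture) return a.texture < b.texture;
        return a.depth < b.depth;
    });
}
```

    Whether texture-first or depth-first as the primary key wins depends on the scene; as the post says, measure both and try a hybrid.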
  10. Looking pretty fly there. Since my precious Radeon 9800 broke down I am running an oldschool GeForce2 MX with 32 MB of RAM, and figured it would be nice to give some feedback on how well the GUI runs on legacy hardware (and if it isn't legacy, it should be :P). I get 250-260 fps at the application's default startup size; maximising it gives ~100 fps (1280x1024). This is on a 1.8 GHz Athlon (2500+ in AMD's rating scheme). Useful standard widgets that, as you say yourself, are still missing: a text field and a dropdown listbox, as well as a normal multi-select listbox. I assume the sliders can be placed vertically as well? A minor widget that could be useful would be an image/icon area, which would basically be an un-clickable button as you have it now. I am a firm believer in making things cross-platform and would love to see this as an _easy to use_ platform-independent library (well, at least something more than Windows-only). Since I know you don't mind sharing code/solutions, I know making it open source in some way would not be alien to you, and I dig that. Don't be afraid to ask for help :-)
  11. Quote: Original post by zedzeek u can simplfy your shader a bit (eg the following line) also fresnel = ( sin(a-b)*sin(a-b) / (sin(a+b)*sin(a+b)) ) + tan(a-b)*tan(a-b)*tan(a+b)*tan(a+b); fresnel = 1.0; // (1.0-fresnel); on nvidia the first line wont even get compiled (ie its ignored completely) since youve overwritten it with the next line (i have no idea what the ati compiler does, perhaps its not that smart?)

    Unfortunately that is a typo on my part from pasting the code to GameDev. This is what I posted:

```glsl
/*float a = acos(dot(TriNorm.xyz,-lightVector));
float b = acos(dot(-TriNorm.xyz,refract(lightVector, TriNorm.xyz, 1.0/1.33)));
fresnel = ( sin(a-b)*sin(a-b) / (sin(a+b)*sin(a+b)) ) + tan(a-b)*tan(a-b)*tan(a+b)*tan(a+b);

fresnel = 1.0; // (1.0-fresnel);
```

    This is what it was supposed to say:

```glsl
/*float a = acos(dot(TriNorm.xyz,-lightVector));
float b = acos(dot(-TriNorm.xyz,refract(lightVector, TriNorm.xyz, 1.0/1.33)));
fresnel = ( sin(a-b)*sin(a-b) / (sin(a+b)*sin(a+b)) ) + tan(a-b)*tan(a-b)*tan(a+b)*tan(a+b);
*/
fresnel = 1.0; // (1.0-fresnel);
```

    What I meant is that because ATI handles the "refract" function built into GLSL horrendously badly (at least for me), I simply set the "fresnel" term to 1.0, since I have just been too lazy to write code to calculate that refraction myself (which is simple). To enable the effect on Nvidia, the comment markers are removed and the last line is changed from fresnel = 1.0; // (1.0-fresnel); to fresnel = (1.0-fresnel); Perhaps it makes more sense now? And yes, that entire calculation can be made much, much more efficient, especially if some approximations are made; however, for a GeForce 6800 class GPU the vertex shader is far from being the bottleneck - the ~50+ instruction fragment shader is more likely to limit performance. What is really interesting is the progress of Nvidia's driver development: in the original post I gave the NVShaderPerf output for the fragment shader, which was scheduled in ~51 cycles with Forceware 61.77 drivers. Today I saw there is an updated version of the tool using the 66.93 drivers instead; the compiler output remains the same for my shader, 51 instructions, but it now manages to schedule it in just 29.75 cycles. I haven't been able to access that machine to update drivers and benchmark yet, but it is nice to see that the driver internals are still making progress.
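    For comparison with the commented-out shader line above: the textbook unpolarized Fresnel reflectance averages the two polarization terms and *divides* by tan²(a+b), i.e. F = 0.5 * ( sin²(a-b)/sin²(a+b) + tan²(a-b)/tan²(a+b) ). A small C++ sketch of that standard form (my own helper, not from the shader):

```cpp
#include <cassert>
#include <cmath>

// Unpolarized Fresnel reflectance for incidence angle a and refraction
// angle b (radians), in its standard sin/tan form. Undefined at exactly
// normal incidence (0/0), where it tends to ((n1-n2)/(n1+n2))^2.
float fresnelUnpolarized(float a, float b) {
    float s = std::sin(a - b) / std::sin(a + b);  // s-polarized amplitude ratio
    float t = std::tan(a - b) / std::tan(a + b);  // p-polarized amplitude ratio
    return 0.5f * (s * s + t * t);
}
```

    Near normal incidence on an air-water boundary (n ≈ 1.33, so b ≈ asin(sin(a)/1.33)) this comes out at about 0.02, the familiar ~2% reflectance of water.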
  12. The 3DLabs parser tests come out "success" for both the fragment and vertex program. And I wish I was working for a company! This is merely an academic research project.
  13. After setting up RenderMonkey and *correctly* setting up the environment for my shaders to work in, I unfortunately receive this:

    OpenGL Preview Window: Compiling fragment shader API(OpenGL) /Effect Group 1/Effect1/Pass 0/Fragment Program/ ... success
    OpenGL Preview Window: Compiling vertex shader API(OpenGL) /Effect Group 1/Effect1/Pass 0/Vertex Program/ ... success
    OpenGL Preview Window: Linking program ... success
    Link successful. The GLSL vertex shader will run in software due to the GLSL fragment shader running in software. The GLSL fragment shader will run in software.

    Which I think is pretty sad, since the Nvidia compiler manages to compile code that (seemingly) would fit EASILY into a Radeon 9800 GPU. *Persuades ATI to make a better compiler, or at least a compiler that gives you more information about what it makes of your code.* Debugging shaders in GLSL is unfortunately very hard when the shader is big, because of driver optimisations: if you don't use something, the driver will remove it from the code entirely, and turning off optimisations yields shaders so bloated there is no way to execute them. That is why I showed which instructions my program assembles down to using the Nvidia compiler - instructions that should in no way be offensive to cards as capable as the 9800 and X800. I believe you two have inspired me to a few other things I could try to locate the offending part of the project, however.
  14. Thanks for the reply, _the_phantom_. It seems it has been so long since I last looked at RenderMonkey that I must have misunderstood its OpenGL support. What I mean with the error logs is that I check whether or not compilation and linking were successful; if they weren't, the log is read back and printed. They report successful compilation and linking. I am sorry I was vague and left out what I meant to write about the fragment shader: I have deliberately avoided all loops and if-statements, since I know Radeon-class hardware has trouble with them, and all dynamic branching has been avoided, as you need approximately ~1000 spatially coherent pixels taking the same branch to gain any speed from it (Nvidia states that, and I believe them since it makes sense given how the hardware is implemented), and I am nowhere near that. "Branching" is achieved through float comparisons, multiplying gl_FragColor by the result to force the fragment colour to vec4(0.0, 0.0, 0.0, 0.0). I do however have 2 functions - one-liners, actually - that are there to make a repeated test more readable. I have only used RenderMonkey for 15 minutes now, but I am amused that it claims the vertex and fragment shaders compile and link successfully, yet says they will run in software because of an "Invalid sampler named...". I wonder how it can think they compile successfully then :) (And it is even stranger since it does work on the GeForce.) Most likely a RenderMonkey quirk, but it is something new to work on. It would also be great if there was some way I could see the resources the shader uses (given the ATI compiler). Edit: I see now that I forgot to mention another fact. The coordinates for the texture lookup are derived within the fragment shader using the fragment's position multiplied by the inverse screen resolution, but I don't see any problem with that either, since I know it works in my test applications even on ATI cards.
  15. (This is a long post.) I am developing a program that makes heavy use of vertex and especially programmable fragment shaders. At the start of the project GLSL was chosen as the implementation language. I have however run into a problem where, on an Nvidia GeForce 6800 card, it renders at ~15 fps, while on either a Radeon 9800 Pro or an X800 Pro it takes about 3 minutes per frame. I am quite bedazzled as to why this happens, since I see no reason for it. I am hesitant to post the actual shaders, as I am unsure whether I am allowed to publish them yet, but this is how the system is implemented. Sending data is done through immediate-mode calls at the moment because of the layout of the data that needs to be sent. [The reason I do this: I need to send six arrays of vec3 (float) that remain constant for only ~15 vertices before they change value again (the number of vertices in the stream is on the order of 100k-1,000k), and everything is recalculated every frame - nothing can really be assumed constant - and using immediate-mode calls to glMultiTexCoord3fvARB, sending the data down through GL_TEXCOORD[0 through 5]_ARB, saves me from manually copying it around in memory. Immediate calls act like "state changes" in OpenGL for the texcoord streams.] So what I send down per vertex is the following: 3 floats (vertex), 6x3 floats (texcoords). There is only 1 texture bound; it is a floating-point texture rendered with the render-to-texture extension (internal format is the RGBA 16-bit float ATI format, supported by both the GeForce 6800 series and the ATI 9800/X800 cards). The render target of the entire operation is an 8-bit texture using the render-to-texture extension and a simple glBlendFunc().
    [The render-to-texture, pbuffers and textures are working; this has been thoroughly tested on both the GeForce 6800 and Radeon 9800 cards with a slightly different data-sending mechanism and other shaders. I just wanted to state the situation to avoid any confusion/assumptions.] A vec4 containing the viewport and a float containing 1.0/float(screen_resolution) are sent down as uniforms on shader creation. On a per-frame basis a light vector is also sent down as a uniform. That is how all data gets to the shaders. The vertex shader I can post, as it doesn't contain any *sensitive* algorithm/method:

```glsl
uniform sampler2D DepthTexture;
uniform vec3 lightVector;

varying float fresnel;
varying vec3 Point[3];
varying vec3 Line[3];
varying vec4 TriNorm;

void main()
{
    gl_Position = ftransform();

    Point[0] = gl_MultiTexCoord0.xyz;
    Point[1] = gl_MultiTexCoord1.xyz;
    Point[2] = gl_MultiTexCoord2.xyz;

    Line[0] = normalize(gl_MultiTexCoord3.xyz - Point[0]);
    Line[1] = normalize(gl_MultiTexCoord4.xyz - Point[1]);
    Line[2] = normalize(gl_MultiTexCoord5.xyz - Point[2]);

    TriNorm.xyz = normalize(-cross(Point[1] - Point[0], Point[2] - Point[0]));
    TriNorm.w   = length(cross(Point[1] - Point[0], Point[2] - Point[0]));

    vec3 lightVec = normalize(lightVector);

    /*float a = acos(dot(TriNorm.xyz, -lightVector));
    float b = acos(dot(-TriNorm.xyz, refract(lightVector, TriNorm.xyz, 1.0/1.33)));
    fresnel = ( sin(a-b)*sin(a-b) / (sin(a+b)*sin(a+b)) ) + tan(a-b)*tan(a-b)*tan(a+b)*tan(a+b);
    */
    fresnel = 1.0; // (1.0-fresnel);
}
```

    The fresnel term is turned on for the GeForce 6800 and off for all ATI cards, btw, since in test applications the Radeon 9800 (at least on my system) dropped its framerate by roughly a factor of 100 when executing the "refract" function from GLSL. The fragment shader I don't want to paste in case it gets me in trouble, but I can explain some of its properties.
    Using Nvidia's shaderperf program it has been possible to see what code it translates the fragment program into; here is the output of "NVShaderPerf -a NV40 fragmentshader.glsl":

    # 51 instructions, 7 R-regs, 1 H-regs
    -------------------- NV40 --------------------
    Target: GeForce 6800 Ultra (NV40) :: Unified Compiler: v61.77
    Cycles: 51.36 :: R Regs Used: 10 :: R Regs Max Index (0 based): 9
    Pixel throughput (assuming 1 cycle texture lookup) 125.49 MP/s

    This is indeed a big shader, but from what I have been able to see it would be well within the resource limits (instruction-count-wise and register-use-wise) of a Radeon 9800 and X800. The shader only uses a very small set of ARB_fragment_program instructions:

    ADDR (addition)
    MULR (multiplication)
    MADR (multiply + add)
    MOVR (move) (just a few)
    RCPR (1.0 / x) (only 3)
    RSQR (reciprocal square root) (only 1)
    DP3R (vector dot product) (only 1)
    SGER (greater-or-equal comparison) (only 2)
    SGTR (greater-than comparison) (only 1)
    TEX (normal 2D sampler with floating-point target) (only 1)

    Which means the shader is basically loads of adding and multiplying. Given the information above, would anyone know if anything I have mentioned would make an ATI card (9800 or X800) drop into software mode for this? It is mildly annoying, as NO Nvidia-specific extensions are used in the project - only one ATI extension (for the floating-point texture) and otherwise ARB. The project runs like a charm on the GeForce 6800 card but not on the 9800. Even ruling out everything I posted as correctly programmed for ATI cards would be helpful, as I have now spent well over a week's work trying to figure out what went wrong. The latest ATI drivers (4.12) are being used, btw. Do the ATI cards have problems receiving float data through texcoords? Are the ATI interpolators being given too much work? Is the ATI compiler failing to produce code for the fragment shader that stays within resource limits?
    (I get NO errors through OpenGL or the log of any shader.) It renders correctly on the ATI cards, but at 1 frame every 3-4 minutes instead of the GeForce's 15 frames/second. I really, really would like to run this on ATI cards before I try to get it published (and after that open up all the sources for public viewing, of course!). On a side note, I wonder why ATI does not produce any public tools to aid developers with GLSL (or shaders in general) on their cards, like Nvidia is doing (Cg and NVShaderPerf)... Edit: Noticed it was source- and not code-tags on this forum. Edit2: The code was incorrect, a closing tag for a comment was missing. Thanks for reading this huge post; I have gone blind from staring at the problem for so long now. [Edited by - todderod on January 11, 2005 7:14:30 PM]