

Community Reputation

193 Neutral

About torakka

  • Rank
  1. I'll let a video speak for itself. Popping or not? You be the judge. This was done using the technique I presented; it's an old video, recorded 3+ years ago. If I were to do this today, shadows would be the first priority, as the amount of detail that can be rendered is already a solved problem and not very interesting. http://www.liimatta.org/misc/scape1.mpg The map is an 8192x8192 fixed grid. The limit on the size of the rendered area is memory, not performance: even on older/slower hardware the frame rate caps at the refresh rate of the display (hence generating data on the fly and caching it is an interesting direction of development). The data is cached, though, so only the stuff nearby is in the GPU's local memory in higher detail. Stuff is dropped out of the cache when new higher-detail blocks are requested. This version doesn't do level of detail in the reverse direction; it only uses block sizes from 8x8 to 256x256 (this keeps the index range within 16 bits). If the block size falls below 8x8, it would make sense to combine four blocks in a 2x2 configuration into a single 8x8 vertex block. If that were done, the view distance would be nearly infinite. The problem would be filtering, as high-frequency samples wouldn't stand out as intended. That might be a problem, but as I haven't done THAT, I won't talk out of my ass. What I write is tried and true. Thank you. But all that is irrelevant. It doesn't really matter how good the code is as long as the job gets done. All that is needed is good graphics. That's the only thing someone else will be able to see. Only bad code will stand out, if the performance is bad. When it's good, it's not interesting and doesn't stand out anyway. "OOoooh a GREAT ENGINE!" comments usually translate to "I like the textures" or "The modeling work is pretty good here." .. fact.
  2. Take advantage of vertex streams; don't use a fixed struct. The heightfield is a grid, so have fixed blocks of various sizes: 64x64, 128x128, 256x256 and so on. The xy coordinates for each block are *shared*; you only change the *offset*, which can be a constant over the whole rendering call, i.e. it moves the block where you want it to be. The height data for each lod level is stored like this: float height0; float height1; Store two heights: one for the current level, and one for the next bigger level. What you will be doing is interpolating between the "base" level and the next level. When you reach the next level, you switch to the more detailed (or coarser) lod level. The change will be extremely smooth, no popping AT ALL. Very smooth. The only information you need to store is the heights. You won't redundantly store the xy at all. You are also free to compute those if you have the integer index inside the grid. Whichever floats your boat. This all is trivial. The interesting bit is lighting. When the geometry resolution changes, the lighting resolution will change. How do you hide that? Use textures to encode the lighting-specific information. If you want normal maps anyway, this is a very good reason to use them. Normal maps are textures, and their resolution (and filtering) is independent of the geometry resolution. The filtering can be tricky with normal maps. You can compute the interpolation factor for each vertex using a trivial attenuation function. However, it might be tricky to map it just right from 0.0 to 1.0 over the base lod range. An easier way is to use the same interpolation factor for the whole "tile", but that approach has a caveat you have to work around where the tiles meet. If you do it per vertex, you can ensure that shared vertex positions where the tiles meet are contiguous. Two different lod levels meeting at the "edge" is usually handled by having different tiles for each case for each edge, or by patching tiles.
If you store *three* height levels for three different lod levels (-1, current, +1) then you don't have to worry about that either, but it will burn three height values per vertex. But it's still only three floats; compare to storing (x,y,z) and the storage is the same. Now (x,y) is synthesized or streamed from a vertex buffer which is shared between all tiles, and the height is a blend of two or three discrete height values! This technique works like a champ and isn't any "great idea that might not work"; I implemented this and can say for certain that it's a very good technique. I have not been in the game or engine programming business for a while now and have no use for this technique whatsoever anymore. A much more interesting problem is the texture management. You can't just dump a lot of texture data to the API and let it sort things out; for a large world you run out of memory and storage. Synthesizing the textures might be a good way to proceed: keep the generated textures cached for fast use, but when you need more data, synthesize it using the GPU (render to texture is useful :) If you can synthesize the height data as well, even better; this means you can also synthesize the normal map. For efficiency you want indexed primitives, since a lot of vertices are shared in the grid-like meshes landscapes tend to have. Also, with indexing you can use different parts of a single vertex buffer and reduce the vertex buffer configuration overhead. You can also store all lod levels in a single vertex entry, and use the lod to choose which to blend, or use a cool quadratic or similar blending function (but this is much more expensive, of course). Two or three values is a nice tradeoff and makes putting lod levels next to each other much simpler. Warning: you want to ensure that the lod level won't change too rapidly within one block, or the fusing of different lod levels easily becomes non-trivial.
The two-height approach with patching tiles is the easiest to implement as you choose the base level per tile. There are many ways to do this and finding the favorite can be a lot of fun!
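A minimal sketch of the two-height blend described in this post, assuming a linear attenuation over the tile's lod range. The names `HeightPair`, `lodBlendFactor`, `blendedHeight` and `tileVertex` are made up for illustration and are not from the original implementation; in practice this arithmetic would most likely live in a vertex program.

```cpp
#include <algorithm>

// Hypothetical per-vertex payload: only the heights are stored per tile.
struct HeightPair {
    float height0; // height at the tile's current lod level
    float height1; // height at the next lod level
};

// Map a distance to a blend factor in [0, 1] over the lod range
// [rangeStart, rangeEnd]. Linear attenuation is an assumption here.
inline float lodBlendFactor(float distance, float rangeStart, float rangeEnd) {
    float t = (distance - rangeStart) / (rangeEnd - rangeStart);
    return std::min(1.0f, std::max(0.0f, t));
}

// Interpolate between the two stored heights.
inline float blendedHeight(const HeightPair& h, float t) {
    return h.height0 + (h.height1 - h.height0) * t;
}

struct Position {
    float x, y, z;
};

// The xy part is synthesized from the integer grid index plus a per-tile
// offset, so it never has to be stored per vertex.
inline Position tileVertex(int gridX, int gridY, float spacing,
                           float offsetX, float offsetY,
                           const HeightPair& h, float t) {
    return Position{offsetX + gridX * spacing,
                    offsetY + gridY * spacing,
                    blendedHeight(h, t)};
}
```

With `t` computed per vertex, vertices shared between neighbouring tiles get the same factor and therefore the same height; with one `t` per tile, the edges need the patching mentioned above.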
  3. > I'm fairly certain that putting anything in namespace std is forbidden by the standard. Even writing "namespace std {}" is illegal. Only the standard library itself may open that namespace.

    You are supposed to put the std::numeric_limits specializations into the std namespace. Recommended reading: ISO/IEC 14882.
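As a sketch of what such a specialization looks like: `Fixed16` is a made-up user-defined type, and only a handful of members are shown. A complete specialization should provide every member the primary template has.

```cpp
#include <limits>

// Hypothetical user-defined 16.16 fixed-point type, for illustration only.
struct Fixed16 {
    int raw;
};

namespace std {

// Explicit specialization of a standard library template for a
// user-defined type. Abbreviated: a real one defines all members.
template <>
class numeric_limits<Fixed16> {
public:
    static constexpr bool is_specialized = true;
    static constexpr bool is_signed = true;
    static constexpr bool is_integer = false;
    static constexpr bool is_exact = true;

    static constexpr Fixed16 min() noexcept { return Fixed16{1}; }
    static constexpr Fixed16 max() noexcept {
        return Fixed16{numeric_limits<int>::max()};
    }
    static constexpr Fixed16 lowest() noexcept {
        return Fixed16{numeric_limits<int>::min()};
    }
};

} // namespace std
```

Generic code can then query `std::numeric_limits<Fixed16>::max()` the same way it would for a built-in type.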
  4. Do you return const references?

    In programming, small things like copying objects add up. If you have the expression: float3 n = normalize(a + b * length(c)) + cross(a - b, a - c); suddenly HALF the work is copying values around, not actually doing the computations which we intend to do. The ONLY things we are interested in here are n.x, n.y and n.z - the faster we get them, the better off we are. This case doesn't give much leverage for using const reference return values, but that's life. :) One important factor to notice for this thread's sake is NRVO (Named Return Value Optimization); it wasn't until fairly recently that Microsoft products started implementing this optimization. They still don't do it perfectly; experimentation to find the facts yourself (tm) is advised. Also pay heed that on different compilers like g++ the rules are again different; it all depends on what tools you use. Another warning: when a new Microsoft compiler comes out, things could turn around once again. The *safest* thing to do is to do the RIGHT THING (tm) to begin with. Then you're not so much at the mercy of the latest compiler. Good, solid code rarely goes slower with the latest compiler upgrade, so you guys should be safe when you do the RIGHT THING (tm). It's up to everyone to figure out what the RIGHT THING (tm) is for them. If someone wants to return by value when a const reference would do the trick just dandy, that's their call. So what if the value is "optimized" out; that's on the compiler you use today, what about tomorrow? What about another platform, another compiler? x86 is a common architecture and Windows is a common platform, am I wrong? Likewise, why care about petty things like endianness or alignment, why bother, those always worked out for my (insert name of my windows application here)
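To make the distinction concrete, a minimal sketch. The `float3` layout and the `Particle` class are made up for illustration: an accessor can hand out a const reference to data that already lives in the object, while a function that computes a new value has nothing longer-lived to refer to and returns by value.

```cpp
struct float3 {
    float x, y, z;
};

// A computed result is a new object, so it is returned by value;
// the compiler is free to elide the copy.
inline float3 operator+(const float3& a, const float3& b) {
    return float3{a.x + b.x, a.y + b.y, a.z + b.z};
}

class Particle {
public:
    explicit Particle(const float3& p) : position_(p) {}

    // No copy: the caller reads the member through a const reference.
    const float3& position() const { return position_; }

    // Same data, but a copy is made for the caller.
    float3 positionCopy() const { return position_; }

private:
    float3 position_;
};
```

Returning a reference to a local or a temporary would dangle, which is why `operator+` above cannot use the const-reference form.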
  5. color masking?

    It might be best to generate the alpha where you generate the original graphics and store it in a file format that supports alpha. You can do the colorkey->alpha conversion easily in software yourself, but the result will be an on/off 1-bit alpha, which is not very good; I'd let the artist control what he wants the graphics to look like.
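For reference, the do-it-yourself conversion might look like the sketch below. It assumes 32-bit ARGB8888 pixels and a hypothetical function name `colorKeyToAlpha`; as said above, the resulting alpha is strictly on/off.

```cpp
#include <cstddef>
#include <cstdint>

// Assumes ARGB8888 pixels (alpha in the top byte). Any alpha already in the
// input is ignored. Pixels matching the key colour become fully transparent,
// everything else fully opaque: a 1-bit alpha stored in 8 bits.
inline void colorKeyToAlpha(std::uint32_t* pixels, std::size_t count,
                            std::uint32_t keyRgb) {
    const std::uint32_t key = keyRgb & 0x00ffffffu;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t rgb = pixels[i] & 0x00ffffffu;
        pixels[i] = (rgb == key) ? rgb : (0xff000000u | rgb);
    }
}
```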
  6. C++ operator woes

    inline Vector3D operator + (const Vector3D& v1, const Vector3D& v2) { return Vector3D(v1.x+v2.x, v1.y+v2.y, v1.z+v2.z); } Try that.
  7. haegar, there's a "bug", but luckily it won't have any side effects besides using a bit too much memory: char* convertedData = new char[size * 3]; It might be easier to avoid the error if size were renamed to "numPixel" or "pixels" or similar; from this line alone it's not apparent that it is not the number of pixels in the image but the number of chars in the image. :) FWIW, the conversion snip I gave is fast and precise. I'm invisible it seems. ;)
  8. There are many ways to skin a cat, but keeping the precision is a good way.

    uint16* s = ...; // src image in RGB565
    uint32* d = ...; // dest image in ARGB8888
    uint32 v = *s++;
    uint32 u = ((v & 0xf800) << 8) | ((v & 0x07e0) << 5) | ((v & 0x001f) << 3);
    u |= ((v & 0xe000) << 3) | ((v & 0x0600) >> 1) | ((v & 0x001c) >> 2);
    *d++ = 0xff000000 | u;

    This writes into 32 bit ARGB8888, sorry. It's trivial to change it to write into RGB888 (24 bit), but that's not a very good thing, as writing to a buffer in such a format is not very cool. You have to write a byte at a time (otherwise the destination buffer might overflow by one byte, or the alignment might slow things down or crash completely-- think ARM and some RISC architectures; on x86 misalignment costs performance with ALU instructions and is an error with some SIMD instructions, YMMV anyhow, the best bet is to write a byte at a time, which sucks =) One possibility with 24 bit writes is to unroll the loop in specific ways so that you can write 16 or 32 bits at a time; again, this obviously adds complexity. That said, the "conversion" might look fairly complex at a quick glance. We could just shift and mask, but as someone pointed out, that is a problem with the precision range. 16 bit ARGB4444 is a good example: the full intensity value for a channel there is 15. The target maximum is 255; let's look at this in binary: 1111 -> 11110000 The resulting "maximum" intensity is, duh, 0xf0 .. this is way off from 255 (0xff, the full intensity for a channel). In practice the image would become darker, often not what you want. A cheap workaround for this is to replicate the most significant bits; in other words, repeat the bit pattern. If we "name" the bits:

    ABCD <- 4 bit value
    ABCDABCD <- 8 bit value, notice the pattern repeated here

    .. and that's what this code snip is doing.
    An easy way to invoke this code is to use my little image loading library (which does other neat stuff, too):

    surface* so = surface::create("bla.png");
    so->reformat(pixelformat::rgb888);
    uint24* image = so->lock<uint24>();
    so->unlock();
    so->release(); // thank you we're done here

    The pixelformat is much more flexible: it knows about floating point color, YUV and compressed surface formats like DXTc, YUV overlays and such things. We're not limited to some built-in formats here; pixelformat::rgb888 was just a pre-built format, and you can do this just as well: so->reformat( pixelformat(24,0xff0000,0x00ff00,0x0000ff,0) ); That would achieve the same thing as using the built-in format above. Since your goal is actually just to save the buffer you already have, you can do this:

    surface* so = surface::create(width,height,pixelformat::rgb565,imagePtr,imageStride);
    so->save("screenshot.jpg");

    I.e. if you just know the pointer to the image memory (imagePtr), you can create a surface reference (like above) which only POINTS to the memory you provide. Then you tell the API to save the surface and the file appears on the HD. It doesn't touch the pixels in your source image, so they will remain in RGB565 or whatever format you have. Or you can create the piece of memory using the API to begin with and avoid the create overhead (which is negligible but adds to the size of the source code :)
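The conversion snippet and the bit replication trick from this post, wrapped into standalone functions as a sketch. The function names are made up for illustration and the fixed-width types come from `<cstdint>` rather than the `uint16`/`uint32` typedefs used above.

```cpp
#include <cstddef>
#include <cstdint>

// Expand one RGB565 pixel to ARGB8888, replicating the most significant
// bits of each channel into the low bits so full intensity maps to 0xff.
inline std::uint32_t rgb565ToArgb8888(std::uint16_t pixel) {
    const std::uint32_t v = pixel;
    // Move each channel to the top of its 8-bit slot.
    std::uint32_t u = ((v & 0xf800u) << 8) | ((v & 0x07e0u) << 5) | ((v & 0x001fu) << 3);
    // Fill the freed low bits with the channel's own top bits.
    u |= ((v & 0xe000u) << 3) | ((v & 0x0600u) >> 1) | ((v & 0x001cu) >> 2);
    return 0xff000000u | u;
}

// Convert a run of pixels; dst must have room for pixelCount values.
inline void convertRgb565ToArgb8888(const std::uint16_t* src, std::uint32_t* dst,
                                    std::size_t pixelCount) {
    for (std::size_t i = 0; i < pixelCount; ++i) {
        dst[i] = rgb565ToArgb8888(src[i]);
    }
}

// The ABCD -> ABCDABCD pattern for a single 4-bit channel.
inline std::uint8_t expand4To8(std::uint8_t value4) {
    const unsigned v = value4 & 0x0fu;
    return static_cast<std::uint8_t>((v << 4) | v);
}
```

Full-intensity white in RGB565 (0xffff) comes out as 0xffffffff instead of the 0xfff8fcf8 a plain shift-and-mask would give.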
  9. Return array to use in glColor3fv

    If the data doesn't change over time, it's best to store it in a display list or vertex buffer object (VBO). This way the driver can store the data in the graphics card's local memory. Desktop graphics cards are connected to the host system with PCI, AGP, PCIe and other comparatively low-bandwidth buses, while the card's local memory bandwidth can be 20-30 GB/s, which is a big benefit. Also, when you use glDrawArrays() or glDrawElements() you lose the overhead of glBegin, glVertex* and other calls. These calls must *build* the dataset dynamically. The graphics processor (GPU) works asynchronously with the CPU, which means the GPU has to be told what to do. This is implemented as a stream of commands to the GPU. The more commands, the slower the rendering. The immediate mode (glBegin/glEnd paradigm) usually dispatches the drawing command after glEnd(), when all the required data is collected and processed into a format that can be handled by the GPU. This is significant CPU overhead. Then the data has to be transferred over the bus to the GPU's local memory. This introduces latency: the command cannot start processing until it can be guaranteed that all the data is ready to be used. Of course a good driver architecture doesn't poll but rather works asynchronously internally as well. So there must exist a buffer for commands inside the GPU. Now, if the buffer is filled with commands that use data which is already sitting in the GPU's local memory, the commands can execute without delay. This is important because the OpenGL specification has a strict drawing order: it cannot process a draw command before an earlier one even if all the data for the later command is already present. The GPU has to stall until it can process the pending commands before the one that is ready to go, as it were. Look at what parameters glDrawArrays() and glDrawElements() take. glDrawArrays() is a minimal amount of data to transfer.
glDrawElements() requires transfer of the index array, but that can be eliminated if the array is placed in a VBO; then the overhead for this call is minimized as well. You really don't want to use the glBegin/glEnd paradigm. It's inefficient. Also, if you steer away from it you get an additional benefit: OpenGL ES doesn't have immediate mode, so your applications will work on mobile graphics hardware with fewer modifications to the source code. About datatypes. The OpenGL GLSL 1.10 shading language pretty much encourages the use of "big" datatypes. Think of "int" and "float" (and to a degree, "half", aka. OpenEXR float, or float16). Internally, you could think that vec4 means 4 x 32-bit float .. ivec4 means 4 x 32-bit int and so on. This means that internally, these are the formats the data is converted to after reading from arrays or immediate mode "packets" (I encapsulate the information regarding implementation detail with this term in quotes ;) The biggest thing about datatypes is savings in bandwidth and storage. The rule of thumb is that your data is going to be converted to either bool, int or float for the vertex and fragment processing (accuracy notwithstanding; it can be modified with mediump, etc. keywords as far as the hw implementation will leverage them). If you have 32-bit colors in, say, ABGR8888 format, it makes sense to store them as such in the arrays. There is absolutely no benefit in storing the color as 4xfloat even if that is what it will be used as in the vertex program. The reason is that reading from arrays in different formats is implemented in silicon. If you don't use the type conversion, then the transistors implementing this functionality sit idle there in the GPU. From this point of view it doesn't make any difference whether you do the conversion or not. Where it DOES make a difference is the caching schemes inside the GPU. There is a large amount of "slow" DDR memory. Be it DDR2, DDR3, DDR4 of any flavour you can imagine or whatever they invent next.
But there is always faster on-chip memory which is not very large (if it were, the chips would be even larger, consume even more power and require even more powerful cooling.. the cards are noisy enough as they are ;) If your data is 32 bits wide, keep it in that format. Don't explode it to 128 bits, as no additional information is stored. If the implementation is smart, as you can expect from companies which have been at this for a decade now (or more), it will keep the data as small as possible as late as possible. The byte-to-float, short-to-float etc. conversions are practically free. The cost has been paid already. The transistors are there already. It's not a free lunch but it's a pre-paid lunch. Take advantage of it. Make your application more efficient. You may not notice it on the latest super-duper card if your load isn't too high, but bring the same work to a lower-end graphics chip and you might begin to suffer. Thinking in hardware terms like a GPU is different from thinking in software developer's terms. A software developer, if not thinking things through, will automatically assume that "there is type conversion, it must be an instruction of some sort", and that leads to thinking that it's like a CPU program where this instruction is executed in sequence with other instructions. It doesn't work that way. It's a lot more complicated than the example I am going to give, but bear with me, it is only for illustration purposes. Let's assume we have a hypothetical OpenGL implementation. We have a glDrawArrays() call which we are implementing. We have arrays which have a stride, a base address and a datatype (GL_FLOAT, number of elements, that sort of thing). We store this stuff in a VBO. We can choose how to implement this. We have all the power in the world. We are GODS of our own creation. We want efficient hardware. We want good memory bandwidth. So. We store the array in as small a memory footprint as possible.
We want to *pack* the different arrays into an interleaved (AoS) format so that when we want one "packet" of data (namely a vertex), we can get it with as few reads as possible. If we scatter the data around the GPU's local memory, that means we need more memory read requests. Bad. Even worse is that we cannot keep so many locations in cache at the same time, as a cache uses fixed-size blocks (there are reasons for this as well, but I am going off on 100 tangents already as it were ;) Since we are packing the data, it makes sense to keep it as small as possible as well. So we don't blow up byte to float and so on. We want circuitry in the chip which can "decompress" these structs, which are dynamically configurable. So, we need a unit in the chip which can do this decompression. It returns, say, a "4 x float" data representation to the processing units implementing, for example, the vertex shader. After and even before this point it depends heavily on the architecture what we really want to do. The GLSL (and Cg/HLSL) design drives the hardware design in specific directions, but it doesn't mean all designs will end up identical. There are many ways to skin a cat. But some things are the same for all: memory bandwidth and footprint issues. For everyone the fast on-chip memory is expensive, and off-chip stock memory is cheaper. These facts are the same for all, and they drive the design. They also determine what sound software engineering practice is when driving the hardware. Don't promote datatypes prematurely. But don't be shy about using float arrays either when you need the precision and range.
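A small sketch of the footprint argument, assuming 4-byte floats and a colour packed as one 32-bit word with A in the top byte and R in the bottom byte. The struct and function names are made up for illustration, and the unpack function only mimics on the CPU what the fetch hardware is described as doing.

```cpp
#include <cstdint>

// Colour kept as it arrived: 32 bits.
struct VertexPacked {
    float x, y, z;
    std::uint32_t abgr;
};

// Colour promoted to 4 x float: 128 bits, no extra information.
struct VertexFat {
    float x, y, z;
    float r, g, b, a;
};

struct Color4f {
    float r, g, b, a;
};

// Normalized byte-to-float conversion, one channel.
inline float unorm8ToFloat(std::uint8_t c) {
    return c / 255.0f;
}

// What the "decompression" unit conceptually hands to the vertex program.
inline Color4f unpackAbgr8888(std::uint32_t abgr) {
    return Color4f{unorm8ToFloat(static_cast<std::uint8_t>(abgr & 0xffu)),
                   unorm8ToFloat(static_cast<std::uint8_t>((abgr >> 8) & 0xffu)),
                   unorm8ToFloat(static_cast<std::uint8_t>((abgr >> 16) & 0xffu)),
                   unorm8ToFloat(static_cast<std::uint8_t>((abgr >> 24) & 0xffu))};
}
```

On a typical platform `VertexPacked` is 16 bytes against 28 for `VertexFat`, so the same cache line or bus transfer carries nearly twice as many vertices. With OpenGL vertex arrays, such a packed colour would be described to the API with `GL_UNSIGNED_BYTE` components instead of `GL_FLOAT`.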
  10. Return array to use in glColor3fv

    You could do:

    struct color3f { float r,g,b; };
    color3f bla(...) { ...

    Or:

    void bla(float* color, ... ) { color[0] = ...; ...

    OpenGL also has functions that take components smaller than float for the API calls. This saves space; why store 24-bit pieces of data in 96 bits? That's a waste of bandwidth & memory. glVertex*() calls are not very optimal either. Use glDrawArrays() or glDrawElements(); if you use these with a VBO you should get nice speed. Unless you use a display list. ;)
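Filled in with a made-up example function (`makeGrey` is purely hypothetical, as is the `color3ub` byte variant), the two approaches might look like this sketch:

```cpp
struct color3f {
    float r, g, b;
};

// Variant 1: return the colour by value.
inline color3f makeGrey(float intensity) {
    return color3f{intensity, intensity, intensity};
}

// Variant 2: write into a caller-provided array of at least 3 floats.
inline void makeGrey(float* color, float intensity) {
    color[0] = intensity;
    color[1] = intensity;
    color[2] = intensity;
}

// The same colour in 24 bits instead of 96.
struct color3ub {
    unsigned char r, g, b;
};

inline unsigned char toByte(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<unsigned char>(v * 255.0f + 0.5f);
}

inline color3ub toBytes(const color3f& c) {
    return color3ub{toByte(c.r), toByte(c.g), toByte(c.b)};
}
```

The result of the first variant can be handed to the API as `glColor3fv(&c.r)`, relying on the three floats being laid out contiguously, and the byte version as `glColor3ubv(&b.r)`.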
  11. > As for malloc and the structure, > please let me know what sophisticated developers use. new/delete?
  12. Bytesort vs. BitSort

    Don't do it bit by bit if you wind up with 8 times the number of passes. Think of it in these terms: what does each pass cost, multiplied by the number of passes? You'll end up with 8 times more passes, which are *more* expensive too. Highly likely to be much slower. ;) If you're concerned with memory usage, you don't have to have 256 bins each the size of the array you are sorting. You can have one array with room for the number of items, then have 256 pointers into this array (the beginning of each slot). What you do is one extra pass in the beginning: *count* the slots! Then do a second pass which assigns. It's not worth the effort to dynamically grow the bins a la std::vector. Either do the counting pass or just have large bins (i.e. an extra pass vs. memory footprint tradeoff). Timing with your data will tell what's fastest. Don't forget to compare with the common quicksort, for example; it may be a lot slower or a lot faster. It depends on your data too. Time and check; that way you'll know your options and there's no guessing.
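A sketch of the byte-wise sort with the counting pass, assuming 32-bit unsigned keys and a single scratch array the size of the input. The function name is made up, and slot start offsets stand in for the 256 pointers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort, one byte per pass (4 passes for 32-bit keys).
inline void radixSortBytes(std::vector<std::uint32_t>& data) {
    std::vector<std::uint32_t> scratch(data.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Extra pass: count how many items fall into each of the 256 slots.
        std::array<std::size_t, 256> slot{};
        for (std::uint32_t v : data) {
            ++slot[(v >> shift) & 0xffu];
        }
        // Turn counts into the start offset of each slot.
        std::size_t offset = 0;
        for (std::size_t& s : slot) {
            const std::size_t count = s;
            s = offset;
            offset += count;
        }
        // Second pass: assign each item to the next free place in its slot.
        for (std::uint32_t v : data) {
            scratch[slot[(v >> shift) & 0xffu]++] = v;
        }
        data.swap(scratch);
    }
}
```

Memory use is one extra array plus 256 offsets per pass, regardless of how the keys are distributed. Whether it beats `std::sort` on a given data set is something only timing will tell.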
  13. I hope you take into consideration that the mesh data in .3ds files is in world coordinates; if you want model coordinates, transform the vertices with the inverse of the node xform.
  14. Bitwise Operations

    CTar: "Would you mind explaining that? I have rarely seen bitwise operators mentioned in actual computer science books (not programming), and I have never heard of anyone describing them as the foundation of anything." You might be interested in reading "Digital Fundamentals" by Floyd. You might have heard of these "transistors" they speak of? With these tiny little things we can implement binary logic operations like XOR, OR, AND, NOT and so on. These are the BASIC building blocks of computer chips. Every instruction your tiny little processor executes is *implemented* using these things; I assure you they exist and it is NOT magic. I repeat: it is NOT magic. It's mathematics, electronics and other sleight-of-hand that the hard-working people at Intel, AMD, ATI, NVIDIA and other companies are using to steal your hard-earned money. ;) It's putting it mildly to say that a competent programmer worth the title should understand at least what the hell all this is based on. OH, and these are also implemented as instructions in most general-purpose CPUs, for the reason that they are very useful for computing all kinds of things your imagination might conjure. What do you think happens when you write an expression such as: if ( a && b ) { ... } (where a and b can be sub-expressions; the emphasis is on the && operator) The point is that you will be hard pressed to find an actual "&&" (logical and) instruction in the instruction set of the architecture you might be compiling software for. Take a wild guess what might be happening "behind the scenes"? These things are all around the topic of computers and I'm baffled that I even have to mention this in a *programming forum* of all things. Geez.
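One way to picture what a compiler does with `&&` is the sketch below: the operator is spelled out as two tests with early exits, which is roughly the test-and-conditional-jump shape it tends to lower to. The `check` helper and its counter are made up for illustration. The actual machine code depends on the compiler and target, and for side-effect-free operands a plain bitwise AND instruction may be emitted instead.

```cpp
// Counts how many operands actually get evaluated.
static int evaluations = 0;

static bool check(bool value) {
    ++evaluations;
    return value;
}

// The source-level form.
static bool withOperator(bool a, bool b) {
    return check(a) && check(b);
}

// The same logic written as the branches it typically compiles to:
// test the first operand, jump out if false, then test the second.
static bool withBranches(bool a, bool b) {
    if (!check(a)) {
        return false;
    }
    if (!check(b)) {
        return false;
    }
    return true;
}
```

Both versions evaluate the second operand only when the first one is true.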
  15. why!? why!? WHY!?!? bitwise hell

    > Explain to me how if((2 + 6) & (3 == 4)) can ever mean anything useful? Depends on how you get to that expression in the first place. After that point it is a constant expression and can be evaluated at compilation time to the value of zero. It would make more sense if symbolic names were used; that sort of thing is done all the time, and in that case it's context specific. The only possible problem with that I can think of is this: if there is a simpler or more descriptive way to do the same thing, why wasn't it used, and why should I care?
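A short sketch of both points: the literal expression folds to zero at compile time, and the same shape of expression reads fine once symbolic names are involved. The flag names and `hasFlag` are made up for illustration.

```cpp
// (3 == 4) is false, which converts to 0, and 8 & 0 is 0.
constexpr int kFolded = (2 + 6) & (3 == 4);
static_assert(kFolded == 0, "the expression is a compile-time constant zero");

// With symbolic names, a bitwise AND inside a condition is ordinary flag testing.
enum : unsigned {
    FLAG_VISIBLE = 1u << 0,
    FLAG_SOLID = 1u << 1,
    FLAG_DYNAMIC = 1u << 3
};

constexpr bool hasFlag(unsigned flags, unsigned flag) {
    return (flags & flag) != 0;
}
```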