About janos

  1. Whether you use occlusion culling or not really depends on your dataset. Since even the fastest raster renderer is at best O(N) in the number of primitives (or even batches), growing your world's complexity slows you down proportionally. Now consider an indoor FPS, or a racing game seen at street level in a town: what you actually see tends to reach a limit as the size and complexity of the world increase (that is, if you avoid cases like a hilltop with a whole city in view). Rasterization with no sublinear early geometry culling is eventually going to fail (and it does), because you always have to scan your whole object dataset. Culling avoids even having to consider some objects for rendering. I know that for most of you, what I'm saying is already understood, but I'm saying it for those who think brute-force hardware will solve the issue. It certainly helps, but it cannot absorb unbounded complexity. Hardware that does real occlusion culling would have to know about the scene graph in some way, just as ray-tracing hardware would.
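To make the scaling argument concrete, here is a tiny C++ sketch (my own illustration, all names invented, a 2D quadtree standing in for a real scene graph): culling rejects whole subtrees whose bounds miss the view, so the number of objects even considered can stay far below the total scene size.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Axis-aligned box in 2D (a top-down "street level" world for simplicity).
struct AABB {
    double minX, minY, maxX, maxY;
    bool overlaps(const AABB& o) const {
        return minX <= o.maxX && o.minX <= maxX &&
               minY <= o.maxY && o.minY <= maxY;
    }
};

// Quadtree node: up to 4 children plus a bucket of object ids.
struct Node {
    AABB bounds;
    std::vector<std::size_t> objects;   // leaf payload
    std::unique_ptr<Node> child[4];
};

// Collect potentially visible objects; `visited` counts objects actually
// examined. Subtrees whose bounds miss the view are skipped without ever
// descending, which is the sublinear early-out the post argues for.
// (A real cull would also test each object's own bounds; omitted here.)
void cull(const Node& n, const AABB& view,
          std::vector<std::size_t>& out, std::size_t& visited) {
    if (!n.bounds.overlaps(view)) return;   // reject whole subtree
    for (std::size_t id : n.objects) { ++visited; out.push_back(id); }
    for (const auto& c : n.child)
        if (c) cull(*c, view, out, visited);
}
```

With a world of 100 objects spread over four quadrants and a view resting inside one quadrant, only that quadrant's 25 objects are ever touched; a brute-force loop would have scanned all 100.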
  2. Mipmapped SATs

    Hmm, yes, you could indeed use a SAT, I'm sorry. But then you have to use pixel area and not UV area. Actually, if you have a software renderer, you will manage with either one (SAT or AAT).
  3. Serenade, what I meant about hardware support is this: when the hardware wants to read a memory region, it must figure out its address in local video memory, since it only has a virtual address available. Then it must be able to trigger a page fault, asking the driver or an on-card memory manager to copy the memory from system RAM to local video memory. Of course, you could take a conservative approach and have the D3D runtime or the driver conservatively upload whatever might be needed to video memory before it is used (e.g., upload a texture and all its mipmaps before drawing a primitive that might use it), but this method has two pitfalls. First, it must make sure that the texture and all its mipmaps are uploaded before any use, even if only one texel of the texture is ever accessed. Second, it does not help parallelism: without a page-fault mechanism, the rasterizer has to wait for the full upload. This results in a loss of parallelism between rasterization/shading and memory transfers, and in a waste of local video memory. Then again, I agree with you that D3D could at least offer this in its API, with a conservative driver-side fallback for cards that have no virtual memory mechanism. I believe DX10 / WGF will offer this. [Edited by - janos on September 29, 2005 9:08:04 AM]
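The demand-paging bookkeeping described above can be sketched as a toy model in C++ (`TextureCache`, the page size, and all other names are invented for illustration; this models only the accounting, not real hardware):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy model of demand paging for texture memory: a page is "uploaded" from
// system RAM to video memory only when a texel inside it is first touched.
struct TextureCache {
    std::vector<int> sysram;                      // backing store, one int per texel
    std::unordered_set<std::size_t> resident;     // pages currently in "vidmem"
    std::size_t uploads = 0;                      // page faults serviced so far
    static constexpr std::size_t kPageSize = 64;  // texels per page (arbitrary)

    explicit TextureCache(std::vector<int> data) : sysram(std::move(data)) {}

    int read(std::size_t texel) {
        std::size_t page = texel / kPageSize;
        if (resident.insert(page).second)         // page fault: upload on demand
            ++uploads;
        return sysram[texel];                     // serve the read
    }
};
```

With a 4096-texel texture (64 pages here), a conservative scheme would upload all 64 pages before the first draw; in this model only the pages actually touched cost an upload.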
  4. Mipmapped SATs

    It would work very well if you have an averaging SAT (each pixel value is the average of all the pixels in the rectangle between the origin corner and that pixel). This way you will get the same values (albeit not at the same precision) on two different mipmaps. And don't use anything other than a normalized filter (a box filter is preferable in this case) to create the mipmaps.
  5. I have tried SH lighting for the terrain, but just for fun. It is not practical, since the terrain can definitely not be considered a point-like object with a far-away lighting environment (a prerequisite for PRT lighting, at least in the form we are talking about), because of all the objects on it.
  6. I am the 3D R&D guy on Atari's Act of War, and although I cannot go into detail, here's some insight: partial lighting precomputation is used for the terrain. Models use per-vertex or per-pixel SH PRT lighting. Dynamic lighting with extra render passes can be added or removed, depending on hardware. I tried using far lights as a contribution to SH lighting, but the effect of this is quite minor, because you rarely have a very large bright light that is not the sun. The sun (and far-away directional lights) is treated separately from the SH lighting, with dynamic shadowing, specular, etc. An ambient sky term plus backlights is used to tune the fixed SH contribution of the sky. The main problem with having precomputed lighting on a level is that it produces large maps that are not suitable for making small downloadable levels. I hope this helps.
  7. pixel shader compile error

    Try this: make the pixel input and vertex output structs the same, that is, include a float4 pos : POSITION in the pixel input struct, even though you will not be able to access it. BTW, I tested your code as-is on the Cg ps_2_0 profile and on HLSL, and it works fine.
  8. pixel shader compile error

    Just write In.dif * tex2D(BaseTexture, In.tex.xy) + SomeColor; note the .xy at the end of In.tex. That should do it (Cg supports an overloaded version of tex2D that takes a float3 as the texture coordinate). Cheers, Janos
  9. -> Oxyacetylene: the speed difference is not better with strips, not even by a percent. You still need to run a cache-preserving reordering of your triangles to optimize post-VS cache usage. If you are running a card later than a 9800 Pro, for example, you should be able to get more than 25~30 million triangles per second with a rather simple vertex shader. If you are running at 10~12 MTris/s and triangle throughput is your bottleneck, then go ahead and optimize that. If you get significantly lower triangle throughput, then the VS and vertex cache are not your bottleneck, so you'll have to look elsewhere: memory thrashing, high batch count, video memory leaks, etc.
  10. I just want to mention that I have gone through the trouble of benchmarking this: strips do not produce any performance benefit compared to cache-friendly triangle indices (that is, on any hardware after the GF1; I have not benchmarked this on earlier hardware). They can even hurt performance, because long strips bring the current triangle far from the ones in the post-VS cache. Then again, you might have another specific reason to want strips. Here's something I noticed though: drivers usually store VBs and IBs in AGP memory, since vertex throughput is rarely the limit. You will therefore be AGP-bound if you try to test vertex speed limits. Janos
  11. Cg has had many codegen bugs (and still has, AFAIK). You'd better get used to looking at the generated asm code :( (the game I worked on was full Cg). Compare the two generated outputs; they shouldn't be too different, so the error is likely to stand out. Good luck, Janos
  12. [c++, asm] A slow stack?

    The fact that the stack is not a permanent part of the cache is not a problem, because it "becomes" part of the cache when you use it. Generally speaking, the levels of the stack farthest from main() always reside in the L1 cache. That means the stack always runs about as fast as it can. In your example, you are definitely L1-cache- or instruction-bound: all the memory you use (since you are touching it 1000 times, and it is not a large amount of memory) resides in cache.
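As a rough illustration (my own sketch, not from the thread): the live frames of even a fairly deep call chain occupy one small contiguous region of memory, which is why the hot end of the stack stays cache-resident.

```cpp
#include <algorithm>
#include <cstdint>

// Record the address of a local variable at every recursion depth and
// measure how far apart the shallowest and deepest frames sit. The whole
// span is tiny compared to a typical L1 data cache (32 KiB and up), so
// the active end of the stack is effectively always cache-hot.
static std::uintptr_t lowAddr, highAddr;

void probe(int depth) {
    int local = depth;   // lives in this stack frame
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(&local);
    lowAddr = std::min(lowAddr, a);
    highAddr = std::max(highAddr, a);
    if (depth > 0) probe(depth - 1);
}

// Distance in bytes between the extremes of `frames` nested stack frames.
std::uintptr_t stackSpan(int frames) {
    lowAddr = UINTPTR_MAX;
    highAddr = 0;
    probe(frames);
    return highAddr - lowAddr;
}
```

On typical compilers a hundred such frames span a few kilobytes at most (and even less if the compiler turns the tail call into a loop), comfortably inside L1.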
  13. kd-tree building

    I'm sorry, I'm not allowed to post code for this, but the pseudo-code I wrote above covers it. Instead of allocating new lists each time, you keep only a start index and a count that map into your two "sorted" lists. I put "sorted" in quotes, since the arrays as a whole end up unsorted because of the split process. Anyway, the invariant you have at each entry of your recursive split function is that Lx[startIndex...endIndex[ and Ly[startIndex...endIndex[ point to the same set of elements, and each is sorted along its own axis (Lx[startIndex...endIndex[ is X-sorted and Ly[startIndex...endIndex[ is Y-sorted). The notation Lx[startIndex...endIndex[ means the range of Lx that starts at startIndex and ends at endIndex-1.
  14. kd-tree building

    You are recreating vectors every time you perform a subdivision. You can manage with just ranges within the vectors, as I suggested. Removing the allocation cost and the data moves should help with speed. Tell me how fast/slow you are going, and I'll tell you whether it is consistent with the results I get, OK? BTW, alternately subdividing x and y is not the right solution, just as a median split is not the right solution, because it may not match the anisotropy present in your data.
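A minimal C++ sketch of the range-based split (my own reconstruction of the scheme in the two posts above, not janos's code, and assuming distinct coordinates): both index lists cover the same [begin, end[ range, each sorted along its own axis, and std::stable_partition splits them in place while preserving each list's per-axis order, so no re-sorting or reallocation ever happens. The axis is chosen by extent, following the anisotropy remark; the median pivot is kept only for brevity, even though the post notes it is not ideal either.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// One step of an in-place kd-tree build over the index range [begin, end[.
// Invariant on entry: lx[begin..end[ and ly[begin..end[ hold the same point
// indices, lx sorted by x and ly sorted by y. The list sorted on the split
// axis is already partitioned around its median; the other list is reordered
// with stable_partition, which keeps its per-axis order inside each half.
void split(const std::vector<Point>& pts,
           std::vector<std::size_t>& lx, std::vector<std::size_t>& ly,
           std::size_t begin, std::size_t end, int depth) {
    if (end - begin <= 2 || depth == 0) return;        // small leaf: stop
    double ex = pts[lx[end - 1]].x - pts[lx[begin]].x; // x extent (lx x-sorted)
    double ey = pts[ly[end - 1]].y - pts[ly[begin]].y; // y extent (ly y-sorted)
    std::size_t mid = begin + (end - begin) / 2;
    if (ex >= ey) {                                    // split along x
        double pivot = pts[lx[mid]].x;
        std::stable_partition(ly.begin() + begin, ly.begin() + end,
                              [&](std::size_t i) { return pts[i].x < pivot; });
    } else {                                           // split along y
        double pivot = pts[ly[mid]].y;
        std::stable_partition(lx.begin() + begin, lx.begin() + end,
                              [&](std::size_t i) { return pts[i].y < pivot; });
    }
    split(pts, lx, ly, begin, mid, depth - 1);
    split(pts, lx, ly, mid, end, depth - 1);
}
```

After any number of splits, every subrange still satisfies the invariant (same element set in both lists, each list sorted on its own axis), which is what makes the O(n log n) build work without fresh allocations.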