Z01

Member
  1. Thanks. I've also been looking at the structure-of-structures layout as an alternative to AoS or SoA; maybe I'll fiddle with that.
  2. I had a look through the Intel optimization manual a few days ago but couldn't find anything specific. You're right, though: maybe I should try the architecture manual instead, since it might discuss such things more explicitly. And then there are the AMD manuals, which might have something. The fact that the Intel manual doesn't mention anything about the number of arrays/streams (that I could find) makes me think it's a non-issue (or a trade secret).
  3. I'm thinking of converting an array-of-structures to a structure-of-arrays as an optimization in some SSE code. It's usually a good idea. I'm concerned, though, because the structure would be converted to 22-ish different arrays. Is there a limit to the number of arrays that a CPU prefetcher will work on? (I.e. is there a limit to the number of memory access patterns the prefetcher can track? Note that I'm *not* asking about the number of prefetches in flight.) Obviously the thing to do is try it and measure the performance difference. The problem is that I figure it will take about 3-5 days of work to change the code around, and I'm wondering if there's some hard limit on the number of prefetch prediction patterns that might mean I'm following a dead path. I thought I read something once about such a limit, but I can't find any info on it now. Concretely, as an example, say I'm trying to parallelize 5x5 matrix inversion and I have an array of 1 million Matrix5x5 to invert. Would I prefer the Matrix5x5 to be handed to me as a single array-of-structures, or as a structure of 25 arrays for SSE processing? Perhaps the latter would allow me to eliminate a lot of SSE shuffles by processing 4 matrices at once.
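As a rough illustration (not from the thread): a minimal sketch of why the SoA layout lets SSE consume four elements per load with no shuffles. A hypothetical 3-component Vec3 stands in for the Matrix5x5 case for brevity; all names here are illustrative.

```cpp
#include <immintrin.h>
#include <cstddef>

// AoS: one struct per element -- SSE code must gather/shuffle fields together.
struct Vec3AoS { float x, y, z; };

// SoA: one array per field -- four consecutive elements are one vector load.
struct Vec3SoA { float *x, *y, *z; };

// Hypothetical kernel: scale every vector by s, processing 4 elements per
// iteration straight from the field arrays (x86/SSE only).
void scale_soa(Vec3SoA v, std::size_t n, float s) {
    __m128 vs = _mm_set1_ps(s);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(v.x + i, _mm_mul_ps(_mm_loadu_ps(v.x + i), vs));
        _mm_storeu_ps(v.y + i, _mm_mul_ps(_mm_loadu_ps(v.y + i), vs));
        _mm_storeu_ps(v.z + i, _mm_mul_ps(_mm_loadu_ps(v.z + i), vs));
    }
    for (; i < n; ++i) {  // scalar tail for leftover elements
        v.x[i] *= s; v.y[i] *= s; v.z[i] *= s;
    }
}
```

Note the kernel touches three separate streams; the poster's 22-to-25-stream question is exactly whether the hardware prefetcher keeps up when this pattern is widened.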
  4. This is the kind of thing I've read (from http://mindprod.com/jgloss/jni.html):

     Quote: A JNI call is very slow, in the order of .5 to 1.0 microseconds, the equivalent of pages of linear Java code to do a simple method call. You would think there would be a tiny generated machine code thunk to bridge between Java and C. Not so — at least in any JVM I know of, Java branches to some general purpose code that interpretively constructs the C parameters. This code is not highly optimised. It seems Sun wants to strongly discourage you from using native methods just for speed. This means you don't want to hop back and forth between Java and C, but rather to go to C, and stay there a decently long time before returning. This means that you can't use C to speed up short operations, only long ones, because the overhead tacked on in getting to C wipes out any savings.

     Not just there, but in many places. The paper I linked and other papers I've found have numbers in them to back up their claims. However, none of the information I've been able to find is terribly recent, and there have been optimizations made to JNI since; some JVMs handle it a lot faster than others. I'm trying to find benchmarks to figure out which JVM to use. The latest news I could find was from a Sun Java developer on this thread (http://mail.openjdk.java.net/pipermail/discuss/2008-March/001118.html):

     Quote: Some members of the Java2D team would incline to respectfully disagree with the assessment that JNI is "that" fast =) Jim Graham wrote an extensive JNI benchmark (which tests method calls with different types/numbers of arguments, Get/Set methods, GetPACritical, etc.), and ran it against multiple Java releases. We have an internal page with the results. Unfortunately we don't have the bandwidth to get the benchmark out. The net result is that while JNI performance has improved over the years (the amount of improvement varies from platform to platform), it is still not satisfactory in many areas. Our per-primitive cost still mostly consists of JNI overhead for small primitives (think fillRect(1x1)).
  5. I'm investigating using C++ for the main application and Java as a scripting language. Ignore the whole static-versus-dynamic-typing and learning-curve arguments - I only want to talk about performance. Java is fast and C++ is fast, but JNI (the interface between them) is notoriously slow. My research with Google has come up with numbers like:
     * calls from Java to C++ are 2-3 times slower than Java-to-Java calls
     * calls from C++ to Java are 10-20 times slower than Java-to-Java calls
     But all is not lost... some smart people are working on improving this situation. For example, there's a paper called 'Inlining Java Native Calls At Runtime' (link: http://www.usenix.org/events/vee05/full_papers/p121-stepanian.pdf) that talks about ways these bad performance numbers can be overcome. Given this paper was written 4 years ago, I would have expected some of the Java implementations to have reduced their native call overheads by now. Given that the paper was co-written by IBM employees and they tweaked the IBM Java compiler and JVM, I'm guessing that one or more of the IBM JDKs might have these optimizations in them. However, as far as I can tell, IBM's licensing agreements on their JDKs don't make them free to use for commercial purposes. This brings me to the point of my post. I am wondering if anyone else is aware of any *recent* JVM comparisons where Java-to-C++ and C++-to-Java calls have been profiled. Or perhaps someone has practical experience using Java as a scripting language from C++. I am trying to find a free JVM that has low overhead going both ways between Java and C++.
  6. (Re: Mipmapped SATs) Okay, I understand what you mean by averaging (basically, use an AAT = averaged area table instead of a SAT), but I don't understand what the problem is with SATs and mipmaps.
  7. Is it possible to mipmap summed-area tables (SATs)? Naively you would expect something like this to work, but does anyone have references to it being tried before? None of the references to SATs I can find mention mipmapping. I'm trying to reduce the memory used by SATs in a software renderer.
  8. I'm trying to figure out how to query the amount of video memory on the card using DX. I'm coming from a GL background and I'm not totally familiar with the DX API. This is what I've found so far. In DX8/9 I can do this:

         int AvailableTextureMem = g_pd3dDevice->GetAvailableTextureMem();

     According to the docs, this returns the amount of card memory plus the amount of BIOS-reserved video RAM, which is what I don't want. I've noticed that in DX7 you can do:

         DDSCAPS2 ddsVidMemcaps;
         ZeroMemory(&ddsVidMemcaps, sizeof(DDSCAPS2));
         ddsVidMemcaps.dwCaps = DDSCAPS_VIDEOMEMORY;
         hRet = g_pDD->GetAvailableVidMem(&ddsVidMemcaps, &dwVidMemTotal, &dwVidMemFree);

     to query the exact amount of video memory. Is there some DX9 structure member I've missed somewhere that gives me this information, or do I have to use the DX7 interface?
  9. That sounds like what I was looking for, thanks a bunch. Btw, the memory is allocated on the heap for a pooled memory manager.
  10. How can I explicitly call the constructor for a class?

          template <class T>
          class Allocator3D {
              ....
              // Returns NULL on failure
              T *Alloc() {
                  if (!FreeList)
                      if (!CreateNewChunk(SizeIncrement))
                          return NULL;
                  T *ret = FreeList;
                  FreeList = Next(FreeList);
                  ret->T::T(); // compiler gives an error about the class not having a T() function
                  return ret;
              }
              ....
          };
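The idiomatic answer to the question above: `ret->T::T()` is not legal C++; a constructor is run on raw memory with placement new, and the matching teardown for a pool is an explicit destructor call. A minimal sketch (function names are mine, not the poster's):

```cpp
#include <new>  // declares the placement form of operator new

// Run T's default constructor in pre-allocated raw storage. In the poster's
// Alloc(), the line would be:  T *ret = new (FreeList) T();
template <class T>
T* construct_in_place(void* raw) {
    return new (raw) T();
}

// Unlike constructors, destructors CAN be called explicitly; a pooled
// allocator does this in its Free() before returning the slot to the list.
template <class T>
void destroy_in_place(T* p) {
    p->~T();
}
```

The storage handed to placement new must be suitably sized and aligned for `T`, which a fixed-size pool normally guarantees by construction.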
  11. I was reading this article on how the next big programming revolution after OOP will be parallel computing: http://www.gotw.ca/publications/concurrency-ddj.htm I thought it was really interesting and was wondering if anyone has any good book recommendations for further reading? Has anyone come across any books on game/graphics programming algorithms & design patterns for multiprocessor machines? This is coming from a programmer with over 4 years of C++ experience who is familiar with multithreaded programming.
  12. Does anyone know how to set up a Norton Internet Security (2005) firewall rule so that I don't have to re-create the firewall settings each time I rebuild my application? It seems the firewall rules Norton sets up are attached to a specific executable plus some other kind of data (like the date the exe was last modified, or some kind of checksum) that changes every time I rebuild it... I typically rebuild my app 10-50 times a day, and I'm starting to consider removing Norton altogether because it harasses me so much. What I really want is a firewall rule that simply allows an executable/DLL with a specific name in a specific directory to use sockets, with no checksum or date checks. I would even settle for a rule that allows all applications in a specific directory to use sockets. I've looked on Symantec's website and did some Google searches, but no real luck.
  13. I've written an antialiased floating-point software renderer that uses an A-buffer (that's an anti-aliased, area-averaged, accumulation buffer, NOT the hardware accumulation buffer) style mechanism to do pixel subsampling, and I'm having problems where triangles of two different materials of a mesh meet: sometimes bits of the background show through on shared edges. I suspect this is caused by floating-point round-off errors when determining which subsample points belong to which triangle. If an edge is shared between two triangles with different materials, round-off sometimes makes a subsample point belong to both triangles, or to neither. Does anyone have any hints/experience/references on how to resolve this kind of problem?
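One standard remedy (a sketch, not the poster's code): make coverage a deterministic fill rule so a sample lying exactly on a shared edge is owned by exactly one triangle. With a signed edge function, samples where the function is zero are accepted only for "top-left" edges; the adjacent triangle sees the same edge with opposite direction and rejects the sample, so nothing is drawn twice and no gap appears. All names below are illustrative.

```cpp
struct Pt { double x, y; };

// Signed area test: > 0 if p is to the left of the directed edge a->b.
double edgeFn(Pt a, Pt b, Pt p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Tie-break rule. The only property that matters for watertightness is that
// topLeft(a, b) != topLeft(b, a) for any a != b, so a sample exactly on a
// shared edge is claimed by exactly one of the two triangles.
bool topLeft(Pt a, Pt b) {
    return (a.y == b.y && b.x < a.x) || (b.y > a.y);
}

// Accept p against one edge of a counter-clockwise triangle.
bool inside(Pt a, Pt b, Pt p) {
    double e = edgeFn(a, b, p);
    return e > 0 || (e == 0 && topLeft(a, b));
}

// Coverage test for a counter-clockwise triangle (a, b, c).
bool covers(Pt a, Pt b, Pt c, Pt p) {
    return inside(a, b, p) && inside(b, c, p) && inside(c, a, p);
}
```

This makes ownership exact for samples on the edge; it does not by itself remove round-off in computing the edge function, so renderers often also evaluate the shared edge with identical operand order from both triangles.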
  14. Suppose I have a Matrix class and I need double and float versions of it, but each has different SSE/SSE2 optimizations for certain methods that they can't share. Using 'class template specialization' I can create a whole copy of the Matrix<> class, one for floats and one for doubles, but if I do that I'll have two copies of the same code, so I might as well write separate Matrixf & Matrixd classes. Is there any way to specialize just the methods of a template class, so I can have separate float and double versions? If I call an external template function from a member function, I can specialize that external function... so I might be able to do it that way...
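Yes: individual member functions of a class template can be fully specialized at namespace scope without specializing the whole class, which is exactly what the question asks for. A minimal sketch with a cut-down Matrix; the bodies here are scalar stand-ins for where the SSE (float) and SSE2 (double) paths would go.

```cpp
#include <cstddef>

template <class T>
struct Matrix {
    T data[4];
    void scale(T s);   // generic declaration, shared by all T
};

// Generic definition: every T without a specialization uses this.
template <class T>
void Matrix<T>::scale(T s) {
    for (std::size_t i = 0; i < 4; ++i) data[i] *= s;
}

// Full specialization of JUST this member for float; the rest of
// Matrix<float> still comes from the primary template. In real code this
// body would hold the SSE intrinsics (and a Matrix<double>::scale
// specialization would hold the SSE2 path).
template <>
void Matrix<float>::scale(float s) {
    for (std::size_t i = 0; i < 4; ++i) data[i] *= s;
}
```

Only the methods that differ get specialized, so the shared code exists once, avoiding the separate-Matrixf/Matrixd duplication the poster wanted to dodge.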