crowley9

Members
  • Content count: 115

Community Reputation

226 Neutral

About crowley9

  • Rank: Member
  1. object culling

    DX10 level GPUs support predicated occlusion culling, which allows you to schedule one draw call that depends on the results of another without flushing the software command queue. This helps somewhat, although you should still try to batch your predicates, and hierarchical predicate culling is a little tricky to get right.
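    A minimal sketch of what this looks like with the D3D10 predicate API; device, DrawBoundingProxy() and DrawRealObject() are placeholders for your own code:

    // Create an occlusion predicate (the hint flag lets the driver render speculatively).
    D3D10_QUERY_DESC desc = {};
    desc.Query = D3D10_QUERY_OCCLUSION_PREDICATE;
    desc.MiscFlags = D3D10_QUERY_MISC_PREDICATEHINT;

    ID3D10Predicate* predicate = NULL;
    device->CreatePredicate(&desc, &predicate);

    // Pass 1: render a cheap proxy (e.g. the bounding box) inside the predicate.
    predicate->Begin();
    DrawBoundingProxy();
    predicate->End();

    // Pass 2: the expensive draw is predicated on the proxy's result.
    // FALSE follows the usual pattern of skipping the draw when the proxy was fully occluded.
    device->SetPredication(predicate, FALSE);
    DrawRealObject();
    device->SetPredication(NULL, FALSE);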
  2. It sounds like some sort of resolution mismatch:
     - If you are using an LCD monitor (pretty likely), is your screen mode set to the native resolution of your monitor? It should be.
     - Is your backbuffer exactly the size of the /client area/ of your window? If not, the stretching is likely to be causing this issue. This sounds likely if you are resizing without recreating your device.
     EDIT: It is likely the second point, since you are able to capture this with a screenshot.
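     For the second point, a minimal sketch of keeping the backbuffer matched to the client area on resize, assuming D3D9 (hWnd, device and presentParams stand in for your own variables):

     // Size the backbuffer to the window's client area, not the full window.
     RECT rc;
     GetClientRect(hWnd, &rc);
     presentParams.BackBufferWidth  = rc.right  - rc.left;
     presentParams.BackBufferHeight = rc.bottom - rc.top;

     // Release D3DPOOL_DEFAULT resources first, then reset, then recreate them.
     device->Reset(&presentParams);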
  3. Raytracing & Global Illumination

    There is a very simple MC path tracer in 500 lines or so of C++ here. Source and executable are linked off the page.
  4. > I seem to only be able to get up to around 2.5GB before I crash, when I have 8GB
     Are you compiling to a 64-bit target? If not, then you most likely only have 2GB of available virtual memory for your process.
     > I have 2020 objects for which I must perform an O(n^2) operation on
     The question everyone is likely to ask is: what are you doing that requires O(n^2) time? Do you really have to consider all pairs? Or is there some sort of logical or spatial subdivision/hierarchy that will allow you to reject or combine clusters of objects?
     > My first problem is in writing a multi-threaded caching system.
     > I'm using a lockless queue for tasks, and I'm afraid that using a locking system for the cache will be its own bottleneck
     Sounds like complete overkill - first, tell us what you are trying to do, and second, fix the disk issue. For operations as heavyweight as yours seem to be (operating on 400KB of data), I doubt that lockless vs. lock will make a significant difference.
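     To illustrate the subdivision idea, here is a rough uniform-grid broad phase. Object and the interact callback are made-up stand-ins for your own types, and a complete version would also visit neighbouring cells (or pick a cell size at least as large as the interaction radius):

     #include <cmath>
     #include <unordered_map>
     #include <vector>

     struct Object { float x, y, z; /* ... your data ... */ };

     // Call interact() only for pairs that fall in the same grid cell,
     // instead of testing all n^2 pairs.
     void forNearbyPairs(std::vector<Object>& objects, float cellSize,
                         void (*interact)(Object&, Object&))
     {
         auto cellKey = [cellSize](const Object& o) {
             long cx = (long)std::floor(o.x / cellSize);
             long cy = (long)std::floor(o.y / cellSize);
             long cz = (long)std::floor(o.z / cellSize);
             return (cx * 73856093L) ^ (cy * 19349663L) ^ (cz * 83492791L);
         };

         std::unordered_map<long, std::vector<Object*>> grid;
         for (Object& o : objects)
             grid[cellKey(o)].push_back(&o);

         // Only objects that land in the same cell are compared with each other.
         for (auto& bucket : grid)
             for (size_t i = 0; i < bucket.second.size(); ++i)
                 for (size_t j = i + 1; j < bucket.second.size(); ++j)
                     interact(*bucket.second[i], *bucket.second[j]);
     }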
  5. > Err.. Branch misprediction on a function pointer call? It's an unconditional jump.
     It's not an if-then-else, but it is a jump to a variable target. Most CPU architectures (e.g., current generation Intel and earlier) cannot speculatively execute past variable branch targets, causing a pipeline stall. Calling it a "branch" is somewhat ambiguous, but not unheard of, since any if-then-else could be replaced by putting a function pointer (to the if and else clauses) in a variable depending on some condition (e.g., the class of an object), and then jumping to the variable's address.
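     A tiny made-up example of that transformation (the names are hypothetical): the taken/not-taken branch becomes a data-dependent jump through a function pointer, which the CPU has to predict as an indirect branch target.

     #include <cstdio>

     static void onTrue()  { std::puts("then-clause"); }
     static void onFalse() { std::puts("else-clause"); }

     void dispatch(bool condition)
     {
         // Equivalent to: if (condition) onTrue(); else onFalse();
         void (*target)() = condition ? onTrue : onFalse;
         target();   // unconditional call, but to a variable address
     }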
  6. > From the documentation terminology, it sounds like a command list just records a list of commands.
     Yes, roughly.
     > And from this example, the command lists are still executed on the main thread.
     No, they are scheduled for execution by the main thread, but they are actually executed on the HW.
     > So where is the savings?
     The construction of the command lists can be done in parallel across multiple threads.
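     A minimal sketch of the pattern, assuming D3D11 deferred contexts (device, immediateCtx and RecordDrawCalls() are placeholders for your own code):

     // Create a deferred context; recording into it does no GPU work.
     ID3D11DeviceContext* deferredCtx = NULL;
     device->CreateDeferredContext(0, &deferredCtx);

     // --- on a worker thread: record commands ---
     RecordDrawCalls(deferredCtx);
     ID3D11CommandList* commandList = NULL;
     deferredCtx->FinishCommandList(FALSE, &commandList);

     // --- back on the main thread: schedule the recorded commands for the GPU ---
     immediateCtx->ExecuteCommandList(commandList, FALSE);
     commandList->Release();
     deferredCtx->Release();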
  7. Since you are not actually using semi-transparent objects, and are just using alpha as a (fully opaque vs. fully transparent) mask, you should be using alpha-TESTING, not alpha-BLENDING, to get transparency; then you can draw from front to back. This way, the pixels in your soccer ball that fail the alpha test will not write Z values that would incorrectly cause the corresponding pixels in the red quad to be depth-killed.
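     If this is D3D9 fixed function, it would be the alpha-test render states (in a shader the equivalent is clip()/texkill). A rough sketch, with device assumed and the threshold just an example:

     device->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
     device->SetRenderState(D3DRS_ALPHAREF, 0x80);                  // example threshold
     device->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATEREQUAL);
     device->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);         // no blending needed
     // Pixels that fail the test are discarded and never touch the Z buffer.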
  8. IB and sorting triangles

    > Anyway, can I really assume that the order of the IB determines the actual rendering order?
    Yes, Direct3D requires that this is functionally what happens.
  9. Box-Box Visibility test

    I spent quite a bit of time working on this problem. It is an easy problem to express that has surprisingly difficult solutions. Here are some of my publications. My thesis specifically covers a mechanism to do this accurately/analytically, as well as a mechanism to do this via rasterization (assuming you want to query many boxes from your one box). Since you are doing box to box queries (and not box to all-triangle), and since it is no longer the year 2000, you could probably modify the GPU algorithm to use occlusion queries, and implement the adaptive hemicube refinement heuristic in a shader.
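    If you go the occlusion query route, the core of it looks roughly like this in D3D9 terms (device and DrawBox() are placeholders, and a real implementation would not busy-wait for the result):

    // Count how many pixels of the target box survive the depth test
    // against the already-rendered occluders.
    IDirect3DQuery9* query = NULL;
    device->CreateQuery(D3DQUERYTYPE_OCCLUSION, &query);

    query->Issue(D3DISSUE_BEGIN);
    DrawBox();                       // draw the target box (color/Z writes off)
    query->Issue(D3DISSUE_END);

    DWORD visiblePixels = 0;
    while (query->GetData(&visiblePixels, sizeof(visiblePixels), D3DGETDATA_FLUSH) == S_FALSE)
        ;                            // for illustration only; overlap other work instead
    bool visible = (visiblePixels > 0);
    query->Release();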
  10. > data rendering + reading to PBO: 142 FPS
      How many pixels are you reading back (what are "width" and "height")? At 142 FPS, 18MB/frame is about 2.5GB/s. This is about half of PCIe 2.0 16-lane performance. You may want to verify that your PCIe port is not 8 lanes (more common on laptops). If it is 16x, then it may be serialization caused by synchronization, as Krypt0n suggested.
  11. DX10 has better support for this sort of thing (staging buffers, and DO_NOT_WAIT locks), which will likely give you better performance. If using DX10 is an option for you, this is the way to go.
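      Roughly what that looks like, assuming a staging texture created with D3D10_USAGE_STAGING and CPU read access (device, renderTarget and stagingTex are placeholders for your own resources):

      // Copy into the staging texture, then poll with DO_NOT_WAIT so the CPU never stalls.
      device->CopyResource(stagingTex, renderTarget);

      D3D10_MAPPED_TEXTURE2D mapped;
      HRESULT hr = stagingTex->Map(D3D10CalcSubresource(0, 0, 1),
                                   D3D10_MAP_READ,
                                   D3D10_MAP_FLAG_DO_NOT_WAIT,
                                   &mapped);
      if (hr == DXGI_ERROR_WAS_STILL_DRAWING) {
          // Data not ready yet: do other work this frame and try again later.
      } else if (SUCCEEDED(hr)) {
          // mapped.pData / mapped.RowPitch now point at readable pixels.
          stagingTex->Unmap(D3D10CalcSubresource(0, 0, 1));
      }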
  12. BRDF implementation problem

    I did this by choosing the ray from a canonical distribution first (i.e., around the vector (0, 1, 0)). Then I built a matrix that transforms the vector (0, 1, 0) onto my reflection ray, and applied it to my point on the canonical distribution. In my case I was writing a path-tracer, so building the matrix once for many rays was pretty cheap (relatively speaking). I would also like to know if there is a better/faster method of doing this.
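    A rough sketch of the same idea with an explicit orthonormal basis (Vec3 and the helpers are minimal stand-ins); in a path tracer you would build the basis once per reflection ray and reuse it for all of that ray's samples:

    #include <cmath>

    struct Vec3 { float x, y, z; };

    static Vec3 cross(const Vec3& a, const Vec3& b) {
        return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    }
    static Vec3 normalize(const Vec3& v) {
        float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
        return { v.x/len, v.y/len, v.z/len };
    }

    // Rotate a sample drawn around the canonical axis (0, 1, 0) so that the
    // axis lands on the reflection direction r (assumed normalized).
    Vec3 alignToDirection(const Vec3& s, const Vec3& r)
    {
        Vec3 up = std::fabs(r.y) < 0.99f ? Vec3{1, 0, 0} : Vec3{0, 0, 1};
        Vec3 t = normalize(cross(up, r));   // tangent
        Vec3 b = cross(t, r);               // bitangent; (t, r, b) is orthonormal
        // Columns (t, r, b) form the rotation matrix; (0,1,0) maps exactly to r.
        return { s.x*t.x + s.y*r.x + s.z*b.x,
                 s.x*t.y + s.y*r.y + s.z*b.y,
                 s.x*t.z + s.y*r.z + s.z*b.z };
    }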
  13. Portals and PVSs can be used together to good effect, or be used completely separately. Let's go through the combinations:
      Portals (only):
      - Visibility is computed from the camera /point/, tending to improve accuracy.
      - Visibility is computed on the fly, removing the need for preprocessing and additional data structures.
      - Can (with difficulty) handle the dynamic-object-occluding-static-object case.
      PVS (only):
      - Does not in general need a portal/cell subdivision of the scene, making it useful for outdoor scenes, forests, indoor scenes with a gazillion visible portals (this kills the efficiency of portal based algorithms), etc.
      - Can process offline, so it can spend minutes to hours chugging away at generating optimal (for a view cell) visibility.
      - Does not in general handle the dynamic-object-occluding-static-object case (but does handle static covers static, and static covers dynamic).
      PVS + Portals:
      - Uses portals to compute the PVS for each room.
      - It can preprocess the anti-penumbra of the portals so that only the visible PARTS of the cells are rendered (usually too expensive to do at runtime).
      - The set of visible cells is predetermined, so little run-time clipping is needed.
      - Portals can be combined with PVS so that the PVS is cut down to the camera POINT subset of the PVS, while still only considering the PARTS of the cell that are potentially visible from the camera view cell. The net result is higher accuracy than normal portal rendering, at the cost of preprocessing the scene.
      - Using portals to generate the PVS is typically much easier/more-accurate/faster than computing the PVS for generic triangle based scenes.
      Realtime portals paper
      Portals+PVS papers/thesis (go to 1992)
      Non-portal PVS papers/thesis
  14. The operation you give there doesn't project a 3D vertex onto a 2D plane. It projects a 4D homogeneous point onto the 3D hyperplane w=1. To actually apply the camera perspective you first need to apply a perspective transform matrix (see the last paragraph).
      Homogeneous coordinates really just allow infinitely far ("ideal") points to be represented. Working in a space that supports these allows you to perform linear transforms in 4D (or one dimension up, generally) that map to non-linear transforms in 3D (or one dimension lower, generally), once the division by w is performed. For example, if you shear x, y and z against w in the "4D" homogeneous space, you effect a translation in "3D" (quotes are used since a point in such a homogeneous space is not technically 4D: points that differ only by a positive scalar multiple are identified).
      The "perspective transform" matrix that is commonly used (google for it) is also not actually a projection. It performs a 4D linear transform that warps the space so that points near the view plane scale larger in X and Y than points at the far plane, effectively warping a frustum into a cuboid. A true projection (one that is idempotent) would lose the depth information, whereas this sort of transform retains it for use in the z-buffer (or w-buffer). The actual true projection then comes in afterwards by simply omitting the z-value when rasterizing the triangle in 2D (although the z-value is still interpolated and written to the z-buffer).
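      A tiny numeric illustration of the shear-against-w point (plain C++, no graphics API assumed): the matrix is linear in homogeneous space, but after dividing by w it acts as a translation in 3D.

      #include <cstdio>

      int main()
      {
          float p[4] = { 2.0f, 3.0f, 4.0f, 1.0f };   // homogeneous point (w = 1)
          float t[3] = { 10.0f, 0.0f, -5.0f };       // desired translation

          // Row-major matrix that shears x, y, z against w:
          // [ 1 0 0 tx ]
          // [ 0 1 0 ty ]
          // [ 0 0 1 tz ]
          // [ 0 0 0 1  ]
          float q[4] = { p[0] + t[0]*p[3],
                         p[1] + t[1]*p[3],
                         p[2] + t[2]*p[3],
                         p[3] };

          // Projecting back onto the w = 1 hyperplane yields the translated 3D point.
          std::printf("(%g, %g, %g)\n", q[0]/q[3], q[1]/q[3], q[2]/q[3]);  // (12, 3, -1)
          return 0;
      }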
  15. why batch limits?

    No, DX9 (and earlier) batches them as well; this has been true for many, many years. One good way to increase your batch limit is to multi-thread your application so that your game logic, AI, physics, etc. happen on other cores. [Edited by - crowley9 on January 30, 2010 9:21:34 PM]