Community Reputation

194 Neutral

About Martin

  1. Both the sqrt and invsqrt functions involve moving data between integer and float registers; this may be slow on your modern target hardware, some consoles in particular. Additionally, instruction pipelines may well be deeper than they were when these functions were written, making incorrect branch predictions more expensive than they used to be. To profile functions, make hundreds of thousands of calls to them in a loop (you might want to vary the inputs), read a timer before the start of the loop (google QueryPerformanceCounter) and again at the end, and print out how long it took. My advice, if you're new to games programming: don't worry too much about the speed of the 'little things' until they show up in your profile. There is a whole world of interesting algorithms used in game development, which is where the real speed comes from, that and memory access patterns. When a little maths function appears in your profile you can usually optimise it out; if not, at least you have a context and test case for trying to optimise the maths function itself. That is of course unless it's optimising the little things which really interests you. In my current engine we do have approximate versions of a number of standard maths functions which are quicker on some platforms, and we have a unit test for each which fails if the approximate version is not faster (or does not produce the expected results).
  2. If you implemented tessellation with displacement mapping you wouldn't need geo-mipmapping; e.g. you would only need to transmit your lowest LOD (2x2?), tessellate that, and displace the vertices. I think if you're using DX11 there is no need to combine LOD with tessellation: let tessellation do all the heavy lifting on chip and use all the graphics card's bandwidth for textures etc. instead of pulling in lots of vertices (not that that's really much of a bandwidth saving, but it is some).
  3. Quote: Original post by KulSeran, replying to my earlier line "One of the reasons I need this is because there are no good tools for profiling multiple threads on consoles": "I find that hard to believe. PIX on the 360 is excellent. Sony has equally in-depth profilers we've used at work, though I've never personally used them." PIX on 360 is excellent; however, it has a few failings, and profiling threads is one of them.
  4. Hi all, thanks for the replies. One of the reasons I need this is that there are no good tools for profiling multiple threads on consoles; I simply want the PC version of my engine to work consistently with the console version. I'm working to try and do away with the concept of 'the main thread': there are simply tasks, and threads available to service them. Losing an entire core to timing would be extremely painful and would skew performance metrics more than inaccurate timers would. Good to hear that problems with QueryPerformanceCounter aren't common on multi-core machines. I might be able to write some code which detects when there are issues and informs the user that profiles are suspect on that PC. Thanks, Martin
  5. Hi, it's been some time since I've done any programming in PC land. My problem is that I have a multi-threaded engine running on PC, and any thread may request to read the performance counter at any time. Using RDTSC is clearly a no-go: it counts CPU cycles on the host core, variable-rate CPU frequencies are problematic, and so is a thread switching onto a different core. Clock functions are way too low resolution, and QueryPerformanceCounter is only OK as long as all calls are made from the same thread. The idea, then, is to kick off a thread which just sits there waiting to be asked to make a call to QueryPerformanceCounter; it then uses a 64-bit atomic exchange instruction to write the result back to the calling thread. Easy enough; the issue is how to do this fast without the timing thread taking up a lot of CPU resources running all the time. Waiting the thread on an event would work, but it could take considerable time to wake up, so the event signal would be slow and the timing would be inaccurate. (It is used in an inbuilt profiler as well as for other operations.) Has anyone seen any good articles on this subject / have any experience or thoughts to share before I go off reinventing the wheel? Many thanks in advance, Martin
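One portable sketch of the dedicated-timer-thread idea above: a worker thread continually publishes a monotonic timestamp through a 64-bit atomic, so any thread can read a consistent time without calling the OS timer itself. This is a sketch under assumptions, not the engine's actual code: `SharedTimer` is a hypothetical name, `std::chrono::steady_clock` stands in for QueryPerformanceCounter, and the spin-with-yield loop deliberately exposes the CPU-cost/latency trade-off the post worries about.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical sketch: a dedicated thread publishes elapsed microseconds
// through a 64-bit atomic; readers on any core just load the atomic.
class SharedTimer {
public:
    SharedTimer() : running_(true), ticks_(0),
                    worker_(&SharedTimer::Run, this) {}
    ~SharedTimer() {
        running_.store(false);
        worker_.join();
    }
    // Safe to call from any thread at any time.
    std::int64_t Now() const { return ticks_.load(std::memory_order_acquire); }

private:
    void Run() {
        const auto start = std::chrono::steady_clock::now();
        while (running_.load(std::memory_order_relaxed)) {
            auto now = std::chrono::steady_clock::now();
            std::int64_t us =
                std::chrono::duration_cast<std::chrono::microseconds>(now - start)
                    .count();
            ticks_.store(us, std::memory_order_release); // atomic 64-bit publish
            // Trade-off: yielding keeps latency low but burns CPU; sleeping or
            // waiting on an event saves CPU but makes the timestamp stale.
            std::this_thread::yield();
        }
    }
    std::atomic<bool> running_;
    std::atomic<std::int64_t> ticks_;
    std::thread worker_;
};
```

The design choice is exactly the tension in the post: a busy worker gives fresh timestamps but eats a core, while an event-driven worker is cheap but wakes up too slowly for profiling-grade accuracy.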
  6. I started with a canyon racer on the BBC Micro. The walls of the canyon were generated by Brownian motion (I didn't know it was Brownian motion at the time, but I do now) with a minimum width. There were rocks to avoid and gates to go between for points. Over time the minimum width reduced and the number of rocks to avoid increased. Red gates increased your score while yellow gates increased your score multiplier (it was 4-colour mode: white, black, red and yellow); failure to make a yellow gate reset the score multiplier. I still think this was a good first game, though it's a bit harder to make it impress these days; people would expect nice canyon walls, per-pixel collision, nice water etc. I agree Tetris is a good idea; I don't think it had been invented back then though (I am showing my age here, not good). Other games from the 8-bit era are good starters: Millipede, Painter etc. Cheers, Martin
  7. Apologies, my last post was incorrect, but I deleted it too late. I see what you're getting at now; I think there's an error in the code:
  float s = D3DXVec3Dot(&dir, &D3DXVECTOR3(0,0,1));
  if( s <= 0.001f ) // then we have a problem :D
  should be:
  float s = D3DXVec3Dot(&dir, &up);
  if( s >= (1.0f - 0.001f) ) // then we have a problem :D
  The perpendicular vector solution would still solve your problem / remove the need for this test.
  8. I think you misread my post; I'm not talking about changing your 'dir' vector, just making a different choice for your 'up' vector: something which is guaranteed always to work.
  9. Yes, the ever-popular look-at algorithm becomes increasingly unstable as the look-at direction approaches +/- the up vector. Because your cone is invariant under z-axis rotation, you can use any vector perpendicular to your 'dir' vector (there are an infinite number) as your 'up' vector. This should solve your problem; there's no way this will break down or destabilise.
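One common way to build such a perpendicular vector is to cross 'dir' with whichever world axis it is least aligned with, which can never degenerate. A minimal sketch, with a hypothetical `Vec3` type and `Perpendicular` helper (the original thread used D3DX types):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Return a unit vector perpendicular to 'dir' (assumed non-zero).
// Crossing with the world axis least aligned with 'dir' guarantees the
// cross product is never close to zero, so this cannot destabilise.
Vec3 Perpendicular(const Vec3& dir) {
    // Pick x-axis or z-axis, whichever has the smaller component in 'dir'.
    Vec3 axis = (std::fabs(dir.x) < std::fabs(dir.z)) ? Vec3{1, 0, 0}
                                                      : Vec3{0, 0, 1};
    Vec3 p{dir.y * axis.z - dir.z * axis.y,   // cross(dir, axis)
           dir.z * axis.x - dir.x * axis.z,
           dir.x * axis.y - dir.y * axis.x};
    float len = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
    return Vec3{p.x / len, p.y / len, p.z / len};
}
```

Any such vector works as the 'up' input because the cone is rotationally symmetric about 'dir'.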
  10. To avoid shimmer due to aliasing, the grid points must keep sampling the same points on the height texture. That is, for any given resolution of the grid, the sample points move only in whole multiples of the sample spacing at that resolution. The grid therefore does move in relation to the viewer, but only by a small amount; when the sample points change, the grid is re-centred about the camera.
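The re-centring rule above amounts to snapping the grid origin to the texel grid. A minimal sketch, with an illustrative `SnapToGrid` helper (the names are assumptions, not from the original post):

```cpp
#include <cmath>

// Snap a camera coordinate down to a whole multiple of the height-texture
// sample spacing, so grid vertices always land on the same texels and the
// heightfield does not shimmer as the camera moves.
float SnapToGrid(float cameraPos, float sampleSpacing) {
    return std::floor(cameraPos / sampleSpacing) * sampleSpacing;
}
```

The grid origin only jumps when the camera crosses a texel boundary, which is exactly the "moves in whole multiples" behaviour described.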
  11. You're quite right: it can't really handle finite terrain such as islands; the common solution is to model the sea floor as well. To handle finite terrain you would need, for a given data point, to specify x, y and z so that the triangulation could collapse to degenerate tris; this would obviously be quite wasteful for the case you want to optimise, being on land.
  12. I would recommend looking at the separating axis theorem; no, I am not familiar with using Voronoi regions for collision. Are we talking about convex hulls?
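For context, the core of the separating axis test is just projecting both convex shapes onto a candidate axis and checking the intervals for overlap. A minimal 2D sketch under assumed names (`Vec2`, `SeparatedOnAxis`); a full implementation would also enumerate the candidate axes (face normals, and edge cross products in 3D):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Vec2 { float x, y; };

static float Dot(const Vec2& a, const Vec2& b) { return a.x * b.x + a.y * b.y; }

// Project every vertex of a convex polygon onto an axis; return [min, max].
static std::pair<float, float> Project(const std::vector<Vec2>& poly,
                                       const Vec2& axis) {
    float lo = Dot(poly[0], axis), hi = lo;
    for (const Vec2& v : poly) {
        float d = Dot(v, axis);
        lo = std::min(lo, d);
        hi = std::max(hi, d);
    }
    return {lo, hi};
}

// True if the axis separates the two polygons (their projected intervals
// do not overlap). If any candidate axis separates them, they don't collide.
bool SeparatedOnAxis(const std::vector<Vec2>& a, const std::vector<Vec2>& b,
                     const Vec2& axis) {
    auto ia = Project(a, axis);
    auto ib = Project(b, axis);
    return ia.second < ib.first || ib.second < ia.first;
}
```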
  13. Load-time or background-task procedural content generation is very cool. It saves on disc space (good for downloadable and user-generated-content games) and it easily scales to the system resources available. Depending on how fast you can generate the content, it might also be faster than reading the data, at least from optical media (important for us console developers).
  14. CPU look-up tables have exactly the same issue. You have to ask yourself: does the ALU time you save outweigh the cost of fetching the table through the data cache? If the table is frequently used it may already be in the cache; if not, it's an expensive fetch from main memory.
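To make the trade-off concrete, here is the classic form of such a table: a sine lookup with linear interpolation, swapping the ALU cost of `std::sin` for one (hopefully cache-resident) array fetch. The class name and sizes are illustrative, not from the post; whether this ever wins depends on the cache behaviour described above.

```cpp
#include <cmath>
#include <vector>

// Illustrative sine look-up table with linear interpolation between entries.
class SinTable {
public:
    explicit SinTable(int size = 1024) : table_(size + 1) {
        for (int i = 0; i <= size; ++i)
            table_[i] = std::sin(2.0f * 3.14159265f * i / size);
    }
    // Approximate sin(radians) for any input, wrapping into one period.
    float Lookup(float radians) const {
        int size = static_cast<int>(table_.size()) - 1;
        float t = radians / (2.0f * 3.14159265f);
        t -= std::floor(t);                // wrap into [0, 1)
        float f = t * size;
        int i = static_cast<int>(f);
        float frac = f - i;
        // Linear interpolation between neighbouring table entries.
        return table_[i] * (1.0f - frac) + table_[i + 1] * frac;
    }
private:
    std::vector<float> table_;
};
```

A 1024-entry float table is 4 KB: small enough to stay resident if it is hit frequently, but a cold fetch from main memory would dwarf the few ALU operations it saves.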
  15. There isn't a clear-cut answer, unfortunately. Texture sampling isn't a fixed cost; factors include:
  1. Texture access pattern (how cache friendly is it?)
  2. Format of the texture being sampled (floating point will be more expensive etc.)
  3. Type of sampling being used (point, bilinear, trilinear etc.)
  4. Type of texture being used (volume will be more expensive than cube / normal)
  5. Size of the texture (in bytes)
  6. Whether or not mip mapping can be used to reduce texture cache thrash
  7. What other texture reading is going on at the same time, competing for the cache
  8. Whether texture reads are dependent on each other
  ALU time isn't a fixed cost either: how expensive is the thing you're wanting to compute? Whether computation will beat texture reading therefore depends on many factors:
  1. How expensive the texture read is
  2. How much ALU time the texture read saves you (in the case of the old normalising cube maps, these days, not a lot)
  In answer to your question, there is no definitive answer to which is better; results will vary with different algorithms and different graphics cards. If you're texture bound, convert texture lookups to ALU computation; if you're ALU bound, convert ALU computation to texture lookups.