Jump to content
  • Advertisement


  • Content Count

  • Joined

  • Last visited

Community Reputation

324 Neutral

About c-up

  • Rank
  1. It doesn't use CUDA or DirectX, it's OpenCL and OpenGL. I've just installed Ubuntu on my PC at home so watch this space.
  2.   Well, I don't like humans much, so good   Are you saying you'd use it if I did a Linux version? That can be arranged.
  3. Thanks Kryzon, that's put a big smile on my grumpy old face :-)   The 3rd one is my favourite - what font is it? However, I do also like the way the UP is superscripted in the first two, and they'll fit better on the website.   Which one do you prefer?   I'll put something on the website when I get a chance (it's milestone week at real work so it'll have to wait 'till the weekend.)
  4. Debug > Attach to process...
  5. Hi folks,   As promised version 0.3 has just been uploaded to the website with a full source code release.   This version has libraries for audio (OpenAL) and networking (based on sockets but wrapped in classes.)   It also has support for touch input (see the OpenGL sample) and improvements to the way generics are implemented.   Please let me know if you have any problems or other feedback.   Thanks.
  6. Using normal vs velocity as you describe won't work if you're still allowing intersection and then trying to resolve it after the fact I'm afraid, because what if the character jumps just before he reaches the block? Even if he doesn't jump he might still have a tiny amount of upward velocity due to floating point inaccuracy. This might cause you to put an epsilon in, which could then fail in some other subtle case.   Using continuous collision would be pretty straightforward if you only have axis aligned boxes and is possibly the only general solution to this problem. However, if you also have irregular collision shapes then you're likely to get into a world of floating point rounding hurt.   Another random idea - if you're only using rectangles then you could say that the top of the moving box starts off higher than the bottom of the other box and therefore can't have hit, but this only works for aligned boxes I think.   I also actually think your other idea of checking the x axis then the y axis could have legs. Of course it'll also fail if you then drop from above very fast onto a gap that isn't wide enough to fit down!   Designing around the issue is desirable but might not be possible if you want moving platforms or sliding doors, etc.   Wish I could give you a definitive solution but this is indeed quite a tricky problem.
  7. c-up of course - because it's an extra life for c, or a power-up if you prefer. http://www.c-up.net
  8. It just occurred to me reading this that when I talk about releasing source code I just mean the c++ that comprises the compiler itself and runtime systems (garbage collector, job manager, etc.)   All of the actual C-UP source code is already available as it's embedded in the object files that are part of the download. It can be extracted out into individual files using the 'extract' command as described in the user guide. This includes source for things like the rigid body library, the debugger itself, collections, parallel collections, string formatting, opengl/cl bindings, etc.   You can also just use the debugger to browse source code if you wish (and even view the generated assembly language interleaved with the source.)
  9. Been a while since I wrote a platformer but the main reason not to apply gravity when on the ground is to prevent your character gradually sliding down slopes.   Whether you see this issue in the presence of slopes depends upon how you respond to colllisions of course: - if you just say he's hit a 'floor' and move him straight up to resolve the collision then you won't see this problem - if you move him in the direction of the normal of the surface then he will slide unless you also implement a friction model - however, a general friction model can be difficult to get feeling precise enough for a platformer   In my opinion the no gravity option is easier to work with for getting very precise platforming mechanics but the gravity always on version can give you nicer secondary effects like little skids and bounces when you land a jump.   If all of your platforms are flat (i.e. no slopes) then there's not much in it.
  10. c-up

    Sweep AABB Collision/Response

    It's a pretty low brow solution but if you only have 3 moving bodies (a puck and 2 paddles) then you could just run the physics with very small time steps.
  11. I keep mentioning shared memory. Here's how the scatter part of RadixSortP above might look if shared memory was used to parallelise it. Note the shared flag on the two output arrays of RadixScatter. All shared does is prevents the job manager making a dependency between this shared memory and any other shared memory, which matters in this case because we're passing the entire itemsAlt and prefixes arrays to each call to RadixScatter and without shared the calls would all depend on each other and be serialised.   I haven't compiled the code below but it should illustrate the concept.   // parallelism occurs at function scope, so we must introduce a new scatter function parallel void RadixScatter<T>(shared T[] local itemsOut, shared int[] local prefixes, const(T)[] local itemsIn, int shift) { foreach (const(T)& item; itemsIn) { // AtomicAdd returns the old value int o = AtomicAdd(&prefixes[(item.RadixSortKey << shift) >> 24], 1); itemsOut[o] = *item; } } // and here's the modified scatter part in RadixSortP int split = items.Length / 4; // arbitrarily split 4 ways parallel { int s = 0; for (int i = 0; i < 3; i++, s += split) RadixScatter<T>(itemsAlt, prefixes, items[s .. s + split], shift); // remainder RadixScatter<T>(itemsAlt, prefixes, items[s .. items.Length], shift); }
  12. Hodgman mentioning radix sort got me wondering so I've knocked up an implementation.   On my 2 core laptop (AMD E-450 APU 1.65GHz), sorting 1024*1024 items takes: 1 core = 678ms 2 core = 368ms (1.84x speedup)   On my 4 core desktop (AMD Athalon 2 X4 630 2.8GHz), sorting the same number of items takes: 1 core = 333ms 2 core = 190ms (1.75x) 3 core = 136ms (2.45x) 4 core = 106ms (3.14x)   Pretty reasonably speedups.   Anyway, here's the radix sort code. It recursively subdivides the problem to generate the parallelism as described here: http://software.intel.com/sites/default/files/m/b/3/6/3/8/22396-ParallelRadixSort_andreyryabov.pdf   It doesn't do parallel scatter as Hodgman suggested for reasons commented on in the code. I'd be interested to know if there's a better approach than I took.   As well as it being parallel internally you can also run multiple (non-overlapping) sorts in parallel if you need to.   // parallel radix sort // in-place stable sort of items array into ascending key order // type to be sorted must have RadixSortKey uint property // you can pass in temporary workspace otherwise it will allocate one internally on the stack public parallel void RadixSort<T>(T[] local items, T[] local itemsTemp = null) { // workspace required for double buffering T[] local itemsAlt = itemsTemp; if (!itemsAlt) itemsAlt = new local T[items.Length];// = void; // = void prevents zeroing the allocated memory (in theory - currently crashes compiler) if (items.Length > 4096) RadixSortP<T>(items, itemsAlt, 0); else RadixSortNP<T>(items, itemsAlt, 0); } private parallel void RadixSortP<T>(T[] local items, T[] local itemsAlt, int shift) { // calculate prefix values int[] local prefixes = new local int[256]; // if lots of keys then calculate prefixes in parallel if (items.Length >= 32768) { int split = items.Length / 4; int[] local [4] prefixesPar; for (int i = 0; i < 4; i++) prefixesPar = new local int[256]; parallel { int s = 0; for (int i = 0; i < 3; i++, s += split) RadixSortCountPrefixes<T>(prefixesPar, items[s .. s + split], shift); // remainder RadixSortCountPrefixes<T>(prefixesPar[3], items[s .. items.Length], shift); } // sum the prefixes using simd add (stack allocated arrays are at least 16 byte aligned) foreach (int[] local pfxPar; prefixesPar) { int8[] local pfxSrc8 = cast(int8[] local)pfxPar; int8[] local pfxDst8 = cast(int8[] local)prefixes; foreach (int8& pfx, i; pfxSrc8) pfxDst8 += *pfx; } } // if not a large number of prefixes do them in serial else { RadixSortCountPrefixes<T>(prefixes, items, shift); } // prefix is sum of all previous values int sum = 0; foreach (int& prefix, i; prefixes) { int newSum = sum + *prefix; *prefix = sum; sum = newSum; } // scatter values out to destination array // can be done in parallel if shared memory is used but: // - you'd need to use atomic increments which would kill performance if you have lots of the same key // - would make the sort unstable // (maybe there's another way to partition the work I'm not seeing that avoids these issues) foreach (T& item; items) { int o = prefixes[(item.RadixSortKey << shift) >> 24]++; itemsAlt[o] = *item; } // done? if (shift < 24) { // now sort the sub-arrays in parallel if they're reasonably large or non-parallel if they're not parallel { int s = 0; for (int i = 0; i < 256; i++) { int e = prefixes; if ((e - s) > 4096) RadixSortP<T>(itemsAlt[s .. e], items[s .. e], shift + 8); else if ((e - s) > 1) RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8); else if ((e - s) > 0) items[s] = itemsAlt[s]; s = e; } } } } // non-parallel version called by the master version above when arrays are sufficiently small private void RadixSortNP<T>(T[] local items, T[] local itemsAlt, int shift) : parallel { // calculate prefix values int[256] prefixes; RadixSortCountPrefixes<T>(&prefixes, items, shift); // prefix is sum of all previous values int sum = 0; foreach (int& prefix, i; prefixes) { int newSum = sum + *prefix; *prefix = sum; sum = newSum; } // scatter values out to destination array foreach (T& item; items) { int o = prefixes[(item.RadixSortKey << shift) >> 24]++; itemsAlt[o] = *item; } // if not done sort the sub-arrays if (shift < 24) { int s = 0; for (int i = 0; i < 256; i++) { int e = prefixes; if ((e - s) > 1) RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8); else if ((e - s) > 0) items[s] = itemsAlt[s]; s = e; } } } // radix sort helper parallel void RadixSortCountPrefixes<T>(int[] local prefixes, const(T)[] local items, int shift) { foreach (const(T)& item; items) prefixes[(item.RadixSortKey << shift) >> 24]++; }   And here's the little testbed I used to generate the results at the top:   import System.IO.Console; uint randSeed = 0; public void SeedRand(uint seed) { randSeed = seed; } public uint Rand() { randSeed = (randSeed * 1103515245 + 12345) & int.MaxValue; return randSeed; } public struct RadixItem { private uint key; private int value; public this() { key = Rand(); value = Rand(); } public uint RadixSortKey() const : parallel { return key; } } public int Main(string[] args) { SeedRand(12345); static const int numItems = 1024 * 1024; static const int numRuns = 10; RadixItem[] items = new RadixItem[numItems]; RadixItem[] itemsTmp = new RadixItem[numItems]; // uncomment this line to set number of threads job queue can use (it clamps internally to 1 .. number of physical cores on machine) //System.Profile.JobQueue.SetNumThreads(2); // run radix sort performance test Console.WriteLine("Timing radix sort of %d items", &numItems); ulong totalCycles = 0; // 10 runs for (int r = 0; r < numRuns; r++) { foreach (RadixItem& item; items) *item = RadixItem(); ulong startCycle = GetProcessorCycleCount(); RadixSort<RadixItem>(items, itemsTmp); ulong endCycle = GetProcessorCycleCount(); totalCycles += endCycle - startCycle; // check sorted for (int i = 1; i < items.Length; i++) Assert(items.RadixSortKey >= items[i - 1].RadixSortKey); } // report average time taken float totalMs = cast(float)totalCycles / cast(float)GetProcessorCyclesPerSecond() / cast(float)numRuns * 1000.0f; Console.WriteLine("Average time taken over %d runs = %.2fms", &numRuns, &totalMs); return 0; }
  13. I've fixed all of the issues you raised. The new version is on the website (I just overwrote the existing one.)   The crash was a divide by zero when the debugger was minimized (yes, it has been that well tested.)   Files will open in the debugger if they have one of these extensions: .cup .cl .frag .vert .glslv .glslf .glv .glf .txt   A couple of other things have changed: - The optimiser works now (at least on all the tests I've run). It doesn't do that much optimisation yet, but some. - Windows raw input is used instead of DirectInput so input.dll is no longer needed and you don't need to call Input.Init/Shutdown/Poll any more.   Thanks again, Si
  14.   Please ignore this comment - it's total garbage. Tell me what file extensions you use for your shaders and I'll make it happen though.
  15. Thanks for the feedback and compliment.   Not sure what you mean by the out flag ignoring case. Do you mean that the file that gets written doesn't have the same case you passed into the option?   If you embed your shader files as resources in the program then you will be able to view them. Pass them in as input files to the compiler and then reference them in program with r"MyShader.glsl" (the 'r' prefix being the important thing.) I do actually plan to make the compiler build shaders that are passed in this way so you can see errors of them without having to run your program. The only reason I haven't done this yet is because in Windows you have to make a window before you can run the shader compiler (I think) which is a pain.   At least you got a callstack  but I'll put in a proper error.   Any problems you get please let me know and I'll try to fix them asap.
  • Advertisement

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!