nightcracker

  1. Simple and fast random number generator

    I forgot about this topic, and it has turned out to be very interesting! More people are interested than I thought. I thought I had already done so, but I forgot to link the Tyche-i paper, which can be found here: http://eden.dei.uc.pt/~sneves/pubs/2011-snfa2.pdf
  2. Hey everyone,

     I was not impressed with the quality, the speed or the usability of the default C rand() function, so I took the best non-cryptographic RNG I know of and implemented it.

     Here it is for your enjoyment! Usage is straightforward, and it is between 5 and 12 times as fast as rand(), depending on the situation (inlining, register pressure, etc.). Memory usage is low - only 16 bytes per RNG (and you usually only need one).
[code]
#include <cstdint>
#include <ctime>

class FastRNG {
    /* Implementation of the Tyche-i RNG by Samuel Neves and Filipe Araujo.
       Period is expected to be 2^127, with a very good uniform distribution. */
public:
    /* If no seed is passed to the constructor it will use the current time
       in seconds as the seed. */
    FastRNG(uint64_t seed = time(0)) {
        a = seed >> 32;          /* upper 32 bits of the seed */
        b = seed & 0xffffffffu;  /* lower 32 bits of the seed */
        c = 2654435769u;
        d = 1367130551u;

        for (int i = 0; i < 20; ++i) {
            randint();
        }
    }

    /* Returns a random integer between 0 and 2^32 - 1, inclusive. */
    uint32_t randint() {
        b = rotl32(b, 25) ^ c;
        d = rotl32(d, 24) ^ a;
        c -= d;
        a -= b;
        b = rotl32(b, 20) ^ c;
        d = rotl32(d, 16) ^ a;
        c -= d;
        a -= b;

        return b;
    }

    /* Returns a number between min and max, inclusive. Small statistical
       biases might occur using this method - expected bias is < 2^-32. */
    int32_t randrange(int32_t min, int32_t max) {
        if (min > max) {
            int32_t t = min;
            min = max;
            max = t;
        }

        return min + uint32_t((randint() * uint64_t(max - min + 1)) >> 32);
    }

    /* Returns a random double between 0.0 and 1.0 inclusive. */
    double rand() {
        return randint() * (1. / 4294967296.);
    }

private:
    uint32_t a, b, c, d;

    static inline uint32_t rotl32(uint32_t x, const int n) {
        return (x << n) | (x >> (32 - n));
    }
};
[/code]
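     For anyone who just wants to drop this in, here is a minimal usage sketch (assuming the FastRNG class above is in the same file; the variable names are only for illustration):
[code]
#include <iostream>

int main() {
    FastRNG rng;           /* seeded with the current time */
    FastRNG fixed(12345);  /* or pass an explicit 64-bit seed for reproducibility */

    uint32_t raw  = rng.randint();        /* full 32-bit random value */
    int32_t  roll = rng.randrange(1, 6);  /* inclusive range, like a die roll */
    double   unit = rng.rand();           /* in [0.0, 1.0] */

    std::cout << raw << " " << roll << " " << unit << "\n";
    return 0;
}
[/code]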
  3. Extremely fast sin approximation

    [quote name='Cornstalks' timestamp='1331306306' post='4920684']
    I just profiled this and I'm getting std::sin to be about the same as your fast_sin.

    Profile results:
[code]
--------+----------------------------------------------
Samples | Line of code
     43 | float fast = static_cast<float>(fast_sin(d));
     43 | float standard = std::sin(d);
        |
    395 | double fast = fast_sin(d);
    402 | double standard = std::sin(d);
--------+----------------------------------------------
[/code]
    Profiler Settings:
[code]
/include:fast_sin
/include:std::sin
Profiling collection: CPU sampling
[/code]
    Project Optimization Settings:
[code]
Release
Maximize Speed (/O2)
Enable Intrinsic Functions: Yes (/Oi)
Favor Size or Speed: Favor faster code (/Ot)
(everything else are defaults for a Visual Studio Release build)
[/code]
    Testing Setup:
[code]
Visual Studio 2010 Ultimate
Microsoft Windows 7 64-bit
Intel Core 2 Duo 2.66 GHz MacBook Pro5,1
No other applications were open
[/code]
    Code used in profiling:
[code]
#include <iostream>
#include <cmath>
#include <algorithm> // for std::max

double fast_sin(double x)
{
    int k;
    double y;
    double z;

    z = x;
    z *= 0.3183098861837907;
    z += 6755399441055744.0;
    k = *((int *) &z);
    z = k;
    z *= 3.1415926535897932;
    x -= z;
    y = x;
    y *= x;
    z = 0.0073524681968701;
    z *= y;
    z -= 0.1652891139701474;
    z *= y;
    z += 0.9996919862959676;
    x *= z;
    k &= 1;
    k += k;
    z = k;
    z *= x;
    x -= z;
    return x;
}

void runTestDouble()
{
    double max = 0.0;
    double start = -10000.0;
    double end = 10000.0;
    double step = 0.00001;

    double d = start;
    while (d < end)
    {
        double fast = fast_sin(d);
        double standard = std::sin(d);
        double error = std::abs(fast - standard);
        max = std::max(max, error);
        d += step;
    }

    std::cout << "Double max error: " << max << std::endl;
}

void runTestFloat()
{
    float max = 0.0f;
    float start = -900.0f;
    float end = 900.0f;
    float step = 0.001f;

    float d = start;
    while (d < end)
    {
        float fast = static_cast<float>(fast_sin(d));
        float standard = std::sin(d);
        float error = std::abs(fast - standard);
        max = std::max(max, error);
        d += step;
    }

    std::cout << "Float max error: " << max << std::endl;
}

int main()
{
    runTestDouble();
    for (int i = 0; i < 100; ++i)
    {
        runTestFloat();
    }
}
[/code]
    Program results:
[code]
Double max error: 0.000296046
Float max error: 0.000296056
[/code]
    Yes, the other parts of the code dominate the execution time/CPU samples (especially std::abs()), but I don't see a significant difference between the two functions, and certainly not a factor of ~8.3.
    [/quote]
    That is weird. I profiled with gcc -O3, with MinGW. It might be that glibc's sin is incredibly slow compared to VC's, I don't know.
    Either way, this is how I profiled (it uses stopwatch from [url="https://github.com/nightcracker/commonc/blob/master/src/stopwatch.c"]https://github.com/n...src/stopwatch.c[/url]):
[CODE]
#include <stdio.h>
#include <math.h>

#include "commonc/stopwatch.h"

double fast_sin(double x)
{
    int k;
    double y;
    double z;

    z = x;
    z *= 0.3183098861837907;
    z += 6755399441055744.0;
    k = *((int *) &z);
    z = k;
    z *= 3.1415926535897932;
    x -= z;
    y = x;
    y *= x;
    z = 0.0073524681968701;
    z *= y;
    z -= 0.1652891139701474;
    z *= y;
    z += 0.9996919862959676;
    x *= z;
    k &= 1;
    k += k;
    z = k;
    z *= x;
    x -= z;
    return x;
}

int main(int argc, char **argv)
{
    volatile double prevent_opti;
    int i;
    double d;
    cc_stopwatch_t timer;

    printf("Testing fast_sin\n");

    /* performance check */
    {
        cc_stopwatch_start(&timer);
        for (i = 0; i < 10000; i++) {
            for (d = -16*M_PI; d < 16*M_PI; d += 0.1) {
                prevent_opti = fast_sin(d);
            }
        }
        printf("fast_sin time: %f\n", cc_stopwatch_gettime(&timer));

        cc_stopwatch_start(&timer);
        for (i = 0; i < 10000; i++) {
            for (d = -16*M_PI; d < 16*M_PI; d += 0.1) {
                prevent_opti = sin(d);
            }
        }
        printf("sin time: %f\n", cc_stopwatch_gettime(&timer));
    }

    (void) prevent_opti;
    return 0;
}
[/CODE]
    Giving me these results (consistently, not just once):
[code]
fast_sin time: 147.236370
sin time: 1252.330147
[/code]
    I used GCC 4.6.1, 64-bit Windows 7 SP1, and a 2.2GHz single-core Celeron.
  4. Extremely fast sin approximation

    This is an extremely fast sin approximation I have been working on for a while, and I thought I'd share it. Before getting to the code and how I derived this approximation, let's start off with some data:
[code]
fast_sin time: 148.4ms
sinf time: 572.7ms
sin time: 1231.2ms

Worst error: 0.000296
Average error: 0.000124
Average relative error: 0.02%
[/code]
    As you can see, this approximation is around 3.9 times as fast as sinf and 8.3 times as fast as the default sin. This was tested with all optimizations turned on. The worst error is 0.000296, meaning that this approximation is very usable. On top of that, at the points 0, pi/6 and pi/2 (and multiples of those) fast_sin gives exactly correct results.

    Now, this isn't all magic and ponies; there are of course downsides. The biggest speed-up comes from using a fast truncate (more about this later). This trick relies on IEEE754 and on the FPU being in double-precision mode. If either of those two isn't available the trick cannot be used and we have to do a slow truncate (a cast to int). That slows the routine down by about a factor of 2. It's still a lot faster, but not as fast as it could be.

    What implications does this have? Well, for most of us: none. Here on gamedev we mostly target Windows and sometimes Mac & Linux. All of those are little-endian. And if you don't mess around with your FPU it is most likely in double-precision mode.

    Alright, now let's see the code:
[code]
/* uncomment the next line if you're on a big-endian system */
/* #define BIG_ENDIAN */

/* uncomment the next line if you can not assume a double-precision FPU or IEEE754 */
/* #define NO_FAST_TRUNCATE */

/* we need to do some custom hacking for MSVC */
#ifdef _MSC_VER
    typedef __int32 int32_t;
#else
    #include <stdint.h>
#endif

inline int32_t fast_round(double x) {
#ifndef NO_FAST_TRUNCATE

    /* http://stereopsis.com/sree/fpu2006.html */
    const double MAGIC_ROUND = 6755399441055744.0;

    union {
        double d;

        struct {
#ifdef BIG_ENDIAN
            int32_t hw;
            int32_t lw;
#else
            int32_t lw;
            int32_t hw;
#endif
        };
    } fast_trunc;

    fast_trunc.d = x;
    fast_trunc.d += MAGIC_ROUND;

    return fast_trunc.lw;

#else

    if (x < 0) {
        return (int32_t) (x - 0.5);
    } else {
        return (int32_t) (x + 0.5);
    }

#endif
}

inline double fast_sin(double x) {
    const double PI = 3.14159265358979323846264338327950288;
    const double INVPI = 0.31830988618379067153776752674502872;
    const double A = 0.00735246819687011731341356165096815;
    const double B = -0.16528911397014738207016302002888890;
    const double C = 0.99969198629596757779830113868360584;

    int32_t k;
    double x2;

    /* find offset of x from the range -pi/2 to pi/2 */
    k = fast_round(INVPI * x);

    /* bring x into range */
    x -= k * PI;

    /* calculate sine */
    x2 = x * x;
    x = x*(C + x2*(B + A*x2));

    /* if x is in an odd pi count we must flip */
    if (k % 2) x = -x;

    return x;
}
[/code]
    That probably doesn't seem very clear, with magic constants all over the place; I will explain those below. Note the two macros at the top: you should define BIG_ENDIAN if you're targeting a big-endian system, and if you cannot assume IEEE754 or a double-precision FPU you must define NO_FAST_TRUNCATE (which slows the method down).

    Alright, the explanation! First I will explain how I calculate the sine, with some high-school math. I got the idea from [url="http://devmaster.net/forums/topic/4648-fast-and-accurate-sinecosine/"]this devmaster thread[/url]. If you draw the sine function, you can see that on the range [-PI/2, PI/2] it looks like a regular fifth-degree polynomial.
    To approximate sin(x) I chose the formula [font=courier new,courier,monospace]f(x) = ax^5 + bx^3 + cx[/font]. In order to get a good approximation I set up an equation system using known points of sin(x). I set f(pi/2) to 1, f(pi/6) to 1/2, and the derivative of f at pi/2 to 0 (if the derivative is zero then the curve is "flat" at that point, which is exactly what we want). This gives the following equation system and solution:
[pre]
Equations:

f(x) = ax^5 + bx^3 + cx
f(pi/2) = 1
f'(pi/2) = 0
f(pi/6) = 1/2

Solution:

a = 9 / (4 * pi^5)
b = -41 / (8 * pi^3)
c = 201 / (64 * pi)
[/pre]
    However, we still have a major problem: this is only usable in the range [-PI/2, PI/2] and not outside of it! That could be solved with a modulus, but a modulus is way too expensive. Instead I divide x by PI (or rather multiply by 1/PI) and round the result to an integer. Then I subtract that integer times PI from the original x to simulate a modulo.

    We have two more problems. If x is in the range [PI/2, PI], sin(x) is negative, and so far we haven't accounted for that. The easiest solution is to re-use the integer calculated above: if that integer is odd, we are in the negative part of the sine function, and we must flip our result.

    Our last problem is rounding a double to an integer. In my original code I found that 50% of the time was spent in the [font=courier new,courier,monospace]round[/font] function. After searching the internet for a while I found [url="http://stereopsis.com/sree/fpu2006.html"]this page[/url] by Sree Kotay giving a solution, and it works beautifully, solving the final problem. The great part is that it doesn't rely on the standard library at all. This means it is very usable for demos and such (which often don't link against the standard library to save code size).

    And finally a "cool" version that has every comment and all the configurable stuff removed. This version only works in a little-endian, IEEE754, double-precision FPU environment (which is pretty much the default Windows environment), but the way it's written it is very easily converted to assembler:
[code]
double fast_sin(double x) {
    int k;
    double y;
    double z;

    z = x;
    z *= 0.3183098861837907;
    z += 6755399441055744.0;
    k = *((int *) &z);
    z = k;
    z *= 3.1415926535897932;
    x -= z;
    y = x;
    y *= x;
    z = 0.0073524681968701;
    z *= y;
    z -= 0.1652891139701474;
    z *= y;
    z += 0.9996919862959676;
    x *= z;
    k &= 1;
    k += k;
    z = k;
    z *= x;
    x -= z;
    return x;
}
[/code]
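    If you want to check the claims above yourself (exact values at 0, pi/6 and pi/2, and a worst error of roughly 0.000296), here is a minimal, hypothetical test harness; it assumes one of the fast_sin listings above is pasted into the same file:
[code]
#include <stdio.h>
#include <math.h>

/* paste one of the fast_sin() listings from above here */

int main(void)
{
    const double PI = 3.14159265358979323846;

    /* points where the approximation should be (near-)exact */
    printf("fast_sin(0)    = %.17g\n", fast_sin(0.0));
    printf("fast_sin(pi/6) = %.17g (expected 0.5)\n", fast_sin(PI / 6.0));
    printf("fast_sin(pi/2) = %.17g (expected 1.0)\n", fast_sin(PI / 2.0));

    /* scan a range and track the worst absolute error against the standard sin() */
    double max_error = 0.0;
    double x;
    for (x = -16.0 * PI; x < 16.0 * PI; x += 0.0001) {
        double error = fabs(fast_sin(x) - sin(x));
        if (error > max_error) {
            max_error = error;
        }
    }
    printf("worst error: %g\n", max_error);

    return 0;
}
[/code]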
  5. I am making a 2D graphics library in Python designed to make 2D rendering easy and fast. Internally it uses OpenGL. The library covers a bit more than just graphics; it also includes windowing and sound (and all the features necessary to get something working, like texture loading). I've got a good beginning working with good performance, and I'd like to expand and refine my feature set. Now I'm wondering: what features should a 2D graphics library have? After some brainstorming I made a little list:
[list]
[*]Fast sprite class for displaying rotated, scaled, colored and translated textures
[*]Sprite batching
[*]Drawing geometry: lines, polygons, circles, etc.
[*]Alpha with blending modes on all of the above
[*]Easy texture loading (from file and memory)
[*]Texture regions (and texture atlases)
[*]Blitting texture(regions) into other textures
[*]Particle systems
[*]Off-screen rendering surfaces (FBOs?)
[/list]
     But I feel I have only scratched the surface. What am I missing? One thing you must keep in mind is that all of this is in Python with C extensions. Any work done inside the library is fast; any work the user has to do in Python is relatively much slower. In addition, Python calls are quite expensive, so the library benefits greatly from batching. (In case anyone cares, the library in question is pygrafix and can be found here: http://github.com/nightcracker/pygrafix)
  6. Nope. After asking this on Stack Overflow too ([url="http://stackoverflow.com/questions/8681904/opengl-still-tries-to-blur-even-with-gl-nearest-gl-texture-2d"]http://stackoverflow.com/questions/8681904/opengl-still-tries-to-blur-even-with-gl-nearest-gl-texture-2d[/url]) it turns out to be a SOIL bug! Basically, if I pass SOIL_FLAG_POWER_OF_TWO to SOIL_create_OGL_texture it goes wrong; if I don't, it works (and it only works because my hardware apparently supports NPOT textures).
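     For reference, this is roughly the fix in code. It is only a sketch: "map.png" and the function name are placeholders, and it uses the SOIL_load_OGL_texture convenience wrapper rather than SOIL_create_OGL_texture, so check it against your own SOIL version:
[code]
#include <SOIL.h>

unsigned int load_map_texture(void)
{
    /* Passing SOIL_FLAG_POWER_OF_TWO rescales NPOT images to power-of-two
       sizes, which is presumably what blurred the map here:

       unsigned int tex = SOIL_load_OGL_texture("map.png", SOIL_LOAD_AUTO,
                                                SOIL_CREATE_NEW_ID,
                                                SOIL_FLAG_POWER_OF_TWO); */

    /* Leaving the flag out keeps the image at its original size
       (this relies on the driver supporting NPOT textures): */
    unsigned int tex = SOIL_load_OGL_texture("map.png", SOIL_LOAD_AUTO,
                                             SOIL_CREATE_NEW_ID, 0);
    return tex;
}
[/code]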
  7. An image says a thousand words, so what about two? I have this map art:

     [img]http://i.imgur.com/Rd9zv.png[/img]

     In order to actually use it as a map I scale the texture up 6 times. That however didn't go as expected:

     [img]http://i.imgur.com/PA6Kr.png[/img]

     All the OpenGL code is in my homebrew 2D OpenGL rendering library, and since OpenGL is a state machine it is hard to document the whole rendering process. But here is more or less what I do (the code is Python):
[CODE]
width, height = window.get_size()

glViewport(0, 0, width, height)

glMatrixMode(GL_PROJECTION)
glPushMatrix()
glLoadIdentity()
glOrtho(0.0, width, height, 0.0, 0.0, 1.0)

glMatrixMode(GL_MODELVIEW)
glPushMatrix()
glLoadIdentity()

# displacement trick for exact pixelization
glTranslatef(0.375, 0.375, 0.0)

glEnable(GL_BLEND)
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)

glEnable(self.texture.target)  # GL_TEXTURE_2D
glBindTexture(self.texture.target, self.texture.id)
glTexParameteri(self.texture.target, GL_TEXTURE_MIN_FILTER, GL_NEAREST)
glTexParameteri(self.texture.target, GL_TEXTURE_MAG_FILTER, GL_NEAREST)

glPushMatrix()
glTranslatef(self.x, self.y, 0.0)  # self.x and self.y are two negative integer offsets
glScalef(self.scale_x, self.scale_y, 1.0)  # scale_x and scale_y are both 6

glBegin(GL_QUADS)
glTexCoord2i(0, 0)
glVertex2i(0, 0)
glTexCoord2i(0, 1)
glVertex2i(0, self.texture.height)
glTexCoord2i(1, 1)
glVertex2i(self.texture.width, self.texture.height)
glTexCoord2i(1, 0)
glVertex2i(self.texture.width, 0)
glEnd()

glPopMatrix()

glDisable(self.texture.target)
[/CODE]
     However, this "blurring" bug doesn't occur when I use GL_TEXTURE_RECTANGLE_ARB. I'd like to also be able to use GL_TEXTURE_2D, so can someone please point out how to stop this from happening?
  8. Simple 2D game... Simple collision? Probably not? =-=

    You might want to use this: [url="https://raw.github.com/gist/1460947/e94f96497ee759e64828a3e0b3ea7defb170a7b8/bitmask.c"]https://raw.github.c...0a7b8/bitmask.c[/url] [url="https://raw.github.com/gist/1460953/199aed3e3d6acf9353e25c1eb582c53eaa8777ea/bitmask.h"]https://raw.github.c...777ea/bitmask.h[/url] I ripped it out of pygame; it has very good performance and is LGPL-licensed.
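    The rough usage pattern looks like this; it's only a sketch from memory of bitmask.h, so double-check the exact signatures against the header (the sizes and offsets here are made up):
[code]
#include <stdio.h>
#include "bitmask.h"

int main(void)
{
    /* one mask per sprite, sized to the sprite's image */
    bitmask_t *a = bitmask_create(32, 32);
    bitmask_t *b = bitmask_create(32, 32);

    /* mark the solid pixels of each sprite, e.g. wherever its alpha is nonzero */
    bitmask_setbit(a, 10, 10);
    bitmask_setbit(b, 0, 0);

    /* the offsets are sprite b's position relative to sprite a;
       bitmask_overlap returns nonzero if any solid pixels overlap */
    if (bitmask_overlap(a, b, 10, 10)) {
        printf("collision!\n");
    }

    bitmask_free(a);
    bitmask_free(b);
    return 0;
}
[/code]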
  9. I'm designing and implementing a Python OpenGL toolkit with fast 2D sprite support, very much like pyglet (yet with a completely different approach, for example with a lot of code written in C for speed). But I'm really in the dark about what coordinate system to use. AFAIK most 2D libraries and software use a top-left origin (with increasing y meaning going DOWN), while OpenGL uses a bottom-left origin by default. Also, AFAIK there are no performance arguments against either solution, since the conversion is trivial. So what is the best origin for a 2D graphics library, bottom-left or top-left?
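     To make the "trivial conversion" concrete, here is a minimal fixed-function sketch; the function name and parameters are mine, not from any particular library:
[code]
#include <GL/gl.h>

/* Set up a 2D pixel projection. Pass top_left = 1 for a top-left origin
   (y increases downwards), or 0 for OpenGL's default bottom-left origin. */
static void setup_2d_projection(int width, int height, int top_left)
{
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();

    if (top_left)
        glOrtho(0.0, width, height, 0.0, -1.0, 1.0);
    else
        glOrtho(0.0, width, 0.0, height, -1.0, 1.0);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
}
[/code]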
  10. Weird bug in pyglet prevents me from using it

    I'm sorry, I talked to that friend again and got more information. He has the latest nVidia drivers installed, and OpenGL works in general. The thing that doesn't work is pyglet.sprite.Sprite: if I blit textures directly it works, but pyglet's sprite class doesn't. Any ideas? Should I report this bug to the pyglet devs?
  11. I am writing a 2D game in Python with Pygame. I only use Pygame for the rendering, and the way I have written everything so far makes it pretty easy to switch renderers, so I made a pyglet prototype and showed it to a friend. The weird thing is, everything worked fine on my computer, but he just got a plain black screen. He got the black screen with the pyglet samples as well. He reinstalled pyglet and Python, and I even tried shipping an executable built with py2exe, but alas - he still got the black screen. Now this is really demotivating. I would love to use pyglet, because it seems so much nicer, faster and easier to extend with OpenGL than Pygame, but this is really keeping me from switching. I have never had any issues running Pygame anywhere. What could be the cause of this, and should it affect my choice?
  12. Choosing the right tools

    A few other people and I are starting on a new game. None of us has previous non-trivial game development experience (though we have extensive game modding experience). The game is a robotics game in the spirit of Robot Arena 2: the goal is to design robots and fight with them against other (AI/multiplayer) robots. The game takes place indoors (with the exception of maybe 1-2 outdoor arenas). We are going to program the game in C++ with scripting support in either Lua or Python.

    What libraries/engines can you recommend? I figure we are going to need at least a 3D engine, a physics engine and a sound engine. I found Irrlicht, a seemingly nice 3D engine with a clean C++ API. It also has a nice toolchain with irrEdit. Does anyone have an opinion on this? I saw a lot of (dated) "meh" posts about it on the internet.

    I have no idea what physics engine to use. The main part of the game, fighting other robots, relies heavily on physics (e.g. collision detection, saws, hammers, velocity, weight, crunching), so the physics engine must be very good at this. Robot Arena 2 used the Havok engine, which wasn't that great (at least at the time; it's a game from 2003). Quoting Wikipedia: "One of the main reasons behind the game's relative commercial failure was the relative instability of the Havok physics engine."

    Please give some tips/suggestions.