
FordPerfect

Fill CPU cache with NOPs
FordPerfect replied to petyakurochkin's topic in General and Gameplay Programming
Does
    int main(int argc, char **argv)
    {
        asm(
            ".rept 262144\n\t"
            "nop\n\t"
            ".endr\n\t");
        return 0;
    }
compile faster?
Any software rendering tutorial without gdi
FordPerfect replied to ryt's topic in General and Gameplay Programming
That's a lot to cover, and the answer somewhat depends on what you already know. The first thing I'd recommend is getting a firm grasp on the basics: the concepts, terminology and mathematics involved in the subject. There's a lot of confusion going on, and for no good reason, as far as I can tell, because these things are, in fact, well defined and clearly explained; the trick is knowing where to look. Clear terminology helps immensely: e. g. when people say that "1/z is linear in screenspace", it really helps to know which z it is (disclosure: it is z in eyespace, same as w in clipspace, up to a sign). So it helps to know: at least the rudimentary basics of matrix algebra (you really don't need much to start); the basics of homogeneous coordinates (how they are used in computer graphics); what the relevant coordinate systems and transformations are called (object space -> eye space -> clip space -> NDC -> window coordinates); what clipping is and how it is defined (-w <= x, y, z <= +w for OpenGL); and, yes, why "1/w is linear in screenspace" (see e. g. http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/ ). A decent text on computer graphics should clear that up. I guess "OpenGL Programming Guide: The Official Guide to Learning OpenGL" (the "Red Book") works (the 1.1 version is free: https://www.glprogramming.com/red/ ), as do some others. "Computer Graphics: Principles and Practice" is a pretty comprehensive textbook, which might be overkill for the task (but then again, it's probably worth reading anyway). To reiterate: the point of the above is to get a grasp on the concepts; you do not necessarily need to read the textbook from cover to cover (though you might want to). Also, the GL spec (e. g. https://www.khronos.org/registry/OpenGL/specs/gl/glspec21.pdf ) can be very helpful, as it contains (unlike most texts) much of the "implementer's view" of OpenGL. But possibly not right away. Probably the same goes for its DirectX equivalent.
As an aside: mathematics tends to age rather gracefully, so the older texts (math-centric ones, that is) are probably OK.
Now that that is dealt with, on to the actual topic of writing a software rasterizer. The best explanation I know of is the one by Fabian Giesen:
https://fgiesen.wordpress.com/2011/07/06/a-trip-through-the-graphics-pipeline-2011-part-6/
https://fgiesen.wordpress.com/2011/07/10/a-trip-through-the-graphics-pipeline-2011-part-8/
https://fgiesen.wordpress.com/2013/02/08/triangle-rasterization-in-practice/
https://fgiesen.wordpress.com/2013/02/10/optimizing-the-basic-rasterizer/
This describes the half-space rasterizer, which is the predominant type in modern rendering, both hardware and software. Another popular type (which was a lot more common some time ago) is the scanline rasterizer. Chris Hecker has a series of articles on the subject (see http://chrishecker.com/Miscellaneous_Technical_Articles ).
Some additional useful links:
https://www.cs.drexel.edu/~david/Classes/Papers/comp175-06-pineda.pdf
https://web.archive.org/web/20171128164608/http://forum.devmaster.net/t/advanced-rasterization/6145
http://blog.simonrodriguez.fr/articles/18-02-2017_writing_a_small_software_renderer.html
https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/overview-rasterization-algorithm
The task of putting that on the screen is relatively trivial: you just generate a framebuffer in RAM and blit it to the screen using whatever API you have on hand (BitBlt, StretchDIBits, SDL_BlitSurface, glDrawPixels, what have you; or generate the image right in video memory, if we are talking mode 13h).
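To make the half-space idea concrete, here is a minimal sketch (not taken from any of the linked articles). It tests every pixel of the triangle's bounding box against the three edge functions and writes into a framebuffer in RAM. The names Framebuffer, Point, edge and fill_triangle are made up for this illustration, pixels are sampled at integer coordinates, and there is no proper fill rule (pixels exactly on an edge are always drawn), which a real rasterizer would need to handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal framebuffer in RAM: one 32-bit pixel per element, row-major.
struct Framebuffer {
    int width, height;
    std::vector<uint32_t> pixels;
    Framebuffer(int w, int h) : width(w), height(h) {
        assert(w > 0 && h > 0);
        pixels.assign(std::size_t(w) * std::size_t(h), 0u);
    }
};

struct Point { int x, y; };

// Edge function: twice the signed area of the triangle (a, b, p).
// Its sign says on which side of the directed edge a->b the point p lies.
static int64_t edge(Point a, Point b, Point p) {
    return int64_t(b.x - a.x) * (p.y - a.y) - int64_t(b.y - a.y) * (p.x - a.x);
}

// Fills the triangle (v0, v1, v2) and returns the number of pixels written.
int fill_triangle(Framebuffer &fb, Point v0, Point v1, Point v2, uint32_t color) {
    int64_t area = edge(v0, v1, v2);
    if (area == 0) return 0;          // degenerate triangle, nothing to draw
    if (area < 0) std::swap(v1, v2);  // force a consistent winding

    // Bounding box of the triangle, clamped to the framebuffer.
    int min_x = std::max(0, std::min({v0.x, v1.x, v2.x}));
    int min_y = std::max(0, std::min({v0.y, v1.y, v2.y}));
    int max_x = std::min(fb.width - 1, std::max({v0.x, v1.x, v2.x}));
    int max_y = std::min(fb.height - 1, std::max({v0.y, v1.y, v2.y}));

    int count = 0;
    for (int y = min_y; y <= max_y; ++y) {
        for (int x = min_x; x <= max_x; ++x) {
            Point p = {x, y};
            // Inside means: on the non-negative side of all three edges.
            if (edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0 && edge(v0, v1, p) >= 0) {
                fb.pixels[std::size_t(y) * fb.width + x] = color;
                ++count;
            }
        }
    }
    return count;
}
```

The resulting pixels array is what you would then hand to BitBlt, StretchDIBits and the like. The optimizations in the articles above mostly come from noticing that each edge function changes by a constant when x or y is incremented, so the inner loop needs only additions.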
Assembly language?
FordPerfect replied to Embassy of Time's topic in General and Gameplay Programming
I find it amusing how .kkrieger is mentioned as motivation, given that it is written almost entirely in C++: https://fgiesen.wordpress.com/2012/02/13/debris-opening-the-box/ https://github.com/farbrausch/fr_public As for the main question: I would like to know the answer myself. I'm rather sceptical of video tutorials (of pretty much any kind) on the subject. What I want is a proper textbook. And there don't seem to be many of those around. I get the impression that the mindset is like "whoever wanted to learn assembly did it in the 90s already, and if you are trying to learn now, sucks to be you", which is unfortunate. My best suggestion would probably be using books (& environment, e. g. DosBox) from the 90s and then slowly assimilating information on more modern assembly, scattered around the net. Which is... suboptimal. About the best recent textbook on assembly I know of would be "The Art of Assembly Language", which for one thing I would not call exactly 'modern' anymore, and for another has made some questionable choices for my tastes (the fascination with macros is one example: they might be useful for writing assembly, but arguably not for learning it). There is the "Intel 64 and IA-32 Architectures Software Developer's Manual", of course. That is a great reference, but it is frightening to imagine someone trying to use that behemoth as a textbook.
Mingw objcopy undefined reference
FordPerfect replied to Anri's topic in General and Gameplay Programming
I guess a somewhat more self-contained solution is to use incbin from within inline asm. See e. g. https://www.devever.net/~hl/incbin and https://gist.github.com/mmozeiko/ed9655cf50341553d282 .
Please help with ASM question
FordPerfect replied to DividedByZero's topic in General and Gameplay Programming
Yes, float works too. I was about to reply that. Actually, FIDIV, FIMUL, and FISUB might make the code a bit smaller. Also note that you can (I think) force FDIV to produce the same result as integer division (by setting precision to full (64-bit mantissa) and rounding to round-toward-zero (i.e. truncate)), if you care about it.
Please help with ASM question
FordPerfect replied to DividedByZero's topic in General and Gameplay Programming
Do you have access to 64-bit operations? That would simplify things a bit, as the 64-bit versions of imul, idiv and cmp solve the problem quite straightforwardly. If this is purely 32-bit code, you need to construct them from 32-bit pieces. If you're not in the mood to try to figure it out yourself, you can just copy the C++ code into gcc.godbolt.org and see what compilers do in this situation: https://godbolt.org/g/2F9a1e As you can see, the division is especially heavy (multiply and compare are relatively easy: "imul+mul+add" and "cmp 0+cmp 1000000000" respectively). Compilers just implement it as a function call. The code for __divdi3 can be seen here: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/wordsize-32/divdi3.c;h=257d93cc37c011f2714b48ccb0c476f5a4a39319;hb=HEAD
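As an illustration of "constructing from 32-bit pieces", here is a rough C++ sketch of the multiply case (the function names are made up for this example, and this is not the code behind the godbolt link). The low 64 bits of a 64x64 product need one widening 32x32->64 multiply for the low halves, plus two ordinary 32-bit multiplies for the cross terms, which only affect the upper half.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a 64x64-bit product, built only from 32-bit halves.
// Roughly what a 32-bit target has to do: "mul" for the low halves,
// "imul" for each cross term, then additions.
uint64_t mul64_from_32(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint64_t low = uint64_t(a_lo) * b_lo;        // widening 32x32 -> 64 multiply
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;  // wraps modulo 2^32, which is what we want
    // a_hi * b_hi would land entirely above bit 63, so it is dropped.
    return low + (uint64_t(cross) << 32);
}

// Convenience wrapper that splits the operands into halves.
uint64_t mul_u64(uint64_t a, uint64_t b) {
    return mul64_from_32(uint32_t(a), uint32_t(a >> 32), uint32_t(b), uint32_t(b >> 32));
}
```

The same bit pattern is correct for signed two's-complement operands, since the low 64 bits of the product do not depend on signedness. When one operand is known to fit in 32 bits, one of the cross terms vanishes.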
Error in remainder of integer? (scaling offset)
FordPerfect replied to suliman's topic in For Beginners's Forum
Note that the above "fix" can still disagree with drawStuff by 1 pixel due to roundoff (though in a rather small percentage of cases). So
    for (int i = startY; i < startY + h; i++) // hori
        gfx.lineRelative(0, i * TILE*scale + pan.y, gfx.screenX, 0, col);
is probably preferred, as it should agree with drawStuff completely (assuming compiler optimizations didn't break it).
Error in remainder of integer? (scaling offset)
FordPerfect replied to suliman's topic in For Beginners's Forum
Small aside: floats are less magical than many people make them out to be. They can be counterintuitive to the unwary, but you can, in fact, ensure some things with certainty. I'm really not fond of that particular wording, as it conflates several issues (and in some ways is untrue). As far as I can tell, 1*n+1*n==2*n holds for all non-NaN floats.
Back to the topic at hand. Some clarifications would be welcome (e. g. drawStuff does not mention pan at all), but there seems to be enough information now to piece the problem together. First off, if you may simply go with that. Now, as to why the original code doesn't work. The reason has already been mentioned by a few people, but I'll elaborate. The lines are at i * TILE*scale + pan.y on the y-axis. The first (least y) line that fits on the screen is at i=startY such that
    0 <= startY * TILE*scale + pan.y < TILE*scale
    -pan.y <= startY * TILE*scale < TILE*scale - pan.y
    startY = ceil(-pan.y / (TILE*scale))    {assuming scale>0}
Its offset is
    ceil(-pan.y / (TILE*scale)) * (TILE*scale) + pan.y
which is
    -floor(pan.y / (TILE*scale)) * (TILE*scale) + pan.y
that is mod(pan.y, TILE*scale), assuming modulo is defined to be always non-negative, which std::fmod isn't. This is different from mod((int)pan.y, (int)(TILE*scale)), which is what you are doing (once more, disregarding negatives). In fact, the difference can be estimated as frac(TILE*scale)*startY (not always true, since it can wrap around), which may be quite large. This explains why integer TILE*scale works fine. So the original code can be (I think) fixed as follows:
    float oy=fmodf(pan.y,TILE*scale);
    int offY = (int)(oy<0.0f?TILE*scale+oy:oy);
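To check the derivation numerically, here is a small sketch. The helper names positive_fmod and first_line_index are invented for this example, and step stands for TILE*scale. It computes the always-non-negative modulo and the index of the first visible line, so the two can be compared.

```cpp
#include <cassert>
#include <cmath>

// Non-negative floating-point modulo: the result is in [0, m) for m > 0
// (for a tiny negative remainder, r + m can round up to m itself).
float positive_fmod(float x, float m) {
    float r = std::fmod(x, m);   // has the sign of x, unlike the mod() used in the text
    return r < 0.0f ? r + m : r;
}

// Index of the first (least y) grid line at or below the top of the screen,
// for lines placed at i * step + pan_y, assuming step > 0.
int first_line_index(float pan_y, float step) {
    return (int)std::ceil(-pan_y / step);
}
```

For example, with step = 24 (say TILE = 16, scale = 1.5) and pan_y = -100, first_line_index gives 5, the line sits at 5 * 24 - 100 = 20, and positive_fmod(-100, 24) is also 20, whereas plain fmod returns -4.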
need more randomization
FordPerfect replied to zfvesoljc's topic in General and Gameplay Programming
Well, yes, the question does not seem very clear to me. The way I'm reading this is that you have a lot of objects (particles) which should have about 10 parameters that are unique per object (random) but constant during the object's lifetime, and you want to avoid storing all 10. Is that right? If so, then having a noise function and regenerating parameters whenever you need them is certainly a viable option. It adds some overhead compared to just storing the values, but a decent noise function can be pretty fast. Then you just generate a random value based on the object's id and the index of the parameter, something like get_noise(obj_id*16+parameter_id). You can of course use a stateful RNG (a noise function is stateless), either per object or globally stored, but is there any reason to? I once started a topic on noise functions, which may contain some useful links. The http://marc-b-reynolds.github.io/shf/2016/04/19/prns.html library has several nice functions. Also there are several posts by Jonathan Blow on a similar topic (he did go with a (seekable) stateful RNG, not that I can see a good reason why):
http://number-none.com/blow/blog/programming/2016/07/07/braid_particles_1.html
http://number-none.com/blow/blog/programming/2016/07/08/fabian-on-lcg-fast-forward.html
http://number-none.com/blow/blog/programming/2016/07/13/braid_particles_2.html
If you need a float from a noise function that returns an integer, that is easy too. For example:
    // Returns a random float, uniformly distributed in [0; 1).
    float get_rnd_float(uint32_t seed)
    {
        uint32_t bits=get_rng_uint32(seed);
        bits=bits&0x00FFFFFFu; // 2^24-1, 24 being float's precision (i. e. full mantissa size, including implicit bit).
        float ret=float(int32_t(bits)); // int->float may be faster than uint->float.
        ret*=5.96046448e-8f; // 2^{-24}.
        return ret;
    }
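For completeness, here is one way the pieces could fit together. This is only a sketch: get_noise is implemented here with the 32-bit finalizer ("fmix32") from MurmurHash3 as a stand-in bit mixer, which is my choice for illustration and not something the post prescribes; noise_to_float and particle_param are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Stateless integer noise. Stand-in implementation: the 32-bit finalizer
// of MurmurHash3, used purely as a bit mixer. Note that it maps 0 to 0.
uint32_t get_noise(uint32_t x) {
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

// Maps the low 24 bits of an integer to a float in [0; 1).
float noise_to_float(uint32_t bits) {
    return float(int32_t(bits & 0x00FFFFFFu)) * 5.96046448e-8f; // times 2^-24
}

// Regenerates parameter number parameter_id (0..15) of object obj_id on demand,
// so nothing has to be stored per particle except its id.
float particle_param(uint32_t obj_id, uint32_t parameter_id) {
    assert(parameter_id < 16u);
    return noise_to_float(get_noise(obj_id * 16u + parameter_id));
}
```

Calling particle_param(id, k) twice always gives the same value, which is the whole point: the "stored" parameters are recomputed instead of kept in memory.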
How to write code for winning probabilities
FordPerfect replied to SinnedB's topic in General and Gameplay Programming
Or more strictly: generate a random number uniformly distributed in [0; 55) (it can be integer or real, though there doesn't seem to be much reason to use reals). BTW, the binary search might be unnecessary. It reduces asymptotic time from O(n) to O(log n) if we know the probabilities in advance, but does not help the asymptotics if each invocation uses a unique list of probabilities. In practice, linear search might be faster than binary for small sizes (e. g. <8 possibilities), which are probably very common. The code might look like this (C++, needs <cstdio> and <cstdlib>):
    // Chooses random index i in [0;n) with probability weights[i]/sum(weights).
    unsigned int choose(
        unsigned int n,
        const unsigned int *weights)
    {
        unsigned int ret=0;
        unsigned int cur=0,sum=0;
        for(unsigned int i=0;i<n;++i)
            sum+=weights[i];
        if(sum==0) {printf("WARNING: sum of weights is zero!\n");return 0;}
        unsigned int c=rand()%sum;
        // Walk the running total until it exceeds c.
        while((cur+=weights[ret])<=c)
            ++ret;
        return ret;
    }
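For the case where the weights are known in advance, the binary-search variant could look roughly like the sketch below (build_cumulative and pick are names invented here). The running totals are computed once, and each draw is then a std::upper_bound over them. The random number c is passed in, so the mapping itself is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Running totals: cumulative[i] = weights[0] + ... + weights[i].
// Computed once, when the weights are known in advance.
std::vector<unsigned int> build_cumulative(const std::vector<unsigned int> &weights) {
    std::vector<unsigned int> cumulative;
    cumulative.reserve(weights.size());
    unsigned int sum = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        sum += weights[i];
        cumulative.push_back(sum);
    }
    return cumulative;
}

// Maps c, uniformly distributed in [0; total), to the first index i with
// cumulative[i] > c. This is O(log n) per draw; zero-weight entries are never chosen.
std::size_t pick(const std::vector<unsigned int> &cumulative, unsigned int c) {
    assert(!cumulative.empty() && c < cumulative.back());
    return std::size_t(std::upper_bound(cumulative.begin(), cumulative.end(), c)
                       - cumulative.begin());
}
```

With weights {10, 0, 20, 25} (total 55), values of c in 0..9 map to index 0, 10..29 to index 2, and 30..54 to index 3, the same mapping the linear scan above produces.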
High-quality integer noise function
FordPerfect replied to FordPerfect's topic in General and Gameplay Programming
The https://marc-b-reynolds.github.io/shf/2016/04/19/prns.html deserves mention. Sorry to bump the old thread, but this may be useful to people who stumble on this specific topic.
Well, yes. I meant mostly that it would be nice to stay in C++03 if feasible. I'm curious to test changing all shifts to shl/shr in some codebase and measuring the performance impact. Can anyone suggest a good guinea pig? Requirements: 1. Heavy user of shifts. 2. Small & easy to build. No larger than, say, quake2, and ideally a single file fitting in whatever limit rextester has (64 KB?).

BTW, @frob, thanks for your comments. Now that I think about it, adding optional asserts to make this code double as a runtime-C-rules-violations-detector seems reasonable. I still maintain that the main purpose is that such cases are a normal part of the semantics, and not errors.

I'm not sure the examples are particularly convincing, yet... One thing that comes to mind is fixed-point. It is somewhat reasonable to view integers as a special case of fixed-point (fraction_bits=0), and it makes even more sense to allow the bits to represent only the fractional part (fraction_bits=32; this is unsigned fixed_point, somewhat unusual) to cover [0; 1). Also I imagine such a thing could come up in a reader/writer on a stream of bits (operations work on [0 .. wordsize] bits). And the a<<(b-c) thing happened to me while writing texture interpolation (on CPU, in fixed point); I worked around it by adding a bias to the shift.

Seems sound sense.
This depends on what we actually mean when we try to shift non-two's-complement signed integers. If we go with Knuth's definition above and treat shift as a mathematical operation, operating on numerical values rather than bitwise representations, then L(1) is actually correct (for both signed and unsigned L). You can claim that it is not very sensible, and I can claim that it is still about the most sensible thing we can do if we do not know anything about our numbers (representation, etc.). In practice, I'm fine with ignoring non-two's-complement entirely.
Similar to above, and an example would be appreciated (honestly curious).
My point is to make well-defined semantics for all inputs. As there are no invalid inputs, there is nothing to assert.
If it is my code, yes. If I'm trying to make something library-like for other people to use, not so much.
Fair enough. I only tested x86. Also (newer) GCC did >>31, not cmove.
My question was purely about C++ semantics (linkage, multiple definitions, etc.), not actual inlining, which basically is orthogonal to the 'inline' keyword.
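The actual shl/shr from the thread are not shown here, so the following is only a guess at their spirit: a sketch of shifts that are defined for every shift count, for 32-bit unsigned values, following the "shift as a mathematical operation" reading (shifting left by n multiplies by 2^n and keeps the part that fits; a negative count shifts the other way). The signatures are assumptions.

```cpp
#include <assert.h>
#include <stdint.h>  // C-style header, in the spirit of staying close to C++03

// Left shift defined for every count: negative counts shift right,
// and counts of 32 or more (in either direction) yield 0.
uint32_t shl(uint32_t x, int n) {
    if (n <= -32 || n >= 32) return 0u;  // every bit is shifted out
    return n >= 0 ? x << n : x >> -n;
}

// Right shift with the mirrored definition.
uint32_t shr(uint32_t x, int n) {
    if (n <= -32 || n >= 32) return 0u;
    return n >= 0 ? x >> n : x << -n;
}
```

With this, something like shl(a, b - c) is well defined whatever the sign of b - c, and a shift by the full word size (as in the fraction_bits=32 case) gives 0 instead of undefined behaviour. Whether the extra branch costs anything measurable is exactly what the benchmarking question above is about.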
