Community Reputation

114 Neutral

About FordPerfect

  1. FordPerfect

    Fill CPU cache with NOPs

    Does

        int main(int argc, char **argv)
        {
            asm(".rept 262144\n\t"
                "nop\n\t"
                ".endr\n\t");
            return 0;
        }

    compile faster?
  2. That's a lot to cover, and the answer somewhat depends on what you already know. The first thing I'd recommend is getting a firm grasp of the basics: the concepts, terminology and mathematics involved in the subject. There's a lot of confusion going around, for no good reason as far as I can tell, because these things are in fact well defined and clearly explained; the trick is knowing where to look. Clear terminology helps immensely: e. g. when people say that "1/z is linear in screen-space", it really helps to know which z it is (disclosure: it is z in eye-space, which is the same as w in clip-space, up to a sign). So it helps to know: at least the rudimentary basics of matrix algebra (you really don't need much to start); the basics of homogeneous coordinates (how they are used in computer graphics); what the relevant coordinate systems and transformations are called (object-space -> eye-space -> clip-space -> NDC -> window coordinates); what clipping is and how it is defined (-w<=x,y,z<=+w for OpenGL); and, yes, why "1/w is linear in screen-space" (see e. g. ). A decent text on computer graphics should clear that up. I guess "OpenGL Programming Guide: The Official Guide to Learning OpenGL" (the "Red Book") works (the 1.1 version is free: ), as do some others. "Computer Graphics: Principles and Practice" is a pretty comprehensive textbook, which might be overkill for the task (but then again, it's probably worth reading anyway). To reiterate: the point of the above is to get a grasp of the concepts; you do not necessarily need to read the textbook from cover to cover (though you might want to). Also, the GLSpec (e. g. ) can be very helpful, as it contains (unlike most texts) much of the "implementer's view" of OpenGL, though possibly not right away. Probably the same goes for its DirectX equivalent. As an aside: mathematics tends to age rather gracefully, so older texts (math-centric ones, that is) are probably OK.
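To make the "1/w is linear in screen-space" point concrete, here is a minimal sketch of perspective-correct interpolation along a screen-space segment, plus the OpenGL clip-volume test mentioned above. The `Vertex` struct and the function names are hypothetical and exist only for illustration. The sketch interpolates 1/w and u/w linearly in screen space and then recovers u by dividing one by the other.

```cpp
#include <cassert>
#include <cmath>

// A vertex reduced to what matters here: clip-space w (equal to -z_eye for a
// standard OpenGL perspective projection) and one attribute u (e.g. a texture
// coordinate). Hypothetical struct, for illustration only.
struct Vertex {
    float w;
    float u;
};

// True if a clip-space point lies inside the OpenGL clip volume -w <= x,y,z <= +w.
bool inside_gl_clip_volume(float x, float y, float z, float w) {
    return -w <= x && x <= w && -w <= y && y <= w && -w <= z && z <= w;
}

// Interpolates u between two projected vertices at screen-space parameter t in [0, 1].
// 1/w and u/w are affine in screen space, so they can be lerped directly;
// u itself is recovered by dividing one by the other.
float perspective_correct_u(const Vertex& a, const Vertex& b, float t) {
    const float inv_w    = (1.0f - t) / a.w + t / b.w;
    const float u_over_w = (1.0f - t) * (a.u / a.w) + t * (b.u / b.w);
    return u_over_w / inv_w;
}

// Naive screen-space lerp of u, for comparison (wrong under perspective).
float affine_u(const Vertex& a, const Vertex& b, float t) {
    return (1.0f - t) * a.u + t * b.u;
}
```

With w going from 1 to 3 along the segment, the screen-space midpoint corresponds to u = 0.25, not the 0.5 that a plain lerp gives.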
Now that that is dealt with, on to the actual topic of writing a software rasterizer. The best explanation I know of is the one by Fabian Giesen: This describes the half-space rasterizer, which is the predominant type in modern rendering, both hardware and software. Another popular type (which was a lot more common some time ago) is the scanline rasterizer. Chris Hecker has a series of articles on the subject (see ). Some additional useful links: The task of putting the result on the screen is relatively trivial: you just generate a framebuffer in RAM and blit it to the screen using whatever API you have on hand (BitBlt, StretchDIBits, SDL_BlitSurface, glDrawPixels, what have you; or generate the image right in video memory, if we are talking mode 13h).
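For orientation only, here is a bare-bones sketch of the half-space idea. It tests every pixel in the triangle's bounding box against three edge functions and writes to a framebuffer that lives in RAM. It is not Giesen's optimized version: it has no top-left fill rule, no incremental edge stepping and no block traversal. `Point`, `edge` and `fill_triangle` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Point { int x, y; };  // integer pixel sample position (hypothetical, for simplicity)

// Edge function: cross product of (b - a) and (p - a), i.e. twice the signed
// area of triangle (a, b, p). Its sign tells which side of line a->b p is on.
static int64_t edge(Point a, Point b, Point p) {
    return int64_t(b.x - a.x) * (p.y - a.y) - int64_t(b.y - a.y) * (p.x - a.x);
}

// Minimal half-space rasterizer over an 8-bit framebuffer of width*height pixels.
void fill_triangle(std::vector<uint8_t>& fb, int width, int height,
                   Point v0, Point v1, Point v2, uint8_t color) {
    const int64_t area = edge(v0, v1, v2);
    if (area == 0) return;             // degenerate triangle
    if (area < 0) std::swap(v1, v2);   // normalize winding so "inside" means all edges >= 0

    // Bounding box, clipped to the framebuffer.
    const int min_x = std::max(0, std::min({v0.x, v1.x, v2.x}));
    const int max_x = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    const int min_y = std::max(0, std::min({v0.y, v1.y, v2.y}));
    const int max_y = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));

    for (int y = min_y; y <= max_y; ++y) {
        for (int x = min_x; x <= max_x; ++x) {
            const Point p{x, y};
            // Inside (or on an edge) iff p is on the inner side of all three edges.
            if (edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0 && edge(v0, v1, p) >= 0)
                fb[static_cast<std::size_t>(y) * width + x] = color;
        }
    }
}
```

The resulting `fb` is the in-RAM framebuffer that you would then blit with whatever API is at hand.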
  3. FordPerfect

    Assembly language?

    I find it amusing how .kkrieger is mentioned as motivation, given that it is written almost entirely in C++: As for the main question, I would like to know the answer myself. I'm rather sceptical of video tutorials (of pretty much any kind) on the subject. What I want is a proper textbook, and there don't seem to be many of those around. I get the impression that the mindset is "whoever wanted to learn assembly did it in the 90's already, and if you are trying to learn now, sucks to be you", which is unfortunate. My best suggestion would probably be using books (and an environment, e. g. DOSBox) from the 90's and then slowly assimilating information on more modern assembly, scattered around the net. Which is... suboptimal. About the best recent textbook on assembly I know of would be "The Art of Assembly Language", which, for one thing, I would not exactly call 'modern' anymore, and, for another, makes some questionable choices for my taste (its fascination with macros is one example: they might be useful for writing assembly, but arguably not for learning it). There is the "Intel 64 and IA-32 Architectures Software Developer's Manual", of course. That is a great reference, but it is frightening to imagine someone trying to use that behemoth as a textbook.
  4. I guess a somewhat more self-contained solution is to use .incbin from within inline asm. See e. g. and .
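Here is a minimal sketch of that approach. It assumes GCC or Clang with the GNU assembler on an ELF target (e.g. Linux), where C symbol names carry no leading underscore. The symbol names `embedded_start`/`embedded_end` are made up for the example. The embedded file is the source file itself (`__FILE__` expands to a string literal that gets concatenated into the asm string), used only so the sketch needs no external asset. In practice you would put your own asset path there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Top-level inline asm: put the raw bytes of a file into .rodata, bracketed
// by two global labels. (GNU as directives; ELF assumed.)
__asm__(
    ".pushsection .rodata\n"
    ".global embedded_start\n"
    "embedded_start:\n"
    ".incbin \"" __FILE__ "\"\n"
    ".global embedded_end\n"
    "embedded_end:\n"
    ".popsection\n");

// Declarations matching the labels above; extern "C" keeps the names unmangled.
extern "C" const char embedded_start[];
extern "C" const char embedded_end[];

// Size of the embedded blob in bytes (note: no terminating NUL is added).
std::size_t embedded_size() {
    return static_cast<std::size_t>(embedded_end - embedded_start);
}
```

Compared with generating a C array from the file with an external tool, this keeps everything in one translation unit. The cost is that it depends on the toolchain and object format.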
  5. FordPerfect

    Please help with ASM question

    Yes, float works too. I was about to suggest that. Actually, FIDIV, FIMUL, and FISUB might make the code a bit smaller. Also note that you can (I think) force FDIV to produce the same result as integer division by setting the precision to full (64-bit mantissa) and the rounding mode to round-toward-zero (i.e. truncate), if you care about that.
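Setting the x87 control word directly requires inline asm. As a swapped-in, more portable analogue, the sketch below selects round-toward-zero through `<cfenv>` and divides in `long double`, which is 80-bit (64-bit mantissa) with GCC/Clang on x86 but may be plain `double` elsewhere. It assumes the platform defines `FE_TOWARDZERO`. Under round-toward-zero, the truncated quotient matches integer division as long as the format represents every integer in the operand range, which holds for 32-bit operands even in `double`. `fdiv_trunc` is a hypothetical name.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

// Truncating (integer-style) division done through floating point.
// Preconditions: b != 0, and not (a == INT32_MIN && b == -1).
int32_t fdiv_trunc(int32_t a, int32_t b) {
    assert(b != 0);
    const int old_mode = std::fegetround();
    std::fesetround(FE_TOWARDZERO);        // analogue of x87 RC = "round toward zero"
    // volatile keeps the compiler from moving the division across the
    // fesetround calls (many compilers ignore #pragma STDC FENV_ACCESS).
    volatile long double num = a;
    volatile long double q = num / b;
    std::fesetround(old_mode);             // restore the caller's rounding mode
    return static_cast<int32_t>(q);        // float -> int conversion truncates
}
```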
  6. FordPerfect

    Please help with ASM question

    Do you have access to 64-bit operations? That would simplify things a bit, as the 64-bit versions of imul, idiv and cmp solve the problem quite straightforwardly. If this is purely 32-bit code, you need to construct them from 32-bit pieces. If you're not in the mood to figure it out yourself, you can just copy the C++ code into, and see what compilers do in this situation: As you can see, the division is especially heavy (multiply and compare are relatively easy: "imul+mul+add" and "cmp 0+cmp 1000000000" respectively). Compilers just implement it as a function call. The code for __divdi3 can be seen here:;a=blob;f=sysdeps/wordsize-32/divdi3.c;h=257d93cc37c011f2714b48ccb0c476f5a4a39319;hb=HEAD
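Here is a sketch of the "construct them from 32-bit pieces" idea for multiplication. It uses one widening multiply for the low words, plus 32-bit multiplies and adds for the cross terms. That has roughly the same shape as the compiler output described above, although the exact instruction sequence varies by compiler. Division is not covered; that is what `__divdi3` handles. `U64Parts`, `split`, `join`, `mul_wide` and `mul64` are hypothetical names, and `uint64_t` appears only to model the single widening MUL instruction.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two 32-bit words, as 32-bit code has to do.
struct U64Parts {
    uint32_t lo;
    uint32_t hi;
};

U64Parts split(uint64_t v) { return U64Parts{static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)}; }
uint64_t join(U64Parts p) { return (static_cast<uint64_t>(p.hi) << 32) | p.lo; }

// 32x32 -> 64-bit widening multiply: on x86-32 this is one MUL (result in EDX:EAX).
U64Parts mul_wide(uint32_t a, uint32_t b) {
    return split(static_cast<uint64_t>(a) * b);
}

// 64x64 -> 64-bit multiply (result mod 2^64) from 32-bit pieces.
// The low 64 bits are the same for signed and unsigned two's-complement
// operands, so this doubles as a 64-bit IMUL.
U64Parts mul64(U64Parts a, U64Parts b) {
    U64Parts r = mul_wide(a.lo, b.lo);  // full product of the low words
    r.hi += a.lo * b.hi;                // cross terms: only their low 32 bits
    r.hi += a.hi * b.lo;                // land inside the 64-bit result
    return r;                           // a.hi * b.hi only affects bits >= 64
}
```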
  7. Note that the above "fix" can still disagree with drawStuff by 1 pixel due to roundoff (though in a rather small percentage of cases). So

        for (int i = startY; i < startY + h; i++) // hori
            gfx.lineRelative(0, i * TILE*scale + pan.y, gfx.screenX, 0, col);

is probably preferable, as it should agree with drawStuff completely (assuming compiler optimizations don't break it).
  8. Small aside: floats are less magical than many people make them out to be. They can be counterintuitive to the unwary, but you can, in fact, rely on some things with certainty. I'm really not fond of that particular wording, as it conflates several issues (and in some ways is untrue). As far as I can tell, 1*n+1*n==2*n holds for all non-NaN floats.
Back to the topic at hand. Some clarifications would be welcome (e. g. drawStuff does not mention pan at all), but there seems to be enough information now to piece the problem together. First off, if you may simply go with that.
Now, as to why the original code doesn't work. The reason has already been mentioned by a few people, but I'll elaborate. Write $T$ = TILE*scale (assuming scale > 0, so $T > 0$), $p$ = pan.y and $s$ = startY. The lines are at $iT + p$ on the y-axis. The first (least y) line that fits on the screen is at $i = s$ such that
$$0 \le sT + p < T \iff -p \le sT < T - p \iff s = \left\lceil -\frac{p}{T} \right\rceil.$$
Its offset is
$$sT + p = \left\lceil -\frac{p}{T} \right\rceil T + p = -\left\lfloor \frac{p}{T} \right\rfloor T + p = \operatorname{mod}(p, T),$$
assuming mod is defined to be always non-negative, which std::fmod isn't (its result takes the sign of the dividend). This is different from mod((int)pan.y, (int)(TILE*scale)), which is what you are doing (once more, disregarding negatives). In fact, the difference can be estimated as frac(TILE*scale)*startY (not always exactly, since it can wrap around), which may be quite large. This explains why integer values of TILE*scale work fine. So the original code can (I think) be fixed as follows:

        float oy = fmodf(pan.y, TILE*scale);
        int offY = (int)(oy < 0.0f ? TILE*scale + oy : oy);
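The derivation above boils down to two small helpers, sketched here with hypothetical names. `tile` stands for TILE*scale and is assumed to be > 0. `first_visible_line` computes startY = ceil(-pan.y / tile), and `positive_fmod` computes the always-non-negative mod that the offset of that first line equals.

```cpp
#include <cassert>
#include <cmath>

// Non-negative floating-point modulo: result in [0, m) for m > 0.
// std::fmod keeps the sign of x, so negative pans need a correction.
float positive_fmod(float x, float m) {
    float r = std::fmod(x, m);
    if (r < 0.0f) r += m;
    if (r >= m) r = 0.0f;  // r + m can round up to exactly m when r is a tiny negative number
    return r;
}

// Index of the first grid line with 0 <= i*tile + pan_y < tile: ceil(-pan_y / tile).
int first_visible_line(float pan_y, float tile) {
    return static_cast<int>(std::ceil(-pan_y / tile));
}
```

For example, with pan.y = -10.5 and a tile of 4, the first visible line is i = 3, at offset 3*4 - 10.5 = 1.5, which equals the non-negative mod. std::fmod would give -2.5 here instead.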
  9. FordPerfect

    need more randomization

    Well, yes, the question does not seem very clear to me. The way I'm reading it, you have a lot of objects (particles), each of which should have about 10 parameters that are unique per object (random) but constant during the object's lifetime, and you want to avoid storing all 10. Is that right? If so, then having a noise function and regenerating the parameters whenever you need them is certainly a viable option. It adds some overhead compared to just storing the values, but a decent noise function can be pretty fast. Then you just generate a random value based on the object's id and the index of the parameter, something like get_noise(obj_id*16+parameter_id). You can of course use a stateful RNG (a noise function is stateless), either per object or globally stored, but is there any reason to? I once started a topic on noise functions, which may contain some useful links: The library has several nice functions. Also there are several posts by Jonathan Blow on a similar topic (he did go with a (seekable) stateful RNG, not that I can see a good reason why): If you need a float from a noise function that returns an integer, that is easy too. For example:

        // Returns a random float, uniformly distributed in [0; 1).
        float get_rnd_float(uint32_t seed)
        {
            uint32_t bits=get_rng_uint32(seed); // integer noise function (placeholder name)
            bits=bits&0x00FFFFFFu; // 2^24-1, 24 being float's precision (i. e. full mantissa size, including implicit bit).
            float ret=float(int32_t(bits)); // int->float may be faster than uint->float.
            ret*=5.96046448e-8f; // 2^{-24}.
            return ret;
        }
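Here is a self-contained sketch of the "regenerate instead of store" scheme. The integer hash is a swapped-in choice: MurmurHash3's 32-bit finalizer (fmix32), used as a stand-in for whatever noise function you pick. Any good stateless integer hash would do. `get_noise`, `unit_float` and `get_param` are hypothetical names, and the 16 parameter slots per object follow the obj_id*16+parameter_id layout above (parameter_id is assumed to be < 16).

```cpp
#include <cassert>
#include <cstdint>

// Stateless integer noise: MurmurHash3 fmix32 (a bijection on 32-bit values).
uint32_t get_noise(uint32_t x) {
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

// Same mapping as get_rnd_float above: keep 24 bits, scale by 2^-24 -> [0, 1).
float unit_float(uint32_t h) {
    const uint32_t bits = h & 0x00FFFFFFu;
    return static_cast<float>(static_cast<int32_t>(bits)) * 5.96046448e-8f;
}

// Parameter parameter_id of object obj_id, regenerated on demand instead of stored.
// Same inputs always give the same value, so nothing has to be kept per object.
float get_param(uint32_t obj_id, uint32_t parameter_id) {
    return unit_float(get_noise(obj_id * 16u + parameter_id));
}
```

A parameter with another range is then just an affine map, e.g. lifetime = 2.0f + 3.0f * get_param(id, 0). Note that fmix32 maps 0 to 0, so object 0's first parameter is always 0.0f; XOR-ing the input with a constant salt avoids that if it matters.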
  10. Or, more strictly: generate a random number uniformly distributed in [0; 55) (it can be an integer or a real, though there doesn't seem to be much reason to use reals). BTW, the binary search might be unnecessary. It reduces the asymptotic time from O(n) to O(log n) if we know the probabilities in advance, but does not improve the asymptotics if each invocation uses a unique list of probabilities. In practice, linear search might be faster than binary search for small sizes (e. g. <8 possibilities), which are probably very common. The code might look like this (C++):

        #include <cstdio>
        #include <cstdlib>

        // Chooses random index i in [0;n) with probability weights[i]/sum(weights).
        unsigned int choose(unsigned int n, const unsigned int *weights)
        {
            unsigned int ret=0;
            unsigned int cur=0,sum=0;
            for(unsigned int i=0;i<n;++i) sum+=weights[i];
            if(sum==0) {std::printf("WARNING: sum of weights is zero!\n");return 0;}
            unsigned int c=std::rand()%sum; // in [0;sum) (with a small modulo bias)
            while((cur+=weights[ret])<=c) ++ret; // linear search over the running sum
            return ret;
        }
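For the case where the weights are known in advance and reused, here is a sketch of the binary-search variant. It precomputes prefix sums once (O(n)), and each pick then does `std::upper_bound` on a uniform integer in [0; sum) (O(log n)). The class name is hypothetical, and `std::mt19937` with `std::uniform_int_distribution` is a swapped-in replacement for `rand()%sum` that avoids the modulo bias.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

class WeightedChooser {
public:
    explicit WeightedChooser(const std::vector<unsigned>& weights) {
        unsigned sum = 0;
        prefix_.reserve(weights.size());
        for (unsigned w : weights) { sum += w; prefix_.push_back(sum); }  // running totals
    }

    unsigned total() const { return prefix_.empty() ? 0u : prefix_.back(); }

    // Maps c in [0, total()) to the index i with prefix[i-1] <= c < prefix[i].
    // Zero-weight entries have an empty interval and are never returned.
    std::size_t index_for(unsigned c) const {
        return static_cast<std::size_t>(
            std::upper_bound(prefix_.begin(), prefix_.end(), c) - prefix_.begin());
    }

    template <class Rng>
    std::size_t choose(Rng& rng) const {
        assert(total() > 0);  // same failure case as the zero-sum warning above
        std::uniform_int_distribution<unsigned> dist(0u, total() - 1u);
        return index_for(dist(rng));
    }

private:
    std::vector<unsigned> prefix_;
};
```

As noted above, this only pays off when the same weights serve many picks. For a fresh list each time, building the prefix sums already costs O(n).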
  11. FordPerfect

    High-quality integer noise function

    The deserves mention. Sorry to bump the old thread, but this may be useful to people who stumble on this specific topic.
  12. FordPerfect

    Well-defined shifts

    Well, yes. I meant mostly that it would be nice to stay within C++03 if feasible. I'm curious to test changing all shifts to shl/shr in some codebase and measuring the performance impact. Can anyone suggest a good guinea pig? Requirements: 1. Heavy user of shifts. 2. Small & easy to build. No larger than, say, Quake 2, and ideally a single file fitting within whatever limit rextester has (64 KB?).
  13. FordPerfect

    Well-defined shifts

    BTW, @frob, thanks for your comments. Now that I think about it, adding optional asserts so that this code doubles as a runtime detector of C-rule violations seems reasonable. I still maintain that the main purpose is that such cases are a normal part of the semantics, and not errors.
  14. FordPerfect

    Well-defined shifts

    I'm not sure the examples are particularly convincing yet... One thing that comes to mind is fixed-point. It is somewhat reasonable to view integers as a special case of fixed-point (fraction_bits=0), and it makes even more sense to allow the bits to represent only the fractional part (fraction_bits=32; this is unsigned fixed-point, which is somewhat unusual) to cover [0; 1). Also, I imagine such a thing could come up in a reader/writer for a stream of bits (operations work on [0 .. wordsize] bits). And the a<<(b-c) thing happened to me while writing texture interpolation (on the CPU, in fixed point): I worked around it by adding a bias to the shift.
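For the cases listed above (shift counts of exactly the word size, and possibly negative counts like a<<(b-c)), here is one possible set of well-defined logical shifts on `uint32_t`. The semantics are an assumption of this sketch, not necessarily what the thread settled on. Counts >= 32 shift every bit out and give 0, and negative counts shift the other way. Plain `<<` and `>>` are undefined for counts outside [0, 31]. `shl` and `shr` are hypothetical names, and the code is C++03-compatible.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Logical left shift defined for every int count.
uint32_t shl(uint32_t x, int n) {
    if (n < 0) return n <= -32 ? 0u : x >> -n;  // check first, so -n never overflows
    return n >= 32 ? 0u : x << n;
}

// Logical right shift defined for every int count.
uint32_t shr(uint32_t x, int n) {
    if (n < 0) return n <= -32 ? 0u : x << -n;
    return n >= 32 ? 0u : x >> n;
}
```

With these, the integer part of an unsigned fixed-point value is simply `shr(x, fraction_bits)` for any fraction_bits in [0, 32], and for fraction_bits = 32 (a value in [0; 1)) it is correctly 0.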
  15. FordPerfect

    Well-defined shifts

    Seems sensible. This depends on what we actually mean when we try to shift non-two's-complement signed integers. If we go with Knuth's definition above and treat the shift as a mathematical operation, operating on numerical values rather than bitwise representations, then L(-1) is actually correct (for both signed and unsigned L). You can claim that it is not very sensible, and I can claim that it is still about the most sensible thing we can do if we do not know anything about our numbers (representation, etc.). In practice, I'm fine with ignoring non-two's-complement entirely. Similar to above, and an example would be appreciated (honestly curious). My point is to give well-defined semantics to all inputs. As there are no invalid inputs, there is nothing to assert. If it is my code, yes. If I'm trying to make something library-like for other people to use, not so much. Fair enough. I only tested x86. Also, (newer) GCC did >>31, not cmove. My question was purely about C++ semantics (linkage, multiple definitions, etc.), not actual inlining, which is basically orthogonal to the 'inline' keyword.
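Here is a sketch of the value-based ("Knuth-style") reading for signed right shifts: shr(x, n) = floor(x / 2^n) for every n >= 0, including counts of 32 or more. A negative x shifted far enough then gives -1 rather than 0. The count is unsigned, so every input is valid and there is nothing to assert. The code never right-shifts a negative value (implementation-defined before C++20) and never uses a count >= the width (undefined behavior). `shr_floor` is a hypothetical name, and the semantics are this sketch's assumption.

```cpp
#include <cassert>
#include <cstdint>

// floor(x / 2^n) for all n, using only well-defined operations.
int32_t shr_floor(int32_t x, unsigned n) {
    if (n >= 31u) return x < 0 ? -1 : 0;  // |x| <= 2^31, so the floor is -1 or 0
    if (x >= 0) return x >> n;
    // For x < 0: ~x == -x - 1 >= 0, and ~(floor((-x - 1) / 2^n)) == floor(x / 2^n).
    return ~(~x >> n);
}
```

For example, shr_floor(-9, 3) computes ~(8 >> 3) = ~1 = -2, which is floor(-1.125).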