About FordPerfect


  1. FordPerfect

    Assembly language?

    I find it amusing how .kkrieger is mentioned as motivation, given that it is written almost entirely in C++: https://fgiesen.wordpress.com/2012/02/13/debris-opening-the-box/ https://github.com/farbrausch/fr_public As for the main question - I would like to know the answer myself. I'm rather sceptical of video tutorials (of pretty much any kind) on the subject. What I want is a proper textbook, and there don't seem to be many of those around. I get the impression that the mindset is like "whoever wanted to learn assembly did it in the 90's already, and if you are trying to learn now - sucks to be you", which is unfortunate. My best suggestion would probably be using books (& environment, e.g. DOSBox) from the 90's and then slowly assimilating information on more modern assembly scattered around the net. Which is... suboptimal. About the best recent textbook on assembly I know of would be "The Art of Assembly Language", which for one thing I would not call exactly 'modern' anymore, and for another has made some questionable choices for my tastes (the fascination with macros is one example: they might be useful for writing assembly, but arguably not for learning it). There is the "Intel 64 and IA-32 Architectures Software Developer's Manual", of course. That is a great reference, but it is frightening to imagine someone trying to use that behemoth as a textbook.
  2. I guess a somewhat more self-contained solution is to use .incbin from within inline asm. See e.g. https://www.devever.net/~hl/incbin and https://gist.github.com/mmozeiko/ed9655cf50341553d282 .
  3. FordPerfect

    Please help with ASM question

    Yes, float works too. I was about to reply that. Actually, FIDIV, FIMUL, and FISUB might make the code a bit smaller. Also note that you can (I think) force FDIV to produce the same result as integer division (by setting precision to full (64-bit mantissa) and rounding to RoundToZero (i.e. truncate)), if you care about that.
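The x87 control-word trick aside, a related fact can be checked directly in C++ (my own illustration, the function name is mine, not code from the thread): for 32-bit unsigned operands, a double's 53-bit mantissa has enough slack that dividing as doubles and truncating always matches integer division, even in the default round-to-nearest mode.

```cpp
#include <cstdint>

// For uint32_t operands, the true quotient a/b can never round across an
// integer boundary in double precision, so truncation recovers the exact
// integer quotient.
bool float_div_matches(uint32_t a, uint32_t b)
{
    return uint32_t(double(a) / double(b)) == a / b;
}
```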
  4. FordPerfect

    Please help with ASM question

    Do you have access to 64-bit operations? That would simplify things a bit, as the 64-bit versions of imul, idiv and cmp solve the problem quite straightforwardly. If this is purely 32-bit code, you need to construct them from 32-bit pieces. If you're not in the mood to try to figure it out yourself, you can just copy the C++ code into gcc.godbolt.org and see what compilers do in this situation: https://godbolt.org/g/2F9a1e As you can see, the division is especially heavy (multiply and compare are relatively easy: "imul+mul+add" and "cmp 0+cmp 1000000000" respectively); compilers just implement it as a function call. The code for __divdi3 can be seen here: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/wordsize-32/divdi3.c;h=257d93cc37c011f2714b48ccb0c476f5a4a39319;hb=HEAD
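For the multiply, the "imul+mul+add" construction mentioned above can be sketched in C++ (the function name is mine; the uint64_t types are just for checking - the three partial products are what the 32-bit code would compute):

```cpp
#include <cstdint>

// 64-bit multiply from 32-bit pieces, mirroring the "imul+mul+add" sequence:
// one widening mul of the low halves, plus two 32-bit cross products whose
// high bits fall outside the 64-bit result anyway.
uint64_t mul64_from32(uint64_t a, uint64_t b)
{
    uint32_t al = uint32_t(a), ah = uint32_t(a >> 32);
    uint32_t bl = uint32_t(b), bh = uint32_t(b >> 32);
    uint64_t low   = uint64_t(al) * bl; // widening mul (edx:eax on x86)
    uint32_t cross = al * bh + ah * bl; // two imuls + add, truncated to 32 bits
    return low + (uint64_t(cross) << 32);
}
```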
  5. Note that the above "fix" can still disagree with drawStuff by 1 pixel due to roundoff (though in a rather small percentage of cases). So

    for (int i = startY; i < startY + h; i++) // hori
        gfx.lineRelative(0, i * TILE*scale + pan.y, gfx.screenX, 0, col);

    is probably preferred, as it should agree with drawStuff completely (assuming compiler optimizations didn't break it).
  6. Small aside: floats are less magical than many people make them out to be. They can be counterintuitive to the unwary, but you can, in fact, ensure some things with certainty. I'm really not fond of that particular wording, as it conflates several issues (and in some ways is untrue). As far as I can tell, 1*n+1*n==2*n holds for all non-NaN floats.

    Back to the topic at hand. Some clarifications would be welcome (e.g. drawStuff does not mention pan at all), but there seems to be enough information now to piece the problem together. First off, if the fix above works for you, you may simply go with that.

    Now, as to why the original code doesn't work. The reason has already been mentioned by a few people, but I'll elaborate. The lines are at i * TILE*scale + pan.y on the y-axis. The first (least y) line that fits on the screen is at i=startY such that

    0 <= startY * TILE*scale + pan.y < TILE*scale
    -pan.y <= startY * TILE*scale < TILE*scale - pan.y
    startY = ceil(-pan.y / (TILE*scale)) {assuming scale>0}

    Its offset is

    ceil(-pan.y / (TILE*scale)) * (TILE*scale) + pan.y

    which is

    -floor(pan.y / (TILE*scale)) * (TILE*scale) + pan.y

    that is mod(pan.y, TILE*scale), assuming modulo is defined to be always non-negative, which std::fmod isn't. This is different from mod((int)pan.y, (int)(TILE*scale)), which you are doing (once more, disregarding negatives). In fact, the difference can be estimated as frac(TILE*scale)*startY (not always true, since it can wrap around), which may be quite large. This explains why integer TILE*scale works fine. So the original code can be (I think) fixed as follows:

    float oy=fmodf(pan.y,TILE*scale);
    int offY = (int)(oy<0.0f?TILE*scale+oy:oy);
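The always-non-negative modulo used in the derivation above can be wrapped up as a tiny helper (a sketch; the name mod_pos is mine):

```cpp
#include <cmath>

// Positive modulo for floats: result in [0, m) for m > 0, even for
// negative x, unlike std::fmod, which keeps the sign of x.
float mod_pos(float x, float m)
{
    float r = std::fmod(x, m);
    return r < 0.0f ? r + m : r;
}
```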
  7. FordPerfect

    need more randomization

    Well, yes, the question does not seem very clear to me. The way that I'm reading this is that you have a lot of objects (particles) which should have about 10 parameters that are unique per object (random), but constant during the object's lifetime, and you want to avoid storing all 10. Is that right? If so, then having a noise function and regenerating the parameters whenever you need them is certainly a viable option. It adds some overhead compared to just storing the values, but a decent noise function can be pretty fast. Then you just generate a random value based on the object's id and the index of a parameter, something like get_noise(obj_id*16+parameter_id). You can of course use a stateful RNG (a noise function is stateless), either per object or globally stored, but is there any reason to? I once started a topic on noise functions, which may contain some useful links. The library at http://marc-b-reynolds.github.io/shf/2016/04/19/prns.html has several nice functions. Also there are several posts by Jonathan Blow on a similar topic (he did go with a (seekable) stateful RNG, not that I can see a good reason why): http://number-none.com/blow/blog/programming/2016/07/07/braid_particles_1.html http://number-none.com/blow/blog/programming/2016/07/08/fabian-on-lcg-fast-forward.html http://number-none.com/blow/blog/programming/2016/07/13/braid_particles_2.html If you need a float from a noise function that returns an integer, that is easy too. For example:

    // Returns a random float, uniformly distributed in [0; 1).
    float get_rnd_float(uint32_t seed)
    {
        uint32_t bits=get_rng_uint32(seed);
        bits=bits&0x00FFFFFFu; // 2^24-1, 24 being float's precision (i.e. full mantissa size, including implicit bit).
        float ret=float(int32_t(bits)); // int->float may be faster than uint->float.
        ret*=5.96046448e-8f; // 2^{-24}.
        return ret;
    }
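For completeness, here is one possible get_rng_uint32 - a small multiply-xorshift integer hash (the constants come from a published hash search; this is my own illustration, not code from the thread, and any decent integer hash would do):

```cpp
#include <cstdint>

// Stateless noise function: a multiply-xorshift hash of the seed.
// Same seed in, same "random" value out - exactly what is needed to
// regenerate per-object parameters instead of storing them.
uint32_t get_rng_uint32(uint32_t seed)
{
    seed ^= seed >> 16;
    seed *= 0x7feb352du;
    seed ^= seed >> 15;
    seed *= 0x846ca68bu;
    seed ^= seed >> 16;
    return seed;
}
```

Since xorshifts and odd multiplies are both bijections on 32-bit words, distinct seeds always map to distinct outputs.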
  8. Or more strictly: generate a random number uniformly distributed in [0; 55) (it can be integer or real, though there doesn't seem to be much reason to use reals). BTW, the binary search might be unnecessary. It reduces the asymptotic time from O(n) to O(log n) if we know the probabilities in advance, but does not help asymptotically if each invocation uses a unique list of probabilities. In practice, linear search might be faster than binary for small sizes (e.g. <8 possibilities), which are probably very common. The code might look like this (C++):

    // Chooses random index i in [0;n) with probability weights[i]/sum(weights).
    unsigned int choose(unsigned int n,const unsigned int *weights)
    {
        unsigned int ret=0;
        unsigned int cur=0,sum=0;
        for(unsigned int i=0;i<n;++i) sum+=weights[i];
        if(sum==0) {printf("WARNING: sum of weights is zero!\n");return 0;}
        unsigned int c=rand()%sum;
        while((cur+=weights[ret])<=c) ++ret;
        return ret;
    }
  9. FordPerfect

    High-quality integer noise function

    The noise functions at https://marc-b-reynolds.github.io/shf/2016/04/19/prns.html deserve a mention. Sorry to bump the old thread, but this may be useful to people who stumble on this specific topic.
  10. FordPerfect

    Well-defined shifts

    Well, yes. I meant mostly that it would be nice to stay within C++03 if feasible. I'm curious to test changing all shifts to shl/shr in some codebase and measure the performance impact. Can anyone suggest a good guinea pig? Requirements: 1. Heavy user of shifts. 2. Small & easy to build - no larger than, say, Quake 2, and ideally a single file fitting in whatever limit rextester has (64 KB?).
  11. FordPerfect

    Well-defined shifts

    BTW, @frob, thanks for your comments. Now that I think about it, adding optional asserts to make this code double as a runtime detector of C-rules violations seems reasonable. I still maintain that the main purpose is that such cases are a normal part of the semantics, and not errors.
  12. FordPerfect

    Well-defined shifts

    I'm not sure the examples are particularly convincing yet... One thing that comes to mind is fixed-point. It is somewhat reasonable to view integers as a special case of fixed-point (fraction_bits=0), and it makes even more sense to allow the bits to represent only the fractional part (fraction_bits=32; this is unsigned fixed-point - somewhat unusual) to cover [0; 1). Also, I imagine such a thing could come up in a reader/writer on a stream of bits (operations work on [0 .. wordsize] bits). And the a<<(b-c) thing happened to me writing texture interpolation (on the CPU, in fixed point) - I worked around it by adding a bias to the shift.
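To illustrate the fraction_bits=32 case: multiplying two such [0; 1) fractions takes the high half of a 64-bit product - conceptually a shift by the full word size (a sketch; the name fixmul is mine):

```cpp
#include <cstdint>

// Unsigned 0.32 fixed-point: a uint32_t x represents x / 2^32, covering [0; 1).
// The product needs 64 bits, then a shift right by the full word width (32).
uint32_t fixmul(uint32_t a, uint32_t b)
{
    return uint32_t((uint64_t(a) * b) >> 32);
}
```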
  13. FordPerfect

    Well-defined shifts

    Seems sound. This depends on what we actually mean when we try to shift non-2's-complement signed integers. If we go with Knuth's definition above and treat shift as a mathematical operation, operating on numerical values rather than bitwise representations, then L(-1) is actually correct (for both signed and unsigned L). You can claim that it is not very sensible, and I can claim that it is still about the most sensible thing we can do if we do not know anything about our numbers (representation, etc.). In practice, I'm fine with ignoring non-2's-complement entirely. Similar to the above, an example would be appreciated (honestly curious). My point is to make the semantics well-defined for all inputs. As there are no invalid inputs, there is nothing to assert. If it is my code - yes. If I'm trying to make something library-like for other people to use - not so much. Fair enough. I only tested x86. Also, (newer) GCC did >>31, not cmove. My question was purely about C++ semantics (linkage, multiple definitions, etc.), not actual inlining, which is basically orthogonal to the 'inline' keyword.
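My reading of the shift-as-arithmetic view is shift(x, s) = floor(x * 2^s), which gives negative shift amounts a natural meaning (a sketch under that assumption; the name is mine, and overflow is ignored for brevity):

```cpp
#include <cstdint>

// Shift as a mathematical operation on values: floor(x * 2^s).
// For s >= 0 this is a left shift; for s < 0 it is floor division by 2^-s,
// which an arithmetic right shift provides on 2's-complement machines.
int64_t value_shift(int64_t x, int s)
{
    if (s >= 0) return x << s;  // assumes the result fits in 64 bits
    return x >> -s;             // floor(x / 2^-s), including negative x
}
```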
  14. FordPerfect

    Well-defined shifts

  15. FordPerfect

    Well-defined shifts

    I doubt it, considering that the underlying instruction on x86 essentially does (value<<(amount&31)). And so (1u<<32) would come out as 1u. I am told that on ARM the shift does indeed zero out. And yes, the compiler can detect the UB and decide to do whatever.
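For contrast, a left shift that is defined for all amounts - yielding 0 once the amount reaches the width, instead of inheriting x86's (amount&31) masking - can be sketched like this (my reading of the idea, not code from the thread):

```cpp
#include <cstdint>

// Left shift defined for any amount: amounts >= the width yield 0
// (the mathematically natural result), rather than the masked-amount
// behaviour the raw x86 instruction would give.
uint32_t shl32(uint32_t value, unsigned amount)
{
    return amount >= 32 ? 0u : value << amount;
}
```

The branch typically compiles to a compare+cmov (or, as noted above, newer GCC may use a shift-based mask), so the cost over a bare shift is small.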