

Community Reputation

111 Neutral

About FordPerfect

  1. Please help with ASM question

    Yes, float works too; I was about to reply that. Actually, FIDIV, FIMUL, and FISUB might make the code a bit smaller. Also note that you can (I think) force FDIV to produce the same result as integer division by setting the precision to full (64-bit mantissa) and the rounding mode to round-toward-zero (i.e. truncate), if you care about that.
  2. Please help with ASM question

    Do you have access to 64-bit operations? That would simplify things a bit, as the 64-bit versions of imul, idiv and cmp solve the problem quite straightforwardly. If this is purely 32-bit code, you need to construct them from 32-bit pieces. If you're not in the mood to figure it out yourself, you can just paste the C++ code into gcc.godbolt.org and see what compilers do in this situation: https://godbolt.org/g/2F9a1e As you can see, the division is especially heavy (multiply and compare are relatively easy: "imul+mul+add" and "cmp 0+cmp 1000000000" respectively). Compilers just implement it as a function call. The code for __divdi3 can be seen here: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/wordsize-32/divdi3.c;h=257d93cc37c011f2714b48ccb0c476f5a4a39319;hb=HEAD
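To illustrate the "construct them from 32-bit pieces" part for the multiply, here is a minimal sketch in C++. It is not the code behind the godbolt link; `U64Parts`, `split64`, `join64` and `mul64_from_32` are hypothetical names made up for this example. It assumes only the low 64 bits of the product are wanted, which is what a 64-bit imul gives you.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value kept as two 32-bit halves, the way 32-bit code would
// hold it in a register pair. (Hypothetical helper type.)
struct U64Parts {
    uint32_t lo;
    uint32_t hi;
};

inline U64Parts split64(uint64_t v)
{
    U64Parts p;
    p.lo = uint32_t(v);
    p.hi = uint32_t(v >> 32);
    return p;
}

inline uint64_t join64(U64Parts p)
{
    return (uint64_t(p.hi) << 32) | p.lo;
}

// Low 64 bits of a*b, built from 32-bit pieces.
// Only lo*lo needs a widening 32x32->64 multiply; the two cross terms
// only feed the high half, so ordinary wrapping 32-bit multiplies suffice.
// hi*hi would land entirely above bit 63 and is dropped.
inline U64Parts mul64_from_32(U64Parts a, U64Parts b)
{
    uint64_t low = uint64_t(a.lo) * b.lo;        // widening multiply
    uint32_t cross = a.lo * b.hi + a.hi * b.lo;  // wraps modulo 2^32 on purpose
    U64Parts r;
    r.lo = uint32_t(low);
    r.hi = uint32_t(low >> 32) + cross;
    return r;
}
```

Because only the low 64 bits are kept, the same routine serves signed and unsigned operands in two's complement. Division has no such shortcut, which is presumably why compilers fall back to a library call for it.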
  3. Error in remainder of integer? (scaling offset)

    Note that the above "fix" can still disagree with drawStuff by 1 pixel due to roundoff (though in a rather small percentage of cases). So

        for (int i = startY; i < startY + h; i++) // hori
            gfx.lineRelative(0, i * TILE*scale + pan.y, gfx.screenX, 0, col);

    is probably preferred, as it should agree with drawStuff completely (assuming compiler optimizations didn't break it).
  4. Error in remainder of integer? (scaling offset)

    Small aside: floats are less magical than many people make them out to be. They can be counterintuitive to the unwary, but you can, in fact, ensure some things with certainty. I'm really not fond of that particular wording, as it conflates several issues (and in some ways is untrue). As far as I can tell, 1*n+1*n==2*n holds for all non-NaN floats.

    Back to the topic at hand. Some clarifications would be welcome (e.g. drawStuff does not mention pan at all), but there seems to be enough information now to piece the problem together. First off, if you may simply go with that.

    Now, as to why the original code doesn't work. The reason has already been mentioned by a few people, but I'll elaborate. The lines are at i * TILE*scale + pan.y on the y-axis. The first (least y) line that fits on the screen is at i=startY such that

        0 <= startY * TILE*scale + pan.y < TILE*scale
        -pan.y <= startY * TILE*scale < TILE*scale - pan.y
        startY = ceil(-pan.y / (TILE*scale))    {assuming scale>0}

    Its offset is

        ceil(-pan.y / (TILE*scale)) * (TILE*scale) + pan.y

    which is

        -floor(pan.y / (TILE*scale)) * (TILE*scale) + pan.y

    that is, mod(pan.y, TILE*scale), assuming modulo is defined to be always non-negative, which std::fmod isn't. This is different from mod((int)pan.y, (int)(TILE*scale)), which is what you are doing (once more, disregarding negatives). In fact, the difference can be estimated as frac(TILE*scale)*startY (not always true, since it can wrap around), which may be quite large. This explains why an integer TILE*scale works fine.

    So the original code can be (I think) fixed as follows:

        float oy = fmodf(pan.y, TILE*scale);
        int offY = (int)(oy < 0.0f ? TILE*scale + oy : oy);
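The difference between the two offsets is easy to see with concrete numbers. The following is only a sketch: `floor_mod`, `first_line_offset` and `truncated_offset` are names invented here, and `cell` stands in for TILE*scale.

```cpp
#include <cassert>
#include <cmath>

// Remainder that always lies in [0, m) for m > 0, unlike std::fmod,
// whose result takes the sign of x.
inline float floor_mod(float x, float m)
{
    assert(m > 0.0f);
    float r = std::fmod(x, m);
    if (r < 0.0f) r += m;
    // r + m can round up to exactly m when r is a tiny negative number.
    if (r >= m) r = 0.0f;
    return r;
}

// Pixel offset of the first visible grid line.
// `pan_y` and `cell` (standing for TILE*scale) are stand-in parameter names.
inline int first_line_offset(float pan_y, float cell)
{
    return int(floor_mod(pan_y, cell));
}

// The variant argued against above: truncate both operands to int first.
inline int truncated_offset(float pan_y, float cell)
{
    return int(pan_y) % int(cell);
}
```

With pan_y = 100 and cell = 24.5, first_line_offset gives 2 (100 - 4*24.5) while truncated_offset gives 4 (100 % 24), so the grid already drifts by two pixels after four cells. With an integer cell size the two agree.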
  5. C++ need more randomization

    Well, yes, the question does not seem very clear to me. The way I'm reading this is that you have a lot of objects (particles), each with about 10 parameters that are unique per object (random) but constant during the object's lifetime, and you want to avoid storing all 10. Is that right? If so, then having a noise function and regenerating the parameters whenever you need them is certainly a viable option. It adds some overhead compared to just storing the values, but a decent noise function can be pretty fast. Then you just generate a random value based on the object's id and the index of the parameter, something like get_noise(obj_id*16+parameter_id). You can of course use a stateful RNG (a noise function is stateless), either per object or globally stored, but is there any reason to?

    I once started a topic on noise functions, which may contain some useful links: The http://marc-b-reynolds.github.io/shf/2016/04/19/prns.html library has several nice functions. Also, there are several posts by Jonathan Blow on a similar topic (he did go with a (seekable) stateful RNG, not that I can see a good reason why):
    http://number-none.com/blow/blog/programming/2016/07/07/braid_particles_1.html
    http://number-none.com/blow/blog/programming/2016/07/08/fabian-on-lcg-fast-forward.html
    http://number-none.com/blow/blog/programming/2016/07/13/braid_particles_2.html

    If you need a float from a noise function that returns an integer, that is easy too. For example:

        // Returns a random float, uniformly distributed in [0; 1).
        float get_rnd_float(uint32_t seed)
        {
            uint32_t bits=get_rng_uint32(seed);
            bits=bits&0x00FFFFFFu; // 2^24-1, 24 being float's precision (i. e. full mantissa size, including implicit bit).
            float ret=float(int32_t(bits)); // int->float may be faster than uint->float.
            ret*=5.96046448e-8f; // 2^{-24}.
            return ret;
        }
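Putting the two pieces together, a per-object parameter lookup might look roughly like this. It is a sketch under assumptions: the body of `get_noise` is the 32-bit finalizer from MurmurHash3, used purely as a stand-in for whatever noise function you pick, and `object_param` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in stateless noise function: the 32-bit finalizer from MurmurHash3.
// Any decent integer hash could be substituted here.
inline uint32_t get_noise(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

// Parameter number `parameter_id` (0..15) of object `obj_id`,
// as a float uniformly distributed in [0, 1).
// Nothing is stored: the value is regenerated on every call.
inline float object_param(uint32_t obj_id, uint32_t parameter_id)
{
    assert(parameter_id < 16u);
    uint32_t bits = get_noise(obj_id * 16u + parameter_id);
    bits &= 0x00FFFFFFu;                            // keep 24 bits, float's precision
    return float(int32_t(bits)) * 5.96046448e-8f;   // scale by 2^-24
}
```

Calling object_param(id, 3) twice for the same id returns the same value, which is the whole point: the "storage" is the function itself. Note that this particular stand-in maps input 0 to 0, so you may want to offset the ids or pick a different hash.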
  6. Or more strictly: generate a random number uniformly distributed in [0; 55) (it can be integer or real, though there doesn't seem to be much reason to use reals). BTW, the binary search might be unnecessary. It reduces the asymptotic time from O(n) to O(log n) if we know the probabilities in advance, but does not help the asymptotics if each invocation uses a unique list of probabilities. In practice, linear search might be faster than binary for small sizes (e.g. <8 possibilities), which are probably very common. The code might look like this (C++):

        #include <cstdio>
        #include <cstdlib>

        // Chooses random index i in [0;n) with probability weights[i]/sum(weights).
        unsigned int choose(
            unsigned int n,
            const unsigned int *weights)
        {
            unsigned int ret=0;
            unsigned int cur=0,sum=0;
            for(unsigned int i=0;i<n;++i)
                sum+=weights[i];
            if(sum==0) {printf("WARNING: sum of weights is zero!\n");return 0;}
            unsigned int c=rand()%sum;
            // Walk the running total until it passes c.
            while((cur+=weights[ret])<=c)
                ++ret;
            return ret;
        }
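For comparison, here is roughly what the binary-search variant looks like when the weights are known in advance. This is a sketch, not code from the thread: `build_prefix_sums` and `pick_index` are hypothetical names, and the random draw c is passed in by the caller so the mapping itself is easy to check.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Running totals of the weights: prefix[i] = weights[0] + ... + weights[i].
// Built once, reused for every draw.
inline std::vector<unsigned> build_prefix_sums(const std::vector<unsigned>& weights)
{
    std::vector<unsigned> prefix;
    prefix.reserve(weights.size());
    unsigned sum = 0;
    for (unsigned w : weights) {
        sum += w;
        prefix.push_back(sum);
    }
    return prefix;
}

// Maps c in [0, total) to the index whose weight interval contains c.
// std::upper_bound finds the first running total strictly greater than c,
// so entries with zero weight are never selected.
inline unsigned pick_index(const std::vector<unsigned>& prefix, unsigned c)
{
    assert(!prefix.empty() && c < prefix.back());
    return unsigned(std::upper_bound(prefix.begin(), prefix.end(), c) - prefix.begin());
}
```

With example weights {10, 20, 25} (total 55), values of c in [0, 10) map to index 0, [10, 30) to index 1 and [30, 55) to index 2. If the weights change on every call, building the prefix array is already O(n), so nothing is gained over the linear walk.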
  7. High-quality integer noise function

    The library at https://marc-b-reynolds.github.io/shf/2016/04/19/prns.html deserves a mention. Sorry to bump the old thread, but this may be useful to people who stumble on this specific topic.
  8. C++ Well-defined shifts

    Well, yes. I meant mostly that it would be nice to stay in C++03 if feasible. I'm curious to test changing all shifts to shl/shr in some codebase and measuring performance impact. Can anyone suggest a good guinea pig? Requirements: 1. Heavy user of shifts. 2. Small & easy to build. No larger than say quake2, and ideally single-file fitting in whatever limit rextester has (64 KB?).
  9. C++ Well-defined shifts

    BTW, @frob, thanks for your comments. Now that I think about it, adding optional asserts to make this code double as a runtime-c-rules-violations-detector seems reasonable. I still maintain that the main purpose is for such cases to be a normal part of the semantics, and not errors.
  10. C++ Well-defined shifts

    I'm not sure examples are particularly convincing, yet... One thing that comes to mind is fixed-point. It is somewhat reasonable to view integers as a special case of fixed-point (fraction_bits=0), and it makes even more sense to allow bits to represent only fractional part (fraction_bits=32; this is unsigned fixed_point - somewhat unusual) to cover [0; 1). Also I imagine such thing could come up in reader/writer on stream of bits (operations work on [0 .. wordsize] bits). And a<<(b-c) thing happened to me writing texture interpolation (on CPU in fixed point) - I worked around it adding bias to shift.
  11. C++ Well-defined shifts

    Seems sound sense. This depends on what we actually mean when we try to shift non-2-complement signed integers. If we go with Knuth's definition above and treat shift as a mathematical operation, operating on numerical values rather than bitwise representations, then L(-1) is actually correct (for both signed and unsigned L). You can claim that it is not very sensible, and I can claim that it is still about the most sensible thing we can do if we do not know anything about our numbers (representation, etc). In practice, I'm fine with ignoring non-2-complement entirely. Similar to above, and an example would be appreciated (honestly curious). My point is to make well-defined semantics for all inputs. As there are no invalid inputs, there is nothing to assert. If it is my code - yes. If I'm trying to make something library-like for other people to use - not so much. Fair enough. I only tested x86. Also (newer) GCC did >>31, not cmove. My question was purely about C++ semantics (linkage, multiple definitions, etc.), not actual inlining, which is basically orthogonal to the 'inline' keyword.
  12. C++ Well-defined shifts

  13. C++ Well-defined shifts

    I doubt it, considering that underlying instruction on x86 does essentially (value<<(amount&31)). And so (1u<<32) would come out 1u. I am told that on ARM shift does indeed zero out. And yes, compiler can detect UB and decide to do whatever.
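To make the two behaviours concrete without actually invoking undefined behaviour, they can be modelled with well-defined C++. This is only a sketch of the semantics described above, not a statement about what any given compiler emits; `shl_masked` and `shl_zeroing` are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// Models the x86 behaviour described above: the shift count is masked
// to its low 5 bits before shifting a 32-bit value.
inline uint32_t shl_masked(uint32_t value, uint32_t amount)
{
    return value << (amount & 31u);
}

// Models the "zero out" behaviour: a count of 32 or more shifts
// every bit out of the register.
inline uint32_t shl_zeroing(uint32_t value, uint32_t amount)
{
    return amount >= 32u ? 0u : value << amount;
}
```

For counts below 32 the two agree. At 32 the masked model returns the value unchanged (so 1u stays 1u), while the zeroing model returns 0.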
  14. C++ Well-defined shifts

    Hmm... To expand on my earlier statement, consider this:

        uint32_t a;
        uint32_t c;
        // ...
        uint32_t lo=a&((1u<<c)-1);
        uint32_t hi=a>>c;
        do_something(lo,hi);

    It seems entirely reasonable to allow c=32, especially considering c=0 is very much allowed. Yet it does not work in C++.

    Knuth does define shifts in Section 7.1.3 of Volume 4 of The Art of Computer Programming ("Bitwise Tricks and Techniques") as x<<k=floor(2^k * x) for all integers; similar for >>, ensuring x<<(-k)=x>>k holds.
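For reference, the usual workaround with plain shifts is to special-case the full-width count. A minimal sketch, with `Split` and `split_bits` being hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Low c bits and remaining high bits of a 32-bit word.
struct Split {
    uint32_t lo;
    uint32_t hi;
};

// Splits `a` at bit position c, for any c in [0, 32].
// c == 32 is handled separately because shifting a 32-bit value
// by 32 is undefined behaviour in C++.
inline Split split_bits(uint32_t a, unsigned c)
{
    assert(c <= 32u);
    Split s;
    if (c == 32u) {
        s.lo = a;   // every bit belongs to the low part
        s.hi = 0u;
    } else {
        s.lo = a & ((uint32_t(1) << c) - 1u);
        s.hi = a >> c;
    }
    return s;
}
```

The extra branch is exactly the kind of noise that shifts defined for all counts would remove: under the floor(2^k * x) definition, both lines of the original snippet already give the right answer for c=32.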
  15. C++ Well-defined shifts

    So, the code as it stands now:

        // This software is in the public domain. Where that dedication is not
        // recognized, you are granted a perpetual, irrevocable license to copy
        // and modify this file as you see fit.

        template<typename L,typename R>
        L shl(L value,R amount)
        {
            if(amount>=R(sizeof(L))*8) return L(0);
            if(amount<R(0))
            {
                if(amount<=-R(sizeof(L))*8)
                {
                    if(value<L(0)) return L(-1);
                    return L(0);
                }
                return value>>(-amount);
            }
            return L((typename std::make_unsigned<L>::type)(value)<<amount);
        }

        template<typename L,typename R>
        L shr(L value,R amount)
        {
            if(amount>=R(sizeof(L))*8)
            {
                if(value<L(0)) return L(-1);
                return L(0);
            }
            if(amount<R(0))
            {
                if(amount<=-R(sizeof(L))*8) return L(0);
                return L((typename std::make_unsigned<L>::type)(value)<<(-amount));
            }
            return value>>amount;
        }

    Once again, I would very much like it if someone checked it for correctness. General comments are welcome as well. Some thoughts:
    1. std::make_unsigned is C++11. Can we not require C++11?
    2. if(value<L(0)) return L(-1); return L(0); does not introduce a branch in either GCC or clang.
    3. Should the functions be marked inline? There seems to be a subtle difference between a template function and an inline template function.
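One way to check this is to brute-force an 8-bit type against the floor(2^k * x) definition. The sketch below copies shl/shr from the post and adds a hypothetical harness (`reference_shift`, `shifts_match_reference`). It assumes two's complement, arithmetic right shift of negative values and wrapping narrowing conversions, which the standard only guarantees from C++20 on but which mainstream compilers provide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <type_traits>

// shl/shr as posted above.
template<typename L,typename R>
L shl(L value,R amount)
{
    if(amount>=R(sizeof(L))*8) return L(0);
    if(amount<R(0))
    {
        if(amount<=-R(sizeof(L))*8)
        {
            if(value<L(0)) return L(-1);
            return L(0);
        }
        return value>>(-amount);
    }
    return L((typename std::make_unsigned<L>::type)(value)<<amount);
}

template<typename L,typename R>
L shr(L value,R amount)
{
    if(amount>=R(sizeof(L))*8)
    {
        if(value<L(0)) return L(-1);
        return L(0);
    }
    if(amount<R(0))
    {
        if(amount<=-R(sizeof(L))*8) return L(0);
        return L((typename std::make_unsigned<L>::type)(value)<<(-amount));
    }
    return value>>amount;
}

// Reference result: floor(2^k * x), wrapped to 8 bits.
// The double arithmetic is exact for the small ranges used here.
inline int8_t reference_shift(int8_t x, int k)
{
    double exact = std::floor(std::ldexp(double(x), k));
    int64_t wide = int64_t(exact);
    return int8_t(uint8_t(uint64_t(wide) & 0xFFu));
}

// Compares shl/shr with the reference for every int8_t value
// and every shift amount in [-20, 20].
inline bool shifts_match_reference()
{
    for (int x = -128; x <= 127; ++x)
        for (int k = -20; k <= 20; ++k)
        {
            int8_t v = int8_t(x);
            if (shl(v, k) != reference_shift(v, k)) return false;
            if (shr(v, k) != reference_shift(v, -k)) return false;
        }
    return true;
}
```

This only covers one L/R combination (int8_t with int), so it is evidence rather than proof; unsigned R and wider L would need their own checks.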