Same speed.1) How fast, relatively is addition and subtraction compared to and/or/xor?

Multiplication:How fast, relatively is multiplication, division, modulus compared to addition and subtraction?

- depends on the processor, can be almost as fast as addition or up to 30 times slower (if you multiply by a constant, you can emulate using shifts and additions, so it would be 3-4x slower) - if you multiply by a power of two, as fast as addition.

Division:

- if the divisor isn't a power of two, SLOW (if you divide by a constant, there are ways to make it faster, but it's quite difficult and often not worth it), up to 50 times as slow as addition. if the divisor is a power of two, as fast as addition.

Modulus:

- if the modulus is a power of two, as fast as addition. Otherwise, same as division (note that if you need both the quotient and the modulus, you can do it all in one shot, saves some time)

You want to make sure the PRNG is actually good before optimizing it for speed, but in general you want to aim for, say, 20 cycles per byte maximum for decent speed. This is computed by taking the number of cycles taken by your PRNG (on average) to calculate N bytes of data, and dividing that number by N. Quick chart to calculate cycles (this is just a back of the envelope calculation, as CPU's are good at rearranging instructions to maximize efficiency and do things in parallel):1.5) Could someone ball park a number of bit operations that would be acceptable to meet my goals for speed?

I'm building the RNG entirely out of and, or, xor, not, <<, >>... so I need to know a good maximum number of these I'm allowed to have.

- addition, subtraction, logical operations: 1 cycle

- multiplication: 4 cycles

- division: 25 cycles (1 cycle if power of two)

- modulus: 25 cycles (1 cycle if power of two)

So suppose your PRNG outputs one 32-bit number, it can do so in at most 80 cycles, which represents for instance 20 additions, 20 subtractions, five multiplications and a bunch of logical operations.

The speed is the same regardless of the shift (so if you do it twice, it's twice as slow as doing it once directly, unless your compiler optimizes it away). Bit rotation is also as fast as a shift (it is a permutation too, which are almost instant in hardware)2) Are two bitshifts as fast as one of equal size?

Like is { a << 5 } faster than { a << 2; a << 3; } or is it done one at a time and thus equally fast?

Also remember these operations aren't the only ones that exist, there is a very powerful concept called permutation (bit rotations are a subset of this and since they're fast it's usually the best way) and substitution (look-up tables! somewhat slow if not used properly, but provides extremely good diffusion)

*** This is assuming a modern/half-modern x86 processor, it will not be true for specialized or embedded hardware, in which division or multiplication may not even be available at all ***

**Edited by Bacterius, 28 April 2012 - 08:02 PM.**