why processors work in bytes and not bits

Started by
20 comments, last by SeraphLance 10 years, 7 months ago

access them by shifting

yeah I know but that shifting wastes some time

In C++, is there any way to read bit by bit without modifying a byte?

In ARM architectures an optional operand shift is available in nearly all data-processing instructions. For example (see if you can determine what it does -- LSL stands for Logical Shift Left)


LSL    R1,  R0,  #0x1
ADD    R0,  R0,  R1

Is equivalent to (apart from the first version's side effect on R1):


ADD    R0,  R0,  R0  LSL #0x1

ARM (>= v6 IIRC) provides unsigned and signed bitfield extract and insert instructions. To extract the upper byte of a word:


UBFX    R0,  R0, #24, #8     

;  Start at bit 24 of R0 and extract the next 8 bits and place into R0 (treat as unsigned)


I would highly recommend at least becoming acquainted with assembly language. I'm not saying to go out and become an expert or anything, but questions like these will be cleared up very quickly if you have a moment to look at how your source is converted to native code.


I never heard about it.. does this mean that when you write C code like this

char c = 10;
c+=10;

you're working on words?

Inside the CPU, you're always operating on 'registers'. Typically we define a "word" as being the size of a CPU register.

CPUs cannot work on RAM directly; they can only work on their registers. They have instructions that let them move data between RAM and their registers.

In between the CPU and RAM, there is a CPU-cache, which acts as a buffer for RAM. Each of these 3 components will likely operate on different sizes of data.

e.g.

* Between RAM and the cache, you might only be able to move 1024 bits at a time! Another cache might only be able to move 512 bits at a time. This is hidden from the programmer most of the time.

* Between the Cache and the CPU, you might only be able to move 32 bits at a time. Another CPU might allow you to move 8, 16, 32, or 64 bits to/from the cache. Again, it's up to the compiler to translate your high-level C++ code into specific instructions for each type of CPU.

On x86, the general-purpose registers are 32 bits wide.

When I compile that code (without optimizations) and look at the assembly, the x86 asm is telling the CPU to:

* Write the byte "10" to RAM (to the local variable "c").

* Read that byte from RAM back into a register (download 8 bits from the local variable "c" into a 32-bit register).

* Add 10 to the register.

* Write the lower 8 bits of the register back to RAM (to the local variable "c").

So yes, whenever you're performing modifications on that char, it's actually being stored in a 32-bit ("word sized") register inside the CPU, and in 8 bits of RAM.

When those 8 bits are downloaded from RAM to the CPU register, the data is moved there through the cache. The cache probably isn't fetching just those 8 bits; it's probably fetching 512 bits of RAM into the cache (in case you need the nearby variables too!), and then passing on just those 8 out of the 512 to the CPU (and keeping the rest cached in case you need them soon).

A computer architecture course or text-book will teach these topics.

Computers work in a mixture of bits, bytes, and different kinds of words and lines too, depending on the component.
e.g. on a 32-bit CPU, you can't really operate on just one byte; you're usually operating on a whole word (and using masking to only seem to work on one byte), meanwhile your memory controller might work on 512-bit lines of data, fetched from aligned addresses...

I never heard about it.. does this mean that when you write C code like this

char c = 10;
c+=10;

you're working on words?

Perhaps too simple of an example. When I compile that code, the generated assembly doesn't even contain an add operation. It simply places decimal 20 into the register representing c; that is, the compiler performs the add for you (constant folding).

It translates into:


MOV    r5,  #0x14   ; place decimal 20 (hex 0x14) into r5

I'll work up an example later that's a little more involved.

The most basic reason for processors to operate on large words is efficiency: all the control work and circuitry is amortized over a larger quantity of useful computational circuits. For example, an addition instruction uses the same instruction cache and does the same decoding work whether the adder it activates is one bit wide (possibly two gates only) or a sophisticated carry-lookahead marvel that takes the same time to add, say, 128-bit integers.



n.b. however, that algorithm only reaches peak performance for near-horizontal lines. For vertical lines you have lots and lots of very thin spans, which produce small masks, which means you don't get the full parallelism benefit of the large word size.
To combat this, you could store two copies of your voxel data set -- one as above, and the second one rotated 90º so that the bytes store 8 values in the y-direction instead of in the x-direction.
For a 3D data set, you'd store a 3rd copy of the data, laid out so that the bytes store 8 values in the z-direction.

Yes, to all of this. Alternatively, if you wanted to get more clever, you'd use a 3-D tiling pattern when you lay out your voxel data (Morton order, etc...). Then you have a few choices: Do fast coarse tests for contiguous 3D chunks, with the need to do more precise tests within chunks that indicate they have a collision, or apply the same tiling operation to your ray (chunked according to your tile size). In any case, this avoids needing to store multiple copies of the data.

Has anyone here been inspired by this post to start a real-time raycasting voxel image generator?

I am starting one, but on the GPU; I think it will be faster there.


Inside the CPU, you're always operating on 'registers'. Typically we define a "word" as being the size of a CPU register. [...] So yes, whenever you're performing modifications on that char, it's actually being stored in a 32-bit ("word sized") register inside the CPU, and in 8 bits of RAM.

If it's in a register, which register is it - the whole of rax, or maybe just al?

I have very seldom written asm routines or checked the assembly, so I don't know, but it would be worth checking. (I should take a decent modern assembly course, but almost all the info on assembly I encounter is from the old days.)

I suggest reading about boolean bitwise operations, with "and", "or", "xor", "not", "andnot", "blendv", "left shift", and "right shift" being the most important.

FTFY

Why'd you cross "not" out? Is it not useful (no pun intended)?


Why'd you cross "not" out? Is it not useful (no pun intended)?

This pun is interesting: "not" is not useful. hehe

Why'd you cross "not" out? Is it not useful (no pun intended)?

This is the kind of sentence that gets Yoda in trouble: "Definitely useful not is." I use it all the time.

What are "andnot" and "blendv"?

This topic is closed to new replies.
