Machine code and the semantic gap

Started by
6 comments, last by arnero 6 years, 10 months ago

I had the opportunity to write a lot of C# without many constraints from frameworks (no glue code). In that situation my style gets very structured (in the Pascal sense) and deeply nested. To relax I recently looked back at 8-bit CPUs, and I have to say that I mostly use their kind of instructions. Like

i++
a=a+b
sub a,b
branch if carry
if (size==0) then

and, more rarely,
x<<=1

I use almost no literals or BCD, and not many global variables.

As a kid I frowned upon the stack and wanted a large set of registers. Now, when hunting bugs in complex code or defending my coding choices against other programmers, I want constraints. The native data structure for nested code is a stack. Now I think a main deficit of the 8-bit CPUs was that they didn't cache the stack. And one could not peek to some depth into it. Or overwrite the second topmost.
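To make the "peek to some depth" and "overwrite the second topmost" wishes concrete, here is a minimal C# sketch of such a stack. The class name PeekableStack and the methods Peek and Poke are made up for illustration; they do not correspond to any real CPU or library.

```csharp
using System;

// Hypothetical 8-bit-style stack that allows reading and writing below the top.
public sealed class PeekableStack
{
    private readonly byte[] _cells;
    private int _count; // number of values currently on the stack

    public PeekableStack(int capacity = 256)
    {
        _cells = new byte[capacity];
    }

    public int Count => _count;

    public void Push(byte value)
    {
        if (_count == _cells.Length) throw new InvalidOperationException("stack overflow");
        _cells[_count++] = value;
    }

    public byte Pop()
    {
        if (_count == 0) throw new InvalidOperationException("stack underflow");
        return _cells[--_count];
    }

    // depth 0 is the top of the stack, depth 1 the second topmost, and so on.
    public byte Peek(int depth)
    {
        return _cells[IndexOf(depth)];
    }

    // Overwrites a value in place without popping anything above it.
    public void Poke(int depth, byte value)
    {
        _cells[IndexOf(depth)] = value;
    }

    private int IndexOf(int depth)
    {
        if (depth < 0 || depth >= _count) throw new ArgumentOutOfRangeException(nameof(depth));
        return _count - 1 - depth;
    }
}
```

With this, Poke(1, value) is the "overwrite the second topmost" operation, and Peek(n) reads n entries below the top without disturbing anything.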

I do not know why the old CPUs were so obsessed with immediate values. I can understand a one-byte offset into a data structure. You may operate on two structures, one in memory and one on the VIC-II or TED or whatever. So an 8-bit CPU needs at least two 8-bit page registers. Or better, two 16-bit base addresses. Or two page registers and one base. You can iterate over large enumerables using 8-bit delta addresses. Forward jumps need one byte. Backward jumps should only accept a register value (the lower 8 bits of the address). You mark the start of a loop by pushing the program counter without jumping. To call methods of an object one needs another base (pointing to the code of the class) plus an 8-bit offset to the specific method. Long jumps, if you really need them, can be composed by loading the page in one step and then jumping. All immediates are 8 bits.
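The addressing scheme above can be sketched in a few lines of C#. Everything here is an assumption made for illustration: the class and method names are invented, and I assume the page register supplies the high byte of a 16-bit address while the 8-bit immediate or register value supplies the low byte.

```csharp
// Hypothetical address arithmetic for a CPU whose immediates are all 8 bits wide.
public static class PagedAddressing
{
    // Effective address: page register in the high byte, 8-bit offset in the low byte.
    public static ushort Effective(byte page, byte offset)
    {
        return (ushort)((page << 8) | offset);
    }

    // Short forward jump: the immediate is an unsigned distance from the current PC.
    public static ushort JumpForward(ushort pc, byte distance)
    {
        return unchecked((ushort)(pc + distance)); // wraps around past 0xFFFF
    }

    // Backward jump: keep the current page and replace only the low byte of the PC
    // with a value taken from a register (assumed to be saved at the loop start).
    public static ushort JumpBackInPage(ushort pc, byte low)
    {
        return (ushort)((pc & 0xFF00) | low);
    }

    // Long jump composed of two steps: load the page, then jump with an 8-bit offset.
    public static ushort LongJump(byte newPage, byte offset)
    {
        return Effective(newPage, offset);
    }
}
```

The point of the sketch is only that no instruction ever needs more than one immediate byte; a 16-bit target always comes from a register plus at most 8 bits.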

I do not value orthogonality anymore. Since I use objects in all my languages, an opcode for me starts with the object I want to read. I think a nibble = 4 bits = 16 registers is fine. These are the base addresses, PC, the top 3 of the stack, A, D, X, the stack counter and the register set. So it is easy to fill up 16. While the value is routed to the ALU, the decode unit handles the remaining bits. If you choose PC, the other operand is probably the stack or an immediate, and 3 bits are left to mask the flags for conditions. If you choose X, you will probably either count (+= a signed 3-bit value, 0 => use immediate) or move (2 general-purpose registers, top 2 of the stack). For A you would choose between (+, -), one of 4 general-purpose registers as source, and 2 (A or D) as target. For D, then (+, -, &, |, ^, negation, invert, clear). BP is for exception handling. You get the picture. Looking up a second source after instruction decode may cost a cycle :-( .
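As a rough illustration of the PC case, here is how such an opcode byte could be split up in C#. The bit layout is my own assumption (register in bits 7..4, stack-or-immediate selector in bit 3, flag mask in bits 2..0), and all names are invented; the post only fixes the field widths, not their positions.

```csharp
// Hypothetical layout of one opcode byte whose register nibble selects the PC.
public struct BranchFields
{
    public int Register;      // bits 7..4: which of the 16 registers the opcode starts with
    public bool UseImmediate; // bit 3: false = target comes from the stack, true = 8-bit immediate follows
    public int FlagMask;      // bits 2..0: condition flags that must all be set
}

public static class OpcodeSketch
{
    public static BranchFields Decode(byte opcode)
    {
        BranchFields fields;
        fields.Register = (opcode >> 4) & 0x0F;
        fields.UseImmediate = (opcode & 0x08) != 0;
        fields.FlagMask = opcode & 0x07;
        return fields;
    }

    // The branch is taken when every flag selected by the mask is set.
    // A mask of 0 selects nothing, so the jump is unconditional.
    public static bool ConditionMet(int flagMask, int flags)
    {
        return (flags & flagMask) == flagMask;
    }
}
```

The other register choices (X, A, D) would reuse the low nibble for their own sub-fields, which is exactly the non-orthogonality described above.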

I would love to store code in registers, like ADD, ADC if you operate on large data structures or decimals, or MUL and DIV if your function needs them, or ROL, DEC X, BNE if you have to set pixels in that awkward memory layout, or MOV mem,A; MOV A,mem; DEC X; BNE to have all memory bandwidth left for the copy. But premature optimization... A full-blown cache is beyond the scope. The flash-optimized, so-called 8-bit PIC ISA is ugly in comparison. Now I should look up the Java and .NET runtimes, but I am put off by the 8-bit, 16-bit, little/big-endian stuff and the RTTI and and and


What?

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

Like Khatharr, I don't really understand your post. Do you have a question? Something you'd like to discuss? If so, consider reformulating your post to clarify this point and put it front and center. If not, perhaps this is better as a Journal post; the forums work best for collaborative discussions.

Think about the audience and your intended message when you do so. You might be starting a little too deep into whatever it is you are working on; try giving some context and guiding the reader along your path. Try not to mix high-level ideas with in-depth technical details unnecessarily.

Semantic gap, indeed!

using System;

public struct RegisterFile
{
  // Add [StructLayout(LayoutKind.Explicit)] and [FieldOffset] if you want the smaller registers with overlap.
  public ulong Rax;
  public ulong Rcx;
  public ulong Rdx;
  public ulong Rbx;
  public ulong Rbp;
  public ulong Rsi;
  public ulong Rdi;
  public ulong R8;
  public ulong R9;
  public ulong R10;
  public ulong R11;
  public ulong R12;
  public ulong R13;
  public ulong R14;
  public ulong R15;
}

// Wrapper class (the name is arbitrary); C# fields and methods must live inside a type.
public static class Cpu
{
  // One register file per thread.
  [ThreadStatic]
  public static RegisterFile Registers;

  public static void ADD(ref ulong a, ulong b)
  {
    a += b;
    // Setting Rflags is left as an exercise for the reader.
  }
}
There you go. x86 assembly in C#.
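For anyone who wants to try the flags exercise, here is one possible sketch for a 64-bit add. The names Rflags and AluSketch are invented for this example. The bit positions follow the x86 RFLAGS layout (CF bit 0, ZF bit 6, SF bit 7, OF bit 11); the parity and auxiliary-carry flags are deliberately left out.

```csharp
using System;

// Subset of x86 RFLAGS; PF and AF are omitted in this sketch.
[Flags]
public enum Rflags : ulong
{
    None = 0,
    CF = 1UL << 0,  // carry: unsigned result did not fit in 64 bits
    ZF = 1UL << 6,  // zero: result is 0
    SF = 1UL << 7,  // sign: top bit of the result is set
    OF = 1UL << 11, // overflow: signed result did not fit in 64 bits
}

public static class AluSketch
{
    // Adds b to a in place and returns the flags the addition would produce.
    public static Rflags Add(ref ulong a, ulong b)
    {
        ulong before = a;
        ulong result = unchecked(before + b);
        Rflags flags = Rflags.None;

        if (result < before) flags |= Rflags.CF;     // wrapped around 2^64
        if (result == 0) flags |= Rflags.ZF;
        if ((result >> 63) != 0) flags |= Rflags.SF;
        // Signed overflow: both inputs have the same sign and the result's sign differs.
        if ((((before ^ result) & (b ^ result)) >> 63) != 0) flags |= Rflags.OF;

        a = result;
        return flags;
    }
}
```

Returning the flags instead of writing them to a shared register keeps the sketch free of global state; storing them in a field next to the register file would work just as well.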
Keep it polite, folks.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Now I think a main deficit of the 8-bit CPUs was that they didn't cache the stack.
I think 8-bit CPUs were too slow for caching. That is, memory was fast relative to the CPU, so getting a value from memory, from a cache or from a register was about equally fast. Since transistors were expensive, dropping the cache (or rather, never even considering one) was the simpler solution.

And one could not peek to some depth into it. Or overwrite the second topmost.
I think you could, unless I misunderstand you. It amounts to changing the value of a parameter of a function. Obviously, that change never leaves the function scope, just as on today's computers. C solved that by pushing pointers onto the stack, pointing to memory outside the function and its top stack frame.

I do not know why the old CPUs were so obsessed with immediate values.
In that era, OO hadn't reached the mainstream, so you mostly had functions and global structs, and not much else. Also, CPU ticks were slow, which means you want to cram as much information into a single-byte instruction as you can, to reduce the ticks needed per instruction.

I would love to store code in registers, like ADD, ADC
Probably too easy to mess up, or to abuse for viruses, malware, etc. I am quite glad we left behind the self-modifying code that was once used for addressing large amounts of memory (changing the operand values of the store instruction).

If you want registers in C#, you can check my signature. It has the __register keyword for thread-private variables. But the vendor's compiler implementation can still send them to global memory.

Need an open-source multi-GPU OpenCL load-balancer for C#? Here it is: https://github.com/tugrul512bit/Cekirdekler/wiki (it also has pipelining).

Hello world in all GPUs:


ClNumberCruncher cr = new ClNumberCruncher(
    AcceleratorType.GPU, @"
      __kernel void hello(__global char * arr)
      {
           printf(""hello world"");
      }
");
ClArray<byte> array = new ClArray<byte>(1000);
array.compute(cr, 1, "hello", 1000, 100); 

Sorry, I got nostalgic. A colleague of mine used to tell me that all of this was invented in the 1960s, and then I look at my current code, where I limit choices so that IntelliSense makes sense. I also avoid magic numbers. So in the end, for every "next command" there are fewer than 256 choices. In the meantime I read about RISC-V (on Wikipedia), and what this design does is mostly do away with all the fancy ISA inventions. If a CPU designer from the 1980s woke up today, all we would have to teach them is branch prediction and out-of-order execution. Of course, IntelliSense could not have been fully implemented in the few gates available in the old days; on the other hand, the designers tried to stay compatible with the discrete CPU boards. The truth is somewhere in the middle. GPUs, shaders and CPUs are now all becoming very similar, too. Due to the fast pace of development, special crypto or compression circuits are dated by the time they reach the customer.

This topic is closed to new replies.
