GCC Assembly questions

10 comments, last by Super Llama 14 years, 7 months ago
I've been planning to use GCC inline assembly in my cross-platform engine for speed and brevity, but after reading about many of the fundamental differences (endianness, etc) in processors, I'm not so sure about it anymore. How much alteration would some basic inline asm require for it to run on different processors? If I didn't use any x86-specific instructions and were wary of endianness, would it require a significant amount of work to port the code from x86 to a PPC or ARM?
It's a sofa! It's a camel! No! It's Super Llama!
Well, there are no non-x86-specific instructions, so if all you use is that (i.e. don't use any), you're good to go.

Machine language really is just that: the language the processor accepts as its input. Different processors use different instruction sets. Okay, you're going to use assembly -- that's one level higher, but I don't suppose it's going to help you any. Consider "movl $42, %eax" (in AT&T syntax). What you're assuming here is that the processor has a "move hardcoded value into a register" instruction, and also that the processor has an "%eax" register. (You're furthermore assuming the register can hold the value 42. This looks like a stupid point in this example, but you can instead consider "movl $1234567890, %eax".) This piece of assembly is going to produce correct machine code for any x86- or AMD64-based CPU. Maybe someone will point out a CPU for which this piece of assembly would produce a correct machine instruction too. But if you want to do anything even remotely useful with assembly, you're going to go CPU-specific. That's what assembly is: a CPU-specific language.

Oh, and the usual: If you think you're smarter than your compiler, then stop: you most likely aren't.
You are better off looking into compiler intrinsics of one form or another for things like SSE/MMX. It gives the compiler a better idea of what you are trying to do, and it is easier to rewrite (or substitute through wrappers/macros) the C intrinsics to match other systems.
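To make the wrapper idea concrete, here is a minimal sketch. It assumes GCC or Clang, which define `__SSE__` when SSE is enabled; the function name `add4` is made up for illustration. The intrinsic path and the plain C++ path sit behind the same function, so callers never see which one was compiled.

```cpp
#include <cassert>

#if defined(__SSE__)
  #include <xmmintrin.h>   // SSE intrinsics (x86 / x86-64 only)
#endif

// Hypothetical wrapper: out[i] = a[i] + b[i] for i in 0..3.
void add4(const float* a, const float* b, float* out) {
    assert(a && b && out);
#if defined(__SSE__)
    // Unaligned loads/stores, so callers need no special alignment.
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    // Portable fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Porting to another SIMD instruction set would then mean adding one more `#elif` branch inside the wrapper rather than touching every call site.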
Inline assembler must be rewritten for each processor architecture. Then again, there are only a handful you are likely to support for the same "engine". You can always fall back to C for architectures you don't want to support.
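A rough sketch of what "rewritten per architecture, with a C fallback" can look like with GCC-style extended asm. The function names are invented for the example, and the operation (a 32-bit add) is deliberately trivial; it only shows that even the simplest instruction is spelled differently on each target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns x + y (wrapping), one asm variant per target.
std::uint32_t add_u32(std::uint32_t x, std::uint32_t y) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // x86 / x86-64, AT&T syntax: two-operand add, clobbers the flags.
    __asm__("addl %1, %0" : "+r"(x) : "r"(y) : "cc");
    return x;
#elif defined(__GNUC__) && defined(__aarch64__)
    // AArch64: three-operand add on the 32-bit W view of the registers.
    __asm__("add %w0, %w0, %w1" : "+r"(x) : "r"(y));
    return x;
#elif defined(__GNUC__) && defined(__arm__)
    // 32-bit ARM: three-operand add.
    __asm__("add %0, %0, %1" : "+r"(x) : "r"(y));
    return x;
#else
    // Any other compiler or architecture: plain C++.
    return x + y;
#endif
}

// Sanity check that whichever branch was compiled agrees with plain C++.
void check_add_u32() {
    assert(add_u32(40u, 2u) == 42u);
}
```

Every branch has to be written, tested, and maintained separately, which is the maintenance cost in question.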

Modern compilers can be coerced into producing very good assembly. Unless you deeply understand your computer at the assembly level, you are unlikely to beat the compiler by enough to make it worth the maintenance issues that assembly brings.

Optimisation is a tricky thing. Before you start, you need to know which parts of your code are your bottleneck. Know, not guess. A profiler is one way of discovering this. 90% of the time is spent in 10% of the code, so optimising the other code will not pay off as much as concentrating on that critical portion.

Next up is to make sure that your high level algorithms are efficient. There is no point writing bubble sort in assembly - you are limited by the efficiency of the algorithm.

There are other things that you can take advantage of to improve the speed of your code without resorting to assembly. One is to learn about caches of your target system. If you can maximise the amount of your working set in cache memory, you will have performance gains.
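As a small illustration of the cache point (a sketch only; the function names and the row-major layout are assumptions for the example), both functions below compute the same sum over a matrix stored in one contiguous vector. The first walks memory in address order; the second jumps `cols` elements on every step, which tends to miss the cache far more often on large matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix of rows x cols ints, stored row-major in one contiguous vector.
long long sum_row_order(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];   // consecutive addresses
        }
    }
    return total;
}

long long sum_column_order(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];   // stride of cols elements per step
        }
    }
    return total;
}
```

How big the difference is depends on the matrix size and the target's cache, so measure it with a profiler rather than assuming.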

Another avenue of research is to look into using compiler intrinsics.

After exhausting all the other options, you might think about assembly. Chances are your code has gotten fast enough in the meantime.

It depends on what you are doing, but you might be able to find libraries that have been prewritten and come already optimised. Something like physics would be a good example.
Well, I had been using vectors as a stack for my virtual machine, but I would prefer just to extend the stack and use push and pop on the physical stack. I was under the impression that inline assembly would resolve to the correct instruction for the architecture you assemble for, since practically every system has a 'mov' instruction, for example. I suppose I could just use malloc and a pointer for the VM stack...
It's a sofa! It's a camel! No! It's Super Llama!
There is next to no reason to write assembly for purposes of optimization.

Exceptions are when dealing with special functionality, such as SIMD. Of course, such concepts are typically not portable, and bring more baggage with them, such as alignment.

Then there are other gotchas. IIRC, PPCs fault when accessing non-aligned memory, while x86 just takes an access penalty. SIMD may require strict alignment, so one needs to allocate memory properly. Floating point is a horror even at the best of times, and some calls might change flags. And ARM is a whole different story anyway.
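For the alignment part, a minimal sketch of two portable habits (the names `load_u32` and `Vec4` are made up here): read multi-byte values out of byte buffers with `std::memcpy` instead of casting the pointer, and request alignment explicitly with `alignas` where SIMD code will need it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 32-bit value from any byte address without assuming alignment.
// The bytes are interpreted in the host's native byte order.
std::uint32_t load_u32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);   // safe even if p is not 4-byte aligned
    return v;
}

// Four floats guaranteed to start on a 16-byte boundary,
// as aligned SIMD loads typically require.
struct alignas(16) Vec4 {
    float v[4];
};
```

Compilers usually turn the `memcpy` into a single load on targets that allow unaligned access, so this costs little where it is not needed.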


It's possible, it's been done, but I'd strongly recommend being paid for it. For hobby development, it's not worth it.

Just porting C or C++ code to PPC and ARM is challenging enough.
Well, I had been using vectors as a stack for my virtual machine,
Awesome

but I would prefer just to extend the stack and use push and pop on the physical stack.
Why? The VM is NOT your machine... unless they happen to have the same ABI and run in the same address space. "The stack" on your machine should be different from the stack you are emulating.

I was under the impression that inline assembly would resolve to the correct instruction for the architecture you assemble for, since practically every system has a 'mov' instruction, for example.
No. This is totally incorrect. If you do not understand quite why, please feel free to do some research... Additionally, are you sure you want to implement your own VM? You may be much better off using something someone else has done, such as Lua.

I suppose I could just use malloc and a pointer for the VM stack...
You could, but don't. std::vector<> is a correct API wrapper around the exact same memory allocation types, and it is much better tested and less prone to dumb memory errors than raw pointers are.
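A minimal sketch of a vector-backed VM stack, assuming the stack is a flat run of bytes (the class name `ByteStack` and its interface are invented for this example, not taken from anyone's engine). Values are copied in and out with their full `sizeof`, so pointers work the same way as ints. Note that an `int` may be narrower than a pointer on 64-bit targets, so squeezing a pointer through an `int` can lose bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical VM stack: raw bytes in a std::vector, typed push/pop on top.
class ByteStack {
public:
    template <typename T>
    void push(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be stored as bytes");
        const unsigned char* src =
            reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), src, src + sizeof(T));
    }

    // The caller must pop the same type that was pushed last.
    template <typename T>
    T pop() {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be stored as bytes");
        assert(bytes_.size() >= sizeof(T));
        T value;
        std::memcpy(&value, bytes_.data() + bytes_.size() - sizeof(T),
                    sizeof(T));
        bytes_.resize(bytes_.size() - sizeof(T));
        return value;
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Usage would look like `stack.push(&obj);` followed later by `stack.pop<Obj*>()`. Types such as `std::string` are rejected at compile time by the `static_assert`, since copying their bytes is not a valid way to copy them.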
Would this be applicable?
The engine, though currently written by a hobbyist (me), will eventually become a commercial engine with games released on Valve's Steam distribution system, so I'm not able to include anything that isn't proprietary. However, as a seasoned Garry's Mod modder, I have a lot of experience with Lua and think it's a wonderful language. My language will take a lot of good ideas from Lua, though hopefully leaving them as separate as possible.

Thanks for the reassurance with Vectors. My previous problem was that I was unable to copy a pointer to the vector... I even tried casting it as an int and copying that. I've been able to push/pop strings, ints, longs, bools, and bytes just fine, but when it came to pointers, I couldn't seem to get it to work properly.
It's a sofa! It's a camel! No! It's Super Llama!
Assembly blocks don't translate between architectures, and I'm honestly not sure how many different architectures have assembler support under GCC. It would be interesting if GCC implemented LLVM assembler as a sort of "cross-platform inline assembly", but honestly there are so many things you do differently on different architectures due to alignment, endianness, and other small details that I'm not sure it would be feasible to support, much less practical to use.

throw table_exception("(╯ °□°)╯ ︵ ┻━┻");

This topic is closed to new replies.
