Should I be learning assembly language...



I have just about learned the basic 8086 instruction set and have an overview of the FPU instructions as described in the Intel manuals, and I've already found a few benefits just from being able to read a program's disassembly. I was thinking that learning assembler was going to be important for code optimisation, especially for someone hoping to get into 3D graphics.

Can anyone in the industry give me an idea of how much assembler is actually relied upon nowadays? Can a good assembly language programmer beat the Intel compiler (I don't know if AMD have their own)? And as for the SSE2 extensions, Intel reckon that you will often end up with faster code if you use their Vectoriser than if you work directly with the SSE2 instruction set, and that you will end up with pretty fast code if you work with their Vector classes. So even for working with the SSE2 instructions, is it worth knowing the SSE2 instruction set?

When needing to optimise a section of code, are you going to get better results by reading the Optimisation manuals and knowing how to provide the right C for the compiler, or is assembler still called upon for the parts of the program that need the heaviest optimisation?

Thanks

[edited by - philscott on November 20, 2002 4:46:09 AM] [edited by - philscott on November 20, 2002 4:46:37 AM]

Frankly, using assembly in very well-written code will generally get you next to nowhere. There's a limit to just how fast you can get code to perform, no matter how assembly-optimized, and if your C/C++ code is written properly and fully optimized, chances are the compiler will compile it to closely match that limit.

I've learned through personal experience that, to quote what many people on this board say, assembly isn't a magic optimization bullet that'll magically speed up all your functions. However, in some cases you can optimize some stuff to save yourself a few dozen clock cycles here and there. For instance, a blitter of some kind that allows you to plot translucent sprites on the screen is somewhat demanding and might benefit slightly from assembly optimization IF you know what you're doing.

Here's one thing to keep in mind: rearranging your code to swap out a mov or two might seem better, but if you end up saturating your registers to do it you won't be able to take advantage of parallelism, and so even though your program may be shorter, it'll take longer to execute. There's a lot of stuff like this to keep in mind, and while the compiler may seem to do some strange stuff sometimes, it often compiles code to take advantage of exactly these things.

Basically, don't bother optimizing in assembly uselessly. Just optimize the speed-critical sections of your program if you really want an extra frame per second or two. And I recommend keeping both a C/C++ version and an assembly version of a function, so you can test them and see which performs fastest (over, say, a few thousand calls to the function). You'll notice there isn't that much of a difference in most cases.

Is the gap then closing between the speed of code produced by a compiler and that produced by a good assembly programmer?

Looking at the number of things you have to take into consideration to make effective optimisations in assembler, it looks like there are a lot of places where you could go wrong where a compiler would not. I'm not going to be applying for a job in the games industry till about 2006, and I'm wondering whether, if the future of assembler is looking a bit bleak, it is worth the effort learning it when I could be spending the time improving skills in other areas.

Performance gained through assembly language is only as good as your algorithm; you should be concerned more with using the most efficient algorithm in the first place. I don't suggest going in and replacing every line of code with its ASM equivalent. You only want to do that in the parts of the program that are executed frequently, like a blitting subroutine or something. Before you even start rewriting some portion of code in assembly, ask yourself: can I really do this better than the compiler?

If you're certain that ASM will speed up a portion of your code, always set up a benchmark. Run the optimized subroutine, say, 100,000 times and measure how many milliseconds it takes to execute, then run the original version to see if anything is gained. Optimize only the parts that need optimization and are used frequently. With the pretty good compilers out there, don't expect a major performance boost either: the system will operate no faster than its slowest component.

If you're programming for DOS then I say go ahead with extensive ASM optimization on things like loops and fills, but at the same time don't re-invent what's already been done. In a Windows environment the performance gain may be smaller in general, since Windows is programmed against APIs which are coded in C, so cramming in loads of ASM code will not buy you much. It's important to optimize, but don't reinvent the wheel over one clock cycle.

Guest Anonymous Poster
It can't hurt.

If you know assembly well enough to sometimes beat a compiler, then you will be much better able to write good code in the first place. By understanding assembly and what makes code fast or slow, and then looking at the asm output of a compiler, you learn an enormous amount about writing good code.

While asm may be becoming less common, if you understand it and can program well in it, you will be much better able to analyze a critical loop, check the compiler's code, and then determine whether you yourself can do something better.

But of course, always benchmark to make sure.

Cheers.

Speaking from my personal experience: sure, if your algorithms are slow they won't be much faster in asm either. However, if you can make the code where 60% of the time is spent 40% faster, it will buy you quite a bit of time. Even though no compiler will ever get nearly as good as a good assembly programmer, the compiler's code will probably be faster than bad asm code. I also think that any programmer should learn asm; it will make them a better programmer even in high-level languages. But you should not bother wasting time coding everything in asm, only where it matters. First profile your code to find the critical parts, then start optimizing the inner loops of those critical parts in asm.

Thanks for the replies.

I'm working in Windows, so surely things like blitters are provided in hardware through DirectX. My question came up after scanning through the optimisation manuals and reading replies on the forum, and realising how many seemingly obscure rules you would need to know in order to write good assembler. By the looks of things, Intel's compiler, which is designed to optimise for the P4, would know those rules and be able to apply them far more faultlessly than a human could. And their benchmarks reckon that, while a routine using the SSE2 extensions written in pure assembler will significantly outperform standard C code, it will not outperform code optimised with their Vectoriser (whatever the hell that is).

None of the replies have addressed what I am specifically referring to, so I'd still like a bit more help answering my question. I've read a number of replies on this forum saying how assembly language can regularly beat compiled code, but they don't say which compiler they used. The code produced by MSVC Standard Edition is absolutely appalling and very easy to beat, but with the optimisation options provided by the Professional Edition it can make optimisations that would often be impractical for a human to imitate (like converting constant multiplications to shifts, and using the esp register to reference stack parameters instead of the ebp register). And this is only the Microsoft compiler, which doesn't specifically target the latest processors.

I've learnt the basic 8086 instruction set and it has been a help to me, but on top of that there are the MMX, SSE and SSE2 instructions to learn, and then all the optimisation rules on top of those. I don't want to become proficient in their usage, even just with the intention of optimising that one essential algorithm, if keeping up to date with the next generation of chipsets is going to turn out to be more trouble than it's worth, only to find that compilers end up closing the gap so much that the speed differences are negligible anyway.

Thanks again.




Well, ask yourself what you want your code to be: fast or compatible?

Learning MMX and SSE and all that is great and can make for much faster code, but you'll be alienating processors that don't support those extensions. You could use some chipset-specific code to take advantage of something only, say, Intel processors can do (say, some architecture-related means of arranging the order in which you access registers; I dunno ^^) but it might perform slower on an AMD or plainly fail to run, so you'd need more code, etc...

C/C++, however, is generally compiled to execute on any computer so long as it runs the appropriate OS (and isn't TOO old; let's try running a 3D demo on an 8086, boys and girls :D). However, you can't micro-optimize sections of code, and you won't benefit from sparing a few cycles here and there. As dumb as it may seem, sparing a handful of cycles in a tight loop executed many, many times every frame can have a real effect on execution in some cases.

No matter how good a compiler gets, there's always a case or two where you could optimize where the compiler cannot; at least not until they get AI. Regardless of the compiler, there's always something you could do in certain cases. A compiler isn't a god; it can't, for instance, keep a common result in a register to avoid having to reload it every time it is used. It'll translate your code the way you wrote it while trying to translate each statement as well as possible. At least that's how MSVC works: it handles every statement/line individually and produces assembly output independent of other statements. Registers aren't reused. Dunno how other compilers work...

Basically, a good algorithm is a much better optimization than rewriting a bad algorithm in assembly.

No compiler can beat a good assembly language programmer who knows what he's doing. Period.

It's more a question of how much effort you're willing to put in versus how much you'll get out. It's often very easy to beat compilers, but it's only worthwhile in certain situations.

I'd highly recommend learning assembly language well, so that you understand what your code is doing and how to write better code. You should have a grasp of how programs really work -- from source code to CPU instructions.



---
Bart

Learning assembler is good even if you don't use it in your programs. It gives you a deeper understanding of the machine and the architecture behind it. And it makes debugging a lot nicer.



_______________________________
It's not reality that's important, but how you perceive things.

A man''s reach should exceed his grasp.

Thanks for the replies.

Just one question I want to ask RuneLancer: which compiler are you using? What you are saying about registers not being used as temporary storage, and about your program being compiled exactly as it is written, suggests to me that you are using MSVC Standard Edition. MSVC Professional Edition with the optimisation options produces massively better code (which I assume is why it is so much more expensive).

VC++ 6 Pro, SP5, with the processor pack, while better than the standard and academic versions (had 'em all at some point), still produces appalling code.
Hand-coding critical functions in asm has yielded speed increases of 10% (quick reimplementation -> better memory access) to 150% (2 days' worth of crazy optimization; end result: limited by decode bandwidth)!

Naw, I don't have Standard Edition. I think I introduced a bit of confusion with my "exactly as written" post. Yeah, it does optimize code to use bitshifts instead of multiplications when applicable, for instance. However, let's suppose you have code that does, say...

r = r + 1;
g = g + 1;
b = b + 1;
c = r | (g * 0x0100) | (b * 0x010000);

...it'll be compiled something like this...

mov edx, r
add edx, 1
mov r, edx //R = R + 1
mov ecx, g
add ecx, 1
mov g, ecx //G = G + 1
mov eax, b
add eax, 1
mov b, eax //B = B + 1
mov edx, r
mov ecx, g
shl ecx, 8
or edx, ecx
mov eax, b
shl eax, 010h
or edx, eax
mov c, edx //c = r | (g << 8) | (b << 16)

Or something more or less to that end. The individual instructions are well defined and bitshifts have been inserted where necessary. However, assuming r, g and b aren't reused, the following code could be written instead...

mov eax, r
mov ecx, g
mov edx, b
add eax, 1
add ecx, 1
add edx, 1
or eax, ecx
or eax, edx
mov c, eax

I'm not sure if this example is a good one since I still haven't exactly gotten the hang of parallelism, but I'm pretty sure this code would execute faster. Most of the register-based operations don't rely on the results of the previous instructions (save for the ORs), so parallelism would probably help speed it up even more, I think (corrections are welcomed if necessary; I'm still kinda learning, as I've said ;P)

However, the compiler (or at least mine) won't optimize it this way (assuming my example code is even good to begin with), since it won't read three instructions ahead to figure out that r is used later and could be kept in a register to avoid reloading it, or that it can be completely clobbered afterwards since it isn't reused later in the code.

Granted, the C code isn't the best to begin with, since we could simply do (whatever the original color was) + 0x010101 (and since we're not checking for overflows anywhere, adding 0x010101 is an even more valid algorithm), but meh.

quote:
Original post by Jan Wassenberg
Good god - even VC ain't that bad
A little nitpick: you forgot the shifts in the 2nd piece of code.

Hey now, I didn't do THAT bad of a job with my version... did I? ^^; Or did you mean the first piece of code?

Except for the shifts, yes. Oversight. Heh. ^^; Thanks for pointing that out.

quote:
Original post by Jan Wassenberg
Hand-coding critical functions in asm has yielded speed increases of 10% (quick reimplementation -> better memory access) to 150% (2 days worth of crazy optimization; end result: limited by decode bandwidth)!

Where do those figures come from? Are they part of the 68.57% of statistics that are made up on the spot? Do they represent improvement gained purely from structure preserving transformation into asm, or do they additionally account for algorithmic conversion? In fact, how do you determine whether there has been algorithmic conversion or not?

Actually SabreMan, my sources have it that the number of statistics made up on the spot has risen this financial year to just over the 70% mark.

As for the topic at hand... flame if you must, but I hold a belief (a somewhat foolish belief, some may say) that if you write nice, tight, smart code in a higher-level language, and take care with things like member alignment and packing etc., compilers will do a pretty nice job for you... and the bonus is that as the compilers get better, your code gets more optimised, with little to no extra effort on your behalf. Like others have said, concentrate on choosing an efficient algorithm and implement it cleanly, and I think your time will be better spent.

Call me lazy, but didn't we invent computers to do all the hard shit for us?

quote:
Original post by Bad Monkey
Actually SabreMan, my sources have it that the number of statistics made up on the spot has risen this financial year to just over the 70% mark

What, you mean following this thread?
quote:

As for the topic at hand... flame if you must, but I hold a belief (a somewhat foolish belief some may say) that if you write nice tight, smart code in a higher-level language, and take care with things like member alignment and packing etc, compilers will do a pretty nice job for you... and the bonus is that as the compilers get better, your code gets more optimised, with little-to-no extra effort on your behalf.

The question isn't really "is my code as fast as it can be?" It's "is it fast enough for the intended purpose?" From that perspective, what you've said is completely agreeable. Any other perspective gives a lot of scope for mindless arguing.

What I was trying to get at earlier is that some folks claim you can achieve x% improvement via the use of assembly, but I wonder how much of the optimisation also involves surreptitious manipulation of the algorithms, which could be achieved without asm. What would be really cool, and would probably satisfy everyone, is a compiler (or higher-level tool) that can be customised to emit particular sequences of asm (or other source) if so desired. This principle is incorporated in the thinking of recent research projects, such as Intentional Programming and TUNES.
quote:

Call me lazy, but didn't we invent computers to do all the hard shit for us?

Yup, you can call me lazy too. I don't even see why I should have to write programs when I can write programs to write my programs for me.

SabreMan: I take that cheap dig to indicate your disbelief. Surprise - it's actual measured data, and even repeatable.
10%: (chess engine) increased speed of my asm implementations of nbits and leadz vs. the best I could get out of the compiler (measured in iterations/sec on typical data).
The algorithms are the same, and are based on lookup tables; I spent about 10 minutes optimizing.

150%: (CLOD refine code) increased average triangle throughput of my heavily optimized version (measured in the app via QueryPerformanceCounter).
davepermen, if you're reading this: practically no difference between VC 6 and 7.
My version contains stuff simply not possible in C, e.g. rcr trickery, pretty complicated 3DNow! code, perfect scheduling for Athlons, ...

In that case isn't it completely incompatible with Intel's processors?

You referred to RuneLancer's code and said it would be faster using PADDUSB. I doubt RuneLancer has started looking at the SIMD extensions given that, like me, he's a beginner. And the description of the Intel compiler, which I keep talking about, says that it can optimise for the SIMD extensions at levels that would be extremely difficult for a human to match. So in the case of RuneLancer's example, code produced with the correct optimisation options by the Intel compiler would outperform his hand-written assembler.

I guess there isn't any way to do rotates in C, but I've recently been told that you use libraries if you want to make use of 3dNow!

You said you had to spend two days optimising code just for the AMD processor, so you'd then have to do the same optimisations for the Intel processor, and that doesn't sound very practical. I'm really going to have to side with the anti-assembly crowd on this one. I get a kick out of using assembler, but I can't see it as being the way forward.

[edited by - philscott on November 22, 2002 10:40:58 AM]

quote:
Original post by Jan Wassenberg
SabreMan: I take that cheap dig to indicate your disbelief.

It wasn't a cheap dig - it's a serious question. I'm a bit worried about your choice of words though: do you go crusading against unbelievers?
quote:

Surprise - it's actual measured data, and even repeatable

That's cool. I'm not in a position to say you're lying; I just wanted to know more about where you got the figures from. I'm still skeptical of the general applicability of such figures, or even the validity of them [1], but we've had that discussion before.

[1] For the figures to be an entirely valid test of human-written asm versus compiler-generated asm, there has to be some criterion for demonstrating that the code is structurally intact. It's quite tricky to say whether that is true, because of course the code is different in some way, else there would be no speed increase! So, what does "structurally intact" actually mean? If it isn't structurally intact, are the structural changes ones that could have been made without assembler? Furthermore, it might be that your speed increases are simply down to the compiler generating shit code, and another compiler would fare better. If it came down to an organisation paying for a couple of Intel licences over spending a lot of time hand-optimising, that might be a sensible approach.

I check for 3DNow! Pro support at startup, and fall back to the C version if it's not available.
I didn't (don't?) optimize for the P4 because a) I don't have one b) I dislike the architecture.

ps> And the description of the Intel compiler, which I keep talking about, says that it can optimise for the SIMD extensions at levels which would be extremely difficult for a human to match. <
Howzat?

ps> So in the case of RuneLancer's example, code produced with the correct optimisation options from an Intel compiler would outperform that written by his version of assembler. <
Well, come on. No offense to RuneLancer, but that code sucked.

Practical? It was about 400 lines of asm, and those 2 days were well spent! The refine code, which accounts for almost all CPU time spent outside the GL driver, is now 2.5x as fast!
If the program isn't fast enough (and it wasn't), "the compiler's pretty good" and "I'm too lazy to speed it up" doesn't cut it.

>> I take that cheap dig to indicate your disbelief.

sm> It wasn't a cheap dig - it's a serious question. I'm a bit worried about your choice of words though: do you go crusading against unbelievers? <

I take issue with the insinuation that I'm making up statistics.

>> Surprise - it's actual measured data, and even repeatable

sm> That's cool, I'm not in a position to say you're lying, I just wanted to know more about where you got the figures from. <
Fair enough.

sm> I'm still skeptical of the general applicability of such figures <
I am not saying "you will achieve identical results"; I am pointing out that a) VC 6/7 still suck and b) it is easy to code circles around them.

sm> or even the validity of them [1] but we've had that discussion before. <
I think you're missing the point.
My criterion is "they do the same thing" - I have compared my asm version against the best I could get out of the compiler. Of course I did everything I could at a higher level (including tweaking the C code) before getting down & dirty with asm.

The Intel compiler won't cut it. 3DNow! is necessary to take my Athlon XP to the max.

oh @philscott: I don't think you'll get around coding separate versions of time-critical functions for current CPUs, if you really care about performance.
