Compiler VS Hand Tuned Assembly...

16 comments, last by outRider 16 years, 6 months ago
@the op: use your common sense; that involves reading material from respected sources about what compilers are good at, and what they aren't good at. Improve your algorithms, and again, and again, then use the tools at your disposal to improve your chances, namely intrinsics and then blocks of inline asm.

And it's pretty much a given that when you're doing black-art stuff like this: http://www.codeproject.com/system/soviet_protector.asp you're going to need assembler. But this discussion is about speed, not functionality.

To that end....

@Promit: I'm with you all the way.

@Rockoon1: By the sounds of things you've never dealt with a code-base larger than a few hundred thousand lines, and you probably have never had a PWHC or E&Y risk audit team analyze your code and give a rating for all the things Promit mentioned (specifically the ones which have absolutely nothing to do with how fast your code is). If you provided them with massive listings of pure assembler and then told them it was to gain performance, you probably would have been blind-folded and chucked into a black GMC panel-van and never heard from again.
Quote:the assembly language programmer has the freedom to try different methodologies
funny you should say that, because the C++ programmer from your direct market competitor is free to try out those same methodologies using a high-level language, a smart compiler and some pretty Pro tools like VTune which will tell you cool stuff like how well you're utilizing the Multilevel cache on your Intel chip... of course "the assembly programmer" could use these tools too, but, oh, wait...you need a C++ Compiler to instrument your code for VTune.

FYI> I'd love to work where Rockoon1 works, because it sounds like the chap doesn't have any deadlines.

@Skizz: I see where your rational and considered argument is coming from (and I also appreciate the fact that you can present your point without the condescending tone which Rockoon1 insists on using). Interestingly, you managed to make good use of the SIMD and integer units in parallel. I think it's only a matter of time till we're having this conversation over "hand-written C++ or OpenMP 5.0", especially now that CPU architectures like CELL B/E and AMD's new quad cores are going to be the norm.
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website
Quote:@Rockoon1: By the sounds of things you've never dealt with a code-base larger than a few hundred thousand lines, and you probably have never had a PWHC or E&Y risk audit team analyze your code and give a rating for all the things Promit mentioned (specifically the ones which have absolutely nothing to do with how fast your code is). If you provided them with massive listings of pure assembler and then told them it was to gain performance, you probably would have been blind-folded and chucked into a black GMC panel-van and never heard from again.
Uh.. You're missing the point really. He said it was possible, not practical. There's a bit of a difference between them you know..

Quote:funny you should say that, because the C++ programmer from your direct market competitor is free to try out those same methodologies using a high-level language, a smart compiler and some pretty Pro tools like VTune which will tell you cool stuff like how well you're utilizing the Multilevel cache on your Intel chip... of course "the assembly programmer" could use these tools too, but, oh, wait...you need a C++ Compiler to instrument your code for VTune.
Well, yes. But there are quite a few optimizations that modern C++ compilers just can't handle, or where they are simply impractical, and where you'd have to resort to assembly language. Consider the dynamic recompilers used in modern emulators, for instance. Or consider specific optimizations which your current compiler just doesn't support: there's no way you're going to get your C compiler to generate a SIMD instruction which it doesn't have a concept of.
Quote:Original post by drakostar
But your arrogance is completely unjustified. You're making sweeping assertions without the slightest shred of evidence.


What assertions are those? Do you understand them?

Quote:Original post by drakostar
Have *you* written any modern assembly?


yes

Quote:Original post by drakostar
x86-64 or SPARC assembly, for example?


x86-64, 80x86, 8088 ... need I go back farther?

Quote:Original post by drakostar
This isn't MIPS32.


no kidding - what does that have to do with it?

It's fine and all that you can bring up irrelevant details.. but don't think for a second that I am not knowledgeable enough to know that they are irrelevant.

Quote:Original post by drakostar
The amount of factors to handle is enormous.


Not true. Processors are deterministic and follow a set of simple rules. The person who told you otherwise was preaching a faith.

Quote:Original post by drakostar
AMD published a five-volume, ~2000-page manual on AMD64.


great.. I guess that's a link to tech specs.. did you read it?

Quote:Original post by drakostar
Yes, if your compiler is being utterly stupid in some situation, hand-tweaking the assembly code it generates makes sense.


..or if the abstract machine the compiler is founded upon (such as the C "abstract machine") does not encompass the full feature set of the processor..

Does ICC have a rotate-through-carry intrinsic? no?

Then how on earth can you write any algorithm which is best performed by using those instructions? You can't. It is simply not possible with ICC.

Why wouldn't ICC have such an intrinsic? It isn't because the instruction is useless. It is because the C abstract machine has no concept of a flags register.
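To make the flags-register point concrete, here is a minimal C++ sketch of what a rotate-through-carry looks like when the carry has to travel as an ordinary variable. The names `RclResult`, `rcl1` and `shl1_via_rcl` are made up for illustration and are not ICC intrinsics. Whether any particular compiler collapses this into a single RCL instruction is not something the sketch assumes.

```cpp
#include <cstdint>

// Hypothetical helper type: the rotated value plus the bit that falls out,
// i.e. what the CPU would leave in its carry flag.
struct RclResult {
    std::uint32_t value;
    bool carry;
};

// Emulates x86 "RCL reg, 1": a 33-bit rotate left by one through the carry.
constexpr RclResult rcl1(std::uint32_t x, bool carry_in) {
    const bool carry_out = (x >> 31) != 0;                       // top bit leaves the register
    const std::uint32_t rotated = (x << 1) | (carry_in ? 1u : 0u); // old carry enters at bit 0
    return {rotated, carry_out};
}

// Shifts a 64-bit value held as two 32-bit halves left by one bit,
// chaining the carry from the low half into the high half.
constexpr std::uint64_t shl1_via_rcl(std::uint32_t hi, std::uint32_t lo) {
    const RclResult low = rcl1(lo, false);
    const RclResult high = rcl1(hi, low.carry);
    return (static_cast<std::uint64_t>(high.value) << 32) | low.value;
}
```

The carry is an explicit struct member here because the language gives it nowhere else to live. In assembly the same chain is two instructions that communicate through the flag.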

Can you show me the C compiler that can generate a function that will return bits on, and then take advantage of, the flags register? They don't do that.

How about the reverse of that? A function that takes as input the flags register? Can't do that either?
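As a rough illustration of a function that "returns" and "takes" the carry flag, here is a sketch of a multi-word addition in portable C++. `AddResult`, `U128`, `add_with_carry` and `add128` are hypothetical names chosen for this example. A compiler may or may not recognise the pattern and emit an ADD/ADC pair; the source itself can only model the flag as an extra boolean.

```cpp
#include <cstdint>

// Hypothetical result type: the sum plus the carry the CPU would leave in its flags.
struct AddResult {
    std::uint64_t sum;
    bool carry;
};

// Adds two 64-bit words plus an incoming carry and reports the outgoing carry.
constexpr AddResult add_with_carry(std::uint64_t a, std::uint64_t b, bool carry_in) {
    const std::uint64_t partial = a + b;                  // wraps modulo 2^64
    const bool carry1 = partial < a;                      // wrapped, so a + b carried out
    const std::uint64_t sum = partial + (carry_in ? 1u : 0u);
    const bool carry2 = sum < partial;                    // wrapped again when adding the carry
    return {sum, carry1 || carry2};
}

// Hypothetical 128-bit integer stored as two 64-bit words.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// 128-bit addition: the low-word carry is passed by hand into the high-word add.
constexpr U128 add128(U128 x, U128 y) {
    const AddResult low = add_with_carry(x.lo, y.lo, false);
    const AddResult high = add_with_carry(x.hi, y.hi, low.carry);
    return {low.sum, high.sum};
}
```

The comparisons `partial < a` and `sum < partial` recompute information the hardware already produced as a side effect of the addition, which is the gap being described.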

How about a C compiler that will correctly mix FPU and scalar SSE instructions in order to reduce register pressure due to many floating point constants in play? They don't do that. While that reload of a constant, or the swizzle of an SSE register "seems" free .. it isn't free at all. Yet another execution unit wasting time.

Quote:Original post by drakostar
It was popular in the past because compilers generally sucked, and the instruction set was much more limited


The compilers sucked *even though* CPUs were simpler. Think about it.

Quote:Original post by drakostar
These days, in the vast vast majority of cases, you're not going to outperform ICC.


Speak for yourself.

Funny that every version of ICC brings greater speed, yet "in the vast majority of cases" it cannot be beaten.. your argument is faith-based. The facts speak for themselves.
Quote:Original post by silvermace
@Rockoon1: By the sounds of things you've never dealt with a code-base larger than a few hundred thousand lines


I'm wondering what the entire code base has to do with leveraging assembly language. Nobody suggested writing an entire program in assembler. It's the 80/20 rule all the way down the rabbit hole.

Quote:Original post by silvermace
, and you probably have never had a PWHC or E&Y risk audit team analyze your code and give a rating for all the things Promit mentioned (specifically the ones which have absolutely nothing to do with how fast your code is).


That is certainly true. I don't even know what PWHC or E&Y stands for...

Quote:Original post by silvermace
If you provided them with massive listings of pure assembler and then told them it was to gain performance, you probably would have been blind-folded and chucked into a black GMC panel-van and never heard from again.


There are certainly many reasons not to write something in assembler.. there are many reasons not to code in C or C++ as well..

Each language offers a unique feature set that is suited to the problem before you. This includes Visual Basic or other RAD languages, as well as domain-specific languages. None of them is nullified by another. Each is a tool that has its place.

Now what do you do when performance is an issue yet you are using an algorithm that is provably asymptotically minimal? Throw your hands in the air and claim that ICC is the best asm programmer ever, so its impossible?

To quote Abrash: "There ain't no such thing as the fastest code"

Quote:
Quote:the assembly language programmer has the freedom to try different methodologies
funny you should say that, because the C++ programmer from your direct market competitor is free to try out those same methodologies using a high-level language


No, he isn't. He is free to try different algorithms but in each case he gets a single methodology for that algorithm out of the compiler..

Quote:
a smart compiler and some pretty Pro tools like VTune which will tell you cool stuff like how well you're utilizing the Multilevel cache on your Intel chip...


pssst... many of the feature sets of VTune and CodeAnalyst are specifically designed for someone who can fiddle with the instructions without interference from a compiler that thinks it knows best.

Quote:
I'd love to work where Rockoon1 works, because it sounds like the chap doesn't have any deadlines.


This has nothing to do with the OP's question. You can surely find many reasons not to code in assembler .. so what? You can also come dangerously close to casting ad hominems .. do you think that makes you right?

If you don't like my attitude, you should check the moderator's first. My attitude was fine until he dismissed my comments and then flooded his reply with irrelevant detail after irrelevant detail. He didn't address my point at all. He went on and on with the assumption that what compilers do is otherwise impossible. Such facts are not in evidence and never will be, because the claim is not logically sound on the face of it.

I don't take kindly to people who try to come off as smart when really all they are doing is changing the subject or shifting the goalposts. I can't imagine a reason for why he did it that doesn't deserve a negative response. Not a single point he made contradicted what he dismissed.
Quote:Original post by Rockoon1
Quote:
Quote:the assembly language programmer has the freedom to try different methodologies
funny you should say that, because the C++ programmer from your direct market competitor is free to try out those same methodologies using a high-level language

No, he isn't. He is free to try different algorithms but in each case he gets a single methodology for that algorithm out of the compiler..

That's not true. For any given algorithm there are many ways to implement it in C++. Once you introduce intrinsics for vector instructions etc. you open up many more possibilities for the detailed implementation of an algorithm. Relatively minor changes in the way you express the implementation of an algorithm can have significant effects on the quality and performance of the assembly the compiler generates. There are things that you can't make the compiler do and that you have to resort to assembly for, but in my experience it is almost never the best use of your time to do so. For a given amount of programmer time, more significant performance benefits can usually be had by moving on to lower-hanging fruit elsewhere.
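A small sketch of what "a minor change in how you express the implementation" can mean: two C++ versions of the same summation algorithm. The function names are invented for this example and no timings are claimed. The only point is that the second version hands the compiler four independent dependency chains, a rearrangement it is not permitted to make on its own for floating point under strict IEEE semantics.

```cpp
#include <cstddef>

// Straightforward loop: every addition depends on the result of the previous one.
constexpr float sum_single(const float* data, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// Same algorithm, expressed with four independent partial sums.
constexpr float sum_four_way(const float* data, std::size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        acc += data[i];
    }
    return acc;
}
```

Both functions return the same value for small exactly representable inputs, but in general the rounding can differ, which is why the programmer rather than the compiler has to choose the second form.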

Game Programming Blog: www.mattnewport.com/blog

Quote:Original post by mattnewport
Relatively minor changes in the way you express the implementation of an algorithm can have significant effects on the quality and performance of the assembly the compiler generates.


Let's think about that for a moment. The optimizing quality of a compiler is inversely proportional to the veracity of that statement for the compiler in question.
Quote:Original post by outRider
Quote:Original post by mattnewport
Relatively minor changes in the way you express the implementation of an algorithm can have significant effects on the quality and performance of the assembly the compiler generates.


Let's think about that for a moment. The optimizing quality of a compiler is inversely proportional to the veracity of that statement for the compiler in question.


What's your point? If optimizing compilers were perfect then this wouldn't be as necessary. They're not perfect. Some modern optimizing compilers are very good but even the best current optimizing compilers still need a helping hand from the programmer if they are to produce the most efficient code.

In some cases there are code changes to the implementation of an algorithm that preserve the results at a high level but that would not be legal for the compiler to make because they change the detailed semantics of the code. Even a theoretically perfect optimizing compiler might not be able to make all such changes because it doesn't know that changing a few of the least significant bits of the result of a calculation that uses floating point arithmetic is acceptable for your needs for example.
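The floating point case can be shown in a few lines. This is only a sketch with made-up function names and hand-picked values: the two functions are identical as real-number arithmetic, yet they round differently, so a compiler following IEEE semantics cannot swap one for the other.

```cpp
// Two mathematically equivalent ways to add three numbers.
constexpr double sum_left(double a, double b, double c) {
    return (a + b) + c;   // a and b are combined first
}

constexpr double sum_right(double a, double b, double c) {
    return a + (b + c);   // b and c are combined first
}
```

With a = 1e17, b = -1e17 and c = 1.0, `sum_left` yields 1.0 while `sum_right` yields 0.0, because 1.0 is smaller than the spacing between adjacent doubles near 1e17 and is lost when added to b. Only the programmer knows whether that difference matters for the application.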

Game Programming Blog: www.mattnewport.com/blog

Quote:Original post by mattnewport
Quote:Original post by outRider
Quote:Original post by mattnewport
Relatively minor changes in the way you express the implementation of an algorithm can have significant effects on the quality and performance of the assembly the compiler generates.


Let's think about that for a moment. The optimizing quality of a compiler is inversely proportional to the veracity of that statement for the compiler in question.


What's your point? If optimizing compilers were perfect then this wouldn't be as necessary. They're not perfect. Some modern optimizing compilers are very good but even the best current optimizing compilers still need a helping hand from the programmer if they are to produce the most efficient code.

In some cases there are code changes to the implementation of an algorithm that preserve the results at a high level but that would not be legal for the compiler to make because they change the detailed semantics of the code. Even a theoretically perfect optimizing compiler might not be able to make all such changes because it doesn't know that changing a few of the least significant bits of the result of a calculation that uses floating point arithmetic is acceptable for your needs for example.


My point is self-evident. I agree with you that compilers are not perfect, that they need a helping hand, so I'm not sure why you're justifying yourself to me unless it is to concur. I think most who have commented here hold the opinion that compilers are not perfect.

But Rockoon's statement still holds: you cannot influence the decisions a compiler makes at the register level. Compilers play by static and rigid scheduling rules, suffer from idiosyncrasies, and do ship with bugs--which cannot be remedied at source, no matter how much you fiddle with your code.

As for whether or not it's the best use of time, that entirely depends on how much you stand to gain, on your skill level, and on how much time you have.

This topic is closed to new replies.
