Thoughts on Nasm, etc..

14 comments, last by Cornstalks 11 years ago

When I was a kid, the progression to learning programming was BASIC first, then assembly. You couldn't get a C compiler for free back then, and DOS came with 'debug', letting you write little assembly programs. Later I moved onto C(++), Java, etc, and never went back to it. But I do understand your fascination with it, and at times I think about getting into x64 assembly programming. Maybe later.

Don't let others give you grief about it though. I'm a hardcore C programmer, and even program C in an OO style (function pointers in structs for polymorphism, etc), and some people give me trouble for it. I understand C++ is there, but for my own personal projects (not at work: at work I use whatever the rest of the team is using because it's not my project) I choose C.

Keep doing what you're doing. We need more assembly programmers, because somebody needs to fix the compiler when it spits out broken code, or write those high performance hardware drivers!


To touch a little more on what JTippets said, in production it's potentially dangerous (to your performance, that is) for a non-expert programmer to delve into assembly directly. The reason, at its core, is this: modern CPUs are so complex that even an engineer from Intel or AMD, given a small routine written in assembly, can't tell you for sure how many cycles it'll take to execute -- even ignoring the effect of memory latencies -- at best they can give you a window, they can say "between 37 and 108 cycles". Performance today just depends on so much more than the stream of opcodes you're interested in.

Today's CPUs break down even single assembly instructions into potentially many smaller operations, and re-order them on the fly based on which execution units are available and whose parameters are ready. In fact, and I'm not making this up, only around 1% of the transistors in a modern CPU are actually used to compute anything -- around 75% of the transistors are spent on caches (to hide memory latency), and the remaining 24% hide latency in other ways, like instruction re-ordering. All of this complexity exists on a micro-scale, and is compounded by the number of different CPU micro-architectures with different properties.

Up a level or two from that are questions like "How much better or worse will my program perform if I inline this function?", "What about at this call site?", "What about at that call site?", "What if I allocate this variable in a register?", "What if I allocate that variable in the register instead?". It's not even that a determined human with the right set of tools couldn't figure out the optimal instruction sequence at this level if given enough time; it's that a human can't perform this function for the tens-of-thousands of permutations that are necessary to produce globally optimized code. This is why even very skilled assembly programmers today touch only those functions that are most performance-critical, usually out near the very leaves of the program's call graph -- there's more to be lost than gained in writing entire programs (or even large sections of them) in assembly today, and the cost is large.

I'm glad that you have an interest in assembly; it's knowledge that's somewhat arcane but also indispensable when you really need it. Furthermore, knowing what high-level program constructs (say, a loop, a switch statement, a virtual function table) or patterns (say, Duff's Device) will look like after the compiler translates them to assembly can help you be a more conscientious programmer. All good things.
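
As a concrete example of the kind of translation that's worth studying, here's a minimal sketch of Duff's Device in C (copy_bytes is just a hypothetical name for illustration). The switch jumps into the middle of an unrolled do/while loop, and comparing this source against the assembly your compiler emits for it is a good exercise:


void copy_bytes(char* to, const char* from, unsigned count)
{
    unsigned n;

    if (count == 0)
        return;

    n = (count + 7) / 8;  /* passes through the unrolled body */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}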

However, as a practical exercise today, assembly-level programming is rightly relegated to the margins of programming. When writing and optimizing a program, or any part thereof, you should follow these steps:

  1. Write first for correctness, properly weighing performance against maintainability, and choosing good algorithms.
  2. Profile to identify performance-critical functions.
  3. IFF profiling reveals hot-spots, consider optimizing them by following the next steps, otherwise stop here.
  4. Consider algorithmic optimizations, or ways to reduce the amount of work that needs to be performed in a given time. Space partitioning algorithms are a good example of this type of optimization. I would also consider this step to include methods of optimizing cache locality, often called Data-Oriented Design (DoD), and transforming the problem to utilize vector hardware (SSE, AVX, etc.).
  5. IFF algorithmic optimization and work-reduction still do not provide sufficient performance (as opposed to ultimate performance), consider further optimization by following the next steps, otherwise stop here.
  6. Consider re-writing the code using intrinsic functions to get closer to the metal without tying the compiler's hands -- it can still perform its usual optimizations around intrinsics (see the sketch after this list). In general, a compiler or assembler will not optimize assembly code at all. It's assumed that if you're writing assembler, then you really mean to do exactly what you're doing, resulting complications be damned.
  7. IFF, after all of this, you can prove to yourself that you will generate better code than the compiler produces from intrinsics, consider further optimization by following the next steps, otherwise stop here.
  8. Consider re-writing in assembly language. If you follow these steps honestly, you should almost never end up here, except in a few cases: namely, that you as a programmer have insights into the code that the compiler cannot have, that you can employ hardware resources or CPU instructions that the compiler cannot (or has no means for you to express at a higher level), or that, for whatever reason, the compiler simply does The Wrong Thing™ with this particular piece of code.
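
To make step 6 concrete, here's a minimal sketch (the sum_plain/sum_sse names are hypothetical, and it assumes an x86 target with SSE available) of the same reduction written in plain C and with compiler intrinsics. The intrinsic version pins down which vector instructions get used, but still leaves register allocation, scheduling, and inlining decisions to the compiler:


#include <xmmintrin.h>  /* SSE intrinsics */

/* step-1 version: plain C, left entirely to the optimizer */
float sum_plain(const float* a, int n)
{
    float s = 0.0f;
    int i;
    for (i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* step-6 version: assumes n is a multiple of 4 and a is 16-byte aligned */
float sum_sse(const float* a, int n)
{
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    int i;

    for (i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(a + i));

    _mm_storeu_ps(lanes, acc);  /* spill the four lanes and add them */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}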

throw table_exception("(╯°□°)╯︵ ┻━┻");

You can write in that style in a higher level language if you like:

Are those prototypes up there above the main function or just globals? Either way I can't wait to get good enough at it to throw a few for-fun versions of that down from off the top!

  1. Write first for correctness, properly weighing performance against maintainability, and choosing good algorithms.

Ravyne, your entire post was super insightful! That number 1 is definitely the key; figure if you can do that part right then optimization should become less and less of an issue. For me I'd rehash the thing till it's blazing fast, and after a while I suppose if you're good enough at that you can almost fit all of those steps into just number one there from the start. My C code can be unruly at times because I like to avoid using too many pointers, but that's a whole different topic. Suppose it goes back to the performance thing being essential.

I'm a hardcore C programmer, and even program C in an OO style (function pointers in structs for polymorphism, etc), and some people give me trouble for it. I understand C++ is there, but for my own personal projects (not at work: at work I use whatever the rest of the team is using because it's not my project) I choose C.

DracoLacorente, was that style something you eased into on your own? I'm self-taught so it's a lot of give and take when it comes to style. I actually started off learning Python somewhat formally, but it was so strict about whitespace. The freedom C gives you while still staying kind of grounded is why I like it. That and of course the inline assembly capability that people frown upon. I don't want to be that guy, but it honestly just rings my bells.

I've seen it answered a few times, but how does inline assembly fare as far as portability goes? I don't do it at all yet, but would it make the code more portable or less?
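
(To be concrete about what I mean, here's a minimal sketch of GCC-style inline assembly in C -- read_tsc is just a hypothetical example, and I gather the syntax itself differs between compilers and targets, which is part of what I'm asking about:)


#include <stdint.h>

/* read the x86 time-stamp counter using GCC/Clang extended inline assembly */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}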

Editor // Joy-Toilet.com

Anything But Shitty Entertainment!

Ravyne, your entire post was super insightful! That number 1 is definitely the key; figure if you can do that part right then optimization should become less and less of an issue. For me I'd rehash the thing till it's blazing fast, and after a while I suppose if you're good enough at that you can almost fit all of those steps into just number one there from the start. My C code can be unruly at times because I like to avoid using too many pointers, but that's a whole different topic. Suppose it goes back to the performance thing being essential.

I'm glad you found it useful, but I'm worried you've misinterpreted it a bit -- I want to drive home the point that it is in no way meant to be viewed as something you can compress into a single step, nor should you want to. It's a process that aims to put your efforts squarely where they belong, while leaving code at the most abstract level at which it can meet its performance requirements. Trying to compress it all into one step is impossible without making dangerous assumptions that will end up costing you time, maintainability, and performance. An expert might skip steps 6 and 7 if, and only if, they are certain through wisdom and experience that the compiler cannot or will not generate the best code -- but they will never simply assume that to be the case, nor would they skip the earlier steps without first having hard data to prove that the code falls into a performance hot-spot.

I'll share something I learned just yesterday which is tangential but illustrative of why the thing you think will work best often doesn't.

I attended a meeting of the Northwest C++ Users Group last night. The topic was Visual Studio's Profile-Guided Optimization (PoGO, for short) feature and how it works. In brief, PoGO works by instrumenting a build of your application, which you then run through various performance-sensitive scenarios in order to train it with real data. Then you do the real (release) build in a way that incorporates that training to help inform the compiler how to generate the best code for real use-cases. For example, based on whether a conditional is likely to be true or false, it might swap the order of the branches in an 'if' statement so that the processor speculatively executes the correct branch in the majority of cases. If it does so, the CPU stalls less and performance increases. It also applies what it has learned about how often, and from where, every function call is made (this influences whether the function should be inlined or not). It's all very complex, and I'm simplifying it here, but that's what it does in a nutshell.
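
If you want to try it yourself, the classic command-line workflow is sketched below ('myapp' is a hypothetical project, and newer Visual Studio versions expose the same feature through different switch names, so check the documentation for your version):


rem compile with whole-program optimization, then link an instrumented build
cl /c /O2 /GL myapp.cpp
link /LTCG:PGINSTRUMENT myapp.obj

rem run the instrumented build through your real, performance-sensitive scenarios
myapp.exe

rem re-link, letting the collected training data (.pgc/.pgd files) drive code generation
link /LTCG:PGOPTIMIZE myapp.obj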

When we got to the end of the presentation and the speaker was comparing results of PoGO-compiled code vs code compiled with -O2 (the highest level of non-PoGO optimization Visual Studio's compiler supports), there were some really interesting results. Not only was the PoGO-compiled code faster, it was also smaller, and it only inlined about 5% of the overall call sites, vs around 20% or higher that were inlined by the -O2-compiled code. Now, it performs a number of other optimizations to achieve that, but think about those stats on their own -- the PoGO code used far less inlining than the -O2 code, and was faster and smaller, all at the same time. Best performance is not achieved by being most aggressive with potential optimizations; it's achieved by being really smart about where optimizations are applied, and by applying them in the context of real data about real scenarios.

Let's think about that another way: for all of the thousands of PhD-hours and hundreds of millions of dollars thrown into compiler research over decades, not even the compiler (and one of the best in the world, at that) can generate its best code without first profiling it!

throw table_exception("(╯°□°)╯︵ ┻━┻");

Quote

I'm a hardcore C programmer, and even program C in an OO style (function pointers in structs for polymorphism, etc), and some people give me trouble for it. I understand C++ is there, but for my own personal projects (not at work: at work I use whatever the rest of the team is using because it's not my project) I choose C.

DracoLacorente, was that style something you eased into on your own? I'm self-taught so it's a lot of give and take when it comes to style. I actually started off learning Python somewhat formally, but it was so strict about whitespace. The freedom C gives you while still staying kind of grounded is why I like it. That and of course the inline assembly capability that people frown upon. I don't want to be that guy, but it honestly just rings my bells.

I had zero style throughout high school and most of college. Even after college my style was somewhat poor. It was after working at my current company for about a year or so that some of the style and design of the code at work began to rub off on me and influence my code at home. There are a few key experiences that have shaped my style quite dramatically:

  • I took a Java class at school. The ideas of objects and polymorphism were cool, but I saw a lot of stuff that seemed over-OO-ified (the Integer class, for example). Still, the idea of interfaces and abstraction really influenced my style.

  • At work, one of the projects I help maintain is a large C codebase. It has full manual memory management. It sucks. There is one part of that code base (a data tree structure) that uses reference counting. I've adopted reference counting in my home-coded C projects.

This is typical of my style at home:


#include <stdio.h>

/* z_alloc, z_addref, z_free and z_strdup are my own reference-counting helpers
 * (not shown here): z_alloc takes a destructor to call when the last reference
 * is released, z_addref bumps the count, and z_free drops a reference. */

typedef struct inner_s {
    char* str;
    void (*do_something)(struct inner_s* inner);
} inner_t;

typedef struct {
    inner_t* i1;
    inner_t* i2;
} outer_t;

void free_outer(outer_t* x)
{
    z_free(x->i1);
    z_free(x->i2);
}

void foo(outer_t* x)
{
    x->i1->do_something(x->i1);
    x->i2->do_something(x->i2);
}

void free_inner(inner_t* x)
{
    z_free(x->str);
}

void inner_do_something_interesting(inner_t* in)
{
    printf(" oh my god %s!\n", in->str);
}

void inner_do_something_boring(inner_t* in)
{
    printf("I'm bored\n");
}

inner_t* inner_mk(char* string)
{
    inner_t* inner = z_alloc(sizeof(*inner), free_inner);
    inner->str = z_addref(string);
    inner->do_something = inner_do_something_interesting;
    return inner;
}

/* typical usage (at home this would live in main or some setup function): */
int main(void)
{
    outer_t* a = z_alloc(sizeof(*a), free_outer);

    char* reused_string = z_strdup("hello");

    a->i1 = inner_mk(reused_string);
    a->i2 = inner_mk(reused_string);

    // sometimes you want to override a method
    a->i2->do_something = inner_do_something_boring;

    z_free(reused_string); // don't need this anymore, inner_mk addref'ed it

    foo(a); // this calls the 'do_something' method on both i1 and i2

    // sometime later, something else copies a reference to a:
    outer_t* a_copy = z_addref(a);

    // and then maybe the original a goes out of scope:
    z_free(a);

    // then eventually the copy goes out of scope:
    z_free(a_copy);

    // now a->i1 and a->i2 are freed, recursively dropping the last reference
    // to 'reused_string', which is finally destroyed
    return 0;
}

C++ programmers tell me the above is crazy and I should have made C++ classes for inner and outer, and then used boost smart pointers. At work I program how I'm supposed to program to get things done. At home, I program for fun, whatever way I want.

C++ programmers tell me the above is crazy and I should have made C++ classes for inner and outer, and then used boost smart pointers. At work I program how I'm supposed to program to get things done. At home, I program for fun, whatever way I want.

If you were programming in C++, I might say they have a point. But if you're programming in C then they're clearly crazy for such suggestions :)

But to add to the discussion, I think people have made some excellent points, and I won't repeat everything. I've primarily found knowing assembly useful when debugging release builds of applications, where I haven't been able to reproduce the bug in a debug build. It doesn't happen a lot, but when it does... well, somebody has to get their hands dirty.
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

