Decrius

Executable size: ASM vs. C

This topic is 3716 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Just started in ASM (using NASM), and expected the application (a simple hello world program) to be slightly smaller than the C (using MinGW32) variant...and perhaps a bit faster.

That was true, except for 'slightly' ^^. In fact, the C executable was 135 times bigger!

ASM source:
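bits 16
org 0100H               ; DOS loads a COM image at offset 100h

section .text

start:
    mov dx, helloworld  ; DS:DX -> '$'-terminated string
    mov ah, 9           ; DOS function 09h: print string
    int 21H

    mov ax, 04c00H      ; DOS function 4Ch: exit with return code 0
    int 21H

section .data

helloworld  db  "Hello world!", 10, "$"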

C source:
#include <stdio.h>

int main()
{
    printf("Hello world!");
    return 0;
}

The programs are not equivalent, though: the ASM is in 16-bit mode, and I suspect the C version uses some other software interrupts (or none at all). So I did expect the C version to be somewhat bigger...

But can someone explain to me how such a simple program can be so big (MinGW32, -s -Os only, Win32 XP)? What is included? Does it actually use all the extra code (or data)? If yes, what code/data is it? If not, why doesn't the compiler optimize it away?

*confused*

Thanks.

PS: ASM was 26 bytes in size, C around 3,500 bytes.

You are comparing two different types of executable. The first is a COM file: it has no headers and is just raw machine code. The second is an EXE file, which carries a lot of headers because EXE files can contain many more sections than just executable code (resources, DLL imports, DLL exports, debugging info, ...).
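
To make the "no headers" point concrete, here is a rough sketch in C that spells out what the 26-byte COM file from the first post should contain, byte for byte. The bytes are hand-assembled from the NASM listing rather than dumped from the actual file, so treat them as an assumption; `hello_com` and `has_mz_header` are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed contents of the COM file: 12 bytes of code followed by
 * 14 bytes of data, with nothing in front of the first instruction. */
const unsigned char hello_com[] = {
    0xBA, 0x0C, 0x01,   /* mov dx, 010Ch  (string address: 100h + 12) */
    0xB4, 0x09,         /* mov ah, 9 */
    0xCD, 0x21,         /* int 21h */
    0xB8, 0x00, 0x4C,   /* mov ax, 4C00h */
    0xCD, 0x21,         /* int 21h */
    'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!',
    0x0A, '$'           /* line feed and the DOS string terminator */
};

/* Returns 1 if the image starts with the "MZ" signature that
 * DOS and Win32 EXE files begin with, 0 otherwise. */
int has_mz_header(const unsigned char *image, size_t size)
{
    assert(image != NULL);
    return size >= 2 && image[0] == 'M' && image[1] == 'Z';
}
```

An EXE starts with the two-byte "MZ" signature followed by header fields, whereas the image above starts directly with the first instruction. As far as I know the loader looks at that signature rather than at the file extension, so a COM image that is merely named .exe is still loaded as a COM file.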

Quote:
Original post by bubu LV
You are comparing two different types of executable. The first is a COM file: it has no headers and is just raw machine code. The second is an EXE file, which carries a lot of headers because EXE files can contain many more sections than just executable code (resources, DLL imports, DLL exports, debugging info, ...).


Ah, all right, I see. The C exe, though, doesn't use any DLLs or resources (none that I specified, anyway), and it has no debugging info (I told the compiler to leave it out...).

Also, the ASM file is called .exe, but that's just the name :P.

But are these headers so big?

Win32 PE executables include quite a lot of headers and padding, which is especially noticeable if you're aiming for an EXE of only a few kilobytes. There may even be useless debugging and relocation information in there, so try stripping it. Part of the reason PE files are so large is that, by default, each section is aligned on a 4k boundary to facilitate fast disk paging.
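
The padding cost is easy to estimate by hand. Below is a minimal sketch in C of the arithmetic: every section's size is rounded up to the alignment, so even a handful of nearly empty sections adds up. The helper names and the section sizes used in the note afterwards are made up for illustration, and the 512-byte value is only there for comparison with the 4k figure.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Total space taken by count sections once each one is padded
 * to the given alignment (headers are not included). */
size_t padded_total(const size_t *raw_sizes, size_t count, size_t alignment)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += align_up(raw_sizes[i], alignment);
    return total;
}
```

For three hypothetical sections holding 40, 14 and 100 bytes of real content, `padded_total` gives 1536 bytes with 512-byte alignment and 12288 bytes with 4k alignment, before any headers are counted.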
Additionally, the C runtime does a lot of processing even if you don't ask for anything (parsing the command line, opening the console streams, initializing the memory allocator, handling the atexit chain, etc.), and printf(...) is serious overkill for writing a fixed string. If you're linking the runtime statically (which you probably aren't), that's a big cost.
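
As a small illustration of the printf point, a fixed string can be written with fputs, which does no format parsing at all. This is only a sketch: `write_fixed` is a name made up here, and whether it actually shrinks the executable depends on how the runtime is linked (with a dynamically linked runtime the difference in file size is probably negligible).

```c
#include <assert.h>
#include <stdio.h>

/* Write a fixed string to a stream without going through the
 * printf formatting machinery. Returns 0 on success, -1 on error. */
int write_fixed(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    return fputs(text, out) == EOF ? -1 : 0;
}
```

In the hello world from the first post, the printf call would become `write_fixed(stdout, "Hello world!");`.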

Here's a tutorial on how to slim down Win32 executables. Unfortunately it's aimed at Visual C++, but the same concepts still apply; you've just got to find the equivalent switches.

Unfortunately it's also true that compilers suck very badly at optimizing for size, especially for complex CISC architectures like the x86. A skilled hacker with the patience to turn over every byte of code looking for exploitable similarities or better instruction sequences can do amazing things in 4k.
Note that in my experience VC is significantly better at this than GCC, which barely seems to bother doing anything more than disabling performance optimizations when optimizing for size.

Oh, and you can always use an EXE-packer like UPX.

Ah thanks...that makes sense. Though, if a program doesn't use atexit()...it shouldn't keep a list ^^.

Keep in mind that 32-bit ASM is the way to go these days, especially since calling interrupts the way you are doing will not always work in the protected mode that most current versions of Windows use. Also, 16-bit code won't run at all under 64-bit Windows!

The other thing that should be obvious from the replies is that the cost isn't linear. If you made a substantially more complex program with some data structures, some actual work being done, etc., the ASM version would grow linearly, as you'd have to write all that code. The C version would grow much less than linearly, as most of the size right now is fixed-cost overhead. Eventually, once the program is complicated enough, the overhead would be noise (completely undetectable compared to the actual code size), and the only real difference would be in the size of the code itself. At that point the ASM version might still be smaller, if you managed to hand-write ALL of that code such that it was size-optimal. Of course, unless you're entering a 4k or 64k demo contest, there isn't really any point.
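
A back-of-the-envelope version of that argument, as a sketch in C. It assumes the overhead really is a fixed number of bytes and that the C and ASM code payloads are the same size, which is a simplification; `size_ratio` is just a name picked for this example.

```c
#include <assert.h>

/* Ratio of total executable size to code size for a given fixed
 * overhead: (fixed_overhead + code_size) / code_size. */
double size_ratio(double fixed_overhead, double code_size)
{
    assert(code_size > 0.0);
    return (fixed_overhead + code_size) / code_size;
}
```

With the numbers from this thread (26 bytes of code, roughly 3,474 bytes of overhead to reach 3,500), the ratio is about 135. With the same overhead and 100,000 bytes of code it drops to about 1.03.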

Quote:
Original post by daviangel
Keep in mind that 32-bit ASM is the way to go these days, especially since calling interrupts the way you are doing will not always work in the protected mode that most current versions of Windows use. Also, 16-bit code won't run at all under 64-bit Windows!


Yes, just reading my way through "Assembly language Step-by-step" by Jeff Duntemann...not yet arrived at the 32-bit part :P.

Quote:
Original post by osmanb
The other thing that should be obvious from the replies is that the cost isn't linear. If you made a substantially more complex program with some data structures, some actual work being done, etc., the ASM version would grow linearly, as you'd have to write all that code. The C version would grow much less than linearly, as most of the size right now is fixed-cost overhead. Eventually, once the program is complicated enough, the overhead would be noise (completely undetectable compared to the actual code size), and the only real difference would be in the size of the code itself. At that point the ASM version might still be smaller, if you managed to hand-write ALL of that code such that it was size-optimal. Of course, unless you're entering a 4k or 64k demo contest, there isn't really any point.


Ah okay, but I guess the ASM version (if it is written properly) would always be smaller...

Quote:
Original post by Decrius

Yes, just reading my way through "Assembly language Step-by-step" by Jeff Duntemann...not yet arrived at the 32-bit part :P.


Ah, yes, I read that book back in the DOS days (I still remember the Martian counting system and Eat at Joe's, LOL!) and it's a pretty good book, but I don't believe it covers 32-bit ASM, unless the newer edition does?
Never mind, I just noticed you are using the newer edition, since it uses NASM, whereas the edition I learned from used Microsoft's MASM assembler.
Anyway, if you still like assembly after this book, I recommend getting the latest Kip Irvine assembly book, which covers 32-bit ASM.

Quote:
Original post by Decrius
Just started in ASM (using NASM), and expected the application (a simple hello world program) to be slightly smaller than the C (using MinGW32) variant...and perhaps a bit faster.

That was true, except for 'slightly' ^^. In fact, the C executable was 135 times bigger!

ASM source:
*** Source Snippet Removed ***

C source:
*** Source Snippet Removed ***

The programs are not equivalent, though: the ASM is in 16-bit mode, and I suspect the C version uses some other software interrupts (or none at all). So I did expect the C version to be somewhat bigger...

But can someone explain to me how such a simple program can be so big (MinGW32, -s -Os only, Win32 XP)? What is included? Does it actually use all the extra code (or data)? If yes, what code/data is it? If not, why doesn't the compiler optimize it away?

*confused*

Thanks.

PS: ASM was 26 bytes in size, C around 3,500 bytes.


If I remember correctly, in a COM file you want to do an int 20h instead of

mov ax, 04c00H
int 21H

That sequence is meant for EXE files.
