VC++? Really?

Started by
38 comments, last by Gladiator 23 years, 8 months ago
Okay, I thought VC++ was a good code optimizer until I tested the following. Not only does it waste a lot of memory for no reason, but the assembly code that I got from VC sucks |)![|< because it's not optimized AT ALL... I've left only the ASM code of the test() function, since that's what's important. Here is both the C and the ASM code:
    
//////////////////// TEST.CPP ////////////////

void test(int mine[3][3])
{
	for (int i = 0; i < 3; i++)
		for (int j = 0; j < 3; j++)
			mine[i][j] = 0xFF;
}

int main ()
{
	int test_array[3][3];

	test(test_array);

	return 0;
}
//////////////////// TEST.ASM /////////////////


?test@@YAXQAY02H@Z PROC NEAR				; test, COMDAT
; File d:\microsoft visual studio\myprojects\dsaff\sadsaf.cpp
; Line 2
	push	ebp
	mov	ebp, esp
	sub	esp, 72					; 00000048H
	push	ebx
	push	esi
	push	edi
	lea	edi, DWORD PTR [ebp-72]
	mov	ecx, 18					; 00000012H
	mov	eax, -858993460				; ccccccccH
	rep stosd
; Line 3
	mov	DWORD PTR _i$[ebp], 0
	jmp	SHORT $L218
$L219:
	mov	eax, DWORD PTR _i$[ebp]
	add	eax, 1
	mov	DWORD PTR _i$[ebp], eax
$L218:
	cmp	DWORD PTR _i$[ebp], 3
	jge	SHORT $L220
; Line 4
	mov	DWORD PTR _j$221[ebp], 0
	jmp	SHORT $L222
$L223:
	mov	ecx, DWORD PTR _j$221[ebp]
	add	ecx, 1
	mov	DWORD PTR _j$221[ebp], ecx
$L222:
	cmp	DWORD PTR _j$221[ebp], 3
	jge	SHORT $L224
; Line 5
	mov	edx, DWORD PTR _i$[ebp]
	imul	edx, 12					; 0000000cH
	mov	eax, DWORD PTR _mine$[ebp]
	add	eax, edx
	mov	ecx, DWORD PTR _j$221[ebp]
	mov	DWORD PTR [eax+ecx*4], 255		; 000000ffH
	jmp	SHORT $L223
$L224:
	jmp	SHORT $L219
$L220:
; Line 6
	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	ret	0
?test@@YAXQAY02H@Z ENDP					; test
    
Let's see what the code does...

; Line 2
	sub	esp, 72

72 bytes? What the...? The above code allocates 72 bytes. Not only does it allocate unneeded memory (when it can simply pass a pointer to the array, which is A LOT better), but it allocates double the amount of memory needed. Let's see: a 3x3 matrix of int has 3 x 3 = 9 elements, and an int is 4 bytes, so that makes 9 x 4 = 36 bytes. Why does it allocate 72 bytes? Same as you, I have no idea... and that's the first problem really: memory.

Here comes the speed (optimization) part. I have the Win32 Release option on, and have compiled it with Optimize for Speed.

; Line 3
	add	eax, 1

This should've been inc eax. Also, the loop could have used a do/while type of structure that counts the index down to zero instead of up from zero, so:

	mov	ecx, 2			; index runs 2, 1, 0
repeat:
	...				; do whatever
	dec	ecx
	jns	short repeat

When I tried to use a register for the loop index values, it didn't do anything. It still used memory, and I'm sure it didn't even try to use the registers. C only looks good from the outside, but nobody knows what it does when it gets down to the low-level stuff. The above code is not going to use all those comparisons and branches like in:

	cmp	DWORD PTR _j$221[ebp], 3
	jge	SHORT $L224

Plus it will make the code smaller in size (well, a couple of bytes) AND take a few cycles less (in 3D engines a single cycle matters).

; Line 5
	mov	edx, DWORD PTR _i$[ebp]
	imul	edx, 12

First of all, they should keep that "i" index value in a register all the time. That's very important!! Second, instead of using IMUL they can shift the value to the left, just like you did in your VGA PutPixel routine, so "i*12" will be the same as "(i<<3)+(i<<2)".

Also, a lookup table holding all the indirect addressing values could be calculated ahead of time, which would make it faster since you don't have to calculate them for each item every time. See? These are just a few things that a programmer can do that the compiler can't. The code will be much faster and much cleaner. Just my thoughts... any comments are welcome. Take care!

-------------------------------
That's just my 200 bucks' worth!
..-=gLaDiAtOr=-..
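
For reference, the count-down do/while structure described in the post can be sketched in C++ as well. This is only an illustration: the name test_countdown is made up here, and whether a given compiler turns it into the dec/jns pattern is an assumption you would have to check in the listing file.

```cpp
// Hypothetical count-down variant of test(): both indices run 2, 1, 0,
// so each loop can end on a sign check instead of a compare against 3.
void test_countdown(int mine[3][3])
{
    int i = 2;
    do {
        int j = 2;
        do {
            mine[i][j] = 0xFF;   // same store as the original loop body
        } while (--j >= 0);      // leaves the inner loop once j drops below 0
    } while (--i >= 0);          // leaves the outer loop once i drops below 0
}
```

It writes the same nine elements as the original test(), only in reverse order.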
Dude, I'll be honest, I'm not good at low-level stuff at all, and I didn't even bother reading all your assembler explanations. I'll trust your word.

I don't have time for this, but if you went far enough to analyze all this stuff, I guess you do. Why don't you compile this same piece of code in VC, Borland, DJGPP (or whatever it is) and VectorC? Then look at the assembly code and see what you can figure out. If you have a high-performance timer, you could also test which code runs faster. Also, there are functions (they only work on NT) that measure exactly how many time slices the system gave to the thread.

I would be really interested in seeing the results. I don't know if you have time to do this and post the results, but if you do, I would be more than interested.
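
A rough sketch of the timing idea, assuming a C++11 compiler with std::chrono (which did not exist when this thread was written; on Windows of that era QueryPerformanceCounter would be the usual choice). The helper name time_test_ns is made up for illustration.

```cpp
#include <chrono>

void test(int mine[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            mine[i][j] = 0xFF;
}

// Hypothetical helper: calls test() `iterations` times and returns the
// elapsed wall-clock time in nanoseconds.
long long time_test_ns(int iterations)
{
    volatile int sink = 0;   // volatile, so the stores can't simply be optimized away
    int test_array[3][3];

    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < iterations; n++) {
        test(test_array);
        sink = sink + test_array[n % 3][n % 3];   // read a result back each pass
    }
    auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Building the same file with each compiler and comparing time_test_ns(10000000) would give a first impression, though a loop this small is easily dominated by timer and loop overhead.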
Well, I don't have the time, since when I decided to inform you about the spaghetti code that C++ generates, I was just testing the pointer passing/value passing type of thing... And I noticed all the garbage that C++ throws in to make it easy on itself. It will probably notice that it can replace i = i * 2 (where i is an integer) with i = i << 1, and likewise for any other power of two, but the thing is that it can't combine two expressions like (i<<3)+(i<<2) to stand in for i*12, which would be slower than the shifts... Of course this is not very important if your game is not THAT big, but if it starts getting complicated, every single line of code MATTERS to the performance of your game (or any other type of application that needs to execute FAST enough to be user-interactive). If I have time I might try to get some more info and test a few cases to see if VC is really as smart as it claims to be, or if it's all just rumors...

Anyway, if you have any comments/thoughts, no matter what they are, just write them down... my ears are wide open and listening. I'm out.
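
The arithmetic behind the shift trick is 8*i + 4*i = 12*i. A minimal check, with made-up helper names mul12 and shift_add12 and unsigned values to keep the shifts well defined:

```cpp
// Hypothetical helpers: the two ways of computing a row offset of 12 bytes.
constexpr unsigned mul12(unsigned i)       { return i * 12; }
constexpr unsigned shift_add12(unsigned i) { return (i << 3) + (i << 2); }  // 8*i + 4*i

// Checked at compile time for the three row indices used in test().
static_assert(mul12(0) == shift_add12(0), "row 0 offset matches");
static_assert(mul12(1) == shift_add12(1), "row 1 offset matches");
static_assert(mul12(2) == shift_add12(2), "row 2 offset matches");
```

Which form is actually faster depends on the CPU and the compiler, so this only shows that the two expressions agree.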

-------------------------------
That's just my 200 bucks' worth!

..-=gLaDiAtOr=-..
Hi all.

I can't agree with you more, Gladiator. Also, in some other thread I was pointing out that VC++ 6 (SP 3) generated the same assembly code for this:

A = A + 1;

as for this:

A += 1;

And someone (his nick escapes my memory) recommended that I use the Borland compiler instead. But even then, I don't think a compiler can always optimize better than a good assembly language programmer.

Topgoro
We emphasize "gotoless" programming in this company, so constructs like "goto hell" are strictly forbidden.
The code is correct. I mean, if you pass an array to a function that takes an array parameter, it will generate exactly the same thing in ASM. Compilers don't think.
The "add eax, 1" line pairs with the "mov" right before it. And as it uses an immediate value, it takes only one cycle.
Your code doesn't change anything on the array that you passed to it. The function should be:
void test(int** mine)
{
...
}
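
Before changing the signature, it is worth checking what the original one does. In standard C++ a parameter declared as int mine[3][3] is adjusted to int (*mine)[3], so the function receives a pointer to the caller's rows and writes through it; an int** parameter would not accept test_array at all. A small sketch of that check (the helper caller_sees_writes is made up for illustration):

```cpp
#include <type_traits>

void test(int mine[3][3])
{
    // The parameter is really int (*mine)[3]: these stores land in the
    // caller's array, not in a local copy.
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            mine[i][j] = 0xFF;
}

// The array form and the pointer-to-row form name the same function type.
static_assert(std::is_same<void(int[3][3]), void(int (*)[3])>::value,
              "array parameter is adjusted to a pointer to its first row");

// The array itself is 9 ints; only a pointer to it is passed.
static_assert(sizeof(int[3][3]) == 9 * sizeof(int), "3 x 3 elements");

// Hypothetical check: does the caller see the values written by test()?
bool caller_sees_writes()
{
    int test_array[3][3] = {};
    test(test_array);
    return test_array[0][0] == 0xFF && test_array[2][2] == 0xFF;
}
```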

And don't worry so much about this low-level thing. I can say for sure that Carmack didn't write a single line in assembly for Quake 3.

Edited by - blazter on July 24, 2000 5:46:33 PM
quote:Original post by Blazter


And don't worry so much about this low level thing. I can say for sure that Carmack, didn't write a single line in assembly for Quake 3.



How do you know this? The engine source hasn't been released...

Well, of the little of the source I've seen (none of the engine, mind), there were a few lines of asm. Not much, but it was there.
You're probably using Standard Edition, right? It does good for only $50! If you want "optimized" code from VC++, then fork out the $300(?) upgrade fee for Professional Edition.

Plus, you must have a LOT of extra time on your hands to go through and do all of this, I wish I had that much spare time. Oh well. Peace Out.
TITLE D:\VCProject\Learn\main.cpp
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC ?test@@YAXQAY02H@Z ; test
_TEXT SEGMENT
_mine$ = 8
_i$ = -4
_j$171 = -8
?test@@YAXQAY02H@Z PROC NEAR ; test
; File D:\VCProject\Learn\main.cpp
; Line 3
push ebp
mov ebp, esp
sub esp, 8
; Line 4
mov DWORD PTR _i$[ebp], 0
jmp SHORT $L168
$L169:
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
$L168:
cmp DWORD PTR _i$[ebp], 3
jge SHORT $L170
; Line 5
mov DWORD PTR _j$171[ebp], 0
jmp SHORT $L172
$L173:
mov ecx, DWORD PTR _j$171[ebp]
add ecx, 1
mov DWORD PTR _j$171[ebp], ecx
$L172:
cmp DWORD PTR _j$171[ebp], 3
jge SHORT $L174
; Line 6
mov edx, DWORD PTR _i$[ebp]
imul edx, 12 ; 0000000cH
mov eax, DWORD PTR _mine$[ebp]
add eax, edx
mov ecx, DWORD PTR _j$171[ebp]
mov DWORD PTR [eax+ecx*4], 255 ; 000000ffH
jmp SHORT $L173
$L174:
jmp SHORT $L169
$L170:
; Line 7
mov esp, ebp
pop ebp
ret 0
?test@@YAXQAY02H@Z ENDP ; test
_TEXT ENDS
PUBLIC _main
_TEXT SEGMENT
_test_array$ = -36
_main PROC NEAR
; Line 10
push ebp
mov ebp, esp
sub esp, 36 ; 00000024H
; Line 13
lea eax, DWORD PTR _test_array$[ebp]
push eax
call ?test@@YAXQAY02H@Z ; test
add esp, 4
; Line 16
xor eax, eax
; Line 17
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END

Debug version on VC++ 5.0 enterprise!!!! You did something way wrong buddy!
So let me get this straight: the complaint from the posters here is:
"Dude, Microsoft's compiler doesn't optimize code when you turn off optimization, so it sucks!"??

[getting out shotgun]

Debug = no optimization. You know that code that fills memory with 0xcccccccc?? That's the debug code: it goes in and intentionally fills uninitialized memory with 0xcccccccc and 0xcdcdcdcd (one for stack memory and one for heap memory, if I remember correctly) so that while you're debugging you can tell if you're referencing uninitialized data. This is going to cost more instructions.
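
The numbers in the first listing line up with this. A small check using only values taken from that listing (the constant names are made up for illustration):

```cpp
#include <cstdint>

// "mov eax, -858993460" from the listing, viewed as an unsigned 32-bit value.
constexpr std::uint32_t fill_pattern =
    static_cast<std::uint32_t>(std::int32_t{-858993460});
static_assert(fill_pattern == 0xCCCCCCCCu, "the debug fill pattern");

// "mov ecx, 18" followed by "rep stosd" stores 18 dwords of 4 bytes each,
// which is exactly the 72 bytes reserved by "sub esp, 72".
constexpr int fill_dwords = 18;
constexpr int bytes_per_dword = 4;
static_assert(fill_dwords * bytes_per_dword == 72, "18 dwords cover the 72 bytes");
```

So the 72 bytes are the stack area that the debug prologue reserves and fills, not a copy of the 36-byte array.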

If you REALLY want to see what your program compiles to, do this:
1) Start a new project
2) Switch to Release build
3) Go to Project->Settings
4) In the C/C++ tab, in the category Listing Files, change Listing file type to "Assembly with Source Code".
5) Recompile
6) Open the "program.asm" file in your Release directory, where program is your project name.

Be prepared: there will be several lines of source code accounted for by seemingly fewer lines of assembly. This is called code optimization. However, if you see at least one line of assembly for every line of source code, you're looking in the Debug directory, you daft fool.

Here''s what you SHOULD get:
	TITLE	C:\projects\test\test.cpp
	.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT	SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT	ENDS
_DATA	SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA	ENDS
CONST	SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST	ENDS
_BSS	SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS	ENDS
$$SYMBOLS	SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS	ENDS
$$TYPES	SEGMENT BYTE USE32 'DEBTYP'
$$TYPES	ENDS
_TLS	SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS	ENDS
;	COMDAT ?test@@YAXQAY02H@Z
_TEXT	SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT	ENDS
;	COMDAT _main
_TEXT	SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT	ENDS
FLAT	GROUP _DATA, CONST, _BSS
	ASSUME	CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC	?test@@YAXQAY02H@Z				; test
;	COMDAT ?test@@YAXQAY02H@Z
_TEXT	SEGMENT
_mine$ = 8
?test@@YAXQAY02H@Z PROC NEAR				; test, COMDAT
; 3    : {
	push	edi
; 4    :     for (int i = 0; i < 3; i++)
; 5    :         for (int j = 0; j < 3; j++)
; 6    :             mine[i][j] = 0xFF;
	mov	edi, DWORD PTR _mine$[esp]
	mov	ecx, 9
	mov	eax, 255				; 000000ffH
	rep stosd
	pop	edi
; 7    : }
	ret	0
?test@@YAXQAY02H@Z ENDP					; test
_TEXT	ENDS
PUBLIC	_main
;	COMDAT _main
_TEXT	SEGMENT
_test_array$ = -36
_main	PROC NEAR					; COMDAT
; 10   : {
	sub	esp, 36					; 00000024H
; 11   :     int test_array[3][3];
; 12   :     test(test_array);
	lea	eax, DWORD PTR _test_array$[esp+36]
	push	eax
	call	?test@@YAXQAY02H@Z			; test
; 13   :     return 0;
	xor	eax, eax
; 14   : }
	add	esp, 40					; 00000028H
	ret	0
_main	ENDP
_TEXT	ENDS
END
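
In C++ terms, the optimized test() boils down to one run of nine stores of 255 ("mov ecx, 9", "mov eax, 255", "rep stosd"). The sketch below compares the nested loops against a flat fill of nine ints; the helper name nested_equals_flat is made up, and it only shows that the two produce the same bytes, not how any particular compiler gets there.

```cpp
#include <algorithm>
#include <cstring>

void test(int mine[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            mine[i][j] = 0xFF;
}

// Hypothetical check: the nested loops leave the same bytes in memory as
// filling nine consecutive ints with 255 in a single run.
bool nested_equals_flat()
{
    int nested[3][3];
    test(nested);

    int flat[9];
    std::fill_n(flat, 9, 0xFF);   // nine stores of 255, like rep stosd with ecx = 9

    return sizeof nested == sizeof flat &&
           std::memcmp(nested, flat, sizeof nested) == 0;
}
```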

This topic is closed to new replies.
