Custom calling conventions in VC++

9 comments, last by phresnel 13 years, 4 months ago
Hi

Is it possible to create a custom calling convention in Visual Cpp (2010)? More specifically I would like to pass the first argument in EAX.


Thanks

PS: The goal is performance so I am not interested in a __declspec(naked) "wrapper" function.
I don't think you can create custom calling conventions. The best you can do is use __fastcall. It passes the first argument in ECX, the second in EDX, and the remaining arguments on the stack.
I'd also recommend __fastcall. Otherwise, you're going to have to use a naked function and some assembly to do the calling and the function header / footer.
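For reference, a minimal sketch of what the __fastcall suggestion looks like in source. The names `DEMO_FASTCALL`, `add_one`, `twice` and `chained` are made up for illustration, and the macro guard is an assumption added so the snippet also compiles where `__fastcall` is unavailable or has no effect (it only matters on 32-bit x86 with the Microsoft compiler).

```cpp
#include <cassert>

// __fastcall is a Microsoft extension that only changes anything on 32-bit x86.
// On other targets this hypothetical macro expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#define DEMO_FASTCALL __fastcall
#else
#define DEMO_FASTCALL
#endif

// Hypothetical helpers. With __fastcall on 32-bit MSVC, x is passed in ECX
// and the int result is returned in EAX.
int DEMO_FASTCALL add_one(int x) { return x + 1; }
int DEMO_FASTCALL twice(int x) { return x * 2; }

// Chained call: twice() returns in EAX, but add_one() expects its argument
// in ECX, so a register move is needed between the calls unless the
// compiler inlines them.
int chained(int x) {
    int r = add_one(twice(x));
    assert(r == 2 * x + 1);
    return r;
}
```

Calling `chained(3)` evaluates `add_one(twice(3))`. Whether the extra move survives in the generated code depends on the optimizer, so it is worth checking the disassembly rather than assuming.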
Quote:Original post by bubu LV
I don't think you can create custom calling conventions. The best you can do is use __fastcall. It passes the first argument in ECX, the second in EDX, and the remaining arguments on the stack.
I know, and that's what I'm currently doing. However, since the function's return value is passed back in EAX, the processed argument in ECX will eventually have to be moved into EAX.

In Open Watcom, for example, the default calling convention passes the first argument in EAX which (in addition to saving a register swap/copy instruction) allows some pretty smart optimization when performing chained function calls, because the return value may be passed directly (without any overhead) from one function to another.

Nevertheless, VCpp is generally brilliant at optimization, so I am pretty sure I can achieve the same performance by inlining my functions.

Thanks for your reply.
If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place.

Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff.
Quote:Original post by Evil Steve
If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place.

Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff.
Thanks for your reply.

"If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place."

- That's not really the issue here. I know that a single MOV instruction (especially the reg-to-reg kind) is performed very fast by the CPU. The point is: If you could choose between two calling conventions, say A and B, and A consistently required one instruction more than B, wouldn't you choose B? Again, I know the concrete difference in performance is very small, but if you did have a choice why not use the faster one (where it made sense to do so, of course) :?

"Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff."

- As indicated by what I wrote above, the contents of the function are not really relevant in this matter, and if you use your disassembler as frequently as I do you should know there are several occasions where it makes a whole lot of sense to worry about calling conventions (and other 'assembly-level stuff').
Quote:
if you use your disassembler as frequently as I do you should know there are several occasions where it makes a whole lot of sense to worry about calling conventions (and other 'assembly-level stuff').

It sounds like you should be working in assembly then, at least for the routines in question. In general, that extra mov instruction is effectively free, because your CPU is waiting for memory.

Quote:
The goal is performance...

Outside of the tightest of inner loops, it is not the instructions but the memory access pattern that will govern performance. Shaving a cache miss is worth many, many saved instructions. And the nice thing is that optimising the cache usage can be done without dropping down into assembly.

Finally, no, I truly don't believe you really need a new calling convention. If there were a true need for it, it would have been created already. Everyone wants high performance - your application is not unique in this regard.
Quote:Original post by rip-off
It sounds like you should be working in assembly then, at least for the routines in question. In general, that extra mov instruction is effectively free, because your CPU is waiting for memory.

Outside of the tightest of inner loops, it is not the instructions but the memory access pattern that will govern performance. Shaving a cache miss is worth many, many saved instructions. And the nice thing is that optimising the cache usage can be done without dropping down into assembly.

Finally, no, I truly don't believe you really need a new calling convention. If there were a true need for it, it would have been created already. Everyone wants high performance - your application is not unique in this regard.

Thanks -- you have some really good points there.

By optimizing cache usage without using assembly do you mean via intrinsic functions? I remember reading an article by a programmer from a demo-group called Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because it allows compiler-optimization (which isn't the case with inline assembly since these sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.
Quote:
By optimizing cache usage without using assembly do you mean via intrinsic functions?

No, you don't necessarily need intrinsics for this.

Cache optimisation involves restructuring the data in memory so that you get the minimal number of cache misses while processing it. One term that is getting popular is "data oriented design" - Google it for more detailed information.
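As a rough illustration of that kind of restructuring, here is a minimal sketch comparing an array-of-structures layout with a structure-of-arrays layout. All type and function names (`ParticleAoS`, `ParticlesSoA`, `add_particle`, `total_mass_aos`, `total_mass_soa`) are hypothetical, and the actual benefit depends on the data sizes and access patterns, so it should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: the fields of each particle are interleaved in memory.
struct ParticleAoS {
    float x, y, z;
    float mass;
};

// Structure-of-arrays: each field is stored contiguously in its own array.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Appends one particle while keeping the parallel arrays the same length.
void add_particle(ParticlesSoA& ps, float x, float y, float z, float mass) {
    ps.x.push_back(x);
    ps.y.push_back(y);
    ps.z.push_back(z);
    ps.mass.push_back(mass);
    assert(ps.x.size() == ps.mass.size());
}

// Reads only 'mass', but every cache line fetched also carries x, y and z.
float total_mass_aos(const std::vector<ParticleAoS>& ps) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < ps.size(); ++i) sum += ps[i].mass;
    return sum;
}

// Reads only 'mass', and every cache line fetched holds nothing but masses.
float total_mass_soa(const ParticlesSoA& ps) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < ps.mass.size(); ++i) sum += ps.mass[i];
    return sum;
}
```

Both functions compute the same result; the difference is only in how much unrelated data is pulled through the cache along the way.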
Quote:
I remember reading an article by a programmer from a demo-group called Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because it allows compiler-optimization (which isn't the case with inline assembly since these sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.

Intrinsic functions are a good alternative to assembly. Inline assembly is treated as a black box by some compilers (Microsoft's C++ compiler is a notable one), which prevents optimisation of that code and can also inhibit optimisation of the surrounding code. I believe GCC can optimise inline assembly to some degree; at the very least it can understand it and use that understanding to tune the surrounding code.
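To make the intrinsics option concrete, here is a small sketch using the SSE intrinsics from `<xmmintrin.h>` (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`). The function name `add_arrays` and the `DEMO_HAVE_SSE` macro are hypothetical, and the preprocessor check is an assumption added so the code falls back to a plain loop on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

// Use SSE only where the compiler reports it; otherwise use the scalar loop.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define DEMO_HAVE_SSE 1
#else
#define DEMO_HAVE_SSE 0
#endif

// Adds b[i] to a[i] for the first n elements.
void add_arrays(float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a && b));
    std::size_t i = 0;
#if DEMO_HAVE_SSE
    // Four floats per iteration; unaligned loads and stores, so the
    // arrays need no special alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop for the remaining elements (or all of them without SSE).
    for (; i < n; ++i) a[i] += b[i];
}
```

Because the intrinsics are ordinary expressions to the compiler, it remains free to allocate registers, schedule, and inline around them, which is the advantage over an inline assembly block described above.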

Also remember that if you structure your code to be inline-friendly, the compiler can remove these redundant "mov" instructions when it inlines your functions. Remember that the inline keyword is only a hint, one that is largely ignored by modern compilers in favour of their own heuristics. Google for these heuristics and try to get your functions to match them.

Quote:Original post by Dragonion

By optimizing cache usage without using assembly do you mean via intrinsic functions? I remember reading an article by a programmer from a demo-group called Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because it allows compiler-optimization (which isn't the case with inline assembly since these sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.


I think he means designing your code to be cache friendly.

Take a look at http://macton.smugmug.com/gallery/8936708_T6zQX#593426709_ZX4pZ

or more generally http://cellperformance.beyond3d.com/articles/

which contains good examples ;)

