Dragonion

Custom calling conventions in VC++


Dragonion    131
Hi

Is it possible to create a custom calling convention in Visual C++ (2010)? More specifically, I would like to pass the first argument in EAX.


Thanks

PS: The goal is performance, so I am not interested in a __declspec(naked) "wrapper" function.

Evil Steve    2017
I'd also recommend __fastcall. Otherwise, you're going to have to use a naked function and some assembly to do the calling and the function prologue/epilogue.

Dragonion    131
Quote:
Original post by bubu LV
I don't think you can create custom calling conventions. The best you can do is use __fastcall. It will pass first argument in ecx, second in edx and rest of arguments in stack.
I know, and that's what I'm currently doing. However, since the function's return value is passed back in EAX, the processed argument in ECX eventually has to be moved into EAX.
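A minimal sketch of the __fastcall approach being discussed, assuming a hypothetical `FASTCALL` macro so the same source still builds where the keyword does not apply (in Visual C++ it only changes anything when targeting 32-bit x86). The function `rotate_left` is an illustrative example, not code from this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portability macro: __fastcall only matters for 32-bit x86
// (_M_IX86) in Visual C++; everywhere else it expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define FASTCALL __fastcall
#else
#  define FASTCALL
#endif

// Under __fastcall on 32-bit x86, 'x' arrives in ECX and 'shift' in EDX.
// The result must be returned in EAX, so a value derived from 'x' generally
// has to be copied from ECX into EAX at some point before returning.
std::uint32_t FASTCALL rotate_left(std::uint32_t x, unsigned shift)
{
    assert(shift < 32u && "shift must be in [0, 31]");
    // Special-case 0 to avoid the undefined 'x >> 32'.
    return shift == 0u ? x : (x << shift) | (x >> (32u - shift));
}
```

Looking at the generated assembly (for example with Visual C++'s `/FAs` listing option) is the easiest way to check whether that ECX-to-EAX copy actually appears in a given build.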

In Open Watcom, for example, the default calling convention passes the first argument in EAX which (in addition to saving a register swap/copy instruction) allows some pretty smart optimization when performing chained function calls, because the return value may be passed directly (without any overhead) from one function to another.

Nevertheless, VC++ is generally brilliant at optimization, so I am pretty sure I can achieve the same performance by inlining my functions.

Thanks for your reply.

Evil Steve    2017
If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place.

Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff.
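Before changing calling conventions, it helps to measure. Below is a rough timing sketch, assuming a C++11 compiler that provides `<chrono>` (newer than the Visual C++ 2010 mentioned above); `average_ns` is a hypothetical helper name. A real sampling profiler will give far more reliable answers than a loop like this.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Runs 'fn' 'iterations' times and returns the average wall-clock time per
// call in nanoseconds. Results from a single run like this are noisy.
template <typename Fn>
double average_ns(Fn fn, std::size_t iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;   // monotonic, unaffected by clock changes
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        fn();
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration<double, std::nano>(elapsed).count()
           / static_cast<double>(iterations);
}
```

Note that an optimizer may delete a call whose result is never used, so the code under test should have a side effect or produce a result that is consumed.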

Dragonion    131
Quote:
Original post by Evil Steve
If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place.

Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff.
Thanks for your reply.

"If you're so worried about performance that you can't afford a single mov instruction, you almost certainly don't want to be using functions in the first place."

- That's not really the issue here. I know that a single MOV instruction (especially the reg-to-reg kind) is executed very quickly by the CPU. The point is: if you could choose between two calling conventions, say A and B, and A consistently required one instruction more than B, wouldn't you choose B? Again, I know the concrete difference in performance is very small, but if you did have a choice, why not use the faster one (where it made sense to do so, of course)?

"Unless you've profiled this, found it's a bottleneck, refactored your code and made all possible algorithmic improvements, you don't want to worry about changing calling convention, or even about assembly-level stuff."

- As indicated by what I wrote above, the contents of the function are not really relevant in this matter, and if you use your disassembler as frequently as I do, you should know there are several occasions where it makes a whole lot of sense to worry about calling conventions (and other 'assembly-level stuff').

rip-off    10976
Quote:

if you use your disassembler as frequently as I do you should know there are several occasions where it makes a whole lot of sense to worry about calling conventions (and other 'assembly-level stuff').

It sounds like you should be working in assembly then, at least for the routines in question. In general, that extra mov instruction is effectively free, because your CPU is waiting for memory.

Quote:

The goal is performance...

Outside of the tightest of inner loops, it is not the instructions but the memory access pattern that will govern performance. Shaving a cache miss is worth many, many saved instructions. And the nice thing is that optimising cache usage can be done without dropping down into assembly.
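To illustrate the memory-access point, here is a small sketch (function names are hypothetical) that sums the same row-major grid in two different orders. Both loops do essentially the same arithmetic; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rows x cols grid stored row-major in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Cache-friendly: the inner loop walks consecutive addresses, so each cache
// line fetched from memory is fully used before moving on.
long long sum_row_major(const std::vector<int>& grid, std::size_t rows, std::size_t cols)
{
    assert(grid.size() == rows * cols);
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += grid[r * cols + c];
    return sum;
}

// Same result, but the inner loop jumps 'cols' elements at a time. On large
// grids nearly every access lands on a different cache line.
long long sum_column_major(const std::vector<int>& grid, std::size_t rows, std::size_t cols)
{
    assert(grid.size() == rows * cols);
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += grid[r * cols + c];
    return sum;
}
```

On large grids the column-order version is usually noticeably slower, which is the kind of cost that no calling-convention tweak can recover.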

Finally, no, I truly don't believe you really need a new calling convention. If there were a true need for it, it would have been created already. Everyone wants high performance; your application is not unique in this regard.

Dragonion    131
Quote:
Original post by rip-off
It sounds like you should be working in assembly then, at least for the routines in question. In general, that extra mov instruction is effectively free, because your CPU is waiting for memory.

Outside of the tightest of inner loops, it is not the instructions but the memory access pattern that will govern performance. Shaving a cache miss is worth many, many saved instructions. And the nice thing is that optimising cache usage can be done without dropping down into assembly.

Finally, no, I truly don't believe you really need a new calling convention. If there were a true need for it, it would have been created already. Everyone wants high performance; your application is not unique in this regard.

Thanks -- you have some really good points there.

By optimizing cache usage without using assembly, do you mean via intrinsic functions? I remember reading an article by a programmer from the demo group Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because they allow compiler optimization (which isn't the case with inline assembly, since those sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.

rip-off    10976
Quote:

By optimizing cache usage without using assembly, do you mean via intrinsic functions?

No, you don't necessarily need intrinsics for this.

Cache optimisation involves restructuring the data in memory so that while you are processing it you get the minimal number of cache misses. One term that is getting popular is "data oriented design"; Google it for more detailed information.
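One common data-oriented change is switching from an array of structures (AoS) to a structure of arrays (SoA). The sketch below is a hypothetical particle position update written both ways; the type and field names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): updating positions also drags every particle's
// colour and mass through the cache, even though the update never reads them.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
    float r, g, b, a;   // unused by the update below
    float mass;         // unused by the update below
};

void update_aos(std::vector<ParticleAoS>& ps, float dt)
{
    for (ParticleAoS& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Structure of arrays (SoA): each field is its own contiguous array, so the
// update streams through only the data it actually reads and writes.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    // colour, mass, ... would be further arrays, untouched by the update
};

void update_soa(ParticlesSoA& ps, float dt)
{
    const std::size_t n = ps.x.size();
    assert(ps.y.size() == n && ps.z.size() == n &&
           ps.vx.size() == n && ps.vy.size() == n && ps.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
        ps.z[i] += ps.vz[i] * dt;
    }
}
```

Which layout wins depends on the access pattern: if every field of an element is used together, AoS can be just as good.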
Quote:

I remember reading an article by a programmer from the demo group Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because they allow compiler optimization (which isn't the case with inline assembly, since those sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.

Intrinsic functions are a good alternative to assembly. Some compilers (Microsoft's C++ compiler is a notable one) treat inline assembly as a black box, which prevents optimisation of that code and can also inhibit optimisation of the surrounding code. I believe GCC handles this better: its extended inline assembly syntax lets you declare the inputs, outputs and clobbered registers, so although it does not analyse the assembly itself, it can use that information to tune the surrounding code.
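As a sketch of the intrinsics route, assuming an x86 target with SSE (the hypothetical `HAVE_SSE` macro falls back to plain scalar code elsewhere), this adds two float arrays four elements at a time using `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` from `<xmmintrin.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define HAVE_SSE 1   // hypothetical feature macro for this sketch
#endif

// Adds b to a element-wise. Because these are intrinsics rather than an
// __asm block, the compiler still allocates registers and schedules around them.
void add_arrays(std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    std::size_t i = 0;
#ifdef HAVE_SSE
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);          // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&a[i], _mm_add_ps(va, vb)); // 4 additions in one instruction
    }
#endif
    for (; i < a.size(); ++i)                     // scalar tail, or full fallback
        a[i] += b[i];
}
```

Unlike an `__asm` block, the compiler remains free to unroll, reorder and inline across these calls.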

Also remember that if you structure your code to be inline friendly, the compiler can remove these redundant "mov" instructions when it inlines your functions. The inline keyword is only a hint, one that is largely ignored by modern compilers in favour of their own heuristics. Google for these heuristics and try to get your functions to match them.
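A sketch of inline-friendly code, assuming a hypothetical `FORCE_INLINE` macro mapped to Visual C++'s `__forceinline` or GCC/Clang's `always_inline` attribute. The functions are toy examples; the point is that once the chain is inlined there are no calls left, so there is no calling convention left to optimize.

```cpp
#include <cstdint>

// Hypothetical macro: __forceinline (MSVC) and always_inline (GCC/Clang)
// override the inlining heuristics; plain 'inline' is only a hint.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Small, branch-free functions visible to the caller (e.g. defined in a
// header) are good inlining candidates.
FORCE_INLINE std::uint32_t scale(std::uint32_t x)  { return x * 3u; }
FORCE_INLINE std::uint32_t offset(std::uint32_t x) { return x + 7u; }
FORCE_INLINE std::uint32_t mask(std::uint32_t x)   { return x & 0xFFu; }

// Once inlined, the three calls collapse into a single expression, so no
// values are shuffled between argument and return registers.
std::uint32_t pipeline(std::uint32_t x)
{
    return mask(offset(scale(x)));
}
```

Even `__forceinline` is not absolute: the compiler can still decline in some cases (recursive functions, for example), so it is worth checking the generated code.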

Christuff    120
Quote:
Original post by Dragonion

By optimizing cache usage without using assembly, do you mean via intrinsic functions? I remember reading an article by a programmer from the demo group Farbrausch who mentioned that intrinsic functions can actually lead to more efficient code than inline assembly because they allow compiler optimization (which isn't the case with inline assembly, since those sections are not altered by the compiler), so I guess it's a topic I should look into no matter what.


I think he means designing your code to be cache friendly.

Take a look at http://macton.smugmug.com/gallery/8936708_T6zQX#593426709_ZX4pZ

or more generally http://cellperformance.beyond3d.com/articles/

which contain good examples ;)

phresnel    953
Dragonion: Agner Fog's optimization guides are a very, very good starting point for optimization. Once you get very good, you might want to look into the AMD and Intel manuals, but it will really take a few years before they become useful to you.

Don't get lost in making your application 0.3% faster. Start big: go for the improvements that can make your application 300% faster. Read up on algorithmic optimizations, data access and cache optimizations, and the right compiler flags. Only once you have applied those, and only if you are still struggling, apply a number of micro-optimizations. Most of the time, though, they are inane and make your code hard to read.
