GCC Name decoration question

After posting recently, I was reluctant to add another post; fortunately this one is less about code and more about just getting information. The bad part is that it probably requires someone with intimate knowledge of GCC's internals.

My question derives from a problem involving GCC's name decoration. When GCC compiles a program (not yet at the linking stage), how does it determine that a function's return type can be coerced to the type of the object it is being assigned to?

Okay, I can already see some ambiguity in that question as phrased (some may interpret it as: "Well, a float can be truncated or rounded in value and stored in an int, or an int can be cast to a float..."). That's not what I'm asking. Let me elaborate. (It's going to sound a little rambling, but I think that's necessary to demonstrate where I'm coming from and help anyone reading understand where my confusion is. I could probably trim a line here or there, but I really think most of it is needed, unfortunately.)

I've put some code up on ideone.com located here: http://ideone.com/pKty2

The problem caused by the name decoration system GCC uses (and that it somehow works around):
My confusion stems from the fact that return types are not rolled into decorated names (they are with Microsoft's C++ compiler, though; not sure about the .NET compilers). Anyhow, on line 31 we have an expression, more specifically a line that translates into an assignment operator call. According to the Annotated C++ Reference Manual, return types are left out to catch certain errors, but it does not elaborate. What I get from its example is that the design choice was a reaction to a problem that would stem from rolling the return type into the decorated name. More specifically: rolling the return type in would make functions with identical names and identical argument lists end up with unique decorated names, since the differing return types would become part of the whole decorated name and make the entire name different. (The book doesn't state those last few bits -- a major blunder, IMHO, since that's the very problem it was trying to avoid; it only states what was done and gives some code, leaving you to legitimize the design choice by figuring out what the problem would have been.)
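To make that concrete, here's roughly what I mean (the mangled names in the comments assume the Itanium C++ ABI that modern GCC targets; older g++ versions used a different scheme, so treat them as illustrative):

    int   f(int x);   // mangles to _Z1fi under the Itanium ABI
    float f(int x);   // would ALSO mangle to _Z1fi -- the return type
                      // is not encoded for ordinary functions, which is
                      // why C++ rejects this pair as a redeclaration
                      // error rather than treating it as an overload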

A worked example, progressing in complexity:
Let's start out easy and look at a simple function call with no parameters, just a flat-out function call (no assignments, etc.). (Rhetorical question) How does the compiler get the correct function? Through decorating the scope plus the name plus the arguments; then it goes down the list in the symbol table matching the signature, grabs the address, spins up a function call thunk (might be "stub"; not sure of the proper terminology here), and emits it as an intermediate code representation. No problem, pretty straightforward, pretty simple and easy. Now let's throw some arguments into the mix. Again, it looks up the function matching the signature, no problem.

However, going back to the original problem, let's say there is a single function (in a class or not -- kind of irrelevant for the purposes of demonstrating my question), just to make it easier. And let's say that function has no arguments, so its argument list is "v" for void. Now we go back to the assignment statement. Somehow the compiler knows whether or not that function (matching the scope, name, and argument list) has a return type acceptable to the L-value, despite the return type not being rolled into the decorated name. My question is how?! Please note: this question is irrespective of whether an int can be cast to a float, or a float truncated to an int, etc. My question is how the compiler knows the return is an int or a float -- or, in the more complex example, a compound object (a struct, class, etc.), which I suppose would use the same mechanism as determining a first-class/primitive type (int, char, bool, float, etc.).
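In case the ideone paste goes away, here is a hypothetical reconstruction of the shape of the problem (the names cTest and Calculate mirror my example, but the exact code differs):

    struct cTest { int val; };

    cTest Calculate() { return cTest(); }

    int main() {
        int i = Calculate(); // error: cannot convert 'cTest' to 'int'
                             // -- the compiler clearly knows the return
                             // type here, yet the mangled name is just
                             // _Z9Calculatev, with no return type in it
        return 0;
    }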


My work towards trying to figure this out:
I started thinking about the Cartesian Product Algorithm (CPA), making a superset of smaller subsets, etc. But then I realized, wait a minute, that doesn't answer the question, because if I'm able to use the CPA then that implicitly assumes I've already discerned the type returned by that function. Then I thought of the typical binary trees. Then I realized: "Well, if that's the case, then the return type IS being referred to on some level, at some point, somehow -- just not in the decorated name." Then I tried looking at the GCC code; that didn't go so well, heh. I can read code just fine, but the formatting, IMHO, was just about the most hideous I had ever seen. I've done a bit of research on it, but I haven't found the answer to how the compiler knows to kick an error out on an assignment statement without knowing the function's return type (or maybe it does know?). I know it's obviously not the decorated name, as per the Annotated C++ Reference Manual and several sites I found online detailing the naming scheme. I've even tried checking the dragon book. This is what it says:

p. 391: (Continued from p. 390 on "6.5.3 Overloading of Functions and Operators")
"It is not always possible to resolve overloading by looking only at the arguments of a function. In Ada, instead of a single type, a subexpression standing alone may have a set of possible types for which the content must provide sufficient information to narrow the choice down to a single type"

p. 423:

  • "Function types. The type of a function must encode the return type and the types of the formal parameters"

Problem 1) Unfortunately, the dragon book is heavily based on designing a Java-like language, which may have different internal semantics than, say, GCC's C++.
Problem 2) The first snippet references Ada. I'm not interested in Ada, I'm interested in C++.
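Though if I take that second snippet at face value for C++, it implies the compiler's internal representation of a function type carries the return type alongside the parameter types. Something like this hypothetical sketch (not GCC's actual structures, just what the quote would literally mean):

    #include <vector>

    struct Type;  // some representation of int, float, a class type, ...

    // "The type of a function must encode the return type and the
    // types of the formal parameters":
    struct FunctionType {
        Type*              returnType;  // stored internally, even though
                                        // it is never mangled into the name
        std::vector<Type*> paramTypes;  // these DO get mangled
    };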

Then I started thinking that maybe it can tell whether the types are equivalent enough based on the memory size of the types, but then I realized, again: I'd first have to know the type being returned by the function. There are plenty of solutions I can come up with (essentially all boiling down to throwing more information at the problem, or rolling the return type into the decorated name -- which is essentially throwing more information at the problem), but I kinda, personally, consider GCC the de facto standard, not Microsoft, not Borland, etc. (though those are very respectable compilers, and I don't have a single thing against them). There's just still a piece of the puzzle missing... Any thoughts on this would really be appreciated! I'm kinda at a standstill until I figure this out...
I think you've got part of the process reversed. GCC doesn't determine the function type from the name decoration; it determines the name decoration from the function type. In order to call a function you need to provide a function declaration. GCC determines the arguments and return type from that function declaration. When it emits a call it generates a decorated symbol name from the function declaration. GCC won't go looking around in your generated object files for symbol names and extract functions from them; that's what header files are for.
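In other words, the declaration alone carries all the type information the compiler needs. A minimal illustration (hypothetical file and function names):

    // widget.h -- only a declaration
    float Area(int w, int h);

    // main.cpp
    #include "widget.h"

    int main() {
        float a = Area(3, 4);  // the return type (float) comes from the
                               // declaration; the definition can live in
                               // another translation unit the compiler
                               // never sees while compiling main.cpp
        return a > 0.0f ? 0 : 1;
    }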
Right. I follow that (at least I think I do). If I understand you correctly, the function declarations in the header files cause a decorated name to be generated. But also, by the sounds of it, you're saying that the symbol tables don't hold the decorated names -- i.e. the decorated names are generated on the fly each time a call is made? Even still, wouldn't return types have to be part of it?

If I understand the design correctly, GCC should generate the decorated name for each function declaration the parser comes across. Then, upon definition of the function, the code is inserted into the function. And any call to any function triggers GCC to generate a decorated name string to be used against the symbol table, in an effort to find the appropriate function whose decorated name (already entered via a previous declaration) matches that string. If it does, a call stub is created; if it does not, it errors that an "overloaded function conversion" cannot be found (if it's an assignment statement -- or whatever the error is). So the decorated names are created so they can act as uniquely named "bookmarks" for any calls to those functions. Is that right? If it is: you mentioned the return type is part of the generated decorated name; how does that square with the ARM and the various sites that say the return type is not part of the decorated name?

Edit: I think I know why you may be saying I'm getting it reversed. You said:

"GCC doesn't determine the function type from the name decoration"

Well, normally, yeah, it doesn't. However (this is what I'm trying to figure out), when you have these decorated names in a symbol table, and a call is made to a function as part of an assignment statement, how does the compiler know that the selected function has an appropriate return type? It must do some kind of checking, otherwise it wouldn't error. In my example (the ideone.com link I posted in the opening post), I envision GCC undergoing these steps:
1) See the statement (on line 31, I believe, if memory serves)
2) Generate a decorated name for the function call Calculate (based on the scope path, the name, and the arguments)
3) See a matching function in its internal symbol table and select it
4) Return to where the call initiated (line 31) and say: "Wait. No, that's not valid. I'm passing you back a cTest object, and the integer type does not have an assignment operator to handle this (cTest) type, so I'm emitting an error instead."

How does it know what the function returns, so that it can determine whether it should error or not?
(this is just guessing)

When the compiler finds a function declaration, it adds the unmangled name and its arguments to a list. When it stumbles upon a call to some_func(), it iterates through that list and tries to find a function called some_func() which takes the arguments passed at the call site. If no exact match is found, it must take implicit conversions etc. into consideration. If it finds an overload that can be called with the given arguments, then it can verify that the return type of that function/overload can be used at the call site (for the assignment, in your case). The name mangling has nothing to do with this.
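For instance (a general C++ illustration of that "no exact match" step, nothing GCC-specific):

    void f(int);     // candidate 1
    void f(double);  // candidate 2

    int main() {
        f(1.0f);  // no overload takes a float exactly, so implicit
                  // conversions are considered: float -> double is a
                  // promotion and beats float -> int (a conversion),
                  // so f(double) is selected
        return 0;
    }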

[Edit]
The mangled name is just a unique identifier for that function/overload. It's not used for anything except referring to a specific function/overload.
hmmm...

"[...] it can verify that the return type of that function/overload can be used at the call site [...]"

Right, but how does it know the selected function's return type without it being in the decorated name? I can't picture it executing through the code of the function to work out the type of object that is going to be returned; doing so would have complexities far beyond anything imaginable (things like requiring the functions' definitions to have been fully parsed in the right order -- otherwise you wind up in a situation where one function calls another that also hasn't been finished, and thus its return type isn't yet known, etc.).
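A hypothetical example of why inferring return types from function bodies couldn't work -- with mutual recursion, neither body can ever be "finished" first:

    // If the compiler tried to infer each return type from the body,
    // neither function could ever be resolved: each body depends on
    // the other's not-yet-known return type. A declaration breaks
    // the cycle by supplying the type up front.
    int g(int n);  // forward declaration supplies g's return type

    int f(int n) { return n ? g(n - 1) : 0; }
    int g(int n) { return n ? f(n - 1) : 1; }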

By the sounds of it, I guess the call site is the answer -- upon declaration of a function, a call site is created with the return type stored there (while the return type is readily accessible) -- which means the return type IS being stored... But I thought call sites were only generated when a function call is made, not when a function is declared or defined? If one is generated only when a function call is hit, it's already too late to be storing information about the call it's trying to make, since the return type can no longer be captured -- the parser has moved on since it saw it. In the meantime I'm going to keep researching call sites; I kinda thought I read they were generated on the fly when a function is being called, though. Plus, I thought call sites were mostly a construct of run-time environments, long after the compile stage?


On a side note, doing some quick searching on Yahoo for call sites, I stumbled across a news article:
http://www.phoronix....item&px=MTA3NTE

Happy belated 25th birthday, GCC!
Forget all about name mangling. That's not relevant at all here.

The parser parses the preprocessed source file from top to bottom. For every declaration it comes across (function, member variable, method, global, local variable, ...), it stores it in some table/list/whatever. When it parses a function/method body and comes across a function call, it looks up the function name in the table. If there are multiple functions in that table with the same name, it'll try to find one that can be called with the given arguments. Once it knows which function/overload is called, it knows the return type of that function/overload, since that is stored in the table. Now it sees that the return value is to be stored in a variable. Since the variable and its type are stored in the table, it's easy to verify that the return type can be stored in that variable.
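A toy model of that lookup (just a sketch of the idea -- nothing like GCC's real data structures, and the string-based "types" are purely for illustration):

    #include <map>
    #include <string>
    #include <vector>

    // One entry per declaration the parser has seen.
    struct FuncDecl {
        std::string              returnType;
        std::vector<std::string> paramTypes;
    };

    // Overloads share a name, so key the table on the plain
    // (unmangled) name and let it hold the whole overload set.
    using DeclTable = std::multimap<std::string, FuncDecl>;

    // Find an overload whose parameters match the argument types
    // exactly (a real compiler also ranks implicit conversions).
    const FuncDecl* Lookup(const DeclTable& table,
                           const std::string& name,
                           const std::vector<std::string>& argTypes) {
        auto range = table.equal_range(name);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second.paramTypes == argTypes)
                return &it->second;
        return nullptr;  // "no matching function" diagnostic
    }

    int main() {
        DeclTable table;
        table.insert({"Calculate", {"cTest", {}}});  // cTest Calculate();

        // At the call site 'int i = Calculate();' the compiler finds
        // the declaration, reads returnType ("cTest"), and can now
        // reject the assignment to an int -- no mangled name involved.
        const FuncDecl* d = Lookup(table, "Calculate", {});
        return (d && d->returnType == "cTest") ? 0 : 1;
    }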
Well, that was my thought exactly: the return type has to be stored -somewhere- if it's not in the decorated name. You said exactly what my thoughts were: if it's stored in the table, it's EASY to determine its type -- which follows my original thought of just throwing more information at the problem. The problem is, I haven't found anything that actually says it is stored in a table, list, call site, etc. Nothing. No mention of the return types ever, except to mention they're not decorated.

All the documentation I ever find on this issue reads like it's saying: "I'm thinking of a number. I won't tell you what it is, but I'll give you hints as to what it's not (in this example, three hints): it's not a negative number, it's not a fractional number, and it's not an imaginary number," and then you're left guessing what the number (in this case, the missing piece of the puzzle) is. Reminds me of how some of my old math teachers taught: "I'm going to skip this step here, here, and here; and while your mind is spinning and busy spending precious CPU cycles trying to make sense of things, I'll be long past the point I'm at now, which is already past where you are presently." Heh.

Is what you explained referenced somewhere, or do you know it to be fact, or is it a hunch? If it's a hunch, any ideas on how I could go about proving it, besides diving into the GCC code again?
I just assume that's how GCC does it.

Exactly how this is done is an implementation detail and might differ from compiler to compiler, between versions of a compiler, or even depending on compilation options. Some compilers/versions/settings may use a hash table, others a binary tree; some might even use an SQL database. What you should be worried about is not exactly how it works but rather how it's expected to behave (documented in the standard), and that behaviour could be achieved by the steps I outlined above.
Yeah, I know it can differ among compilers. I was just trying to follow GCC's way of doing it, since it's the de facto standard (to me, that is). Then again, on the con-GCC side, if GCC's source code is any indicator of how to do something, it's most certainly in my best interest not to do it that way. Flipping back to the pro-GCC side, 25 years is a long, battle-tested run. Though it makes me wonder how much of their code they would love to change and redo, but because it's so enormous it would take WAY too long and would probably be quicker to just rewrite -- which points back toward not following their way. Either way, the approaches you mentioned -- hash tables, binary trees, or potentially SQL databases -- all implicitly state that the return type is being stored -somewhere-; it's not just discerned automagically. I guess throwing more information at it is my answer. I just wish there were some more concrete answers out there as to how GCC does it, so I could decide whether to follow that route or make my own. Like you said, following the expected behavior should be all that I need to do. Which is kind of a huge weight off my shoulders, knowing I'm free to implement it how I want. Thanks to you and SiCrane! :)
Actually, it doesn't HAVE to be stored anywhere; in theory the compiler could reparse the entire file to find the declaration it needs whenever it stumbles upon a function call. That would work just as well as storing it in a table -- just be A LOT slower.

