Question about possible heap corruption triggered by flowing into a function call

15 comments, last by mrbastard 12 years ago
Yeah, that was a lot of text :) I hope I read enough of it.

About maps/arrays:
Well, you still have random access in the array, and more efficient random access too, since you skip the unnecessary lookup through the mapping.
If the objects are accessed by integers and you map 0 -> Obj1, 1 -> Obj2, 2 -> Obj3 etc., the fact that it is a map never helps you.
A vector would be exactly as flexible, and provide more efficient random access to the elements.
It's just a list of objects stored sequentially in memory, so access is a simple pointer addition.

A map becomes useful if you want to map other things (like strings) to objects, or if you have large gaps in your integer->object mapping.
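To make the comparison concrete, here is a minimal sketch. The `Object` struct, the keys and the function names are made up for illustration and are not from the original code; it only shows where each container fits, it is not a benchmark.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for whatever objects are being stored.
struct Object {
    std::string name;
};

// Dense integer keys (0, 1, 2, ...): a vector gives direct indexed access.
inline std::vector<Object> make_dense() {
    return { Object{"Obj1"}, Object{"Obj2"}, Object{"Obj3"} };
}

// String keys: this is where a map earns its keep.
inline std::map<std::string, Object> make_by_name() {
    std::map<std::string, Object> m;
    m["player"] = Object{"Obj1"};
    m["enemy"]  = Object{"Obj2"};
    return m;
}

// Integer keys with large gaps: a map stores only the keys actually used.
inline std::map<int, Object> make_sparse() {
    std::map<int, Object> m;
    m[7]       = Object{"Obj1"};
    m[1000000] = Object{"Obj2"};
    return m;
}
```

With these, `make_dense()[1]` is a plain index into contiguous storage, while `make_by_name().at("enemy")` and `make_sparse().at(1000000)` go through a tree lookup.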


About enums:
Yeah, how enum names are scoped can feel a bit weird, but that is how the standard is.
You can work around it either by declaring the enum inside the class it is used with (best if that is the only class using the enum, or if the class is central to some module), or by putting it in a namespace.
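A small sketch of both workarounds, assuming two unrelated enums that would otherwise fight over the names `Open` and `Closed`. `Door`, `Window` and `is_usable` are hypothetical names used only for this example.

```cpp
#include <cassert>

// Option 1: nest the enum in the class it belongs to.
class Door {
public:
    enum State { Open, Closed };
    explicit Door(State s) : state_(s) {}
    State state() const { return state_; }
private:
    State state_;
};

// Option 2: put the enum in a namespace.
namespace Window {
    enum State { Open, Closed, Broken };
}

// Door::Open and Window::Open no longer collide.
inline bool is_usable(Window::State s) {
    return s != Window::Broken;
}
```

Usage then reads `Door d(Door::Closed);` and `is_usable(Window::Open)`. If a C++11 compiler is an option, `enum class` scopes the enumerators by itself, so neither wrapper is needed.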


About Case1:
You seem a bit confused about pointers... If you pass pointers, only the pointer is copied, and that is just a memory address.
The operator= for the class will not be called.
If you want to copy the object the pointer points to, you have to dereference the pointer. (Object copy = *ptr_to_object;)
If you don't provide your own operator=, the default one copies only the object itself, member by member; if the object has pointers to other objects, those objects will not be copied.
You have to handle the cloning of any referenced objects yourself (unless you want them to be shared).

Don't confuse a pointer to an object with the actual object.
(Also, don't confuse objects (the instances in memory) with classes.)
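Roughly what that looks like in code. This is only a sketch; `Parent`, `Child` and the two helper functions are invented for the example and say nothing about how the original classes are laid out.

```cpp
#include <cassert>

struct Child { int value; };

struct Parent {
    int id;
    Child* child;   // raw pointer member; nothing here clones the Child
};

// Passing a pointer copies only the address; no Parent copy operation runs.
inline void set_id_via_pointer(Parent* p, int newId) {
    p->id = newId;              // modifies the caller's object
}

// Dereferencing is what copies the object (default, member-wise copy).
inline Parent copy_of(const Parent* p) {
    Parent copy = *p;           // copy.child == p->child: the Child is shared
    return copy;
}
```

After `Parent b = copy_of(&a);`, changing `b.id` leaves `a.id` alone, but `b.child` and `a.child` are the same address, so a write through one is visible through the other.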

"this" is just a pointer to the current object, it doesn't assume anything on how the method was called. The C++ implementation will make sure it always points to the current object, when you are in a properly called method.
Note, though, that it is possible to call a method through a broken pointer. "this" then doesn't point to a valid object of that type, so the program might crash inside that method (if you are lucky), usually at some unexpected point, and writing to members within that method can cause all kinds of nasty side effects.
If you are unlucky, it might even crash in some entirely different method.

Maybe something like that is happening.

Case2:
Just a short note on destructors. When you call "delete someObject;", it will only call the destructor for that object's class; it will not automatically call the destructor of any objects within it. You have to do that yourself in the dtor implementation.
About maps/arrays:
Really? I never knew that. I thought with vectors you had no random access and had to iterate through the elements to get at the one you wanted. Most of my cases are not keyed by int, though; this just happens to be one of the few that is. Most of my map stuff is keyed by string.

Case 1: Nah, not confused, just grasping at straws. All my previous attempts to fix this problem have failed, so I went into a highly speculative mode. I know it's the pointer that is copied (that's why I'm using pointers in the first place); I was trying to figure out if somehow the type's operator was being triggered (despite the fact that it shouldn't be). Heck, none of the other stuff makes (observable) sense, so why should the semantics of function calls be any different? lol

In regards to my mention of the "this" pointer, the only reason I brought it up (again, speculative mode) is that it's a pointer, which to me implies that someone using "this" is probably using other pointers too; and if they're using "this", they may want the address since it's not otherwise known to them. Ergo, if they're using the "this" pointer, they quite possibly are inside of a pointer object. I know, A LOT of assumptions, and I don't necessarily believe all of them are the typical case. However, when you get to the point of fighting a bug for days, you really start to go down the rabbit hole of speculation and assumption; at this point I figured it might prove more fruitful than my previous attempts at fixing this problem :P

Case 2: Not according to: http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.11 Unless maybe it doesn't apply to pointer members. Which reminds me of the pretty common problem of shallow-copy versus deep-copy semantics for maps containing pointers (the default copy is only "deep" for non-pointers). Something I'm (unfortunately) VERY familiar with. I've written more p-q tree traversal routines than I care to admit (and yes, binary trees were carefully looked over and over again before deciding to use any PQs :P ).
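For anyone wanting to check the destructor point for themselves, here is a quick sketch. The `Part` type and the `g_destroyed` counter are hypothetical, added only so the effect can be observed. Members held by value are destroyed automatically when the enclosing object is deleted; an object that is merely pointed to by a raw pointer member is not.

```cpp
#include <cassert>

// Hypothetical counter so destructor calls can be observed without printing.
static int g_destroyed = 0;

struct Part {
    ~Part() { ++g_destroyed; }
};

struct HoldsByValue {
    Part part;              // destroyed automatically with the enclosing object
};

struct HoldsByRawPointer {
    Part* part;
    HoldsByRawPointer() : part(new Part) {}
    // No destructor written: the pointer goes away, the Part it points to does not.
};

struct HoldsAndDeletes {
    Part* part;
    HoldsAndDeletes() : part(new Part) {}
    ~HoldsAndDeletes() { delete part; }   // the owner has to do this itself
    // Copying is disabled to keep the sketch free of double deletes.
    HoldsAndDeletes(const HoldsAndDeletes&) = delete;
    HoldsAndDeletes& operator=(const HoldsAndDeletes&) = delete;
};
```

Deleting a `HoldsByValue` bumps the counter once with no destructor written at all. Deleting a `HoldsByRawPointer` leaves the counter unchanged (the `Part` leaks unless someone else deletes it), and `HoldsAndDeletes` only cleans up because its destructor says so.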
Hi

I apologise for not reading the entire thread, but if I understand correctly that you're looking for heap corruption and you can build on Windows, then I suggest using GFlags to find the problem quickly. Links here. It may well be that you can do the same sort of thing with Valgrind.
Pretty interesting utility! I never knew about gflags or umdh... Unfortunately, I don't think it's really a tool that lends itself to use on every project, since each executable requires some setup (running gflags and issuing a -i with +ust, then using umdh with the process id). Not to mention the whole debacle of 32-bit exes only being inspectable by the 32-bit version of umdh; put another way, the 64-bit version won't work against 32-bit executables! There was quite a bit more of a mess, too: I needed more space (over 500 meg for just a debugging tool *sigh*) and my virtual machine wouldn't extend the partition. I was able to add space to the VM, but it came in the form of unallocated space to make another partition out of, and I overextended the partition (because I read that a system partition can't be extended if a page file is on it), so I needed double the space to move the partition and play this shell game. But then VMware wouldn't give me back the overextension, and I found out that even after moving the page file, system partitions can't be extended (without 3rd party tools), etc... What luck.

Anyhow, I got gflags and umdh to run; unfortunately, the output was less than stellar. It gave me what looks like the equivalent of a Valgrind output, but it was all addresses being referenced: no line numbers, no source files, not even any decorated names -- just addresses. I even had the "_NT_SYMBOL_PATH" environment variable set, but I'm wondering if maybe it's because I didn't have my program's symbol path set. Though, if this were the case, I should at the very least see system decorated names, and I didn't.

Fortunately, I did luck out (first time, I think, since this bug started): I must've merged the changes I made on the copy of the source on my Linux host side into the Windows guest OS side. This is significant because I was able to copy the code from my Windows VM (which included some new code, i.e. the formal parameter stuff) back onto my Linux side and rerun it with Valgrind without any changes! Unfortunately, the program segfaults before it gets to the point where I'm at when it runs on Windows. Unfortunately, Code::Blocks (while it's certainly pretty good) has nothing on Microsoft when it comes to tooltip object inspection. One step forward, two steps back.

The GFlags thing was a really good thought, though! And your thread hit a number of conditions I've been operating under, especially the one that states: "assume that if you were going to find the bug by examination, you would have already done so", which was point number 2. I believe point number 1 is occurring but I can't confirm it yet. I guess that's where I'm at: trying to confirm it. Once I do, then it's "Why is that occurring?", and once I answer why it is occurring, the answer to how to fix it presents itself (which is essentially "Just don't do that" lol). I think part of the problem is that I've been asking "how do I fix it" before I've even answered what the cause is and why it's occurring.

Something that keeps kicking around in the back of my head is: maybe it's because I'm running this on a virtual machine? But I can't picture it, for two reasons: 1) It's VMware, and while I certainly have my gripes about their software (don't even get me started about having to modify their out-of-the-box module source to get it compiled under Linux), their product is really pretty good. There are times where I forget I'm actually running Linux, it runs so fast and bug-free (once it's compiled :P ); and 2) (perhaps the most important) it happens at the same spot in the code every time. It doesn't matter if I restart VS or restart the VM, it never fails -- always the same spot. At some point, though, I imagine if this continues for much longer I'm going to need to find another machine to run this on (one that is Windows native) to rule it out.
I was suggesting you use gflags with the full heap switch to try to get an access violation at the point the heap corruption occurs - read the second link in the post I linked to for an explanation of this. This access violation will be caught and allow you to attach a debugger so you can see the line that causes the problem. Note that this point may well have nothing to do with the symptoms you're seeing - those occur later.

You say the program segfaults on Windows before the point you're interested in. Is this with GFlags full page heap enabled? If so, you just found your heap corruption!

Still even if not - what makes you so sure this is unrelated? Maybe you should see about fixing that segfault first, instead of ignoring it.

Also FWIW, no offense intended - your posts are a little verbose. It's much easier for people passing to see what you need help with if you stick to the point. That's not to say I don't want to chat, just that many people with limited time to offer technical help may not take the time to read your stuff.

Lastly - stop guessing and get some evidence to base your testing on, or you're just wasting your time. ;)
Reason for reply delay:
Sorry for taking so long getting back to you all. I had to rewrite a fairly important (but horribly written) function, and add a ton of other stuff (all unrelated but needed); but I'll spare the details :P

GFlags:
I guess I wasn't following ya too well. I had grabbed the debugging tools like you mentioned, but I went a step further and started following the examples on the Microsoft site, going as far as using umdh.exe. I didn't realize I didn't have to go that far -- only as far as setting up gflags and attaching a debugger. Guess I overshot that pretty big. Some confusion I did have, though, was that you mentioned the "full heap switch". There isn't an option named that. The closest one I was seeing is "Enable Page Heap".

Progress?:
I think I may have made some progress. It's quite bizarre and I'm not sure if it should be possible. (I'll try to be as concise as possible; I want to explain it for others in case they run into the same situation.) If you have an object (non-pointer) declared in a function, BUT that object acts as a container for pointer and non-pointer members, and you return the object out of the function, you will wind up having that container's non-pointer members get butchered. The pointer members will stay, and when you hover over the object in the IDE, the pointers will be valid yet any non-pointer things won't be (integers will be ridiculous values, boolean flags will be set to false, etc.). To me, this doesn't seem to make sense. I always believed that a non-pointer object has ALL its members invalidated (not necessarily the data that any pointer members in the container point to, but the addresses they are set to) when the object is no longer in scope; yet I made a change to a function fairly deep in, so that it returned a pointer that trickled down, and everything seemed to start working.
Sorry - I should have been more specific about gflags - it's fairly arcane and it was a bit much for me to expect you to pick it all up from context. FWIW I've never used (or even heard of!) umdh.exe

Here's what I was trying to describe, from the linked document:

GFlags /p /enable Program.exe /full

Enables full heap options for Program.exe. GFlags
inserts an entire page of protected memory into the
heap after each allocation by your program. Your
program will use much more memory and run much
slower than normal.

Having this enabled means that you should get an access violation if you write into part of the heap that you shouldn't.

It sounds like you were returning a pointer to a variable declared on the stack (i.e. not using new or malloc) inside a function. Variables declared on the stack are destroyed at the end of the scope they were declared in. The value of your returned pointer is still the address where the locally declared variable resided, even though the object instance is no longer valid. That same block of memory may be reused later, and may be used for variables of different types and sizes.

You say you expected the memory used by the dead object to be invalidated - it is, but only in that it is no longer valid to assume it won't have been overwritten. It's equally invalid to assume that the runtime will have overwritten the old values. There may be debug runtimes which do this, but in general it would be a waste of time - why reinitialise something the programmer has said he won't touch any more? (which is essentially what you're doing when you let a variable fall out of scope)

There's nothing stopping you looking at the bit of memory as if it was still a valid instance of your object - or reinterpreting it as any other type - this is what you're doing when inspecting the pointer in the debugger. It also means code that returns a pointer to a variable that fell out of scope may seem to work for a while - by sheer chance some of the memory where your object used to be isn't overwritten immediately, and so it 'works'.... until you make a small change (e.g. adding a local variable, which because it's on the stack may end up using the same block of memory that was used inside your function call where your now-invalid object was declared) and suddenly the member variables that still seemed correct aren't any more.

That said, it's not really worth reasoning about which members happen to stay 'valid' and which don't - any such behaviour will be compiler-specific and subject to change, as well as depending heavily (perhaps non-deterministically) on the context. Maybe there is an implementation-specific reason the areas of memory once occupied by your pointer members don't get overwritten as quickly as the areas once occupied by your value members. More likely, it was an artefact of your object's layout and your use of the stack immediately after the function call. Either way - not to be relied upon.

So the moral of the story is: don't return pointers to local variables, and don't rely on undefined behaviour, even if it seems to work!
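For reference, a minimal sketch of the pattern and two common ways around it. The `Settings` struct and the function names are made up for the example, since I haven't seen the real code; the broken version is left as a comment on purpose so nothing here actually invokes undefined behaviour.

```cpp
#include <cassert>
#include <memory>

struct Settings {
    int width;
    bool fullscreen;
};

// BROKEN (kept as a comment on purpose): the local dies when the function
// returns, so the caller is left holding a dangling pointer.
//
//   Settings* make_settings_broken() {
//       Settings s = {800, true};
//       return &s;
//   }

// Safe: return by value. The caller gets its own object.
inline Settings make_settings_by_value() {
    Settings s = {800, true};
    return s;
}

// Safe: allocate on the heap and hand ownership to the caller.
inline std::unique_ptr<Settings> make_settings_on_heap() {
    std::unique_ptr<Settings> p(new Settings);
    p->width = 800;
    p->fullscreen = true;
    return p;
}
```

Returning by value is usually the simplest fix. The `std::unique_ptr` version (C++11) is for when the object really has to outlive the function as a heap allocation, and it frees the memory without a manual `delete`.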

Glad you got it working now anyway :¬)
Oh, that's interesting, you went the command line approach... the full parameter makes much more sense now :P In the GUI there isn't an option that just says full. Apparently umdh.exe takes the heap and call-stack logs and dumps out a dmp file that is essentially a Valgrind type of log file, detailing (what appear to be) call stacks and, I suppose, potential mem-leaks. I say "what appear to be" because, as in my previous post, they were just addresses; I hadn't seen a single decorated name (system call or otherwise), though I hadn't gone through the whole thing.

Yeah, normally I'm pretty careful about not returning pointers to local variables; and had I known I was writing something that became an undefined situation, I wouldn't have done it (case of working on such an enormous project that it's hard to see the forest for the trees) :P

A reflection on the now solved problem (can be skipped if you're uninterested in the "Monday-morning quarterbacking"):
Something interesting to note, however, is that the object being returned was a pointer, yet this object had non-pointer members. Despite these non-pointer members being inside of an object that was clearly instantiated as a pointer object, these members were still invalidated -- despite also the fact that the object being returned from the function was a pointer. It has the behavior of a class having its pointer members declared on the heap and its non-pointer members declared on the stack (when the class is instantiated as a pointer, that is). And somewhere along the fairly long function call chain it must've been returning a non-pointer, which threw off the immediate pointer object (the one referenced in my ideone link). What a dangerous situation! Had I not got so hung up on the pointer objects in the container still "appearing" to be valid, I might have had the insight to check the non-pointer members and see them become invalidated sooner, because it wasn't until I did notice this (prior to making the change) that I got my confirmation. If I had made the change and it had suddenly started working, I wouldn't have had my confirmation. I'm just so freaking thankful that I observed the non-pointer objects become invalidated prior to making one of the deeply nested function calls return a pointer.

Normally I'd chalk this up to some optimization setting in the project settings, but it's on debug with all optimizations turned off. *shrug* What is further odd, though, is that I didn't change the boolean flags of this container object to be pointers as well; they're still non-pointers. It's just that in the function where I set the pointer-declared container object, it is now being set to a returned pointer. I think I might have been setting it to a non-pointer before, because I merely wanted to set by value, which means I must not have had a custom operator set up to handle this and the default one was being used; I dunno...

Some end notes, and my thanks:
Anyhow, what you wrote made absolute sense and confirms my suspicions. I suspect you're absolutely right: it's not worth speculating any further, as it's most likely going to be compiler specific. It just strikes me as a little odd, though, because I thought the spec states that all members are guaranteed to be invalidated; though I know even commercial compilers aren't completely in spec. Thanks VERY much for the help, and to everyone else that also helped! :)


(If you happen to reply, I'll probably still refresh my browser window on occasion for a day or so, to make sure I hadn't missed any replies. However, unless absolutely necessary, I won't reply so that it can eventually fall off from the 1st page of threads.)
Happy to help. Sounds like you're on the right track, but there's still a couple of things you're missing. I think I understand the source of your confusion now, so I'll try to briefly dispel a couple of misapprehensions:

1) You keep mentioning the distinction between normally declared objects and those declared 'as a pointer'. You can't declare an object 'as' a pointer - you declare a pointer (which is an object in itself) and assign it the address of an object. The object pointed to can be on the stack or on the heap. The distinction you actually want to draw is between objects on the stack and those on the heap. The lifetime of objects on the stack is tied to the scope they're declared in. The lifetime of objects on the heap is managed by the programmer.

I realise you already mostly understand all this, I just think you may be tricking yourself by thinking in terms of objects declared 'as' pointers instead of in terms of the stack or heap.
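A tiny sketch of the stack/heap distinction, assuming a hypothetical `Tracked` type that counts live instances so the lifetimes can be observed. Note that a pointer to a stack object doesn't change where that object lives or when it dies.

```cpp
#include <cassert>

// Hypothetical type that counts live instances so lifetimes can be observed.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns how many Tracked objects are still alive after the inner scope ends.
inline int alive_after_scope() {
    Tracked* onHeap = nullptr;
    {
        Tracked onStack;            // lifetime tied to this block
        Tracked* p = &onStack;      // pointing at it does not move it off the stack
        (void)p;
        onHeap = new Tracked;       // lifetime managed by the programmer
    }                               // onStack is destroyed here; *onHeap is not
    int stillAlive = Tracked::alive;
    delete onHeap;                  // the heap object ends only when we say so
    return stillAlive;
}
```

`alive_after_scope()` returns 1: only the heap object survives the closing brace, and it stays alive until the explicit `delete`.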

2) You're still imagining the out-of-scope object being explicitly 'invalidated'. It's not. It may or may not have been overwritten as it's used for something else, but nothing is going to explicitly 'invalidate' it in the sense of setting it to a certain value. There may be debug runtimes that do this - the win32 debug runtime sets freed heap memory to 0xfeeefeee for example, but it doesn't do this for things allocated on the stack. For the stack, you get 0xcccccccc for uninitialised memory, but AFAIK it does nothing to freed stack memory.
