"Guess what the bug is" game..

Recommended Posts

Ysaneya    1383
A bit of context first: i recently discovered a crash bug when writing in a professional application a piece of code to log some strings into a byte buffer (in order to easily dump it to disk later). I'm not going to say what the bug is (it's part of the game!), nor on which O.S./system it happens, but i replicated it here in a simple example program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// and all your usual includes - that's not the point here

char *log_buffer = NULL;
int log_offset = 0;

// Function in which the crash happens:
void log_string(char *txt)
{
const int len = strlen(txt);
*((int *)(log_buffer + log_offset)) = len;
log_offset += sizeof(int);
memcpy(log_buffer + log_offset, txt, len);
log_offset += len;
}

int main(int argc, char *argv[])
{
// allocate enough room for our test:
log_buffer = new char[4096];
log_offset = 0;

log_string("This is a simple log string");
log_string("But this call is going to crash");

// we'll never reach that anyway but:
delete [] log_buffer;

return 0;
}

Will you be able to spot the problem ? Y.

Share on other sites
fractoid    703
Hmm... spider senses point to the line:
*((int *)(log_buffer + log_offset)) = len;
although the only thing wrong with it that I can see is that you might be copying an int to a non-dword-aligned block of memory.

Is it cheating to run it and see? :)

[edit: Can you initialize a const int with a calculated value like that? ]

Share on other sites
Ilici    862
The "const int" ?

Share on other sites
Empirical    190
It functions properly for me!

Share on other sites
CProgrammer    303
hmm,
The const int thing shouldnt be a problem.
Maybe its because you didnt explicitely null-terminate the strings(strlen takes a null-terminated string).
Causing the memory to be full on the second call.

-CProgrammer

Share on other sites
jyk    2094
Quote:
 Maybe it's because you didnt explicitely null-terminate the strings(strlen takes a null-terminated string).

That should be fine. strlen() operates on the argument txt, which in this case is a string literal. So it should be null-terminated.

I also ran the code with no problem. I stepped through it and everything worked as expected. As fractoid suggested, could it have anything to do with memory alignment?

Share on other sites
fractoid    703
He's given us a clue by explicitly telling us that he's not telling us what OS/system - maybe it's something to do with word size? (16 vs 32 bit integers?)

Share on other sites
mikeman    2942
Well, strlen() returns unsigned int, but he uses int for the variables. I don't know in which OS this could be a problem though.

Share on other sites
S1CA    1418
Quote:
 Original post by fractoidHmm... spider senses point to the line: *((int *)(log_buffer + log_offset)) = len;although the only thing wrong with it that I can see is that you might be copying an int to a non-dword-aligned block of memory.

Yep, I agree, on various CPUs like the 68000 you'll get an address exception for that because the write is on an odd address. The BCPL style string brings it all back ;o)

Share on other sites
Guest Anonymous Poster
I'm pretty sure that all compilers will align newly allocated memory on 32 or 64 bit boundaries to avoid weird problems like that. Even if they didn't, there's no reason why it wouldn't work: memory is byte addressable.

Share on other sites
fractoid    703
Hence the crash being on the second call, not the first - the block is DWORD aligned but the end of the string "This is a simple log string" is not.

I'd have to guess that, as S1CA says, it's a misaligned pointer that causes death on some architectures.

Share on other sites
Guest Anonymous Poster
But so what if it's not DWORD aligned? Memory is byte addressable. It just isn't as efficient if it's not DWORD aligned.

Share on other sites
Ysaneya    1383
Fractoid/S1CA found it :)

Memory is byte adressable, but only if your pointer is of type "char *".

If you are pointing to a short (2 bytes), your pointer must be aligned to 2 bytes.

If you are pointing to an int (4 bytes) like in my code sample, your pointer must be aligned to 4 bytes.

Same if you're adressing a long int or a double (8 bytes): memory must be aligned on 8 bytes.

I've seen it happen on different OS / systems, particularly Irix and Linux 64 bits.

It's strange because technically, this code is perfectly correct. Since the value of the pointer is not known at run-time, the compiler cannot issue a warning either.

On Win32 / Linux32, this code works, although i'd expect the access due to mis-alignement to be a bit slower. But it doesn't crash..

Imagine your surprise when you have to track that bug on some code that first looks 100% correct..? *cry*

Y.

Share on other sites
Guest Anonymous Poster
Quote:
 Original post by Anonymous PosterI'm pretty sure that all compilers will align newly allocated memory on 32 or 64 bit boundaries to avoid weird problems like that.

Yes, they do. The *allocation* will be aligned (nothing to do with the compiler there), so "log_buffer" will be at an aligned address.

Quote:
 Even if they didn't, there's no reason why it wouldn't work: memory is byte addressable.

Nope, I disagree. Look more closely and consider that code running on CPUs which don't allow 16-bit (word) or 32-bit (long word) writes to odd memory addresses. The 68000 is a real world example of such a CPU.

The OP says it crashes on the *second* log_string() call. So let's look at why:

First call to log_string()

1) "log_buffer" will be allocated at an even start address, and it will likely be aligned on a 32-bit boundary too (though that's allocator dependent).
So for the sake of this investigation, let's say that the initial value is "log_buffer = 1000".

2) during the first call to log_string(), "log_offset" will be 0 which is what it's initialised to at program start time.

3) So for the first log_string() call, "(log_buffer + log_offset)" is 1000 + 0 = 1000

4) "This is a simple log string" is 27 characters long, so "len = 27"

5) *((int *) casts the memory into an int pointer. On most platforms an "int" is either 16 or 32-bits. For the sake of this discussion we'll say it's 32-bits, or 4 bytes long.

6) "*((int *)(log_buffer + log_offset)) = len;" writes a single 4 byte int containing the value 27 to the memory at address 1000 (log_buffer + log_offset). The first log_string() call doesn't fail because the address is even.

7) after the int is written, log_offset is incremented by 4 bytes (sizeof(int)). So now log_offset = 4.

8) then the string is memcpy()'d, and then the log_offset has the length of the string added to it: so log_offset += 27, 4+27=31, so log_offset = 31.

Second call to log_string()

9) "(log_buffer + log_offset)" is 1000 + 31 = 1031

10) strlen("But this call is going to crash") = 31

11) "*((int *)(log_buffer + log_offset)) = len;" writes a single 4 byte int containing the value 31 to the memory at address 1031 (log_buffer + log_offset).

12) 1031 is an *ODD* memory address, and a whole 4 bytes (aka: dword or longword) is being written to that address. This *WILL* crash on a CPU which doesn't allow odd write addresses (I've written Amiga and Mac code professionally in the past and had to deal with this exact issue...)

Share on other sites
S1CA    1418
oops - the above AP was me. For some reason the board gave me a login screen with "Anonymous Poster" in the username field ?!?!!

Share on other sites
Of course the 68000 will throw an CPU exception that tells you exactly that it was this type of memory access error, and which address in memory that it failed... And your debugger should show you the exact line of asm, where you look at it, look at the registers, and go "D'oh!". Unless you're the type who's only debugging tool is "stare at the source until it hits you", you'll have no problems debugging this one.

Share on other sites
fractoid    703
Works for me. The only time I've ever had to look at the ASM source was recently, while using an embedded C++ compiler which cocked up and forgot to initialize a register. I don't think it's a matter of what level you debug at, more just general experience. Specifically, knowing, as a random and usually-useless piece of trivia the likes of which us techies love to collect, that while intel processors will groan and bear it, other architectures will fall over when given a misaligned pointer.

If you don't know that 0x0b84 is a valid value for an int pointer but 0x0b83 is not, then no amount of staring at assembler will help you.

[edit: obviously I have other debugging methods than just 'sit and stare at the source' but I find it's usually the quickest and easiest way to do it. ]

Share on other sites
abeylin    304
I think the program crashes because the code is ugly.

Share on other sites
Ysaneya    1383
Quote:
 I think the program crashes because the code is ugly.

The code sample is ugly because it's meant to be short & simple. In a professional work environment, sometimes you just don't have the time to make your code nice and well designed.

Y.

Share on other sites
Extrarius    1412
Quote:
 Original post by NamethatnobodyelsetookOf course the 68000 will throw an CPU exception that tells you exactly that it was this type of memory access error, and which address in memory that it failed... And your debugger should show you the exact line of asm, where you look at it, look at the registers, and go "D'oh!". Unless you're the type who's only debugging tool is "stare at the source until it hits you", you'll have no problems debugging this one.
Or if you're doing embedded development on a system that doesn't have a monitor/screen of any time. I guess there might be an in-circuit debugger, but such things can be difficult to use, especially when you can't constantly test your code on the target system because it is being tweaked by the electrical engineers at the same time you're tweaking away at software engineering.

Share on other sites
darookie    1441
Quote:
 Original post by abeylinI think the program crashes because the code is ugly.

Even though this does not relate to Ysaneya's little challenge, I can partially agree with that [smile].
Abusing pointers in such a way is not that great. Especially since this is a C++ program (note the delete [] at the end).

I wonder why the compiler won't complain.
Warning level too low???

But given the amount and complexity of the code on the other hand, it's just as good as it can get.

OT:
Since debug messages are short, why did you use an int for indicating length in the first place? Pascal strings are build like that, too and 'only' use 255 chars max. So the only issue with that would be casting from char to unsigned char (or even leave it as char and max out at 128 chars per message).

Share on other sites
Guest Anonymous Poster
Quote:
Original post by Ysaneya
Quote:
 I think the program crashes because the code is ugly.

(...)In a professional work environment, sometimes you just don't have the time to make your code nice and well designed.

Y.

And how much time did you waste on this debugging session ?

Share on other sites
C-Junkie    1099
Quote:
Original post by Anonymous Poster
Quote:
Original post by Ysaneya
Quote:
 I think the program crashes because the code is ugly.

(...)In a professional work environment, sometimes you just don't have the time to make your code nice and well designed.

Y.

And how much time did you waste on this debugging session ?
HA! owned!

Share on other sites
Dmytry    1151
Quote:
Original post by Ysaneya
Quote:
 I think the program crashes because the code is ugly.

The code sample is ugly because it's meant to be short & simple. In a professional work environment, sometimes you just don't have the time to make your code nice and well designed.

Y.

I heard something like that once.... bit less than 2 years ago, at my first job [grin]_[grin](when we spend waaayy too long time on debugging(roundly 4 times more than i spent on coding!) (in the middle of debugging, (when i said that there's bugs everywhere and proposed big rewrite), "boss" said that it's i can work so well with my own code, but in real world.... ) .i think not putting related data into struct and not checking for overflow in logger(wow, buffer overflow in debug code) is ugly.

Share on other sites
Ysaneya    1383
Quote:
 i think not putting related data into struct and not checking for overflow in logger(wow, buffer overflow in debug code) is ugly.

You are speaking the truth. In the actual program it's not like that, mind you. I *ONLY* did it like that in the sample to show the bug. Why would i put some buffer overflow checks or whatever for a code sample written in 30 seconds to show off here ? Code quality/design is not the issue we are talking about, are we ?

Y.