Memory Allocate new


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

23 replies to this topic

#1 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 08:45 AM

char* ta = new char[4];
int size = strlen(ta);
char* tb = new char[4];
Set a break point and monitor the memory address and context.

+ ta 0x02059318 "屯屯" char *


+ tb 0x02059358 "屯屯" char *


size 16 int

That is not what I expected. I only allocated space for 4 chars (4 bytes) for ta to point to, but it seems to hold 8 or more bytes. The interval between ta and tb is 40 in hexadecimal, which is 64 in decimal. How do I convert this to bytes? And why do I get 16 from strlen? I am using a 64-bit system.

If you don't mind please explain it step by step.

Thanks in advance

Jerry


#2 BitMaster   Crossbones+   -  Reputation: 4098

2 Likes

Posted 12 July 2012 - 08:58 AM

When you allocate memory it is (usually) filled with random garbage. Accordingly, calling strlen on any just allocated pointer is undefined. It might return any value. It might crash.

When allocating memory there is also the need to store memory management information somewhere. Where and how much is implementation-defined, but placing it in front of the allocated block is one way to do it. That aside, there is no guarantee that two successive calls to new will return memory that is anywhere close to each other. Even assuming new usually allocates memory in a linear fashion (by no means guaranteed), the runtime library may already have done some allocations and deallocations right after program startup, so the first new could be satisfied by reusing a hole.

Edit: In summary, strlen is not a viable method to check the length of a memory block. It does something completely different, that is return the length of a C string. Details of memory management are impossible to answer without talking about a specific compiler and build settings.

Edited by BitMaster, 12 July 2012 - 09:00 AM.


#3 BCullis   Crossbones+   -  Reputation: 1813

0 Likes

Posted 12 July 2012 - 09:02 AM

Your array variable is a char*, not strictly 4 chars. Pointers on a 64-bit system are 8 bytes wide and typically 8-byte aligned (according to Wikipedia).

Edited by BCullis, 12 July 2012 - 09:03 AM.

Hazard Pay :: FPS/RTS in SharpDX
DeviantArt :: Because right-brain needs love too

#4 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 09:24 AM

When you allocate memory it is (usually) filled with random garbage

OK, but shouldn't the garbage be confined to the space that was just allocated? In the example I posted earlier (ta 0x02059318 "屯屯" char *) there are 8 random characters in ta, but I only allocated 4 chars using new char[4].

Your array variable is a char*, not strictly 4 chars.

I know that on a 64-bit system the char* pointer itself is 8 bytes. I want to use it to point to a space of 4 chars.

#5 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 09:28 AM

there is no guarantee two successive news will allocate memory that is anywhere close to each other.

Hmm, that is quite right. But I have tested it several times and every result shows an interval of 40 in hexadecimal. I know this doesn't guarantee anything, but it should still be explainable.

#6 rnlf   Members   -  Reputation: 1123

1 Likes

Posted 12 July 2012 - 09:33 AM

It was pure coincidence that strlen returned 16. In this case it could have returned zero or a million, or even crashed your program. strlen reads memory starting at the address you pass as its argument until it finds the first null byte, and returns the number of bytes before that null byte. You allocated ta and put nothing in it. strlen does not know how many characters you allocated. It's up to you to make sure you never use more memory than you allocated. You get 8 random characters (more likely 16 random chars, exactly what your strlen returned, with most of them unprintable) because the first null byte in your memory is found after 8 (or 16) bytes.

If you write a null byte to the address pointed to by ta, strlen will return 0 and the debugger will also display an empty string. Try it:

char* ta = new char[4];
ta[0] = 0;
int size = strlen(ta);

my blog (German)


#7 larspensjo   Members   -  Reputation: 1540

0 Likes

Posted 12 July 2012 - 09:47 AM

If you need space for strings, consider using std::string instead.
Current project: Ephenation.
Sharing OpenGL experiences: http://ephenationopengl.blogspot.com/

#8 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 09:54 AM

If you write a null byte to the address pointed to by ta, strlen will return 0, and you the debugger will as well display an empty string.

OK, now I know strlen is very unreliable. I then tried code like this:
char* ta = new char[4];
ta[5] = 0;
int size = strlen(ta);
ta[5] is definitely beyond the original bounds, but it still works and gives a result like:

+ ta 0x021b9318 "屯屯?" char *

#9 boogyman19946   Members   -  Reputation: 1063

2 Likes

Posted 12 July 2012 - 10:27 AM

Just because it's outside of the array range doesn't mean it's outside of a memory block allocated to your program as a whole. You've probably overwritten memory that was allocated to your program by the OS.
"If highly skilled generalists are rare, though, then highly skilled innovators are priceless." - ApochPiQ

My personal links :)
- Khan Academy - For all your math needs
- Java API Documentation - For all your Java info needs :D
- C++ Standard Library Reference - For some of your C++ needs ^.^

#10 rnlf   Members   -  Reputation: 1123

2 Likes

Posted 12 July 2012 - 10:40 AM

strlen is not unreliable; it's just not meant for what you are trying to use it for.

But just as larspensjo says, try to use standard library classes as much as possible.

Avoid "new" like the devil, and if you know you need it, read up on smart pointers first.

my blog (German)


#11 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 11:18 AM

Just because it's outside of the array range doesn't mean it's outside of a memory block allocated to your program as a whole.

I remember that the program automatically checks whether an access is out of bounds, and if it is, there will be an assertion or something. I also tried:
char tc[4] = "123";
//char tc[4] = "1234";failed
tc[5] = 0; // why this line could work??


#12 arkane7   Members   -  Reputation: 213

0 Likes

Posted 12 July 2012 - 11:37 AM



I think this is because at the declaration the compiler KNOWS the array only has 4 bytes to use, and "1234" actually needs 5 bytes because of the zero byte at the end. I'm guessing the compiler basically says "Oh, I see only 4 bytes, not 5: error".

But when you write tc[5], you are just indexing into memory, and the language does not require the compiler to check that index against the array's size.
In that case it just says "Oh, put this byte at this memory address". It doesn't check whether that is legal.
Always improve, never quit.

#13 rnlf   Members   -  Reputation: 1123

1 Likes

Posted 12 July 2012 - 11:39 AM

Sorry, you are mistaken. In C++, built-in array accesses are not checked. std::vector has a member function "at" which does just that. Did you learn Java before? It has checked array accesses.

Also
char tc[4] = "1234";
does not work, because it has an implicit null byte at the end, it is equal to
char tc[4] = { '1', '2', '3', '4', '\0' };

This is an error that the compiler can detect. But later on, detecting out-of-bounds accesses is hard (and in most cases impossible) for the compiler.

Also, array indices for a 4-element array range from 0 to 3. tc[4] = 0; is already an error, even though it will only sporadically be detected by crashing the program.

EDIT: arkane7 was quicker.

Edited by rnlf, 12 July 2012 - 11:41 AM.

my blog (German)


#14 Ameise   Members   -  Reputation: 733

0 Likes

Posted 12 July 2012 - 11:41 AM

If you need space for strings, consider using std::string instead.


Knowing and using std::* classes is not a replacement for actually understanding how the language works, and in particular how memory allocation works, particularly with a relatively low-level language such as C++. When he has a good understanding of the fundamentals, a basic understanding of the std::* classes and knowledge that he should use them should come along with it.

To the OP:

The reason that strlen is not returning the size of your array is simply due to the fact that that is not the purpose of strlen.

strlen returns the length of a C String, that is, a null-terminated character array.

You are allocating 4 bytes of memory. However, that memory is coming out of the heap, and it is likely that there is "valid" memory after it (valid in that it won't crash, but undefined as per C++).

Since it is not initialized, it just has random junk. You are seeing strlen return 16. This is why:

| YOUR ARRAY |
[rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][rnd][0]

rnd is any random non-zero value.

strlen has no knowledge of your array or its size, and only has knowledge of the pointer and that it must seek the null terminator. The fact that you have consistently gotten 16 is coincidental.

The memory could just as well look like this:

| YOUR ARRAY |
[rnd][rnd][0][rnd]

In which case, strlen would return 2.

The reason it could CRASH is that it could have allocated your memory at the edge of a page allocation, for instance. When you go outside the buffer, you are now reading unmapped memory, which is a page fault. There are other reasons too, simply put - don't read outside of memory you are aware of.

#15 monkeyboi   Members   -  Reputation: 188

0 Likes

Posted 12 July 2012 - 01:24 PM

Ah, I see. Thanks a lot, everyone.

Knowing and using std::* classes is not a replacement for actually understanding how the language works, and in particular how memory allocation works, particularly with a relatively low-level language such as C++. When he has a good understanding of the fundamentals, a basic understanding of the std::* classes and knowledge that he should use them should come along with it.

thumb up lol

However, that memory is coming out of the heap, and it is likely that there is "valid" memory after it (valid in that it won't crash, but undefined as per C++).

So in fact my array is not in the heap?

#16 arkane7   Members   -  Reputation: 213

0 Likes

Posted 12 July 2012 - 02:27 PM

Any time you allocate with "new" or malloc (C-style), the memory comes from the heap, in contrast to the stack (where local variables such as int f = 3 live).

So indeed your array IS on the heap, since you used new. What I think Ameise was getting at is that on the heap there is less chance of hitting critical data such as return addresses (the place to go back to after a function call), which are stored on the stack. If you overwrite a return address, it may very well crash the program.

Edit: the point is that you never want to go out of bounds of an array or touch invalid memory. Even if you only overwrite harmless-looking data, you risk corrupting your own program's state (on a modern protected-memory OS the damage normally stays inside your own process, but it can still be very hard to debug). This is why std::string is a much better alternative to C-style strings (char[]): you don't deal with pointers directly. It also has its own size() function, as well as a multitude of manipulation functions. All in all, std::string is more flexible, more useful, and easier.

Edited by arkane7, 12 July 2012 - 02:30 PM.

Always improve, never quit.

#17 Ameise   Members   -  Reputation: 733

-1 Likes

Posted 12 July 2012 - 02:32 PM

My other point still stands in that regard, though: he shouldn't be using std::string until he understands C strings. With the standard library, I don't usually recommend using things until you have an understanding of how they work.

#18 arkane7   Members   -  Reputation: 213

0 Likes

Posted 12 July 2012 - 02:51 PM

I first came across std::string before I learned about char arrays. This was way back in HS in intro programming. But I feel you are right: he needs to understand pointers, allocation, and buffer limitations.

Key things that monkeyboi needs to know about C-style strings (aka char arrays):
  • There must be a zero byte (null byte) that marks the end of the string. '\0' is the character literal for it, but 0 is also fine. strlen counts the characters up to that zero byte. This is why he got odd answers: the char array was never initialized. When you write char tc[4] = "123"; you are really storing '1', '2', '3', '\0', which fills the array. Appending the zero byte is a feature of string literals in C++ (and C), so it only happens when you write var = "___". If you fill the array byte by byte (tc[2] = 'b'), you MUST write the zero byte yourself (tc[3] = '\0').
  • Even if it has a zero byte, never access or overwrite an element outside the array's bounds (don't ever use tc[-2] or tc[n] or any larger index, only 0 to n-1).
  • Keep in mind that the compiler is not required to warn you when you go out of bounds, and in general it cannot. Only in a case like initializing an array with something too big is it guaranteed to tell you (char tc[4] = "1234" was too big since '\0' is a fifth byte added on).
  • When you use new, the memory comes from the heap, a region of your program's memory set aside for dynamic (run-time) allocation.
  • An array is not itself a pointer, but in most expressions its name converts to a pointer to its first element. In the case above, *tc would be '1', *(tc+1) is '2', *(tc+2) is '3' and *(tc+3) is '\0'. Writing tc[2] is just a shorter way of writing *(tc+2).
  • char* and char[] can be used in many of the same ways, but char[] explicitly declares an array (and sizeof gives the whole array's size), while a char* could point to a single char or to the beginning of a char array.
edit: I put this in a post below, but it needs to be mentioned along with this
----other things about memory management:
  • When using new, as stated before, the memory comes from the heap. If you keep allocating without deallocating, you will eventually run out of memory. Continually allocating memory but never freeing it is called a memory leak.
  • To deallocate, you use the delete keyword.
    Say I write SomeClass* x = new SomeClass(); to deallocate it, just use delete x; (keep in mind this calls the destructor for SomeClass and frees the memory that x was pointing to, allowing later allocations to use that memory).
    But if you allocate an array, you have to do something else: in the case of char* ca = new char[3]; you will have to use delete [] ca;

Edited by arkane7, 12 July 2012 - 03:23 PM.

Always improve, never quit.

#19 Ameise   Members   -  Reputation: 733

0 Likes

Posted 12 July 2012 - 03:08 PM

I'm wondering why I was downvoted without any comment explaining what was wrong with what I wrote.

Regarding point 4 of arkane7's list: that is only true if he uses the standard, non-placement new operator. If it is overloaded, everything's up in the air. Placement new just uses whatever memory you're pointing to.

Regarding point 5: 2[tc] is also '3'. I absolutely hate that syntax, though.

Edited by Ameise, 12 July 2012 - 03:09 PM.


#20 arkane7   Members   -  Reputation: 213

0 Likes

Posted 12 July 2012 - 03:20 PM

Why were you downvoted??? You were spot on in everything you mentioned.

Oh, and yeah, I completely forgot about the 2[var] syntax. I never use it since it's confusing.

What do you mean by overloading new? I'm not sure I've come across that use of new.



Also monkeyboi, other things about memory management:
  • When using new, as stated before, the memory comes from the heap. If you keep allocating without deallocating, you will eventually run out of memory. Continually allocating memory but never freeing it is called a memory leak.
  • To deallocate, you use the delete keyword.
    Say I write SomeClass* x = new SomeClass(); to deallocate it, just use delete x; (keep in mind this calls the destructor for SomeClass and frees the memory that x was pointing to, allowing later allocations to use that memory).
    But if you allocate an array, you have to do something else: in the case of char* ca = new char[3]; you will have to use delete [] ca;

Edited by arkane7, 12 July 2012 - 03:21 PM.

Always improve, never quit.






