C# Bytes Overhead Question


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

#1 Blind Radish   Members   -  Reputation: 355


Posted 22 April 2012 - 03:51 PM

Alright, so I'm trying to learn C# by reading C# 4.0 in a Nutshell - so far so good - but I've stumbled across this little tidbit of horror.


Storage overhead
Value-type instances occupy precisely the memory required to store their fields. In
this example, Point takes eight bytes of memory:
struct Point
{
int x; // 4 bytes
int y; // 4 bytes
}


Technically, the CLR positions fields within the type at an address
that’s a multiple of the fields’ size (up to a maximum of 8
bytes). Thus, the following actually consumes 16 bytes of memory
(with the 7 bytes following the first field “wasted”):
struct A { byte b; long l; }


Reference types require separate allocations of memory for the reference and object.
The object consumes as many bytes as its fields, plus additional administrative
overhead. The precise overhead is intrinsically private to the implementation of
the .NET runtime, but at minimum the overhead is eight bytes, used to store a key
to the object’s type, as well as temporary information such as its lock state for
multithreading and a flag to indicate whether it has been fixed from movement by
the garbage collector. Each reference to an object requires an extra 4 or 8 bytes,
depending on whether the .NET runtime is running on a 32- or 64-bit platform.



1) Is he saying that if I want to store a hundred byte values as data, I'd be just as well off storing a hundred ints or floats, because they all take up the same amount of space? Lol, that's crazy, whose idea was that? I mean, a Boolean already takes eight bits, which would mean a Boolean takes 64 bits now.

2) Also is he implying that a reference type (which I guess is like a pointer?) requires MORE data for the same thing? Shouldn't it contain JUST the administrative overhead or whatever? I mean I'm just pointing to the start of the memory right? Wait what the heckler is "additional administrative overhead" anyway? So pointing to a boolean takes some 16 bytes or something?

3) So structures, those are efficient instead, right? So if I store 4 bytes in a struct it's the same as storing one float right? I don't lose any space to craziness until I combine the two, right? Or is that just in the case of objects then? I thought structs were types and objects were types but I guess not.

This is so confusing, these questions probably don't even make sense lol. Am I even close to learning this right?

4) Here's a link to something that may or may not be related: http://www.simple-ta...llocation-cost/
According to the link arrays are good too, they don't waste any space like the objects or whatever.

But obviously you're not supposed to worry about these things but I would like to understand.


EDIT:

He goes on to say

Of the integral types, int and long are first-class citizens and are favored by both C#
and the runtime. The other integral types are typically used for interoperability or
when space efficiency is paramount.


But didn't he just say it doesn't matter? I must be mistaken. But it's what he said "a byte and a float take 16 bytes". He was talking about some special case then, I guess.


#2 Bacterius   Crossbones+   -  Reputation: 8158


Posted 22 April 2012 - 04:04 PM

It's called memory alignment and it's done because accessing bytes on an 8-byte boundary (the precise boundary depends on the hardware but is usually either 4 or 8 bytes) is much faster, in terms of memory latency, than accessing each byte individually, because of how memory access works in hardware. Your language should provide a keyword or construct to "pack" a structure or array so that automatic memory alignment is disabled for it. Note that you will pay the price in performance, so you should only do it for serialization or if you're really short on memory.
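
In C#, the "pack" construct is, as far as I know, the Pack field of StructLayoutAttribute. Here is a minimal sketch of the difference it makes. The struct names Padded and Packed are made up for illustration, and Marshal.SizeOf reports the marshaled (unmanaged) size, which for simple structs like these is expected to match how they are laid out.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example types; the names are illustrative only.
[StructLayout(LayoutKind.Sequential)]            // the default layout for C# structs
public struct Padded
{
    public byte B;
    public float F;   // aligned to a multiple of 4, so 3 padding bytes precede it
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]  // Pack = 1 removes alignment padding
public struct Packed
{
    public byte B;
    public float F;   // starts at offset 1, directly after B
}

public static class PackDemo
{
    public static int PaddedSize() => Marshal.SizeOf<Padded>();
    public static int PackedSize() => Marshal.SizeOf<Packed>();

    public static void Main()
    {
        Console.WriteLine(PaddedSize()); // expected: 8
        Console.WriteLine(PackedSize()); // expected: 5
    }
}
```

The packed version saves three bytes per instance, at the cost of a float that no longer sits on a 4-byte boundary.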



#3 Blind Radish   Members   -  Reputation: 355


Posted 22 April 2012 - 04:50 PM

Ah, so it's much faster? That makes sense, thank you! Memory tends to be less limited than speed, so that's actually a welcome feature.

But doesn't that make arrays slower because they violate this rule?

Structures do or do not violate this rule?

And does that mean that I can just use int when I need byte? It's just as fast, right?

#4 SiCrane   Moderators   -  Reputation: 9388


Posted 22 April 2012 - 05:42 PM

It's called memory alignment and it's done because accessing bytes on an 8-byte boundary (the precise boundary depends on the hardware but is usually either 4 or 8 bytes)

This isn't quite right. If you read a single byte it doesn't matter what kind of boundary it's at. If you read two bytes at a time then it only needs to be at a multiple of two. If you read four bytes, it should be a multiple of four. A compiler isn't going to align every single data member to eight bytes. If you have a struct that looks like:
struct Foo {
  byte b1;
  byte b2;
  byte b3;
  byte b4;
}
it's only going to be four bytes long in total. None of the members need any special alignment so the compiler can pack them in tight.

#5 Blind Radish   Members   -  Reputation: 355


Posted 22 April 2012 - 06:42 PM

So a struct with multiple data types of different lengths forces the largest type length on all of the struct variables?

But if they are all the same then it doesn't enforce that because they are all already the largest length?

#6 SiCrane   Moderators   -  Reputation: 9388


Posted 22 April 2012 - 06:54 PM

No, each variable has its own alignment, independent of the alignment of the other variables. The alignment of the other variables only affects how much room the compiler has to pack things together. For example:
struct Foo {
  byte b1;
  byte b2;
  short s1;
  int i1;
}
will have size eight, wasting no space. b1, b2 and s1 won't be forced to four byte alignment just because there's an int in the struct.
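
If you want to check the offsets yourself, something like the following sketch should do it. It assumes the fields are made public for convenience, and it uses Marshal.OffsetOf and Marshal.SizeOf, which describe the marshaled layout (expected to be the same thing for a simple struct like this one). OffsetDemo is just a made-up wrapper class.

```csharp
using System;
using System.Runtime.InteropServices;

public struct Foo
{
    public byte b1;
    public byte b2;
    public short s1;
    public int i1;
}

public static class OffsetDemo
{
    // Byte offset of the named field, measured from the start of Foo.
    public static int OffsetOf(string field) => (int)Marshal.OffsetOf<Foo>(field);

    public static void Main()
    {
        foreach (var name in new[] { "b1", "b2", "s1", "i1" })
            Console.WriteLine($"{name}: offset {OffsetOf(name)}"); // expected: 0, 1, 2, 4
        Console.WriteLine($"total size: {Marshal.SizeOf<Foo>()}");  // expected: 8
    }
}
```

Each field lands on a multiple of its own size, and nothing is pushed out to a 4-byte boundary except the int.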

#7 Blind Radish   Members   -  Reputation: 355


Posted 22 April 2012 - 08:24 PM

So why did he say
struct {
    byte b1;
    float f1;
}

would have a size of 16?

#8 Martins Mozeiko   Crossbones+   -  Reputation: 1413


Posted 22 April 2012 - 08:53 PM

No, it will be 8.
After the b1 member there will be 3 bytes of padding, so that the f1 member can start at an address that is a multiple of 4.
And the size of a float is 4. So 1 + 3 + 4 = 8.
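
A rough way to see the 1 + 3 + 4 breakdown in code is sketched below. The struct and class names are made up, and the numbers come from Marshal, so strictly speaking they describe the marshaled layout.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical name for the byte + float struct discussed above.
public struct ByteThenFloat
{
    public byte b1;
    public float f1;
}

public static class PaddingDemo
{
    // Number of unused bytes between the end of b1 and the start of f1.
    public static int PaddingAfterB1()
    {
        int endOfB1 = (int)Marshal.OffsetOf<ByteThenFloat>("b1") + sizeof(byte);
        int startOfF1 = (int)Marshal.OffsetOf<ByteThenFloat>("f1");
        return startOfF1 - endOfB1;
    }

    public static int TotalSize() => Marshal.SizeOf<ByteThenFloat>();

    public static void Main()
    {
        Console.WriteLine(PaddingAfterB1()); // expected: 3
        Console.WriteLine(TotalSize());      // expected: 8
    }
}
```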

#9 Blind Radish   Members   -  Reputation: 355


Posted 22 April 2012 - 10:54 PM

Right, he said long earlier which would be 16. I'm still not sure I get how this works though.

If I did float then byte it would still need 8 having 3 padding? Even though the float lands on the right multiple of four?

So if I did a struct with 3 bytes it would only take 3 bytes, but then if I did a struct with a float it would still need to start on a multiple of four, so padding would come beforehand?

And if I did a byte, float, byte struct it would take 12 bytes, but if I did a byte, byte, byte, float struct it would only take 8? Or does the compiler optimize for me?

#10 Martins Mozeiko   Crossbones+   -  Reputation: 1413


Posted 23 April 2012 - 12:24 AM

byte + byte + byte + float = 8

But it starts to get tricky when the last member of the struct is a byte. You should verify this with an actual compiler, but my guess would be:
float + byte = 8
byte + byte + byte = 3
byte + float + byte = 12

The reason for this is the following: if you put the struct in an array, the individual floats must always get addresses that are multiples of 4. If float + byte had a size of only 5, then the second element of a 2-element array would have its float at address 5, not 8. Because the three-byte struct doesn't have to align any of its members, it doesn't need any padding after the last element.
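
The rule can be written down as a small model, sketched here. It does not query the runtime. It assumes every field is a primitive whose alignment equals its size, and that the total is rounded up to the largest alignment so that array elements stay aligned. LayoutModel, AlignUp and SizeOf are made-up names.

```csharp
using System;

public static class LayoutModel
{
    // Rounds offset up to the next multiple of alignment.
    public static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) / alignment * alignment;

    // Models sequential layout for primitive fields given their sizes in bytes.
    // Assumption: each field's alignment equals its own size.
    public static int SizeOf(params int[] fieldSizes)
    {
        int offset = 0, maxAlign = 1;
        foreach (int size in fieldSizes)
        {
            offset = AlignUp(offset, size) + size; // pad before the field, then place it
            maxAlign = Math.Max(maxAlign, size);
        }
        return AlignUp(offset, maxAlign);          // tail padding keeps arrays aligned
    }

    public static void Main()
    {
        Console.WriteLine(SizeOf(1, 1, 1, 4)); // byte + byte + byte + float = 8
        Console.WriteLine(SizeOf(4, 1));       // float + byte = 8
        Console.WriteLine(SizeOf(1, 1, 1));    // byte + byte + byte = 3
        Console.WriteLine(SizeOf(1, 4, 1));    // byte + float + byte = 12
    }
}
```

The model agrees with the guesses above, but it is still only a model, so checking against the real runtime remains the safer test.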

Note that you can always specify explicit offsets for individual members (FieldOffsetAttribute) and the exact size of the structure (the Size member of StructLayoutAttribute). Don't rely on the compiler when you need a specific structure layout.
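
A minimal sketch of what that looks like, with a made-up struct name and offsets chosen to mirror the float + byte case:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: every offset and the total size are stated explicitly.
[StructLayout(LayoutKind.Explicit, Size = 8)]
public struct FloatThenByte
{
    [FieldOffset(0)] public float F; // bytes 0..3
    [FieldOffset(4)] public byte B;  // byte 4; bytes 5..7 are explicit tail padding
}

public static class ExplicitDemo
{
    public static int Size() => Marshal.SizeOf<FloatThenByte>();
    public static int OffsetOfB() => (int)Marshal.OffsetOf<FloatThenByte>("B");

    public static void Main()
    {
        Console.WriteLine(Size());      // expected: 8
        Console.WriteLine(OffsetOfB()); // expected: 4
    }
}
```

With LayoutKind.Explicit every field needs a FieldOffset, so the layout no longer depends on the order in which the fields are declared.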

#11 Blind Radish   Members   -  Reputation: 355


Posted 23 April 2012 - 04:10 PM

Alright thank you.

I'm just trying to go for both speed and memory wherever I can, obviously, so if float, byte is more efficient than byte, float in every way, that's the way I want to do things.

More importantly though, I'm just trying to understand.

Thanks again everyone.


One last, optional question. Does the padding occur in reference to the computer, or does it occur in reference to the structure?
So (I'm doing this wrong, but) address 0x40 and 0x44 would be efficient?
Or (I'm doing this wrong as well, but I hope you get the idea) [+0] and [+4] after 0x03 would be just as fast?






