
Question about Data segment/Code segment


Consider the following piece of code running on a 16-bit microcontroller (16-bit address bus). My questions all concern what happens before main() is called.

unsigned short foo = 156;
const unsigned short bar = 12;

int main()
{
   return 0;
}

Is my understanding correct that:

  1. The address locations in the heap for foo and bar are known at compile time and are stored in the data segment of the binary file
  2. The sizes (memory consumption) of foo and bar are known at compile time and are stored in the data segment of the binary file
  3. The values of foo and bar are stored in the data segment of the binary file, separate from the size declaration of foo and bar
  4. Before main() is called, the memory for foo and bar is allocated and filled with their respective values on the heap, always at the same offsets
  5. In the case of an embedded system (microcontroller), memory space for "bar" is not allocated in RAM, but is directly read from ROM when required, since it was declared const and cannot change its value.
  6. In the case of an embedded system (microcontroller), memory space for "foo" is allocated in RAM and its value is copied from ROM into RAM before main() is called, from which the value can be read and written to when required later on.

My two main questions are:

  1. Without starting this program, how much "disk space" do "foo" and "bar" actually consume? There obviously has to be information on what they are, what value they have, and where they will be stored in memory.
  2. When hard-coding values (such as "foo=2"), how is that number "2" stored?

Thanks


Hi,

I assume you are talking about a microcontroller similar to AVR or PIC. Those have separate address space for code and data: when the program executes, instructions are read from flash and they can read/write SRAM. To read data from flash you have to use a special instruction, that is a bit slower: LPM (load program memory).
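To make the "separate address spaces" point concrete, here is a toy host-side model, not real AVR code. The names `flash_mem`, `sram_mem`, `load_data` and `load_program` are hypothetical, and `load_program` merely stands in for the LPM instruction. The same numeric address refers to different bytes depending on which access path you use.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a Harvard-architecture MCU such as an AVR. All names are
 * hypothetical. Flash and SRAM are separate address spaces, so address
 * 0x0010 in flash and address 0x0010 in SRAM are two different bytes. */
#define FLASH_SIZE 256u
#define SRAM_SIZE  128u

static uint8_t flash_mem[FLASH_SIZE]; /* written by the programmer, read-only at run time */
static uint8_t sram_mem[SRAM_SIZE];   /* volatile; normal loads and stores go here */

/* Ordinary data load (think LD/LDS): can only ever reach SRAM. */
static uint8_t load_data(uint16_t addr) {
    assert(addr < SRAM_SIZE);
    return sram_mem[addr];
}

/* Program-memory load (stands in for LPM): the only way to read flash. */
static uint8_t load_program(uint16_t addr) {
    assert(addr < FLASH_SIZE);
    return flash_mem[addr];
}
```

If flash byte 0x10 holds 12 and SRAM byte 0x10 holds 156, `load_program(0x10)` yields 12 while `load_data(0x10)` yields 156; an ordinary pointer dereference in C compiles to the SRAM path.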

Since you only program flash memory and RAM is volatile, there is not really a 'data segment' in the final binary image. Some notes:

1, 2: They are not on the heap (the heap is what malloc uses internally); they are given fixed locations and sizes in SRAM, which you can see if you objdump the binary.

3, 4: Since SRAM loses its contents when power is removed, foo and bar must be initialized to 156 and 12 every time, just before main() starts. The compiler toolchain adds start-up code that copies those values to the correct locations and then calls main().

5: The compiler will typically optimize it and embed the constant directly into the instruction. No LPM will be used unless you instruct the compiler to do so (there is a special macro, PROGMEM, to define a constant or string that is stored in flash).

6: Correct
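The start-up copy described in notes 3, 4 and 6 can be sketched on a host machine roughly like this. It is a simplified model under stated assumptions: both foo and bar land in the initialized block, values are 16-bit little-endian, and every name (`struct init_desc`, `crt0_startup`, `BSS_START`, and so on) is hypothetical. A real crt0 takes its addresses from linker-script symbols rather than constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One init record: copy `size` bytes from flash offset `src_flash`
 * to SRAM address `dst_sram`. Three 16-bit fields on a 16-bit MCU. */
struct init_desc {
    uint16_t src_flash;
    uint16_t dst_sram;
    uint16_t size;
};

/* Flash contents (assumed layout): initial values of foo (156) and
 * bar (12), stored as 16-bit little-endian words, back to back. */
static const uint8_t flash[] = { 0x9C, 0x00, 0x0C, 0x00 };
static const struct init_desc data_init = { 0x0000, 0x0000, sizeof flash };

static uint8_t sram[32];          /* contents are garbage at power-up */
#define BSS_START 0x0004u         /* hypothetical: zero-initialized globals start here */
#define BSS_SIZE  0x0008u

static void crt0_startup(void) {
    /* 1. Copy .data initializers from flash into SRAM. */
    assert(data_init.dst_sram + data_init.size <= sizeof sram);
    memcpy(&sram[data_init.dst_sram], &flash[data_init.src_flash], data_init.size);
    /* 2. Zero .bss (globals declared without an initializer). */
    assert(BSS_START + BSS_SIZE <= sizeof sram);
    memset(&sram[BSS_START], 0, BSS_SIZE);
    /* 3. A real crt0 would now call main(). */
}

/* Read a 16-bit little-endian value back out of simulated SRAM. */
static uint16_t sram_read_u16(uint16_t addr) {
    return (uint16_t)(sram[addr] | (sram[addr + 1] << 8));
}
```

Under these assumptions the flash cost of foo and bar is one 6-byte record plus 4 bytes of values, about 10 bytes in total, not counting the copy loop itself (shared by all variables) or any symbol and debug information in the file.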

 

And your questions:

1: All initialized variables will usually be placed close together, so the image holds just one (src_flash, dst_sram, size) record for the whole block plus the actual values. I'd say this is typical of any binary.

2: If you use constant values in code, the value is stored inside the instruction itself. For example, AVR's LDI (load immediate) takes an 8-bit value:

1110 KKKK hhhh KKKK    LDI Rd, K    (K = 8-bit constant, hhhh = d - 16, so only R16..R31)
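A small encoder makes the layout above concrete: the constant's bits are scattered into the opcode word itself, with no separate storage for the number. The helper names `encode_ldi` and `ldi_immediate` are made up for illustration; the bit layout follows the LDI format shown above.

```c
#include <assert.h>
#include <stdint.h>

/* Encode AVR "LDI Rd, K" as a 16-bit opcode word.
 * Layout: 1110 KKKK hhhh KKKK, where hhhh = d - 16 (only R16..R31 allowed)
 * and the 8-bit constant K is split into its high and low nibbles. */
static uint16_t encode_ldi(unsigned d, uint8_t k) {
    assert(d >= 16 && d <= 31);
    return (uint16_t)(0xE000u
                      | ((k & 0xF0u) << 4)   /* K7..K4 -> bits 11..8 */
                      | ((d - 16u) << 4)     /* register -> bits 7..4 */
                      | (k & 0x0Fu));        /* K3..K0 -> bits 3..0 */
}

/* Recover the immediate: the constant lives inside the instruction word. */
static uint8_t ldi_immediate(uint16_t word) {
    return (uint8_t)(((word >> 4) & 0xF0u) | (word & 0x0Fu));
}
```

For example, `encode_ldi(24, 2)` gives 0xE082, so a "2" loaded into R24 exists only as the K bits of that one word (R24 is an arbitrary choice here). A 16-bit value such as foo's 156 would typically need two such loads, one per byte.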

 

As Bregma points out, that information is highly dependent on other factors.


For example, a constant used in the code may be nearly eliminated by optimization. Instead of showing up as a single number in memory, it may show up as an additional calculation in the code, or as a value left on the stack as a side effect of another operation. Just because you have a value in your C or C++ code does not mean it exists at any particular location within the executable.
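As an illustration of a constant turning into "an additional calculation in the code", here is one rewrite an optimizer may apply to a multiplication by 12 on a core without a fast multiplier. Whether a given compiler actually does this depends on the target and flags; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What the source says: multiply by the constant 12. */
static uint16_t scale_source(uint16_t x) {
    return (uint16_t)(x * 12u);
}

/* One shape an optimizer *may* emit instead: 12*x = 8*x + 4*x,
 * so the "12" survives only as two shift counts inside instructions. */
static uint16_t scale_strength_reduced(uint16_t x) {
    return (uint16_t)((x << 3) + (x << 2));
}
```

Both functions agree for every 16-bit input, yet in the second one no byte holding 12 exists anywhere.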

Also note that compilers sometimes make seemingly strange tradeoffs. You might think it is better to keep a constant loop count as a single number, but the compiler and optimizer might decide it is more efficient to unroll the loop; that requires more space but runs faster. Your single byte of '12' might vanish into twelve copies of the loop body requiring 280 bytes of space.
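The unrolling tradeoff can be sketched by hand. Below, a rolled loop over 12 bytes keeps the count as a single compare operand, while the fully unrolled version (written manually here to mimic what an optimizer might produce) has no counter and no 12 at all, just twelve copies of the body. The functions are hypothetical examples, not output from any particular compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Rolled loop: the count 12 is stored once, as a compare operand. */
static uint16_t sum_rolled(const uint8_t *buf) {
    uint16_t sum = 0;
    for (unsigned i = 0; i < 12; ++i)
        sum += buf[i];
    return sum;
}

/* Fully unrolled: the 12 no longer appears anywhere; it has become
 * twelve copies of the loop body. No counter or branch, but larger code. */
static uint16_t sum_unrolled(const uint8_t *buf) {
    uint16_t sum = 0;
    sum += buf[0];  sum += buf[1];  sum += buf[2];  sum += buf[3];
    sum += buf[4];  sum += buf[5];  sum += buf[6];  sum += buf[7];
    sum += buf[8];  sum += buf[9];  sum += buf[10]; sum += buf[11];
    return sum;
}
```

Both return the same result; the difference is only in code size versus the per-iteration overhead of incrementing, comparing and branching.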

Is there a reason you are asking this question?


Thanks for all the feedback! I see it's highly hardware and compiler dependent, but I'm satisfied with the answers.

 

Is there a reason you are asking this question?

 

General curiosity. Programming microcontrollers makes you realise how sparing you suddenly have to be compared to programming for PCs, given the limited hardware, and I simply wondered how the binary file was structured.

 

The particular device I'm working with is the dsPIC33FJ06GS001, in case anyone was wondering. It's part of Microchip's newer line of controllers for digital signal processing.
