
What programming languages support VLAs?


I mean: what programming languages out there support C-like VLAs (which are stored on the stack rather than the heap), other than C itself? In case you don't know, VLA stands for variable-length array, a feature defined in C in a way that lets most compilers store the array on the 'stack' (instead of on the 'heap', as most modern languages do). If you do know them already, I'd be glad if you could help me find modern languages, without a century of legacy baggage, that meet this requirement.
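For reference, here is a minimal sketch of what I mean in C99 (the function and variable names are just made up for illustration):

#include <stddef.h>

/* A C99 VLA: the size 'n' is a run-time value, but the storage lives
   in the current stack frame and disappears automatically when the
   block is left (no free() needed). */
void process(size_t n)
{
    int scratch[n];              /* VLA: allocated on the stack */
    for (size_t i = 0; i < n; ++i)
        scratch[i] = (int)i;
    /* ... use scratch ... */
}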


And besides 'C++' too. I already knew that one.

C++ actually doesn't support them as a language feature; it's one of the areas where C99 and C++03 diverged from each other.

 

I get the same effect in C++ by implementing my own scopes and stacks that are separate from the language's default call stack and block scopes.

They're especially a hassle to optimize at the assembly level. Normally if all local variables are fixed size, their stack offsets are constant and can be kept directly in instruction operands. The compiler can also optimize out the frame pointer because it can control and deterministically know where the stack pointer is moving throughout the entire function because everything pushed or popped has constant sizes.

When you have VLAs... you can still reserve fixed locations for fixed-size stuff, but a VLA wrecks the compiler's knowledge of anything on the stack afterwards, and you can no longer optimize out the frame pointer (since the stack pointer now has an offset from the frame pointer that can't be known at compile time). When you index a VLA, the compiler has to make a dereference relative to the start of the VLA. Since the stack almost always grows towards negative addresses, this means the start offset (even for the single-VLA case) is dynamic and has to be stored somewhere. The obvious implementation counts the number of VLAs a function uses and reserves extra fixed-position pointers on the stack to keep track of where the VLAs end up. It can deterministically figure out the position of *one* VLA the same way that normal frame pointer omission works, but after the first you need to have pointers. Manipulating any VLAs after the first then requires additional loading of those pointers off the stack or keeping them in registers (as opposed to the optimal method of baking the offsets into operands), leading to an extra dereference and more register thrash (which is worse since you're also forced to keep your frame pointer).
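To make that concrete, here's a rough C sketch; the comments describe what a typical compiler does, and the exact code generation of course varies by compiler and target:

#include <stddef.h>

/* Fixed-size locals only: every stack offset is a compile-time
   constant, so the compiler can bake the offsets into instruction
   operands and usually omit the frame pointer entirely. */
int sum_fixed(void)
{
    int buf[16];
    int total = 0;
    for (int i = 0; i < 16; ++i) { buf[i] = i; total += buf[i]; }
    return total;
}

/* With a VLA the stack pointer moves by a run-time amount, so a
   typical compiler keeps the frame pointer to reach the fixed-size
   locals, and reaches the VLA itself through a stored pointer rather
   than a constant offset baked into the operand. */
int sum_vla(size_t n)
{
    int buf[n];                /* start address only known at run time */
    int total = 0;
    for (size_t i = 0; i < n; ++i) { buf[i] = (int)i; total += buf[i]; }
    return total;
}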

Static analyzers that warn you about array indices being out of range have two dynamic things to track (dynamic length, dynamic index) instead of one (static length, dynamic index).

It's pretty gross and leads to a bunch of extra instructions. I avoid VLAs and alloca at all costs.

I already know how VLAs are represented, and I still like them; that's my own personal opinion. However, my problem is that I can't find any good languages (besides C/C++ and the ones listed on the Wikipedia page describing what a VLA is) that support them. I mean proper support by default, without, for example, the need to call a function like 'alloca'.

And as a matter of fact, VLAs are very useful when, as you've already pointed out, only a single such variable is used, because in that case they are created faster than calling 'malloc', for example. Another thing: as you probably know, newer 64-bit processors have a lot more registers than 32-bit ones, so I'm sure that even using more than one VLA can be optimized to work faster than the heap.


IIRC, some C compilers implement VLAs using malloc/free. I'm not sure where we encountered it, but my guess is some vendor-supplied compiler for one of those non-ARM microcontroller architectures.



Because in that case they are created faster than calling 'malloc', for example.

There's nothing to stop you pre-malloc'ing a large block of memory, and then using a stack (LIFO) allocator to allocate smaller arrays from that block. You've now amortised the cost of a single malloc call across all array allocations, and it's likely just as fast as using VLAs, since it too behaves like a stack.
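A minimal sketch of that idea in C (all the names here are invented for illustration; it's not a production allocator):

#include <stdlib.h>
#include <stddef.h>

/* A trivial stack (LIFO) allocator carved out of one big malloc'd
   block: allocation is a pointer bump, freeing is done by rewinding. */
typedef struct {
    unsigned char *base;
    size_t         capacity;
    size_t         top;       /* current high-water mark */
} StackAlloc;

int stack_init(StackAlloc *s, size_t capacity)
{
    s->base     = malloc(capacity);   /* the single up-front malloc */
    s->capacity = capacity;
    s->top      = 0;
    return s->base != NULL;
}

void *stack_alloc(StackAlloc *s, size_t size)
{
    size = (size + 15u) & ~(size_t)15u;           /* 16-byte alignment */
    if (s->top + size > s->capacity) return NULL; /* out of space */
    void *p = s->base + s->top;
    s->top += size;
    return p;
}

/* 'Free' in LIFO order by rewinding to a previously recorded mark. */
void stack_rewind(StackAlloc *s, size_t mark)
{
    s->top = mark;
}

Typical usage: record s->top, carve out your temporary arrays with stack_alloc, then stack_rewind to the recorded mark once you're done with them.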



Another thing: as you probably know, newer 64-bit processors have a lot more registers than 32-bit ones, so I'm sure that even using more than one VLA can be optimized to work faster than the heap.

What does having more registers have to do with anything?


 

And besides 'C++' too. I already knew that one.

C++ actually doesn't support them as a language feature; it's one of the areas where C99 and C++03 diverged from each other.

 

I get the same effect in C++ by implementing my own scopes and stacks that are separate from the language's default call stack and block scopes.

 

Clever idea. So I take it you just have a special 'stack allocator' that really uses the heap but works like a stack under the hood?



There's nothing to stop you pre-malloc'ing a large block of memory, and then using a stack (LIFO) allocator to allocate smaller arrays from that block. You've now amortised the cost of a single malloc call across all array allocations, and it's likely just as fast as using VLAs, since it too behaves like a stack.

 

Except it might not be thread-safe, and you might not have a suitable way to identify which thread you are on, or be able to pass a thread-local pool/heap-stack/whatever along. An example is a callback from a library that might be called from multiple threads. In my case, it was an OS callback that was fired on (among others) malloc() and free(). The options were VLAs or always allocating for the worst case.


So I take it you just have a special 'stack allocator' that really uses the heap but works like a stack under the hood?

Pretty much. I try to avoid all globals in my current engine, which includes "the heap", as it is obviously a global data structure. I create quite a few different stacks (and pools, and heaps) for different purposes. The Dice scope stack allocation presentation gives a good idea of how to implement this while retaining C++ niceties such as RAII and constructors/the object model.
 
Back in the PS2/Wii era, we actually pre-allocated all the RAM and then split it up by geographical regions in the game world! e.g. we'd have 3 "level chunk" sized buffers, each of which had a "cursor" for stack allocations. When streaming in a new chunk of a game level, we'd pre-create all the objects possibly required within that chunk's stack. This let us have 2 chunks loaded at a time, and a 3rd one streaming in / being constructed. If you needed temporary / per-frame memory, you could record the cursor, allocate some more objects on the stack, and then reset the cursor back to your 'recorded' value to 'erase' them. When unloading a chunk of the world, we'd just reset its cursor to the beginning of its buffer, to indicate all those objects were gone. This was back in the "C with classes" style of C++, so we didn't simply call destructors, etc., and had custom shutdown logic where required.
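Sketched in plain C (the sizes and names are invented; the real thing obviously had more bookkeeping), the cursor pattern looks roughly like this:

#include <stddef.h>

#define CHUNK_BYTES (8u * 1024u * 1024u)   /* invented chunk size */

/* One "level chunk" buffer with a cursor marking the next free byte. */
typedef struct {
    unsigned char storage[CHUNK_BYTES];
    size_t        cursor;
} LevelChunk;

void *chunk_push(LevelChunk *c, size_t bytes)
{
    if (c->cursor + bytes > CHUNK_BYTES) return NULL;
    void *p = c->storage + c->cursor;
    c->cursor += bytes;
    return p;
}

/* Temporary / per-frame memory: record the cursor, allocate, reset. */
void per_frame_example(LevelChunk *c)
{
    size_t mark = c->cursor;            /* record */
    void  *temp = chunk_push(c, 4096);  /* scratch allocations */
    (void)temp;                         /* ... use them ... */
    c->cursor = mark;                   /* reset: 'erases' the scratch */
}

/* Unloading a chunk: reset its cursor to the start of its buffer. */
void unload_chunk(LevelChunk *c)
{
    c->cursor = 0;
}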

In my case, it was an OS callback that was fired on (among others) malloc() and free(). The options were VLAs or always allocating for the worst case.

Change Swift's quote to start with "In most cases, " then ;)



Except it might not be thread-safe, and you might not have a suitable way to identify which thread you are on, or be able to pass a thread-local pool/heap-stack/whatever along. An example is a callback from a library that might be called from multiple threads. In my case, it was an OS callback that was fired on (among others) malloc() and free(). The options were VLAs or always allocating for the worst case.

So use a stack allocator per thread. That's what you are doing by using VLAs, anyway, since each thread has its own stack.


So use a stack allocator per thread. That's what you are doing by using VLAs, anyway, since each thread has its own stack.


Not sure what you mean here? (honest question)

Are you talking about a pre-allocated (on-heap) pool with stack-like semantics? How could I do that without passing anything along to my function? Some compiler extension which allows per-thread globals?

Or are you talking about a stack-allocated buffer on each thread? Either we'd have to allocate it early on and pass it along, or we'd have to create the buffer each time we enter the function, but how would that be better than using a fixed-size array of worst-case size?

Or are you talking about using some macro/whatever to manipulate the stack pointers explicitly? Would that be better than using compiler-supported VLAs in some way? (again, honest question)


So use a stack allocator per thread. That's what you are doing by using VLAs, anyway, since each thread has its own stack.


Not sure what you mean here? (honest question)

Are you talking about a pre-allocated (on-heap) pool with stack-like semantics? How could I do that without passing anything along to my function? Some compiler extension which allows per-thread globals?


You can use thread-local storage which is supported by most compilers, and as of C++11 is part of the standard.

You can also pretty easily determine what pool a pointer came from simply by address. Each pool knows its starting address and its size, so if your pointer's address is between the starting address and the starting address + size, then it belongs to that pool, and that pool can deallocate it. If you need to deallocate from a different thread, you can either use locks to manipulate said pool immediately, or post a message to that thread to tell it to deallocate said pointer itself later on.
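A hedged sketch of that address-range ownership check in C (the names are invented):

#include <stddef.h>
#include <stdint.h>

/* A pool described only by its starting address and its size. */
typedef struct {
    unsigned char *start;
    size_t         size;
} Pool;

/* Nonzero if 'ptr' lies inside this pool's block,
   i.e. start <= ptr < start + size. */
int pool_owns(const Pool *pool, const void *ptr)
{
    uintptr_t p  = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)pool->start;
    return p >= lo && p < lo + pool->size;
}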

And since you're allocating from the heap, you can make the pools as big as you want, rather than being limited to stack size.

Of course, a stack allocator will quickly have fragmentation issues, so it is best to use one for either short-lived allocations or very long-lived ones. (Unless you want to get into the fun topic of compacting heaps ;) )


You can use thread-local storage which is supported by most compilers, and as of C++11 is part of the standard.


Whoa, I knew TLS existed but I had no idea it was part of the C++ standard. Unfortunately, C++ is not an option, and AFAIK TLS doesn't exist for the platforms and compilers I need to support. :(

To be able to find a pool by pointer, I still need that first pointer, which I'm afraid I don't have. There are also performance implications: a global "stack" with some form of synchronization would likely be slow, on top of what I can only imagine would be serious fragmentation problems requiring some form of cleanup step (again, a performance cost).


Of course, a stack allocator will quickly have fragmentation issues

Well, a stack allocator doesn't suffer from any fragmentation if your allocations are actually stack-based. This is exactly the concept behind Dice's scope stacks.


Correct. If you deallocate in reverse order of allocation then you won't ever fragment (whether that's done with some sort of "scope" blocks or with raw allocations).

Hence my mention of "short-lived allocations". We tend to use a stack-based allocator when we need to allocate temporary memory that isn't going to outlast the function (or at most the frame) but which cannot be on the stack for whatever reason.
 

You can use thread-local storage which is supported by most compilers, and as of C++11 is part of the standard.


Whoa, I knew TLS existed but I had no idea it was part of the C++ standard. Unfortunately, C++ is not an option, and AFAIK TLS doesn't exist for the platforms and compilers I need to support. :(

To be able to find a pool by pointer, I still need that first pointer, which I'm afraid I don't have. There are also performance implications: a global "stack" with some form of synchronization would likely be slow, on top of what I can only imagine would be serious fragmentation problems requiring some form of cleanup step (again, a performance cost).


Even if your compiler and OS don't support it, you can easily make your own. Even something as simple as an array of structs containing a thread ID or handle and a pointer to the memory you're storing for that thread.
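A rough sketch of that do-it-yourself approach in C. It assumes pthreads purely for illustration; as discussed further down, some platforms don't even give you a cheap way to identify the current thread, so treat this as the shape of the idea rather than a portable solution:

#include <pthread.h>

#define MAX_THREADS 64

/* One slot per thread: the thread's handle plus its private memory. */
typedef struct {
    pthread_t thread;
    void     *memory;
    int       used;
} ThreadSlot;

static ThreadSlot      g_slots[MAX_THREADS];
static pthread_mutex_t g_slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register the calling thread's private memory block (call once,
   early in the thread's life, before lookups can race with it). */
int tls_register(void *memory)
{
    pthread_mutex_lock(&g_slots_lock);
    for (int i = 0; i < MAX_THREADS; ++i) {
        if (!g_slots[i].used) {
            g_slots[i].thread = pthread_self();
            g_slots[i].memory = memory;
            g_slots[i].used   = 1;
            pthread_mutex_unlock(&g_slots_lock);
            return 1;
        }
    }
    pthread_mutex_unlock(&g_slots_lock);
    return 0;                      /* table full */
}

/* Find the calling thread's memory by comparing thread handles.
   Slots are write-once in this sketch; a production version would
   need real synchronization or atomics around 'used'. */
void *tls_lookup(void)
{
    pthread_t self = pthread_self();
    for (int i = 0; i < MAX_THREADS; ++i) {
        if (g_slots[i].used && pthread_equal(g_slots[i].thread, self))
            return g_slots[i].memory;
    }
    return NULL;
}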

TLS is easily implemented on x86. I don't know about ARM, but on PPC, it's so insane to implement that you basically want to never use it.
"just getting the current thread ID" isn't easy on every CPU either :(


TLS is easily implemented on x86. I don't know about ARM, but on PPC, it's so insane to implement that you basically want to never use it.
"just getting the current thread ID" isn't easy on every CPU either :(


I'm probably showing my lack of experience with extremely low-level coding, but wouldn't thread ID and TLS be more of an OS thing than a hardware thing?

After all, we've had multi-threaded OSes that run on single processors in the past.

It's a mix of OS and compiler support, though certain architecture features might make it easier (virtual memory in particular I guess?)

Even if your compiler and OS don't support it, you can easily make your own. Even something as simple as an array of structs containing a thread ID or handle and a pointer to the memory you're storing for that thread.


If only. :) There's no reliable API to get a thread index. Possibly some thread pointer, but then you'd need a dictionary, which is waaaay too slow and complex.
