Do I need to learn the Standard C++ Library???

This topic is 4220 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Recommended Posts

Hi, I'm a beginner C++ programmer with experience in C#/VB.NET, and I'm wondering: do I really need to learn the STL or the C standard library, or can I just get on with learning Win32?

Guest Anonymous Poster
You should learn a good part of it, yes. However, there are some less important things that you don't need to know... like, how many of us have ever used a valarray?

well of course. :) you should learn as much as you can. stl is far from perfect though so don't get too attached. when you work professionally you're less likely to use it but still having a good understanding will keep you from sounding like an idiot when you talk to your lame friends who develop non game stuff. ;)

For just Win32, maybe you can get by without it. But try to learn a little of it anyway. So far I've only used vectors, and they have gotten me out of many sticky situations. I certainly don't regret it.
(Also, I hear learning/using Boost is beneficial as well; I may look into that soon)

Quote:
Original post by adjustedrace
HI I'm a beginner C++ programmer


Welcome!

Quote:
and have experience in C#/VB.net


That's a good start. [smile]

Quote:
and am wondering do I really need to learn the STL?


Yes, undoubtedly.

Quote:
or the C standard library


That too, though it's not as urgent.

Quote:
or can I just get on with learning Win32?


You might, but you're severely limiting your options.

Quote:
Original post by blackpawn
well of course. :) you should learn as much as you can. stl is far from perfect though so don't get too attached. when you work professionally you're less likely to use it but still having a good understanding will keep you from sounding like an idiot when you talk to your lame friends who develop non game stuff. ;)


Really? You don't usually use the standard C++ library as a professional game developer? This comes as a bit of a shock to me, as there are some parts of it that I find practically irreplaceable when using C++. I mean, what do you use rather than vectors, for instance? You don't create your own custom vector class, do you? Do you just program without the use of them entirely? Some clarification here would be neat; I am always interested in how things usually work in the professional game development world.

Quote:
Original post by PunaProgrammer chris
Quote:
Original post by blackpawn
well of course. :) you should learn as much as you can. stl is far from perfect though so don't get too attached. when you work professionally you're less likely to use it but still having a good understanding will keep you from sounding like an idiot when you talk to your lame friends who develop non game stuff. ;)


Really? You don't usually use the standard C++ library as a professional game developer? This comes as a bit of a shock to me, as there are some parts of it that I find practically irreplaceable when using C++. I mean, what do you use rather than vectors, for instance? You don't create your own custom vector class, do you? Do you just program without the use of them entirely? Some clarification here would be neat; I am always interested in how things usually work in the professional game development world.


I think there's some (incorrect) prejudice in the industry against the SC++L. They don't trust it because it's not "fast enough", and maybe even because it's "3rd party" (as if it were a separate library and not as much a part of C++ as the float or int types). So they go ahead and implement their own containers. I think even Carmack did this. Now, how a group of programmers that specializes in developing the SC++L could write a worse implementation than 1-2 programmers who also have a game to write, I'll never know. Do they also implement their own float type? Why not write their own compiler and be done with it?

You should definitely learn the STL; you'll be so much more productive, produce better and more efficient code, learn some smart tricks, etc.

Quote:
Original post by mikeman
I think even Carmack did this.

They were working with Visual C++ 6, which does have some crappy containers; I also remember him talking about how some people (maybe he himself?) weren't that familiar with C++. If some algorithm or data structure is not implemented in the STL, then you can go ahead and code it yourself; but unless you have special knowledge of your target system (for instance, is it guaranteed to support SSE3?), you won't beat your compiler's implementation, and if you can, then you shouldn't work with that compiler. A good example of a data structure you could implement yourself is a hash map, and the STL doesn't have a lot of search algorithms.

Here's my 2 cents: learn it, and use it for everything that looks like a "container problem", especially if you're a beginner at C++!

I won't judge on the whole flame-prone subject of the STL being too slow or badly implemented or whatever. No matter who is right there, using it will give you a reliable, portable and versatile implementation of many containers, and thus it will spare you a lot of time and frustration. If, at any point, your profiling efforts tell you that some part of the STL is slowing down a critical part of your code, then you can still try to implement something better with the things you've learned by then. At that point, you'll probably be pretty familiar with the language, and it won't take you much time. Trying to implement a map as a beginner will take a lot of time and the results will be nowhere near the quality of pretty much any current STL implementation.
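To make the "container problem" advice concrete, here is a minimal sketch of the kind of task the standard containers handle in a few lines; the word-counting example and the `count_words` name are my own illustration, not from the thread:

```cpp
#include <map>
#include <sstream>
#include <string>

// Count word frequencies: a typical "container problem" that std::map
// solves directly, with no hand-rolled tree or hash table needed.
std::map<std::string, int> count_words(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
        ++counts[word];   // operator[] default-initializes missing counts to 0
    return counts;
}
```

A beginner writing the equivalent balanced-tree map by hand would spend days on what this does in a dozen lines, which is exactly the poster's point.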

> you won't beat your compiler's implementation and if you can then you shouldn't work with that compiler.

Actually, I can beat the compiler's implementation by an order of magnitude -- and that is using clean C++, no assembly at all!

http://www.liimatta.org/fusion/profile/result.txt

The times are in milliseconds; smaller is better. The times are an average of (N-2) runs, where N is 20, I think. The lowest and highest times are dropped before averaging.

The results run from top to bottom in order of oldest to latest; it's noticeable how g++'s std::list implementation has improved over the years. ;-)

Heh... neat. Your vector iteration could be a wee bit better it seems [smile]

It's a bit strange to me that vector iteration seems to consistently beat list iteration. Also, several tens of milliseconds for list::push_back seems awfully slow. Care to give some more information, like how many elements of what type were held in the containers for those tests?

(edit) if at all possible, source code for the tests would be nice to have, I would be interested in that and in running my own tests.

Quote:
Original post by torakka
Actually, I can beat the compilers implementation by order of magnitude -- and that is using clean C++ - no assembly at all!

http://www.liimatta.org/fusion/profile/result.txt

What compiler switches did you use? Does your implementation use regular heap-allocation?

Quote:
Original post by Shadowdancer
It's a bit strange to me that vector iteration seems to consistently beat list iteration.

It shouldn't be. If you don't see that behavior, then something's not working right in your vector, or your list has some specific optimizations going for it. Not only is there less work involved in iterating over vectors, the cache coherency is vastly better.

CM

Let's look at std::list. A typical implementation uses a doubly linked list; in pseudocode, a node looks like this:

template <typename datatype>
struct node : datatype
{
    node* prev;
    node* next;
};

So the memory overhead of the implementation is sizeof(node*)*2: on a 32-bit system with 32-bit pointers this is typically 8 bytes, and on a 64-bit system 16 bytes (assuming 64-bit pointers; these are just guidelines and can vary between platforms ;)

Let's take a very common development environment as an example: 32-bit Windows and Visual C++ (any version so far). Memory allocation: in the default implementation (not using a custom allocator), the nodes are allocated from the heap. This allocation has a special property: the smallest allocation unit is 8 bytes (allocate 1 byte and it still takes 8 bytes; allocate 9 bytes and it takes 16 bytes from the heap, and so on).

The overhead (the pointers) takes one unit, so the data _always_ takes at least one more unit. If we could cut the overhead in half, the situation would change: there would be a "free" slot for a 32-bit data payload on 32-bit platforms and a 64-bit payload on 64-bit platforms.
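The arithmetic above can be sketched directly; the `classic_node` layout and the `rounded_alloc` helper are my own illustration of the sizes being discussed, and the exact figures are platform-dependent:

```cpp
#include <cstddef>

// Sketch of the per-node overhead discussed above: a classic doubly
// linked list carries two pointers of bookkeeping per element, on top
// of whatever unit the heap allocator rounds the block up to.
template <typename T>
struct classic_node
{
    T data;
    classic_node* prev;
    classic_node* next;
};

// Link overhead: 8 bytes with 32-bit pointers, 16 with 64-bit pointers.
constexpr std::size_t link_overhead = 2 * sizeof(void*);

// Round a request up to the allocator's unit size (8 bytes in the
// Win32 heap example above).
constexpr std::size_t rounded_alloc(std::size_t bytes, std::size_t unit = 8)
{
    return (bytes + unit - 1) / unit * unit;
}
```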

This is possible. Knuth devised the XOR linked list method years ago, and it can be used to implement std::list.

The trick is as follows. First, a few facts about xor:

a = b ^ c;

Therefore,

b = a ^ c;

And,

c = a ^ b;

Now substitute:

v = prev ^ next;

By storing v in the node, we have stored all the information required for iterating the sequence: we always know the node we came from, and we know the current node's v, so:

next = from ^ v;

That's about it. :)
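The trick described above can be sketched in a few lines; this is only an illustration of Knuth's XOR-link idea, not the poster's actual core::list implementation, and the `xnode`/`walk_three` names are mine:

```cpp
#include <cstdint>
#include <vector>

// Each node stores prev ^ next in one field instead of two pointers.
struct xnode
{
    int value;
    std::uintptr_t link;   // (uintptr_t)prev ^ (uintptr_t)next
};

inline std::uintptr_t ptr_bits(xnode* p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

// Link three nodes a <-> b <-> c and walk them forward. Forward
// iteration only needs the node we came from: next = from ^ v.
std::vector<int> walk_three(xnode& a, xnode& b, xnode& c)
{
    a.link = ptr_bits(nullptr) ^ ptr_bits(&b);
    b.link = ptr_bits(&a) ^ ptr_bits(&c);
    c.link = ptr_bits(&b) ^ ptr_bits(nullptr);

    std::vector<int> out;
    xnode* prev = nullptr;
    xnode* cur = &a;
    while (cur)
    {
        out.push_back(cur->value);
        xnode* next = reinterpret_cast<xnode*>(cur->link ^ ptr_bits(prev));
        prev = cur;
        cur = next;
    }
    return out;
}
```

Note the price: you can no longer jump into the middle of the list from a bare node pointer, because iteration always needs the previous node as context.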

Quote:
Original post by Shadowdancer
It's a bit strange to me that vector iteration seems to consistently beat list iteration.

That's not strange at all. Imagine a cache line is 128 bytes and your data is 4-byte integers: in a vector, every miss pulls 32 contiguous elements into the cache, while with a list a cache miss can occur every time we try to access an element.

Quote:
(edit) if at all possible, source code for the tests would be nice to have, I would be interested in that and in running my own tests.


You can find his source here. I'm about to check it out myself, but I need to set up G++ before I can comment on the results.
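The cache argument above is just division; this tiny helper (my own, with the 128-byte line size taken from the post's example, though common hardware uses 64 or 128 bytes) makes the arithmetic explicit:

```cpp
#include <cstddef>

// With a 128-byte cache line and 4-byte ints, one miss drags 32
// contiguous vector elements into the cache, while heap-scattered
// list nodes typically cost a miss per element.
constexpr std::size_t elements_per_line(std::size_t line_bytes,
                                        std::size_t element_bytes)
{
    return line_bytes / element_bytes;
}
```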

>Heh... neat. Your vector iteration could be a wee bit better it seems

I was thinking the same. On the other hand, iteration with Visual C++ is 2-3x the speed of Dinkumware's (what Microsoft uses by default); on g++ the std version is slightly faster, up to twice as fast. It's difficult to explain, as it is essentially:

++ptr;

I looked at the assembly output and it's identical. It must be the code around the inner loop: since the C++ code itself is different, it must generate a different sequence as a whole, because the code in both cases is heavily inlined (std). I can't think of anything else.


> slow. Care to give some more information, like how many elements of what type were held in the containers for those tests?

The source code is available; it's part of the example/ set in the repository snapshot.

It tests with fairly large sets of data: 10K+ elements, if I remember correctly (which is still a rather small test compared to the data I sometimes work with). This is because I work with huge datasets at times, and memory usage used to be critical; not anymore though, as cheap memory solved that one on its own. ;)

If you run your own tests, the results would be nice to have, and I could add them to the results page.

> What compiler switches did you use? Does your implementation use regular heap-allocation?

The compiler switches can be inspected in the makefiles, but generally -O3 etc.

I use regular heap allocation with a minor difference: I allocate in lots of 16, and freed nodes go onto a parallel linked list, which is used as a pool until it is depleted and a new lot of 16 is allocated (1 taken into use, 15 put into the pool). This removes the need for a custom allocator for pooling purposes; you rarely see that done, even when performance is a problem (!), so this solves the "laziness" problem in the memory management aspect.
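The lot-of-16 scheme described above can be sketched as a free-list pool; this is a hedged reconstruction from the description, not the poster's actual code, and the `pool16`/`raw_node` names are mine:

```cpp
#include <cstddef>
#include <vector>

struct raw_node
{
    raw_node* next_free;   // intrusive free-list link, used only while free
};

class pool16
{
public:
    // Pop a node from the free list; when it is empty, heap-allocate a
    // lot of 16 (one handed out, fifteen pushed onto the free list).
    raw_node* allocate()
    {
        if (!free_list_)
            refill();
        raw_node* n = free_list_;
        free_list_ = n->next_free;
        return n;
    }

    // Freed nodes return to the pool instead of the heap.
    void release(raw_node* n)
    {
        n->next_free = free_list_;
        free_list_ = n;
    }

    std::size_t lots_allocated() const { return lots_.size(); }

    ~pool16()
    {
        for (raw_node* lot : lots_)
            delete[] lot;
    }

private:
    void refill()
    {
        raw_node* lot = new raw_node[16];
        lots_.push_back(lot);
        for (int i = 0; i < 16; ++i)
            release(&lot[i]);
    }

    raw_node* free_list_ = nullptr;
    std::vector<raw_node*> lots_;
};
```

A side effect the next reply points out: nodes from one lot are contiguous, so adjacent list nodes often share a cache line.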

You still want a custom allocator if you want to track the memory after allocation, free in large blocks, etc. But that is still a problem with constructors and destructors if you free memory without calling the destructor. By the way, I don't _use_ the allocator argument; that's the only thing I know of that isn't compatible with std::list. Maybe I'll fix that someday.

Quote:
Original post by torakka
I use regular heap allocation with a minor difference: I allocate in lots of 16 and also free into parallel linked list, which is used as pool until it is depleted and new lot of 16 is allocated (1 taken into use, 15 put into the pool).

Very clever trick; in many cases adjacent list nodes will lie in the same cache line because of it.

You create an entire new list class just to ensure that your node fits in an eight byte block of memory when it stores an int, but don't bother profiling it when it stores anything but an int [which should be considered the standard situation]. You provide for custom allocators in your type definition, but don't bother actually using them. You jump through elaborate hoops in order to get around your disregard for type safety. You don't document anything, least of all the aforementioned hoops.

*edit: I'm going to go out on a limb and suggest that your xor optimization provided marginal benefit overall, and that the added complexity for every single list operation completely outweighs the benefit it did provide. I'm instead going to posit that your profile results [assuming they are valid; I can't bring myself to look at that file in any detail after reading through your list header] are entirely the byproduct of your pooled allocator. Remove the pooled allocator, and rerun your tests. I'd put money on the standard library outperforming you.

CM

[Edited by - Conner McCloud on July 29, 2006 10:56:13 AM]

Pretty cool :) However, in Release mode there doesn't seem to be any significant difference in performance using the VC++ 2005 compiler; your lists are somewhat faster, but not enough to make me want to leave the safety of the SC++L :). Unless there is something wrong with my benchmarking (let's wait for others to share). Also, core::vector seems to crash with elements >~60KB.

Quote:
Original post by adjustedrace
I'm a beginner C++ programmer and have experience in C#/VB.net
What's wrong with sticking with C# and VB.NET? The .NET Framework has its own collection classes and wrappers, including generics. With the System.Collections namespace, you're able to do pretty much everything the STL is capable of.

mikeman, you could be running out of stack: there is constant-size storage (8 objects) for small vectors, and 8*64K = 512K.

> You create an entire new list class just to ensure that your node fits in an eight byte block of memory when it stores an int, but don't bother profiling it when it

There are more design choices that went into this than are apparent at a quick glance.

Here are further things to consider:

1. Each dynamic heap allocation has static overhead (defeatable with a custom allocator, though). I recall it was 16 bytes per allocation, or in that ballpark; I did measurements and profiling when designing this code, it didn't come out of an information vacuum. That brings us up to 24 bytes, minimum, per node.

With our arrangement, the overhead is (16+8)/16 bytes per allocation (1.5 bytes!). This means our memory overhead is 16 times smaller (!) for the list infrastructure. It was an *additional* benefit that, on top of that, we can fit one "int" worth of payload there FOR FREE.

2. The allocations are contiguous, with vector-like cache behaviour; you can observe in the profiling that core::list can be put head-to-head with std::vector (!) and hold its ground quite well in most situations.

It was not to "ensure" that my node fits in an eight byte block.


>stores anything but an int [which should be considered the standard situation].

Feel free to profile with different types; similar behaviour can be observed. If you think that I write code without knowing what I am doing, you are mistaken.


> You provide for custom allocators in your type definition, but don't bother actually using them.

It has nothing to do with bothering; the code will be written when I get down to it. There is still work left on this library, no shit.


> You jump through elaborate hoops in order to get around your disregard for type safety.

int* a = ...;
int* b = 0;
ptrdiff_t x = a - b;

That is a fully legal C++ construct. I should do a code review and remove some reinterpret_casts which are not really needed there, true.


> You don't document anything, least of all the aforementioned hoops.

A good point. I would write documentation if I were more interested in the documentation than in actually using the code.

I wouldn't have made the code available at all if I didn't get so many requests to "send the latest version" or "can I see it" from old customers, colleagues, and users of the previous commercial package. So I made it available for download, without registration or overly complicated licensing schemes.


> *edit: I'm going to go out on a limb and suggest that your xor optimization provided marginal benefit overall, and that the added complexity for every single list operation completely outweighs the benefit it did provide. I'm

Example case, iterator ++

temp = current;
current = current->step(source);
source = temp;

Where step is essentially "(a - 0) ^ b"; the -0 is a constant literal, so it is optimized away (you can check this with your non-stone-age compiler), and what's left is a ^ b, which is not essentially slower than an assignment:

current = current->next;

The "cost" lies in storing a temporary. <:() An interesting observation: since we are doing an xor, the output (register) in machine language must be a register on most architectures. This is stored in current, so it makes sense that current stays in a register through all of the graph after register allocation; this block is so small that it would take an insane code generator to store the register to memory followed by a load from that same memory. :)

The "temp" can be in memory if it is spilled (that's decided case by case), but even so, it should reside at the top of the stack, which in turn should be in cache most of the time, L1 on top of that.


> instead going to posit that your profile results [assuming they are valid, I can't bring myself to look at that file in any detail after reading through your list header] are entirely the byproduct of your pooled allocator. Remove the pooled allocator, and rerun your tests. I'd put money on the standard library outperforming you.

Your claim is easy to satisfy; here are the changes to the code required:

node* node_allocate() { return new node(); }
void node_release(node* p) { delete p; }

How much were you betting, again? I'd like to have it, please! Here are the results:

std::list 70.1866
core::list 60.7829

FYI, the node allocation abstraction being a discrete method isn't an accident or pure chance.

Another note: a custom allocator still has more overhead than this arrangement; an allocation is typically:

if ( a op b )
return memory_dereference;

Yes, I also tested custom allocator overhead, with both the std and core list implementations. The custom allocator is not enabled at this time because it was implemented before placement-new and delete for raw memory.

I haven't yet found time to integrate that branch back in. So unprofessional, I know.

I'll fix this; thanks for the feedback & motivation boost. Last year I worked on this maybe a week or two, max, as I went to the Philippines for 8 months and worked from there for my parent company (which will change *again* next autumn due to re-structuring, this time by AMD ;)

FYI, I used to maintain the previous version of the library as a full-time job. Not anymore; now it's a hobby project, as I am doing OpenGL drivers in the daytime and at night time, well, censored. >B)

Oops. Forgot iteration results:

3.15396
2.31731

(Guess which one is which..)

And larger types..

template <typename testdata>
void profile(core::timer* xtimer)
{
    profile< std::list<testdata> >( xtimer, "std::list " );
    profile< core::list<testdata> >( xtimer, "core::list " );
    profile< std::vector<testdata> >( xtimer, "std::vector " );
    profile< core::vector<testdata> >( xtimer, "core::vector" );
}

struct intfoo
{
    int v[32];
    intfoo(int x) { v[0] = x; }
    operator int () const { return v[0]; }
};

// invoking..
profile< intfoo >(xtimer);

--

core::list and core::vector are still faster. In fact, with larger objects stored, the core::vector iterator gains ground on std::vector's iterator; this is a mystery, but it's the observed behaviour. :/

One thing to notice is that I usually tend to create "huge" objects on the heap using new or a ::create() method following the factory design pattern; pointers to these are stored instead of the values. A VERY common situation for code I write (reminder: *large* objects).

I mostly use by-value storage only for POD and trivial/small types, but again, that is a personal choice forged by years of practice.

One example that springs to mind, where I store by value and the object can be fairly large, is a fixed vertex layout when exporting from modeling software. ;)

I want to keep all the relevant data in one place, because I want to store objects of that type (vertex) by value in a std::set so that I can build a set of unique vertices (and resolve the vertex indices when adding vertices to the set). A VERY fast way to generate nice indexed data from virtually _any_ modeling software for export to a graphics engine, where indexed primitives are welcome guests.
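The exporter trick described above can be sketched with an ordered associative container that collapses duplicates and hands out stable indices; the `Vertex` stand-in and the `IndexedMesh` name are my own (using std::map rather than std::set for a direct vertex-to-index lookup), not the poster's actual layout:

```cpp
#include <map>
#include <vector>

struct Vertex
{
    float x, y, z;
    bool operator<(const Vertex& o) const   // strict weak ordering for the map
    {
        if (x != o.x) return x < o.x;
        if (y != o.y) return y < o.y;
        return z < o.z;
    }
};

struct IndexedMesh
{
    std::vector<Vertex> vertices;   // unique vertices, in first-seen order
    std::vector<int>    indices;    // one index per input vertex

    // Duplicate vertices collapse to the index of the first occurrence.
    void add(const Vertex& v)
    {
        auto it = remap_.find(v);
        if (it == remap_.end())
        {
            it = remap_.emplace(v, static_cast<int>(vertices.size())).first;
            vertices.push_back(v);
        }
        indices.push_back(it->second);
    }

private:
    std::map<Vertex, int> remap_;
};
```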


Now, to the business at hand. First of all, I originally couldn't use the std containers because on console platforms there used to be very minimal C++ support, and std just didn't work. Period. We needed containers, so we rolled our own. Now that compilers are finally catching up on C++ features (still missing export on most compilers!), all these fancy constructs are possible.

In fact, even though we can observe some world-class std implementations being repeatedly beaten here, I am considering dropping core::list and core::vector, for the reason that I have enough RAM these days (2 GB on this machine; my peak use is around 2.5 GB), PLUS the std implementations on Visual C++ and g++ are finally reaching standards that are acceptable. ;-)

If it didn't make any difference, I would have done that years ago. But note: short of writing my own containers, I'd use the std ones. I'm moving to those. At first they simply weren't available on the platforms I worked on. Then they were crap. Now they are decent.

So: I in fact urge that NO ONE here use the code that is available. I don't guarantee it will be available for download in the future, or that I will keep releasing revised versions in public. I write a lot of code that I discard after a while, because I want to try out ideas that come into my head. I keep the good ideas and discard the bad ones without much regret.

But the fact is, there was a claim that one shouldn't expect *better* performance when writing one's own without using SSE(3) et cetera. It's already been established that that isn't true, agreed?

The thing is, while your project is admirable, whenever we see someone claiming their implementation is faster, we always suspect that the implementation neglects several features and standards in order to "pass" some selected benchmarks. But the SC++L is so much more than that; we must compare the two implementations in their entirety. Case in point: sorting a core::list takes more than 2x the time it takes to sort a std::list (in my test, sorting a std::list of 10000 integers takes 70ms on average, whereas doing the same with a core::list takes 180ms). With larger types it's even worse; std seems to be up to 4x faster.
