List functionality in C

Started by DarkMatter2008
27 comments, last by Way Walker 15 years, 2 months ago
Quote:Original post by DarkMatter2008
someModel.polys = malloc(sizeof(Poly*)*10);


sizeof(Poly *) returns the size of a pointer to a Poly, which will be 4 on 32-bit platforms. I'm guessing you want sizeof(Poly) here. It only works because your Poly consists of one pointer as well -- if it had anything else in there, it wouldn't work.

As I said before, this error:

Quote:Heap block at 00933290 modified at 009332C0 past requested size of 28


is indicative of a buffer-overrun, and that's exactly what using sizeof(Poly *) instead of sizeof(Poly) would do.
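
A quick way to see the difference (a minimal, standalone sketch; the three float members are invented purely for illustration):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    float x, y, z; /* made-up members, just so Poly is bigger than a pointer */
} Poly;

int main(void)
{
    /* Wrong: room for 10 *pointers* (e.g. 40 bytes on a 32-bit platform). */
    Poly *bad = malloc(sizeof(Poly *) * 10);

    /* Right: room for 10 Poly structs (here 10 * 12 = 120 bytes). */
    Poly *good = malloc(sizeof(Poly) * 10);

    printf("sizeof(Poly *) = %u, sizeof(Poly) = %u\n",
           (unsigned)sizeof(Poly *), (unsigned)sizeof(Poly));

    free(bad);
    free(good);
    return 0;
}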

Quote:
for(int i = 0; i < 10; i++)
{
    ptr->vector = vectors;
    vectors.refs[vectors.refCount] = ptr; // Need to say what poly referenced it
    vectors.refCount++;
    vectors.refs = realloc(vectors.refs, sizeof(Poly *) * (vectors.refCount + 1)); // Need to resize refs in case another poly references this vector
    ptr++;
}


As mentioned by Way, this code is a little funny. It would make more sense to resize the array when you need to, not when you're finished (as you've got it, your array is always going to be one bigger than needed).

Also, you should probably allocate a chunk of memory up-front and only resize it when you run out of space, rather than reallocating every time. I guess it's not that performance critical since you'll only do this when you "load" your model, but still...
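
Something along these lines (a sketch only; the names and the doubling growth factor are mine, not from your code):

#include <stdlib.h>

typedef struct Poly Poly; /* forward declaration; the real struct lives elsewhere */

typedef struct {
    Poly **refs;
    size_t refCount;
    size_t refCapacity;
} RefList;

/* Append a poly pointer, doubling the capacity only when the array is full,
   so realloc runs O(log n) times instead of once per add. */
int reflist_add(RefList *list, Poly *p)
{
    if (list->refCount == list->refCapacity) {
        size_t newCap = list->refCapacity ? list->refCapacity * 2 : 8;
        Poly **tmp = realloc(list->refs, newCap * sizeof *tmp);
        if (tmp == NULL)
            return -1; /* the old block is still valid on failure */
        list->refs = tmp;
        list->refCapacity = newCap;
    }
    list->refs[list->refCount++] = p;
    return 0;
}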


Quote:Original post by Codeka
Quote:Original post by DarkMatter2008
someModel.polys = malloc(sizeof(Poly*)*10);


sizeof(Poly *) returns the size of a pointer to a Poly, which will be 4 on 32-bit platforms. I'm guessing you want sizeof(Poly) here. It only works because your Poly consists of one pointer as well -- if it had anything else in there, it wouldn't work.


Heh, didn't even notice that because I'm so used to using "sizeof *someModel.polys" instead of "sizeof(Poly)".
Quote:
Why are you using void? Are you going to have references to structs other than Poly? If not, forward declare Poly and use pointers to Poly.


The reason for the void instead of a Poly pointer is a matter of what comes first, the chicken or the egg? This here doesn't compile:

struct Vector { Poly * refs; };
struct Poly { Vector * vector; };


Poly has not been declared yet, and I can't swap the structs around either, because then the Poly struct would complain that the Vector struct has not been defined.

Quote:
Also, you need to decide between arrays and linked lists here. The typical way of doing a linked list in C would be to add a "Poly *next;" to the Poly struct and then refs should be a pointer to the first Poly in the list.


Initially I was going to go with a list as they are easier to write to, but I'll only be writing to them at the initialisation stage, and arrays are much faster to access. Then again, I'll be reading sequentially, so would there be any difference if I did use a list? Anyway, it is the vector that needs to store a record of which Polys have referenced it; I need that for the Gouraud lighting calculations later, when working out the light intensity at the vector.

Quote:
If you're using arrays, then you'll want refs to be a pointer to pointer to Poly (Poly **refs;) so that you can point to an array of Poly structs.


I'm not sure what you mean here; is that a 2D array? I only want a 1D array of Poly pointers per Vector struct.

Quote:
And, for what it's worth, // comments are only valid in C++ and C99 not C89.


I'm using VS2008; is there any way to make it conform to C? It doesn't seem to have a plain C option in the create-new-project window.

Quote:
Defining a variable in the for statement like this is, again, a feature of C++ and C99. If you're using C89 it shouldn't compile.

Quote:
vectors.refs = malloc(sizeof(Poly *));


It's generally a good idea to write:
var = malloc(sizeof *var);
since it's a bit more robust in the face of changing the type or typename of the variable var.


Again, I'm not too sure what you mean here. Despite me declaring refs as a void pointer, I only want it for Poly pointers; the type won't change. What I'm trying to do is set up an array of Poly pointers inside the Vector struct.

Quote:
Other than that, and a few typos, it compiles as C99 code, looks right (although it's a little strange to allocate for space+1 instead of allocating the +1 when you need it), and doesn't crash for me.


I'm always open to a better solution if one is available. I lived a sheltered life in the land of Java, so I am finding this a bit challenging to get a grasp of.

Quote:
sizeof(Poly *) returns the size of a pointer to a Poly, which will be 4 on 32-bit platforms. I'm guess you want sizeof(Poly) here. It only works because your Poly consists of one pointer as well -- if it had anything else in there, it wouldn't work.


I thought because I want references instead of actually storing a struct in an array, I'd use (Poly *); I'm trying to keep a record of which polygons have referenced each vector. Ideally the Poly would have 3 pointers to Vectors and other such information that a poly would require; I simplified it for that demonstration.
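
Something like this is what I was aiming for (ignoring the declaration-order problem for the moment; the member names here are made up for the sketch):

struct Poly; /* forward declaration */

struct Vector {
    float x, y, z;        /* position */
    struct Poly **refs;   /* which polys reference this vector */
    size_t refCount;      /* needed later when averaging for Gouraud intensity */
};

struct Poly {
    struct Vector *v[3];  /* three pointers into the shared vector pool */
    /* plus normals and whatever else a poly needs */
};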

Apologies if I seem rude or anything like that; I'm just a little confused.
Quote:Original post by DarkMatter2008
I lived a sheltered life in the land of Java so I am finding this a bit challenging to get a grasp of.


So true...

Such rich and flexible dynamic types are cumbersome in C. Linked lists are often simpler, but before going that route, consider whether you really need it.

If you have even the slightest chance of using fixed-size arrays, do it.

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>

#define MAX_POLYS 25
#define MAX_VECTORS 25

typedef struct VECTOR Vector;

typedef struct {
    Vector * p_vect;
    int foo;
} Poly;

struct VECTOR {
    char name[15];
    size_t n;
    Poly poly[MAX_POLYS];
};

int main() {
    char buf[20];
    Vector vectors[MAX_VECTORS];
    Vector * pv;
    size_t n_vectors;
    size_t i, j;

    n_vectors = (MAX_VECTORS / 2);
    for (i = 0; i < n_vectors; i++) {
        pv = &vectors[i];
        strcpy(pv->name, "Vector #");
        strcat(pv->name, _itoa(i, buf, 10));
        pv->n = rand() % MAX_POLYS;
        for (j = 0; j < pv->n; j++) {
            pv->poly[j].foo = j;
            pv->poly[j].p_vect = pv;
        }
    }

    for (i = 0; i < n_vectors; i++) {
        printf("%20s: ", vectors[i].name);
        for (j = 0; j < vectors[i].n; j++) {
            printf("%d ", vectors[i].poly[j].foo);
        }
        printf("\n");
    }
    return 0;
}


If over-allocating seems problematic - how much memory would 1,000,000 Poly structures really consume?

And for numeric calculations, fixed allocations may be even better, since they lead to potentially simpler and faster code.

But unless you actually have to deal with dynamic data structures, don't reinvent them in C. It's often not needed, just make sure to double-check array sizes. Nobody will warn you if you write over them.
Quote:Original post by DarkMatter2008
Quote:
Why are you using void? Are you going to have references to structs other than Poly? If not, forward declare Poly and use pointers to Poly.


Ther reason for the void instead of a Poly pointer is a matter of what comes first, the chicken or the egg?


What you wrote above doesn't compile in C, either, because you didn't typedef your structs (or, alternatively, write "struct Poly" instead of "Poly"). Just forward declare the struct:
typedef struct Vector Vector;
typedef struct Poly   Poly;

struct Vector {
    Poly *refs;
};

struct Poly {
    Vector *vector;
};
The same "trick" can be useful in C++ as well.

Quote:
Initally I was going to go with a list as they are easier to write to, but i'll only be writing to them at the initialisation stage, arrays access much faster, hmmm but saying that I'll be reading sequentially so would there be any diffence if i did use a list?


It's not something I know much about, but I believe iteration of an array can be faster due to the array being a single block of memory, but I don't know how much of an effect that would have in the present case.

Quote:
Quote:
If you're using arrays, then you'll want refs to be a pointer to pointer to Poly (Poly **refs;) so that you can point to an array of Poly structs.


I'm not sure what you mean here, is that 2D array? I only want a 1D array of Poly pointers per Vector Struct.


Right, you want an array of pointers. Usually, you work with a pointer to the first element of an array instead of the array itself (e.g. malloc() essentially returns such a pointer). If the array contains pointers, that would be a pointer to pointer. A 2D array is something else entirely, but is often emulated with a pointer to pointer.
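
Concretely, a sketch of what I mean (make_refs and its parameters are just names I've invented for illustration):

#include <stdlib.h>

typedef struct Poly Poly; /* a forward declaration is enough for pointers */

/* Allocate an array of n Poly pointers. refs[i] is a Poly *, not a Poly;
   the actual Poly structs must be stored somewhere else. */
Poly **make_refs(size_t n, Poly *first)
{
    Poly **refs = malloc(n * sizeof *refs); /* sizeof *refs == sizeof(Poly *) */
    if (refs != NULL) {
        size_t i;
        for (i = 0; i < n; i++)
            refs[i] = first; /* point every slot at the same poly, just to show the indexing */
    }
    return refs;
}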

Quote:
Quote:
And, for what it's worth, // comments are only valid in C++ and C99 not C89.


I'm using VS2008, is there anyway to make it so it conforms to C? it doesn't seem to have just C in the create new project window.


I'm not familiar with VS2008, but it may not be a problem in your current case (these were common extensions to C89 which is why they're in the new version of C, C99). I was just pointing it out so that you'd know what's wrong if it does become a problem.

Quote:
Quote:
vectors.refs = malloc(sizeof(Poly *));


It's generally a good idea to write:
var = malloc(sizeof *var);
since it's a bit more robust in the face of changing the type or typename of the variable var.


again I'm not too sure what you mean here. Despite me decalring refs as a void pointer I only want it for Poly pointers, the type won't change. What I'm trying to do is set up an array of poly pointers in side the vector struct.

It's mostly a style guideline that I find can help to reduce problems down the line. It's also harder to mix up the type because you're basically letting the compiler figure it out for you. In another part of what you posted, you wrote "sizeof(Poly *)" when you meant "sizeof(Poly)", but if you'd used the "sizeof *var" form, it would've been a non-issue.
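
To make that concrete with the allocation from your first post (a standalone sketch; the Poly members are stand-ins):

#include <stdlib.h>

typedef struct { float x, y, z; } Poly; /* stand-in definition */
typedef struct { Poly *polys; } Model;

int main(void)
{
    Model someModel;
    /* Since someModel.polys is a Poly *, *someModel.polys is a Poly, so the
       compiler computes sizeof(Poly) for you. There's no way to accidentally
       write sizeof(Poly *) here, and the line stays correct even if the type
       of polys later changes. */
    someModel.polys = malloc(10 * sizeof *someModel.polys);
    free(someModel.polys);
    return 0;
}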

Quote:
I'm always open to a better solution if one is avalible, I lived a sheltered life in the land of Java so I am finding this a bit challenging to get a grasp of.


Well, for the allocation, just switch the order: allocate the space you need right before you add another reference, instead of allocating extra space after adding it.
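
Roughly like this, reusing the variables from your snippet (sketch only, with minimal error handling, and with refs assumed to be declared Poly **):

/* Make room first, then store: refs ends up exactly refCount entries long. */
Poly **tmp = realloc(vectors.refs, (vectors.refCount + 1) * sizeof *tmp);
if (tmp != NULL) {
    vectors.refs = tmp;
    vectors.refs[vectors.refCount] = ptr;
    vectors.refCount++;
}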

The other thing to understand is the difference between pointers and variables, which in Java (I hope I get this right, it's been a while since I touched Java) is the difference between class references and primitive values, except that in Java only classes have reference semantics and only primitives have value semantics, whereas in C (and C++) you can have a value of any type and a pointer to any type.

Quote:
Quote:
sizeof(Poly *) returns the size of a pointer to a Poly, which will be 4 on 32-bit platforms. I'm guess you want sizeof(Poly) here. It only works because your Poly consists of one pointer as well -- if it had anything else in there, it wouldn't work.


I thought because I want references instead of actually storing a struct in an array i'd use (Poly *), i'm trying to keep a record of what polygons have refernced each vector. Ideally the poly would have 3 pointers to vectors and other such information that a poly would require, i simplifed it for that demonstation.


Right, but you have to store the actual Poly structures somewhere (otherwise there's nothing to point to). In the code, it looked like you intended to store them in someModel.polys, so you need to store something the size of an actual Poly struct, not the size of a pointer to a Poly struct.

If you'd used the malloc(N*sizeof *var) format, the compiler would've seen that var is of type Poly *, so *var is of type Poly, and would've done the right thing.
Quote:Original post by Way Walker

It's not something I know much about, but I believe iteration of an array can be faster due to the array being a single block of memory, but I don't know how much of an effect that would have in the present case.


You are correct. The adjacent elements are also loaded whenever you access something from RAM (from the stack or heap) and if you're using those elements (i.e. you're using an array) the variable will already have been loaded and access is much quicker. (Or, at least that's the theory.)

See: caching, L2 cache, memory loading, etc.

Even if this weren't true, consider what it takes to access the fifth element of a list versus an array:
    ListNode* ln = nodeBase.begin;
    int i = 0;
    for (i = 0; i < 5; i++)
        ln = ln->next;
Compared to
    ArrayElem* a = arrayBase.begin;
    a += 5;


However, lists do grow faster. When a vector is resized, it has to allocate new memory, copy the old, and delete the old pointer. When a list grows, you just allocate one node at a time; for constant-time appends, your list base class can hold a pointer to the last node. If growing, or sorting, is more important than iterating, lists are usually better.
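
In C, that tail-pointer arrangement might look like this (a sketch; the payload is just an int to keep it short):

#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node *head;
    Node *tail; /* keeping the tail makes append O(1) */
} List;

int list_append(List *l, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n; /* link after the current last node */
    else
        l->head = n;       /* first node in an empty list */
    l->tail = n;
    return 0;
}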
Quote:Original post by Splinter of Chaos
However, lists do grow faster. When a vector is resized, it has to allocate new memory, copy the old, and delete the old pointer. When a list grows, you just allocate one node at a time; for constant-time appends, your list base class can hold a pointer to the last node. If growing, or sorting, is more important than iterating, lists are usually better.
That's not the whole story. Containers such as std::vector actually grow by a percentage of the original size (e.g. doubling), resulting in amortised constant time push_back operations. This is required by the C++ standard. In practice they aren't going to be much slower than lists at all.

The primary reason to ever use a list is for insertion and removal from the middle, as well as to prevent iterators being invalidated by doing either of those things. Having a class with a very expensive copy-constructor is one other possible reason. Otherwise vector is a better choice.

[Edited by - iMalc on January 23, 2009 3:14:33 PM]
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:That's not the whole story. Containers such as std::vector actually grow by a percentage of the original size (e.g. doubling), resulting in amortised constant time push_back operations. This is required by the C++ standard. In practice they aren't going to be much slower than lists at all.


I have read this before and it has always puzzled me. Assume that a std::vector doubles the size of its internal buffer on an overflow condition and that it initializes with a size of 1. Now if 1024 elements are appended using push_back() (without reserve), this will involve 10 buffer reallocations and a consequent copy of all elements in the buffer. Intuitively it strikes me that push_back() of n elements will be O(n log n). At least it appears to be of a different order from random access of one element, e.g. O(1).

[Edited by - chairthrower on January 23, 2009 9:11:52 PM]
You're right, it's not exactly going to be constant time. It's usually going to be constant time. I don't really understand "amortized", but I think that's what it means.

Quote:Original post by iMalc
Quote:Original post by Splinter of Chaos
However, lists do grow faster. When a vector is resized, it has to allocate new memory, copy the old, and delete the old pointer. When a list grows, you just allocate one node at a time; for constant-time appends, your list base class can hold a pointer to the last node. If growing, or sorting, is more important than iterating, lists are usually better.
That's not the whole story. Containers such as std::vector actually grow by a percentage of the original size (e.g. doubling), resulting in amortised constant time push_back operations. This is required by the C++ standard. In practice they aren't going to be much slower than lists at all.
Your linked resource forgot to mention the amortized part on that page (but got it on another). I'm surprised that reference missed it.

Although, of interest might be this article, which halfway down has a graph showing grow and traverse times, and a few other things as well. It looks like the list is actually slower to grow than the vector... but wait!
Quote:in fairness I gave vector special treatment by calling vector::reserve(1000000) first to avoid any reallocations.
So, it seems the test might be fair for predictable sizes, but we're talking about not knowing the full size. True, vectors have a great advantage when the end size is known, but are incredibly inefficient otherwise.

According to GCC's interpretation (I'm reading the source):
    std::vector<int> x; // Capacity and size are zero.
    x.push_back(1); // Capacity and size are one. Realloc happened.
    x.push_back(1); // Capacity and size are two. Realloc happened.
    x.push_back(1); // Capacity four, size three. Realloc happened.
    x.push_back(1); // Capacity four, size four.
    x.push_back(1); // Capacity eight, size five. Realloc happened.

G++ doubles the capacity, but it's worth noting (also from checking the source) that MSVC only grows its vectors by 50 percent. Thus, GCC will waste space when the vector gets larger but reallocate less; MSVC will waste time reallocating, but waste less space.

But I unfortunately can't prove this, because I tried the same test as the "guru" and I get around 400 clock ticks (using std::clock()) for the list (push_back()ing 1,000,000 ints) and 32 for the vector test, and the same when I reserve space. These numbers are outrageously low compared to the guru's results and I wonder if my tests are in some way wrong.

Quote:The primary reason to ever use a list is for insertion and removal from the middle, as well as to prevent iterators being invalidated by doing either of those things. Having a class with a very expensive copy-constructor is one other possible reason. Otherwise vector is a better choice.


Quick sorting is another reason. A really good one, actually. And, really, according to my link, a deque would actually be best, since grow time is lower and traverse time is reasonable. However, it's incredibly difficult to write one, so a list is a much better choice.
Quote:Original post by Splinter of Chaos
You're right, it's not exactly going to be constant time. It's going to usually be constant time. I don't really understand "amortized", but I think that's what it means.
It's not so much about what it 'usually' is, but what the long term average of the time taken is. For example, pushing back 1024 elements should take about twice as long as pushing back 512, which should again take about twice as long as pushing back 256, etc. Going by those measurements one would have to assume each operation was constant time. It's only when you look at the time taken for each individual push_back that you notice that they don't all take the same amount of time.
Quote:
Quote:Original post by iMalc
Quote:Original post by Splinter of Chaos
However, lists do grow faster. When a vector is resized, it has to allocate new memory, copy the old, and delete the old pointer. When a list grows, you just allocate one node at a time; for constant-time appends, your list base class can hold a pointer to the last node. If growing, or sorting, is more important than iterating, lists are usually better.
That's not the whole story. Containers such as std::vector actually grow by a percentage of the original size (e.g. doubling), resulting in amortised constant time push_back operations. This is required by the C++ standard. In practice they aren't going to be much slower than lists at all.
Your linked resource forgot to mention the amortized part on that page (but got it on another). I'm surprised that reference missed it.
Yes, so am I.
Quote:Although, of interest might be this article, which half way down has a graph showing grow and traverse times, and a few other things as well. It looks like the list is actually slower to grow than the vector...but wait!
Quote:in fairness I gave vector special treatment by calling vector::reserve(1000000) first to avoid any reallocations.
So, it seems the test might be fair for predictable sizes, but we're talking about not knowing the full size. True, vectors have a great advantage when the end size in known, but are incredibly inefficient otherwise.
"incredibly inefficient" is far too strong of a way of putting it. It's actually pretty darn good in reality. Even if the push_back operation takes slightly longer for a vector, iteration through the vector is much much faster than that of a list, due to everything being in contiguous memory and hence more often in the cache. All good sources around will tell you that your default choice of container should always be vector.
Quote:
Quote:The primary reason to ever use a list is for insertion and removal from the middle, as well as to prevent iterators being invalidated by doing either of those things. Having a class with a very expensive copy-constructor is one other possible reason. Otherwise vector is a better choice.


Quick sorting is another reason. A really good one, actually. And, really, according to my link, a deque would actually be best, since grow time is lower and traverse time is reasonable. However, it's incredibly difficult to write one, so a list is a much better choice.
Sorting a list is not too bad as long as you correctly use the list's sort() member function and not std::sort from the algorithm header. Merge sort on a list is pretty simple, is always O(n log n), and only requires O(log n) stack space. Given that it's also stable, it's fairer to compare it against stable_sort for an array. Then I think you'd find that they're about the same speed.

One reasonable way of implementing a deque is actually really easy: just use a circular buffer, and keep track of the head and tail.
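
For instance, a fixed-capacity sketch in C (capacity a power of two so the wraparound is a cheap mask; growing the buffer is omitted for brevity):

#include <stddef.h>

#define DEQUE_CAP 8 /* must be a power of two for the masking below */

typedef struct {
    int buf[DEQUE_CAP];
    size_t head;  /* index of the front element */
    size_t count;
} Deque;

/* Both ends are O(1): the indices simply wrap around the buffer. */
int deque_push_back(Deque *d, int v)
{
    if (d->count == DEQUE_CAP)
        return -1; /* full */
    d->buf[(d->head + d->count) & (DEQUE_CAP - 1)] = v;
    d->count++;
    return 0;
}

int deque_push_front(Deque *d, int v)
{
    if (d->count == DEQUE_CAP)
        return -1; /* full */
    d->head = (d->head + DEQUE_CAP - 1) & (DEQUE_CAP - 1); /* step back, wrapping */
    d->buf[d->head] = v;
    d->count++;
    return 0;
}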
I think you're not justified in saying that a list is an outright better choice, it depends on the things I mentioned earlier.

Quote:Original post by chairthrower
I have read this before and it has always puzzled me. Assume that a std::vector doubles the size of its internal buffer on an overflow condition and that it initializes with a size of 1. Now if 1024 elements are appended using push_back() (without reserve), this will involve 10 buffer reallocations and a consequent copy of all elements in the buffer. Intuitively it strikes me that push_back() of n elements will be O(n log n). At least it appears to be of a different order from random access of one element, e.g. O(1).
You're forgetting that all of the items added since the last resize haven't had to be copied at all, those added since the reallocation before that have only been copied once, and so on.
Assuming doubling as the growth strategy, for a push_back of 1024 items:
That's 512 that didn't need copying
256 were copied once
128 were copied twice
64 were copied 3 times
32 were copied 4 times
16 were copied 5 times
8 were copied 6 times
4 were copied 7 times
2 were copied 8 times
1 was copied 9 times
Now sum them up:
512*0 + 256*1 + 128*2 + 64*3 + 32*4 + 16*5 + 8*6 + 4*7 + 2*8 + 1*9 = 1013 copies.
Very close to the number of push_back operations.
Now assume there were only half as many push_backs:
256*0 + 128*1 + 64*2 + 32*3 + 16*4 + 8*5 + 4*6 + 2*7 + 1*8 = 502 copies.
Very close to the number of push_back operations again.
Each item has on average been copied slightly less than once during all the push_back operations. Pretty neat, huh!
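
The same argument in general form: after n push_backs with doubling, n/2 elements were copied zero times, n/4 once, n/8 twice, and so on, so the total number of copies is

\[ \sum_{k \ge 1} k \cdot \frac{n}{2^{k+1}} \;<\; n \sum_{k=1}^{\infty} \frac{k}{2^{k+1}} \;=\; n, \]

i.e. fewer than one copy per element on average, which is exactly what "amortised constant time" means.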

[Edited by - iMalc on January 24, 2009 1:58:15 AM]
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

