
# Help with arrays


## Recommended Posts

Is there a way to change the size of an array after it has been created? And if there is, should it be done?

##### Share on other sites
Well sure, a couple of strategies exist.
(Note: the strategies discussed are pretty much for C programmers, but can be applied to C++, although I am sure there are other ways.)

OK, the first strategy is to use a pointer.

char *p = "Text";

printf("%c", p[0]);

strcat(p, " by pointer use is really cool!");

Strategy 2:

Use realloc() (sorry, never used this strategy; find some code for it)

Strategy 3:

Use a linked list

typedef struct node
{
int data;
struct node *next;
} node;

You can find more info on linked lists (and a handy review of pointers) at: http://cslibrary.stanford.edu/
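As a self-contained sketch of how that node type gets used (the helper names push_front and length are mine, not from the post):

```cpp
#include <cstdlib>

typedef struct node
{
    int data;
    struct node *next;
} node;

// Prepend a value; returns the new head (or the old head if malloc fails).
node* push_front(node* head, int value)
{
    node* n = (node*)std::malloc(sizeof(node));
    if (n == NULL)
        return head;
    n->data = value;
    n->next = head;
    return n;
}

// Count the elements by walking the next pointers.
int length(const node* head)
{
    int count = 0;
    for (; head != NULL; head = head->next)
        ++count;
    return count;
}
```

Because each element is allocated separately, the list grows one node at a time with no copying, which is exactly what makes it attractive when the size keeps changing.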

##### Share on other sites
The best way to use arrays is to just create the maximum size you'll ever need and be done with it. In most cases, the extra unused memory is trivial compared to the memory associated with a linked list. If you need to resize an array, you have to create a new one and copy the old elements into it (if you want to save the data). There may be other things stored in and around the array, so resizing isn't really feasible. Arrays are at their best when they are of fixed length and you never need (or almost never need) to resize them.
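A minimal sketch of that create-new-and-copy approach (the helper name grow_array is mine, not from the post):

```cpp
#include <cstdlib>
#include <cstring>

// Grow an int array to new_count elements, copying the old contents.
// Returns the new block (the old one is freed), or NULL on failure,
// in which case old_data is left untouched.
int* grow_array(int* old_data, size_t old_count, size_t new_count)
{
    int* new_data = (int*)std::malloc(new_count * sizeof(int));
    if (new_data == NULL)
        return NULL;                       // old_data is still valid
    std::memcpy(new_data, old_data, old_count * sizeof(int));
    std::free(old_data);
    return new_data;
}
```

Note that anything holding a pointer into the old block is invalidated by the copy, which is the "other things stored in and around the array" problem mentioned above.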

If the amount of the data is almost always changing and it cannot have a maximum, use a linked list instead. Also, linked lists work better for short lists of big objects, where the pointers that "link" objects are relatively small compared to the size of the objects they store.

For example, say you have a class called Document that stores information about a file that the user is working with. The user may have 5 or 9 files open, or any other number. What is best to use? Probably a linked list, because the Document class will probably be relatively large compared to the "linking" pointers. You also don't need the fixed memory for 10 documents (especially if they are very large objects) when the user only uses 1.

For another example, say you have a list of filenames. If you cannot determine the number of filenames or don't want a maximum (maybe you are doing a program that installs files?), then you would use a linked list. A measly 4 bytes won't hurt there. An array would require a lot of resizing or extra code that resizes after so many elements, but even that is just extra processing compared to a linked list.

For the third example, say you have some image data you'd like to store. When you work with it, using an array is very fast, especially so because the elements are so small (maybe 2-3 bytes), and resizing is not happening every second.

Random access with arrays is much faster than lists, but sequential access with linked lists can actually be (slightly, but not noticeably) faster. The bottom line is that some situations are better for arrays, and some are better for linked lists. Learning the different ways data can be stored makes your programming very flexible.

- null_pointer
Sabre Multimedia

##### Share on other sites
easiest way IMHO is realloc:

char* str = (char*) malloc(sizeof(char) * ARRAY_SIZE);

//do something with str

str = (char*) realloc(str, sizeof(char) * NEW_ARRAY_SIZE);

realloc will leave the contents of your array untouched, but the address may change. Always keep this in mind: it can cause VERY nasty memory problems if you don't immediately assign the array to its new address.

realloc(str, sizeof(char) * NEW_ARRAY_SIZE);
MIGHT appear to work (if the block didn't move), but it also might not.
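There is a further wrinkle: assigning realloc's result straight back to str loses the original block if the call fails. A safer sketch keeps the result in a temporary first (the wrapper name resize_buffer is mine, not from the post):

```cpp
#include <cstdlib>

// Resize *p to new_size bytes. On failure *p is left untouched and
// still valid; on success *p is updated to the (possibly moved) block.
// Returns true on success.
bool resize_buffer(char** p, size_t new_size)
{
    char* tmp = (char*)std::realloc(*p, new_size);
    if (tmp == NULL)
        return false;      // original allocation survives
    *p = tmp;              // adopt the new address
    return true;
}
```

With str = (char*)realloc(str, ...), a NULL return overwrites your only pointer to the old data, leaking it with no way to recover.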

##### Share on other sites
About the first strategy in the anonymous post :
the call to strcat is incorrect, because memory isn't allocated to hold the extra characters, and strcat doesn't check it! You need to use realloc first, otherwise you'll get some problems...

##### Share on other sites
In C++, if you really want a resizable array, you might want to consider an STL vector.
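For instance, a minimal sketch (the function name squares is mine, just for illustration):

```cpp
#include <vector>

// Build a resizable array holding the first n squares.
// push_back grows the vector automatically; reserve is optional
// and merely avoids intermediate reallocations.
std::vector<int> squares(int n)
{
    std::vector<int> v;
    v.reserve(n);              // pre-allocate capacity up front
    for (int i = 0; i < n; ++i)
        v.push_back(i * i);    // appends, resizing as needed
    return v;
}
```

The vector owns its memory, so there is no malloc/realloc/free bookkeeping to get wrong.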

null_pointer: I would like to see your justification in saying that sequential access is faster with a linked list.

##### Share on other sites
I recommend an STL vector too. Internally I believe they are always implemented as areas of contiguous memory (i.e. arrays), but the reallocation is handled for you. You can also reserve space in advance if you know how much space you will need. And if you use realloc to reassign to the existing array pointer, is there not a risk that the call may fail (depending on memory availability) and you end up losing all your original data anyway?

As for the array vs. linked lists thing, I think that in both cases, you are still dereferencing 1 pointer, whether that pointer is the array name or the 'next' pointer of your class. Sometimes you may be dereferencing a pointer to the object and a pointer to the next object, but by that point the object is likely in the cache anyway. I expect either way could be quicker depending on array size, cache size, code size, etc etc etc. One thing is true though - all these fears about linked lists being 'slow' (and using 'int somearray[478435]' as an alternative, which will thrash the cache chronically!) are really not that applicable in 95% of situations.

##### Share on other sites
quote:
Original post by null_pointer

...but sequential access with linked lists can actually be (slightly, but not noticably) faster.

SiCrane: I would like to give my justification, but I only said that they can be slightly faster -- it depends on how it's laid out. My knowledge stops there, so I was just offering some simple suggestions on when to use arrays and when to use linked lists. I also said that if the difference existed it would be negligible. I did not want this thread to turn into another container holy war. The guy was just asking about whether it was possible to reallocate arrays or not.

No C++ keyword exists for that (the way new and delete exist as replacements for malloc() and free()), so it is (logically) never necessary. If you must resize the array, you will have to destroy it and re-create it. No other (good) way around it.

- null_pointer
Sabre Multimedia

Edited by - null_pointer on 4/17/00 7:16:55 AM

##### Share on other sites
null_pointer: I'm not looking for a container holy war, but I don't like seeing untrue statements stand in the forums, and, even in the best of cases (linked list nodes allocated in a contiguous block of memory, etc.), I have yet to see a proper justification that sequential access in a linked list can be faster than sequential access in an array. If you can bring evidence that what you said was true, it would be wonderful. However, what you said implies that if allocation time is an insignificant portion of computation compared to access time, then linked lists should be favored over arrays, which is inconsistent with your own third example. I hold that if allocation time is an insignificant portion of computation time, then arrays should be favored over linked lists.

Kylotan: It's not quite as simple as dereferencing one pointer in both cases.

For 10 sequential accesses in an array, the code looks something like:

Node1 array[10];
Node1 *ptr = array;
for (i = 0; i < 10; i++) {
  ptr->data = some_data;
  ptr++;
}

vs.
Node2 *head = some_allocated_10_node_list;
Node2 *ptr = head;
for (i = 0; i < 10; i++) {
  ptr->data = some_data;
  ptr = ptr->next;
}

Assuming that the data member of either structure does not occur at the beginning of the memory for either, and assuming that the next pointer is at the beginning of the memory for the linked list representation, then any difference in speed would occur in ptr++ vs. ptr = ptr->next. Therefore the difference in the code is a dereference vs. an addition operation. If we're even nicer to the linked list representation, and say that the size of the linked list node is an exact divisor of the length of an L1 cache line (32 bytes on the current Intel x86 core), or is always allocated on a 32-byte boundary with ptr->data and ptr->next always occurring within the same 32-byte block, then ptr = ptr->next can't cause a cache miss, because the ptr->data line will prime the cache. So it all comes down to an addition operation vs. a dereference operation when you just look at the code. Of course, on the modern x86 processor they take the same number of cycles.

Now for the downside. Considering the size of the L1 cache, the ptr->data line is still more likely to cause an L1 cache miss in the linked list implementation, even given ideal circumstances (contiguous allocation of linked list nodes plus the previous assumptions), on the order of ((data_size + 4) / data_size) times more likely. (Unless you are using near pointers, but in that case you've got enough problems to worry about as it is.)

Furthermore, the results of a dereference operation are available later in the pipeline than those of an addition operation. Therefore, in the course of loop unrolling (a favorite compiler optimization), an additional 1-cycle overhead is likely to be assessed per loop.

And again, this is being extremely nice to the linked list implementation. If an operation is performed on the data element aligned at the beginning of the structure, then the linked list implementation shows an additional addition-operation penalty.

Without assumed contiguous memory allocation for linked list nodes, the probability of an L2 cache miss goes way up. (As does heap fragmentation, but that's less of an issue.) And without the contiguous allocation assumption, there's an additional chance for an L1 cache miss due to lack of priming.

Of course, you might argue that there is additional overhead present because the linked list would never be referenced inside a for loop in that manner. Instead you'd have:
Node2 *head = some_allocated_10_node_list;
Node2 *ptr = head;
while (ptr != NULL) {
  ptr->data = some_data;
  ptr = ptr->next;
}

So you've eliminated a loop variable as well as an addition op each loop. Of course, you've still got the problem that the dereference is available only late in the pipeline, so there's that 1-cycle penalty for pipeline stalls, which pretty much cancels out the gains on the addition op. And because there are now no useful heuristics available to the compiler, loop unrolling will either not be done, or will be performed in a less efficient manner.

But anyway, if we eliminate the +/- 1 cycle impacts as insignificant (which they may not be if they occur within a tight loop), we're left with L1 and L2 cache misses as the major contributors to linked list sequential access lagging behind arrays. We can minimize the comparative negative impact by maximizing the data in each node.
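If you want to see these effects yourself, the two traversals being compared boil down to the following (a sketch; the function names are mine, and absolute timings depend entirely on your cache and compiler, so none are claimed here):

```cpp
#include <cstddef>

struct Node { int data; Node* next; };

// Sequential access through a plain array: ptr++ walks contiguous
// memory, so the hardware prefetcher and cache lines work in its favor.
long sum_array(const int* a, int n)
{
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Sequential access through a linked list: ptr = ptr->next chases
// pointers, so locality depends on how the nodes were allocated.
long sum_list(const Node* head)
{
    long s = 0;
    for (; head != NULL; head = head->next)
        s += head->data;
    return s;
}
```

Timing each function over a few million elements (e.g. with std::clock() from <ctime>), once with nodes allocated contiguously and once with them scattered by interleaved allocations, is a simple way to observe the cache-miss gap discussed above.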

##### Share on other sites
Yama-hama, Si! Did you write a master's thesis on this?

I can't see any situation where a linear traversal would be faster for a llist either. I just wanted to add that there is also a much higher possibility of a page fault with a llist, since the nodes can be all over memory. Those are big penalties, although much less likely than cache misses. Put simply though, the cache was pretty much made for linear array access. Llists can come close (with some luck), but I seriously doubt they could beat an array any day of the year (for linear access; of course llists have tons of great features too).

One question Si. What did you mean by near pointers? Just about all pointers are near in protected mode.

Rock

##### Share on other sites
By near pointers, I meant the 2-byte, offset-only pointers from 16-bit segmented code (as opposed to 4-byte segment:offset far pointers). Again, if you're dealing with those, you've got enough problems as it is.