Why do we need to iterate data structures as they are already in RAM

Started by
19 comments, last by jpetrie 9 years, 10 months ago

Even a map iterates through a tree. As I said before. Iteration is done somewhere but mostly hidden behind some function or encapsulated in a class/template or however you name it.

I was not talking about trees as a data structure, but as a graphical control... neither was the OP.

This is also why I said that there's a direct relationship between the map keys and their values... If you use the pointers to the values as keys, there's no iteration during lookup. In my experience, this is how most tree-controls are implemented... they use pointers (or handles) to the data.

The graphical tree control is free to use whatever implementation it wants in order to speed up the lookups...


But even the traversal of a tree is done in steps that are done the same way in each step, so it is an iteration anyways you use a loop inside or recursive call to the traversal function.

You do not need to traverse the whole map just to get to the item you want... That's the whole point of maps.

Advertisement


Iteration means repeating something, usually refers to executing the body of a loop repeatedly (because it is a loop). It doesn't mean executing an algorithm. And a map data structure doesn't need to be implemented as a tree.

A loop is an algorithm too. Not very complex but it describes how something has to be done.

I dont said that the map is implemented in a tree. But even the traversal of a tree is done in steps that are done the same way in each step, so it is an iteration anyways you use a loop inside or recursive call to the traversal function.

Maps maybe implemented as hashtables. But even there you only get a good guess where to start the search, that afterwards is iterated through a collection of data to find the right one.

But it is not true. There are data structures which can perform constant time random access, notably arrays as mentioned earlier in the thread. To look up an element in the array from its position (index), you take the base address of the array, add to it the index of the element to be accessed multiplied by the element size in bytes, and that gives you the address of the element you are looking for, which you can then read. Directly. If you want to argue that this process consists of multiple steps and is implemented in assembly as a sequence of a handful of instructions and is thus a form of "iteration", you are free to do so, and you will have successfully redefined iteration to something different than what it means to every other programmer on the planet, and nobody will understand you.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

I'm going to call it a night after this post, but I'll be back.

I'm working on a solution. I'm a little tired right now, so I'll keep it short, sweet, and rough around the edges. I was thinking I'd use a geometric data structure that could morph from one data structure type to another. Such as, a list can morph into a tree, and a tree can morph into a graph. A graph can morph into an x/y grid. The 2nd thing about it is the data structure can adjust Level Of Detail, or index resolution. This way, the computer doesn't have to move across a high resolution data structure. It can zoom out for traveling long distance with less iteration, and zoom in at the right moment to line up on the desired index. There's other tricks that can be done to make it even more efficient, but by figuring out how the data structures can best adapt to their environment is imperative to their survival. wink.png The most important trick is to keep it all simple, and yet give it plenty of depth. If I can do that, I'll have a low memory footprint, a low processor footprint, and a low time/effort development footprint. Happy outcomes all around.

mellow.png


Iteration means repeating something, usually refers to executing the body of a loop repeatedly (because it is a loop). It doesn't mean executing an algorithm. And a map data structure doesn't need to be implemented as a tree.

A loop is an algorithm too. Not very complex but it describes how something has to be done.

I dont said that the map is implemented in a tree. But even the traversal of a tree is done in steps that are done the same way in each step, so it is an iteration anyways you use a loop inside or recursive call to the traversal function.

Maps maybe implemented as hashtables. But even there you only get a good guess where to start the search, that afterwards is iterated through a collection of data to find the right one.

But it is not true. There are data structures which can perform constant time random access, notably arrays as mentioned earlier in the thread. To look up an element in the array from its position (index), you take the base address of the array, add to it the index of the element to be accessed multiplied by the element size in bytes, and that gives you the address of the element you are looking for, which you can then read. Directly. If you want to argue that this process consists of multiple steps and is implemented in assembly as a sequence of a handful of instructions and is thus a form of "iteration", you are free to do so, and you will have successfully redefined iteration to something different than what it means to every other programmer on the planet, and nobody will understand you.

Mmmh.

I think you misread something. From my very first post.

If you organize your data in an array you even iterate through memory but because you know that the data structures are layed out in an successive manner you can calculate the address of any structure in this array from the startaddress, the index and the size a single structure occupy.

Same as you said.

Yes a map is something that maybe organized as an simple array.

I do not talked about assembly language and the like. So I dont understand what makes you so upset. I only wanted to say that "iteration" means doing something in the same way over and over again. And the only thing I do not agree is that iteration is limited to loops. Iteration is anything that is done over and over again until the goal of execution is reached.

And the other thing I said. Loops are algorithms too.

Thats all.

I'm not sure I understand correctly. Are you asking why we have to look at all values in the array while searching for the one we need?

Imagine with stack of paper pages with just 1 word on each and you need to find page with specific word. You cannot take it out on first try, so you look at all pages 1 by 1 until you find what you need. Same principle is here.

The thing with list and tree is, each time you create a node, you allocate it with new or malloc, which return some memory address, those address could be anything, they dont follow a pattern like an array where you can calculate the address of the next element. Even worst, list and tree might have their nodes moved wherever the programmer want to, so their is no linearity for those addresses, it's more or less random locations, and that's why you have to traverse the list to find the element you need.

Your question is like asking, but why do i need a street and address number to find x building in a city, insn't the city already know where it is? Unless the building is very popular or the town is very small, you wont get any result asking the mayor for the address your looking for.

The array is like a bunch of houses on the same street. Everything is one after the other. A list or tree are more like picking a bunch of houses at random.

Enough with the metaphores, on win32, every process has 4gb of (virtual) memory, your cpu can't reallisticly store all this in is tiny 1mb cache or whatever the number, so it as to fetch it from ram memory. It also grab the next fews bytes while doing so, so if you iterate over an array, it's faster cause you don't need to constantly fetch from ram to cpu cache. With list or tree, since the data are at more or less random location, you have to search the address first before loading the data (into the cpu cache or registers) and do something with it.

It's a trade-off between efficiency(array) and flexibility(list or tree ect)

Knowing something is in memory, and knowing where something is in memory are very different things. Data structures all have very specific hard guarantees to fit their usage cases, and this is where you have to make your assumptions.

Arrays are contiguous by definition (though they can be indirect arrays of pointers instead of local object data). This means that if you know the relative offset of your object (say, by index), you can jump straight to the object you want by pointer arithmetic. This is what people mean by "random access".

Linked lists are not guaranteed to be contiguous. You can certainly make one that is, but enforcing that constraint pretty much defeats the purpose of using one altogether. What this means is that your item (e.g. item 3) is somewhere in memory, but you have no idea where. Traversing a linked list is sort of like those breadcrumb-following riddles, where each destination tells you where to go next.

As for your other question -- no, objects are probably in RAM in their entirety, even when you're working with a pointer to them. I say "probably" because RAM is a hardware concept, and memory in software-land can constitute more than ram (e.g. your hard disk).


Knowing something is in memory, and knowing where something is in memory are very different things.

QFE.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

First off, I hope I placed this in the appropriate section. If so...

I've been programming for a while, but this may be a simple question to some (or many).

My Question: If a data structure already exist directly in RAM, then why do we need to spend the processing power to iterate in order to find the right data? (Side Note: pointers still exist in managed languages. They're just handled for you)

All that would actually be in RAM is:

index 0, index1, index 2

Index 0: --> GET VALUE '10'; FALSE; Next Address!

Index 1: --> GET VALUE '15'; FALSE; Next Address!

Index 2: --> GET VALUE '5'; TRUE; RETURN Value!

It doesn't work like this. RAM is like one huge array of bytes e.g. char MyWholeRAM[4 294 967 296] for 4 GB.

Every byte (char) has a unique address numbered from 0x00000000 - 0xffffffff

I'll spare you details about paging and kernel space/user space.

A pointer is just an address of one location stored at another location. The CPU can only load a few values at once in its registers so iterating through RAM means loading data in registers, do a calculation or comparison, and store the value.

Some times it needs to know that another value is at a particular place in RAM and that's when a pointer tells it where it is.

A linked list would be stored as a pointer to the next node or NULL if there are no more nodes, and the actual data. The other node would again contain a pointer (really just a 4 byte value like another or 8 bytes in case of x64)

This topic is closed to new replies.

Advertisement