A new type of data?

Started by
7 comments, last by vallentin 16 years ago
Dancing B+trees linked at the bottom with propagated statistics and page momentae(cached pointers to pages in another dancing B+ trees signifiantly smaller);vacuum against time intervals;pointers in the main btree to pages-with-statistics changed at partial vacuum time;ballanced subtrees at page moments after vacuum there;threads that gain or lose speed to do partial vacuum continuously;dimensions needed in base of a subtree f(n):actual relation to preserve r(f(n)/crt,crt/f(nlastvacuum) r in {<=,>=};statistics are made global and partial per subtree and use counts and medium depth;using pair of before-last level where applicable keys to evenly split data between not full related pages and splitting in a custom number of pages;algorithm to partial vacuum start from the belt and rise up from left to right just creating new and deleting marginal values from other pages [Edited by - vallentin on April 4, 2008 11:34:48 AM]
Advertisement
Great idea, altough I'm not sure it's orthogonal enough...
Since your previous discussion about a new programming language, I've finished the iterator functionality of my Extensible Compressed Trie. I've also got it hosted on its own site with a Wiki. You can view it at its SourceForge project page. Compare it with what you've got and tell me what you think of it.
I haven't got the chance to look over the code so I will ask you directly.It is a B+(how?) or a B?
Put simply, it is a hybrid between a B-tree and a Trie leaning more heavily on the side of the Trie.

When a node in the tree has only one child, it holds the index and the link in the node itself without allocating any branch space for the rest of the structure. When that node gets a second child, it allocates a vector with (depending on which constructor you used) enough memory to hold both children's links in the vector using either the largest index as the maximum node index to use, or the maximum specified in the constructor.

If the maximum node size was specified in the constructor the write time is O(1) but the memory consumption is higher. If you didn't specify a maximum node size in the constructor, then that introduces the possibility of a node resize operation which has a O(n) with a small value of n representing the number of indexes in the existing node that need to be copied to the new node. The read time is always O(1) and the iterator operates on O(n) for every advance to the iterator.
Quote:Original post by samuraicrow
Put simply, it is a hybrid between a B-tree and a Trie leaning more heavily on the side of the Trie.

When a node in the tree has only one child, it holds the index and the link in the node itself without allocating any branch space for the rest of the structure. When that node gets a second child, it allocates a vector with (depending on which constructor you used) enough memory to hold both children's links in the vector using either the largest index as the maximum node index to use, or the maximum specified in the constructor.

If the maximum node size was specified in the constructor the write time is O(1) but the memory consumption is higher. If you didn't specify a maximum node size in the constructor, then that introduces the possibility of a node resize operation which has a O(n) with a small value of n representing the number of indexes in the existing node that need to be copied to the new node. The read time is always O(1) and the iterator operates on O(n) for every advance to the iterator.


why do you resize nodes-isn't it paginated?The rest should be U and L.Obviously you have already known that(btree in wikipedia).

[Edited by - vallentin on March 23, 2008 2:34:30 PM]
The B-Tree similarity ends with the way that compressed nodes are stored. If you specify a maximum partial-key value in the constructor then the nodes allocate the biggest size necessary and never resize. I guess I shouldn't have brought this up in a B-Tree thread when clearly my data structure is primarily a trie (Wikipedia article on trie) intended to store strings of a data type such as a character string or a string of tokens.

The reason you'd want a resizable node is if you don't know what the maximum value of the partial-key is at the time of the creation of the trie but want to be able to repeatedly read back in constant time what was stored once in linear time. For example, in a compiler I'm working on there is a different token ID assigned to every data type. Since we don't know how many class types are present until we reach the end of the file on the first pass, we have to keep resizing the trie until we have added the last class definition. On the second pass (when we do the code generation) we have to access the trie to look up the type of each variable each time a variable is accessed. That information is read from the trie many times more often then it is written to. This makes the resizing of the trie nodes an acceptable sacrifice.
And what problem are you trying to solve?
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
Quote:Original post by Promit
And what problem are you trying to solve?


It was a faster-to-retrieve data type.Maybe you were talking to samuraicrow

This topic is closed to new replies.

Advertisement