It's your usual doubly linked list, and consumes no more memory than one (2 pointers per node, 1 pointer for the list head), but it allows for concurrent local insertion, self-removal and overtaking forward traversal. There are no formal locks, though the functions can block for a time if there are conflicts.
Now I'm the first to admit, I'm not sure if it's really lock-free or just looks that way. The basic mechanism of operation is message passing. When a node is removing, and the node previous to it is also removing, it will wait for a message. If the node to next to it is removing, it will send a message. These messages allow the threads to correctly restructure the list.
This mechanism is also used to insert nodes, though in this case it is assumed the previous node will not be removing, so only the next node will need a message. It is a safe assumption, I think, because if a thread is inserting, it must have control of the node it is inserting after. If it doesn't, then something else could remove that node, and you get a reference to an undefined space. I assume the programmer is smart enough to avoid this kind of fundamental error. An easy way to do this is to make sure you have an iterator attached to the reference node, which I will now discuss.
Iterator traversal is achieved by having the iterators insert themselves into the list, and sort-of "stack" on top of their current data node. This means you can start iterating from any node a thread has control of. The aforementioned message passing mechanism can also tell if a node is data or iterator, and act appropriately. This includes blocking the removal of data nodes with iterators attached, which means you can safely insert from them.
Note that inserting nodes and starting iterators at the start of the list is enabled by the use of a "virtual" node representing a node just before the start of the list.
To be clear, I have not rigorously proved the correctness of this code, but my testing has not revealed any problems (mind you this is only on a dual core system). This is mainly because I don't have the formal knowledge required to perform such a proof. However, after countless hours of informal analysis (i.e. thinking), I don't believe there are any errors.
Also, there are performance issues with the various wait loops. Specifically, they should spin a certain number of times before yielding, if they ever yield at all. However, I again don't know what strategy to use to balance the spin counts for optimal performance. It probably depends on the current usage too, so perhaps some dynamic system should be used. In my testing, using a load balancing algorithm between the threads worked very well, but I don't think that's an ideal solution. I'm not sure how to go about this exactly.
Finally, I could probably add reverse traversal too. I don't see any obvious problems with it, but I just haven't bothered to yet. Same for insertion previous to a node. However, for my purposes (the as yet unfinished concurrent garbage collector), these operations are unnecessary.
Let me know what you think. The code is well commented. I may also post it on github or some similar service eventually. What would you recommend?