I've been using append buffers for a while out of convenience, on the assumption that I might switch to other algorithms once I get other things working. They're a temporary solution that works and isn't prohibitively slow for what I want. I like them for what they are, but I haven't gotten around to comparing their performance against the alternatives. I know I could do this myself, but I was hoping someone else has experience with it.
One thing I use them for is generating arbitrary amounts of data from a compute thread. A thread might output 0-N entries, and you just call buffer.Append however many times you want and it works. There are a lot of other ways to do this, though. For example, is there an advantage to using append buffers over atomically incrementing an index value yourself (e.g. with InterlockedAdd)? This is probably the simplest test there is, which shows how lazy I am for not trying it, but I've always wondered whether append buffers do something more complex than this. The actual counter variable needs to be 4096-byte aligned for some reason (is this a bug?), unlike a regular integer, so maybe this strict alignment rule gives better perf somehow?
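To be concrete about the manual alternative I have in mind, here is a rough CPU-side sketch in C++, with `std::atomic::fetch_add` standing in for the HLSL atomic add. `ManualAppendBuffer` and its members are names I made up for illustration, and I'm only assuming that Append behaves roughly like this against its hidden counter; I haven't verified that.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a read/write buffer plus a separate counter.
struct ManualAppendBuffer {
    std::vector<std::uint32_t> data;      // preallocated output storage
    std::atomic<std::uint32_t> count{0};  // number of slots requested so far

    explicit ManualAppendBuffer(std::size_t capacity) : data(capacity) {}

    // Reserve one slot with an atomic add, then write into it.
    // Returns false if the buffer is already full.
    bool append(std::uint32_t value) {
        std::uint32_t slot = count.fetch_add(1, std::memory_order_relaxed);
        if (slot >= data.size()) return false;  // counter can overshoot capacity
        data[slot] = value;
        return true;
    }
};
```

Each caller can invoke `append` zero or more times. Note that `count` keeps growing past the capacity on failed appends, so it would need clamping to `data.size()` before being used as an element count, and the output order depends on which caller reaches the counter first.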
There are also algorithms like scan or histopyramid that could be used for data compaction. Is it assumed that these will generally be faster than an append buffer? I understand they're rather different and that performance varies based on the data in the latter case, but it would be nice to know whether append buffers are just always the slowest option for things like stream compaction or filter operations.
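For comparison, this is the kind of scan-based compaction I mean, written as a sequential C++ sketch. On the GPU the prefix sum would be a parallel scan; the function name `compactWithScan` and the `keep` flag array are my own placeholders, and the sketch assumes `keep` has the same length as `input`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Keep the elements whose flag is nonzero, preserving their original order.
// Assumes keep.size() == input.size().
std::vector<std::uint32_t> compactWithScan(const std::vector<std::uint32_t>& input,
                                           const std::vector<std::uint8_t>& keep) {
    const std::size_t n = input.size();

    // Exclusive prefix sum of the flags:
    // offsets[i] = number of kept elements before index i.
    std::vector<std::uint32_t> offsets(n);
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offsets[i] = running;
        running += keep[i] ? 1u : 0u;
    }

    // Scatter: every kept element already knows its output slot,
    // so no atomic counter is involved.
    std::vector<std::uint32_t> output(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (keep[i]) output[offsets[i]] = input[i];
    }
    return output;
}
```

One difference that is independent of speed: the scan version keeps the surviving elements in their input order, whereas a counter-based append writes them in whatever order the threads happen to reach the counter.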
Edit: Also, is there an advantage to using consume buffers instead of addressing the buffer directly? If I create a buffer with N items, I can just reference the items like buffer[index], so why bother consuming? I guess it resets the counter, but that's kind of a trivial thing to do.
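My mental model of consuming, as a C++ sketch: an atomic decrement of a counter followed by a read at the resulting index. `ManualConsumeBuffer` is a made-up name, and the guard against consuming from an empty buffer is my own addition rather than something I know the real thing does.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer of N items plus a "remaining" counter.
struct ManualConsumeBuffer {
    std::vector<std::uint32_t> data;
    std::atomic<std::uint32_t> remaining;

    explicit ManualConsumeBuffer(std::vector<std::uint32_t> items)
        : data(std::move(items)),
          remaining(static_cast<std::uint32_t>(data.size())) {}

    // Take one item off the end. Returns false when nothing is left.
    bool consume(std::uint32_t& out) {
        std::uint32_t current = remaining.load(std::memory_order_relaxed);
        // Compare-and-swap loop so the counter never wraps below zero.
        while (current != 0) {
            if (remaining.compare_exchange_weak(current, current - 1,
                                                std::memory_order_relaxed)) {
                out = data[current - 1];
                return true;
            }
            // On failure, current has been reloaded; try again.
        }
        return false;
    }
};
```

Seen this way, the difference from `buffer[index]` is that callers don't need to know up front which index is theirs: each successful `consume` hands out a distinct item, starting from the end of the buffer.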