Memory bandwidth is the bottleneck these days.
Bring on the triple channel! I was very upset when I learned that DDR3 implementations weren't supporting triple channel! I think it was only one or two Intel boards that did. Of course, you could always build a system using server hardware.
I was far more disappointed when I read several articles about how "we don't need triple channel memory". Well, yeah, no shit: we can't make good use of triple channel if it isn't available to develop on, numb-nuts!
Quad channel on DDR4 shows next to no improvement, never mind triple channel.
Why does it show no improvement?
Let's talk about that, actually.
Can the OS not facilitate operations on multiple memory channels in parallel?
Does the software showing no improvement not make use of multiple channels?
The OS cannot see the multiple channels, in fact. More on this in a moment.
It does seem to me, though, that if you could write a program that places blocks of data on each channel, it would be trivial to utilize all four channels and achieve that maximum throughput.
How do you create blocks of data on each channel? I'll wait.
You have to remember, first and foremost, that any given program does not interact with the actual memory architecture of the system. Not ever. Let's work from the bottom up - a single stick of memory. No, wait, that's not the bottom. You have individual chips with internal rows and columns, themselves arranged into banks on the DIMM. Access times are variable depending on the access pattern within the stick!
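To make that concrete, here's a toy sketch of how an address might split into row/bank/column fields. The bit widths below are made up for illustration; real DIMMs vary, and real controllers shuffle these bits around.

```python
# Toy decomposition of a DRAM address into (row, bank, column).
# These field widths are illustrative only, not any real DIMM's layout.
COL_BITS = 10   # 1024 columns per row
BANK_BITS = 3   # 8 banks per chip
ROW_BITS = 16   # 64K rows per bank

def decompose(addr):
    """Split an address into (row, bank, column) under the toy layout."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, col

# Two accesses to the same open row in a bank are fast; switching rows
# within a bank forces a precharge + activate cycle, which is slower.
print(decompose(0x1234))  # → (0, 4, 564)
print(decompose(0x12F0))  # → (0, 4, 752)  same row, same bank, new column
```

That's why "access times are variable": the same-sized read costs different amounts depending on which row happens to be open in the bank you hit.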
But let's ignore the internals of a stick of RAM and naively call each of them a "channel". How do the channels appear to the OS kernel? Turns out they don't. The system memory controller assembles them into a flat address space ("physical" addressing) and gives that to the kernel to work with. Now, a computer is not total chaos, and there is rhyme and reason to the mapping between physical address space and actual physical chips. Here's an example. There are no guarantees that this is consistent across any category of machines, of course. Also note that the mapping may not be zero-based, and please read the comments in that article regarding Ivy Bridge's handling of channel assignment.
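A common scheme is for the controller to pick a channel per cache line. This is a naive sketch of that idea, assuming 64-byte lines and simple round-robin; real controllers (Ivy Bridge's hashed channel selection, for instance) are more elaborate than this.

```python
# Naive "alternate by cache line" channel selection, as a memory
# controller might do it. Purely illustrative.
CACHE_LINE = 64
NUM_CHANNELS = 4

def channel_of(phys_addr):
    # Drop the 6 offset bits inside the line, then round-robin the channels.
    return (phys_addr // CACHE_LINE) % NUM_CHANNELS

# Walking a contiguous physical range touches every channel in turn:
print([channel_of(a) for a in range(0, 64 * 8, 64)])
# → [0, 1, 2, 3, 0, 1, 2, 3]

# Two addresses within the same cache line always hit the same channel:
print(channel_of(0x1000), channel_of(0x103F))  # → 0 0
```

Note the consequence: a plain sequential scan of physical memory already spreads its traffic across all four channels, with no cooperation from software.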
Oh, but wait: we're not actually interacting with any of that in development. All of our allocations happen in virtual address space, and that mapping IS basically chaos. There's no way to predict or control how it will be set up, and it's not even constant for a given address during the program's execution. You have no visibility into this mapping without a kernel-mode driver or a side-channel attack.
Just a reminder that most programmers don't allocate virtual memory blocks directly either. We generally use malloc, which is yet another layer removed.
The answer to "how do you create blocks of data on each channel" is, of course, that you don't. Even the OS doesn't, and in fact it's likely to choose an allocation scheme that actively discourages multi-channel memory access. Why? Because it has a gigantic virtual<->physical memory table to manage, and keeping that table simple means faster memory allocations and less kernel overhead. It's been a while since I dug into the internals of modern kernel allocators, but storing mappings for entire ranges of pages saves a lot of memory versus keeping a disparate entry for each and every page. Large block allocations are also likely to be freed as blocks, making free-list management easier. Long story short, the natural implementation of an allocator leans towards handing out contiguous blocks of memory.

So how do you deal with that as a CPU/memory controller designer? Based on the link above, you simply alternate channels by cache line. Or you scramble the physical address map across individual DRAM banks and chips. Remember that Ivy Bridge channel assignment bit? Yep, that's what happened.
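To see why contiguous ranges make the kernel's bookkeeping cheaper, here's a toy comparison of per-page entries versus a single range record. This is not how any real kernel stores its page tables; it just illustrates the space argument.

```python
# Toy illustration: range-based mapping records vs. one entry per page.
# Not a real kernel data structure; the point is the entry count.
PAGE = 4096

def per_page_map(va_start, pa_start, npages):
    # One entry per page: npages entries for one allocation.
    return {va_start + i * PAGE: pa_start + i * PAGE for i in range(npages)}

def range_map(va_start, pa_start, npages):
    # A single (va, pa, npages) record covers the whole contiguous run.
    return [(va_start, pa_start, npages)]

def lookup(ranges, va):
    """Translate a virtual address using the range records."""
    for start, pa, n in ranges:
        if start <= va < start + n * PAGE:
            return pa + (va - start)
    return None

pages = 1024  # a 4 MiB contiguous allocation
print(len(per_page_map(0x10000000, 0x400000, pages)))  # → 1024
print(len(range_map(0x10000000, 0x400000, pages)))     # → 1
print(hex(lookup(range_map(0x10000000, 0x400000, pages), 0x10001234)))
# → 0x401234
```

One record instead of a thousand, and frees can drop the whole range at once. That pressure is exactly why allocators favor contiguous blocks, and why the interleaving has to happen below them, in hardware.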
Frankly, the benefits of multi-channel memory probably show up almost exclusively under heavy multitasking with bandwidth-hungry workloads. I bet browsers love it :D