You should get it working first, then multithread it. Multithreading is an optimization: you optimize after you've gathered profiling data to pick the right places, and you profile after you have a working solution. So I'm afraid that, technically, you can't have a multithreading headache while you're still trying to understand the basic concept ;)
And yes, that's how binning works, but usually the screen is divided into more than 4 tiles. With only 4, each framebuffer tile would exceed the cache size at most resolutions. That's why I've put it in a loop.
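A minimal Python sketch of that loop, assuming screen-space 2D vertices and square tiles. `TILE_SIZE` and `bin_triangles` are hypothetical names, and the bounding-box overlap test is a conservative simplification (a triangle can end up in a tile its bounding box touches even if its edges don't). Each bin stores triangle indices, not vertices.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]

# Hypothetical tile size in pixels; pick it so one tile's color + depth fits in cache.
TILE_SIZE = 64


def bin_triangles(triangles: List[Triangle], width: int, height: int,
                  tile_size: int = TILE_SIZE) -> List[List[int]]:
    """Return one bin per tile (row-major order). Each bin holds the indices
    of the triangles whose screen-space bounding box overlaps that tile."""
    tiles_x = (width + tile_size - 1) // tile_size   # round up so edge pixels get a tile
    tiles_y = (height + tile_size - 1) // tile_size
    bins: List[List[int]] = [[] for _ in range(tiles_x * tiles_y)]

    for tri_id, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Pixel-space bounding box, clamped to the screen.
        min_x = max(0, math.floor(min(xs)))
        max_x = min(width - 1, math.floor(max(xs)))
        min_y = max(0, math.floor(min(ys)))
        max_y = min(height - 1, math.floor(max(ys)))
        if min_x > max_x or min_y > max_y:
            continue  # completely off-screen, goes into no bin
        # Visit every tile the bounding box touches and record the triangle's index.
        for ty in range(min_y // tile_size, max_y // tile_size + 1):
            for tx in range(min_x // tile_size, max_x // tile_size + 1):
                bins[ty * tiles_x + tx].append(tri_id)
    return bins
```

For a 256x128 screen this gives 4x2 = 8 bins; a small triangle straddling a tile corner is appended to all four bins it touches. Afterwards you rasterize tile by tile, walking only that tile's bin.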
And I suggest again that you add just the triangle's index to your bins, not the full vertices. An ID can be 16, 24 or 32 bits, while your 3x vec4/float4 take 48 bytes. That's wasteful, especially once you've created more bins.
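To put numbers on that, a small Python sketch using the standard `struct` module, assuming each vertex is 4 x float32 (x, y, z, w) and little-endian packing. The helpers `pack_id24`, `unpack_id24` and `bin_bytes` are hypothetical; since there's no native 24-bit integer type, the 24-bit ID is packed by hand.

```python
import struct

# Full copy of a triangle: 3 vertices x float4 = 12 float32 values.
FULL_TRIANGLE = struct.Struct("<12f")
# Triangle IDs at the widths mentioned above (24-bit has no struct format code).
ID16 = struct.Struct("<H")
ID32 = struct.Struct("<I")
ID24_SIZE = 3


def pack_id24(tri_id: int) -> bytes:
    """Pack a triangle index into 3 little-endian bytes."""
    if not 0 <= tri_id < (1 << 24):
        raise ValueError("triangle id does not fit in 24 bits")
    return tri_id.to_bytes(ID24_SIZE, "little")


def unpack_id24(data: bytes) -> int:
    """Recover the triangle index from its 3-byte form."""
    return int.from_bytes(data, "little")


def bin_bytes(entry_size: int, entries: int) -> int:
    """Memory one bin needs when it holds `entries` triangles."""
    return entry_size * entries


for name, size in [("full vertices", FULL_TRIANGLE.size), ("32-bit id", ID32.size),
                   ("24-bit id", ID24_SIZE), ("16-bit id", ID16.size)]:
    print(f"{name:>13}: {size:2d} bytes/entry, "
          f"{bin_bytes(size, 1000):6d} bytes per 1000 triangles")
```

A triangle that overlaps several tiles is stored once per bin, so whatever an entry costs gets multiplied by the number of bins it lands in; with IDs, the vertices themselves live once in a shared array.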
There is no point in spending more memory on bins than you spend on pixels per tile, as the whole idea of binning is to save memory bandwidth by being cache-friendly.
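As a rough budget check, a Python sketch of that rule of thumb. The 64x64 tile and 8 bytes per pixel (32-bit color plus 32-bit depth) are assumptions for illustration, and `tile_pixel_bytes` / `max_bin_entries` are hypothetical helpers.

```python
def tile_pixel_bytes(tile_size: int, bytes_per_pixel: int) -> int:
    """Bytes of framebuffer data (color + depth) that one square tile covers."""
    return tile_size * tile_size * bytes_per_pixel


def max_bin_entries(tile_size: int, bytes_per_pixel: int, entry_size: int) -> int:
    """How many bin entries fit before the bin outweighs the tile's pixels."""
    return tile_pixel_bytes(tile_size, bytes_per_pixel) // entry_size


TILE = 64       # assumed tile edge in pixels
BPP = 4 + 4     # assumed 32-bit color + 32-bit depth per pixel

print("tile pixels:", tile_pixel_bytes(TILE, BPP), "bytes")
print("48-byte full triangles per bin:", max_bin_entries(TILE, BPP, 48))
print("4-byte triangle ids per bin:", max_bin_entries(TILE, BPP, 4))
```

With these assumed numbers a tile covers 32 KiB of pixels, so a bin can hold 8192 32-bit IDs but only 682 full-vertex triangles before it's bigger than the tile it's meant to keep in cache.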