Like a lot of people here, I've also written a voxel renderer ;) It runs at 60 fps with about 60 million blocks in view, and most of the time is spent on terrain generation.
I've got a single vertex buffer with positions and normals, which is shared by all chunks - each chunk has only an index buffer to bind when it's drawn. The chunk position is passed in as a uniform into the vertex shader to offset the positions.
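To make the shared-buffer idea concrete, here is a minimal CPU-side sketch of one layout that would work. It is an assumption, not necessarily what my code does: one vertex for every grid corner of a 32^3 chunk, repeated once per face normal, so that a chunk's mesh is nothing but indices into that table. The names (`Vertex`, `sharedIndex`, `buildSharedVertices`, `appendFace`) are made up for illustration, and counter-clockwise front faces are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kChunkSize = 32;            // blocks per chunk edge
constexpr int kCorners = kChunkSize + 1;  // grid corners per chunk edge

struct Vertex {
    float px, py, pz;  // chunk-local position (the shader adds the chunk offset)
    float nx, ny, nz;  // face normal
};

// Six axis-aligned normals: +X, -X, +Y, -Y, +Z, -Z.
constexpr int kNormals[6][3] = {
    {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

// Index of the shared vertex at corner (x, y, z) that carries normal n.
inline std::uint32_t sharedIndex(int x, int y, int z, int n) {
    assert(x >= 0 && x < kCorners && y >= 0 && y < kCorners &&
           z >= 0 && z < kCorners && n >= 0 && n < 6);
    return static_cast<std::uint32_t>(
        ((n * kCorners + z) * kCorners + y) * kCorners + x);
}

// Built once and uploaded once; every chunk reuses it.
std::vector<Vertex> buildSharedVertices() {
    std::vector<Vertex> verts(
        static_cast<std::size_t>(6) * kCorners * kCorners * kCorners);
    for (int n = 0; n < 6; ++n)
        for (int z = 0; z < kCorners; ++z)
            for (int y = 0; y < kCorners; ++y)
                for (int x = 0; x < kCorners; ++x)
                    verts[sharedIndex(x, y, z, n)] = Vertex{
                        static_cast<float>(x), static_cast<float>(y),
                        static_cast<float>(z),
                        static_cast<float>(kNormals[n][0]),
                        static_cast<float>(kNormals[n][1]),
                        static_cast<float>(kNormals[n][2])};
    return verts;
}

// Append the two triangles of face n of block (bx, by, bz) to a chunk's
// index list. Only indices are produced; no per-chunk vertices exist.
void appendFace(std::vector<std::uint32_t>& indices,
                int bx, int by, int bz, int n) {
    const int axis = n / 2;         // 0 = X, 1 = Y, 2 = Z
    const int u = (axis + 1) % 3;   // the two in-plane axes
    const int v = (axis + 2) % 3;
    const bool positive = (n % 2 == 0);
    static constexpr int kQuad[4][2] = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};

    std::uint32_t corner[4];
    for (int i = 0; i < 4; ++i) {
        int p[3] = {bx, by, bz};
        p[axis] += positive ? 1 : 0;  // far side of the block for + faces
        p[u] += kQuad[i][0];
        p[v] += kQuad[i][1];
        corner[i] = sharedIndex(p[0], p[1], p[2], n);
    }
    // Reverse the winding for negative faces so both face outwards.
    static constexpr int kFront[6] = {0, 1, 2, 0, 2, 3};
    static constexpr int kBack[6] = {0, 2, 1, 0, 3, 2};
    const int* order = positive ? kFront : kBack;
    for (int i = 0; i < 6; ++i) indices.push_back(corner[order[i]]);
}
```

With this layout the shared table holds 6 * 33^3 = 215,622 vertices, and drawing a chunk amounts to binding its index buffer, setting the offset uniform and issuing one indexed draw.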
The biggest performance boost you'll find is effective cross-chunk culling. Although I'm rendering millions of blocks, I'm never rendering more than a few hundred thousand faces. Each chunk only updates its index buffer (and its neighbours') when its contents change.
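As a rough sketch of what cross-chunk culling can look like: a face is only kept if the block on the other side of it is empty, and that neighbour lookup goes through world coordinates so it works across chunk boundaries too. Everything here (`World`, `Chunk`, `countVisibleFaces`, treating missing chunks as empty) is a hypothetical illustration rather than a copy of my code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr int kChunkSize = 32;  // blocks per chunk edge

using ChunkKey = std::array<int, 3>;  // chunk coordinates, in whole chunks

// Floor division and modulo, so negative world coordinates land in the
// correct chunk (plain / and % round towards zero).
inline int floorDiv(int a, int b) {
    const int q = a / b;
    return (a % b != 0 && ((a < 0) != (b < 0))) ? q - 1 : q;
}
inline int floorMod(int a, int b) {  // assumes b > 0
    const int m = a % b;
    return m < 0 ? m + b : m;
}

struct Chunk {
    std::vector<std::uint8_t> solid = std::vector<std::uint8_t>(
        static_cast<std::size_t>(kChunkSize) * kChunkSize * kChunkSize, 0);

    static std::size_t index(int x, int y, int z) {
        assert(x >= 0 && x < kChunkSize && y >= 0 && y < kChunkSize &&
               z >= 0 && z < kChunkSize);
        return (static_cast<std::size_t>(z) * kChunkSize + y) * kChunkSize + x;
    }
};

struct World {
    std::map<ChunkKey, Chunk> chunks;

    static ChunkKey keyFor(int x, int y, int z) {
        return ChunkKey{floorDiv(x, kChunkSize), floorDiv(y, kChunkSize),
                        floorDiv(z, kChunkSize)};
    }
    static std::size_t localIndex(int x, int y, int z) {
        return Chunk::index(floorMod(x, kChunkSize), floorMod(y, kChunkSize),
                            floorMod(z, kChunkSize));
    }

    void setSolid(int x, int y, int z, bool s) {
        chunks[keyFor(x, y, z)].solid[localIndex(x, y, z)] = s ? 1 : 0;
    }

    // Assumption: chunks that do not exist yet count as empty space.
    bool isSolid(int x, int y, int z) const {
        const auto it = chunks.find(keyFor(x, y, z));
        if (it == chunks.end()) return false;
        return it->second.solid[localIndex(x, y, z)] != 0;
    }
};

// Count the faces of one chunk that survive culling. A real mesher would
// emit indices here instead of just counting.
int countVisibleFaces(const World& world, const ChunkKey& key) {
    static constexpr int kDirs[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    const auto it = world.chunks.find(key);
    if (it == world.chunks.end()) return 0;

    const int ox = key[0] * kChunkSize;
    const int oy = key[1] * kChunkSize;
    const int oz = key[2] * kChunkSize;
    int faces = 0;
    for (int z = 0; z < kChunkSize; ++z)
        for (int y = 0; y < kChunkSize; ++y)
            for (int x = 0; x < kChunkSize; ++x) {
                if (!it->second.solid[Chunk::index(x, y, z)]) continue;
                for (const auto& d : kDirs)
                    if (!world.isSolid(ox + x + d[0], oy + y + d[1],
                                       oz + z + d[2]))
                        ++faces;  // neighbour is empty: face is visible
            }
    return faces;
}
```

Two solid blocks at x = 31 and x = 32 sit in different chunks, but each still loses the face it shares with the other. That is also why changing a block on a chunk border means the neighbouring chunk has to be remeshed as well.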
Chunk size is also important. Too large, and you won't be able to use your generation time efficiently; too small, and you might as well be drawing individual blocks. My chunks are 32^3 (32,768 blocks each), and they stack infinitely in all directions, so you can go as far up or down as you like.
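To put some rough numbers on that trade-off, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that the roughly 60 million blocks in view are tiled completely by chunks; the helper names are made up.

```cpp
#include <cstdint>

// Blocks in a cubic chunk with the given edge length.
constexpr std::int64_t blocksPerChunk(std::int64_t edge) {
    return edge * edge * edge;
}

// Chunks needed to cover a given number of blocks, rounded up.
constexpr std::int64_t chunksToCover(std::int64_t blocks, std::int64_t edge) {
    return (blocks + blocksPerChunk(edge) - 1) / blocksPerChunk(edge);
}

static_assert(blocksPerChunk(32) == 32768, "a 32^3 chunk holds 32,768 blocks");
```

For 60,000,000 blocks this gives 14,649 chunks at 16^3, 1,832 at 32^3 and 229 at 64^3. Each chunk costs one index buffer bind and one draw, so smaller chunks mean more of those, while larger chunks mean every rebuild has to walk more blocks (262,144 at 64^3 against 32,768 at 32^3).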
It also helps to use any idle time you have for generation. I haven't done this in my code, but when a chunk is requested and there isn't time to generate it, you can push it onto an idle queue and generate it when you next have spare time.
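Since I haven't implemented this myself, the following is only a sketch of how such an idle queue might look. The class name, the callback and the time-budget approach are all assumptions: chunks that can't be generated right now are deferred, and whenever the frame has time left over you drain the queue until a budget runs out.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <set>
#include <utility>

using ChunkKey = std::array<int, 3>;  // chunk coordinates, in whole chunks

class IdleGenerationQueue {
public:
    // `generate` is whatever actually builds the chunk (hypothetical hook).
    explicit IdleGenerationQueue(std::function<void(const ChunkKey&)> generate)
        : generate_(std::move(generate)) {
        assert(generate_);  // an empty callback would throw when invoked
    }

    // Remember a chunk for later; repeated requests are only queued once.
    void defer(const ChunkKey& key) {
        if (queued_.insert(key).second) pending_.push_back(key);
    }

    // Generate deferred chunks, oldest first, until the budget is used up.
    // Returns how many chunks were generated. A chunk that has been started
    // always finishes, so one slow chunk can overrun the budget.
    std::size_t runFor(std::chrono::microseconds budget) {
        using Clock = std::chrono::steady_clock;
        const auto deadline = Clock::now() + budget;
        std::size_t done = 0;
        while (!pending_.empty() && Clock::now() < deadline) {
            const ChunkKey key = pending_.front();
            pending_.pop_front();
            queued_.erase(key);
            generate_(key);
            ++done;
        }
        return done;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::function<void(const ChunkKey&)> generate_;
    std::deque<ChunkKey> pending_;  // FIFO order of deferred chunks
    std::set<ChunkKey> queued_;     // for de-duplicating requests
};
```

In a frame loop you would call something like `queue.runFor(timeLeftThisFrame)` after rendering, where `timeLeftThisFrame` is however you measure the slack before the next frame is due.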
Hope that helps! You can find my code on GitHub: just search for Bloxelcraft or Wren6991 if you'd like a look. It is in C++, though.