It might be easier to think of it this way:
Vector2 vertices[] = {
    Vector2(-1.0f, -1.0f), // Index 0 (bottom left)
    Vector2(1.0f, -1.0f),  // Index 1 (bottom right)
    Vector2(-1.0f, 1.0f),  // Index 2 (top left)
    Vector2(1.0f, 1.0f),   // Index 3 (top right)
};
Triangle indices[] = {
    Triangle(0, 3, 1), // Bottom left to top right to bottom right triangle
    Triangle(0, 2, 3), // Bottom left to top left to top right triangle
};
Each index refers to a point on the quad, and you are ultimately drawing two triangles to cover it. For a quad, an index buffer doesn't help all that much, but for something like a cube the amount of data sent to the GPU can get a lot smaller (especially when the vertices encode more than just positions), because each shared corner is referenced by a 16-bit index instead of repeating all three of its floats.
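To put rough numbers on that, here is a small sketch of the arithmetic. The helper names are made up for illustration, and it assumes 4-byte floats, position-only vertices, 16-bit indices, and a cube whose 12 triangles share 8 corners. Under those assumptions the quad goes from 48 bytes to 44, while the cube goes from 432 bytes to 168.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, only here to illustrate the size comparison.

// Bytes needed when every triangle corner carries its own full vertex.
constexpr std::size_t nonIndexedBytes(std::size_t triangleCount,
                                      std::size_t bytesPerVertex) {
    return triangleCount * 3 * bytesPerVertex;
}

// Bytes needed when each unique vertex is stored once and the triangles
// refer to vertices through 16-bit indices.
constexpr std::size_t indexedBytes(std::size_t uniqueVertexCount,
                                   std::size_t triangleCount,
                                   std::size_t bytesPerVertex) {
    return uniqueVertexCount * bytesPerVertex
         + triangleCount * 3 * sizeof(std::uint16_t);
}

// Assumed layouts: 2 floats per quad vertex, 3 floats per cube vertex.
constexpr std::size_t kQuadVertexBytes = 2 * sizeof(float);
constexpr std::size_t kCubeVertexBytes = 3 * sizeof(float);
```

For example, `indexedBytes(8, 12, kCubeVertexBytes)` covers the cube with shared corners, and `nonIndexedBytes(12, kCubeVertexBytes)` covers the same cube with every corner written out again. Making the vertices bigger (say, adding colors) widens the gap, since the index size stays the same.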
OpenGL takes its data as flat arrays of floats and integer types. As such, you have to think of the vertices and indices as groups inside these arrays: here, each pair of floats is one vertex and each run of three indices is one triangle. Does that make more sense?
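If it helps, this is roughly what the same quad looks like once it is flattened. It is only a sketch: the array and function names are invented for illustration, and the lookup functions are there just to show how an index leads back to a pair of floats.

```cpp
#include <cstddef>
#include <cstdint>

// Four 2D points, flattened: vertex i lives at kVertices[2*i] and kVertices[2*i + 1].
constexpr float kVertices[] = {
    -1.0f, -1.0f,  // index 0: bottom left
     1.0f, -1.0f,  // index 1: bottom right
    -1.0f,  1.0f,  // index 2: top left
     1.0f,  1.0f,  // index 3: top right
};

// Two triangles, flattened: triangle t uses kIndices[3*t] through kIndices[3*t + 2].
constexpr std::uint16_t kIndices[] = {
    0, 3, 1,  // bottom left, top right, bottom right
    0, 2, 3,  // bottom left, top left, top right
};

constexpr std::size_t kFloatsPerVertex = 2;
constexpr std::size_t kIndicesPerTriangle = 3;

// X coordinate of corner `corner` (0..2) of triangle `triangle` (0..1).
constexpr float cornerX(std::size_t triangle, std::size_t corner) {
    return kVertices[kIndices[triangle * kIndicesPerTriangle + corner]
                     * kFloatsPerVertex];
}

// Y coordinate of the same corner: one float further along in the vertex group.
constexpr float cornerY(std::size_t triangle, std::size_t corner) {
    return kVertices[kIndices[triangle * kIndicesPerTriangle + corner]
                     * kFloatsPerVertex + 1];
}
```

In a real program you would upload both arrays to buffer objects and draw with something like `glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0)`, with the vertex attribute set up as 2 floats per vertex; the GPU then does the same lookup that `cornerX` and `cornerY` do here.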