I'm using this tutorial pretty heavily, and its section on textures is confusing me to some degree.
I don't know what exactly is confusing you, so let me try to shed some light on the whole thing. You probably know some of the following already, but I need to mention it for completeness.
1.) You need a vertex data stream with the vertices' positions. Engines often batch sprites to reduce draw calls. This requires all vertex positions, although they come from different sprites, to be specified w.r.t. the same space, e.g. world space or view space. Hence any motion applied to the sprites has to be computed on the CPU side, before the VBO is filled.
Okay, you can use instancing and hence handle things another way, but that is an advanced topic.
2.) You need a vertex data stream with the vertices' uv co-ordinates. If you have only one texture to deal with, you need just one uv stream. If you have several textures, you may need more than a single uv stream. But it is also possible to use one and the same uv stream for several textures (e.g. when using a color map and a normal map with the same layout in texture space).
3.) For a sprite you usually don't use normals, because sprites are just flat (leaving some exotic variants aside). Otherwise, if normals are available, you need a vertex data stream for them, too.
4.) Whether you use one VBO per data stream, or put all of them into a single VBO, is usually a question of how dynamic the data in each stream is. For example, sprites are often computed frame by frame and transferred to the GPU in a batch. When the CPU computes both the vertex positions and the uv co-ordinates on the fly, then both streams are dynamic and can easily be packed into a single VBO. On the other hand, if the CPU computes just the vertex positions but re-uses the same uv co-ordinates again and again, then the vertex position stream is dynamic while the uv co-ordinate stream is static; for performance reasons this suggests 2 different VBOs.
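To make points 1.) and 4.) a bit more concrete, here is a rough CPU-side sketch of the second case: a position stream that is rebuilt every frame in world space, and a uv stream that is built once. The `Sprite` struct and the names `buildUvStream` and `fillPositionStream` are made up for illustration and do not come from the tutorial. I also assume each sprite is a quad with 4 corners that maps the whole texture. Take it as one possible layout, not the way to do it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sprite description (illustrative names only).
struct Sprite {
    float x, y;          // centre of the sprite in world space
    float halfW, halfH;  // half extents of the quad
    float angle;         // rotation in radians
};

// Static stream: every sprite uses the same 4 uv pairs, so this is
// built once and never touched again.
// Layout: 4 corners per sprite, 2 floats (u, v) per corner.
std::vector<float> buildUvStream(std::size_t spriteCount) {
    static const float quadUv[8] = {0.f, 0.f, 1.f, 0.f, 1.f, 1.f, 0.f, 1.f};
    std::vector<float> uv;
    uv.reserve(spriteCount * 8);
    for (std::size_t i = 0; i < spriteCount; ++i) {
        uv.insert(uv.end(), quadUv, quadUv + 8);
    }
    return uv;
}

// Dynamic stream: all motion (translation, rotation) is applied here on
// the CPU, so every vertex of every sprite ends up in the same world space.
// Layout: 4 corners per sprite, 2 floats (x, y) per corner.
void fillPositionStream(const std::vector<Sprite>& sprites,
                        std::vector<float>& out) {
    static const float corner[4][2] = {
        {-1.f, -1.f}, {1.f, -1.f}, {1.f, 1.f}, {-1.f, 1.f}};
    out.clear();
    out.reserve(sprites.size() * 8);
    for (const Sprite& s : sprites) {
        const float c = std::cos(s.angle);
        const float sn = std::sin(s.angle);
        for (const auto& k : corner) {
            const float lx = k[0] * s.halfW;  // corner in sprite-local space
            const float ly = k[1] * s.halfH;
            out.push_back(s.x + c * lx - sn * ly);  // rotate, then translate
            out.push_back(s.y + sn * lx + c * ly);
        }
    }
    assert(out.size() == sprites.size() * 8);
}
```

With such a split, the uv array would be uploaded once (e.g. `glBufferData` with `GL_STATIC_DRAW`), while the position array would be re-uploaded each frame into its own VBO (e.g. with `GL_DYNAMIC_DRAW` or `GL_STREAM_DRAW`). If the uv co-ordinates changed every frame as well, interleaving both into a single dynamic VBO would be the simpler choice.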