what would be the best way to go
It depends on what your collision detection needs are, as collision detection can be expensive. Can you provide more information with regard to what collision parameters you need? Height only? Penetration distance for complex shapes (sphere, cylinder, OBB, AABB, etc.)?
For instance, if you only need terrain height for a particular cell (unit square), it is quicker (versus a "collision engine") to look up y in an array map indexed simply by x-z coordinates. If you need the height at a point within a cell, you can get the height at the four corners and interpolate between them (for example bilinearly, which is a weighted average of the four corner heights) if each cell is a flat quad. If the four corners of a quad are instead used as the vertices of two triangles, you'll need to know whether the triangles form a valley-fold or mountain-fold, that is, which diagonal splits the quad, because the height at an interior point differs between the two cases.
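A minimal sketch of the height look-up described above, in Python for brevity. It assumes a hypothetical `heights[z][x]` grid that stores the height at each integer grid vertex, unit-sized cells, and a query point that lies inside the grid. The function names and the `diagonal_00_to_11` flag are made up for illustration and are not taken from any engine.

```python
import math

def _cell_and_fraction(heights, x, z):
    """Return the cell index (ix, iz) and the fractional position inside that cell."""
    rows, cols = len(heights), len(heights[0])
    # Clamp so that a point exactly on the far border still uses the last full cell.
    ix = min(max(int(math.floor(x)), 0), cols - 2)
    iz = min(max(int(math.floor(z)), 0), rows - 2)
    return ix, iz, x - ix, z - iz

def _corners(heights, ix, iz):
    """Heights at the four corners of cell (ix, iz): h00, h10, h01, h11."""
    return (heights[iz][ix], heights[iz][ix + 1],
            heights[iz + 1][ix], heights[iz + 1][ix + 1])

def height_bilinear(heights, x, z):
    """Treat the cell as one quad and blend the four corner heights."""
    ix, iz, fx, fz = _cell_and_fraction(heights, x, z)
    h00, h10, h01, h11 = _corners(heights, ix, iz)
    near = h00 + (h10 - h00) * fx   # interpolate along x on the z = iz edge
    far = h01 + (h11 - h01) * fx    # interpolate along x on the z = iz + 1 edge
    return near + (far - near) * fz # then interpolate along z between the edges

def height_triangles(heights, x, z, diagonal_00_to_11=True):
    """Treat the cell as two triangles; the flag says which diagonal splits it."""
    ix, iz, fx, fz = _cell_and_fraction(heights, x, z)
    h00, h10, h01, h11 = _corners(heights, ix, iz)
    if diagonal_00_to_11:
        if fx >= fz:  # triangle h00, h10, h11
            return h00 + (h10 - h00) * fx + (h11 - h10) * fz
        return h00 + (h01 - h00) * fz + (h11 - h01) * fx  # triangle h00, h01, h11
    if fx + fz <= 1.0:  # triangle h00, h10, h01
        return h00 + (h10 - h00) * fx + (h01 - h00) * fz
    # triangle h10, h11, h01, measured from the h11 corner
    return h11 + (h01 - h11) * (1.0 - fx) + (h10 - h11) * (1.0 - fz)

# One cell with a single raised corner.
sample = [[0.0, 0.0],
          [0.0, 2.0]]
print(height_bilinear(sample, 0.5, 0.5))          # 0.5
print(height_triangles(sample, 0.5, 0.5, True))   # 1.0 (ridge along the diagonal)
print(height_triangles(sample, 0.5, 0.5, False))  # 0.0 (valley along the other diagonal)
```

The same cell gives three different heights at its center depending on how it is interpreted, which is why the look-up has to match the way the terrain is actually triangulated for rendering.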
would like something that I can easily make changes to in the future as things get more complex
To "easily" make changes you'll have to decide what "more complex" means, similar to the questions above regarding what collision parameters you'll need. For example, you can't "easily" go from a map-height look-up scheme (mentioned above) to shape-to-shape collision detection (sphere-triangle, cylinder-OBB, etc.).
If you have little or no experience with collision detection, can you put into words what you want to do, rather than how you want to do it?
EDIT: With regard to adding textures, that will likely be unrelated to collision detection. In 3D, collision detection is largely a numerical evaluation in which texture coordinates and colors have no part. That is, collision detection often uses the same vertex position data used for rendering, but doesn't need the other graphics-related parameters. You can, however, do a lookup of the texture for a cell, if you use a "wall" texture for "impassable," to determine if collision detection is needed. If the cell is "impassable" you can make decisions on response at that point without any further "collision" evaluations. That could also be implemented in the collision engine if you use collision groups defined, perhaps, simply with a bitwise enumeration.
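A rough sketch of both ideas, again in Python. Everything named here is hypothetical: the `texture_map[z][x]` grid of texture names, the `IMPASSABLE_TEXTURES` set, the `Group` flags, and the per-object masks are assumptions for illustration, not the API of any particular collision engine. Coordinates are assumed to be non-negative and inside the map.

```python
from enum import IntFlag

class Group(IntFlag):
    """Collision groups as bit flags, so several can be combined in one mask."""
    TERRAIN = 1 << 0
    WALL = 1 << 1
    PLAYER = 1 << 2
    PROJECTILE = 1 << 3

# Textures that mark a cell as "impassable" (assumed naming).
IMPASSABLE_TEXTURES = {"wall"}

def cell_is_impassable(texture_map, x, z):
    """Look up the texture of the cell containing (x, z); no geometry tests needed."""
    return texture_map[int(z)][int(x)] in IMPASSABLE_TEXTURES

def groups_interact(group_a, mask_a, group_b, mask_b):
    """Two objects are tested only if each one's mask includes the other's group."""
    return bool(group_a & mask_b) and bool(group_b & mask_a)

# A 2 x 2 map with one wall cell.
texture_map = [["grass", "wall"],
               ["grass", "grass"]]

# Which groups each kind of object cares about (assumed rules).
player_mask = Group.TERRAIN | Group.WALL
wall_mask = Group.PLAYER | Group.PROJECTILE
projectile_mask = Group.WALL

if cell_is_impassable(texture_map, 1.5, 0.2):
    print("blocked: respond without running any further collision evaluation")

print(groups_interact(Group.PLAYER, player_mask, Group.WALL, wall_mask))              # True
print(groups_interact(Group.PLAYER, player_mask, Group.PROJECTILE, projectile_mask))  # False
```

The cell lookup is the cheap early-out; the group masks only matter once you have a real collision engine and want it to skip pairs that should never be tested against each other.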