I think one easy optimization would be to build an index for each of those 512x512 blocks when you export your level data, grouping the smaller tiles into batches that use the same textures. Then load that index into the game with all the other needed data (or construct it at load time) and draw in that order.
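To make the batching idea concrete, here is a rough sketch of what such an index could look like. The `(tile_id, texture_id)` pairs and the function name are made up for illustration; your exporter will have its own layout.

```python
from collections import defaultdict

def build_batch_index(tiles):
    """Group tile ids by texture id so each texture only needs binding once.

    `tiles` is a list of (tile_id, texture_id) pairs. Both fields are
    hypothetical stand-ins for whatever the level exporter stores.
    """
    batches = defaultdict(list)
    for tile_id, texture_id in tiles:
        batches[texture_id].append(tile_id)
    # Sort by texture id so the draw order is deterministic between runs.
    return sorted(batches.items())

# Five small tiles from one block, using three different textures.
block = [(0, "grass"), (1, "rock"), (2, "grass"), (3, "sand"), (4, "rock")]
index = build_batch_index(block)

for texture_id, tile_ids in index:
    # In the game this would be: bind texture once, then draw every tile in tile_ids.
    print(texture_id, tile_ids)
```

Drawing in this order means one texture switch per distinct texture in the block instead of one per tile.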
Another would be to pack several of those 32x32 textures into one larger texture and adjust the texture coordinates to match, so you can build bigger batches.
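Fixing up the texture coordinates for such a combined texture could look roughly like this. It is only a sketch that assumes a square atlas laid out as a simple grid of 32x32 slots; `atlas_uv`, `slot`, and the 256-pixel atlas size are illustrative choices, not something from the posts above.

```python
def atlas_uv(slot, u, v, tile_px=32, atlas_px=256):
    """Map a tile-local (u, v) in [0, 1] to coordinates in the atlas.

    Assumes a square atlas filled row by row with tile_px-sized slots.
    `slot` is the index of the small texture inside the atlas.
    """
    columns = atlas_px // tile_px      # slots per atlas row
    col = slot % columns
    row = slot // columns
    scale = tile_px / atlas_px         # size of one slot in atlas UV space
    return ((col + u) * scale, (row + v) * scale)

# Slot 9 in an 8-column atlas sits at column 1, row 1.
top_left = atlas_uv(9, 0.0, 0.0)
bottom_right = atlas_uv(9, 1.0, 1.0)
```

One caveat: once a texture lives in an atlas slot, the hardware repeat wrap mode no longer repeats just that slot, so coordinates outside [0, 1] would bleed into neighbouring slots. Any repetition has to be baked into the slot or handled in the shader.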
I'm working on the latter, basically creating 4 larger textures that contain the smaller ones for a tile. The problem I can't figure out is that, for each of the 256 chunks in the 512x512 tile, the texture is not mapped at a 1:1 ratio. I essentially need to scale the texture down so that it gets painted smaller and looks realistic in the world. I'm just not sure how to create this larger texture on the CPU side with the scaling applied.
For example, one of the 256 chunks has 4 color textures, each scaled down to 10% of its original size so that it repeats multiple times across the 32x32 area of the chunk.
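One possible way to do the scaling on the CPU is to bake the shrunken, repeated texture into each chunk's 32x32 region by sampling the source with wrap-around. The sketch below uses plain nearest-neighbour sampling on a list-of-rows image. `bake_repeated` and `scale_percent` are hypothetical names, and a real version would probably want a box or bilinear filter to reduce aliasing.

```python
def bake_repeated(src, scale_percent, out_size=32):
    """Fill an out_size x out_size region with a shrunken, repeating copy of src.

    `src` is a list of rows of pixels (any pixel type). `scale_percent` is the
    drawn size relative to the original, e.g. 10 for 10%. Integer math keeps
    the sampling positions exact. Nearest-neighbour only: no filtering.
    """
    src_h = len(src)
    src_w = len(src[0])
    out = []
    for y in range(out_size):
        # Step through the source faster than the destination, wrapping around.
        sy = (y * 100 // scale_percent) % src_h
        row = [src[sy][(x * 100 // scale_percent) % src_w] for x in range(out_size)]
        out.append(row)
    return out

# 4x4 test image where each pixel stores its own index (y * 4 + x).
src = [[y * 4 + x for x in range(4)] for y in range(4)]
half = bake_repeated(src, scale_percent=50, out_size=4)

# The 10% case from the example: a 32x32 source baked into a 32x32 chunk.
big = [[(x, y) for x in range(32)] for y in range(32)]
chunk = bake_repeated(big, scale_percent=10)
```

At 50% the 4x4 source shrinks to 2x2 and repeats twice in each direction. At 10% each destination pixel skips 10 source pixels, so very little detail survives in a 32x32 region. If that matters, it may be worth giving each chunk more pixels in the larger texture or doing the repeat in the shader instead.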