On a 640x640 map your method was 195 times faster. But on a 3600x1800 it was only 65 times faster. Do you know why?
See my above post.
Definitively related to memory access pattern (cache effect). This is not only visible from the discrepancy with 3600x1800 but also with the tiny 64x64 map.
For the tiny map, everything fits into the cache, so the list approach is still very fast (only 2x difference, almost certainly due to algorithmic difference). The large map benefits both from algorithmic difference and from a more cache-friendly access pattern, allowing most or all of the data to be cached when accessed in the heap implementation. For the huge map, the heap is having too many "holes" in the access pattern to be very cache friendly, so this degenerates to the same "kind of random access" factor as with the list implementation. There is however still an algorithmic advantage, so it still runs a lot faster (only not that much faster).