Wait... is the computation actually the expensive part of this? I'd expect any reasonably optimized solution to be limited by the speed of reading from and writing to disk.
There's no point waiting for the entire file to be read before you start computation - otherwise your CPU is just sitting idle while data moves all the way from disk to main memory. And there's no point waiting till the entire computation is complete to start writing it back out again, for the same reason.
If we are going for maximal performance, I'd advocate reading in blocks of 4 KB to 16 KB at a time, performing the replacement on each block, and then writing that block out immediately.
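Here is a rough sketch of that block-wise loop in Python. I'm assuming the replacement swaps one byte for another, so a match can never straddle a block boundary; a multi-byte pattern would need extra handling to carry the tail of one block over into the next. The function name `replace_in_blocks` and its parameters are made up for illustration.

```python
BLOCK_SIZE = 16 * 1024  # 16 KB, the upper end of the range suggested above


def replace_in_blocks(src_path, dst_path, old, new, block_size=BLOCK_SIZE):
    """Stream src_path to dst_path, replacing the single byte `old` with `new`.

    Assumption: `old` and `new` are one byte each, so no match can be split
    across two blocks.
    """
    if len(old) != 1 or len(new) != 1:
        raise ValueError("this sketch only handles single-byte replacements")

    # Build a 256-entry translation table once, outside the loop.
    table = bytes.maketrans(old, new)

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            # Replace within this block and write it out straight away,
            # so memory use stays bounded by the block size.
            dst.write(block.translate(table))
```

For example, `replace_in_blocks("in.csv", "out.csv", b",", b";")` would rewrite the file one block at a time. The loop itself is sequential; any overlap between I/O and computation comes from the operating system, which typically reads ahead and buffers writes in the background.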