best way of doing large scale search&replace in files in c++?

Hello. I need to write an application that searches through a huge number of text files, opens them, and checks for a couple of conditions. If they are met, text should be inserted into the files (anywhere in the file, not just at the end).

1. What is the best way of doing this? I'm not too familiar with C++ file I/O. Can I search directly for a string in a file? Or do I need to read the entire content into a buffer first, and then search the buffer?

2. Does std::string cope well with large buffers, such as a text file of a few kilobytes? And can it do substring insertion? If not, what should I use? (Someone said stringstream, but looking at the docs it doesn't seem to have methods like find_substring() etc.)

Thanks in advance.

You can search for a string directly in the file, but you would not be able to do insertions without reading the file into a buffer. It would also be a lot faster to read the file into a buffer with one fread and search it in memory (for example by calling strncmp at each candidate position) than to search the file byte by byte.

I believe std::string can do substring insertion.
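
To make the read-into-a-buffer approach concrete, here is a minimal sketch, assuming the whole file fits comfortably in memory. The helper names (read_file, insert_after_each, write_file) are made up for illustration; std::string::find and std::string::insert are the standard member functions doing the actual work.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read an entire file into a std::string.
// Binary mode keeps the bytes exactly as they are on disk.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: insert `addition` right after every occurrence of
// `marker` in `text`. Returns the number of insertions made.
std::size_t insert_after_each(std::string& text,
                              const std::string& marker,
                              const std::string& addition) {
    assert(!marker.empty());  // an empty marker would never advance
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        pos += marker.size();        // step past the match
        text.insert(pos, addition);  // substring insertion
        pos += addition.size();      // do not rescan the inserted text
        ++count;
    }
    return count;
}

// Hypothetical helper: overwrite the file with the modified text.
void write_file(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << text;
}
```

A typical use would be to call read_file, then insert_after_each, and call write_file only if the returned count is non-zero, so that untouched files are not rewritten.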

Guest Anonymous Poster

2) Of course std::string can hold large strings, and you can run std::string::find (or std::search) over them. Note that std::find only looks for a single element, not a substring.

3) But I would consider using a stream instead: you can open the file directly as a char stream and run your own parser over it.

Boost offers a more elegant solution with better performance, though it is less handy to use... that's Boost.
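
As a rough illustration of the stream idea, the sketch below treats "your own parser" as a simple line-by-line scan, which is only one possible reading of the suggestion. The function name and the line-based condition are assumptions, not something from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // handy for trying the function on in-memory text
#include <string>

// Hypothetical helper: copy `in` to `out` line by line and emit `new_line`
// after every line that contains `marker`. Returns the number of insertions.
// Note: every output line ends in '\n', so a final line without a trailing
// newline gains one.
std::size_t insert_line_after_matches(std::istream& in, std::ostream& out,
                                      const std::string& marker,
                                      const std::string& new_line) {
    assert(!marker.empty());  // an empty marker would match every line
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        out << line << '\n';
        if (line.find(marker) != std::string::npos) {
            out << new_line << '\n';
            ++count;
        }
    }
    return count;
}
```

Because the function takes std::istream and std::ostream references, it works with std::ifstream/std::ofstream for real files and with std::istringstream/std::ostringstream for quick experiments.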

This interests me too, as I'm going to write a high-performance HTML scraper soon, probably in C or C++.

One way to do it (a fairly standard way) is simply to open the file and read it in a character at a time, doing the string matching as you go.

Also keep an output file open: write any non-matching characters straight to it, and write the replacement wherever a match occurs.

At the end you can simply copy the new file over the existing one.

Python might be a better choice for a quick-and-dirty version, or use sed ;) YMMV as always.

Be very careful about IO assumptions - reading a character at a time probably isn't all that much slower than reading the whole file at once (esp. for large files). The OS will perform reasonable buffering anyway.
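
The character-at-a-time approach described in this post could be sketched roughly as follows. The function name and the sliding-window matching are assumptions made for illustration (the post does not say how partial matches should be handled), and this is a simple O(n*k) scan rather than a tuned matcher.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // handy for trying the function on in-memory text
#include <string>

// Hypothetical helper: copy `in` to `out` one character at a time, writing
// `replacement` instead of each non-overlapping occurrence of `needle`.
// Returns the number of replacements.
std::size_t stream_replace(std::istream& in, std::ostream& out,
                           const std::string& needle,
                           const std::string& replacement) {
    assert(!needle.empty());
    std::size_t count = 0;
    std::string window;  // the last few characters not yet written out
    char c;
    while (in.get(c)) {
        window.push_back(c);
        if (window.size() < needle.size()) continue;  // not enough data yet
        if (window == needle) {
            out << replacement;  // match: emit the replacement instead
            window.clear();
            ++count;
        } else {
            out.put(window.front());  // oldest character cannot start a match
            window.erase(0, 1);
        }
    }
    out << window;  // flush whatever is left at end of input
    return count;
}
```

With real files, `in` would be a std::ifstream on the original and `out` a std::ofstream on a temporary file, which is then copied or renamed over the original once the function returns.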

With regard to JuNC's comments about buffering: when writing a console text editor under Win98, I found that large files took FOREVER to open when reading a character at a time, compared to freading them in 1024-byte chunks. That made me wonder whether the OS file buffering really does make the issue moot.

Perhaps there was another reason for this that I'm missing.
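
For comparison, chunked reading with fread might look like the sketch below. The function name is made up, and the 1024-byte default simply mirrors the chunk size mentioned above; it is not a recommendation. No timing claims are made here, since the result depends on the C library and the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: read a whole file by repeatedly calling fread with a
// fixed-size chunk. Returns an empty string if the file cannot be opened.
std::string read_in_chunks(const char* path, std::size_t chunk_size = 1024) {
    assert(chunk_size > 0);
    std::string contents;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return contents;
    std::string chunk(chunk_size, '\0');  // reusable read buffer
    std::size_t n;
    while ((n = std::fread(&chunk[0], 1, chunk.size(), f)) > 0) {
        contents.append(chunk, 0, n);  // keep only the bytes actually read
    }
    std::fclose(f);
    return contents;
}
```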

It might be a little overly advanced and more complicated than is needed for this purpose, but I've been experimenting recently with memory-mapped files (for entirely different reasons). Basically, you have a file open, but can read from and write to it as though it were simply an array in memory. Depending on the OS and various flags, it will most likely be buffered pretty well, so I'm guessing the reading and writing won't be severely hurt by hard drive latency. You'll essentially be doing all the work in memory anyway, without having to manually read the file into a temporary buffer and write it back out when you're done.

In Windows, you'd use CreateFileMapping() and MapViewOfFile(). In Linux, you'd use mmap(). Maybe something to try if you're bored/adventurous. Working on an array in memory would probably be easier than working with a file stream of some sort.
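
A minimal POSIX-only sketch of the mmap() route is shown below, limited to searching a read-only mapping. The function name is hypothetical; on Windows the equivalent would go through CreateFileMapping() and MapViewOfFile() instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map a file read-only and return the byte offset of the
// first occurrence of `needle`, or -1 if it is absent or the file cannot be
// mapped. POSIX only.
long long find_in_mapped_file(const char* path, const std::string& needle) {
    assert(!needle.empty());
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  // a zero-length mapping fails
        close(fd);
        return -1;
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);

    void* mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapping == MAP_FAILED) return -1;

    // The file contents can now be treated as a plain char array.
    const char* begin = static_cast<const char*>(mapping);
    const char* end = begin + size;
    const char* hit = std::search(begin, end, needle.begin(), needle.end());
    long long offset = (hit == end) ? -1 : static_cast<long long>(hit - begin);

    munmap(mapping, size);
    return offset;
}
```

One caveat worth keeping in mind: a mapping has a fixed length, so inserting text this way still means growing the file first and shifting the bytes after the insertion point.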
