fastest way to read in a file


Don't know for sure, but I'd say they're about the same. If there is a difference it would be a very small one, small enough not to matter. Disk I/O is extremely slow, so the few clock ticks picked up by using one or the other are going to be so minimal that it isn't going to matter which you use.

Opening and closing the same file repeatedly, fopen/fclose managed 4,399 per second and ifstream::open/ifstream::close managed 4,342, so you'd have to open around 335,000 files before fopen saves you ONE second. So, how many are you opening?

Always one for performance trivia, I just had to try the Windows API CreateFile. It did 7,564 per second, so you'd only have to open about 10,500 files to save a second with it.
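For anyone who wants to reproduce numbers like these, a timing loop along the following lines will do. This is a sketch, not the original benchmark; the file name is hypothetical:

#include <cstdio>
#include <ctime>

int main()
{
    int Opens = 0;
    clock_t End = clock() + CLOCKS_PER_SEC;   // run for roughly one second
    while (clock() < End)
    {
        FILE* fp = fopen("File1.txt", "rb");  // open and immediately close
        if (fp) fclose(fp);
        Opens++;
    }
    printf("%d opens per second\n", Opens);
    return 0;
}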

Edited by - LilBudyWizer on June 5, 2001 3:26:08 PM

I suppose he's asking which way of reading files, C-style or C++ streams, is fastest. Normally you don't open many files at runtime.

If you want to write to a file fast, you have to batch up as much data as you can instead of writing many small pieces...
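For instance, a minimal sketch of the difference (the file name and buffer size are hypothetical):

#include <cstdio>

int main()
{
    char buffer[4096] = {0};
    FILE* fp = fopen("out.dat", "wb");

    // Slow: 4096 separate one-byte writes, each paying call overhead
    for (int i = 0; i < 4096; i++)
        fwrite(&buffer[i], 1, 1, fp);

    // Fast: the same data in one batched write
    fwrite(buffer, 1, sizeof(buffer), fp);

    fclose(fp);
    return 0;
}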

What the hells!

Sorry, I shouldn't have taken that so literally. I thought you were asking about the open itself, so I was demonstrating that it isn't the open you have to worry about, and also how to answer questions like that. I compared fread to ifstream::read, and surprisingly there was a pretty dramatic difference. I wrote a 4 MB file, then used both to read it back. The initial read with fopen ran at about 2.8 MB/s, but once the file was in the file cache it reread the file at 100 to 130 MB/s. The ifstream got about 2.3 MB/s whether the file was in cache or not. I don't generally use ifstream so I may have made a mistake, but what I was using was:

  
#include <fstream>
using namespace std;

char ucBuffer[1024];                // 1 KB read buffer (was undeclared)

ifstream ifs;
ifs.open("File1.txt", ios::in | ios::binary);
ifs.seekg(0);
for (int i = 0; i < 4096; i++)      // 4096 * 1 KB = 4 MB
    ifs.read(ucBuffer, 1024);
ifs.close();
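The fread side of the comparison wasn't posted; presumably it was something along these lines (a sketch, assuming the same 1 KB block size and file name):

#include <cstdio>

int main()
{
    char ucBuffer[1024];
    FILE* fp = fopen("File1.txt", "rb");
    fseek(fp, 0, SEEK_SET);
    for (int i = 0; i < 4096; i++)  // same 4 MB in 1 KB blocks
        fread(ucBuffer, 1, 1024, fp);
    fclose(fp);
    return 0;
}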


Edited by - LilBudyWizer on June 5, 2001 8:35:24 PM

Remember that the processor is faster than your disk. So, although Win32 API calls will be marginally faster than C stdio calls, which in turn will be marginally faster than C++ iostream calls, 99% of the time the processor will be waiting on the data. So choose whatever method you're most comfortable with, use optimal algorithms, and I'm sure you won't notice the difference.

Please note that if you use fstreams with Visual C++ 6 and the C++ library that comes with it, you're likely to run into a very annoying bug: if you create an fstream (i- or o-) by name, it will be unbuffered, leading to a huge performance hit.

For fixes, see this page. Another option is to use a different STL/IOstreams implementation altogether, and STLport is a good (and free!) alternative in that case.
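One workaround sometimes suggested (an assumption on my part, not taken from that page) is to hand the stream an explicit buffer before opening it:

#include <fstream>

int main()
{
    static char buffer[16384];          // hypothetical 16 KB buffer
    std::ifstream ifs;
    // pubsetbuf must be called before open() to take effect
    ifs.rdbuf()->pubsetbuf(buffer, sizeof(buffer));
    ifs.open("File1.txt", std::ios::in | std::ios::binary);
    // ... reads now go through the explicit buffer ...
    ifs.close();
    return 0;
}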

HTH

Look back about a month; I asked the same question. If you don't mind using Win32 APIs, I found that this was the fastest way:

  
#include <windows.h>
#include <cstring>
#include <string>
#include <vector>
using namespace std;

void CFile::Load(const string& a_Filename, vector<char>& a_Data)
{
    HANDLE Handle = CreateFile(a_Filename.c_str(), GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD Size = GetFileSize(Handle, NULL);
    HANDLE Mapping = CreateFileMapping(Handle, NULL, PAGE_READONLY, 0, Size, NULL);
    CloseHandle(Handle);                   // the mapping holds its own reference to the file
    LPVOID BaseAddress = MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0);
    a_Data.resize(Size);
    memcpy(&a_Data[0], BaseAddress, Size); // begin() is an iterator, not a pointer
    UnmapViewOfFile(BaseAddress);          // unmap the view before closing the mapping
    CloseHandle(Mapping);
}


I'm not doing this optimally due to the way my code is structured. Better would be to keep the file mapping around and seek within it for the data you need, provided you might not need the whole file.

This method is also well suited to pak file use (which, again, I don't personally use). When I went looking for this, I was certain the OS would have a faster buffered method of accessing files than reading one character at a time like my older method...

For more info, look back at the post from the kind guy who answered my question. I recall him saying that using this with pak files can net up to a 200% increase in throughput.

HTH



Chris Brodie
http:\\fourth.flipcode.com

The method gimp mentions does indeed look good for Win32-specific (large) file access. Beware, however, that if you want to map from a specific offset in a file, the offset has to match the system's memory allocation granularity. That is, the offset must be a multiple of the allocation granularity. You can use the GetSystemInfo function to get the allocation granularity.
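A minimal sketch of handling that (MapAt is a hypothetical helper; the mapping handle is assumed to come from CreateFileMapping as in the code above):

#include <windows.h>

// Maps 'Bytes' bytes starting at an arbitrary 'Offset' by rounding the
// offset down to the allocation granularity and skipping the remainder.
char* MapAt(HANDLE Mapping, DWORD Offset, DWORD Bytes)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    DWORD Granularity = si.dwAllocationGranularity;   // typically 64 KB

    DWORD Aligned = Offset - (Offset % Granularity);  // legal view offset
    DWORD Skip    = Offset - Aligned;                 // bytes to skip inside the view

    LPVOID View = MapViewOfFile(Mapping, FILE_MAP_READ, 0, Aligned, Bytes + Skip);
    if (!View)
        return NULL;
    // Note: UnmapViewOfFile later needs the view's base address, i.e. (result - Skip).
    return (char*)View + Skip;
}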

HTH

I can definitively say, without any doubt, that it all depends. I did some testing. I tried a memory-mapped file, reading the entire file with one ReadFile, reading the entire file with fread, reading 1 KB blocks with fread, and reading the entire file with overlapped I/O. First I tested with a 4 MB file. I got fairly consistent results across all five: roughly 2.8 MB/s when it wasn't in cache and 80 to 100 MB/s when it was. I have no idea why today I get 80 to 100 MB/s against the file cache when yesterday I got 100 to 130 MB/s; I'll figure that out later. The exception was the async I/Os, which only got about 30 MB/s against cache. I believe the reason is that the I/O never came back pending, so in the end it was just unneeded overhead.

When I switched to a 256 MB file things were a little different. Reading in the entire file with ReadFile or fread dropped to about 1 MB/s. File mapping jumped to between 3 and 4 MB/s. The fread of blocks went through the roof and hit 20 MB/s. The machine only has 224 MB of memory, so it can't all fit in cache. I have a hard time believing an EIDE 20 GB 10,000 RPM drive can deliver data that fast, though, so I have to assume the cache had some bearing. That test isn't really comparable, though, because I'm not sticking the entire file in memory at once. The async (overlapped) I/O began to shine, getting 5 to 6 MB/s. I repeated the memory-mapped and async tests several times. There was a fairly significant variance in both, 3 to 4 on one and 5 to 6 on the other, but the async consistently beat out the memory-mapped. I repeated all the tests a couple of times just to be sure nothing unusual happened, but clearly the memory-mapped and async approaches beat reading the entire file in one read.
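For reference, the overlapped read pattern looks roughly like this (a sketch, not my actual test code; the file name and buffer size are placeholders):

#include <windows.h>

int main()
{
    HANDLE File = CreateFile("File1.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);

    char Buffer[65536];
    OVERLAPPED Ov = {0};
    Ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);  // signaled on completion

    DWORD Read = 0;
    if (!ReadFile(File, Buffer, sizeof(Buffer), &Read, &Ov)
        && GetLastError() == ERROR_IO_PENDING)
    {
        // ... do other work while the disk is busy ...
        GetOverlappedResult(File, &Ov, &Read, TRUE);   // TRUE = wait for completion
    }

    CloseHandle(Ov.hEvent);
    CloseHandle(File);
    return 0;
}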

I still have to figure out what was up with the freads. The easiest approach is to eliminate memory contention by not loading the entire file in the other two tests. That almost invalidates the async test, since you would have to be able to process the file out of order to use it for record-by-record processing. I also need to get a better handle on how the Windows file cache works now, since I think the last time I tested was on either NT 4.0 or Windows 95. I also need to try a wider variety of scenarios. I have guests coming tonight, though, so that will have to be some other time.

Because your array can only hold 1 int!

Your code:

int dat[] = {0};

This is a weird array declaration. Normally you declare the array's capacity (e.g. int dat[6]). Because you initialize it, the compiler can infer its capacity (1 element initialized).

If you want an expanding array, use STL's vector.
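For example (a minimal sketch; the values are arbitrary):

#include <vector>

int main()
{
    std::vector<int> dat;    // starts empty and grows on demand
    dat.push_back(42);       // storage is allocated automatically
    dat.push_back(7);
    int first = dat[0];      // indexed access, just like a raw array
    (void)first;
    return 0;
}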

What the hells!
