fastest way to read in a file

What about _open? I suspect that just calls CreateFile in turn, though.
Look back about a month; I asked the same question. If you don't mind using Win32 APIs, I found that this was the fastest way:

    #include <windows.h>
    #include <string>
    #include <vector>
    #include <cstring>
    using std::string; using std::vector;

    void CFile::Load(const string& a_Filename, vector<char>& a_Data)
    {
        HANDLE Handle = CreateFile(a_Filename.c_str(), GENERIC_READ,
            FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        DWORD Size = GetFileSize(Handle, NULL);
        HANDLE Mapping = CreateFileMapping(Handle, NULL, PAGE_READONLY, 0, Size, NULL);
        CloseHandle(Handle);                  // the mapping keeps the file open

        LPVOID BaseAddress = MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0);
        a_Data.resize(Size);
        memcpy(&a_Data[0], BaseAddress, Size); // vector::begin() is an iterator, not a pointer

        UnmapViewOfFile(BaseAddress);          // unmap the view before closing the mapping
        CloseHandle(Mapping);
    }


I'm not doing this optimally, due to the way my code is structured. Better would be to keep the file mapping around and seek within it for the data you need, provided you don't need the whole file; a sketch of that idea follows below.
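As a rough illustration of the seek-around idea (the Record struct, its layout, and the offset are hypothetical, and error handling is omitted), you can map the whole file once and touch only the pages holding the record you want:

    #include <windows.h>
    #include <cstring>

    struct Record { int Id; float Value; };  // hypothetical record layout

    Record ReadRecordAt(const char* Filename, DWORD RecordOffset)
    {
        HANDLE File = CreateFileA(Filename, GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE Mapping = CreateFileMapping(File, NULL, PAGE_READONLY, 0, 0, NULL);
        const char* Base = (const char*)MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0);

        Record Result;
        // Only the pages around RecordOffset are actually faulted in.
        memcpy(&Result, Base + RecordOffset, sizeof(Record));

        UnmapViewOfFile(Base);
        CloseHandle(Mapping);
        CloseHandle(File);
        return Result;
    }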

This method is also well suited to pak files (which, again, I don't personally use). When I went looking for this, I was sure the OS would have a faster buffered way of accessing files than reading one character at a time, as my older method did...

For more info, look back at the post from the kind guy who answered my question. I recall him saying that using this with pak files can net up to a 200% increase in throughput.

HTH



Chris Brodie
http://fourth.flipcode.com
The method gimp mentions does indeed look good for Win32-specific (large) file access. Beware, however, that if you want to map a view from a specific offset in a file, the offset has to match the system's memory allocation granularity; that is, the offset must be a multiple of the allocation granularity. You can use the GetSystemInfo function to get the allocation granularity, as in the sketch below.
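A minimal sketch of that, assuming Mapping is a handle returned by CreateFileMapping (the helper name MapViewAt is mine): round the offset down to the granularity, map from there, and step the pointer forward.

    #include <windows.h>

    const char* MapViewAt(HANDLE Mapping, DWORD DesiredOffset, DWORD Bytes)
    {
        SYSTEM_INFO Info;
        GetSystemInfo(&Info);
        DWORD Granularity = Info.dwAllocationGranularity;  // typically 64KB

        DWORD MapOffset  = DesiredOffset - (DesiredOffset % Granularity);
        DWORD Adjustment = DesiredOffset - MapOffset;

        const char* Base = (const char*)MapViewOfFile(
            Mapping, FILE_MAP_READ, 0, MapOffset, Bytes + Adjustment);

        // Caller must later pass Base (i.e. ReturnValue - Adjustment)
        // to UnmapViewOfFile, not the adjusted pointer.
        return Base ? Base + Adjustment : NULL;
    }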

HTH
I can definitively say, without any doubt, that it all depends. I did some testing: I tried a memory-mapped file, reading the entire file with one ReadFile call, reading the entire file with fread, reading 1K blocks with fread (sketched below), and reading the entire file with overlapped I/O. First I tested with a 4MB file and got fairly consistent results across all five: roughly 2.8MB/s when the file wasn't in cache and 80 to 100MB/s when it was. I have no idea why today I get 80 to 100MB/s against the file cache when yesterday I got 100 to 130MB/s; I'll figure that out later. The exception was the async I/O, which only got about 30MB/s against cache. I believe the reason is that the I/O never came back pending, so in the end it was just unneeded overhead.
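The "1K blocks with fread" case is just a fixed-size buffer loop, something like this sketch (the filename and what you do with each block are placeholders):

    #include <cstdio>

    int main()
    {
        FILE* File = std::fopen("data.bin", "rb");
        if (!File) return 1;

        char Buffer[1024];
        std::size_t Total = 0, Read;
        while ((Read = std::fread(Buffer, 1, sizeof(Buffer), File)) > 0)
            Total += Read;      // process the block here

        std::fclose(File);
        return 0;
    }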

When I switched to a 256MB file, things were a little different. Reading the entire file with ReadFile or fread dropped to about 1MB/s. File mapping jumped to between 3 and 4MB/s. The fread of blocks went through the roof and hit 20MB/s. The machine only has 224MB of memory, so the file can't all fit in cache. I have a hard time believing an EIDE 20GB 10,000RPM drive can deliver data that fast, though, so I have to assume the cache had some bearing; that test isn't really comparable anyway, because it doesn't hold the entire file in memory at once. The async (overlapped) I/O began to shine, getting 5 to 6MB/s. I repeated the memory-mapped and async tests several times. There was a fairly significant variance in both, 3 to 4 on one and 5 to 6 on the other, but the async consistently beat the memory-mapped version. I repeated all the tests a couple of times just to be sure nothing unusual happened, but clearly the memory-mapped and async approaches beat reading the entire file in one read.
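For reference, the overlapped read pattern being described looks roughly like the following sketch. The details here are my own simplifications: error handling is minimal, and a real test would keep several requests in flight rather than waiting on each one.

    #include <windows.h>

    // The file must be opened with FILE_FLAG_OVERLAPPED, e.g.:
    //   HANDLE File = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
    //       NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    BOOL ReadBlockAsync(HANDLE File, DWORD Offset, void* Buffer, DWORD Bytes)
    {
        OVERLAPPED Ov = {0};
        Ov.Offset = Offset;                 // explicit file position per request
        Ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        DWORD Read = 0;
        if (!ReadFile(File, Buffer, Bytes, &Read, &Ov) &&
            GetLastError() != ERROR_IO_PENDING)
        {
            CloseHandle(Ov.hEvent);
            return FALSE;                   // hard failure
        }

        // Wait for this request to complete; issuing several such
        // requests concurrently is the whole point of overlapped I/O.
        BOOL Ok = GetOverlappedResult(File, &Ov, &Read, TRUE);
        CloseHandle(Ov.hEvent);
        return Ok;
    }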

I still have to figure out what was up with the freads. The easiest fix is to eliminate memory contention by not loading the entire file in the other two tests. That almost invalidates the async test, since you would have to be able to process the file out of order to use it for record-by-record processing. I also need to get a better handle on how the Windows file cache works now, since I think the last time I tested was on either NT 4.0 or Windows 95. And I need to try a wider variety of scenarios. I have guests coming tonight, though, so that will have to be some other time.
Keys to success: Ability, ambition and opportunity.
Well, my next question is: why doesn't this work?

    int dat[] = {0};

    // data.txt (the data file):
    // 0 -10 10
    // 0 20 20

    while (!fin.eof())
    {
        for (x = 0; x < 6; x++)
        {
            cin >> dat[x];
        }
    }
Because your array can only hold 1 int!
......your code....
int dat[]={0};
.............
This is a weird array declaration. Normally you declare the array's capacity (int dat[6]). Because you initialize it, the compiler deduces the capacity from the initializer: one element.

If you want an expanding array, use STL's vector.
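For example, here is a minimal sketch of the vector approach, assuming the data.txt layout shown above. It also sidesteps two other issues visible in the posted loop: it reads from cin where fin was presumably intended, and while (!fin.eof()) typically iterates once too often.

    #include <fstream>
    #include <vector>

    int main()
    {
        std::ifstream fin("data.txt");
        std::vector<int> dat;

        int value;
        while (fin >> value)        // stops cleanly when extraction fails at EOF
            dat.push_back(value);   // grows as needed; no fixed capacity to overflow

        return 0;
    }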
What the hells!

