reading large files

20 comments, last by pacrugby 22 years, 6 months ago
I'm just wondering about the best way to read large text files. Some of my files are 10 MB+. I'm using Visual C++ 6.0 with MFC on a brand-new 1.8 GHz Dell, so I expected it to be quite fast, but the algorithm I'm using now takes a few seconds. Essentially each file is made up of 50,000+ lines with 13 or more numbers to be read from each line. Thanks in advance.
Taking a few seconds for a 10 MB+ file is good.
Well, I don't know if this is the best way, but you could do something like this. It's general and assumes that you do something with the data (copy it to another buffer, transmit it over the internet, process it...) before the next read.

#include <stdio.h>

#define BLOCK_SIZE 2048
.
.
.
char sBuffer[BLOCK_SIZE];
size_t nRet;

FILE *fp = fopen("yourfile.txt", "rb");

if (fp != NULL)
{
    // fread returns the number of items read; with an item size of one
    // byte, that is the number of characters placed in sBuffer.
    while ((nRet = fread(sBuffer, sizeof(char), BLOCK_SIZE, fp)) > 0)
    {
        // Do something with the data in the buffer; you have nRet
        // characters in there.
    }
    fclose(fp);
}
...

I don't think this is anything new, but this is how I wrote a file transfer program. It was able to transfer a 10 MB file over a network in a couple of seconds (well, maybe a couple more than a couple). You can play with the block size to see if you get different results; keep it a power of two, though.

Also, you can check what the file size is, allocate a buffer of that size, and do one read operation, but something just tells me that I shouldn't trust reading 10 MB in one call. That might just be me, though.
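For what it's worth, a minimal sketch of that size-it-then-read-it-all approach could look like the code below. ReadWholeFile is just a made-up name, and error handling is kept to a minimum:

#include <stdio.h>
#include <stdlib.h>

// Read an entire file into a heap-allocated, NUL-terminated buffer.
// Returns NULL on failure; the caller frees the buffer.
char* ReadWholeFile(const char* szPath, long* pnSize)
{
    FILE* fp = fopen(szPath, "rb");
    if (fp == NULL)
        return NULL;

    // Find the size by seeking to the end and asking for the position.
    fseek(fp, 0, SEEK_END);
    long nSize = ftell(fp);
    rewind(fp);

    // Allocate a buffer of exactly that size (plus a terminator) and
    // read the whole file in a single call.
    char* pBuffer = (char*)malloc(nSize + 1);
    if (pBuffer != NULL)
    {
        size_t nRead = fread(pBuffer, 1, (size_t)nSize, fp);
        pBuffer[nRead] = '\0';
        if (pnSize != NULL)
            *pnSize = (long)nRead;
    }
    fclose(fp);
    return pBuffer;
}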



Jason Mickela
ICQ : 873518
E-Mail: jmickela@pacbell.net
------------------------------
"Evil attacks from all sides
but the greatest evil attacks
from within." Me
------------------------------
"The paths of glory lead but to the grave." - Thomas Gray
The way I was doing it was by reading each number into my program one at a time, and that was slow. Then I started reading line by line and using sscanf to pull the numbers out, which sped it up a lot. I was just wondering if there is an even faster way, like reading the entire file into memory and then reading the numbers out of that. I'm not too sure how reading is handled under the hood.
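If the whole file is already sitting in memory (say, in a NUL-terminated buffer like the one from the hypothetical ReadWholeFile above), one way to pull the numbers out without going through sscanf line by line is a strtod loop, something like this sketch:

#include <stdlib.h>

// Parse whitespace-separated numbers straight out of an in-memory buffer.
// pBuffer must be NUL-terminated; pValues must have room for nMax doubles.
// (Hypothetical helper, just to show the strtod loop.)
int ParseNumbers(const char* pBuffer, double* pValues, int nMax)
{
    const char* p = pBuffer;
    int nCount = 0;
    while (nCount < nMax)
    {
        char* pEnd;
        double d = strtod(p, &pEnd);   // skips leading whitespace, including newlines
        if (pEnd == p)                 // no more numbers in the buffer
            break;
        pValues[nCount++] = d;
        p = pEnd;                      // continue just past the number we read
    }
    return nCount;
}

Whether that actually beats line-by-line sscanf is worth timing rather than assuming.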

This is not directly related, but for a college project I have to write an algorithm that searches through large (200+ MB) DNA text files, to be executed on a supercomputer. The problem is that we do not know the exact size of each file. You mentioned something about checking the file size first, then allocating enough memory for the text. How exactly do I check the file size, and how do I allocate a non-constant amount of memory?

Any help appreciated.
You can read the entire file into a buffer with fopen and fread(buffer, ...); look in the documentation for those functions.
You can check the file size by calling fseek(file, 0, SEEK_END) and then position = ftell(file); position will hold the number of bytes in the file. Just remember to fseek back to the start before you read.
If you include <fstream> you can use std::ifstream for input and std::ofstream for output. Open the file with ios_base::in for input or ios_base::out for output. Read n characters with infile->read(myBuffer, n);
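A short sketch of what that block-read loop can look like with std::ifstream (the file name, block size, and function name here are just placeholders):

#include <fstream>

// Read a file in fixed-size blocks through the iostream interface.
void ReadInBlocks(const char* szPath)
{
    std::ifstream infile(szPath, std::ios_base::in | std::ios_base::binary);
    char buffer[2048];
    while (infile)
    {
        infile.read(buffer, sizeof(buffer));      // may read fewer than 2048 bytes
        std::streamsize nGot = infile.gcount();   // how many bytes were actually read
        // ... process nGot bytes from buffer ...
    }
}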

For the DNA file, you can get file info, including its size, by calling stat(fname, &file_stat_struct).
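For example, on a POSIX system (which the supercomputer presumably is), something along these lines reports the size so a buffer can be allocated up front; GetFileSizeBytes is just an illustrative name:

#include <sys/stat.h>

// Return the size of a file in bytes using stat(), or -1 on failure.
long GetFileSizeBytes(const char* szPath)
{
    struct stat file_stat_struct;
    if (stat(szPath, &file_stat_struct) != 0)
        return -1;   // couldn't stat the file
    return (long)file_stat_struct.st_size;
}

The returned size can then be passed to new char[size] or malloc(size) to get a buffer that matches the file exactly, which answers the "non-constant amount of memory" part.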

HSZuyd
The reason I think there must be a better way is that I can load huge BMP files (50 MB+) in no time. Why should text be any different?

This topic is closed to new replies.
