Dividing large text files into smaller ones

Started by
7 comments, last by Chronoslade 22 years, 4 months ago
Is there a simple way to divide a large text file into 20 smaller files of equal or close to equal size? If so, how would I go about this? The text file contains string data, and I am trying to do this with C or C++.
"There is humor in everything depending on which perspective you look from."
Any help?
Well, obviously the first thing you need to do is read the text file into a temporary buffer and get its file size. After that you'd just divide the buffer into 20 parts of equal size and output them to 20 files. What's the problem?

Of course, since it's a text file you might want to do some parsing so that you don't cut the file off in mid-sentence or something like that. Ya know what I mean?
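
One way to avoid cutting a line in half, sketched under some assumptions: copy roughly the target number of bytes, then keep going until the next newline. The function name copy_chunk_to_line_end and the choice of '\n' as the boundary are illustrative, not something from the posts in this thread; sentence-level parsing would need more than this.

```c
#include <stdio.h>

/* Copy about `target` bytes from `in` to `out`, then keep copying until the
 * end of the current line so that no line is split across two files.
 * Returns the number of bytes written (0 means the input is exhausted).
 * Hypothetical helper: the name and the '\n' boundary are assumptions. */
unsigned long copy_chunk_to_line_end(FILE *in, FILE *out, unsigned long target)
{
    unsigned long written = 0;
    int ch; /* int, so EOF can be told apart from data bytes */

    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        ++written;
        /* Once the target size is reached, stop at the first newline. */
        if (written >= target && ch == '\n')
            break;
    }
    return written;
}
```

Each output file ends up slightly larger than the target, by at most one line, and a caller can stop opening new files once the function returns 0.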
Yeah... you could do something like this (I just typed it up and it's untested; email me at BillyB@mrsnj.com if you need more help, or if this works and you want to thank me =D ):

#include <stdio.h>

unsigned long fsize(FILE *in)
{
unsigned long sz;
fseek(in,0,2); //Seek to end of file
sz=ftell(in); //Get file position
fseek(in,0,0); //Seek back to beginning.
}

unsigned long ChunkSize(FILE *in, short numfiles)
{
unsigned long TotalSize=fsize(in);
unsigned long cSize;
cSize = TotalSize/numfiles;
return cSize;
}
//In file, Start Name, Chunk #, ChunkSize.
void WriteChunk(FILE *in, char *sName, short num, unsigned long cSize)
{
FILE *out;
char tmp[256], chr;
unsigned long ctr;
sprintf(tmp,"%s%d.txt",sName,num); //Create new file name. name+number.txt
out=fopen(tmp,"wb"); //open file for write binary
for (ctr=0;ctr!=cSize;++ctr)
{
chr=getc(in);
if (feof(in)) break; //Break if eof.
putc(chr,out);
}
fclose(out);
}

int main(void)
{
unsigned long cSize, nFiles;
short ctr;
FILE *in;
nFiles = 20; //20 files.
in = fopen("test.txt","rb");
cSize = ChunkSize(in,nFiles);
for (ctr=0;ctr!=nFiles;++ctr)
{
WriteChunk(in,"out",ctr,cSize);//Write chunk.
}
fclose(in);
return 0;
}
Yeah... you could do something like this (I just typed it up and it's untested; email me at BillyB@mrsnj.com if you need more help, or if this works and you want to thank me =D ):

#include <stdio.h>

unsigned long fsize(FILE *in)
{
    unsigned long sz;
    fseek(in, 0, SEEK_END);         //Seek to end of file
    sz = (unsigned long)ftell(in);  //Get file position
    fseek(in, 0, SEEK_SET);         //Seek back to beginning.
    return sz;
}

unsigned long ChunkSize(FILE *in, short numfiles)
{
    unsigned long TotalSize = fsize(in);
    //Round up so the last few bytes are not dropped when the size
    //is not an exact multiple of numfiles.
    return (TotalSize + numfiles - 1) / numfiles;
}

//In file, Start Name, Chunk #, ChunkSize.
void WriteChunk(FILE *in, const char *sName, short num, unsigned long cSize)
{
    FILE *out;
    char tmp[256];
    int chr; //int, not char, so EOF can be told apart from data bytes
    unsigned long ctr;
    sprintf(tmp, "%s%d.txt", sName, num); //Create new file name. name+number.txt
    out = fopen(tmp, "wb"); //open file for write binary
    if (out == NULL) return; //Could not create the output file.
    for (ctr = 0; ctr != cSize; ++ctr)
    {
        chr = getc(in);
        if (chr == EOF) break; //Break if eof.
        putc(chr, out);
    }
    fclose(out);
}

int main(void)
{
    unsigned long cSize;
    short ctr, nFiles;
    FILE *in;
    nFiles = 20; //20 files.
    in = fopen("test.txt", "rb");
    if (in == NULL) return 1; //Input file missing or unreadable.
    cSize = ChunkSize(in, nFiles);
    for (ctr = 0; ctr != nFiles; ++ctr)
    {
        WriteChunk(in, "out", ctr, cSize); //Write chunk.
    }
    fclose(in);
    return 0;
}
Sorry for the double post. In the second post I fixed my code (I forgot to do a return in the first function).

Billy
Will the program you listed above be able to handle a 500 meg file? The only purpose of this program is to divide the 500 megs into 20 files of about 25 megs each. It's basically impossible to work with something that big.

  1. Get the file size. You can do this by opening the file and fseek'ing to the end, then getting the cursor position.

  2. Divide this number by the number of smaller files you want. In your case that number is 20.

  3. Go back to the beginning of the open file (let's call it A). Open a new file, which I shall call B.

  4. Copy (or extract) the next (sizeof(A)/number_of_smaller_files) bytes of A into it. Close B.

  5. If we've hit the end of A, break/return/exit. Otherwise open a new file using the same file handle. Go to 4.

  6. Sit back and marvel at your programming prowess
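
The steps above can be sketched roughly as follows. This is only one possible reading of them: the name split_file, the 64 KB buffer, and rounding the chunk size up are all assumptions made for illustration. It copies in blocks with fread/fwrite rather than one byte at a time, which should help with a file in the hundreds of megabytes, and it relies on ftell returning a long, so it assumes the file size fits in a long (500 megs does, even where long is 32 bits).

```c
#include <stdio.h>

/* Split the file at `path` into `parts` files named <prefix>0.txt,
 * <prefix>1.txt, and so on. Returns 0 on success, -1 on any error.
 * Hypothetical helper; the name and buffer size are assumptions. */
int split_file(const char *path, const char *prefix, int parts)
{
    static char buf[64 * 1024];   /* copy in 64 KB blocks */
    char name[512];
    long total, chunk;
    int i;
    FILE *in;

    if (parts <= 0) return -1;
    in = fopen(path, "rb");
    if (in == NULL) return -1;

    /* Steps 1-3: measure the file, work out the chunk size, rewind. */
    if (fseek(in, 0L, SEEK_END) != 0 || (total = ftell(in)) < 0) {
        fclose(in);
        return -1;
    }
    rewind(in);
    chunk = (total + parts - 1) / parts;   /* round up so nothing is left over */

    for (i = 0; i < parts; ++i) {
        long left = chunk;
        FILE *out;

        snprintf(name, sizeof name, "%s%d.txt", prefix, i);
        out = fopen(name, "wb");
        if (out == NULL) { fclose(in); return -1; }

        /* Step 4: copy up to `chunk` bytes into this part. */
        while (left > 0) {
            size_t want = left < (long)sizeof buf ? (size_t)left : sizeof buf;
            size_t got = fread(buf, 1, want, in);
            if (got == 0) break;           /* end of input (or a read error) */
            if (fwrite(buf, 1, got, out) != got) {
                fclose(out);
                fclose(in);
                return -1;
            }
            left -= (long)got;
        }
        if (fclose(out) != 0) { fclose(in); return -1; }
    }
    fclose(in);
    return 0;
}
```

A call such as split_file("test.txt", "out", 20) would produce out0.txt through out19.txt. Unlike step 5, this sketch always creates all the parts, so for a very small input the later ones can come out empty.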





I wanna work for Microsoft!
I think that recombining the files can be tricky, since EOF chars and things are added to them.
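
For what it's worth, on common platforms a file opened in binary mode ("wb"/"rb") is written and read byte for byte, with no end-of-file character added, so joining the parts should come down to plain concatenation. A minimal sketch, assuming the parts are named <prefix>0.txt, <prefix>1.txt, and so on as in the code posted earlier in the thread; join_files is a hypothetical name:

```c
#include <stdio.h>

/* Append <prefix>0.txt ... <prefix>(parts-1).txt, in order, to `dest`.
 * Returns 0 on success, -1 on any error.
 * Hypothetical helper; assumes the naming scheme used earlier in the thread. */
int join_files(const char *prefix, int parts, const char *dest)
{
    char buf[4096];
    char name[512];
    size_t got;
    int i;
    FILE *out = fopen(dest, "wb");

    if (out == NULL) return -1;

    for (i = 0; i < parts; ++i) {
        FILE *in;

        snprintf(name, sizeof name, "%s%d.txt", prefix, i);
        in = fopen(name, "rb");
        if (in == NULL) { fclose(out); return -1; }

        /* Binary mode on both sides: bytes are copied through unchanged. */
        while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
            if (fwrite(buf, 1, got, out) != got) {
                fclose(in);
                fclose(out);
                return -1;
            }
        }
        fclose(in);
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

Opening the files in text mode instead is where trouble can creep in, since some platforms translate line endings in that mode.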

This topic is closed to new replies.
