Chronoslade

Dividing large text files into smaller ones

Is there a simple way to divide a large text file into 20 smaller files of equal or close to equal size? If so, how would I go about this? The text file contains string data, and I am trying to do this in C or C++.

"There is humor in everything depending on which perspective you look from."

Well, obviously the first thing you need to do is read the text file into a temporary buffer and get its file size. After that you'd just divide the buffer into 20 parts of equal size and output each part to its own file. What's the problem?

Of course, since it's a text file you might want to do some parsing so that you don't cut the file off in mid-sentence or something like that. Ya know what I mean?
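
One way to avoid cutting a line in half, sketched here under a few assumptions: the whole file is already in a memory buffer, lines end in '\n', and the helper name split_points is made up for this illustration. Each boundary starts at the equal-size position and is pushed forward to just past the next newline, so the pieces come out close to equal rather than exactly equal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill ends[0..parts-1] with the end offset
   (exclusive) of each chunk of buf. Every boundary begins at the
   equal-size target and moves forward to just past the next '\n'. */
void split_points(const char *buf, size_t len, size_t parts, size_t *ends)
{
    size_t i, pos = 0;

    assert(buf != NULL && ends != NULL && parts > 0);
    for (i = 0; i < parts; ++i) {
        size_t target = (len / parts) * (i + 1); /* equal-size boundary */

        if (target < pos)
            target = pos;                        /* never move backwards */
        if (i == parts - 1 || target >= len) {
            pos = len;                           /* last chunk takes the rest */
        } else if (target > 0 && buf[target - 1] == '\n') {
            pos = target;                        /* already on a line break */
        } else {
            const char *nl = memchr(buf + target, '\n', len - target);
            pos = nl ? (size_t)(nl - buf) + 1 : len;
        }
        ends[i] = pos;
    }
}
```

With ends[] filled in, chunk i is the byte range from ends[i-1] (or 0 for the first chunk) up to ends[i], and each range can be written to its own file with fwrite. If a single line is longer than a whole chunk, some of the later chunks can come out empty.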

Guest Anonymous Poster
Yeah... you could do something like this (just typed it up, and is untested, email me at BillyB@mrsnj.com if you need more help, or if this works and you want to thank me =D ):

#include <stdio.h>

unsigned long fsize(FILE *in)
{
unsigned long sz;
fseek(in,0,2); //Seek to end of file
sz=ftell(in); //Get file position
fseek(in,0,0); //Seek back to beginning.
//Bug: sz is never returned here; the repost below adds the missing return.
}

unsigned long ChunkSize(FILE *in, short numfiles)
{
unsigned long TotalSize=fsize(in);
unsigned long cSize;
cSize = TotalSize/numfiles;
return cSize;
}
//In file, Start Name, Chucnk #, ChunkSize.
void WriteChunk(FILE *in, char *sName, short num, unsigned long cSize)
{
FILE *out;
char tmp[256], chr;
unsigned long ctr;
sprintf(tmp,"%s%d.txt",sName,num); //Create new file name. name+number.txt
out=fopen(tmp,"wb"); //open file for write binary
for (ctr=0;ctr!=cSize;++ctr)
{
chr=getc(in);
if (feof(in)) break; //Break if eof.
putc(chr,out);
}
fclose(out);
}

int main(void)
{
unsigned long cSize, nFiles;
short ctr;
FILE *in;
nFiles = 20; //20 files.
in = fopen("test.txt","rb");
cSize = ChunkSize(in,nFiles);
for (ctr=0;ctr!=nFiles;++ctr)
{
WriteChunk(in,"out",ctr,cSize);//Write chunk.
}
fclose(in);
return 0;
}

Guest Anonymous Poster
Yeah... you could do something like this (just typed it up, and is untested, email me at BillyB@mrsnj.com if you need more help, or if this works and you want to thank me =D ):

#include <stdio.h>
#include <limits.h>

//Return the file size in bytes and rewind to the start.
unsigned long fsize(FILE *in)
{
    unsigned long sz;
    fseek(in,0,SEEK_END); //Seek to end of file
    sz=(unsigned long)ftell(in); //Get file position, i.e. the size
    fseek(in,0,SEEK_SET); //Seek back to beginning.
    return sz;
}

unsigned long ChunkSize(FILE *in, short numfiles)
{
    unsigned long TotalSize=fsize(in);
    unsigned long cSize;
    cSize = TotalSize/numfiles; //Integer division; the remainder goes into the last file.
    return cSize;
}
//In file, Start Name, Chunk #, ChunkSize.
void WriteChunk(FILE *in, const char *sName, short num, unsigned long cSize)
{
    FILE *out;
    char tmp[256];
    int chr; //int rather than char, so EOF can be told apart from data bytes
    unsigned long ctr;
    sprintf(tmp,"%s%d.txt",sName,num); //Create new file name. name+number.txt
    out=fopen(tmp,"wb"); //open file for write binary
    if (out==NULL) return; //Could not create the output file.
    for (ctr=0;ctr!=cSize;++ctr)
    {
        chr=getc(in);
        if (chr==EOF) break; //Break if eof.
        putc(chr,out);
    }
    fclose(out);
}

int main(void)
{
    unsigned long cSize;
    short ctr, nFiles;
    FILE *in;
    nFiles = 20; //20 files.
    in = fopen("test.txt","rb");
    if (in==NULL) return 1; //Input file missing or unreadable.
    cSize = ChunkSize(in,nFiles);
    for (ctr=0;ctr!=nFiles;++ctr)
    {
        //The last file reads to EOF so the leftover bytes are not lost.
        WriteChunk(in,"out",ctr,(ctr==nFiles-1)?ULONG_MAX:cSize);
    }
    fclose(in);
    return 0;
}

Guest Anonymous Poster
Sorry for the double post. In the second post I fixed my code (I forgot the return in the first function).

Billy

Will the program you listed above be able to handle a 500 meg file? The only purpose of this program is to divide the 500 megs into 20 files of about 25 megs each. It's basically impossible to work with something that big.
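
On the size question: a splitter that streams from the input file never needs the whole 500 megs in memory at once. The sketch below is one possible way to copy a chunk through a fixed-size buffer with fread and fwrite; the name copy_bytes and the 64 KB buffer size are assumptions made for this example, not something from the posts above. One real limit to keep in mind is that ftell returns a long, so on platforms where long is 32 bits the fseek/ftell size trick only works for files under 2 GB. A 500 meg file is well within that.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy up to `count` bytes from `in` to `out`
   through a fixed-size buffer. Returns the number of bytes copied,
   which is less than `count` at end of file or on an I/O error. */
unsigned long copy_bytes(FILE *in, FILE *out, unsigned long count)
{
    char buf[65536];                  /* 64 KB; an arbitrary choice */
    unsigned long copied = 0;

    assert(in != NULL && out != NULL);
    while (copied < count) {
        unsigned long want = count - copied;
        size_t got;

        if (want > sizeof buf)
            want = sizeof buf;        /* read at most one buffer's worth */
        got = fread(buf, 1, (size_t)want, in);
        if (got == 0)
            break;                    /* end of file or read error */
        if (fwrite(buf, 1, got, out) != got)
            break;                    /* write error */
        copied += (unsigned long)got;
    }
    return copied;
}
```

Calling copy_bytes(in, out, cSize) once per output file would do the same job as a byte-by-byte loop, just with fewer library calls per chunk.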

"There is humor in everything depending on which perspective you look from."


  1. Get the file size. You can do this by opening the file, fseek'ing to the end, and then getting the cursor position with ftell.

  2. Divide this number by the number of smaller files you want. In your case that number is 20.

  3. Go back to the beginning of the open file (let's call it A). Open a new file, which I shall call B.

  4. Copy (or extract) the next (sizeof(A)/number_of_smaller_files) bytes of A into it. Close B.

  5. If we've hit the end of A, break/return/exit. Otherwise open a new file, reusing the same file handle variable for B. Go to 4.

  6. Sit back and marvel at your programming prowess
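
The steps above can be sketched roughly as follows. This is only one possible reading of them: the function name split_file, the "<prefix><number>.txt" naming scheme, and the choice to hand the leftover bytes out one per file to the first few files are all assumptions made for the example. It always writes exactly the requested number of files instead of stopping early at the end of A.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: split `path` into `parts` files named
   <prefix>0.txt, <prefix>1.txt, and so on.
   Returns 0 on success, -1 on any I/O error. */
int split_file(const char *path, const char *prefix, int parts)
{
    FILE *a, *b;
    long size, base, extra, k;
    char name[256];
    int i, c, len;

    assert(path != NULL && prefix != NULL && parts > 0);
    a = fopen(path, "rb");
    if (a == NULL)
        return -1;

    /* Step 1: seek to the end and read the position to get the size. */
    if (fseek(a, 0, SEEK_END) != 0 || (size = ftell(a)) < 0) {
        fclose(a);
        return -1;
    }
    /* Step 2: bytes per file, plus the leftover to spread around. */
    base = size / parts;
    extra = size % parts;

    /* Step 3: back to the beginning of A. */
    rewind(a);

    for (i = 0; i < parts; ++i) {
        /* The first `extra` files get one more byte each. */
        long n = base + (i < extra ? 1 : 0);

        len = snprintf(name, sizeof name, "%s%d.txt", prefix, i);
        if (len < 0 || (size_t)len >= sizeof name || (b = fopen(name, "wb")) == NULL) {
            fclose(a);
            return -1;
        }
        /* Step 4: copy the next n bytes of A into B, then close B. */
        for (k = 0; k < n && (c = getc(a)) != EOF; ++k)
            putc(c, b);
        if (fclose(b) != 0) {
            fclose(a);
            return -1;
        }
    }
    fclose(a);
    return 0;
}
```

A call such as split_file("test.txt", "out", 20) would then produce out0.txt through out19.txt, with file sizes that differ by at most one byte.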






I wanna work for Microsoft!
