Large (I mean absolutely huge) File Parsing (Solved)

16 comments, last by doodah2001 19 years, 11 months ago
[EDIT: nevermind, just read the entirety of the thread... doh]

-me

[edited by - Palidine on May 18, 2004 8:09:27 PM]
Don't solve the problem, sidestep it. Use memory-mapped files. Here is my code to open a file in memory-mapped mode. Once it's open you address it like it's a part of memory, i.e. no fseek or anything.

void CFile::OpenReadFile(void)
{
    HANDLE FileHandle = CreateFile(m_Name.c_str(), GENERIC_READ, FILE_SHARE_READ,
        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (FileHandle == INVALID_HANDLE_VALUE)
        throw FileUtilities::FileNotFound();

    // Only the low 32 bits of the size are used, so this handles files under 4 GB.
    m_Size = (unsigned int)GetFileSize(FileHandle, NULL);
    if (m_Size == 0)
    {
        CloseHandle(FileHandle);    // nothing to map
        return;
    }

    m_Handle = CreateFileMapping(FileHandle, NULL, PAGE_READONLY, 0, m_Size, NULL);
    // The mapping object keeps the file alive, so the file handle can go now.
    CloseHandle(FileHandle);
    if (m_Handle == NULL)
        throw std::runtime_error("Couldn't create file mapping");

    m_Data = (char*)MapViewOfFile(m_Handle, FILE_MAP_READ, 0, 0, 0);
    if (m_Data == NULL)
        throw std::runtime_error("Couldn't map view of file");
}

edit: breaking tables bad

[edited by - SiCrane on May 18, 2004 8:25:41 PM]
Chris Brodie
Also, though this doesn't solve the problem at hand, there are several freely available programs that you can plug into Apache to get it to spit out separate log files for each day/month/year/whatever. Having a separate log file for each month is a good idea anyway b/c then you can delete old log files and not have to worry about the bloat. Just google around and I'm sure you'll find something.

-me
Is line 25331 longer than 1022 characters? If you didn't want to worry about keeping the right buffer size you could read character by character into a string until you hit '\n'.

If you get the exact same problem with different C++ runtimes I doubt it has anything to do with being able to read the entire file. If you have grep or any other program that can display a specific line from a file, take a look at that line.

[edited by - igni ferroque on May 18, 2004 8:24:08 PM]
Free Mac Mini (I know, I'm a tool)
Ok, for some reason I have it working. Apparently it was a "SEARCH" rather than a "GET" or "POST" post that had over 300000 characters in it. I increased the line size to a million, an extreme amount and it seemed to work. Thanks for all the help anyways everyone.
MatDoodah2001@hotmail.comLife is only as fun as you make it!!!
quote:Original post by doodah2001
Ok, here's the basic outline of the code. This is the fstream version; however, I have done basically the same thing with FILE* and with the Windows-specific CreateFile().

int main()
{
    ifstream in(filename);

    if(!in.is_open())
        return -1;

    char line[1024];

    while(!in.eof())
    {
        in.getline(line, 1024);

        //call all the line parsing stuff here that just
        //deals with the line, that's it
    }
}

It's nothing complicated; it's very simple. I have run it without doing any of my analysis functions but still get caught up on the same line. I have tried increasing the line size but that didn't work because I still get caught up.

mat

I'm fairly sure that your problem occurs when the line is longer than 1023 characters.

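Note that if the line is longer than 1023 characters, in.getline sets the failbit of the stream, so the stream isn't valid after that. Every later getline call then returns without reading anything, and !in.eof() never becomes false, as that just tests for eof, so the loop gets stuck on that line.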

You could either use std::getline, which is much safer because it operates on strings, or you could use some code like this:
if ( cin.rdstate() & ios::failbit )
{
    // the line was too long
    // turn off the stream's ios::failbit
    cin.clear( cin.rdstate() & ~ios::failbit );
    // read and discard any unread characters, up to
    // and including the delimiter character
    while ( cin.good() && (cin.get() != delim) );
}
else
{
    //Process the line
}


quote:Original post by doodah2001
Ok, for some reason I have it working. Apparently it was a "SEARCH" rather than a "GET" or "POST" post that had over 300000 characters in it. I increased the line size to a million, an extreme amount and it seemed to work. Thanks for all the help anyways everyone.


Hehe. That's the problem with fixed-size buffers. They're never big enough =) Glad you fixed the problem!

[edited by - igni ferroque on May 18, 2004 8:26:54 PM]
Free Mac Mini (I know, I'm a tool)
Fredizzimo,

That's a good idea. I didn't think of that. It seems as if it only happens on "SEARCH" posts, so I can check for that and just discard that line if I have to, since I don't care about "SEARCH" posts anyway.
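One possible way to do that check, as a sketch only. It assumes the request is the first double-quoted field of the line, as in Apache's common log format; the actual log format isn't shown in this thread, and RequestMethod and ShouldSkip are made-up names.

```cpp
#include <string>

// Return the HTTP method of a log line, or "" if no quoted request is found.
// Assumes the request ("METHOD /path HTTP/x.y") is the first quoted field.
std::string RequestMethod(const std::string& line)
{
    std::string::size_type open = line.find('"');
    if (open == std::string::npos)
        return "";
    std::string::size_type start = open + 1;
    // The method ends at the first space (or at a closing quote, if malformed).
    std::string::size_type stop = line.find_first_of(" \"", start);
    if (stop == std::string::npos)
        stop = line.size();
    return line.substr(start, stop - start);
}

// True for lines that should be discarded before any further parsing.
bool ShouldSkip(const std::string& line)
{
    return RequestMethod(line) == "SEARCH";
}
```

Since the method sits near the start of the line, the check should also work on a line that got cut off by a fixed-size buffer.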

mat
MatDoodah2001@hotmail.comLife is only as fun as you make it!!!

This topic is closed to new replies.
