Fast floating point parser

17 comments, last by silvermace 15 years, 8 months ago
So I changed to a preallocated vector via:
items.reserve(m_cols).
However, this had almost no impact :(
Guess it's time to break out the real profiler eh?

Quote:Original post by Antheus
Do you have RAM, or are you already paging?

I have 4 GB of RAM, so it should all fit in RAM.

Quote:Can you post the grammar, and perhaps what the file looks like?

As per my OP, the grammar is:

*space_p >> real_p[push_back_a(items)] >> *(+space_p >> real_p[push_back_a(items)]) >> *space_p

The files look like
12.314 232.434 12341.3 ...
1234.532
.
.
.

No fixed tabs, just spaces in between numbers.
What compiler are you using? Which optimization settings are you using?
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:What compiler are you using?

VS 2005 & VS 2008

So I went to my XP box and profiled, and ended up with this:

Count        %w/children  time(us)
                                    void ArcASCIIRaster::threaded_parse::operator()(const tbb::blocked_range<unsigned int>& r) const
6            0.0          2.3       {
6            0.0          1.6           std::string* lines = m_lines;
6            0.0          1.6           bool* failure = m_failure;
6            0.0          1.5           std::string* msg = m_msg;
6            0.0          1.6           *failure = false;
                                        //contains the parsed numbers
6            0.0          2.2           std::vector<float> items;
6            0.0          663.3         items.reserve(m_cols);
6            0.0          1.9           for(unsigned int row = r.begin(); row < r.end(); row++)
                                        {
                                            //current col
5,866        0.0          1,492.7           unsigned int col = 0;
                                            parse(m_lines[row].begin(), m_lines[row].end(),
                                                //BEGIN GRAMMAR
                                                *space_p >> real_p[push_back_a(items)] >> *(+space_p >> real_p[push_back_a(items)]) >> *space_p
                                                //END GRAMMAR
5,866        10.2         186,285,532.9      );
                                            //report an error
5,866        0.0          3,402.6           if(items.size() != m_cols)
                                            {
                                                *failure = true;
                                                *msg = "Not enough columns at line " + boost::lexical_cast<std::string>(row+1);
                                                return;
                                            }
                                            //need to change to tbb::vector if we want to parallelize this
5,866        0.0          1,591.4           for(unsigned int col = 0; col < items.size(); col++)
                                            {
                                                try
48,435,562   0.7          12,385,339.0       {
48,435,562   88.2         1,612,247,535.9        m_array[row*m_cols+col] = boost::lexical_cast<float>(items[col]);
                                                }
                                                catch(boost::bad_lexical_cast e)
                                                {
                                                    *failure = true;
                                                    *msg = "Failure to cast item to float";
                                                    return;
                                                }
                                                catch(...)
                                                {
                                                    *failure = true;
                                                    *msg = "Unknown error";
                                                    return;
                                                }
                                            }
5,866        0.0          8,223.5           items.clear();
6            0.0          1.3           }
6            0.0          279.6     }

The third column is time in microseconds, after opening a 150 MB file.
Culprit seems to be this awesomely redundant code (from an earlier version that was using strings)
1,612,247,535.9 m_array[row*m_cols+col] = boost::lexical_cast<float>(items[col]);
That is a full 26.8 minutes of execution time. :( Remember, this is a *slow* laptop and the profiler adds a little overhead, but I think it's clear where the problem is.
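Since items is already a std::vector<float> filled by real_p, that lexical_cast is round-tripping each float through a string and back; presumably the fix is just to assign directly:

    //items already holds floats parsed by real_p, so no cast is needed
    for(unsigned int col = 0; col < items.size(); col++)
        m_array[row*m_cols+col] = items[col];

    //or equivalently (needs <algorithm>):
    std::copy(items.begin(), items.end(), &m_array[row*m_cols]);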

However, the Boost::Spirit call is still substantial - any ideas on how to reduce this down further?

Quote:Original post by _Sigma
Quote:What compiler are you using?

VS 2005 & VS 2008
It looks like you found the main problem already, but you might be able to increase speed further by defining _SECURE_SCL to 0 (or you might not, depending on the optimizer, spirit's implementation, etc).
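For reference, _SECURE_SCL (which controls MSVC's checked iterators in VS2005/2008) has to be defined before any standard header is pulled in, or project-wide under Preprocessor Definitions:

    //disable MSVC checked iterators; must precede all standard library includes
    #define _SECURE_SCL 0
    #include <vector>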
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:Original post by _Sigma
However, the Boost::Spirit call is still substantial - any ideas on how to reduce this down further?
Well, the first thing I'd try is to kick out Boost::Spirit and replace it with memory mapped files together with strtof (use the end pointer!). But still, you're fundamentally at the mercy of the quality of the floating point parser of a foreign library for a performance-critical operation. And float parsers tend to be optimized more for precision and code-size than performance.
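A sketch of that approach (strtof is C99 and not in the VS2005/2008 CRT, so strtod is used here; it also assumes the buffer ends in whitespace or a NUL, since strtod can otherwise scan past an unterminated mapping):

    #include <cstdlib>
    #include <vector>

    //parse every whitespace-separated number in [p, end); strtod writes the
    //position just past the parsed number into 'next' -- the "end pointer"
    void parse_floats(const char* p, const char* end, std::vector<float>& out)
    {
        while(p < end)
        {
            char* next;
            double v = strtod(p, &next);
            if(next == p)
                ++p;                             //not a number here: skip one byte
            else
            {
                out.push_back(static_cast<float>(v));
                p = next;
            }
        }
    }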
So you should really consider hand-rolling and optimizing your own parser. The idea here is to read the digits as a normal integer, which is then adjusted by multiplying with an exponent table to get the final float. By skipping error/overflow checking and reading directly from a memory-mapped file you should be able to keep up with the drive without too much trouble.
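A minimal sketch of that idea (a hypothetical helper: no overflow or error checks, no 'e' notation, at most ten fractional digits, and the buffer is assumed to end in a non-digit):

    //accumulate the digits as one integer, then scale it down by a
    //power-of-ten table according to how many digits followed the '.'
    inline const char* parse_float(const char* p, float& out)
    {
        static const double scale[] = { 1.0, 0.1, 0.01, 0.001, 1e-4, 1e-5,
                                        1e-6, 1e-7, 1e-8, 1e-9, 1e-10 };
        bool neg = false;
        if(*p == '-')      { neg = true; ++p; }
        else if(*p == '+') ++p;

        long long mantissa = 0;
        int frac = 0;                            //digits seen after the '.'
        for(; *p >= '0' && *p <= '9'; ++p)
            mantissa = mantissa * 10 + (*p - '0');
        if(*p == '.')
        {
            for(++p; *p >= '0' && *p <= '9'; ++p)
            {
                mantissa = mantissa * 10 + (*p - '0');
                ++frac;
            }
        }
        double v = mantissa * scale[frac];
        out = static_cast<float>(neg ? -v : v);
        return p;                                //points just past the number
    }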
Quote:Original post by implicit
Well, the first thing I'd try is to kick out Boost::Spirit and replace it with memory mapped files together

Is this a good idea for 400 MB files?
Quote:http://en.wikipedia.org/wiki/Memory-mapped_file
Depending on the circumstances, memory mapped file I/O can actually be substantially slower than standard file I/O. For example: when reading in very large files, most of the data will not be cached by the kernel, meaning many page faults will occur when reading uncached data.

Since you are doing a sequential run it should be okay. The first page miss loads the whole page, giving you a whole page of data to parse before the next page miss. If you were doing random access where the data was more than a page apart, then you would run into the problem mentioned. From the article:
Quote:Since the memory-mapped file is handled internally in pages, linear file access (as seen, for example, in flat file data storage or configuration files) requires disk access only when a new page boundary is crossed, and can write larger sections of the file to disk in a single operation.
Quote:Original post by _Sigma
Quote:Original post by implicit
Well, the first thing I'd try is to kick out Boost::Spirit and replace it with memory mapped files together

Is this a good idea for 400 MB files?
Yes, you're doing a simple linear scan, which is precisely the kind of thing memory-mapped files are optimized for. In Win32/POSIX you can even set flags to indicate that you're doing a sequential scan.
I've heard claims that it's beatable with uncached overlapped async IO under Win32, but the difference is minimal at best, and the real benefit of memory-mapped files is that they're incredibly comfortable to use.

Then again, you can't increase the size of the files much beyond this without risking running out of address space on 32-bit systems, so you might have to use windowing to map a smaller part of the file at a time.
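For reference, the Win32 setup looks roughly like this (a sketch: error handling omitted, and the file name is a placeholder; the last three MapViewOfFile arguments take an offset and length if you need to window a smaller view):

    #include <windows.h>

    //open with a sequential-scan hint so the cache manager reads ahead
    //(the POSIX equivalent is madvise(addr, len, MADV_SEQUENTIAL) after mmap)
    HANDLE file = CreateFileA("grid.asc", GENERIC_READ, FILE_SHARE_READ, 0,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);
    DWORD size = GetFileSize(file, 0);   //low 32 bits; fine for files < 4 GB
    HANDLE mapping = CreateFileMapping(file, 0, PAGE_READONLY, 0, 0, 0);
    const char* data = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

    //... parse [data, data + size) ...

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);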
One thing I have noticed is that your code doesn't use const to full effect: your operator is defined as a const function, but you don't use const_iterators, instead favouring the vector's indexing operator. Good use of const helps the compiler's optimizer out.

Just pointing out also that you are using a random-access container, but I don't think you actually need random access (though realistically, I doubt a list would be faster, due to dereference overhead and because you're using primitive data types, but it's still something to think about).

Have you tried std::stringstream? There is a chance it could be faster than boost::lexical_cast. Also, your try/catch block is inside your loop; in C++, exception traps are relatively expensive to set up, so make sure it sits outside the loop, e.g.:
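A sketch of that, based on the inner loop from the profile above (one exception frame per row instead of one per element):

    try
    {
        for(unsigned int col = 0; col < items.size(); col++)
            m_array[row*m_cols+col] = boost::lexical_cast<float>(items[col]);
    }
    catch(const boost::bad_lexical_cast&)
    {
        *failure = true;
        *msg = "Failure to cast item to float";
        return;
    }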

Computer-architecture-wise, I would suggest a memory-mapped file with views sized to align with your system's page size. I'm a little surprised that you "only" got a 20% speed boost from Intel's TBB; do you have a dual core?
"I am a donut! Ask not how many tris/batch, but rather how many batches/frame!" -- Matthias Wloka & Richard Huddy, (GDC, DirectX 9 Performance)

http://www.silvermace.com/ -- My personal website

