Binary pattern searching

Started by
14 comments, last by JakeM 18 years, 7 months ago
Well, it's not too obvious for someone who has never heard of it since yesterday. :)

What I don't understand is why these searching algorithm docs never mention 'binary' searching.
It's always text searching. I don't want any algorithm to assume to want to search for text
only.
Advertisement
It's because text searching and binary searching can use the same algorithms. It just is more intuitive for humans to think about text searching.
Is this all of it? Short and sweet.

// Boyer-Moore-Horspool text searchingchar *search( char *pat, char *text, int n) {  int i, j, k, m, skip[MAXCHAR]; m = strlen(pat); if( m==0 ) return( text ); for( k=0; k<MAXCHAR; k++ ) skip[k] = m; for( k=0; k<m-1; k++ ) skip[pat[k]] = m-k-1; for( k=m-1; k < n; k += skip[text[k] & (MAXCHAR-1)] ) {    for( j=m-1, i=k; j>=0 && text == pat[j]; j-- ) i--;    if( j == (-1) ) return ( text+i+1 ); } return( NULL );}



As for reading it in, I can probably put that aside until I get this thing working for a binary file, fast or slow. Thanks.
Quote:Original post by JakeM
It would be nice if I could access the file like a buffer as it is. Does CreateFileMapping() have fast file access?
I'm only seeking sequentially forward, not jumping the file pointer around, file access is read only, so that should help?
I found during profiling that memory mapping files (using CreateFileMapping etc) in wundows using Visual Studio 2002 .Net Academic was around 9 times faster than using the C++ file stream classes, and around 4 times faster than the C fopen/fread/etc set of functions. Then I turned profiling off, and noticed it didn't really make a noticable difference when reading in files around 10MB in size. I also tried loading files around 700MB, but that was a different program that wasn't at all disk-bound in the first place (it ran fine even constantly reading the HD because of buffering) and it had the downside that much RAM was consumed (windows didn't page it out nearly aggressively enough so it actually slowed other parts of the program down slightly)

You could try it though. For great justice, you can use my file map class if you'd like:
/*Memory Mapped File ClassCopyright (c) 2005 ExtrariusPermission is hereby granted, free of charge, to any person obtaining a copy ofthis software and associated documentation files (the "Software"), to deal inthe Software without restriction, including without limitation the rights touse, copy, modify, merge, publish, distribute, sublicense, and/or sell copiesof the Software, and to permit persons to whom the Software is furnished to doso, subject to the following conditions:The above copyright notice and this permission notice shall be included in allcopies or substantial portions of the Software.THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS ORIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THEAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR INCONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.*/#include <windows.h>class FileMap{    HANDLE m_File;    HANDLE m_Mapping;    unsigned int m_FileLength;    void *m_Data;public:    FileMap(void): m_File(NULL), m_Mapping(NULL), m_Data(NULL), m_FileLength(0) {}    ~FileMap(void)    {        Close();    }    bool Open(const wchar_t *p_Filename)    {        Close();        m_File = CreateFile(p_Filename, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);        if(m_File != INVALID_HANDLE_VALUE)        {            m_Mapping = CreateFileMapping(m_File, NULL, PAGE_READONLY, 0, 0, NULL);            if(m_Mapping != NULL)            {                m_Data = MapViewOfFile(m_Mapping, FILE_MAP_READ, 0, 0, 0);                if(m_Data != NULL)                {                    m_FileLength = GetFileSize(m_File, NULL);                    return true;                }            }        }        Close();        return false;    }    void Close(void)    {        if(m_Data != NULL)        {            UnmapViewOfFile(m_Data);            m_Data = NULL;        }        if(m_Mapping != NULL)        {            CloseHandle(m_Mapping);            m_Mapping = NULL;        }        if(m_File != NULL)        {            CloseHandle(m_File);            m_File = NULL;        }        m_FileLength = 0;    }    unsigned char *GetPointer(void)    {        return reinterpret_cast<unsigned char *>(m_Data);    }    unsigned int GetLength(void)    {        return m_FileLength;    }};
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Thanks, I'll try to do something with it. :)

But, this is what I don't fully understand. Should I pass the entire file buffer to this searching function?
Or should I pass small chunks to it? Either way, the searching function only returns the first match found.
And if I pass small chunks, there's the possibility that my pattern could span between two
chunks and be missed entirely.
Wow, I just searched a 100MB+ file in less than a second, and picked off a couple hundred
matches. Blazing hot damn fast. Certainly better than my first attempt almost an agonizing 5 minutes.
It matched the output perfectly.

This topic is closed to new replies.

Advertisement