Jump to content
  • Advertisement
Sign in to follow this  
Sander

(perl-compatible) Regex-fu wanted

This topic is 4070 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I need some help on a particular (perl-compatible) regular expression. Say, I want to match question marks in an SQL query, e.g: "SELECT * FROM TABLE WHERE field1=?". However, it should only match question marks that are not in a string. Worse: it needs to account for escaped quotes as well. Examples: SELECT * FROM table WHERE field1='hello?' AND field2=? should only match the ? from field2 SELECT * FROM table WHERE field1='where\'s john?' AND field2=? should only match the ? from field2 I'm reasonably well versed with PCRE's but this is beyond me. Thanks in advance!

Share this post


Link to post
Share on other sites
Advertisement
I would say that such a question mark is preceded by any number of characters other than quotes and question marks, and any number of quoted strings.

^(([^'?]|('([^\']|\.)*')(?))*


I'm on a rush, so I can't do the escaping for you, so I've underlined the characters, which should be escaped if applicable. Match parenthesis 5 for the mark.

Share this post


Link to post
Share on other sites
If all you need to do is detect whether there are question marks, it'll be much easier to just strip out quoted-strings, and then look for question marks in the result. I don't really see how capturing the found question marks is any use, because each match will just be, well, a question mark. If you need to *replace* the question marks, then I think you're in a whole other level of hell :)

Share this post


Link to post
Share on other sites
Trust me, I am in a whole other level of hell :-)

I am building a new database abstraction layer in PHP and I have been inspired by the Python DB API which I find pleasurable to work with. The DBAL that I built so far can do things like:


$db->query('SELECT * FROM table WHERE field1=? AND field2=?', $value1, $value2);
// result: SELECT * FROM table WHERE field1='value1' AND field2='value2'

$ary = array(
array(11, 12),
array(21, 22),
array(31, 32)
);
$db->('SELECT * FROM table WHERE field1=? AND field2=?', $ary);
// results in 3 SELECT queries

$db->('INSERT INTO table (field1, field2') VALUES (?, ?)', $ary);
//result: INSERT INTO table (field1, field2) VALUES (11, 12), (21, 22), (31, 32)

// But also named variables
$ary = array(
'name' => "John's",
'location' => 'Denver',
'password' => 'asdf'
);
$db->query('INSERT INTO table (field1, field2) VALUES ({name}, {password})', $ary);
//result: INSERT INTO table (field1, field2) VALUES ('John\'s', 'asdf')


My problem ofcourse: I don't want to match ?'s or {var}'s inside quoted strings in the query.

@ToohrVyk: I played around with your regex but I could not make it work for me. I assume that this but is supposed to match the quoted strings:

('([^\']|\.)*')


But it doesn't. Perhaps replacing quoted strings with placeholders and then match the arguments is indeed the easiest solution.

Share this post


Link to post
Share on other sites
Instead of doing this with regular expressions, why not write a little parser? Shouldn't be too complicated and the final product will be simple maintable code instead of a nightmare of regexes.

Share this post


Link to post
Share on other sites
Like some one mentioned before me, the simplest solution would be to not use regex for the ? matching section.

The following is something i whooped up real fast, and works in all my test cases
<?php 
////// different test queries i used.
//$query = "SELECT * FROM table WHERE field1='where\'s john?' AND field2=?";
//$query = "SELECT * FROM table WHERE field1=? AND field2='where\'s john?'";
$query = "INSERT INTO table (field1, field2) VALUES ('ra\'wr?', ?)";


$flag = FALSE; // used to determine if were in single quotes
$peices = explode("'", $query);
foreach($peices as $key => $peice)
{
if ($flag)
{
if (substr($peice, -1) != "\\") $flag = FALSE;
continue;
}
// Definetely outside of single quotes here
// apply replacements here via preg_replace()
// or some other replacement method.
$peices[$key] = str_replace("?", "'[Thetan]'", $peice); // just for shits and giggles
$flag = TRUE;
}
$query = implode("'", $peices); // make query the glue'd together pieces of $peices
echo "{$query}\n"; // just testing results ;)
?>


of course you can wrap it in a clean little easy to use function.

[Edited by - Thetan on April 22, 2007 12:46:26 PM]

Share this post


Link to post
Share on other sites
Isn't whatever you're using to talk to your DB supposed to support bind variables? I recommend you look into that. By manually doing it with a regex, you're opening your code to the possibility of SQL injection.

Share this post


Link to post
Share on other sites
@Replicon: I am building the very thing that is talking to my database. There is no risk of SQL injection because all the variables I parse in are automatically escaped using PHP's mysql_escape_real_string() function before they are put into the SQL query. Currently I am using something like:

$query = preg_replace('/(\{\w+\})/e', '\$this->quote(\$args[\\1]);', $query);


Where the $this->quote() function checks the variable type and if it's a string, it escapes it and puts it in quotes.

@Thetan and Monder: Thanks for the advice. I will do some profiling today and see what's better. I'll need to modify that code so it can deal with both single quoted 'strings' as double quoted "strings" though.

Share this post


Link to post
Share on other sites
I did some profiling and the difference is negligible. There's a less than 0.01s speedup over 100.000 queries from doing quoted string placeholder replacement with PCRE's as opposed to exploding the string and doing it manually. I've stuck with the PCRE as they give me the benefit of handling single and double quoted strings transparently. The PCRE I use for that comes from http://badassery.blogspot.com/2007/02/regex-examples-in-depth-advanced-quoted.html and is:

(["'])(?:\\\1|.)*?\1


Thanks all!

Share this post


Link to post
Share on other sites
Quote:
Original post by Sander
The PCRE I use for that comes from http://badassery.blogspot.com/2007/02/regex-examples-in-depth-advanced-quoted.html and is:

(["'])(?:\\\1|.)*?\1


Hey, I found this via referrer logs. Note that my regex was designed more to demonstrate a technique than for practical use within programming, so there is a significant issue it didn't account for. That is, it doesn't account for escaped forward slashes (i.e., it treats "\\'" as a forward slash followed by an escaped single quote, rather than an escaped forward slash followed by an unescaped single quote, as I imagine you would want it to). However, that's easy to address. Just add a forward slash followed by a forward slash as the first option within the alternation, like so:

(["'])(?:\\\\|\\\1|.)*?\1

Now, it will match values enclosed in either double or single quotes, while requiring that the same quote type start and end the match, allowing for inner, escaped quotes of the same type as the enclosure, *and* properly accounting for escaped forward slashes.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!