paypal $11 to anyone who can create a regex formula

18 comments, last by Oluseyi 15 years, 10 months ago
I need a formula to solve this problem: http://www.gamedev.net/community/forums/topic.asp?topic_id=497739 Since I know time is money, if anyone could please create a regex formula for me I'd PayPal them $11. Why 11? That way it's like you're getting $10 after the PayPal fees kick in. Basically what I need is this: I need to be able to go through a string and detect a whole table tag and all of its contents, regardless of spacing. So if there is a tag of

<table  name="table2" > <TBODY> ... </TBODY> </table>

or

<table name="table2"><TBODY> ... </TBODY></table>

It'll find it in the string I point to. Is this possible? If it is, the first person to come up with one that works gets paid through PayPal. EDIT: To be more specific, the table content will be dynamic, so it'll need to account for whatever string I give it. It's for a string comparison: I'll have table code, and I want to know whether it's in the document string I give. Thanks guys!
Why don't you just do it yourself?
<table\s+name="table2">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>

seems to do the trick.

Make it multiline, or you'll most likely have problems with the (.+?) part (use the "M" modifier).

EDIT: Alternatively, you can try:
<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>


[Edited by - Trillian on June 14, 2008 8:00:37 PM]
Wait, how do I make it search multiline? I know how to do it with the Perl syntax of /s, but how does it work for C#? Do you want the $11? Let me test this really quick. I'll happily PayPal you.
Quote:Original post by fpsgamer
Why don't you just do it yourself?


lol I didn't even know of that. Oh well I promised the money, I keep my word :)
if(Regex.IsMatch("<table\s+name=\"tbl2\">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>", wDoc))

All the \s come up as an invalid escape sequence...?

EDIT: even with the @ sign it comes up with invalid escape sequences.
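
For what it's worth, two things in that IsMatch call look like they would cause trouble. In a regular C# string, \s is not a recognised escape, so the pattern needs either doubled backslashes or a verbatim @"..." string; and inside a verbatim string a double quote is written as "" rather than \", which is probably why the @ version still failed. The static Regex.IsMatch also takes the input first and the pattern second. A minimal sketch, where the class and method names (VerbatimPatternDemo, ContainsTbl2) are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPatternDemo
{
    // Verbatim string: backslashes are literal, and "" stands for one double quote.
    public const string Pattern =
        @"<table\s+name=""tbl2"">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>";

    // Hypothetical helper: true if the document contains the tbl2 table.
    public static bool ContainsTbl2(string document)
    {
        // The static overload takes (input, pattern, options), in that order.
        return Regex.IsMatch(document, Pattern, RegexOptions.Singleline);
    }
}
```

With this in place the check becomes if (VerbatimPatternDemo.ContainsTbl2(wDoc)). Note that the pattern is still case-sensitive and expects name="tbl2" to be the only attribute on the tag.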
I just visited http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx, and it appears that you actually want the Singleline mode. Yeah I know, it's kind of weird. But the description says it all.

So the code would look like:

using System.Text.RegularExpressions;

Regex regex = new Regex(@"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>", RegexOptions.Singleline);


...then you can use regex.Match("Your source string") or regex.Matches, just check the documentation.
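
To make the difference concrete: in .NET, RegexOptions.Singleline makes . match newlines as well, while RegexOptions.Multiline only changes what ^ and $ anchor to. A small sketch using the pattern above; the class name, method name and any sample HTML fed to it are invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    const string Pattern =
        @"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>";

    // Runs the pattern with the given options so the two modes can be compared.
    public static Match Find(string html, RegexOptions options)
    {
        return Regex.Match(html, Pattern, options);
    }
}
```

For a table whose rows sit on separate lines, SinglelineDemo.Find(html, RegexOptions.Singleline).Success should be true and Groups[1].Value holds the rows, whereas the same call with RegexOptions.Multiline should fail, because (.+?) cannot cross the line break between the rows.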
I don't get why that isn't working either. Here is my code:
old = old.Replace("\r", "");
old = old.Replace("\n", "");
wDoc = wDoc.Replace("\r", "");
wDoc = wDoc.Replace("\n", "");
wDoc = Regex.Replace(wDoc, " +", " ");
old = Regex.Replace(old, " +", " ");
Regex regex = new Regex(@"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>", RegexOptions.Multiline);
if(regex.IsMatch(wDoc))
{
    System.Diagnostics.Debug.WriteLine("----------------------------------\n\nWORKED\n\n--------------------------");
}


And here is the table string:
"<TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" needsContainer=\"true\"><TBODY><TR><TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE>"


And this is the document string, I need to find the table above in this document string:
"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"><HTML><HEAD><TITLE></TITLE><META http-equiv=Content-Type content=\"text/html; charset=utf-8\"><META content=\"MSHTML 6.00.6001.18063\" name=GENERATOR></HEAD><BODY><TABLE id=tbl1 width=\"100%\" Name=\"tbl1\" needsContainer=\"true\"> <TBODY> <TR> <TD class=style2 style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <IMG src=\"..\\webLib\\icon.png\"></TD> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" needsContainer=\"true\"> <TBODY> <TR> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></BODY></HTML>"

Anything I'm doing wrong?
I believe the original regular expression was incorrect. I personally don't use C#, but from what I can find it seems that, like most regular expression implementations, matching defaults to "greedy" mode.

In greedy mode the expression <table[^\>]*> would probably match the whole string (searching until it finds the LAST > character, instead of the next as intended), which is probably your problem.

Unfortunately I can't find anything to indicate there's a global way to turn greediness on or off for an expression (can anyone with more C# experience confirm/deny this?), so the best option seems to be to turn off greedy matching for each character class in Trillian's expression:

\<table[^\>]*?\>\s*\<TBODY[^\>]*?\>\s*(.+?)\s*\</TBODY\>\s*\</table\>


I also escaped all the left and right brackets, since <word> seems to be syntax for describing a variable name for matched text in C# regular expressions. This may be unnecessary outside of parentheses, so feel free to remove those backslashes.

I'll be the first to admit I'm shooting in the dark here, but I hope that helps a little. [smile]
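
One way to check the greediness theory is to run both quantifiers against the same tag. As far as I can tell, [^>]* never runs past the first > whether it is greedy or lazy, because the class itself excludes >; it is .* that overshoots. A separate thing worth ruling out is case: .NET matching is case-sensitive by default, and the document string uses <TABLE> while the pattern says <table>. The names below (GreedyDemo, FirstMatch) are placeholders for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string input, string pattern,
                                    RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Against the input <TABLE id=tbl1> <TBODY> <TR> with RegexOptions.IgnoreCase, both <table[^>]*> and <table[^>]*?> should return just <TABLE id=tbl1>, while <table.*> returns the whole input. Without IgnoreCase, none of them match at all.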
The regex doesn't take into account nested types and you cannot do recursive algorithms using regexes. While I haven't tested it in C#, the regex worked with all the other engines I've tried it with (JavaScript & PHP). Also note that the singleline option is useless if you've already taken the \r\n off.
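
The nesting problem is easy to reproduce with a document shaped like the one in the question: because (.+?) stops at the first </TBODY></TABLE> it can reach, a match that starts at the outer table ends at the inner table's closing tag. A sketch, assuming case-insensitive matching and using made-up names (NestedTableDemo, FirstTable):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedTableDemo
{
    static readonly Regex TableRegex = new Regex(
        @"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the first table-shaped match, or an empty string if there is none.
    public static string FirstTable(string html)
    {
        Match m = TableRegex.Match(html);
        return m.Success ? m.Value : string.Empty;
    }
}
```

Fed a tbl1 table that contains a tbl2 table, FirstTable returns text that opens with the tbl1 tag but closes with tbl2's </TABLE>, so the result is neither table on its own.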

That's pretty much all I could do for you. I don't need your $11, but you could use it to buy a book on regexes, or give it to charity if you have nothing else to do with it =).

This topic is closed to new replies.
