csharpguru08

Unity paypal $11 to anyone who can create a regex formula


I need a formula to solve this problem: http://www.gamedev.net/community/forums/topic.asp?topic_id=497739 Since I know time is money, I'll PayPal $11 to anyone who can create a regex formula for me. Why 11? That way it's like you're getting $10 after the PayPal fees kick in. Basically, I need to be able to go through a string and detect a whole table tag and all of its contents, regardless of spacing. So if there is a tag of
<table  name="table2" > <TBODY> ... </TBODY> </table>

or

<table name="table2"><TBODY> ... </TBODY></table>

It'll find it in the string I point to. Is this possible? If it is, I'll pay the first person who comes up with something that works through PayPal. EDIT: To be more specific, the table content will be dynamic, so it'll need to account for whatever string I give it. It's for a string comparison: I'll have table code, and I want to know whether it's in the document string I give. Thanks guys!
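
For the pure "is this table somewhere in the document, regardless of spacing" question, one alternative to a single large pattern is to normalize the whitespace in both strings and then do a plain containment test. The sketch below is only an illustration: the helper name `Normalize` and both sample strings are invented, and the code is written as C# top-level statements (C# 9 or later; wrap it in a `Main` method for older compilers).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapses whitespace so that two spellings of the
// same markup compare equal. It also alters whitespace inside text content,
// which is acceptable for a yes/no containment test only.
string Normalize(string html)
{
    string s = Regex.Replace(html, @"\s+", " ");   // runs of whitespace -> one space
    s = Regex.Replace(s, @">\s+<", "><");          // drop whitespace between tags
    s = Regex.Replace(s, @"\s+>", ">");            // drop whitespace before '>'
    return s.Trim();
}

// Invented sample strings in the two spacing styles from the question.
string table = "<table name=\"table2\"><TBODY><TR><TD>x</TD></TR></TBODY></table>";
string doc = "<BODY>\r\n<table  name=\"table2\" > <TBODY> <TR><TD>x</TD></TR> </TBODY> </table>\r\n</BODY>";

// Case-insensitive search, since HTML tag names may differ in case.
bool contains = Normalize(doc).IndexOf(Normalize(table), StringComparison.OrdinalIgnoreCase) >= 0;

Console.WriteLine(contains); // True
```

This only answers whether the table is present; it does not report where the table sits in the original, un-normalized document.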

<table\s+name="table2">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>

seems to do the trick.

Make it multiline, or you'll most likely have problems with the (.+?) part (use the "M" modifier).

EDIT : Alternatively, you can try :
<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>


[Edited by - Trillian on June 14, 2008 8:00:37 PM]

Wait, how do I make it search multiline? I know how to do it with the Perl syntax of /s, but how does it work in C#? Do you want the $11? Let me test this really quick. I'll happily PayPal you.

if(Regex.IsMatch("<table\s+name=\"tbl2\">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>", wDoc))

All the \s come up as an invalid escape sequence...?

EDIT: even with the @ sign it comes up with invalid escape sequences.
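
A likely explanation for both errors, offered as a sketch rather than a confirmed diagnosis: in a regular C# string literal `\s` is not a recognized escape sequence, and in a verbatim `@"..."` literal a double quote has to be written as `""` because `\"` ends the literal early. Separately, `Regex.IsMatch` takes the input text first and the pattern second. The `wDoc` value below is an invented stand-in, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the real document string.
string wDoc = "<table  name=\"tbl2\"> <TBODY> <TR><TD>Name Here</TD></TR> </TBODY> </table>";

// Verbatim literal: backslashes reach the regex engine untouched,
// and an embedded double quote is written as "" (not \").
string pattern = @"<table\s+name=""tbl2"">\s*<TBODY>\s*(.+?)\s*</TBODY>\s*</table>";

// Regex.IsMatch(input, pattern): the text to search comes first.
bool found = Regex.IsMatch(wDoc, pattern);

Console.WriteLine(found); // True
```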

I just visited http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx, and it appears that you actually want the Singleline mode. Yeah I know, it's kind of weird. But the description says it all.

So the code would look like:

using System.Text.RegularExpressions;
Regex regex = new Regex(@"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>", RegexOptions.Singleline);



...then you can use regex.Match("Your source string") or regex.Matches, just check the documentation.
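
To make the difference between the two options concrete, here is a small sketch with an invented multi-line input. `RegexOptions.Multiline` only changes how `^` and `$` behave, while `RegexOptions.Singleline` is the option that lets `.` match a newline. The code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Invented sample: the table body spans several lines.
string html = "<table>\n<TBODY>\n<TR><TD>a</TD></TR>\n<TR><TD>b</TD></TR>\n</TBODY>\n</table>";
string pattern = @"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>";

// Multiline only makes ^ and $ match at line boundaries;
// '.' still refuses to match '\n', so (.+?) cannot span both rows.
bool withMultiline = Regex.IsMatch(html, pattern, RegexOptions.Multiline);

// Singleline makes '.' match every character, including '\n'.
Match m = Regex.Match(html, pattern, RegexOptions.Singleline);

Console.WriteLine(withMultiline);      // False
Console.WriteLine(m.Success);          // True
Console.WriteLine(m.Groups[1].Value);  // the two <TR> rows, separated by a newline
```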

I don't get why that isn't working either. Here is my code:

old = old.Replace("\r", "");
old = old.Replace("\n", "");
wDoc = wDoc.Replace("\r", "");
wDoc = wDoc.Replace("\n", "");
wDoc = Regex.Replace(wDoc, " +", " ");
old = Regex.Replace(old, " +", " ");
Regex regex = new Regex(@"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>", RegexOptions.Multiline);
if (regex.IsMatch(wDoc))
{
    System.Diagnostics.Debug.WriteLine("----------------------------------\n\nWORKED\n\n--------------------------");
}





And here is the table string:

"<TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" needsContainer=\"true\"><TBODY><TR><TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE>"





And this is the document string in which I need to find the table above:

"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"><HTML><HEAD><TITLE></TITLE><META http-equiv=Content-Type content=\"text/html; charset=utf-8\"><META content=\"MSHTML 6.00.6001.18063\" name=GENERATOR></HEAD><BODY><TABLE id=tbl1 width=\"100%\" Name=\"tbl1\" needsContainer=\"true\"> <TBODY> <TR> <TD class=style2 style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <IMG src=\"..\\webLib\\icon.png\"></TD> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" needsContainer=\"true\"> <TBODY> <TR> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></BODY></HTML>"




Anything I'm doing wrong?
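
One detail that stands out in the code and strings above, offered as a likely cause rather than a confirmed one: the pattern spells the tag as `<table` and `</table>`, while the strings use `<TABLE` and `</TABLE>`. .NET regular expressions are case-sensitive unless `RegexOptions.IgnoreCase` is set. The sketch below uses a shortened, illustrative version of the document string and C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened, illustrative version of the document string above.
string wDoc = "<BODY><TABLE class=style1 Name=\"tbl2\" needsContainer=\"true\"> <TBODY> <TR> <TD>Name Here </TD></TR></TBODY></TABLE></BODY>";
string pattern = @"<table[^>]*>\s*<TBODY[^>]*>\s*(.+?)\s*</TBODY>\s*</table>";

// Case-sensitive by default: "<table" never matches "<TABLE".
bool caseSensitive = Regex.IsMatch(wDoc, pattern, RegexOptions.Singleline);

// IgnoreCase lets the lowercase pattern match the uppercase markup.
bool caseInsensitive = Regex.IsMatch(wDoc, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

Console.WriteLine(caseSensitive);   // False
Console.WriteLine(caseInsensitive); // True
```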

I believe the original regular expression was incorrect. I personally don't use C#, but from what I can find it seems that, like most regular expression implementations, matching defaults to "greedy" mode.

In greedy mode the expression <table[^\>]*> would probably match the whole string (searching until it finds the LAST > character, instead of the next one as intended), which is probably your problem.

Unfortunately I can't find anything to indicate there's a global way to turn greediness on or off for an expression (can anyone with more C# experience confirm/deny this?), so the best option seems to be to turn off greedy matching for each character class in Trillian's expression:

\<table[^\>]*?\>\s*\<TBODY[^\>]*?\>\s*(.+?)\s*\</TBODY\>\s*\</table\>


I also escaped all the left and right angle brackets, since <word> seems to be the syntax for naming a group of matched text in C# regular expressions. This may be unnecessary outside of parentheses, so feel free to remove those backslashes.
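
The greediness concern can be checked directly. In this sketch (invented input, C# top-level statements, C# 9 or later), the negated class `[^>]` can never consume a `>`, so the greedy `[^>]*` and the lazy `[^>]*?` stop at the same place. Greediness does change the result for `.`, which is where a lazy quantifier such as `.+?` matters.

```csharp
using System;
using System.Text.RegularExpressions;

// Invented input containing several '>' characters.
string html = "<table name=\"tbl2\"><TBODY><TR></TR></TBODY></table>";

// The negated class cannot cross '>', so both forms match the same text.
string greedyClass = Regex.Match(html, @"<table[^>]*>").Value;
string lazyClass = Regex.Match(html, @"<table[^>]*?>").Value;

// With '.', greediness changes the result: .* runs to the LAST '>'.
string greedyDot = Regex.Match(html, @"<table.*>").Value;
string lazyDot = Regex.Match(html, @"<table.*?>").Value;

Console.WriteLine(greedyClass);               // <table name="tbl2">
Console.WriteLine(greedyClass == lazyClass);  // True
Console.WriteLine(greedyDot == html);         // True
Console.WriteLine(lazyDot == greedyClass);    // True
```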

I'll be the first to admit I'm shooting in the dark here, but I hope that helps a little. [smile]

The regex doesn't take into account nested types and you cannot do recursive algorithms using regexes. While I haven't tested it in C#, the regex worked with all the other engines I've tried it with (javascript & php). Also note that the singleline option is useless if you've already taken the \r\n off.
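
For nested tables, one way around the limitation is to use the regex only to locate the table tags and to track the nesting depth in ordinary code. This is a minimal sketch under stated assumptions: the helper name `ExtractTable` and the sample input are invented, the tags are assumed to be balanced, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first table that opens at or after
// 'from', including any tables nested inside it, or "" if none is balanced.
string ExtractTable(string text, int from)
{
    // Matches every opening or closing table tag, in document order.
    var tagRegex = new Regex(@"</?table\b[^>]*>", RegexOptions.IgnoreCase);
    int depth = 0;
    int start = -1;
    for (Match tag = tagRegex.Match(text, from); tag.Success; tag = tag.NextMatch())
    {
        bool closing = tag.Value[1] == '/';
        if (!closing)
        {
            if (depth == 0) start = tag.Index;  // outermost opening tag
            depth++;
        }
        else if (depth > 0)
        {
            depth--;
            if (depth == 0)
            {
                // Matching close of the outermost table reached.
                return text.Substring(start, tag.Index + tag.Length - start);
            }
        }
    }
    return "";
}

// Invented sample: an outer table containing an inner one.
string html = "<BODY><TABLE id=a><TR><TD><TABLE id=b><TR><TD>x</TD></TR></TABLE></TD></TR></TABLE></BODY>";

string outer = ExtractTable(html, 0);
string inner = ExtractTable(html, html.IndexOf("<TABLE id=b", StringComparison.Ordinal));

Console.WriteLine(outer); // the whole outer table, inner table included
Console.WriteLine(inner); // <TABLE id=b><TR><TD>x</TD></TR></TABLE>
```

A lazy `(.+?)` followed by `</TBODY>` or `</table>` stops at the first closing tag it meets, which for the outer table is the inner table's close; counting depth avoids that.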

That's pretty much all I could do for you. I don't need your $11, but you could use it to buy a book on regexes, or give it to charity if you have no other use for it =).
