csharpguru08

[C#] WebBrowser changes strings by adding more junk?!?!?!


EDIT: By the way, this is my first post here, so I hope I did it correctly. OK, so I have two strings: one for the whole HTML document and one for a table located in that document. I have other reasons for wanting to do this, but to keep the example simple, let's assume I only want to replace the table in the document with the same table surrounded by <?php ?> tags. But I can't do that, because for whatever reason this code:
string wDoc = gl_wb.DocumentText;
string old = tblCol.OuterHtml;





Outputs two slightly different strings, because the document adds its own extra spacing and \r\n sequences. How do I stop this madness? This is the output (the table string first, then the document string):
\r\n<TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" needsContainer=\"true\"><TBODY>\r\n<TR>\r\n<TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE>





<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">\r\n<HTML><HEAD><TITLE></TITLE>\r\n<META http-equiv=Content-Type content=\"text/html; charset=utf-8\">\r\n<META content=\"MSHTML 6.00.6001.18063\" name=GENERATOR></HEAD>\r\n<BODY>\r\n<TABLE id=tbl1 width=\"100%\" Name=\"tbl1\" needsContainer=\"true\">\r\n  <TBODY>\r\n  <TR>\r\n    <TD class=style2 \r\n    style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <IMG \r\n      src=\"..\\webLib\\icon.png\"></TD>\r\n    <TD \r\n    style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">\r\n<TABLE class=style1 \r\n  style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" \r\n      Name=\"tbl2\" needsContainer=\"true\">\r\n        <TBODY>\r\n        <TR>\r\n          <TD \r\n          style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name \r\n            Here </TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></BODY></HTML>\r\n





One thing to note here: if you look at the table (tbl2) in the document string, it has added its own spacing and \r characters as if by magic... why? [Edited by - csharpguru08 on June 13, 2008 7:06:49 PM]

Hi,

Well, \r\n is the line-ending sequence (carriage return plus line feed), marking the end of each line of the file it was read from, and it looks like the spacing is coming from indentation in the source file.

It shouldn't be too much effort to loop through the string and remove the unwanted characters.
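
One way to do that without a hand-written loop is to normalize the whitespace in both strings before comparing them. This is only a sketch: the class and method names are made up for illustration, and it assumes that whitespace between two tags carries no meaning in this document (that is not true for every page, for example when a space separates inline elements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the WebBrowser API.
static class HtmlWhitespace
{
    public static string Normalize(string html)
    {
        // Collapse every run of spaces, tabs, \r and \n into a single space.
        string collapsed = Regex.Replace(html, @"\s+", " ");

        // Assumption: whitespace that sits only between two tags is
        // indentation, so it can be dropped entirely.
        string tight = Regex.Replace(collapsed, @">\s+<", "><");

        return tight.Trim();
    }
}
```

If both DocumentText and OuterHtml are passed through the same Normalize call, the table string should then be findable inside the document string with an ordinary IndexOf or Replace, at the cost of losing the original indentation.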

That's what I was thinking. But I use the .Replace(@"\r", "") and .Replace(@"\n", "") calls and they just don't seem to work... that's what's weird. See, this whole thing doesn't make sense. tblCol comes from an HtmlElementCollection that contains all the tables in the document. I simply store the OuterHtml in a string. So the table code should be identical to what's in the whole document, right? Nope. The spacing is all different, and the \r\n sequences just get added without my consent. That's what I want to stop. It makes no sense. They should all be the same.

So I guess the real question is: can I somehow keep the spacing and line breaks and stop the document code from changing? Or do I have to take out all formatting as the lesser of two evils?

Quote:
Original post by csharpguru08
That's what I was thinking. But I use the .Replace(@"\r", "") and .Replace(@"\n", "") calls and they just don't seem to work... that's what's weird. See, this whole thing doesn't make sense. tblCol comes from an HtmlElementCollection that contains all the tables in the document. I simply store the OuterHtml in a string. So the table code should be identical to what's in the whole document, right? Nope. The spacing is all different, and the \r\n sequences just get added without my consent. That's what I want to stop. It makes no sense. They should all be the same.


My guess is that you are forgetting that the string type is immutable, and that calling .Replace without respect to its return value will do absolutely nothing.

For example:

string test = "Hello";

test.Replace("l", "m");
Console.WriteLine(test); // prints out 'Hello'

test = test.Replace("l", "m");
Console.WriteLine(test); // prints out 'Hemmo'
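
There may be a second reason the quoted calls do nothing. The strings in the first post look like debugger output (the quotes show up as \"), which suggests the \r\n in them are real carriage-return and line-feed characters. That is an assumption on my part. If it holds, the verbatim literal @"\r" never matches, because it is the two characters backslash and r rather than a carriage return. A small sketch of the difference, with made-up class and method names:

```csharp
using System;

// Hypothetical helper used only to contrast the two literal styles.
static class LineBreaks
{
    // Regular literals: "\r" and "\n" are the actual control characters,
    // so real line breaks are removed.
    public static string Strip(string text)
    {
        return text.Replace("\r", "").Replace("\n", "");
    }

    // Verbatim literals: @"\r" is a backslash followed by 'r'. Text that
    // holds real line breaks contains no such pair, so nothing changes.
    public static string StripVerbatim(string text)
    {
        return text.Replace(@"\r", "").Replace(@"\n", "");
    }
}
```

For an input such as "a\r\nb", Strip returns "ab" while StripVerbatim returns the input unchanged, even when the return value is assigned back.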

Yes, that was part of it. But it still doesn't stop the Document object from dumping in spaces the table doesn't have. I can't take out spacing completely. That'd ruin the document. Here's the new string comparison (spacing is the last problem to solve :( )

Document String:

"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"><HTML><HEAD><TITLE></TITLE><META http-equiv=Content-Type content=\"text/html; charset=utf-8\"><META content=\"MSHTML 6.00.6001.18063\" name=GENERATOR></HEAD><BODY><TABLE id=tbl1 width=\"100%\" Name=\"tbl1\" needsContainer=\"true\"> <TBODY> <TR> <TD class=style2 style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <IMG src=\"..\\webLib\\icon.png\"></TD> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\"> <TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" DRSQuery=\"Default Query\" needsContainer=\"true\"> <TBODY> <TR> <TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></BODY></HTML>"




Table string:

"<TABLE class=style1 style=\"BORDER-RIGHT: #ff0000 1px dotted; BORDER-TOP: #ff0000 1px dotted; BORDER-LEFT: #ff0000 1px dotted; BORDER-BOTTOM: #ff0000 1px dotted\" Name=\"tbl2\" DRSQuery=\"Default Query\" needsContainer=\"true\"><TBODY><TR><TD style=\"BORDER-RIGHT: #2a2a2a 1px dotted; BORDER-TOP: #2a2a2a 1px dotted; BORDER-LEFT: #2a2a2a 1px dotted; BORDER-BOTTOM: #2a2a2a 1px dotted\">Name Here </TD></TR></TBODY></TABLE>"




Can someone show me a regular expression that'll select the table based on part of its content? Like if it contains a table tag with a name of tbl2 or something? That'd be so helpful, and I'd be really appreciative. I'm looking at Regex right now, but it's kind of confusing when you get into complicated things like this.
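
A possible starting point, offered as a rough sketch rather than a robust HTML parser: the pattern below looks for a TABLE tag whose Name attribute equals the given value and matches up to the first closing TABLE tag. The names TableFinder, ForName and Wrap are invented for this example. It assumes the target table has no tables nested inside it (true for tbl2 in the strings above, not for tbl1) and that no attribute value contains a > character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; regex matching of HTML only works for simple cases.
static class TableFinder
{
    // Builds a pattern for <TABLE ... Name="tableName" ...> ... </TABLE>.
    public static Regex ForName(string tableName)
    {
        string pattern =
            @"<TABLE\b[^>]*\bName\s*=\s*""?" +   // opening tag up to Name=
            Regex.Escape(tableName) +            // the name, taken literally
            @"\b""?[^>]*>" +                     // rest of the opening tag
            @".*?</TABLE>";                      // shortest run to a closing tag
        // Singleline lets '.' cross the \r\n the document inserts.
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Surrounds the first matching table with the given text.
    public static string Wrap(string html, string tableName, string before, string after)
    {
        return ForName(tableName).Replace(html, m => before + m.Value + after, 1);
    }
}
```

Because the match runs against the document string itself, the differing whitespace between DocumentText and OuterHtml no longer matters: a call like TableFinder.Wrap(wDoc, "tbl2", "<?php ?>", "<?php ?>") works on whatever spacing the document happens to contain.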
