Flopid

[.net] C# Improved Substring with Escape Characters


I've been trying to use the substring(start, end) function on HTML code, and it doesn't work right, because it seems as if this particular function doesn't count the escape characters. So I found a way around it. Here is the code I use to get a certain substring from HTML. Maybe this will also prompt Microsoft to put this code in their other overload of Substring.


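public string ImprovedSubstring(string fullString, string startString, string endString)
{
    string theSubstring = "";

    // Find where the start marker begins and drop everything before it.
    int i1 = fullString.IndexOf(startString);

    fullString = fullString.Substring(i1);

    // Find the end marker in what remains (index is relative to the shortened string).
    int i2 = fullString.IndexOf(endString);

    // Keep everything up to and including the end marker.
    theSubstring = fullString.Remove(i2 + endString.Length);

    return theSubstring;
}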



LoL, there is a plus sign in the last Remove call (i2 + endString.Length); some bug on GameDev doesn't show it in the preview.

Thank you, and I hope this helps a little with escape characters.
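A small usage sketch of the method above, with a hypothetical static wrapper class and made-up sample HTML. It shows that the returned text keeps both the start and end markers, and that the newline and tab characters in the input do not throw off the indices.

```csharp
using System;

public static class ImprovedSubstringDemo
{
    // Same logic as the method in the post, made static for this demo.
    public static string ImprovedSubstring(string fullString, string startString, string endString)
    {
        int i1 = fullString.IndexOf(startString);
        fullString = fullString.Substring(i1);
        int i2 = fullString.IndexOf(endString);
        return fullString.Remove(i2 + endString.Length);
    }

    public static void Main()
    {
        // Hypothetical HTML containing real newline and tab characters.
        string html = "<html>\n\t<title>Whatever</title>\n</html>";
        string result = ImprovedSubstring(html, "<title>", "</title>");
        Console.WriteLine(result); // <title>Whatever</title>
    }
}
```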

Quote:

I've been trying to use the substring(start, end) function on HTML code, and it doesn't work right because it seems as if this particular function doesn't count the escape characters

Are you sure? Which escape characters? Have you a small example, like a unit test, that demonstrates how it fails? You know that the function is actually substring(start, length)?
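For reference, String.Substring(int startIndex, int length) takes a length as its second argument, not an end index. A minimal sketch with a made-up string, showing the difference and the ArgumentOutOfRangeException you get when startIndex + length runs past the end of the string:

```csharp
using System;

public static class SubstringLengthDemo
{
    public static void Main()
    {
        string s = "GameDev.net"; // 11 characters, indices 0..10

        // Second argument is a length: take 3 chars starting at index 4.
        Console.WriteLine(s.Substring(4, 3)); // Dev

        // To extract the range [start, end), convert the end index to a length first.
        int start = 4, end = 7;
        Console.WriteLine(s.Substring(start, end - start)); // Dev

        try
        {
            // Passing an end index (10) where a length is expected: 8 + 10 > 11.
            Console.WriteLine(s.Substring(8, 10));
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("startIndex + length is past the end of the string");
        }
    }
}
```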

Your function doesn't appear to have any error checking in the case where the substrings are not located, nor does it handle the case where i2 < i1.
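One possible hardened version, sketched under a few assumptions: the name ExtractBetweenInclusive, returning null instead of throwing, and the use of ordinal comparison (usually what you want for markup) are all choices made for this example, not part of the original. It returns null when either marker is missing and searches for the end marker only after the start marker ends. In the original, a missing start marker makes Substring(-1) throw, and a missing end marker leaves i2 at -1, so Remove(i2 + endString.Length) may quietly return a prefix instead of failing.

```csharp
using System;

public static class HtmlText
{
    // Returns the text from the first startMarker through the end of the first
    // endMarker that follows it, markers included (like the original),
    // or null if either marker is missing.
    public static string ExtractBetweenInclusive(string text, string startMarker, string endMarker)
    {
        if (text == null || string.IsNullOrEmpty(startMarker) || string.IsNullOrEmpty(endMarker))
            return null;

        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0)
            return null; // start marker not found

        // Look for the end marker only after the start marker.
        int end = text.IndexOf(endMarker, start + startMarker.Length, StringComparison.Ordinal);
        if (end < 0)
            return null; // no end marker after the start marker

        int stop = end + endMarker.Length;          // exclusive end index
        return text.Substring(start, stop - start); // second argument is a length
    }

    public static void Main()
    {
        string html = "<p>intro</p><title>Whatever</title>";
        Console.WriteLine(ExtractBetweenInclusive(html, "<title>", "</title>"));      // <title>Whatever</title>
        Console.WriteLine(ExtractBetweenInclusive(html, "<title>", "</h1>") == null); // True
    }
}
```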

Quote:
Original post by Flopid
I've been trying to use the substring(start, end) function on HTML code, and it doesn't work right because it seems as if this particular function doesn't count the escape characters.


What do you mean? Last time I checked it handled them just fine. This is more likely to be a problem with your understanding than the standard library.

Sorry I did not write more error-proof code; I just didn't know I needed to, since it works if you know what you are doing. I thought a simple piece of code would help people who struggle with this stuff. Also, I used substring(start, to) just to extract the part I need from the HTML code, and it always gave me an error saying that the index is out of range. It seems to ignore all the escape characters like \n, \t, etc., so it basically searches in a string that is shorter (i.e. without the escape characters). However, Substring(start) does not ignore them. That is why I delete everything after a certain point instead. You could try that with some website's HTML code and you'll see.
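A quick way to test that claim: the compiler turns escape sequences such as \n, \t and \\ into single characters, so Length, IndexOf and both Substring overloads all count them like any other character. A minimal sketch with a made-up string:

```csharp
using System;

public static class EscapeCountDemo
{
    public static void Main()
    {
        // 'a', newline, 'b', tab, 'c', backslash: six characters in total.
        string s = "a\nb\tc\\";

        Console.WriteLine(s.Length);                     // 6
        Console.WriteLine(s.IndexOf('c'));               // 4
        Console.WriteLine(s.Substring(2).Length);        // 4
        Console.WriteLine(s.Substring(1, 3) == "\nb\t"); // True
    }
}
```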

You are completely incorrect. Substring does not ignore escape characters in either overload. More likely, you simply calculated the length incorrectly.

Errrr, I am wrong... I passed the index of the last element as the length parameter that Substring takes. Rycross was right; I didn't understand Substring completely. Thank you for correcting me.

Quote:

Sorry I did not write more error-proof code; I just didn't know I needed to, since it works if you know what you are doing.

So does Substring [grin]

But seriously, learning to spot odd boundary conditions is a good habit in general, and a vital one if you are going to write generic functions like that, which will end up being used all over the place.

In general, if you think you've found a bug in something like System.String, you are almost certainly wrong. This code is exercised by millions, if not billions, of lines of code all around the world. If Substring stopped working, there would be a deluge of bug reports. That is not to say it cannot happen, just that it is vanishingly unlikely.

If you do think you've found such a bug, you should write a minimal example which demonstrates it. Something like this:


using System;

public class Example
{
    public static void Main(string[] args)
    {
        string html = "<html><head><title>Whatever</title></head><body>CONTENT</body></html>";
        int start = html.IndexOf("<title>");
        int end = html.IndexOf("</title>");
        // Expected "Whatever" but is "Whatever</title></head><bod"
        Console.WriteLine(html.Substring(start + "<title>".Length, end));
    }
}


This immediately demonstrates that you are using the API incorrectly. In other cases, it will focus your mind on the problem, and you might be able to solve it yourself. If there still appears to be a bug, it gives us something to try ourselves to confirm that the behaviour diverges from the expected output.
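For completeness, a corrected sketch of that example: compute where the title text starts and where it ends, then pass the difference as the length. The GetTitle helper, the null return for missing tags, and the ordinal comparison are choices made for this sketch rather than anything in the original snippet.

```csharp
using System;

public class CorrectedExample
{
    // Hypothetical helper: returns the text between <title> and </title>, or null.
    public static string GetTitle(string html)
    {
        const string open = "<title>";
        const string close = "</title>";

        int start = html.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;
        start += open.Length; // first character of the title text

        int end = html.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Substring(startIndex, length): the length is end - start.
        return html.Substring(start, end - start);
    }

    public static void Main(string[] args)
    {
        string html = "<html><head><title>Whatever</title></head><body>CONTENT</body></html>";
        Console.WriteLine(GetTitle(html)); // Whatever
    }
}
```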
Quote:

Errrr, I am wrong... I passed the index of the last element as the length parameter that Substring takes. Rycross was right; I didn't understand Substring completely. Thank you for correcting me.

It was right there all along in the very first reply:
Quote:

You know that the function is actually substring(start, length)?

Quote:
Original post by tinybronco
If the issue is \n and \t, you shouldn't use those for newlines and tabs.

In the place of \n, concat Environment.NewLine
In the place of \t, concat (char)9

You can also try using ASCII codes:
\r \n would be 0x0D 0x0A
\t would be 9


I'm not really sure what you'd really gain by manually inserting the actual characters in there. The compiler inserts the real values at compile time. The runtime doesn't actually parse escape sequences.
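To illustrate that point, a minimal sketch: the escape sequence and the numeric character code produce identical strings at run time. One caveat worth adding is that Environment.NewLine is not a drop-in replacement for \n, since it is "\r\n" on Windows and "\n" on Unix-like systems.

```csharp
using System;

public static class EscapeEquivalenceDemo
{
    public static void Main()
    {
        // The compiler turns "\t" into the single character with code 9.
        Console.WriteLine("\t" == ((char)9).ToString());            // True
        Console.WriteLine("\r\n" == "" + (char)0x0D + (char)0x0A);  // True

        // Environment.NewLine is platform-dependent: "\r\n" on Windows, "\n" elsewhere.
        Console.WriteLine(Environment.NewLine == "\r\n" || Environment.NewLine == "\n"); // True
        Console.WriteLine(Environment.NewLine.Length); // 2 on Windows, 1 on Unix-like systems
    }
}
```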

Quote:
Original post by tinybronco
Quote:
Original post by Flimflam
Quote:
Original post by tinybronco
If the issue is \n and \t, you shouldn't use those for newlines and tabs.

In the place of \n, concat Environment.NewLine
In the place of \t, concat (char)9

You can also try using ASCII codes:
\r \n would be 0x0D 0x0A
\t would be 9


I'm not really sure what you'd really gain by manually inserting the actual characters in there. The compiler inserts the real values at compile time. The runtime doesn't actually parse escape sequences.


Escape sequences work only in one direction. If you want to check whether a string has a newline in it, you can't write "if (x.IndexOf("\n") != -1)", etc. For the sake of clarity and consistency, IMHO it's better practice to always use a char or Environment in place of an escape sequence, so that they appear the same way everywhere in your code.


You are completely mistaken as well. You can most certainly check the index of escape characters in C# code:

Console.WriteLine("blah\nblah".IndexOf("\n"));



Outputs 4, as expected. I suggest that both you and the original poster read up on escape sequences; they are simply textual replacements at compile time.

Quote:
Original post by Mike.Popoloski
You are completely mistaken as well. You can most certainly check the index of escape characters in C# code:
Console.WriteLine("blah\nblah".IndexOf("\n"));

Outputs 4, as expected. I suggest that both you and the original poster read up on escape sequences; they are simply textual replacements at compile time.

I think he meant: if you want to find the literal two characters \n (a backslash followed by n), you would need to search for "\\n".
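To make the distinction concrete, a small sketch with made-up strings: "\n" searches for the newline character, while "\\n" or the verbatim literal @"\n" searches for a backslash followed by n, as you might find in HTML or source text. Ordinal comparison is used here so the results do not depend on culture settings.

```csharp
using System;

public static class LiteralBackslashDemo
{
    public static void Main()
    {
        string withNewline = "blah\nblah";  // contains a real newline character (9 chars)
        string withLiteral = @"blah\nblah"; // contains '\' followed by 'n' (10 chars)

        Console.WriteLine(withNewline.IndexOf("\n", StringComparison.Ordinal));  // 4
        Console.WriteLine(withLiteral.IndexOf("\n", StringComparison.Ordinal));  // -1
        Console.WriteLine(withLiteral.IndexOf("\\n", StringComparison.Ordinal)); // 4
        Console.WriteLine(withLiteral.IndexOf(@"\n", StringComparison.Ordinal)); // 4
        Console.WriteLine(withLiteral.Length);                                   // 10
    }
}
```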

Although I don't know what he's getting at as the original issue was solved.
