
HTTP Get/Post Parsing and Unicode Environment Variables.


Thevenin    270
As a test drive for the new C# programming language, I'm currently porting my thousand-line procedural C website code over to C#. I'm about midway through porting my HTTP decoder for CGI support when a question suddenly arose in my head: is System.Environment.GetEnvironmentVariable returning Unicode strings? If so, then wouldn't CGI be passing %2A3F instead of just %2B? I'm a little confused. =/ Here is my code so far (be warned, I'm very new to C#).
    /* 
     * Name: fRetrieveHTTPVariable
     * Programmer: XXXXXXXXXXXXXXXX
     * Version: November 28, 2005.
     * Description: This function returns the decoded string value for a key from URL-encoded HTTP markup.
     *              Note that the key must be between A - Z (Yes, it IS case sensitive).
    */
    public string fRetrieveHTTPVariable(char TheKey, string TheEncodedMarkup,
                                        bool TheAllowReturns, bool TheAllowSpaces, 
                                        bool TheAllowPunctuation)
    {
        string TheReturnString = null;

        /* The first step is to make sure the key and the encoded markup were valid inputs. */
        if (TheEncodedMarkup == null || TheEncodedMarkup.Length <= 3 || TheKey < 'A' || TheKey > 'Z')
            return null;

        /* Ok, let's first see if this string contains the key. */
        int TheStartLocationOfKey = TheEncodedMarkup.LastIndexOf(TheKey + "=");
        if (TheStartLocationOfKey == -1)
            return null;

        /* Ok, let's now find out if it ends via a hard return or another ampersand. */
        int TheEndLocationOfKey = TheEncodedMarkup.LastIndexOf("&", TheStartLocationOfKey);

        if (TheEndLocationOfKey == -1)
        {
            /* This means that the value of this key is the remainder of the string... */
            TheReturnString = TheEncodedMarkup.Substring(TheStartLocationOfKey);
        }
        else
        {
            /* This means that the value of this key is between two regions. */
            TheReturnString = TheEncodedMarkup.Substring(TheStartLocationOfKey, 
                                                         TheEndLocationOfKey - TheStartLocationOfKey);
        }

        /* Before determining whether the string contains valid characters or not, it must be parsed. */

        /* Step (1): Create a character array with all the values in it.. */
        char[] TheCharArray = TheReturnString.ToCharArray();

        /* Step (2): Empty the old string so that we can reuse its variable name. */
        TheReturnString = null;

        /* Step (3): Loop through the character array and convert the hexadecimal encoded pieces 
           to a standard string. */
        for (int TheOffset = 0; TheOffset < TheCharArray.Length; TheOffset++)
        {
            if (TheCharArray[TheOffset] == '%' && TheOffset < TheCharArray.Length - 2)
            {
                /* If there is a special character, then the code needs to parse it. */
                byte.Parse(TheEncodedMarkup.Substring(TheOffset, 2), NumberStyles.AllowHexSpecifier);
            }
            else
            {
                /* If it's just a normal character, then we just append it to the string. */
                String.Concat(TheReturnString, TheCharArray[TheOffset]);
            }
        }

        return null;
    }
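
For reference, under CGI the GET payload arrives through the QUERY_STRING environment variable. A minimal sketch of reading it (QUERY_STRING is the variable name defined by the CGI spec, not something taken from the code above):

    using System;

    class CgiQueryDemo
    {
        static void Main()
        {
            /* The web server hands GET data to a CGI program via QUERY_STRING.
               .NET returns it as a UTF-16 System.String, but the bytes the
               server wrote are plain ASCII percent-escapes, so an escape like
               %2B is still exactly two hex digits, never something like %2A3F. */
            string TheQueryString = Environment.GetEnvironmentVariable("QUERY_STRING");
            Console.WriteLine(TheQueryString == null ? "(no query string)" : TheQueryString);
        }
    }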


Thevenin    270
Quote:
Original post by Thevenin
If so, then wouldn't CGI be passing %2A3F instead of just %2B? I'm a little confused. =/

Here is my code so far (be warned, I'm very new to C#).

*** Source Snippet Removed ***


In theory it should. However, Unicode is still in development; that's why CGI still passes only single-byte characters. Keep in mind, though, that the first 128 characters are the same in both Unicode and ASCII, so conversions between the two should not be difficult.

Btw, for your first time coding in C#, you've got some really good code. With my help, it could be better [razz]. The code I've posted below fixes one of your string syntax problems: String.Concat doesn't modify its arguments, it returns a new string, so you have to assign the result. After these changes your function should perform its task quite smoothly (that leaves just programming in the detection for spaces and such, but that's easy and we both know it [rolleyes]).

/* Step (1): Create a character array with all the values in it and then make a backup string... */
char[] TheCharArray = TheReturnString.ToCharArray();
string TheBufferedString = TheReturnString;

/* Step (2): Empty the old string so that we can reuse its variable name. */
TheReturnString = null;

/* Step (3): Loop through the character array and convert the hexadecimal encoded pieces
   to a standard string. */
char TheConvertedCharacter;
for (int TheOffset = 0; TheOffset < TheCharArray.Length; TheOffset++)
{
    if (TheCharArray[TheOffset] == '%' && TheOffset < TheCharArray.Length - 2)
    {
        /* If there is a special character, then the code needs to parse it. */
        TheConvertedCharacter = (char)byte.Parse(TheBufferedString.Substring(TheOffset + 1, 2),
                                                 NumberStyles.AllowHexSpecifier);
        TheReturnString = String.Concat(TheReturnString, TheConvertedCharacter);
        TheOffset += 2;
    }
    else
    {
        /* If it's just a normal character, we simply append it to the string. */
        TheReturnString = String.Concat(TheReturnString, TheCharArray[TheOffset]);
    }
}
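
For comparison, here is a sketch using the decoder the framework already ships: HttpUtility.UrlDecode in System.Web (this assumes you can add a reference to System.Web.dll; note it also turns '+' into a space, which the hand-rolled loop above doesn't yet):

using System;
using System.Web;   /* requires a reference to System.Web.dll */

class UrlDecodeDemo
{
    static void Main()
    {
        /* UrlDecode reverses %XX escapes and turns '+' into a space. */
        string TheRawMarkup = "N=Hello%2C+World%21";
        Console.WriteLine(HttpUtility.UrlDecode(TheRawMarkup));   /* prints: N=Hello, World! */
    }
}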




Keep up the good work Thevenin!

chollida1    532
Quote:
Original post by Thevenin

In theory it should. However, Unicode is still in development; that's why CGI still passes only single-byte characters. Keep in mind, though, that the first 128 characters are the same in both Unicode and ASCII, so conversions between the two should not be difficult.


I think you probably know this, but they're not identical. If your Unicode encoding is 16-bit then they aren't the same: there would be a 00 in the second byte. If you're using UTF-8 then you're right. If you're using UCS-4 (UTF-32) then the first byte is the same and the other three are 00.

Cheers
Chris
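
A quick sketch to see this for yourself with the stock System.Text.Encoding classes (Encoding.Unicode is UTF-16 little-endian; Encoding.UTF32 assumes .NET 2.0):

using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        string s = "A";   /* U+0041, inside the shared ASCII range */
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(s)));    /* 41 */
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(s)));     /* 41 */
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(s)));  /* 41-00 (UTF-16LE: the 00 second byte) */
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(s)));    /* 41-00-00-00 */
    }
}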

Thevenin    270
Quote:
Original post by chollida1
I think you probably know this, but they're not identical. If your Unicode encoding is 16-bit then they aren't the same: there would be a 00 in the second byte. If you're using UTF-8 then you're right. If you're using UCS-4 (UTF-32) then the first byte is the same and the other three are 00.

Cheers
Chris


That's very odd.

I went to www.unicode.org to verify it, and, well, holy crap is that site so full of fluff; even their technical details of what Unicode strings are read like fluff.

<rant>
I would have been happier if Microsoft invented their own international string code (Called Microcode, ahuahuahh).
</rant>


I believe the padding should go before the character (the opposite of what you described).

I'm currently searching for a good technical article on the matter.

Edit: So VS2005 uses UTF-8, right? Are there any instances where I'll need my code to work with UTF-16/32?

Guest Anonymous Poster
Quote:
Original post by Thevenin
I believe the padding should go before the character (the opposite of what you described).


Actually it depends on the endianness ;-)
If I'm not mistaken, UCS-2 and UCS-4 text files have a BOM (byte order mark) at the beginning that tells which endianness the file uses.
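
If it helps, here is a sketch that dumps the BOM each encoding writes, using Encoding.GetPreamble (the UTF8Encoding constructor flag asks for a BOM; UTF-8 has no byte order, so its BOM is just a signature):

using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        /* GetPreamble returns the byte order mark an encoding writes at the
           start of a stream; for UTF-16/UTF-32 the byte order shows the endianness. */
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetPreamble()));           /* FF-FE (UTF-16 little-endian) */
        Console.WriteLine(BitConverter.ToString(Encoding.BigEndianUnicode.GetPreamble()));  /* FE-FF (UTF-16 big-endian) */
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetPreamble()));             /* FF-FE-00-00 (UTF-32 little-endian) */
        Console.WriteLine(BitConverter.ToString(new UTF8Encoding(true).GetPreamble()));     /* EF-BB-BF (UTF-8 signature) */
    }
}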

