# HTTP GET/POST Parsing and Unicode Environment Variables

## Recommended Posts

As a test drive for the new C# programming language, I'm currently porting my thousand-line procedural C website code over to C#. I'm about midway through porting my HTTP decoder for CGI support when it suddenly arose in my head.. are the System.Environment.GetEnvironmentVariable calls returning Unicode strings? If so, then wouldn't CGI be passing %2A3F instead of just %2B? I'm a little confused. =/ Here is my code so far (Be warned, I'm very new to C#).
    /*
     * Name: fRetrieveHTTPVariable
     * Programmer: XXXXXXXXXXXXXXXX
     * Version: November 28, 2005.
     * Description: This function returns the decoded string value for a key from a
     *              URL-encoded query string.
     *              Note that the key must be between A - Z (Yes, it IS case sensitive).
     * Requires: using System.Globalization; (for NumberStyles).
     */
    public string fRetrieveHTTPVariable(char TheKey, string TheEncodedMarkup,
                                        bool TheAllowReturns, bool TheAllowSpaces,
                                        bool TheAllowPunctuation)
    {
        string TheReturnString = null;

        /* The first step is to make sure the key and the encoded markup were valid inputs. */
        if (TheEncodedMarkup == null || TheEncodedMarkup.Length <= 3 || TheKey < 'A' || TheKey > 'Z')
            return null;

        /* Ok, let's first see if this string contains the key. */
        int TheStartLocationOfKey = TheEncodedMarkup.LastIndexOf(TheKey + "=");
        if (TheStartLocationOfKey == -1)
            return null;

        /* Ok, let's now find out if the value ends at another ampersand or at the end of the
           string. Note: IndexOf searches forward from the key; LastIndexOf would search backward. */
        int TheEndLocationOfKey = TheEncodedMarkup.IndexOf("&", TheStartLocationOfKey);

        if (TheEndLocationOfKey == -1)
        {
            /* This means that the value of this key is the remainder of the string
               (skip the two-character "X=" prefix). */
            TheReturnString = TheEncodedMarkup.Substring(TheStartLocationOfKey + 2);
        }
        else
        {
            /* This means that the value of this key sits between "X=" and the next ampersand. */
            TheReturnString = TheEncodedMarkup.Substring(TheStartLocationOfKey + 2,
                TheEndLocationOfKey - (TheStartLocationOfKey + 2));
        }

        /* Before determining whether the string contains valid characters or not, it must be parsed. */

        /* Step (1): Create a character array with all the values in it.. */
        char[] TheCharArray = TheReturnString.ToCharArray();

        /* Step (2): Empty the old string so that we can reuse its variable name. */
        TheReturnString = null;

        /* Step (3): Loop through the character array and convert the hexadecimal
           percent-encoded pieces to a standard string. */
        for (int TheOffset = 0; TheOffset < TheCharArray.Length; TheOffset++)
        {
            if (TheCharArray[TheOffset] == '%' && TheOffset < TheCharArray.Length - 2)
            {
                /* If there is a special character, then the code needs to parse it. */
                byte.Parse(TheEncodedMarkup.Substring(TheOffset, 2), NumberStyles.AllowHexSpecifier);
            }
            else
            {
                /* If it's just a normal character, then we just append it to the string. */
                String.Concat(TheReturnString, TheCharArray[TheOffset]);
            }
        }

        return TheReturnString;
    }
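On the %2A3F worry above: URL encoding operates on bytes, not on characters, so each byte gets its own two-digit %XX escape; a non-ASCII character simply becomes several consecutive escapes, never one fused escape. A quick sketch of the principle (Python's urllib is used here purely for illustration, since the thread's code is C#):

```python
from urllib.parse import quote, unquote

# '+' is a single ASCII byte, so it becomes exactly one %XX escape.
print(quote('+'))           # %2B

# A non-ASCII character is encoded byte-by-byte (UTF-8 here), giving
# one escape per byte rather than a fused escape like %2A3F.
print(quote('é'))           # %C3%A9
print(unquote('%C3%A9'))    # é
```

Decoding therefore only ever has to handle two hex digits after each '%', exactly as the loop above does.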



##### Share on other sites
Quote:
 Original post by Thevenin:
 If so, then wouldn't CGI be passing %2A3F instead of just %2B? I'm a little confused. =/ Here is my code so far (Be warned, I'm very new to C#). *** Source Snippet Removed ***

In theory it should. However, Unicode is still in development; that's why CGI still passes only single-byte characters. Keep in mind, however, that the first 128 characters are the same in both Unicode and ASCII, so conversions between the two should not be difficult.
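That "first 128" figure can be checked mechanically: code points U+0000 through U+007F are exactly the ASCII set, and UTF-8 encodes each of them as the identical single byte. A one-line verification (Python used only as a neutral illustration):

```python
# Every 7-bit ASCII character encodes to the same single byte in UTF-8.
assert all(chr(i).encode('utf-8') == bytes([i]) for i in range(128))
print("first 128 code points match ASCII byte-for-byte")
```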

btw.. for your first time coding in C#, you've got some really good code. With my help, it could be better [razz]. The code I've posted below fixes one of your string syntax problems: String.Concat returns a new string rather than modifying its arguments, so its result has to be assigned back. After these changes your function should perform its task quite smoothly (that leaves just programming in the detection for spaces and such, but that's easy and we both know it [rolleyes]).

    /* Step (1): Create a character array with all the values in it and then make a backup string... */
    char[] TheCharArray = TheReturnString.ToCharArray();
    string TheBufferedString = TheReturnString;

    /* Step (2): Empty the old string so that we can reuse its variable name. */
    TheReturnString = null;

    /* Step (3): Loop through the character array and convert the hexadecimal
       percent-encoded pieces to a standard string. */
    char TheConvertedCharacter;
    for (int TheOffset = 0; TheOffset < TheCharArray.Length; TheOffset++)
    {
        if (TheCharArray[TheOffset] == '%' && TheOffset < TheCharArray.Length - 2)
        {
            /* If there is a special character, then the code needs to parse it. */
            TheConvertedCharacter = (char)byte.Parse(TheBufferedString.Substring(TheOffset + 1, 2),
                NumberStyles.AllowHexSpecifier);
            TheReturnString = String.Concat(TheReturnString, TheConvertedCharacter);
            TheOffset += 2;
        }
        else
        {
            TheReturnString = String.Concat(TheReturnString, TheCharArray[TheOffset]);
        }
    }
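For anyone who wants to sanity-check the loop above, here is a reference sketch of the same algorithm (in Python purely for illustration; the function name is mine): after each '%' it parses two hex digits into one character, otherwise it copies the character through unchanged.

```python
def retrieve_decoded(s: str) -> str:
    """Decode %XX escapes the same way the C# loop above does."""
    out = []
    i = 0
    while i < len(s):
        # A '%' only starts an escape if two more characters follow it.
        if s[i] == '%' and i < len(s) - 2:
            out.append(chr(int(s[i + 1:i + 3], 16)))  # two hex digits -> one char
            i += 3
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

print(retrieve_decoded('Hello%20World%21'))  # Hello World!
```

Note the boundary test `i < len(s) - 2`: a lone '%' at the very end of the string is copied through instead of causing an out-of-range read.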

Keep up the good work Thevenin!

Quote:
 Original post by Thevenin:
 In theory it should. However, Unicode is still in development; that's why CGI still passes only single-byte characters. Keep in mind, however, that the first 128 characters are the same in both Unicode and ASCII, so conversions between the two should not be difficult.

I think you probably know this, but they're not identical. If your Unicode encoding is 16-bit, then they aren't the same; there would be a 00 in the second byte. If you're using UTF-8, then you're right. If you're using UCS-4 (UTF-32), then the first byte is the same and the other three are 00.

Cheers
Chris
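Chris's byte layouts are easy to verify for the little-endian forms; the encoding names below spell out the byte order explicitly, since (as comes up later in the thread) endianness changes where the padding goes. Python is used here only as a neutral illustration:

```python
# 'A' is ASCII 0x41. In little-endian UTF-16 the low byte comes first,
# followed by one 0x00; in little-endian UTF-32 three 0x00 bytes follow.
assert 'A'.encode('utf-8') == b'\x41'
assert 'A'.encode('utf-16-le') == b'\x41\x00'
assert 'A'.encode('utf-32-le') == b'\x41\x00\x00\x00'
print("UTF-8 / UTF-16-LE / UTF-32-LE layouts confirmed")
```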

Quote:
 Original post by chollida1:
 I think you probably know this, but they're not identical. If your Unicode encoding is 16-bit, then they aren't the same; there would be a 00 in the second byte. If you're using UTF-8, then you're right. If you're using UCS-4 (UTF-32), then the first byte is the same and the other three are 00.
 Cheers,
 Chris

That's very odd.

I went to www.unicode.org to verify it, and, well, holy crap is that site so full of fluff; even their technical description of what Unicode strings are is fluff.

<rant>
I would have been happier if Microsoft invented their own international string code (Called Microcode, ahuahuahh).
</rant>

I believe the padding should go before the character (the opposite of what you described).

I'm currently searching for a good technical article on the matter.

Edit: So VS2005 uses UTF-8, right? Are there any instances where I'll need my code to work with UTF-16/32?
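For what it's worth on the Edit: whatever encoding the IDE saves source files in, .NET's System.String is UTF-16 internally at runtime, so UTF-16 is the encoding you meet whenever you look at a string's raw bytes. The difference between the three encodings for a single character is easy to see (Python used purely as a neutral illustration):

```python
# One character (U+00E9), three encodings, three different byte sequences.
print('é'.encode('utf-8'))      # b'\xc3\xa9'           (2 bytes)
print('é'.encode('utf-16-le'))  # b'\xe9\x00'           (2 bytes, padded to 16 bits)
print('é'.encode('utf-32-le'))  # b'\xe9\x00\x00\x00'   (4 bytes)
```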

Quote:
 Original post by Thevenin:
 I believe the padding should go before the character (the opposite of what you described).

Actually, it depends on the endianness ;-)
If I'm not mistaken, UCS-2 and UCS-4 text files have a BOM (Byte Order Mark) at the beginning that tells which endianness they use.
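That BOM behavior is easy to demonstrate: the mark is the code point U+FEFF written in the stream's own byte order, so a decoder that reads FE FF knows the payload is big-endian, and FF FE means little-endian. A small illustration (Python's codecs module used only to show the principle):

```python
import codecs

# The standard BOM constants: FE FF for big-endian, FF FE for little-endian.
assert codecs.BOM_UTF16_BE == b'\xfe\xff'
assert codecs.BOM_UTF16_LE == b'\xff\xfe'

# The generic 'utf-16' decoder reads the BOM to pick the byte order,
# so both streams below decode to the same text.
assert b'\xff\xfe\x41\x00'.decode('utf-16') == 'A'  # little-endian payload
assert b'\xfe\xff\x00\x41'.decode('utf-16') == 'A'  # big-endian payload
print("BOM selects the endianness")
```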
