This character in this file...I can't figure out what it is

I downloaded the official Scrabble word list to make a Scrabble-playing program, and there is a character separating the words, but I can't figure out what it is. In Notepad the character shows up as one of those little boxes, and in WordPad it acts as a newline, with each word on its own line. So I wrote a little C++ program to tell me the ASCII value of the character, and my program doesn't even recognize it. It acts like the character isn't in the file at all and reads only the alphabetic characters of the words. Here's my program:
  
#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char * argv[]) {

	char	input;
	int		limit = 100;
	int		n = 0;

	ifstream	infile("ospd.txt");

	for(n = 0; n < 20; n++) {
		infile >> input;
		cout << input << " (" << (int)input << ")" << endl;
	}

	return 0;
}  
and the output...
a (97)
a (97)
a (97)
a (97)
h (104)
a (97)
a (97)
h (104)
e (101)
d (100)
a (97)
a (97)
h (104)
i (105)
n (110)
g (103)
a (97)
a (97)
h (104)
s (115) 
and the text from the file...
aa
aah
aahed
aahing
aahs
 
It prints only the alphabetic characters and none of the special character that separates the words. You can get the file I'm using here, if you care to take a look for yourself. I just need some way to determine where one word ends and the next begins. Thanks for your help. Russell

Okay, there are lots of easy ways to do this, and they're worth remembering.

Under MS/PC-DOS, OS/2, Win16, Win32 and so on, the end-of-line separator is the following pair of bytes (remember this):
13
10

These are the two symbols that you couldn't see. In the DOS character set they are drawn as a music note (13) and an inverted circle (10).

To see these, go to the command prompt (console), locate the file and type this
edit /77 ospd.txt
This loads the file in binary mode, meaning it shows you every byte as a character instead of acting on control characters such as the carriage return and line feed.

Note that the bottom right of the screen shows the value of the current character.


To write a program to read and test the file, try this (off the top of my head so hope there are no mistakes)
  
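#include <stdio.h>

int main(void)
{
    FILE *in;
    int c;

    /* "rb" opens the file in binary mode, so no newline translation happens. */
    in = fopen("ospd.txt", "rb");
    if (in == NULL)
        return 1;

    /* Print the value of every byte; stop when fgetc reports end of file. */
    while ((c = fgetc(in)) != EOF)
        printf("%i\n", c);

    fclose(in);
    return 0;
}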






Beer - the love catalyst
good ol' homepage

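Errr, for a data file that size, use this code instead. It prints only the first 20 bytes, so you don't have to wait for half an hour.

#include <stdio.h>

int main(void)
{
    FILE *in;
    int i;

    in = fopen("ospd.txt", "rb");
    if (in == NULL)
        return 1;

    /* Print the values of the first 20 bytes only. */
    for (i = 0; i < 20; i++)
        printf("%i\n", fgetc(in));

    fclose(in);
    return 0;
}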


PS - my first example, as I originally typed it, was also missing a closing bracket on the printf line.





Beer - the love catalyst
good ol' homepage

Edited by - Dredge-Master on February 4, 2002 8:04:24 PM

If you ever run across another such problem, just grab a hex editor and open the file up in that. I'm surprised no one else suggested this, actually.

I did show that.

You don't need a hex editor, just a binary editor.

Edit.com (or edit.exe, depending on version) has a binary mode.

a hex editor is a viewer/editor which has binary and hex on it.

To view a character value you do not need hex, unless your viewer doesn't support a value display.


edit /77 is also the easiest way to view any file under 8 MB.
It's fast, easy to navigate and change, and everyone has it. Unless you need the hex support, it involves less fumbling and you see more. It also shows the full ASCII set, which a lot of hex editors do not.



Beer - the love catalyst
good ol' homepage

Also, one important thing to note is '\n' is the line-break character 10, whereas '\r' is the carriage-return character 13. But when working with files in text mode, the '\n' is automatically converted to '\r\n'.

~CGameProgrammer( );
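
To make those two values visible, here is a minimal C++ sketch. The helper name dump_bytes is made up for illustration and is not from anyone's post; it just prints every byte of a string as a decimal number, so invisible bytes such as '\r' and '\n' show up as 13 and 10.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: render every byte of `raw` as a decimal number,
// separated by spaces, so control characters become visible.
std::string dump_bytes(const std::string& raw) {
    std::ostringstream out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char so bytes above 127 are not printed as negatives.
        out << static_cast<int>(static_cast<unsigned char>(raw[i]));
    }
    return out.str();
}
```

For example, dump_bytes("aa\r\naah") returns "97 97 13 10 97 97 104": the two words from the top of the list with the 13, 10 pair between them.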

The most important thing is the "rb" argument to fopen(). The 'b' means open in binary mode; otherwise the file is opened in text mode, which on Windows translates each "\r\n" pair into a single '\n' instead of handing every byte to you.

___________________________________

quote:
Original post by CGameProgrammer
Also, one important thing to note is '\n' is the line-break character 10, whereas '\r' is the carriage-return character 13. But when working with files in text mode, the '\n' is automatically converted to '\r\n'.

~CGameProgrammer( );




Careful, that is for Windows only!

Unix uses only a line feed, and classic Mac OS uses only a carriage return!

So when people write parsers, they should check for that!

    
// c is the current character, nextChar is the one after it
if ( c == 13 )
{
    if ( nextChar == 10 )
    {
        // Windows new line
    }
    else
    {
        // Mac new line
    }
}
else if ( c == 10 )
{
    // Unix new line
}


Edited by - Gorg on February 5, 2002 2:02:37 AM
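
The check above can be turned into a small line splitter. This is a sketch under the assumption that the whole file has already been read into a string in binary mode; split_lines is a name chosen for this example, not something from the thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split raw file contents into lines, accepting
// "\r\n" (Windows), a lone '\r' (classic Mac OS) and a lone '\n' (Unix).
std::vector<std::string> split_lines(const std::string& raw) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '\r') {
            // A '\n' right after '\r' belongs to the same line ending.
            if (i + 1 < raw.size() && raw[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Keep a final line that has no terminator.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

With mixed endings, split_lines("aa\r\naah\raahed\naahing") returns the four words aa, aah, aahed and aahing, which is the word-boundary information the original poster was after.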

quote:

a hex editor is a viewer/editor which has binary and hex on it.



I know what a hex editor is, thank you. And FYI, most of them don't show the binary. I haven't seen what that does, anyway. And any one worth its salt shows the ASCII too.


I wasn't saying your method didn't work; it just seemed a bit dirty/long-winded for what he wanted.

On a side note, the reason your original program did not work is that when you open a file with streams, it defaults to ASCII mode of reading. When it does this, whitespace characters act as delimiters and are not read by the stream. Spaces, tabs, newlines and carriage returns are some examples of whitespace. To get your original program to work, you need to open the file as binary.

---
Make it work.
Make it fast.

"Commmmpuuuuterrrr.." --Scotty Star Trek IV:The Voyage Home

If I remember correctly, the square box is the null character; you can reproduce it at a DOS command line by pressing Alt-255.
I hope that helps, but it also signifies a carriage return in Windows, so that's what most text editors read it as.

The square box is any character not part of the standard Windows character set. Windows only shows maybe 130 characters, and there are 256 possible character codes.

~CGameProgrammer( );

quote:
Original post by Spiral
a hex editor is a viewer/editor which has binary and hex on it.


I know what a hex editor is, thank you. And FYI, most of them don't show the binary. I haven't seen what that does, anyway. And any one worth its salt shows the ASCII too.


I wasn't saying your method didn't work; it just seemed a bit dirty/long-winded for what he wanted.



Hex editors are the long-winded version. A hex editor should show the hex values alongside the matching characters on the right. Many of the hex editors available display only the printable ASCII characters, not the control characters.


Again, why the hell would a sane man waste time with a hex editor when all he has to do is use edit? Sorry about the political incorrectness here.



Beer - the love catalyst
good ol' homepage

Edited by - Dredge-Master on February 6, 2002 8:58:41 PM

quote:
Original post by CaptainJester
On a side note, the reason your original program did not work is that when you open a file with streams, it defaults to ASCII mode of reading. When it does this, whitespace characters act as delimiters and are not read by the stream. Spaces, tabs, newlines and carriage returns are some examples of whitespace. To get your original program to work, you need to open the file as binary.

That's the second piece of incorrect advice you've posted today. Sorry to be picky, but when you give people the wrong answers, they just end up posting again a week later wondering why what you said doesn't work.

All the binary flag does in iostreams is suppress conversion of special characters (e.g. converting the newline and carriage-return characters to match the OS standard). It does not change formatted input and output (e.g. << and >>) to stop skipping whitespace. There are special functions (get and getline, I believe) that will read from a stream without skipping whitespace.
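
The difference is easy to see on an in-memory stream, with no file or binary flag involved. The following is a minimal sketch; the function names read_with_extraction and read_with_get are made up for this example.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <vector>

// Formatted input: operator>> skips leading whitespace before each char,
// so '\r' and '\n' never reach the caller.
std::vector<int> read_with_extraction(std::istream& in) {
    std::vector<int> values;
    char c;
    while (in >> c) values.push_back(static_cast<unsigned char>(c));
    return values;
}

// Unformatted input: get() hands back every character, whitespace included.
std::vector<int> read_with_get(std::istream& in) {
    std::vector<int> values;
    char c;
    while (in.get(c)) values.push_back(static_cast<unsigned char>(c));
    return values;
}
```

Fed a std::istringstream holding "aa\r\naah", read_with_extraction returns 5 values (only the letters), while read_with_get returns 7, including 13 and 10. Another option is to keep operator>> and turn the skipping off with the std::noskipws manipulator. For a real file on Windows you would still want std::ios::binary as well, so that the 13 is not translated away before get() sees it.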



quote:
Original post by Kylotan
All the binary flag does in iostreams is suppress conversion of special characters (e.g. converting the newline and carriage-return characters to match the OS standard). It does not change formatted input and output (e.g. << and >>) to stop skipping whitespace. There are special functions (get and getline, I believe) that will read from a stream without skipping whitespace.


The article I read on this made it sound like you didn't have to use getline in binary mode. Unfortunately, I can't find my C++ book and have to rely on information I can find on the Internet to learn what I don't know.

quote:
Original post by Kylotan
That's the second piece of incorrect advice you've posted today. Sorry to be picky, but when you give people the wrong answers, they just end up posting again a week later wondering why what you said doesn't work.


I would prefer to be corrected than have the wrong information going around. Thanks, and sorry for the wrong advice.

---
Make it work.
Make it fast.

"Commmmpuuuuterrrr.." --Scotty Star Trek IV:The Voyage Home
