File reading/writing in C

22 comments, last by Zahlman 17 years, 10 months ago
I don't know how I manage it, but file reading and writing is the one thing I've never quite understood in C. I've got cref, so I've read about fopen and fread, and I've seen them applied in tutorials and on man pages, but I just can't do it myself.

For example, what's wrong with this:

#include <stdlib.h>
#include <stdio.h>

int main() {

    FILE *myFile = NULL;
    char *txtArray;

    myFile = fopen("loadme", "r");
    fread(txtArray, 1, 2, myFile);

    printf("%s\n", txtArray);

    return(0);
}

I would think nothing, but it just gives me a Bus Error. I just want to read some text! I also have a couple of more specific questions, but first, the fread definition:

The fread() function reads, into the array pointed to by ptr, up to nitems members whose size is specified by size in bytes, from the stream pointed to by stream.

OK, so, for example, are two words two items? And wouldn't their size in bytes differ across different lengths of words? Example: "supercalifragalisticexpyalidocious" vs. "hello". And what if I don't know the lengths of the words or the number of "items" in the file?

Any help would be much appreciated.

Thanks!
The buffer that you read into must already be allocated.

txtArray = (char*)malloc(BUFFER_SIZE);

or

char txtArray[BUFFER_SIZE];


Also, fread/fwrite define size in bytes, so to read/write a string you will have to read/write every byte in the string.

Your code there will read two bytes from the file.
If you don't know the lengths of the words in the file, there are two binary options and one main text option.

Binary options: Write the string's length before the string so that you know how many characters to read (this is my typical approach), or write the string out and include the null terminator byte (the problem here is that you don't know how large to make your buffer before you start reading).

If you want to use the above methods, you need to use a "binary file". This means that the file will store information in much the same way that a program stores values in memory. To fopen a binary file, you use "rb" or "wb" for the second parameter, where the 'b' of course means binary.


Text options: If you want to use a plain text file so that you can edit it by hand, I recommend using 'fgets' and 'fputs' instead of 'fread' and 'fwrite'. fgets reads a line of text from the file: it stops after a newline character ('\n'), at the end of the file, or when the buffer is full. If no characters could be read, it returns NULL, which tells you to stop reading lines.
An 'item' is just a sequential chunk of bytes in a file that you define the size of.

For example "supercalifragalisticexpyalidocious" could be one item of size 34(fread(buf,1,34,fp)) and "hello" another item of size 5(fread(buf,1,5,fp)).

Usually, if you are reading a file, you know the structure/format of that file beforehand. For example, say you have a file that contains a list of words separated by newlines. You can read each word using this code:
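#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[80] = {0};
    int total = 0;
    /* "r" opens in text mode; the "t" in "rt" is a non-standard extension. */
    FILE* fp = fopen("wordlist.txt", "r");
    if (NULL == fp)
        return 1;

    /* fgets() stops after each newline, so every call reads one word.
       The length printed includes that trailing '\n'. */
    while (fgets(buf, sizeof buf, fp)) {
        int len = (int)strlen(buf);
        printf("Read string:%s of length:%i\n", buf, len);
        total++;
    }
    printf("Total words in the file:%i\n", total);
    fclose(fp);
    return 0;
}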

For text files you can use fgets() to read in a file line by line.

In conclusion, you must know the format of the file before you write code to read in the data.

[Edited by - Jack Sotac on May 31, 2006 1:54:16 PM]
fgets, eh?

Well, thanks, I'll check that out! I was wondering if I'd have to go find myself a text-to-number-of-bytes converter or something. :p


Thanks a bunch!
Ah, just a funky issue:

I have a text file with this text in it:

Hello World
Earl

I can load the first two words separately like this:

FILE *myFile = NULL;
char txtArray[0];
char itmChar[0];
myFile = fopen("loadme.txt", "r");
fgets(txtArray, 7, myFile);
printf("%s\n", txtArray);
fgets(itmChar, 6, myFile);
printf("%s\n", itmChar);

And it will display them fine. But for some reason, to get "Hello " (including the space, on one line, so that "World" is not indented) I need to set the bytes to 7. So far I've figured that 1 byte is one letter. That would mean 6: 5 for "hello" and 1 for the space, but instead it's 7. And then it takes 6 for the 5-letter word "World". That seems strange for this reason:

Before, when I loaded it all at once, it only took 12 bytes to grab everything; now I need 13. Of course, 12 did not conform to my "1 byte per letter" rule either; I was guessing that the extra one was a beginning-of-line character or something, but now there's even ONE MORE byte.


What's going on?

Thanks!
Your problem is that strings in C are just special character arrays, and the program needs an extra byte to figure out where the string ends (for example, when to stop printing). This special byte is the null character, written '\0' or just the numeric constant 0.

So this:

char text[] = "hello ";


is the same as this:

char text[7] = {'h', 'e', 'l', 'l', 'o', ' ', '\0' };


0) What you have doesn't really work. You are writing into memory that is pointed to by txtArray and itmChar (since the array names are interpreted as pointers to the beginnings of the arrays), but the arrays, being of zero size, are not big enough to hold the indicated data. In C (and also C++) arrays are not first-class objects; you simply can't "pass an array" to a function, but just pass the pointer. The fgets() function has no idea how much space is available, which is why you have to tell it with the provided parameter. By writing the code as shown, you *lie* to fgets(), which is forced to trust you, and thus write data into the bytes of memory next to the array - *which do not necessarily belong to you*. According to the language specification, *anything is allowed to happen at this point*. Including appearing to work correctly.

This is one of many, MANY reasons I would STRONGLY urge you to do this in C++, using proper tools from its standard library, instead. There are very, very few valid reasons for using C any more.

1) The fgets() (read "file-get-string") appends a null terminator to the read-in data; i.e. it reads in what poor C programmers call a "string". This extra \0 character is written in place after the data so that *other* functions can see where the end of the data is (because the length count doesn't get passed around with it). There is no "beginning of line" character, BTW; just end-of-line characters (carriage return and/or line feed).

In C++ you have access to a "file stream" object with a nicer interface than FILE* provides. But more importantly, it provides a real string object which does two very important things for you:

a) It handles all kinds of memory management automatically, so you never have to think about how much space is needed (you can't really read from a file directly into a string, but the provided code sample will show how to work around that, using another library widget with similar nice properties). The string automatically resizes itself if more data comes in than the current allocation can hold; holds on to its own allocation of memory which is guaranteed writable and won't be "aliased"; and cleans up after itself properly in all situations.

b) It remembers the length of the string data, as well as the allocated space, so that you never are stuck passing any "extra" parameters.

2) Another good reason to do this in C++ is that you are allowed to declare variables at their first use, and thus can initialize to a meaningful value right off the bat, rather than a dummy value. You can also do this implicitly by specifying constructor parameters on the declaration line.

3) Like you were told, there's no way to tell where one "item" ends in the file and the next begins except by having the file data indicate that in some way. The usual recommendation for string data is to prepend the string lengths.

4) I will hold your hand, even though I really should insist that you take it back to For Beginners first, and provide a full (not tested, but should be close) example for reading and writing:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Binary I/O
template <typename T>
T read_primitive(istream& is) {
  T result = T();
  is.read(reinterpret_cast<char*>(&result), sizeof(T));
  return result;
}

template <typename T>
void write_primitive(ostream& os, const T& t) {
  os.write(reinterpret_cast<const char*>(&t), sizeof(T));
}

string read_string_binary(istream& is) {
  // The std::vector handles memory allocation of a buffer for us.
  size_t len = read_primitive<size_t>(is);
  vector<char> buffer(len);
  if (len > 0) is.read(&buffer[0], len);
  return string(buffer.begin(), buffer.end());
}

void write_string_binary(ostream& os, const string& s) {
  // Write the length with the same type that read_string_binary() reads.
  write_primitive<size_t>(os, s.length());
  os.write(s.data(), s.length());
}

// Text I/O
string read_string_text(istream& is) {
  size_t len = 0;
  is >> len;
  is.ignore(1); // skip the single space written after the count
  // We represented the length count in human-readable format, but we still
  // want to use .read() for the actual string data. Using the shift operator
  // just reads one token ("word").
  vector<char> buffer(len);
  if (len > 0) is.read(&buffer[0], len);
  return string(buffer.begin(), buffer.end());
}

void write_string_text(ostream& os, const string& s) {
  // The space separates the count from strings that start with a digit.
  os << s.length() << ' ' << s;
}

void test_binary() {
  ofstream test("foo.bin", ios::binary);
  // The char* literal will be implicitly converted to a std::string object.
  write_string_binary(test, "hello ");
  write_string_binary(test, "world");
  test.close();

  ifstream test2("foo.bin", ios::binary);
  string x = read_string_binary(test2);
  string y = read_string_binary(test2);
  cout << x + y << endl; // yes, this works with std::strings!
  // Of course, you could also just output them sequentially...
  // No need to close test2 at this point. I will explain that later if requested...
}

void test_text() {
  // All the same...
  ofstream test("foo.txt"); // text mode is default.
  write_string_text(test, "hello ");
  write_string_text(test, "world");
  test.close();

  ifstream test2("foo.txt");
  string x = read_string_text(test2);
  string y = read_string_text(test2);
  cout << x + y << endl;
}

int main() {
  test_binary();
  test_text();
}
You might also think about doing error checking in your code

fopen() returns NULL if it can't open the file, and the global errno variable will contain an error code.


if ((myFile = fopen("loadme", "r")) == NULL)
{
// print an error message such as "couldn't open file 'loadme', errno=..." here...
return ...
}




Don't forget to fclose() your file, too.



I choose not to use C++, because C is far cleaner and results in easier-to-understand code. If that means I need to do a little more work, then perhaps that is what I shall do. I see C++ as somewhat overcomplicated, although perhaps I just don't like it because of its silly >> and << operators. Who the heck put those in anyway? ><

Besides, using C++ would mean relearning certain aspects of the language, and also adapting the code of the project these file-reading functions are destined for, since it would be stupid to only change my file extensions. If I were to use C++ for this work, I would have to go through my code making it C++-ish; otherwise there's no point at all.
