jccorreu

c++ open many "unknown" files simultaneously

I have to read info from multiple files that will be given to me. I know the format and what the data is. The thing is, each time we run the program we may be using a different number of files, with different file names each time. So I'm writing code to ask the user how many files there are and what their names are. From each we'll read in 2 lines, then do some math using all of those lines. Then do it again on another set of lines.

I'm having some trouble creating different objects with different names when I don't know beforehand how many there will be or what the file names will be. I know the following code won't work, but it might give an idea of what I'm thinking.

int num;
std::cout << "enter number of files: ";
std::cin >> num;
char* infile[num+1];
for(int n=1; n<=num; n++) {
    std::cout << "\nenter name of file " << n << " : ";
    std::cin >> infile[n];
    std::ifstream infile[n];
    infile[n].open(infile[n]);
}

Anyone got any idea how to create variables whose names are themselves determined by the program at run time? It's something an old macro language I used to know could do, but I don't yet see a way to manipulate C++ into it. How could I force the creation of the ifstream object to take its name from such a variable, or from an element of an array? Should I be using pointers in a different way? Would I have to somehow overload the fstream::open() function? Any other methods are also welcome, though this is the kind of process that I'd like to make more general and applicable for other uses.

thanks all
James

Watch out for one thing there: you have 2 variables called infile. The ifstream array declared inside the loop hides the outer array of names, so infile[n].open(infile[n]) wouldn't be using the outer one for the file name.

Why do you care what the file names are *after* you open them? Why not just make a vector of the files, or even just process them (fully) one at a time? You don't need to turn the file name into a variable name at run-time; just set up space for the variable. C++ doesn't *have* variable names at run-time; they exist for your convenience only and are not present anywhere after the compiler has done its thing.

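If you really, really need to remember what the name of the file is after you opened it, you could make an associative container mapping the file names to file objects, i.e. std::map<std::string, std::ifstream>. (Streams can't be copied, so before C++11 the map would have to hold pointers to the streams instead.)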
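
A minimal sketch of the name-to-stream map suggested above, assuming a C++11 compiler (where streams can be moved into the map). The helper name openByName is hypothetical, and silently skipping files that fail to open is a choice of this sketch, not something from the thread.

```cpp
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: opens every file in `names` and returns the open
// streams keyed by file name. Files that cannot be opened are left out.
std::map<std::string, std::ifstream> openByName(const std::vector<std::string>& names) {
    std::map<std::string, std::ifstream> files;
    for (const std::string& name : names) {
        std::ifstream in(name);
        if (in) {
            // Streams are movable but not copyable, so move it into the map.
            files.emplace(name, std::move(in));
        }
    }
    return files;
}
```

A caller can then look a stream up by the name the user typed, for example files.at("run1.dat") >> value, and every stream is closed automatically when the map is destroyed.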

Just have a vector of filenames, even. At the start of your program, ask the user to enter all filenames to be used, and store them all in this vector. Then simply go through this vector, loading each file as you go (via a function call, probably), getting all the data you need, and moving on to the next one.

If I'm missing something here please tell, 'cause what I just said seems a bit simple =D
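
The one-file-at-a-time approach could be sketched roughly as follows. The names LinePair and readTwoLinesFromEach are hypothetical, and the sketch assumes that the "2 lines" are simply the first two lines of each file. Because result[i] always belongs to names[i], the data stays matched to its file without needing a variable per file.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// The first two lines of one data file.
typedef std::pair<std::string, std::string> LinePair;

// Hypothetical helper: opens each file in turn, reads its first two lines,
// and lets the stream close itself at the end of each loop iteration.
// result[i] always corresponds to names[i]; a file that is missing or too
// short yields an empty pair so the positions stay aligned.
std::vector<LinePair> readTwoLinesFromEach(const std::vector<std::string>& names) {
    std::vector<LinePair> result;
    for (std::size_t i = 0; i < names.size(); ++i) {
        std::ifstream in(names[i].c_str());
        LinePair lines;
        if (in && std::getline(in, lines.first) && std::getline(in, lines.second)) {
            result.push_back(lines);
        } else {
            result.push_back(LinePair());
        }
    }
    return result;
}
```

Only one file is open at any moment, so the number of files is limited by memory for the lines rather than by the number of open file handles.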

You can have a container of pointers to streams. Say, boost::shared_ptr<> to stream, if you care about ref-counting (else just use raw pointers).
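
A rough sketch of the container-of-pointers idea. It uses std::shared_ptr from C++11 as a stand-in for boost::shared_ptr (an assumption; the post names Boost), and the names StreamList and openStreams are hypothetical.

```cpp
#include <fstream>
#include <memory>
#include <string>
#include <vector>

// A run-time sized list of reference-counted file streams.
typedef std::vector<std::shared_ptr<std::ifstream> > StreamList;

// Hypothetical helper: opens each named file and keeps the ones that opened.
StreamList openStreams(const std::vector<std::string>& names) {
    StreamList streams;
    for (const std::string& name : names) {
        std::shared_ptr<std::ifstream> in = std::make_shared<std::ifstream>(name);
        if (in->is_open()) {
            streams.push_back(in);
        }
    }
    return streams;
}
```

Each stream is read through its pointer, for example *streams[0] >> value, and is closed when the last shared_ptr referring to it goes away.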

Btw: I wouldn't ask the user; I would use either a primary input file which lists the files, or I'd specify it all on the command line.


#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <deque>
#include <string>
#include <utility>

void usage(char const * msg) {
    fprintf(stderr, "%s\nusage: program controlfile\n", msg);
    exit(1);
}

int main(int argc, char * argv[]) {
    if (argc != 2) usage("Bad command line arguments.");
    // The control file lists one data file name per line.
    FILE * mainfile = fopen(argv[1], "rb");
    if (!mainfile) usage("Can't open input file.");
    char line[1024];
    std::deque<std::pair<std::string, FILE *> > files;
    // fgets returns NULL at end of file (or on error), which ends the loop.
    while (fgets(line, sizeof(line), mainfile)) {
        // Strip the line ending (handles both "\n" and "\r\n").
        line[strcspn(line, "\r\n")] = 0;
        if (!line[0]) continue;  // skip blank lines
        FILE * tmp = fopen(line, "rb");
        if (!tmp) {
            fprintf(stderr, "can't open %s\n", line);
            exit(2);
        }
        files.push_back(std::make_pair(std::string(line), tmp));
    }
    fclose(mainfile);

    // now, do whatever you need to the files in "files"

    for (std::deque<std::pair<std::string, FILE *> >::iterator ptr = files.begin(),
             end = files.end(); ptr != end; ++ptr) {
        fclose(ptr->second);
    }
    return 0;
}

FILE* and fprintf()? In MY C++?

Anyway, you could also just make the container of the filenames, and then open the streams when you actually need them. Something like:


#include <vector>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;

void procFiles(const vector<string>& names) {
    // names is const, so it has to be walked with a const_iterator.
    for (vector<string>::const_iterator it = names.begin(); it != names.end(); ++it) {
        ifstream current(it->c_str());
        /* Do stuff with the file here */
    }
}

int main(int argc, char** argv) {
    // Assuming you put all the names on the command line, rather than a "control file".
    vector<string> allFilenames(argv + 1, argv + argc);
    /* other stuff */
    procFiles(allFilenames);
}

OK, I do not know what a vector class is in C++. I'm familiar with the math/physics concept of a vector. Nor am I familiar with the Boost libraries, or deque. I'll be searching about them after I post this.
The reason for the 2 lines from each file is that I have to do a linear interpolation to arrive at a predetermined "average", which will be the same "average" that I'll be seeking for my linear interpolation of the pairs of lines from all the files. This is so I can make standardized data for my calculations. From this I would then do a mathematical calculation that uses all those "averages". It is very important to keep straight which is which, because of where they go in the equation. I don't really need to know the names of the files; I just thought of that as a scheme for keeping the data straight. The data files are not large, no more than 1000 records usually, so I could bring them in completely.
I see no reason why I couldn't use a control file containing the names of the files, or put them on the command line, instead of actually asking. Either way, it will be unknown until run-time how many there are and what their names are, and that is the real problem I'm having.
I don't know any longer, but concerning memory space and running time, which is more efficient: opening and closing files repeatedly when they are needed, opening them once and reading through them each time and closing at the end, or bringing a few thousand records into memory?
I'll be trying what y'all suggested, and in the end I'll let you know what I wind up doing.
thanks
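
The interpolation step described in this post might look something like the sketch below. The record layout is an assumption: each of the two lines is taken to hold one (x, y) pair, and the names Sample, interpolate and interpolateAll are hypothetical. The point is only that a vector indexed by file keeps "which is which" straight.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One record, assumed here to be an (x, y) pair parsed from one line of a file.
struct Sample {
    double x;
    double y;
};

// Value at `target` on the straight line through samples a and b.
double interpolate(const Sample& a, const Sample& b, double target) {
    if (a.x == b.x) return a.y;  // degenerate pair: avoid dividing by zero
    return a.y + (b.y - a.y) * (target - a.x) / (b.x - a.x);
}

// Interpolates every file's pair of samples at the same target value.
// result[i] belongs to pairs[i], so the order of the files is preserved.
std::vector<double> interpolateAll(const std::vector<std::pair<Sample, Sample> >& pairs,
                                   double target) {
    std::vector<double> result;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        result.push_back(interpolate(pairs[i].first, pairs[i].second, target));
    }
    return result;
}
```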

I think you're all misunderstanding his problem. I think he just wants to discover dynamic memory allocation.

As you know, you can't declare an array whose size is a variable that is only known at run time, i.e.:
int i = 5;
int nums[i]; // not valid in standard C++: i is not a compile-time constant


The answer is dynamic memory allocation. Here's some code to look at that is basically what you're trying to do. The difference is that I'm using strings instead of char arrays, and I don't have an array of the actual streams, just the filenames.
#include <iostream>
#include <string>
using namespace std;

int main() {
    int amount;
    string *filenames;

    cout << "How many files? ";
    cin >> amount;
    filenames = new string[amount]; // this is the key part

    for(int i = 0; i < amount; i++) {
        cout << "Filename of #" << i << ": ";
        cin >> filenames[i];
    }

    for(int i = 0; i < amount; i++)
        cout << filenames[i] << endl;

    return 0;
}

Congratulations, you just leaked the allocated memory - which is why the rest of us are all talking about vectors and deques and so on.
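
For comparison, here is a sketch of the same filename prompt built on std::vector<std::string>, so nothing has to be freed by hand. The names parseFilenames and askForFilenames are hypothetical, and reading all the names from a single line instead of asking for a count first is a choice made for this sketch, not something from the thread.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a whitespace-separated line into file names.
// The vector grows as needed, so no count has to be known in advance.
std::vector<std::string> parseFilenames(const std::string& line) {
    std::vector<std::string> filenames;
    std::istringstream words(line);
    std::string name;
    while (words >> name) {
        filenames.push_back(name);
    }
    return filenames;
}

// Prompts once on `out` and reads all the names from one line of `in`.
std::vector<std::string> askForFilenames(std::istream& in, std::ostream& out) {
    out << "Filenames (separated by spaces): ";
    std::string line;
    std::getline(in, line);
    return parseFilenames(line);
}
```

Calling askForFilenames(std::cin, std::cout) gives the interactive version; the vector releases its memory when it goes out of scope.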

Dynamic memory 101:
How to create a run-time determined number of copies of something.


#include <cstdio>
#include <vector>

using std::vector;

void test(int number) {
    // this is the type of a container that contains "int"s.
    typedef vector<int> myvec;
    myvec dynamic;
    // change the number of elements in the vector:
    dynamic.resize(number);
    for (int i = 0; i < number; ++i) {
        dynamic[i] = i*(i-1)/2; // or whatever else you want
        printf("%d: %d\n", i, dynamic[i]);
        // or use cout. I like living dangerously when printing.
    }
}


Thanks some more,
Ivko was partly right. I had forgotten about new and was trying to use that functionality. Dynamic memory allocation seems to me like the only way to go, and I don't understand it well enough (like so many other things). Zahlman, I don't understand why what he did is a memory leak. Would you please explain, since I would have done the same thing, and actually learned to do it that way? Also, a number of you pointed out that using the actual streams ain't the best idea, so I'll not do that. What I've seen about vectors so far looks really kewl. I'm still getting used to how to use them properly, but it still seems to me this should be doable with basic old-fashioned plain vanilla C++, as some seem to recognize also.

james


#include <cstdio>

#define OMEGA 1000000000//supersize my buffer
int main(int argc, char** argv)
{
    char buffer[OMEGA];//give it a huge buffer
    long *offsets = new long[argc-1];//where each file's data starts in buffer
    long offset = 0;
    FILE *fp;
    for(int i=1; i<argc; i++)
    {
        offsets[i-1] = offset;
        fp = fopen(argv[i],"rb");
        if(!fp) continue;//skip files that can't be opened
        fseek (fp , 0 , SEEK_END);
        long sz = ftell (fp);//file size in bytes
        rewind (fp);
        fread(buffer+offset,sz,1,fp);
        fclose(fp);
        offset += sz;
    }
    delete [] offsets;
}

Quote:
Original post by Aiursrage2k

#define OMEGA 1000000000//supersize my buffer
int main(int argc, char** argv)
{
char buffer[OMEGA];//give it a huge buffer
int *offsets = new int[argc-1];
int offset = 0;
FILE *fp;
for(int i=1; i<argc; i++)
{
fp = fopen(argv[i],"rb");
fseek (fp , 0 , SEEK_END);
sz = ftell (fp);
rewind (fp);
fread(buffer+offset,sz,1,fp);
fclose(fp);
offsets[i-1] = offset;
offset += sz;
}
delete [] offsets;
}


Congratulations, you just tried to reserve 1 GB of stack space, which is going to fail on basically any platform. I think you'd best allocate space for the file data dynamically too. Yes, that does mean that you won't get to toss it all in the same array and track offsets separately. Not that such l33tness is really going to save significant if any space anyway. Oh, and nice job using a #define for a constant. And of using old C I/O routines, for that matter. Boo.

And anyway, there probably isn't a good reason to read all of the files and store them right away. But if there is:


#include <vector>
#include <string>
#include <algorithm>
#include <iostream>
#include <fstream>
#include <iterator>
using namespace std;

int main(int argc, char** argv) {
    // One vector<char> per file, holding that file's entire contents.
    vector<vector<char> > all_file_data;
    for(int i=1; i<argc; i++) {
        ifstream is(argv[i], ios::binary);
        vector<char> current;
        // A default-constructed istreambuf_iterator marks end of stream.
        copy((istreambuf_iterator<char>(is)), istreambuf_iterator<char>(),
             back_inserter(current));
        all_file_data.push_back(current);
    }
}


Nice and easy.

jccorreu: You have to deallocate memory that you allocate dynamically. In Ivko's case, he 'new[]'s an array of string objects, but doesn't subsequently 'delete[]' it as he should. This is irresponsible, and on older OSes (like Windows 98) can actually cause problems.

std::vector manages this allocation for you. It is your friend.
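
To make the difference concrete, here is a small sketch that counts live objects. Tracked, manualArray and vectorArray are hypothetical names invented for this illustration. The manual version only returns the count to zero because of the explicit delete[]; the vector version does it on its own when the vector goes out of scope.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type that counts how many of its instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Manual version: every new[] needs exactly one matching delete[].
// Returns how many objects were alive while the array existed.
int manualArray(std::size_t count) {
    Tracked* items = new Tracked[count];
    int seen = Tracked::alive;
    delete[] items;  // leave this line out and the objects are leaked
    return seen;
}

// vector version: the vector's destructor destroys the elements and frees
// the memory when `items` goes out of scope, with no delete[] to forget.
int vectorArray(std::size_t count) {
    std::vector<Tracked> items(count);
    return Tracked::alive;
}
```

After either function returns, Tracked::alive is back to zero, but only the vector version guarantees that without the caller or author having to remember anything.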
