Relational Data Model (C++)

Started by
8 comments, last by Chad Smith 11 years ago

Before I begin, I want to give a heads-up that this is a school assignment. I am not asking for it to be coded, for a solution, or even for a working example. I am looking for help with code design, or how you would set up these very simple tables.

The assignment is to put together two (very simple) tables in C++. To give the examples below more meaning, here are the tables and what they include.

Students: Student ID, First Name, Last Name
Grades: Student ID, Term, Year, Grade

The only "requirement" I must follow in designing and implementing this is that the tables are internally represented as a linked list of rows. Note that I am allowed to use the std::list collection. I can only use the Standard Library, though (Boost or any other third-party library is not allowed), and it seems our university compilers do not support C++11 at this time.
Question: Could I get clarification on what this would mean?
What I think it means is this: let's say I have the Student table. It has the attributes ID, First Name, and Last Name. Those attributes would be the "columns" in the table. You would then hold each of those attributes in a list. Would it be something like this?


// Quick example; the class is not fully coded and does not represent the final product
#include <list>
#include <string>

class Student
{
     std::list<int> studentID;
     std::list<std::string> firstName;
     std::list<std::string> lastName;
};

Would that be right, or did I read that wrong? Here is what the assignment states, though I am a bit confused by what she means.

"You will implement 2 tables, namely students and grades. Some of the design choices are up to you. You can either implement separate classes, implement a template, or implement a generic class for both row types (e.g. storing the attributes in a string array), etc. You will need one or multiple classes for the table and one or multiple classes for the corresponding rows, i.e. each table will internally be represented by a linked list of rows."

This is where I am getting a bit confused about a good way to design it, and where I am asking for help. Collaborating with other people on ideas always seems to help when my brain isn't fully grasping a concept.

I had some quick ideas that maybe some people would/could comment on to see if I am thinking about this correctly or if I am way off.

1: Just create a Table class that represents a table, and create two instances to represent the Students and Grades tables. The issue I have with this is how to represent and store the attributes of each table. The reason I thought of it is that, to demonstrate the tables, we will be issuing basic commands on them: print, to print the table; select, which takes a table, an attribute from that table, and a value, and prints all matching rows; and join, which joins the two tables together. Since every table supports these commands, a Table class seems like a good place to hold these functions without repeating code.

2: Creating both a student and grades class that stores all the required information. Then create a StudentTable class that would store a list of Students and a GradesTable that would store a list of Grades

ie:


#include <list>
#include <string>

class Grades
{
     int studentID;
     std::string term;
     int year;
     char grade;
};

class GradesTable
{
     std::list<Grades> studentGrades;
};

Any other ideas? Am I way off? Just some quick ideas and thoughts would be great.

Thanks ahead of time to all who try to help/comment!

Note: I did go talk to my professor, but something about the way she explains things makes me lose my grasp of the concept.

When I hear "each table will internally be represented by a linked list of rows" I think of what you describe in #2. But that doesn't mean you couldn't also implement #1 along with it. I don't know if you've learned about templates yet, but you could create a templated Table class that holds all of the common table/row manipulation functionality and have the template parameter be the type of data to store in the row. Or you could create a Row base class and use inheritance to handle common functionality. Either of those would be a combination of your two ideas.

From the problem description you posted it sounds like your professor isn't looking for a specific approach so I would go with what makes sense to you based on what you have learned about C++ so far.

Only "requirement" that I must follow in designing and implementing this is that the tables are internally represented as a Linked List of rows.

That means all your row data must be in a single std::list<row> rows.

What you are allowed to choose is what "row" is. You also have some choice in how to write the table class holding that list.

This exercise is effectively a tiny database, with common database operations. Flexibility is one of the points of using a database in the first place. There's no reason to manually code classes for "students", "grades" etc. - it's more work, results in worse code and destroys flexibility. A single table class is the way to go.

On the other hand, I would also not template the table class. It's overengineering for the purposes of an exercise like this, and you'd sweat blood trying to code a join function between two differently templated tables (which are totally different classes as far as the compiler is concerned). It would also prevent dynamically creating new table types at runtime, which is something you need for a proper join.
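To illustrate the point about joins: if every table is the same runtime-defined type, the result of a join is just another table whose columns are computed on the fly. The sketch below assumes each row is a vector of strings and that both key indices are valid for every row. The names Table, Row and join are illustrative only.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

struct Table {
    std::vector<std::string> columns;  // column names
    std::list<Row> rows;               // linked list of rows
};

// Nested-loop equi-join on left[leftKey] == right[rightKey].
// Assumes both key indices are in range for every row.
Table join(const Table& left, std::size_t leftKey,
           const Table& right, std::size_t rightKey) {
    Table result;

    // Result columns: all of left's, then right's without its key column.
    result.columns = left.columns;
    for (std::size_t c = 0; c < right.columns.size(); ++c) {
        if (c != rightKey) result.columns.push_back(right.columns[c]);
    }

    for (std::list<Row>::const_iterator l = left.rows.begin();
         l != left.rows.end(); ++l) {
        for (std::list<Row>::const_iterator r = right.rows.begin();
             r != right.rows.end(); ++r) {
            if ((*l)[leftKey] != (*r)[rightKey]) continue;
            Row joined = *l;  // start from the left row
            for (std::size_t c = 0; c < r->size(); ++c) {
                if (c != rightKey) joined.push_back((*r)[c]);
            }
            result.rows.push_back(joined);
        }
    }
    return result;
}
```

Joining Students and Grades on the student ID would then be join(students, 0, grades, 0), which yields a six-column table that never had to be declared as its own class.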

For the purposes of the exercise, I would strongly recommend defining the row type as vector<string>. I would also stick one extra vector<string> in the table class that holds the column names ("studentID", "term" etc.) so that you can find an attribute from that vector by name (e.g. "term") to obtain its index (1), and then use that index to grab data from the correct column in rows. Also useful for printing the table.
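A minimal sketch of that layout, assuming C++98-compatible code. The member names columnIndex, addRow and select are my own choices, not anything the assignment prescribes.

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

class Table {
public:
    explicit Table(const std::vector<std::string>& columnNames)
        : columns_(columnNames) {}

    // Translate a column name such as "Term" into its position in each row.
    std::size_t columnIndex(const std::string& name) const {
        for (std::size_t i = 0; i < columns_.size(); ++i) {
            if (columns_[i] == name) return i;
        }
        throw std::invalid_argument("unknown column: " + name);
    }

    // Reject rows that do not have exactly one field per column.
    void addRow(const Row& row) {
        if (row.size() != columns_.size()) {
            throw std::invalid_argument("wrong number of fields");
        }
        rows_.push_back(row);
    }

    // Return copies of all rows whose value in `column` equals `value`.
    std::list<Row> select(const std::string& column,
                          const std::string& value) const {
        const std::size_t index = columnIndex(column);
        std::list<Row> matches;
        for (std::list<Row>::const_iterator it = rows_.begin();
             it != rows_.end(); ++it) {
            if ((*it)[index] == value) matches.push_back(*it);
        }
        return matches;
    }

private:
    std::vector<std::string> columns_;  // column names, in row order
    std::list<Row> rows_;               // the linked list of rows
};
```

With the Grades columns in the order Student ID, Term, Year, Grade, looking up the term column gives index 1, and a select on that column only has to compare one field per row.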

Thanks, this does spark some ideas, though I have a couple of questions. The final design is up to me, and I do like that this assignment doesn't restrict me to a certain design, but I also want to aim for a "proper" design. That is really why I'm asking: to expand my knowledge and to get ideas on ways to do this in general.

Right now I can see a Table class that holds a vector of strings for the column names, and for the rows it sounds like you are describing a list of vectors(?). That seems simple enough. What I'm really wondering about is dealing with different data types. From what I read in your post, everything entered in a row is essentially a string: even the ID number, the year, and the student's grade (correct me if I read your post wrong). That seems like it would allow for simple operations and be expandable to any table I want to create, though is it "proper" or something to just have every tuple as a string internally? Should the tuple not store the data type that it accepts, like integers for student IDs and so on?

Thinking of it the way I am reading your post, if I had a way to create a table, then I could just pass in a string containing each column name and fill the column vector from that string. Sounds simple enough. Adding a row would then just take a string that has the information to fill in that row. That seems simple, though it would require the string of data to be in the right order. It would be for the purposes of this assignment, but if the string were entered in the wrong order it would mess everything up.

Am I thinking of that correctly? I can't think of a way to have it where each row would require a different datatype (int, char, string) without going through a lot of work of templates, hard coding (wouldn't want to as while it may work for this assignment wouldn't allow different tables to be made), or creating my own row data type for each data type.

I could be entirely over thinking this for the purpose of this assignment (I am known for over thinking simple information, from Computer Science to every day life stuff) so just let me know if I am. Guess I just want to get a design going that doesn't seem hardcoded. Most of my assignments right now seem a bit too hardcoded and don't allow a ton of expandability. While I am not trying to create a solution that is used by the general public, I also don't want to create a design that is hardcoded and wouldn't allow functionality. Mostly because I know I need to get out of that.

Thanks for the help. Any opinions still welcome of course. I'm working on a couple test things and writing down some ideas that I may want to try or anything I can think of.

Thanks!

is it "proper" or something to just have every tuple as a string internally?

That depends on the purpose of these tables. For some real-world uses it would be fine to just store everything as strings. Ints and many other kinds of data can easily be converted into a string and back.

Anyway, you have understood correctly. This "array of strings" approach (explicitly OK'd by your teacher) doesn't really deal with type safety; it just takes and gives back strings.

If you wanted to enforce type safety, that could be added in. For instance, in addition to column names, the table could store a vector of column types (enums with a range of values corresponding to the data types that you want your tables to support: DB_INT, DB_STRING, ...). Then you could use that information in the functions that push data into the table to require the input to be of a certain type, and to convert from the real type to whatever internal storage type the table class uses. You could still choose to keep std::string as the storage type, or use something else such as a C++ union, in which case you don't have to convert data (though before C++11 a union cannot have a std::string member, so string columns would need separate handling). Reading from the table could also enforce correct types.

sometable.getInt(rownumber, "ID") could throw an exception if the "ID" column does not hold ints, for example.
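A sketch of how that could look while still storing everything as strings. DB_INT, DB_STRING and getInt come from the post above. The class name TypedTable, the other member names, and the choice of exception types are assumptions for illustration.

```cpp
#include <cstddef>
#include <list>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

enum ColumnType { DB_INT, DB_STRING };

typedef std::vector<std::string> Row;

class TypedTable {
public:
    void addColumn(const std::string& name, ColumnType type) {
        names_.push_back(name);
        types_.push_back(type);
    }

    void addRow(const Row& row) {
        if (row.size() != names_.size()) {
            throw std::invalid_argument("wrong number of fields");
        }
        rows_.push_back(row);
    }

    // Read a field as int; refuses columns that were not declared DB_INT.
    int getInt(std::size_t rowNumber, const std::string& column) const {
        const std::size_t col = columnIndex(column);
        if (types_[col] != DB_INT) {
            throw std::logic_error("column " + column + " does not hold ints");
        }
        if (rowNumber >= rows_.size()) {
            throw std::out_of_range("no such row");
        }
        // A list has no random access, so step to the requested row.
        std::list<Row>::const_iterator it = rows_.begin();
        for (std::size_t i = 0; i < rowNumber; ++i) ++it;

        std::istringstream in((*it)[col]);
        int value = 0;
        if (!(in >> value)) {
            throw std::runtime_error("stored text is not an int");
        }
        return value;
    }

private:
    std::size_t columnIndex(const std::string& name) const {
        for (std::size_t i = 0; i < names_.size(); ++i) {
            if (names_[i] == name) return i;
        }
        throw std::invalid_argument("unknown column: " + name);
    }

    std::vector<std::string> names_;
    std::vector<ColumnType> types_;  // one declared type per column
    std::list<Row> rows_;
};
```

This version only checks types when reading. The same types_ vector could also be consulted in addRow to reject bad input before it is stored.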

That depends on the purpose of these tables. For some real-world uses it would be fine to just store everything as strings. Ints and many other kinds of data can easily be converted into a string and back.

I was thinking the same: it'd be easy enough to just convert back and forth if needed.

Anyway, you have understood correctly. This "array of strings" approach (explicitly OK'd by your teacher) doesn't really deal with type safety; it just takes and gives back strings.

OK, after thinking about some things during the day I might have just confused myself. Maybe I could get some quick help with this.

I get that the rows need to be stored in a list, and you mentioned std::list<row> rows in your previous post. Then you mentioned you would store each row as a std::vector<string>. Does this mean you would essentially have a std::list<std::vector<std::string> > (with a space between the closing angle brackets, since our compilers predate C++11; maybe a typedef std::vector<std::string> rows would make the declaration simpler to read)? If so, I guess I am a bit lost on how to operate on that.

ie:


// Very quick example
#include <list>
#include <string>
#include <vector>

class Table
{
     // this just holds the column names, right?
     // so just like "Student ID, First Name, Last Name"
     std::vector<std::string> columnNames;

     // this would hold the actual information of the table?
     // Just using a single vector of strings, nothing else, would just hold the rows for one column, wouldn't it?
     // Though since each row is just an array or vector of strings I could do:
     typedef std::vector<std::string> rows;
     // because it must be internally represented as a linked list?
     std::list<rows> data;
};

Unless I am thinking of this table totally wrong, I see the vector of strings holding the column names, and I also see a 2D array (a 2D vector could give a jagged array, couldn't it?) actually holding the information in the table. The column index would simply be the index taken from the columnNames vector, and the row index would simply be the next row that doesn't currently have information in it. While describing this I do see a linked list in my head for the rows, as each node would just hold the data and a pointer to the next one (though it seems like that would only cover one individual attribute of the table and not the entire table with all its data?).

So I guess the next thing is more of a programming/code question. I'm not entirely sure how I would work with a list of vectors. How would I send data into a vector in the list?

data.push_back would want a vector, so I'm confused about how to add something to the rows vector that then gets pushed onto the linked list.

It's just that every time I describe this and draw it out, I see a multidimensional array.


Student Table drawn out
Student ID   First Name   Last name   (this would be the column names vector)
011758        Bob          Smith     
32846         Billy        Joe
..            ..           ..         (this is where I see a 2D Array to hold this information.  I'm not seeing a linked list of vectors [where vectors just a typedef to be row])?






Sorry if I am acting dumb. I'm not sure what I am missing that keeps me from grasping the whole concept with a linked list. I just keep seeing a 2D array. Using a linked list, I don't see how to know what data goes under each column and then have it point to the next row. Plus, trying to mess with it in code has me confused about how to work with a linked list of vectors.

I have done some simple database operations before, when I was into web programming, but since the internals were already built for me I never fully grasped what was going on internally.

It's not a 2d array, it's a linked list of 1d arrays.

Yeah, that's actually what I meant when I said I just wasn't seeing exactly how to deal with all the data as a linked list of vectors. My brain kept thinking a 2D array would be needed.

I thought about it a little more, though, and I have started to design something. I store the data as a list of vectors, and when I add something to the table I just do this:


// data is my linked list and input is just the string input that has the data
data.push_back(InsertToVector<std::string>(input));

This seems to work. InsertToVector is just a function that I wrote that takes in a string of data (default separated by a white space) and inserts each tuple into the vector and returns the vector. At this point I can traverse the list and access what I need in each vector. It works, though I am not sure whether there is a better way than my InsertToVector function. Is there? It seems a bit weird that I am copying everything into a temporary vector (inside InsertToVector), which is then returned and copied into the vector in the linked list. Would there be a better way to do that?
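One possible way to avoid the temporary, sketched under the assumption that the input is whitespace-separated as described: push an empty row onto the list first, then fill that row in place through a reference. The function name appendRow is hypothetical, and this is not the InsertToVector code from the post.

```cpp
#include <list>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

// Append a new row to `data` and fill it in place from `input`.
void appendRow(std::list<Row>& data, const std::string& input) {
    data.push_back(Row());   // add an empty row at the end of the list
    Row& row = data.back();  // refer to the row stored inside the list

    std::istringstream tokens(input);
    std::string token;
    while (tokens >> token) {  // operator>> splits on whitespace
        row.push_back(token);
    }
}
```

For tables this small the extra copy in the original approach is unlikely to matter, so returning the vector by value is also a perfectly reasonable choice.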

Though now that I have something to work with I can focus a bit more on the design of the class.

Thanks to everyone who helped! Of course I welcome any opinions, as I would love to refactor it a bit more after I turn the assignment in, for the learning process. I would especially appreciate it if anyone could answer the question of whether there is a better way than what I have right now to insert the string data into the list of vectors.

Thanks!

InsertToVector is just a function that I wrote that takes in a string of data (default separated by a white space) and inserts each tuple into the vector and returns the vector.

When you say "inserts each tuple", do you mean each token? As I understand it, the function tokenizes input, and puts each token into the vector.

Anyways, I think it looks pretty good. Maybe enumerate each column so you don't have to use magic numbers to index those vectors, if you haven't done so already.

Yes that is what it does.

If the input was "011758 Billy Joe" then 011758 would be stored at index 0 of the vector, so on and so on.

Yes, I am currently looking at a way to get away from magic numbers to index the vectors.

Thanks for all the help!
