AIRmichael

Complex data loading

Hi, is there a library around that can load data in any kind of format (any kind of data, in any column or row, with any delimiter)? Even up to the complexity of having multiple file formats in one file... Greetings, Mic

There are libraries for many existing formats... XML, JPG, MD5, whatever. However, there aren't libraries that can just read "any" format; that's not really how it works. Generally, if you have a custom format, you have to have custom code that understands that format.

What kind of data are you wanting to load, and what does it look like?

File formats are a dime a dozen; different files have different formats. Having different "formats" in one file is also normal. Bitmaps can have palette information, and each pixel can represent either a palette entry or a standard RGB value. As for being able to load "any" file format, I highly doubt something like that exists. A lot of formats are for internal use (game data files and such), and unless someone has decoded them, there is no way you can open them. Can you be more specific about what kind of data you are trying to load?

Hi,

I know what you mean. I had already guessed that nothing like this exists yet, but I still had to ask (it is for my work).

So yes, it has to load all kinds of data: booleans, floats, and strings, placed in whatever order. Features are currently defined per line. (A "feature" is an object that needs all of that data loaded together as one unit.) But several features can also be placed on the same line, in a different order and with different delimiters.

File example:
[comma delimited columns]
[comma delimited columns]
[comma delimited columns][tab delimited columns]
[comma delimited columns][tab delimited columns]
[space delimited columns][comma delimited columns]

So actually, I need to develop a tool that loads in any format...

Greetings, Mic

Unless there is a prefix on the data that tells you what the delimiter for a particular bit of data is, I don't see any way in the world you could interpret the data correctly.

Example:

A, B C,D

Should this be:

"A" - " B C" - "D"

or

"A," - "B" - "C,D"

Could be either depending on whether you are treating it as space or comma delimited.
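
The ambiguity is easy to reproduce. Here is a minimal Python sketch (the helper name `split_both_ways` is made up for illustration) that splits the same line once per candidate delimiter:

```python
def split_both_ways(line):
    """Split one line twice: once on commas, once on spaces."""
    by_comma = line.split(",")
    by_space = line.split(" ")
    return by_comma, by_space


by_comma, by_space = split_both_ways("A, B C,D")
print(by_comma)  # ['A', ' B C', 'D']
print(by_space)  # ['A,', 'B', 'C,D']
```

Both results are valid tokenizations, so nothing in the line itself tells the loader which one was intended.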

Would it be presumptuous to suggest that perhaps the file format you are trying to support is the source of the problem, rather than the lack of a library that can interpret it?

I am assuming what you are REALLY trying to say is that there can be any number of items and values and such. One thing that is VERY simple is a list of comma-separated name=value pairs:


MyVarA=5,MyVarB=f53.12,MyVarC=f1242.00,MyVarD="Hello\, Is someone there?",MyVarE="Another \"Test\" with system\, characters"
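
A line like the one above could be split roughly as follows. This is only a sketch under a few assumptions that the example does not spell out: a backslash makes the next character literal, unescaped double quotes merely wrap a value and are dropped, and every value is kept as a plain string (the "f" prefix on the numbers is not interpreted). The function name `parse_pairs` is hypothetical.

```python
def parse_pairs(line):
    """Split a line of name=value pairs on unescaped commas."""
    fields = []
    current = []
    escaped = False
    for ch in line:
        if escaped:
            # Previous character was a backslash: take this one literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            # Assumption: unescaped quotes only wrap a value, so drop them.
            continue
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))

    pairs = {}
    for field in fields:
        # Split on the first "=" only, so values may contain "=".
        name, _, value = field.partition("=")
        pairs[name] = value
    return pairs


example = r'MyVarA=5,MyVarD="Hello\, Is someone there?",MyVarE="Another \"Test\" with system\, characters"'
print(parse_pairs(example)["MyVarE"])  # Another "Test" with system, characters
```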


If you are using special characters that may be needed inside strings, it's always good to accept the "\" escape character (which tells your parser that the next character is a literal string character, so it should not be treated as a delimiter or terminator). There are much more efficient ways to do even that, though. You could have it more like this:



////////////////// String variable example ///////////////////
[0000] 0x01 <-- Vartype, This could be a #define VarType_String 1
[0001] 0x07 <-- Length of variable name
[0002] "TestVar" <-- Variable length Name (quotes aren't actually saved)
[0009] 0x08 <-- Length of string value
[000A] "My Value" <-- String value (again, quotes aren't actually saved)



Note that I tend to prefer length-prefixed strings over null-terminated ones because they are much safer to use.

Basically you can just have a different format for each data type (for example, booleans do not need a 'value length' field; they simply need one byte to store whether the value is 'true' or not).
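
Reading records in that layout might look something like the Python sketch below. Only the string type code (1) comes from the example above; the boolean code (2), the ASCII encoding, and the names `read_exact` and `read_record` are assumptions made for illustration.

```python
import io

VARTYPE_STRING = 1  # matches VarType_String in the layout above
VARTYPE_BOOL = 2    # hypothetical code, not part of the original example


def read_exact(stream, count):
    """Read exactly `count` bytes or fail loudly on a truncated file."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("unexpected end of data")
    return data


def read_record(stream):
    """Read one (name, value) record from a binary stream."""
    vartype = read_exact(stream, 1)[0]
    name_len = read_exact(stream, 1)[0]
    name = read_exact(stream, name_len).decode("ascii")
    if vartype == VARTYPE_STRING:
        # Strings carry a one-byte length prefix, so no terminator is needed.
        value_len = read_exact(stream, 1)[0]
        value = read_exact(stream, value_len).decode("ascii")
    elif vartype == VARTYPE_BOOL:
        # Booleans need no length field: one byte, zero means false.
        value = read_exact(stream, 1)[0] != 0
    else:
        raise ValueError(f"unknown vartype {vartype}")
    return name, value


# "TestVar" is 7 bytes long and "My Value" is 8 bytes long.
data = bytes([1, 7]) + b"TestVar" + bytes([8]) + b"My Value"
print(read_record(io.BytesIO(data)))  # ('TestVar', 'My Value')
```

One-byte length fields cap names and values at 255 bytes; a real format would probably want wider length fields for long strings.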

Is this closer to what you are looking for?

Storing variable names within the dataset is not an option, due to the huge size of the datasets I have to load. But yes, the definition of each field's data type will eventually have to be user-defined within the tool.
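
A user-defined layout kept outside the dataset could be sketched in Python roughly like this. Everything here is an assumption for illustration: the layout is a list of (delimiter, converters) segments, the last field of a segment is taken to end at the next segment's delimiter, and the names `parse_line` and `parse_bool` as well as the sample values are made up.

```python
def parse_bool(text):
    """Interpret a few common spellings of 'true' (an assumed convention)."""
    return text.strip().lower() in ("true", "1", "yes")


def parse_line(line, layout):
    """Parse one line using a layout of (delimiter, [converter, ...]) segments."""
    # Flatten the layout into one (terminator, converter) entry per field.
    plan = []
    for seg_index, (delim, converters) in enumerate(layout):
        for field_index, convert in enumerate(converters):
            if field_index < len(converters) - 1:
                terminator = delim                      # inside a segment
            elif seg_index < len(layout) - 1:
                terminator = layout[seg_index + 1][0]   # segment boundary
            else:
                terminator = None                       # runs to end of line
            plan.append((terminator, convert))

    values = []
    pos = 0
    for terminator, convert in plan:
        if terminator is None:
            raw, pos = line[pos:], len(line)
        else:
            end = line.find(terminator, pos)
            if end == -1:
                raise ValueError(f"missing delimiter {terminator!r}")
            raw, pos = line[pos:end], end + len(terminator)
        values.append(convert(raw))
    return values


# Three comma-delimited columns followed by two tab-delimited columns.
layout = [(",", [int, int, str]), ("\t", [float, parse_bool])]
print(parse_line("7,117559590,CFTR\t0.82\ttrue", layout))
# [7, 117559590, 'CFTR', 0.82, True]
```

Because the layout lives in the tool rather than in the file, the data lines carry no per-field overhead, at the cost of the file being unreadable without the matching layout.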

It will be complicated, but I will work towards it. Thx.

Greetings, Mic

Yeah,

A dataset contains all kinds of genetic information, like chromosome numbers, locations, gene names and IDs, all kinds of ratio scores, etc.

Greetings, Mic.

It sounds like a lot of data, but the reading/writing should be pretty straightforward. I would, however, recommend you use compression (zlib or similar) if you are going to have insane amounts of data. This would slow down load times but could greatly reduce the size of your files.
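
As a rough illustration, here is a Python sketch using the standard `gzip` module (which is built on zlib) to write and read text lines one at a time, so a large dataset never has to sit fully in memory. The function names and the sample rows are made up.

```python
import gzip
import io


def write_compressed(lines, fileobj):
    """Write text lines to a gzip stream, one line at a time."""
    with gzip.open(fileobj, "wb") as gz:
        for line in lines:
            gz.write(line.encode("utf-8") + b"\n")


def read_compressed(fileobj):
    """Yield text lines from a gzip stream without loading it all at once."""
    with gzip.open(fileobj, "rt", encoding="utf-8") as gz:
        for line in gz:
            yield line.rstrip("\n")


# Repetitive tabular data, which is typical of large datasets, compresses well.
rows = ["chr1,100,GENE_A,0.5"] * 1000
buffer = io.BytesIO()
write_compressed(rows, buffer)
print(len(buffer.getvalue()), "compressed bytes for", sum(len(r) + 1 for r in rows), "raw bytes")

buffer.seek(0)
assert list(read_compressed(buffer)) == rows
```

An in-memory buffer is used here only to keep the example self-contained; passing a file path to `gzip.open` works the same way.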
