Databases as asset storage



Of course, they are web scale....


Are you mocking me, sir?


[quote]Look at how git works. Not git the application (git simply does not work for non-diffable binary content), but the DVCS design and how its change-tracking log works.[/quote]

Yes, the history and change tracking can be used. Any other VCS is going to have the same problem with trying to diff arbitrary binary formats, no?


[quote]For assets, git fails. The difficulty of change management for binary assets makes such operations unfeasible. So instead of using raw diffs, adopt only the change-tracking mechanism, not the diffs.[/quote]

See above.


[quote]SVN is fine, probably better than DVCS for binary assets.[/quote]

Why? Please explain.


[quote]Perforce is usually preferred.[/quote]

Why? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?


[quote]Instead of inventing some complex mechanism, you just use whatever versioning system you have on top of the file system.[/quote]

And which file system would that be?
[quote]Why? Please explain.[/quote]

DVCS relies on diffs for efficiency; source code specifically is a natural fit. Binary assets don't diff well, resulting in considerable overhead, both in resources and in management. As the repository grows, SVN scales better from the user's perspective.

There is a special git version/patch for storing blobs, but if you're using a workaround, you might as well go all the way, since using multiple repositories loses the advantage of uniform change management.

For source, operations like bisect and three-way merge make sense. For binary assets they don't. If two people edit the same PSD, merging doesn't make conceptual sense the way merging edits to textual content does.

[quote]Yes, the history and change tracking can be used. Any other VCS is going to have the same problem with trying to diff arbitrary binary formats, no?[/quote]

Synchronizing a centralized VCS doesn't require bringing in the entire change history; one just gets a snapshot. Partial checkouts are also used.

[quote]Why? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?[/quote]

Proven track record.

SVN was mentioned as a free alternative.

[quote]And which file system would that be?[/quote]

Whatever works or comes with the OS: NTFS, ext, ...


[quote]DVCS relies on diffs for efficiency; source code specifically is a natural fit. Binary assets don't diff well, resulting in considerable overhead, both in resources and in management. As the repository grows, SVN scales better from the user's perspective.[/quote]


It scales better in what way? 'User perspective' is vague.


[quote]There is a special git version/patch for storing blobs, but if you're using a workaround, you might as well go all the way, since using multiple repositories loses the advantage of uniform change management.[/quote]

What 'special git version/patch'? Git can store binary blobs out of the box; there's no patch required.
And what's this about multiple repositories? That's the nature of DVCS. There's no central repo. You can, however, designate a 'primary' and treat secondaries as 'forks' (think 'GitHub').


[quote]For source, operations like bisect and three-way merge make sense. For binary assets they don't.[/quote]

You're absolutely right. SVN has the same problem. So what's your point?


[quote]Synchronizing a centralized VCS doesn't require bringing in the entire change history; one just gets a snapshot. Partial checkouts are also used.[/quote]

And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.


[quote]Proven track record.[/quote]

Proven track record of what? Again, vague.


[quote]Whatever works or comes with the OS: NTFS, ext, ...[/quote]

Oh, I thought you were talking about a versioning file system: http://en.wikipedia.org/wiki/Versioning_file_system. You were just talking about plain old file systems. So tell me, what does that file system have to do with the OP's problem?
[quote]Proven track record of what? Again, vague.[/quote]

Of use in and out of industry.

[quote]The compression in git is amazing and if you push upstream frequently, your updates are small.[/quote]

How well does it work for raw video assets, where each file is 500 MB+? And raw images?

[quote]So tell me, what does that file system have to do with the OP's problem?[/quote]

In your running app, you hook into directory change notifications. Whenever any of the files changes, reload it.

To update a running app, use a regular VCS checkout working on top of a plain old file system. Presto: versioned on-the-fly updates, exactly what the OP asked for, without reinventing an entire VCS and everything else.
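
A minimal sketch of that reload loop, in Python for brevity. It polls file modification times instead of using a native change-notification API (inotify, ReadDirectoryChangesW); the "assets" directory and the reload_asset hook are hypothetical stand-ins for the engine's own paths and loader:

[code]
import os
import time

ASSET_DIR = "assets"  # hypothetical: the directory a plain VCS checkout updates

def snapshot(root):
    """Map every file under root to its last-modified time."""
    stamps = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stamps[path] = os.stat(path).st_mtime
            except OSError:
                pass  # file vanished between walk() and stat()
    return stamps

def reload_asset(path):
    # Hypothetical stand-in for the engine's real reload hook.
    print("reloading", path)

def watch(root, interval=1.0):
    before = snapshot(root)
    while True:
        time.sleep(interval)
        after = snapshot(root)
        for path, mtime in after.items():
            if before.get(path) != mtime:
                reload_asset(path)  # file is new or was modified
        before = after

watch(ASSET_DIR)
[/code]

Running "svn up" (or any checkout) into that directory then triggers reloads of exactly the files the new revision touched.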

[quote]Of use in and out of industry.[/quote]
That's not a "technical advantage"; you have said nothing to back up your statement.


[quote]How well does it work for raw video assets, where each file is 500 MB+? And raw images?[/quote]
In what (real world) use case would one do such a thing?
[quote]And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.[/quote]
A single revision for a modest console game's source assets repository (which ends up on a DVD-sized disk after compilation) could be hundreds of gigabytes. The entire change history could be several dozen terabytes.
If you've got 50 people on the project, it's obviously cheaper to have a central storage server with several terabytes available than to require every developer to have a several-terabyte RAID setup themselves. Plus, when a new developer joins the project, you don't want to be doing a several-terabyte download via git:// etc.
At these large scales, DVCS for binary assets simply falls apart, at least so far. I really do hope this situation is rectified, because git is great for code and for smaller scales of binary data. Large projects would really need a hybrid git, where it's mostly DVCS but certain directories could be centralised.
[quote]In what (real world) use case would one do such a thing?[/quote]
On a project that requires video or image files? You always need to store the original (non-lossy-compressed) files in your source repo.
[quote]Why [is Perforce usually preferred]? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?[/quote]
In terms of assets, it's superior to git because of the DVCS issues already mentioned. The real question, then, is why it's superior to SVN (seeing as SVN is free). For a small project with no money to throw around on spurious licensing, it probably doesn't matter. On the larger scale, though, it's simply much more efficient than SVN (e.g. when managing, branching, or downloading hundreds of gigs of binary data).

[edit] Time-travelling quote:
[quote]but I think you missed some specific points in the OP's question. We're not talking about version control for development artifacts. We're talking about this:[/quote]
I wasn't trying to respond to the OP, sorry, and I'm pretty sure the questions of yours that I answered were off-topic, so I was just trying to quash this off-topic trolling between you and Antheus about how useful git is for binary data.

[quote name='thok' timestamp='1329171797' post='4912757']And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.
A single revision for a modest console game's source assets repository (which ends up on a DVD-sized disk after compilation) could be hundreds of gigabytes. The entire change history could be several terabytes.
If you've got 50 people on the project, it's obviously cheaper to have a central storage server with several terabytes available than to require every developer to have a several-terabyte RAID setup themselves. Plus, when a new developer joins the project, you don't want to be doing a several-terabyte download via git:// etc.
In what (real world) use case would one do such a thing?
On a project that requires video or image files? You always need to store the original (non-lossy-compressed) files in your source repo.
[/quote]

Point taken, but I think you missed some specific points in the OP's question. We're not talking about version control for development artifacts. We're talking about this:

[quote]One of my main concerns is how data stored in the database should be used by the game runtime. I try to strive for 'realtime editing', so what I am considering is a network interface that simply lets you do RESTful operations to query or modify the data and has a notification API. This way any client could at any given time query for an up-to-date asset, given that it is in the repository.[/quote]
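
For concreteness, here is a minimal sketch of the query side of such an interface, again in Python and again with hypothetical names (the "assets" checkout directory, mtime-as-version tagging). It serves asset bytes over HTTP and lets a client cheaply ask whether its cached copy is still current; the modify and notification sides are left out:

[code]
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

ASSET_DIR = "assets"  # hypothetical: a plain VCS checkout of the asset repository

class AssetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Map /textures/foo.png onto the checkout, rejecting path traversal.
        path = os.path.normpath(os.path.join(ASSET_DIR, self.path.lstrip("/")))
        if not path.startswith(ASSET_DIR + os.sep) or not os.path.isfile(path):
            self.send_error(404)
            return
        version = str(os.stat(path).st_mtime)  # crude version tag: mtime
        # Client echoes back the tag it has; an unchanged asset costs one round trip.
        if self.headers.get("If-None-Match") == version:
            self.send_response(304)
            self.end_headers()
            return
        with open(path, "rb") as f:
            data = f.read()
        self.send_response(200)
        self.send_header("ETag", version)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

HTTPServer(("localhost", 8000), AssetHandler).serve_forever()
[/code]

A client would GET an asset path, cache the bytes with the returned tag, and re-query (or, in a fuller design, subscribe for notifications) to learn when to refetch.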

As a side note, if _all_ of the dev artifacts (video, sound, code, etc.) are being stored in a single repo (without any apparent thought to compartmentalization), that's just insane. Whoever does this should be beaten. Yes, obviously it would be insane to check out _everything_ if you want to modify a single line of code. That's what superprojects/submodules are for: http://en.wikibooks.org/wiki/Git/Submodules_and_Superprojects
It's effectively the same thing as a partial checkout.

[quote]Yes, obviously it would be insane to check out _everything_ if you want to modify a single line of code.[/quote]

Which is nice in theory, but in practice you often need the whole thing. Better yet, you want the whole thing.

Here's a scenario: an asset is crashing your runtime. Not knowing whether it's the code, the asset, or the pipeline, you try a few things. Since problems are detected during QA, you need to replicate the failing version. The build server may help, but ultimately you need to test individual pieces to pinpoint it.

To go even faster, write a test, bisect to see when it started failing, then fix the change. At minimum, the build server needs to do a full checkout and possibly a deployment.

A single repository across all people/teams/the whole company is nice, but it doesn't always work.

Git, as nice as it is, is not the first, last and only word when it comes to VCS.

