
databases as asset-storage


17 replies to this topic

#1 phr34k9   Members   -  Reputation: 152


Posted 10 February 2012 - 03:53 AM

I am aware that in larger projects it might be favorable to store assets in a centralized database during the development cycle. For the most part I understand the reasoning and motivations for doing so, and I've concluded that I want to give the workflow it encourages a shot. However, not having experience on the subject, I was wondering what a good setup would be.

One of my main concerns is how data stored in the database should be used by the game runtime. I strive for 'realtime editing', so what I am considering is a network interface that simply lets you perform RESTful operations to query or modify the data and that has a notification API. This way any client could, at any given time, query for an up-to-date asset, given that it is in the repository.

Given that assets tend to contain large volumes of data, issuing an operation to fetch the data every single time the sandbox starts might put serious strain on the network infrastructure, so probably some architecture should be in place to cache frequently used assets in replicated asset bundles.

Overall, how does this strategy sound? To my recollection there aren't any published articles or presentations about such subjects, are there?


#2 Ashaman73   Crossbones+   -  Reputation: 7134


Posted 10 February 2012 - 04:06 AM

Hmmm... well... using a centralized repository with a real-time client to manipulate data is... not fun at all. My experiences with this kind of tool are horrible (solo = no problem, 2 = a few problems, >2 = trouble explosion!).

Just use some kind of version management with a centralized repository, like SVN, where you are able to check out/in, lock, and revert changes. It is as simple as that. Although Subversion (SVN) is not meant as an asset management tool, it is free and powerful, though there are commercial asset management tools out there that could do the task better if you want to spend some $$$.

Tools are important, but don't dig into tool creation when there are already tools available; create some games instead.

#3 thok   Members   -  Reputation: 684


Posted 11 February 2012 - 05:40 PM

It's hard to say without knowing anything about your game. If you'd care to give us a summary of your architecture and design so far, we may be able to point you in the right direction.

With regard to data storage and caching, you could look into NoSQL datastores. Keep a relational database around for durable storage and use a NoSQL store (such as Redis or MongoDB; there are many different stores, so choose the one right for your task) for fast access to your cache.
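For illustration, here is a minimal cache-aside sketch using the redis-py client; the key scheme and the fetch_from_database hook are hypothetical stand-ins for whatever durable relational store you keep around:

```python
# Cache-aside sketch with the redis-py client; fetch_from_database stands in
# for whatever durable relational store holds the authoritative asset data.
import redis

cache = redis.Redis(host='localhost', port=6379)

def load_asset(asset_id, fetch_from_database):
    key = f'asset:{asset_id}'
    blob = cache.get(key)
    if blob is not None:
        return blob                        # fast path: served from the cache
    blob = fetch_from_database(asset_id)   # slow path: durable store
    cache.set(key, blob, ex=3600)          # keep it cached for an hour
    return blob
```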

#4 thok   Members   -  Reputation: 684


Posted 11 February 2012 - 05:49 PM

Hmmm... well... using a centralized repository with a real-time client to manipulate data is... not fun at all. My experiences with this kind of tool are horrible (solo = no problem, 2 = a few problems, >2 = trouble explosion!).

Just use some kind of version management with a centralized repository, like SVN, where you are able to check out/in, lock, and revert changes. It is as simple as that. Although Subversion (SVN) is not meant as an asset management tool, it is free and powerful, though there are commercial asset management tools out there that could do the task better if you want to spend some $$$.

Tools are important, but don't dig into tool creation when there are already tools available; create some games instead.


Did I read this right? Using SVN as a database? Dear God. How do you propose to format the data? More importantly, how do you query it?

#5 thok   Members   -  Reputation: 684


Posted 11 February 2012 - 05:59 PM

One of my main concerns is how data stored in the database should be used by the game runtime. I strive for 'realtime editing', so what I am considering is a network interface that simply lets you perform RESTful operations to query or modify the data and that has a notification API. This way any client could, at any given time, query for an up-to-date asset, given that it is in the repository.


This particular paragraph in your question gave me an idea. You may be able to use messaging technology (AMQP, for example) to send and receive notifications about data changes. If I understand your problem correctly, I can sketch an idea of how this would work.

User A opens record 7 for viewing
User B also opens record 7 for viewing
Both users 'subscribe' to updates about record 7
User A makes an edit -> notification is broadcast to all record 7 subscribers
All subscribers to record 7 updates can react appropriately to the update (query a datastore, display a message, whatever)

Have a look at RabbitMQ: http://www.rabbitmq.com/
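A minimal sketch of that flow with pika (a Python client for RabbitMQ); the exchange name ('asset-updates') and routing key ('record.7') are made up for illustration, and in a real setup the publisher (editor) and subscribers (clients) would run in separate processes:

```python
# Sketch of the subscribe/notify flow with pika (RabbitMQ's Python client).
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.exchange_declare(exchange='asset-updates', exchange_type='topic')

# Editor side: User A saves an edit to record 7.
channel.basic_publish(exchange='asset-updates',
                      routing_key='record.7',
                      body=b'record 7 was modified')

# Client side: User B subscribes to updates about record 7.
result = channel.queue_declare(queue='', exclusive=True)  # private queue
channel.queue_bind(exchange='asset-updates',
                   queue=result.method.queue,
                   routing_key='record.7')

def on_update(ch, method, properties, body):
    # React appropriately: re-query the datastore, refresh the view, etc.
    print('asset changed:', body.decode())

channel.basic_consume(queue=result.method.queue,
                      on_message_callback=on_update,
                      auto_ack=True)
channel.start_consuming()  # blocks; run on a background thread in a game client
```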

#6 phr34k9   Members   -  Reputation: 152


Posted 12 February 2012 - 03:52 AM

So far it's technically not a game but more like an engine infrastructure to facilitate collaborative open-world development. Given the wide variety of assets and decals, I see the potential need to manage assets numbering in the thousands to tens of thousands (1,000-10,000). That is one of the reasons why I am leaning towards centralized servers for management, replication, and versioning. What I envision is in essence a web interface that empowers anybody to submit assets, manages dependencies between the assets, and prevents complicated machine setups.

To elaborate a bit on my architecture at your request: I have abstracted away file paths and direct file I/O. At this moment this is nothing more than mapping an 'asset URI' to a physical location on the hard drive, which effectively simulates mounted folders.

A small example could be: {CB3158AC-D1C6-41B5-8006-221E326A7284} is mapped to Z:\Content Packages\content, and my own fopen and FileStream classes take care of mapping to the appropriate file. Similar to a drive letter, the GUID identifies the repository, and metadata in the folder structure would allow me to identify the file-system technology, i.e. native OS file system, NoSQL, BerkeleyDB. The goal of this abstraction was originally to allow remounting of asset repositories so that you can easily switch where you'd want to store the assets; it could be an NTFS share or another disk drive. A bonus is that these repositories could be shared between games that are in development.
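Purely to illustrate that mapping, a minimal sketch of such a mount table (the GUID and the Z:\ path come from the post above; the class and the asset-URI layout are hypothetical):

```python
# Minimal sketch of the GUID-based mount table described above.
import os

class MountTable:
    def __init__(self):
        self._mounts = {}  # repository GUID -> physical root

    def mount(self, guid, physical_root):
        self._mounts[guid.upper()] = physical_root

    def resolve(self, asset_uri):
        # An asset URI here looks like '{GUID}/relative/path/to/asset'.
        guid, _, relative = asset_uri.partition('/')
        root = self._mounts[guid.upper()]
        return os.path.join(root, *relative.split('/'))

table = MountTable()
table.mount('{CB3158AC-D1C6-41B5-8006-221E326A7284}',
            r'Z:\Content Packages\content')

# A custom fopen/FileStream would call resolve() before touching the disk;
# on Windows this yields Z:\Content Packages\content\textures\rock.dds.
path = table.resolve('{CB3158AC-D1C6-41B5-8006-221E326A7284}/textures/rock.dds')
```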

This particular paragraph in your question gave me an idea. You may be able to use messaging technology (AMQP, for example) to send and receive notifications about data changes. If I understand your problem correctly, I can sketch an idea of how this would work.


Your suggestion is much appreciated, but my concern was not per se how to receive notifications but more along the lines of what the game runtime has to do with the data it receives. What caching policies should apply? How should it store the data? An example could be to 'cache' assets in a client-side database, but what implications does that have for load/search times of assets? Would this overall be negligible, or would it bring the idea closer to the category of 'looks good on paper, will never work in practice'?

#7 thok   Members   -  Reputation: 684


Posted 13 February 2012 - 06:54 AM

Your suggestion is much appreciated, but my concern was not per se how to receive notifications but more along the lines of what the game runtime has to do with the data it receives. What caching policies should apply? How should it store the data? An example could be to 'cache' assets in a client-side database, but what implications does that have for load/search times of assets? Would this overall be negligible, or would it bring the idea closer to the category of 'looks good on paper, will never work in practice'?


Ah, I misunderstood the core issue, then. My answer for now would have to be more questions:
- What are some specific use cases/scenarios for this system?
- Can users lock an asset for editing, or can anyone potentially edit anything (creating also the potential for merge conflicts)?
- Is your data plaintext, binary, or both?
- In your initial post, you mentioned "[storing] assets in a centralized database". Were you referring literally to a (relational) database, or were you referring generically to some other type of data repository?

Perhaps I shouldn't have been so quick to shoot down Ashaman73's SVN idea, at least with respect to version control. It sounds like you're describing a situation where a distributed version control system __might__ apply. SVN is not distributed of course; git, however, comes to mind as a possible solution. Pretty much all of your concerns about caching, client-side databases, and even performance should be no problem for git. It's very fast. Assuming a DVCS actually applies to your problem, of course....

#8 frob   Moderators   -  Reputation: 20349


Posted 13 February 2012 - 09:14 AM

I've worked with asset databases before, and they come with their own set of problems.

Yes, there are some benefits. The real-time editing you mention is just one of many potential benefits.

But problems include:
* Inconsistent sets of assets due to live updates
* Live updates of assets you don't want changed, or modifications by other people
* Difficulty in syncing to a previous version
* Extreme difficulty in reconstructing a specific build in the past

It is almost impossible to sync to a specific changelist when you must synchronize that with one or more databases at the same time; did the build server pull from the database right before, or right after? If the desired changelist was at 12:57:08 and a database change was also at 12:57:08, do you include the database change or not?

It is almost as easy to just use FindFirstChangeNotification() to monitor your directory tree. It can be a tiny bit more work to reload assets when they change, but most of the work would be very similar. Watching for changed files gives the benefits of live updates without most of the problems listed above. Inconsistent changes can still potentially be applied, but that is entirely within your control instead of outside it.
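FindFirstChangeNotification() is Win32-specific; as a portable sketch of the same watch-and-reload idea, here is a simple polling loop (reload_asset is a hypothetical callback into the engine's asset loader):

```python
# Portable polling sketch of the watch-and-reload idea; on Windows, the
# FindFirstChangeNotification() API mentioned above avoids the polling loop.
import os
import time

def scan(root):
    """Map every file under root to its last-modified time."""
    snapshot = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                snapshot[path] = os.path.getmtime(path)
            except OSError:
                pass  # file vanished mid-scan; ignore it
    return snapshot

def watch(root, reload_asset, interval=1.0):
    before = scan(root)
    while True:
        time.sleep(interval)
        after = scan(root)
        for path, mtime in after.items():
            if before.get(path) != mtime:  # new or modified file
                reload_asset(path)
        before = after

# Example: watch('Z:/Content Packages/content', reload_asset=print)
```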
Check out my personal indie blog at bryanwagstaff.com.

#9 Antheus   Members   -  Reputation: 2397


Posted 13 February 2012 - 09:27 AM

you could look into NoSQL datastores


Of course, they are web scale....

What I envision is in essence a web interface that empowers anybody to submit assets, manages dependencies between the assets, and prevents complicated machine setups.


Look at how git works. Not git the application (git simply does not work for non-diffable/binary content), but the DVCS design and how the change-tracking log works.

Each change is a node in a graph. When a user decides to check out a specific revision, they select the node, which contains backlinks to older content.

When updating, the user walks back through these changes, building the new items incrementally through diffs. Caching here is implicit: any blob available to the client is either fully known or can be built from the change log.



For assets, git fails. The difficulty of change management for binary assets makes such operations infeasible. So instead of using raw diffs, adopt only the change-tracking mechanism, not the diffs.

When a client requests repository 0x41a4fcb, get that. Examine it to see which resources it contains, follow backward via the previous patch, and so on, until you have all the resources. Instead of rebuilding from diffs, request the complete latest resources from the repository.

For a custom solution, the above can be done server-side to avoid the latency of many requests. Tell the server: my repository is 0x4132, give me the latest (or 0xfa81). The server walks the change graph and returns a single file containing the list of all resources that need to be downloaded.


Since changes are incremental, the graph is immutable. If you have resource 0x4132, it will never be changed again; any user that makes any change will get a new ID. Such an approach naturally maps to a RESTful design. This also solves the caching problem: since each resource is immutable, it can be deleted from the cache at any time and retrieved later.
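A minimal sketch of that immutable change graph, assuming content-hash revision IDs and a single parent backlink per revision (all names here are illustrative, not part of any existing tool):

```python
# Sketch of the immutable change graph described above. Revision IDs are
# content hashes, each revision backlinks to its parent, and resolving a
# revision walks the backlinks to assemble the complete resource set.
import hashlib
import json

revisions = {}  # revision ID -> {'parent': ID or None, 'resources': {name: blob ID}}

def commit(parent_id, changed_resources):
    """Record a change; any change yields a brand new immutable revision ID."""
    node = {'parent': parent_id, 'resources': changed_resources}
    rev_id = hashlib.sha1(json.dumps(node, sort_keys=True).encode()).hexdigest()
    revisions[rev_id] = node  # existing nodes are never mutated
    return rev_id

def resolve(rev_id):
    """Walk backward through the graph to build the full name -> blob map."""
    resources = {}
    while rev_id is not None:
        node = revisions[rev_id]
        for name, blob in node['resources'].items():
            resources.setdefault(name, blob)  # nearest (newest) revision wins
        rev_id = node['parent']
    return resources

r1 = commit(None, {'rock.dds': 'blob-a', 'tree.mesh': 'blob-b'})
r2 = commit(r1, {'rock.dds': 'blob-c'})  # the edit gets a new revision ID
assert resolve(r2) == {'rock.dds': 'blob-c', 'tree.mesh': 'blob-b'}
```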


To avoid solving the wrong problem, look at how to extract change information from an existing repository. SVN is fine, and probably better than a DVCS for binary assets. Perforce is usually preferred. Then format those change logs so they can be retrieved via a web service. Either way, you still need a reliable versioning system in the back; just keeping track of the change graph is not enough, and the hard problems lie elsewhere, so don't try reinventing those.


The performance of such a system is hard to guess. Git et al. fail due to content parsing and ID generation, but other parts scale relatively well, most of the time. The above is merely an interface to an existing service.

----

Monitoring for file system changes (as mentioned above) is a frequently used solution. Instead of inventing some complex mechanism, you just use whatever versioning system you have on top of the file system.

Far from perfect or "pure", but works in practice.

#10 phr34k9   Members   -  Reputation: 152


Posted 13 February 2012 - 11:46 AM

Thanks for the much-desired information. The change-tracking system of git looks like a promising candidate indeed, giving both the ability to pin a session to a specific version as well as live updates. Technological complications seem to be one of the more pressing issues of centralized storage, rather than the performance problems I feared. That gives me confidence the idea could very well be practical, and reasons to invest in an implementation.

#11 thok   Members   -  Reputation: 684


Posted 13 February 2012 - 12:08 PM

Of course, they are web scale....


Are you mocking me, sir?

Look at how git works. Not git the application (git simply does not work for non-diffable/binary content), but the DVCS design and how the change-tracking log works.


Yes, the history and change tracking can be used. Any other VCS is going to have the same problem with trying to diff arbitrary binary formats, no?

For assets, git fails. The difficulty of change management for binary assets makes such operations infeasible. So instead of using raw diffs, adopt only the change-tracking mechanism, not the diffs.


See above.

SVN is fine, and probably better than a DVCS for binary assets.


Why? Please explain.

Perforce is usually preferred.


Why? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?

Instead of inventing some complex mechanism, you just use whatever versioning system you have on top of file system.


And which file system would that be?

#12 Antheus   Members   -  Reputation: 2397


Posted 13 February 2012 - 03:15 PM

Why? Please explain.


A DVCS relies on diffs for efficiency; source code specifically is a natural fit. Binary assets don't diff naturally, resulting in considerable overhead, both for resources and for management. As the repository grows, SVN scales better from the user's perspective.

There is a special git version/patch for storing blobs, but if you're using a workaround you might as well go all the way, since using multiple repositories loses the uniform change-management advantage.

For source, operations like bisect and three-way merge make sense. For binary assets they don't. If two people edit the same PSD, merging doesn't make conceptual sense the way merging edits to textual content does.

Yes, the history and change tracking can be used. Any other VCS is going to have the same problem with trying to diff arbitrary binary formats, no?


Synchronizing a centralized VCS doesn't require bringing in the entire change history; one just gets a snapshot, and partial checkouts are also used.

Why? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?


Proven track record.

SVN was mentioned as a free alternative.

And which file system would that be?


Whatever works or comes with the OS: NTFS, ext, ...

#13 thok   Members   -  Reputation: 684


Posted 13 February 2012 - 04:23 PM

A DVCS relies on diffs for efficiency; source code specifically is a natural fit. Binary assets don't diff naturally, resulting in considerable overhead, both for resources and for management. As the repository grows, SVN scales better from the user's perspective.


It scales better in what way? 'User perspective' is vague.

There is a special git version/patch for storing blobs, but if you're using a workaround you might as well go all the way, since using multiple repositories loses the uniform change-management advantage.


What 'special git version/patch'? Git can store binary blobs out of the box; there's no patch required.
And what's this about multiple repositories? That's the nature of DVCS. There's no central repo. You can, however, designate a 'primary' and treat secondaries as 'forks' (think 'GitHub').

For source, operations like bisect and three-way merge make sense. For binary assets they don't.


You're absolutely right. SVN has the same problem. So what's your point?

Synchronizing a centralized VCS doesn't require bringing in the entire change history; one just gets a snapshot, and partial checkouts are also used.


And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.

Proven track record.


Proven track record of what? Again, vague.

Whatever works or comes with the OS: NTFS, ext, ...


Oh, I thought you were talking about a versioning file system: http://en.wikipedia.org/wiki/Versioning_file_system. You were just talking about plain old file systems. So tell me, what does that file system have to do with the OP's problem?

#14 Antheus   Members   -  Reputation: 2397


Posted 13 February 2012 - 04:44 PM

Proven track record of what? Again, vague.


Of use in and out of industry.

The compression in git is amazing and if you push upstream frequently, your updates are small.


How well is it working for raw video assets, where each file is 500MB+? And raw images.

So tell me, what does that file system have to do with the OP's problem?


In your running app, you hook into directory change notifications. Whenever any of the files changes, reload it.

To update a running app, use a regular VCS checkout working on top of a plain old file system. Presto: versioned on-the-fly updates, exactly what the OP asked for, without reinventing an entire VCS and everything else.

#15 thok   Members   -  Reputation: 684


Posted 13 February 2012 - 06:00 PM

Of use in and out of industry.

That's not a "technical advantage"; you have said nothing to back up your statement.

How well is it working for raw video assets, where each file is 500MB+? And raw images.

In what (real world) use case would one do such a thing?

#16 Hodgman   Moderators   -  Reputation: 29552


Posted 13 February 2012 - 06:05 PM

And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.

A single revision for a modest console game's source assets repository (which ends up on a DVD-sized disk after compilation) could be hundreds of gigabytes. The entire change history could be several dozen terabytes.
If you've got 50 people on the project, it's obviously cheaper to have a central storage server with several terabytes available than to require every developer to have a several-terabyte RAID setup themselves. Plus, when a new developer joins the project, you don't want to be doing a several-terabyte download via git:// etc.
At these large scales, DVCS for binary assets simply falls apart, at least so far. I really do hope this situation is rectified, because git is great for code or at smaller scales of binary data. Large projects would really need a hybrid git, where it's mostly DVCS but certain directories could be centralised.

In what (real world) use case would one do such a thing?

On a project that requires video or image files? You always need to store the original (non lossy-compressed) files in your source repo.

Why [Perforce is usually preferred]? I haven't used Perforce, to be honest, but I would stay away from it unless absolutely necessary simply because of the licensing costs.
What would be the technical advantage of using it over, for example, git?

In terms of assets, it's superior to git because of the DVCS issues already mentioned. The real question then is, why is it superior to SVN (seeing as SVN is free)? For a small project with no money to throw around on spurious licensing, it probably doesn't matter. On the larger scale, though, it's simply much more efficient than SVN (e.g. when managing, branching, or downloading hundreds of gigs of binary).

[edit]Time travelling quote:

but I think you missed some specific points in the OP's question. We're not talking about version control for development artifacts. We're talking about this:

I wasn't trying to respond to the OP, sorry, and I'm pretty sure the questions of yours that I answered were off-topic, so I was just trying to quash this off-topic trolling between you and Antheus about how useful git is for binary data.

#17 thok   Members   -  Reputation: 684


Posted 13 February 2012 - 06:21 PM

And what's wrong with bringing in the entire change history? The compression in git is amazing and if you push upstream frequently, your updates are small.

A single revision for a modest console game's source assets repository (which ends up on a DVD-sized disk after compilation) could be hundreds of gigabytes. The entire change history could be several terabytes.
If you've got 50 people on the project, it's obviously cheaper to have a central storage server with several terabytes available than to require every developer to have a several-terabyte RAID setup themselves. Plus, when a new developer joins the project, you don't want to be doing a several-terabyte download via git:// etc.

In what (real world) use case would one do such a thing?

On a project that requires video or image files? You always need to store the original (non lossy-compressed) files in your source repo.


Point taken, but I think you missed some specific points in the OP's question. We're not talking about version control for development artifacts. We're talking about this:

One of my main concerns is how data stored in the database should be used by the game runtime. I strive for 'realtime editing', so what I am considering is a network interface that simply lets you perform RESTful operations to query or modify the data and that has a notification API. This way any client could, at any given time, query for an up-to-date asset, given that it is in the repository.



As a side note, if _all_ of the dev artifacts (video, sound, code, etc.) are being stored in a single repo (without any apparent thought to compartmentalization), that's just insane. Whoever does this should be beaten. Yes, obviously it would be insane to check out _everything_ if you want to modify a single line of code. That's what superprojects/submodules are for: http://en.wikibooks.org/wiki/Git/Submodules_and_Superprojects
It's effectively the same thing as a partial checkout.

#18 Antheus   Members   -  Reputation: 2397


Posted 13 February 2012 - 06:40 PM

Yes, obviously it would be insane to check out _everything_ if you want to modify a single line of code.


Which is nice in theory, but in practice you often need the whole thing. Even better, you want the whole thing.

Here's a scenario: an asset is crashing your runtime. Not knowing whether it's the code, the asset, or the pipeline, you try a few things. Since problems are detected during QA, you need to replicate the failing version. The build server may help, but ultimately you need to test individual pieces to pinpoint it.

To go even faster, write a test, bisect to see when it started failing, then fix the change. At minimum, the build server needs to do a full checkout and possibly a deployment.

A single repository across all people/teams/the whole company is nice, but it doesn't always work.

Git, as nice as it is, is not the first, last and only word when it comes to VCS.



