databases as asset-storage

Started by
16 comments, last by Antheus 12 years, 2 months ago
I am aware that in larger projects it can be favorable to store assets in a centralized database during the development cycle. For the most part I understand the reasoning and motivations for doing so, and I have concluded that I want to give the workflow it encourages a shot. However, not having experience on the subject, I was wondering what a good setup would be.

One of my main concerns is how data stored in the database should be used by the game runtime. I strive for 'realtime editing', so what I am considering is a network interface that simply lets you do RESTful operations to query or modify the data, and that has a notification API. This way any client could at any given time query for an up-to-date asset, given that it is in the repository.

Given that assets tend to contain large volumes of data, issuing an operation to fetch the data every single time the sandbox starts might put serious strain on the network infrastructure, so probably some architecture should be in place to cache frequently used assets in replicated asset bundles.

Overall, how does this strategy sound? To my recollection there aren't any published articles or presentations about such subjects, are there?
Hmmm... well... using a centralized repository with a real-time client to manipulate data is not fun at all. My experiences with this kind of tool are horrible (solo = no problem, 2 = few problems, >2 = trouble explosion!).

Just use some kind of version management with a centralized repository, like SVN, where you are able to check out/in, lock and revert changes. It is as simple as that. Although Subversion (SVN) is not meant as an asset management tool, it is free and powerful, though there are commercial asset management tools out there which could do the task better, if you want to spend some $$$.

Tools are important, but don't dig into tool creation when there are already tools available; create some games instead.
It's hard to say without knowing anything about your game. If you'd care to give us a summary of your architecture and design so far, we may be able to point you in the right direction.

With regards to data storage and caching, you could look into NoSQL datastores. Keep a relational database around for durable storage and utilize a NoSQL store (such as Redis or MongoDB; there are many different stores, so choose the one right for your task) for fast access to your cache.

Just use some kind of version management with centralized repository like SVN where you are able to check out/in, lock and revert changes.[/quote]


Did I read this right? Using SVN as a database? Dear God. How do you propose to format the data? More importantly, how do you query it?


One of my main concerns is how data stored in the database should be used by the game runtime. I try to strive for 'realtime-editing' so what I am considering is a network interface that simply lets you do restful operations to query or modify the data and has notification api.[/quote]



This particular paragraph in your question gave me an idea. You may be able to use messaging technology (AMQP, for example) to send and receive notifications about data changes. If I understand your problem correctly, I can sketch an idea of how this would work.

User A opens record 7 for viewing
User B also opens record 7 for viewing
Both users 'subscribe' to updates about record 7
User A makes an edit -> notification is broadcast to all record 7 subscribers
All subscribers to record 7 updates can react appropriately to the update (query a datastore, display a message, whatever)

Have a look at RabbitMQ: http://www.rabbitmq.com/
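The subscribe/notify flow in those steps can be sketched in-process; all class and function names below are illustrative, and a real deployment would route the messages through a broker such as RabbitMQ rather than a local dictionary:

```python
# Minimal in-process sketch of the record-subscription flow above.
# A real setup would use an AMQP broker; nothing here is a real broker API.

from collections import defaultdict

class UpdateBus:
    def __init__(self):
        self._subscribers = defaultdict(list)  # record id -> list of callbacks

    def subscribe(self, record_id, callback):
        self._subscribers[record_id].append(callback)

    def publish(self, record_id, change):
        # Broadcast the change to every subscriber of this record.
        for callback in self._subscribers[record_id]:
            callback(change)

bus = UpdateBus()
received = []
bus.subscribe(7, received.append)  # User A watches record 7
bus.subscribe(7, received.append)  # User B watches record 7
bus.publish(7, "record 7 edited")  # both subscribers are notified
```

With a broker in between, `subscribe` would become a queue binding and `publish` a message to an exchange, but the shape of the interaction stays the same.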
So far it's technically not a game but more like an engine infrastructure to facilitate collaborative open-world development. Given the wide variety of assets and decals, I see the potential need to manage assets on the order of 1,000 to 10,000, which is one of the reasons why I am leaning towards centralized servers for management, replication and versioning. What I envision is in essence a web interface that empowers anybody to submit assets, manages dependencies between the assets, and prevents complicated machine setups.

To elaborate a bit on my architecture at your request: I have abstracted away file paths and direct file I/O. At this moment this is nothing more than mapping an 'asset URI' to a physical location on the hard drive. This effectively simulates mounted folders.

A small example could be: {CB3158AC-D1C6-41B5-8006-221E326A7284} is mapped to Z:\Content Packages\content, and my own fopen and FileStream classes take care of mapping to the appropriate file. Similar to a drive letter, the GUID identifies the repository, and meta-data in the folder structure would allow me to identify the file-system technology, i.e. native OS filesystem, NoSQL, Berkeley DB. The goal of this abstraction was originally to allow remounting of asset repositories so that you can easily switch where you'd want to store the assets; it could be an NTFS share or another disk drive. A bonus is that these repositories could be shared between games that are in development.
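The mount-point abstraction described here could be sketched roughly as follows; the class and method names are hypothetical, not the poster's actual API, and forward slashes are used for simplicity:

```python
# Hypothetical sketch of the mount abstraction: a repository GUID maps to a
# physical root, and asset URIs are resolved against it before any file I/O.

class MountTable:
    def __init__(self):
        self._mounts = {}  # repository GUID -> physical root path

    def mount(self, repo_guid, physical_root):
        self._mounts[repo_guid] = physical_root

    def resolve(self, asset_uri):
        # An asset URI looks like "{GUID}/relative/path/to/asset"
        repo_guid, _, rel_path = asset_uri.partition("/")
        root = self._mounts[repo_guid]
        return f"{root}/{rel_path}"

mounts = MountTable()
mounts.mount("{CB3158AC-D1C6-41B5-8006-221E326A7284}", "Z:/Content Packages/content")
path = mounts.resolve("{CB3158AC-D1C6-41B5-8006-221E326A7284}/textures/rock.png")
# path == "Z:/Content Packages/content/textures/rock.png"
```

Remounting a repository (to an NTFS share, another drive, a database-backed store) then only means changing one entry in the table, which matches the goal stated above.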


This particular paragraph in your question gave me an idea. You may be able to use messaging technology (AMQP, for example) to send and receive notifications about data changes.[/quote]


Your suggestion is much appreciated, but my concern was not per se how to receive notifications but more along the lines of what the game runtime has to do with the data it receives. What caching policies to apply? How should it store it? An example could be to 'cache' assets in a client-side database, but what implications does that have for load/search times of assets? Would this overall be negligible, or would it bring the idea closer to the category of 'looks good on paper, will never work in practice'?


Your suggestion is much appreciated, but my concern was not per se how to receive notifications but more along the lines of what the game runtime has to do with the data it receives.[/quote]


Ah, I misunderstood the core issue, then. My answer for now would have to be more questions:
- What are some specific use cases/scenarios for this system?
- Can users lock an asset for editing, or can anyone potentially edit anything (creating also the potential for merge conflicts)?
- Is your data plaintext, binary, or both?
- In your initial post, you mentioned "[storing] assets in a centralized database". Were you referring literally to a (relational) database, or were you referring generically to some other type of data repository?

Perhaps I shouldn't have been so quick to shoot down Ashaman73's SVN idea, at least with respect to version control. It sounds like you're describing a situation where a distributed version control system *might* apply. SVN is not distributed, of course; git, however, comes to mind as a possible solution. Pretty much all of your concerns about caching, client-side databases, and even performance should be no problem for git; it's very fast. Assuming a DVCS actually applies to your problem, of course....
I've worked with asset databases before, and they come with their own set of problems.

Yes, there are some benefits. The real-time edits you mention is just one of many potential benefits.

But problems include:
* Inconsistent sets of assets due to live updates
* Live updates of assets you don't want changed, or modifications by other people
* Difficulty in syncing to a previous version
* Extreme difficulty in reconstructing a specific past build

It is almost impossible to sync to a specific changelist when you must synchronize that with one or more databases at that same time; did the build server pull from the database right before, or right after? If the desired changelist was at 12:57:08 and a database change was also at 12:57:08, do you include the database change or not?

It is almost as easy to just use FindFirstChangeNotification() to monitor your directory tree. It can be a tiny bit more work to reload assets when they change, but most of the work would be very similar. Watching for changed files gives the benefits of live updates without most of the problems listed above. It can still have inconsistent changes potentially applied, but that is entirely within your control instead of outside it.
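FindFirstChangeNotification() is Win32-specific; as a rough, portable illustration of the same directory-watching idea, a polling watcher can compare modification times between scans (all function names here are illustrative):

```python
# Portable polling stand-in for directory change notification: take a
# snapshot of mtimes, take another later, and diff the two. A native API
# (FindFirstChangeNotification, inotify) avoids the polling cost.

import os

def snapshot(root):
    """Map each file path under root to its last-modified time."""
    result = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            result[path] = os.path.getmtime(path)
    return result

def changed_files(before, after):
    """Paths that are new, or whose mtime differs, since the last scan."""
    return [p for p, mtime in after.items() if before.get(p) != mtime]
```

The engine would call `changed_files` periodically and reload only the assets it returns, which gives the live-update behavior without any database in the loop.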
you could look into NoSQL datastores[/quote]

Of course, they are web scale....

What I envision is in essence a web interface that empowers anybody to submit assets, manages dependencies between the assets, and prevents complicated machine setups.[/quote]

Look at how git works. Not git the application (git simply does not work for non-diffable/binary content), but the DVCS design and how its change-tracking log works.

Each change is a node in a graph. When a user decides to check out a specific revision, they select the node, which contains backlinks to older content.

When updating, the user walks back through these changes, building the new items incrementally through diffs. Caching here is implicit: any blob available to the client is either fully known or can be built from the change log.



For assets, git fails. The difficulty of change management for binary assets makes such operations unfeasible. So instead of using raw diffs, adopt only the change-tracking mechanism, not the diffs.

When a client requests repository 0x41a4fcb, fetch that. Examine it to see which resources it contains, then follow the backlink to the previous patch, and so on, until you have all the resources. Instead of rebuilding from diffs, request the complete latest resources from the repository.

For a custom solution, the above can be done server-side to avoid the latency of many requests. Tell the server: my repository is 0x4132, give me the latest (or 0xfa81). The server walks the change graph and returns a single file containing a list of all resources which need to be downloaded.
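A minimal sketch of the change-graph walk described above, with backlinks from each revision to its parent; all class and function names, revision IDs, and resource names are illustrative:

```python
# Each revision node records the resources it introduced or updated and a
# backlink to its parent. Walking backwards collects the full, current
# resource set without ever touching raw diffs.

class Revision:
    def __init__(self, rev_id, resources, parent=None):
        self.rev_id = rev_id
        self.resources = resources  # resource names introduced/updated here
        self.parent = parent        # backlink to the previous revision

def collect_resources(head):
    """Walk back through the revision chain; the newest version wins."""
    seen = {}
    node = head
    while node is not None:
        for name in node.resources:
            seen.setdefault(name, node.rev_id)  # first (newest) hit wins
        node = node.parent
    return seen

base = Revision("0x4132", ["rock.png", "tree.png"])
head = Revision("0xfa81", ["rock.png"], parent=base)  # rock.png updated later
latest = collect_resources(head)
# latest == {"rock.png": "0xfa81", "tree.png": "0x4132"}
```

Run server-side, this walk is exactly the "single file listing all resources to download" response: the server diffs the client's known revision against the head and returns only the entries that changed.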


Since changes are incremental, the graph is immutable. If you have resource 0x4132, it will never be changed again; any user that makes any change will get a new ID. Such an approach naturally maps to RESTful design. This also solves the caching problems: since each resource is immutable, it can be deleted from cache at any time and retrieved later.
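One way to realize such immutable IDs (a hedged illustration, not something the thread prescribes) is to derive the ID from the resource's content, so any edit automatically yields a new, distinct ID:

```python
# Content-derived IDs: the ID is a pure function of the bytes, so an ID can
# never later refer to different content, and cached copies stay valid
# forever. SHA-1 truncated to 8 hex digits is used here purely for brevity.

import hashlib

def resource_id(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()[:8]

old_id = resource_id(b"rock texture, version 1")
new_id = resource_id(b"rock texture, version 2")
# old_id != new_id: any edit produces a distinct ID; the old one stays
# fetchable and cacheable under a stable URL such as /resources/<id>
```

This is why the scheme maps cleanly onto REST: every resource URL is permanently cacheable, and "updating" an asset just publishes a new URL.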


To avoid solving the wrong problem, look at how to extract change information from an existing repository. SVN is fine, and probably better than a DVCS for binary assets; Perforce is usually preferred. Then format those change logs so they can be retrieved via a web service. Either way, you still need a reliable versioning system in the back; just keeping track of the change graph is not enough, and the hard problems lie elsewhere, so don't try reinventing those.


The performance of such a system is hard to guess. Git et al. fail due to content parsing and ID generation, but other parts scale relatively well most of the time. The above is merely an interface to an existing service.

----

Monitoring for file system changes (as mentioned above) is a frequently used solution. Instead of inventing some complex mechanism, you just use whatever versioning system you have on top of file system.

Far from perfect or "pure", but works in practice.
Thanks for the much desired information. The change-tracking system of git looks like a promising candidate indeed, giving both the ability to pin a session to a specific version as well as live updates. Technological complications seem to be one of the more pressing issues of centralized storage, rather than the performance metrics I feared. That gives me confidence the idea could very well be practical, and reason to invest in an implementation.

This topic is closed to new replies.
