capn_midnight

Database change management


I'm curious about how other people use and live with databases, specifically the types of tools they use around their database to simplify work with it. For background info, what database(s) do you use, and what is the scope of your project? What kind of tools do you use to manage change with your database (implicit or explicit schema, server deployments, etc.)? What kind of "generator" tools are you using (thick- or thin-ORM, data access layers, whatever)? What pain points do you still experience? Thanks!

 
For myself, I have worked on a lot of custom enterprise resource planning systems over the years. The data is generally small and highly relational. It's been mostly SQL Server, with some MySQL in there too; in other words, all pretty basic RDBMSs. I've played around with ORM systems like NHibernate and Entity Framework and found them quite useful for designing the initial database, but lacking in the long run whenever changes were necessary after a nontrivial amount of data had been recorded. As a result, I've tended to keep database changes in the database realm and use only a thin ORM (direct table mapping) with classes automatically generated from the schema. This seems to avoid lost-data problems, but it comes at a high development cost: constantly repeating yourself and having no change management à la source control.
 
Redgate has a tool called SQL Compare that I've used in the past; it makes diffing and merging databases a snap, but it is very expensive. Without a SQL Compare-like tool, it's really easy to miss changes to tables and stored procedures when deploying from a development to a testing or production environment, unless you have extremely comprehensive testing (and really, who does?). And even then, that is a one-way move. Unlike source control, you never get to see the past again (unless you're restoring from backup, and that is just not automatic or simple enough to be a workable solution).


We've used SQL Server in the past for several of our online games, with merging and deployment between environments handled by senior DBAs using SSMS and SQL Compare. A custom tool was built to handle data entry, using ADO.NET to interact with the database. For every environment (development, test, release, etc.), every user had their own private database, into which they could pull data from the master database, "check out" and "lock" data, edit it using the tool, and then check it back in (push it back to the master database). It was the same model Perforce uses. The idea was that someone could edit data and muck up their own database without affecting anyone else, or sullying the master data, which as you mentioned was difficult to get back once it changed (we took backups of the master databases several times a day, but that's still a lot of work to lose if we ever had to restore).

 

We ran into a number of "pain points", such as how to handle users being out of date with a new schema, making sure they were always editing current data, optimizing the push/pull between a user DB and the master DB so that it was as fast and efficient as possible, preventing, at all costs, user data from being destroyed during pushes, pulls, and schema updates, merging data between two users when their edits modified the same table, and so on. Basically, the more the master DB and any given user DB "drifted" apart, both in terms of schema and data, the more problems we would run into. And this would happen quite often, depending on how long a particular user was working on a piece of data without syncing from the master DB.

 

On top of that, the data entry tool itself had its fair share of issues and bugs. The fact that a lot of our game data was architected in a way that did not translate cleanly to a relational layout made its job a lot harder as well. Data was often presented to the user in whatever form was most intuitive rather than relationally, which made translating to and from the low-level DB objects (DataSets and DataTables, mostly) quite error-prone.

 

In the end, we just kept slogging through and fixing bugs until everything was in a tenuous-but-mostly-working state, most of the time. I'm not really sure what else to add unless you have a particular question about anything.


Very interesting, thank you.

 

I had seriously been considering doing something similar to that. My current client's project has a fair bit of test data and the other developers and I are frequently stepping on each other's toes. 

 

Liquibase came up on the Hacker News post I made on this same subject: http://www.liquibase.org/

 

It looks like it manages a lot of what I'm having issues with, but it has taken the unfortunate tactic of expressing everything in XML. Perhaps it wouldn't be too hard to develop a UI for that XML, though.


If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.


If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

 

A quick example? I'm not familiar with all the terminology and such.


Alpha_ProgDes: he's talking about a table in the DB that holds a record of every script you've ever run against the DB as a whole, to create/alter tables/views/procedures, etc. DDL stands for Data Definition Language, notionally separate from the Structured Query Language proper (though the two are often conflated).

 

So say you have a database that stores User accounts. You could start off with two tables:

User(id, name, email)

RevisionHistory(date, script)

 

and revision history would already have a row:

("2013-22-03 09:24:35.45", "CREATE TABLE User(ID INT IDENTITY(1, 1) PRIMARY KEY, Name NVARCHAR(10), Email NVARCHAR(256);")

 

Now, that's not a very useful Name column (it's pretty short), so you then need to alter it, which necessitates a new RevisionHistory record:

("2013-22-03 09:26:48.01", "ALTER TABLE User ALTER COLUMN Name NVARCHAR(256);")

 

This is necessary but not sufficient for a good system. It would be equivalent to having to manually write out text files containing your Git diffs and changelogs.

 

I see the problem as being a lack of development in the realm of database tooling. Essentially, SQL is a dynamic, interpreted programming language with strict typing. In statically typed application languages like C#, the compiler does a lot of checking to make sure changes to code don't break other code (and certain coding styles can leverage more of the compiler for static checking than others). In dynamic languages like Python, we tend to use unit tests to verify that changes were safe. But unit tests don't work well in very stateful environments, and databases are extremely stateful (that being their purpose).

 

So that's why I'm so hell bent on trying to figure out better tools for working with databases. It should be possible to add abstraction layers that essentially do static checking for us. And if you can do static checking and track changes, then you can be much more cavalier about what changes you try to make, because the system finds the breakages for you and provides you an easy path out of the weeds.

 

Liquibase adds a lot of the change tracking I'm looking for, but none of the static checking. It also doesn't appear to automate the process: your front end to the system is an XML document, which is just about the worst interface a person could possibly make.

 

But that doesn't mean a new front end couldn't be made. I think it's a good start; it just needs a lot more work.

For me, the eventual answer was to move to a schema-free NoSQL approach. None of the tools I've seen people use, either in games or in business, make DB changes easy or robust, so in future I'm moving that logic into the app.


Going non-relational, non-ACID is probably not going to be an option for me. Most of the stuff I do *does* fit well into the relational model, and ACID is a crucial feature. Auditors don't accept "well, for the sake of speed, we dropped a couple of records on the floor; we made it up in the next cycle" as an excuse for incomplete records. It's just that change management is complete garbage.

 

Some other things I've found in my research:

Flyway (another Java system, doesn't use XML under it like Liquibase)

Migrator.NET

Subsonic (looks like it might do more than just migration)

Roundhouse

octalforty-Wizardby (claims to be very simple)

MigSharp

 

I'm reminded that I worked at a place years ago that had its own home-rolled system for doing migrations. Again, it was XML-based, and it only ran against SQL Server. Migration is great, but it's missing (or is maybe orthogonal to) a major feature I want: static analysis. I want to be able to change the schema and immediately have all of the impacted queries pointed out to me. I don't mind breaking code, but it has to break hard and loudly for that to work.
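The closest thing I've found built into SQL Server is the dependency metadata; querying it before a change gives a partial impact list. It's only object-level, so it won't catch every column-level break, but something like this is a start (the table name is just the one from my earlier example):

SELECT DISTINCT
    OBJECT_SCHEMA_NAME(d.referencing_id) AS ReferencingSchema,
    OBJECT_NAME(d.referencing_id)        AS ReferencingObject
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_entity_name = N'User'   -- the table I'm about to change
ORDER BY ReferencingSchema, ReferencingObject;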

Edited by capn_midnight


If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

We store DDL+SQL statements in text files that are under change control. There is no shared database; everybody has a local instance which is built and populated for them by a script that simply executes those statements. Changes are then made locally to that database. Most of the time the changes are temporary and get discarded, but when someone does want to make a change permanent (e.g. a change to the schema), they apply it to the script text files and check them in. If your DBMS supports plain-text SQL dumps then you could leverage that.

The VCS obviously handles the revision history, but of course it won't maintain the structural integrity of the database when merging schema scripts together. We work around that by virtue of being a small operation: only one person is scheduled to be making schema changes at any one time.
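If you happen to be on SQL Server, the build script itself can just be a SQLCMD-mode wrapper around the checked-in files. A sketch, with made-up file names:

-- build_local_db.sql : run with  sqlcmd -S localhost -i build_local_db.sql
:setvar DatabaseName DevLocal
CREATE DATABASE $(DatabaseName);
GO
USE $(DatabaseName);
GO
:r .\schema\001_tables.sql
:r .\schema\002_procedures.sql
:r .\data\seed_data.sql

Each :r pulls in one of the plain-text scripts, so rebuilding a local instance is a single command.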


NoSQL doesn't necessarily mean non-ACID, nor does it necessarily mean easy transitions between "schemas". Regardless, it sounds like you need to stick with the RDBMS world.

 

Unfortunately, the fastest way to get what you're looking for is through "expensive" tools. Another Redgate tool is SQL Prompt, which can find invalid database objects; that gets you part of the way there. If any of your stored procedures rely on a field that changed, it will let you know.

 

http://www.red-gate.com/products/sql-development/sql-prompt/features#explore
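If the budget isn't there, you can get a cruder version of that check with nothing but system procs: ask SQL Server to re-parse every non-schema-bound module and report the ones that no longer compile. A rough sketch:

DECLARE @name NVARCHAR(517);
DECLARE module_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + N'.' + QUOTENAME(OBJECT_NAME(object_id))
    FROM sys.sql_modules
    WHERE is_schema_bound = 0
      AND OBJECT_SCHEMA_NAME(object_id) IS NOT NULL;
OPEN module_cursor;
FETCH NEXT FROM module_cursor INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXEC sys.sp_refreshsqlmodule @name;          -- re-parses the view/function/proc
    END TRY
    BEGIN CATCH
        PRINT @name + N' is broken: ' + ERROR_MESSAGE();
    END CATCH;
    FETCH NEXT FROM module_cursor INTO @name;
END;
CLOSE module_cursor;
DEALLOCATE module_cursor;

Stored procedures get deferred name resolution, so this catches broken views and functions more reliably than procs, and it won't tell you about ad-hoc queries living in the application, which is where a tool like SQL Prompt earns its money.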



If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

We store DDL+SQL statements in text files that are under change control. There is no shared database; everybody has a local instance which is built and populated for them by a script that simply executes those statements. Changes are then made locally to that database. Most of the time the changes are temporary and get discarded, but when someone does want to make a change permanent (e.g. a change to the schema), they apply it to the script text files and check them in. If your DBMS supports plain-text SQL dumps then you could leverage that.

The VCS obviously handles the revision history, but of course it won't maintain the structural integrity of the database when merging schema scripts together. We work around that by virtue of being a small operation: only one person is scheduled to be making schema changes at any one time.


That's how we stored our schema and stored procedures: in several scripts that could be run to build a user DB up from scratch. But we didn't store our data that way. The master DB always had a copy of the most recent "authoritative" data, and once your database was rebuilt the data could be pushed back down to you. The master schema and its stored procedures were also managed manually by the lead DBA, as were all the script files.


A bit of a more comprehensive post from me, now that I'm not posting from my phone.

 

Going non-relational, non-ACID is probably not going to be an option for me. Most of the stuff I do *does* fit well into the relational model, and ACID is a crucial feature. Auditors don't accept "well, for the sake of speed, we dropped a couple of records on the floor; we made it up in the next cycle" as an excuse for incomplete records. It's just that change management is complete garbage.

 

Sorry, I thought you were in a normal game environment rather than a Proper one with auditors and the like. I love the relational model in principle; the querying power is so much greater than what you get in a key/value or document store. I've just always found that they're designed for a world that doesn't change very much, which sadly is never the world I've lived in!

 

To be fair, the ACID issue is sometimes overblown. Most NoSQL databases provide ACID for some given definition of a transaction; it's just that you can't arbitrarily bunch up statements and bracket them with BEGIN/END TRANSACTION. For example, an edit to a single document may be guaranteed to be atomic. You just have to decide whether your data fits that pattern or not. And lots of RDBMSs are a bit sneaky about exactly how they implement the Isolated part of ACID anyway - with NoSQL you usually know exactly where you stand, because the locking mechanism is unambiguous rather than varying with the current database or table settings and being defined by standards that are potentially ambiguous.

 

Anyway, in the past I've worked with MySQL, PostgreSQL, MongoDB, SQLite, and SQL Server. Each instance (mis)handled changes differently!

 

On an MMO project, we used PostgreSQL and change management was handled in a similar way to what has been mentioned above: scripts checked into version control alongside the code that requires them, packed full of ALTER TABLE statements. If I were forced to continue using that system, I would add an explicit Version table so that code can exit early if it detects the DB is in an unexpected state. I might also add a commit hook to check that nobody checks in a change script without the corresponding update to the Version table.
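For what it's worth, the Version table doesn't need to be anything clever. Something like this would do (written SQL Server-style to match the rest of the thread, though the project above was PostgreSQL; all names are illustrative):

CREATE TABLE SchemaVersion
(
    Version   INT NOT NULL,
    AppliedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
INSERT INTO SchemaVersion (Version) VALUES (1);

-- Every change script ends by bumping the version, e.g. UPDATE SchemaVersion SET Version = 2;
-- and the "exit early" check the code performs boils down to:
DECLARE @expected INT = 1;   -- the version this build of the code was written against
IF NOT EXISTS (SELECT 1 FROM SchemaVersion WHERE Version = @expected)
    RAISERROR (N'Database schema version does not match this build; refusing to start.', 16, 1);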

 

That project had some issues where important game data was stored in the DB, and this had to be merged with the live game data. I was responsible for maintaining that tool for live patching use. My advice based on that experience is:

  • Don't do it. Don't have 2 separate databases that need to merge into one. Different data sources should be in different databases, and it's unlikely a relational DB is the best source for designed game data.
  • Don't do it. If you do require both to be in an RDBMS format, have them in separate schemas/databases on the server, or with differently prefixed table names.
  • Don't do it. Export the data from the designer's database and rebuild it in the target database if you really need to have them all in the same schema.
  • Ok, ok. Abandon surrogate keys (eg. autoincrements) and use machine-specific IDs. Most DBs will offer a machine specific UUID type, or have your app create them if not.
  • You really want to merge data, you want it in the same tables, but don't want to give up simple autoincrement/sequence keys? Ok. *sigh* Partition the numeric range ahead of time so each domain gets to use a subset of the keys (see the sketch after this list). Keep an eye on whether you're approaching the limit, and panic when you get past the half-way mark. Assign someone to double-check your change scripts to ensure nobody can trash live data by overwriting it, or by deleting previously-permanent values that other tables may reference. And make sure you run it all on a mirrored copy of the main live server first, to catch the inevitable referential integrity errors before hitting your main customers with them.
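The range partitioning in that last point can be as blunt as giving each database a different identity seed. The numbers and table here are purely illustrative:

-- Content/designer databases create the table with the default seed:
CREATE TABLE ItemPrototype
(
    ID   BIGINT IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(256) NOT NULL
);

-- The live database gets reseeded so its own rows can never collide with
-- rows pushed up from the content side:
DBCC CHECKIDENT ('ItemPrototype', RESEED, 1000000000);

-- And the "panic past the half-way mark" check on the content side:
SELECT MAX(ID) AS HighestContentId, 500000000 AS HalfWayMark FROM ItemPrototype;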

 

Ok, onto other topics now. ORMs... I think they're great, as long as your language supports them adequately. SQLAlchemy in Python is pretty good - your objects are basically SQL tables, your assignments to them work pretty much as expected, and there's little chance for your tables and objects to get out of sync. Plus you can generate arbitrary SQL if you need to. But I wouldn't like to use something similar in a language that doesn't have good reflection capabilities. I've used systems that had code generation for such things (eg. take a data definition and generate the DDL and the accompanying C++ for it) and that was more hassle than it was worth, largely because of the change management issues. Changing the DB will already require you to add a DB change script, and maybe alter the main app logic as well - you don't want people to have to remember to check in a load of regenerated ORM files as well. If you create too much friction here people will try and work around it by reusing fields and adding variant fields and the like, and you don't want that.

 

Some ORMs have relatively decent change management systems. eg. South for Django. Often it depends on having a good data representation, and any given ORM makes that easier by standardising it across all DB types (to some degree).

 

As an example for those who aren't locked into RDBMSs, NoSQL approaches can work really well, but again it depends on the language to some degree. I have a Python project that uses MongoDB, which is my current DB preference for game data. One example of change management there: I had a document that contained a list of strings for 'tags' (like the tags this forum has). Later, I needed the tags to have their own metadata, and conversion was basically a one-liner in the app which reads the list, checks whether it contains plain strings, and if so replaces each string with the string plus the relevant metadata. And once you have this logic in place, you can either upgrade as you go along, migrating each record if and when it's accessed, or perform a batch migration by running a load and save on each document, or do a mixture - e.g. the main app migrates records when it needs to, and a background process slowly iterates through the same table doing the same job. Once that's complete, you can decide whether to leave the migration code in place or remove it.


That project had some issues where important game data was stored in the DB, and this had to be merged with the live game data. I was responsible for maintaining that tool for live patching use. My advice based on that experience is:

  • Don't do it. Don't have 2 separate databases that need to merge into one. Different data sources should be in different databases, and it's unlikely a relational DB is the best source for designed game data.
  • Don't do it. If you do require both to be in an RDBMS format, have them in separate schemas/databases on the server, or with differently prefixed table names.
  • Don't do it. Export the data from the designer's database and rebuild it in the target database if you really need to have them all in the same schema.
  • Ok, ok. Abandon surrogate keys (eg. autoincrements) and use machine-specific IDs. Most DBs will offer a machine specific UUID type, or have your app create them if not.
  • You really want to merge data, you want it in the same tables, but don't want to give up simple autoincrement/sequence keys? Ok. *sigh* Partition the numeric range ahead of time so each domain gets to use a subset of the keys. Keep an eye on whether you're approaching the limit, and panic when you get past the half-way mark. Assign someone to double-check your change scripts to ensure nobody can trash live data by overwriting it, or by deleting previously-permanent values that other tables may reference. And make sure you run it all on a mirrored copy of the main live server first, to catch the inevitable referential integrity errors before hitting your main customers with them.

I was also responsible for a similar tool on our project that had to merge data between databases, and it turned out to be quite the task. However, I'm torn as to whether in the future I'd go with the one-database-per-user-plus-master approach again and merge data between them, or stick with a single database and avoid the merge issue altogether. I'm not entirely convinced that merging is inherently fraught with problems and pitfalls, as long as you identify up front precisely the functionality you'll need and all the issues that need to be solved to meet those needs. For instance, we solved the key problem by having the master database be the authoritative key source, so whenever you wanted to add a record to a table you asked the master for a key, and it was guaranteed to be unique across every environment. The downside, of course, is much higher contention on that database as more and more users asked for keys. Most of the other issues we encountered were snags with merging data that we hadn't anticipated, usually some special case where data was dependent on other data in an unexpected way that necessitated a hand-merge or a modification to the merge process. Ultimately it seemed like more careful planning and foresight would have allowed us to avoid a lot of these problems.
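The master's side of that key service can be a single atomic UPDATE per request; this is a sketch with made-up names, not our actual implementation. Handing out keys in blocks rather than one at a time also takes some pressure off the contention problem:

-- Lives only in the master database: one row per table it allocates keys for.
CREATE TABLE KeyAllocator
(
    TableName SYSNAME PRIMARY KEY,
    NextKey   BIGINT NOT NULL
);
INSERT INTO KeyAllocator (TableName, NextKey) VALUES (N'ItemPrototype', 1);

-- A user database asks for a block of 100 keys in one atomic statement:
UPDATE KeyAllocator
SET    NextKey = NextKey + 100
OUTPUT deleted.NextKey AS FirstKey, inserted.NextKey - 1 AS LastKey
WHERE  TableName = N'ItemPrototype';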


If you just need keys that are unique across an arbitrary number of machines, use a UUID.

 

http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid

http://www.postgresql.org/docs/8.3/static/datatype-uuid.html

http://msdn.microsoft.com/en-us/library/aa276822%28v=SQL.80%29.aspx

 

etc.
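In table terms it's just something like this (SQL Server flavoured; the table and values are made up):

CREATE TABLE ItemPrototype
(
    -- NEWID() is fine for correctness; NEWSEQUENTIALID() is the usual swap
    -- if fragmentation of the clustered index becomes a concern.
    ID   UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Name NVARCHAR(256) NOT NULL
);

-- Any machine can now insert without coordinating with anyone else:
INSERT INTO ItemPrototype (Name) VALUES (N'Rusty Sword');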

 

That solves the problem of inserting data into the same tables from an arbitrary number of machines. But inserts are often the most trivial operation to perform anyway. Often people want to delete things, first locally, then remotely (e.g. a test item prototype is being phased out), but you can't tell whether that is going to violate referential integrity until you run the script on the exact database you intend to operate on and catch the errors (e.g. someone made an NPC on the live server that actually holds one of those items).

 

Planning ahead can certainly mitigate a lot of these issues but my opinion is that the very notion of change management is an admission that not everything can be planned for. My preference in future will be to work with systems that make change easier, even at the expense of losing some of the RDBMS benefits.

Edited by Kylotan


We used BIGINTs for keys, but I wasn't the DBA or technical director on the project, so I don't know for sure why we didn't use UUIDs instead. If I had to speculate, I would say it was because they would have blown our memory/processing budget, being twice as large and more expensive to manipulate CPU-wise on the servers and clients, for every record in the database. But again, IANADBA.

 

As for deletions, we didn't allow users to delete data per se, but they could mark data 'deprecated'; the game would simply not load it, and we could write our own scripts to check for dependencies on deprecated data. As you said, trying to manage real deletes across databases would have been painful, whereas the scripts weren't too difficult to write. Plus, it was easy to turn data "on" and "off" via a flag.
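In sketch form (table and column names made up, not our real ones), the idea is something like:

-- Nothing gets deleted; the game's loader just skips deprecated rows.
ALTER TABLE ItemPrototype ADD IsDeprecated BIT NOT NULL DEFAULT 0;

-- One of the dependency-check scripts: NPC inventories still pointing
-- at an item that someone has since deprecated.
SELECT n.NpcID, n.ItemID
FROM   NpcInventory  AS n
JOIN   ItemPrototype AS i ON i.ID = n.ItemID
WHERE  i.IsDeprecated = 1;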

 

In the future I would love to work with systems that make change management easier; it really just depends on which RDBMS features I'd have to sacrifice.


Hi,

 

I've worked on RDBMS systems for more years than I'd like to remember. The only strategy that worked for me (20k+ deployments) was to start with a blank database schema and treat every addition to it as a change, tagged with a "from" and "to" version number, plus a tool which executed those scripts in "from"/"to" order. Database schema updates across a whole country really need managing and need to be bullet-proof; they're the one thing you cannot get wrong. Unlike code updates, where you can just copy any version over any other version, database schema changes are always a delta from one known version to another known version, both in terms of the DDL and the DML operations you need to carry out to transit from one version to another.

 

Another thing I learned was to channel all database schema updates through a central controlling agent, in my case a single person, responsible for issuing back to my development teams a daily database update script set. These would be batched by day, so "script 1.0000 to 1.0001.sql" would contain all the DDL/DML needed to get from version 1.0000 to 1.0001. To apply a batch of changes, you just iteratively execute the SQL or concatenate the files.

 

Many in my org worried that the continuous aggregation of changes would be a problem, especially on new installs, so we compromised by also having a daily "greenfield" DDL script, e.g. "0.0000 to 1.0001.sql", which contained only the DDL to get from the "nothing" state to the last-known-good state.
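A delta script in that scheme can also be made self-guarding, so running it against a database at the wrong version does nothing but complain. Roughly (assuming a one-row version table; the table change shown is invented for the example):

-- Contents of "script 1.0000 to 1.0001.sql"
IF NOT EXISTS (SELECT 1 FROM DatabaseVersion WHERE VersionNumber = N'1.0000')
BEGIN
    RAISERROR (N'This script only upgrades version 1.0000 databases.', 16, 1);
    RETURN;
END

BEGIN TRANSACTION;
    ALTER TABLE Customer ADD MiddleName NVARCHAR(256) NULL;   -- the day's DDL/DML
    UPDATE DatabaseVersion SET VersionNumber = N'1.0001';
COMMIT;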

 

Worked for me. Hope that also works for you.
