Database change management

Started by
15 comments, last by PhillipHamlyn 11 years ago

I'm curious about how other people use and live with databases, specifically the types of tools they use around their database to simplify work with it. For background info, what database(s) do you use, and what is the scope of your project? What kind of tools do you use to manage change with your database (implicit or explicit schema, server deployments, etc.)? What kind of "generator" tools are you using (thick- or thin-ORM, data access layers, whatever)? What pain points do you still experience? Thanks!

For myself, I have worked on a lot of custom enterprise resource planning systems over the years. Data is generally small, and highly relational. It's been mostly SQL Server, but there is some MySQL in there too. In other words, all pretty basic RDBMSs. I've played around with ORM systems like NHibernate and Entity Framework, and found them quite useful for designing the initial database, but lacking in the long run whenever changes were necessary after a nontrivial amount of data had been recorded. As a result, I've tended to leave database changes in the database realm and use only a thin ORM (direct table mapping) with classes automatically generated from the schema. This seems to avoid lost-data problems, but comes at a high development cost of constantly repeating yourself and not having change management a la source control in place.
Redgate has a tool called SQL Compare that I've used in the past that makes diffing and merging databases a snap, but it is very expensive. Without a SQL Compare-like tool, it's really easy to miss changes to tables and stored procedures without extremely comprehensive testing (and really, who does that?) when deploying from a development to a testing or production environment. And still, that is a one-way move. Unlike source control, you never get to see the past again (unless you're restoring from backup, and that is just not automatic enough or simple enough to be a workable solution).

[Formerly "capn_midnight". See some of my projects. Find me on twitter tumblr G+ Github.]


We've used SQL Server in the past for several of our online games, with merging and deployment between environments being handled by senior DBAs using SSMS and SQL Compare. A custom tool was built to handle data entry that used the ADO.NET framework for interacting with the database. For every environment (development, test, release, etc.), every user had their own private database, to which they could pull data from the master database, "check out" and "lock" data, edit it using the tool, and then check it back in (push it back to the master database). It was the same model Perforce used. The idea was that someone could edit data and muck up their own database without affecting anyone else, or sullying the master data, which as you mentioned was difficult to get back once it changed (we took backups of the master databases several times a day, but that's still a lot of work to lose if we ever had to restore).

We ran into a number of "pain points", such as how to handle users being out-of-date with new schema, making sure they were always editing current data, optimizing the push/pull between a user DB and the master DB so that it was as fast and efficient as possible, preventing user data from being destroyed at all costs during pushes, pulls, and schema updates, merging data between two users when their edits caused the same table to be modified, etc. Basically, the more the master DB and any given user DB "drifted" apart, both in terms of schema and data, the more problems we would run into. And this would happen quite often, depending on how long a particular user was working on a piece of data and not syncing from the master DB.

On top of that, the data entry tool itself had its fair share of issues and bugs. The fact that a lot of our game data was architected in a way that did not translate very cleanly to a relational layout made its job a lot harder as well. Data was often presented to the user in a way that made translation to and from the low-level DB objects (DataSets and DataTables mostly) quite error-prone, because it wasn't being presented relationally, but rather in the most intuitive fashion for the user.

In the end, we just kept slogging through and fixing bugs until everything was in a tenuous-but-mostly-working state, most of the time. I'm not really sure what else to add unless you have a particular question about anything.

Very interesting, thank you.

I had seriously been considering doing something similar to that. My current client's project has a fair bit of test data and the other developers and I are frequently stepping on each other's toes.

Liquibase came up on the Hacker News post I made on this same subject: http://www.liquibase.org/

It looks like it manages a lot of what I'm finding issue with, but has taken the unfortunate tactic of expressing it in XML. Perhaps it wouldn't be too hard to develop a UI for that XML, though.


If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

A quick example? I'm not familiar with all the terminology and such.


Alpha_ProgDes: he's talking about a table in the DB that holds a record of every script you've ever run against the DB as a whole, to create/alter tables/views/procedures, etc. DDL stands for Data Definition Language, notionally separate from the rest of SQL (though the two are often conflated).

so say you have a database that is storing User accounts. You could start off with two tables:

User(id, name, email)

RevisionHistory(date, script)

and revision history would already have a row:

("2013-03-22 09:24:35.45", "CREATE TABLE User(ID INT IDENTITY(1, 1) PRIMARY KEY, Name NVARCHAR(10), Email NVARCHAR(256));")

Now, that's not a very useful Name column, it's pretty short, so you then need to alter it, thus necessitating a new RevisionHistory record:

("2013-03-22 09:26:48.01", "ALTER TABLE User ALTER COLUMN Name NVARCHAR(256);")

This is necessary but not sufficient for making a good system. It would be equivalent to you having to manually write out text files with Git diffs and changelogs.
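The replay idea above can be sketched in a few lines. This uses Python with SQLite purely as a stand-in for SQL Server (SQLite has no ALTER COLUMN, so the example adds a column instead), and the table and column names are just illustrative:

```python
import sqlite3

def init(conn):
    # The changelog table itself: one row per schema change ever applied.
    conn.execute(
        "CREATE TABLE RevisionHistory ("
        "  applied_at TEXT NOT NULL,"
        "  script     TEXT NOT NULL)")

def apply_change(conn, when, ddl):
    """Run a DDL statement and record it in the changelog."""
    conn.execute(ddl)
    conn.execute("INSERT INTO RevisionHistory VALUES (?, ?)", (when, ddl))

def rebuild(source_conn, target_conn):
    """Replay every recorded script, in order, against a fresh database."""
    init(target_conn)
    for when, ddl in source_conn.execute(
            "SELECT applied_at, script FROM RevisionHistory ORDER BY applied_at"):
        apply_change(target_conn, when, ddl)

live = sqlite3.connect(":memory:")
init(live)
apply_change(live, "2013-03-22 09:24:35",
             "CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
apply_change(live, "2013-03-22 09:26:48",
             "ALTER TABLE User ADD COLUMN phone TEXT")

fresh = sqlite3.connect(":memory:")
rebuild(live, fresh)
cols = [row[1] for row in fresh.execute("PRAGMA table_info(User)")]
print(cols)  # ['id', 'name', 'email', 'phone'] -- same schema as the live DB
```

Because the changelog is ordered, any prefix of it rebuilds the database as of that point in time, which is what gives you the history.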

I see the problem as being one of lack of development in the realm of databases. Essentially SQL is a dynamic, interpreted programming language with strict typing. In static application languages, like C#, the compiler does a lot of checking to make sure changes to code don't break other code (and certain coding styles can leverage more of the compiler for static checking than others). In dynamic application programming languages, like Python, we tend to use Unit Tests to verify those changes were safe. But Unit Tests don't work in very stateful environments, and databases are extremely stateful (that being their purpose).

So that's why I'm so hell bent on trying to figure out better tools for working with databases. It should be possible to add abstraction layers that essentially do static checking for us. And if you can do static checking and track changes, then you can be much more cavalier about what changes you try to make, because the system finds the breakages for you and provides you an easy path out of the weeds.
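To make the static-checking idea concrete, here is a toy sketch: given the current schema and a set of saved queries, flag any query that references a column the schema no longer has. A real tool would need a proper SQL parser; the regex, schema, and query names here are all just illustration.

```python
import re

# Hypothetical current schema and saved queries.
schema = {"User": {"id", "name", "email"}}
queries = {
    "get_names":  "SELECT name FROM User",
    "get_phones": "SELECT phone FROM User",   # 'phone' no longer exists
}

def broken_queries(schema, queries):
    """Return (query_label, column) pairs for columns the schema lacks."""
    known = {col for cols in schema.values() for col in cols}
    bad = []
    for label, sql in queries.items():
        # Crude: pull the column list out of "SELECT ... FROM".
        m = re.match(r"SELECT\s+(.+?)\s+FROM", sql, re.IGNORECASE)
        if not m:
            continue
        for col in (c.strip() for c in m.group(1).split(",")):
            if col != "*" and col not in known:
                bad.append((label, col))
    return bad

print(broken_queries(schema, queries))  # [('get_phones', 'phone')]
```

That's the workflow being asked for: change the schema, run the check, and get an immediate list of everything you just broke instead of finding out in production.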

Liquibase adds a lot of the change tracking I'm looking for, but none of the static checking. It also doesn't appear to automate the process; your front-end to the system is an XML document, which is just about the worst interface a person could possibly make.

But that doesn't mean a new front end couldn't be made. I think it's a good start, it just needs a lot more work.


For me, the eventual answer was to move to a schema-free NoSQL approach. None of the tools I've seen people use, either in games or in business, make DB changes easy or robust, so I'm moving that logic into the app in future.

Going non-relational, non-ACID is probably not going to be an option for me. Most of the stuff I do *does* fit well into the relational model, and ACID is a crucial feature. Auditors don't like "well, for the sake of speed, we dropped a couple of records on the floor. We made it up in the next cycle" as an excuse for why you don't have complete records. It's just that change management is complete garbage.

Some other things I've found in my research:

Flyway (another Java system, doesn't use XML under it like Liquibase)

Migrator.NET

Subsonic (looks like it might do more than just migration)

Roundhouse

octalforty-Wizardby (claims to be very simple)

MigSharp

I'm reminded that I worked at a place years ago that had their own home-rolled system for doing migrations. Again, it was XML based, and only ran against SQL Server. Migration is great, but it's missing (or is maybe orthogonal to) a major feature I want: static analysis. I want to be able to change the schema and immediately have all of the impacted queries pointed out to me. I don't mind breaking code, but it has to break loudly for that to work.


If you're concerned about revision history, a clever approach I've seen used in the past is to have a 'changelog' table which stores rows of DDL that, when applied in order, can rebuild your database from scratch up to any particular point in time. So whenever someone makes a change to your database, whether it be schema or data, you turn those changes into a list of equivalent DDL statements and record them.

We store DDL+SQL statements in text files that are under change control. There is no shared database, everybody has a local instance which is built and populated for them by a script that simply executes those statements. Changes are then made locally to that database, most of the time the changes are temporary and get discarded but when someone does want to make a change permanent (e.g. a change to the schema) then they apply it to the script text files and check them in. If your DBMS supports plain-text SQL dumps then you could leverage that.

The VCS obviously handles the revision history, but it of course won't maintain the structural integrity of the database when merging schema scripts together. We work around that by virtue of being a small operation: only one person is scheduled to be making schema changes at any one time.
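That build-from-checked-in-scripts workflow is simple enough to sketch. Again using SQLite as a stand-in, and with the script names and numeric-prefix ordering convention being assumptions for illustration: a fresh local database is just the result of running the version-controlled scripts in a fixed order.

```python
import sqlite3

# Stand-ins for the version-controlled .sql files; in practice these
# would be read from a scripts/ directory in the working copy.
SCRIPTS = {
    "001_schema.sql": "CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);",
    "002_seed.sql":   "INSERT INTO User (name) VALUES ('alice'), ('bob');",
}

def build_local_db():
    """Build a private database from scratch by replaying the scripts."""
    conn = sqlite3.connect(":memory:")  # a real setup would use a file
    for name in sorted(SCRIPTS):        # numeric prefixes fix the order
        conn.executescript(SCRIPTS[name])
    return conn

conn = build_local_db()
count = conn.execute("SELECT COUNT(*) FROM User").fetchone()[0]
print(count)  # 2
```

The nice property is that "reset my database" becomes cheap: throw the local instance away and rerun the build, and the VCS diff of the script files is the schema history.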

NoSQL doesn't necessarily mean it's not ACID, nor does it necessarily mean easy transition between "schema". Regardless, sounds like you have a need to stick with the RDBMS world.

Unfortunately the fastest way to get what you're looking for is through "expensive" tools. Another RedGate tool is SQL Prompt which has the ability to find invalid database objects which gets you part of the way there. If any of your stored procedures rely on a field that changed, it will let you know.

http://www.red-gate.com/products/sql-development/sql-prompt/features#explore

