Continuous database migration with Liquibase and Flyway
by Niko Köbler
An application's version-controlled source code is stored in the repository. Why not that of the database? To reproduce arbitrary database states in development, test or production environments, two powerful Java libraries are at hand that can be seamlessly integrated into a build for an agile Continuous Delivery.
A team has been happily developing and changing code, checking the code into the Version Control System (VCS), retrieving updates from there, updating the development environment and restarting the application. Suddenly, someone shouts: "Who the heck has changed something in the database? And why don't they tell me what changes they've made? Now the application is giving me another error message, and I'm having to adjust my database access!" This is the kind of trouble and frustration that the team would really like to avoid.
Further along in the project, the team is just about to finalise the development of the next release. It has checked in the code and all unit tests are passing successfully. However, before any integration testing can be done, the team must first update the relevant database on the integration server, because it has already become painfully obvious that structural changes were made during the last sprint and that the application can't be launched in the integration environment without a database update.
What changes have been made over the past few weeks, though? Who can remember all that and who was involved in making those changes and could help with a successful integration database update? Does the test data need to be adjusted? After a hard day of laborious database patching, endless application launches and shut-downs and nerve-wracking error log and stack trace evaluations, the environment is functional and all the integration tests are successful. However, the question is whether the team could have saved itself all this effort, an effort that it potentially makes at the end of every sprint? And has the team documented the changes it has introduced so that they can be reproduced in the same way in the customer's production environment?
In another scenario, a team has been instructed to troubleshoot and fix version A, which is in production use on a customer's system. However, the team's development work has since arrived at version B, and extensive database changes have been made. Of course, nothing has been properly documented as the team was glad to get the current version to work – and nobody wants to go back to an old version. Where can the team now find the database layout that corresponds to the required version A release? The code is available in the version control system; after all, the developers haven't been reckless. But the database? Why isn't the database tied into the version control system?
Against a two-tier society
A database can't merely be considered an external storage facility for the data that is used by an application. Even if a team makes every effort to abstract and encapsulate the data access layer to a degree that would in theory allow the database underneath an application to be replaced at any time. One would think that the database couldn't be a part of the actual application; however, because an application needs the database to manage the accumulating and required data, it is inseparably married to it, despite all encapsulation, abstraction and replace-ability efforts. As the application's properties and structures evolve, so do those of the database. Sometimes only the data changes, and sometimes only the structure; but often, both change at the same time. The application is often treated as a kind of private patient while the database is considered a national health service patient. Opponents of a two-tier society will do well to consider the welfare and health of an application complete with its database from the start of every project.
Software teams should be able to (re)produce the database state that corresponds to a specific application state at any time – ideally automatically and at the push of a button. This allows them to ensure that every programmer uses the current or appropriate version of the database alongside the application code during development. Teams will no longer need to think (or even worse, worry) about the deployment in target environments such as development, testing, staging, integration, quality control and finally production. This is bound to cause less trouble and frustration and will provide a less stressful development phase that at best leads to more productive and higher quality results. If the team can then prepare suitable test data in parallel with the structural database changes, automatically update this data in the test or integration environment and use it for the tests, and therefore for repeatable results, real progress has been made and the team can focus on its original tasks: development, improvement, renewal – adding value.
It is, however, not only the deployment advantages described above that allow developers to relax. If a team checks the database into the version control system in the same way as the source code, bugs and integration issues will be detected early on – provided that effective continuous integration mechanisms are in place. Bugs can then be fixed quickly before an unnecessary number of the people involved discover the same mistake, and further frustration within the team can be avoided. By checking the database into the VCS, the team can also create a kind of backup and documentation of the structure and of any test data that may exist.
Three rules for working with databases
Any internet research on best practices for working with databases within a development team will soon lead to K. Scott Allen's "Ode to Code" web site. On his blog, Allen also discusses the present topic and has defined three rules for database work:
- Never use a shared database server for development work.
- Always have a single, authoritative source for your schema.
- Always version your database.
Point 1 is valid because the potential for error and frustration is extremely high when a developer makes changes to the shared database without informing the other team members and without providing them with suitably modified code at the same time. Both of the other points refer to checking the database into the VCS and being able to update target systems or recreate a version that is suitable for the development of a bug fix from there at any time. Therefore, the database must be hard-coupled to be managed under the same version control system as the application itself, because the database is an integral part of the application.