Merging two MySQL (or MySQL compatible) databases in AWS using DMS

Most projects don’t start out knowing exactly how their infrastructure will turn out. That is one of the reasons I like infrastructure-as-code, but when it comes to data (as in databases), it’s a whole other ball game. Actually, I believe that Cloud’s don’t lock us in with their services, they lock us in once our data is there. Data is hard to move, particularly now that we collect so much of it and often have to convert between systems.

Recently I had a request from a client that needed to merge two Oracle instances into one for cost savings. I’m not going to cover that here (let’s just say Oracle is a beast of its own). But to try out AWSs Data Migration Service (DMS) to see if it was a fit, I tested it with two MySQL instances.

Here’s what you’d need to do:

  1. Go to RDS
  2. Make sure that you have backups enabled on both source instances. This is not just good practice, it’s required by the process.
  3. Go to Parameter Group > New from the MySQL one and set the binlog format to ROW.
  4. Apply this new parameter group to your two source instances, and reboot the instances to effect it
  5. Take note of the following pieces of info from both source instances: instance/host (e.g. myinstance.qwerty.eu-west-1.rds.amazonaws.com), database name (e.g. mydatabase), database user (e.g. myuser), database password, database port (for MySQL: 3306)
  6. Create the target instance and take note of the same info as before
  7. Create the replication instance. I’ve been using dms.t2.medium which is fast enough for me. Check that you are allocating enough disk space for it to use as temporary data area
  8. Create endpoints from the two source instances and the target instance
  9. The replication instance needs to have access to read and write, so add the replication instance’s private IP to the security groups of the source and target instances. As an alternative, set the same security group on replication instance as the database
  10. Create the merge task to do the Full Load (initial copy) and the Change Data Capture (CDC aka ongoing replication)

If everything works out, the initial copy will be done pretty queckly and DMS enters replication mode, so any change on a source instance will show up on the target instance pretty much immediately.

Bonus

In case you need to rename a given schema you can use the guided version or a JSON mapping like below (e.g. renaming Test to Test1:

{
  "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "2",
            "rule-action": "rename",
            "rule-target": "schema",
            "object-locator": {
                "schema-name": "Test"
            },
            "value": "Test1"
        }
    ]
}

I hope this helps!