Rob Moorman

Founder, technology consultant, architect and full stack developer
Apr 27

Importing production data from AWS

A short guide to help you migrate production data out of AWS into your development stack

The cloud hosting services from Amazon Web Services are very powerful. We use them to spin up complete infrastructures with predefined resources, such as auto scaling clusters and load balancers (we maintain our own CloudFormation templates for this). This lets us focus on development and worry less about the performance and security of our hosting environments.

During development we find it best to work with actual, anonymised production data (no lorem ipsum, images of kittens, etc.). Therefore we often migrate production data back to our local development machines. Below we'll show how to dump and restore an RDS Postgres database and how to sync user-uploaded files, like images and documents.

Dump and restore an RDS Postgres database

First make sure your database can be reached from your network. In most cases adding your IP address to the security group attached to your database will do the job. Once the connection can be made, we are ready to create the dump.
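Before creating the dump, it can help to confirm that the database actually accepts connections from your machine. The following is a minimal sketch, assuming the Postgres client tools (which include pg_isready) are installed locally; the function name check_db_reachable is hypothetical and the 5 second timeout is an arbitrary choice.

```shell
# Hypothetical helper: reports whether a Postgres server accepts connections.
# Usage: check_db_reachable <host> [port]
check_db_reachable() {
  local host="$1"
  local port="${2:-5432}"   # 5432 is the default Postgres port

  # pg_isready exits with status 0 only when the server accepts connections.
  if pg_isready -h "$host" -p "$port" -t 5 >/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
    return 1
  fi
}
```

Calling check_db_reachable [endpoint of instance] prints "unreachable" when the security group still blocks your IP address or the endpoint is wrong.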

pg_dump -Fc -v -h [endpoint of instance] -U [master username] [database] > [database].dump

The endpoint is the hostname listed for your instance in the RDS console. The master username is the username you initially provided while creating the RDS database instance.

Now that we have a valid dump of our database, we want to restore it into our local Postgres database.

createdb [database]
pg_restore -v -d [database] [database].dump
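The dump and restore steps above can be combined into one small helper. This is only a sketch under a few assumptions: the function name import_db is hypothetical, the dump is written with pg_dump's -f option instead of shell redirection, and --no-owner is added to pg_restore because the roles that own objects on RDS may not exist on a local machine. If the local database already exists, createdb fails and the function stops before restoring.

```shell
# Hypothetical helper: dumps a remote database and restores it locally.
# Usage: import_db <endpoint of instance> <master username> <database>
import_db() {
  local host="$1" user="$2" db="$3"
  local dump_file="${db}.dump"

  # Custom-format dump (-Fc) so pg_restore can read it; -f names the output file.
  pg_dump -Fc -v -h "$host" -U "$user" -f "$dump_file" "$db" || return 1

  # Create the local database; this fails if it already exists.
  createdb "$db" || return 1

  # --no-owner (an assumption, not in the original commands) skips restoring
  # object ownership, since the RDS roles may not exist locally.
  pg_restore -v --no-owner -d "$db" "$dump_file"
}
```

pg_dump prompts for the master password unless it is supplied another way, for example through the PGPASSWORD environment variable.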

Syncing data from S3

In most cases you also need to copy the user-uploaded files that are referenced in your database dump. In our own projects (hosted on AWS) we always use S3 buckets for these.

Gladly, AWS offers very powerful APIs to perform actions like syncing data to and from an S3 bucket. You can easily install the AWS CLI with pip install awscli.

Now let's sync our data with the following command:

aws s3 sync s3://[bucket name]/media media

In this case we synced the media folder, which is commonly used in Django (and Wagtail) projects to store all user-uploaded files. We use the very simple django-storages app to hook it up with AWS.

Tip: the aws s3 sync command also offers a --dryrun option, which shows what is going to happen without actually doing it. This helps, as we don't want to accidentally delete all our files (you should, however, also restrict this with the right IAM policies).
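One way to make the dry run a habit is to always preview the sync first and only run it after confirmation. The sketch below assumes the AWS CLI is installed and configured with credentials; the function name sync_media and the confirmation prompt are hypothetical additions for illustration.

```shell
# Hypothetical helper: previews the sync, then asks before running it.
# Usage: sync_media <bucket name>
sync_media() {
  local bucket="$1"
  local src="s3://${bucket}/media"
  local answer

  # --dryrun lists the operations without performing them.
  aws s3 sync "$src" media --dryrun || return 1

  printf 'Run the sync for real? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    aws s3 sync "$src" media
  else
    echo "Skipped."
  fi
}
```

Anything other than "y" at the prompt leaves the local media folder untouched.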

With these two simple steps we can quickly migrate production data into our own development stack and work with actual data instead of creating dummy data.
