
Transferring data between S3 buckets

May 08, 2023

4 min read


Imagine you have an S3 bucket with a large number of files, and you need to move everything to a new bucket. Perhaps you are migrating from staging to production, or need to transfer some config and don't fancy downloading all the files and uploading them again. In fact, why would you? Especially if they contain a load of PII, you don't really want all of that on your laptop.

The AWS CLI has a couple of handy commands which, once set up, make this whole process a breeze. In fact, you can run the process as many times as you need, which is especially useful if you receive a constant stream of new files and need to pick up any which appear in the source bucket while you are releasing your new app to production.

The setup

The first step, if you haven't done so already, is to install the AWS CLI; the exact steps vary slightly depending on your operating system. You can find more information on the AWS website.
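
For example, on macOS one option is Homebrew, and running aws --version afterwards confirms the install worked (on other operating systems the official installer from the AWS docs does the same job):

brew install awscli
aws --version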

For the following examples, we are going to assume that you have the correct permissions configured to access two buckets which live in separate AWS accounts. We are going to use the access key and secret values from the security credentials tab in the AWS console for the source bucket, and then assume a role which allows us to access the destination bucket. The configuration of these credentials is outside the scope of this article, but I will briefly explain how to configure the CLI.

Once you have your access key and secret you can run the following command:

aws configure
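
The prompts look something like the following (the values are placeholders, so pick whichever region and output format suit you):

AWS Access Key ID [None]: XXXXX
AWS Secret Access Key [None]: XXXXX
Default region name [None]: eu-west-2
Default output format [None]: json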

You should be able to paste your values here, and the CLI will perform the initial setup for you. You could continue to use the CLI to finish setting up, but we are going to manually update the credentials and config files which should now have been created. I have included an example of where you might find these files if you are using a Unix-like system.

Although your files likely won't look exactly the same, you should aim to configure them so that the default profile is set up for the source account and the other profile points at the destination account.

Don't worry too much about the region settings as we explicitly set them in the sync command anyway.

Config

~/.aws/config

[default]
region = eu-west-1

[profile other]
role_arn = arn:aws:iam::111111111111:role/OrganizationAccountAccessRole
source_profile = default
region = eu-west-1
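
If you would rather not edit the config file by hand, the same profile can also be written with aws configure set; the profile name and role ARN here are just the example values from above:

aws configure set profile.other.role_arn arn:aws:iam::111111111111:role/OrganizationAccountAccessRole
aws configure set profile.other.source_profile default
aws configure set profile.other.region eu-west-1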

Credentials

~/.aws/credentials

[default]
aws_access_key_id=XXXXX
aws_secret_access_key=XXXXX
region=eu-west-2
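
As an optional sanity check (not something the setup strictly requires), sts get-caller-identity shows which account and role each profile resolves to, which is a quick way to confirm the role assumption works before touching S3:

aws sts get-caller-identity
aws sts get-caller-identity --profile other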

To ensure we have permission to access each of the buckets, we can run a simple list command against each one. If your destination bucket is empty, you won't see anything printed in your terminal. Note how we specify the profile for the destination bucket.

aws s3 ls s3://source-bucket-uri
aws s3 ls s3://destination-bucket-uri --profile other

You should see something like the following:

2023-02-02 12:09:15      78656 aaa
2023-03-17 12:40:17      56342 bbb
2023-03-12 15:19:32      98767 ccc

Running the migration

After you have set everything up, you can run the sync command, which compares the two buckets and transfers any files that are missing from the destination. Note that in this example our buckets are in two different AWS regions, so you may need to adjust the region flags according to your needs.

aws s3 sync s3://source-bucket-uri s3://destination-bucket-uri --source-region eu-west-2 --region eu-west-1

Assuming you have all of your permissions configured, and you have files to transfer, you should see something like the following:

copy: s3://source-bucket-uri/aaa to s3://destination-bucket-uri/aaa
copy: s3://source-bucket-uri/bbb to s3://destination-bucket-uri/bbb
copy: s3://source-bucket-uri/ccc to s3://destination-bucket-uri/ccc

You can run this command as many times as you like, which makes it useful if you are constantly receiving new files in the bucket.
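
If you want to preview what a run would copy before committing, or only transfer a subset of keys, sync also supports --dryrun and --exclude/--include filters; the *.json pattern below is purely illustrative:

aws s3 sync s3://source-bucket-uri s3://destination-bucket-uri --source-region eu-west-2 --region eu-west-1 --dryrun
aws s3 sync s3://source-bucket-uri s3://destination-bucket-uri --source-region eu-west-2 --region eu-west-1 --exclude "*" --include "*.json"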

Verifying your results

Once complete, you can run the following commands for each environment.

aws s3 ls s3://source-bucket-uri --recursive --human-readable --summarize
aws s3 ls s3://destination-bucket-uri --profile other --recursive --human-readable --summarize

Hopefully, you will see the same output for each environment.

2023-02-02 12:09:15    1.25 MiB aaa
2023-03-17 12:40:17    1.25 MiB bbb
2023-03-12 15:19:32    1.25 MiB ccc

Total Objects: 3
   Total Size: 3.75 MiB

You can also verify these details in your AWS account, but being able to do everything from the command line speeds things up, and you can write a script for the whole process if required.
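
As a rough sketch of such a script (assuming the same bucket names, regions and profile name used throughout this post), you might end up with something like this:

#!/usr/bin/env bash
# Sketch: sync the buckets, then summarise both sides for comparison.
# The bucket names, regions and profile name below are placeholders from the examples above.
set -euo pipefail

SOURCE_BUCKET="s3://source-bucket-uri"
DEST_BUCKET="s3://destination-bucket-uri"
SOURCE_REGION="eu-west-2"
DEST_REGION="eu-west-1"
DEST_PROFILE="other"

# Copy anything missing from the destination bucket.
aws s3 sync "$SOURCE_BUCKET" "$DEST_BUCKET" --source-region "$SOURCE_REGION" --region "$DEST_REGION"

# Summarise both buckets so the totals can be compared by eye.
aws s3 ls "$SOURCE_BUCKET" --recursive --human-readable --summarize
aws s3 ls "$DEST_BUCKET" --profile "$DEST_PROFILE" --recursive --human-readable --summarize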

Useful links