Sandny Blog

How to copy files from S3 to PostgreSQL RDS

  • September 17, 2022
  • by Coder Kai

When you have millions of rows to insert into PostgreSQL, as a backup or for any other reason, iterating over the data and inserting it into a table row by row is often impractical. For this scenario, the AWS extensions for PostgreSQL provide a COPY-style command that loads files from S3 into PostgreSQL RDS directly.

First, let's see what the COPY command does. According to the official documentation:

COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.

https://www.postgresql.org/docs/current/sql-copy.html

Following is the COPY command syntax:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'
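To make the syntax concrete, here is a small Python sketch that assembles a plain COPY FROM statement; the table, columns, and file path are hypothetical, and any PostgreSQL client could execute the resulting string:

```python
# Assemble a concrete COPY ... FROM statement following the syntax above.
# Table name, column names, and file path are illustrative only.
table = "my_table"
columns = ["year", "industry_code", "units"]
path = "/tmp/data.csv"  # must be readable by the database server, not the client

copy_sql = (
    f"COPY {table} ({', '.join(columns)}) "
    f"FROM '{path}' "
    "WITH (FORMAT csv, DELIMITER '|', HEADER true)"
)
print(copy_sql)
```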

One important point is that the COPY command doesn't read from S3: the 'filename' above refers to a file on the database server's own file system.

To load the file into RDS, AWS provides an extension of the COPY command that can be installed directly in PostgreSQL. Let's dive into the steps.

1. Install the AWS S3 extension in PostgreSQL

Here are the steps required to load the file from S3 into RDS:

  1. Create a secure signed link through which the file can be read.
  2. COPY the file from S3 to RDS and update the table.

To run the COPY command, you first need to install the aws_s3 extension (https://github.com/chimpler/postgres-aws-s3). Log in to PostgreSQL over a secure connection and execute the following:

postgres=> CREATE EXTENSION aws_s3 CASCADE;
NOTICE: installing required extension "aws_commons"
CREATE EXTENSION

If the instance has a public endpoint, you can log in with the following:

psql --host=123456789.aws-region.rds.amazonaws.com --port=5432 --username=postgres --password

You can execute the \dx command to verify that the extension is installed:

postgres=> \dx
       List of installed extensions
    Name     | Version |   Schema   |                 Description
-------------+---------+------------+---------------------------------------------
 aws_commons | 1.2     | public     | Common data types across AWS services
 aws_s3      | 1.1     | public     | AWS S3 extension for importing data from S3
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

2. Get a secure link to the S3 file

Now let's see how to get a secure link to the data file in S3. The data file should be in CSV format; you can use a custom delimiter other than a comma when creating the file. Note that each record should match the columns of the target table exactly.

Example data file in CSV format:

[Image: S3 to PostgreSQL CSV data set]

Here | is used as the delimiter, which is useful when the fields themselves contain commas.
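A pipe-delimited file like this can be produced with Python's standard csv module; the rows below are made up for illustration and must match the target table's columns:

```python
import csv

# Illustrative rows; each tuple must line up with the target table's columns.
rows = [
    ("2021", "Level 1", "99999", "All industries", "Dollars (millions)"),
    ("2021", "Level 1", "AA", "Agriculture, Forestry and Fishing", "Dollars (millions)"),
]

with open("version_and_jdks_listing.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|")
    writer.writerows(rows)
```

Note that the second row contains a comma inside a field; because | is the delimiter, no quoting is needed.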

You can save this file to an s3 bucket and get its path

Next, you have to create a secure link to this file. For that, you can use the following query:

bucket = 'docs-lab-store-for-rpg'
filename = '/docs-lab-test-folder/version_and_jdks_listing.csv'
region = 'us-west-1'

command = """
  SELECT aws_commons.create_s3_uri(
      '{bucket}',
      '{filename}',
      '{region}'
  ) as uri
"""
query = command.format(bucket=bucket, filename=filename, region=region)

This will create a URI for the S3 file, which we can pass to the aws_s3.table_import_from_s3 function:

bucket = 'docs-lab-store-for-rpg'
filename = '/docs-lab-test-folder/version_and_jdks_listing.csv'
region = 'us-west-1'
access_key = 'acccessKeyToSign'
secret_key = 'secretKeyToSign'

command = """
    with s3 AS (
        SELECT aws_commons.create_s3_uri(
            '{bucket}',
            '{filename}',
            '{region}'
        ) as uri
    )
    SELECT aws_s3.table_import_from_s3(
        'my_table', '', '(format csv, DELIMITER ''|'')',
        (select uri from s3),
        aws_commons.create_aws_credentials('{access_key}', '{secret_key}', '')
    )
"""
query = command.format(bucket=bucket, filename=filename, region=region, access_key=access_key, secret_key=secret_key)

Now, when you execute this query with psycopg or an equivalent Python client, it will import your file into RDS.
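For completeness, here is a minimal sketch of executing the generated query through a psycopg2-style connection; the connection object is assumed to exist, and only its cursor, execute, and commit methods are used:

```python
def run_import(conn, query):
    """Execute the generated import query on an open DB-API connection."""
    with conn.cursor() as cur:
        cur.execute(query)
    conn.commit()

# Usage (assuming psycopg2 is installed and the RDS instance is reachable):
# import psycopg2
# conn = psycopg2.connect(host="123456789.aws-region.rds.amazonaws.com",
#                         port=5432, user="postgres", password="...")
# run_import(conn, query)
```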

3. How to handle an auto-generated ID column

If your table has an auto-generated ID column, the import will fail because the COPY command doesn't know which columns the data maps to. To fix this, you can specify which table column each delimited value represents.

You can modify the command as follows:

SELECT aws_s3.table_import_from_s3(
        'my_table',
        'Year, Industry_aggregation_NZSIOC, Industry_code_NZSIOC, Industry_name_NZSIOC, Units',
        '(format csv, DELIMITER ''|'')',
        (select uri from s3),
        aws_commons.create_aws_credentials('{access_key}', '{secret_key}', '')
    )
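If the column names already live in a Python list, the second argument can be assembled with a join; the names here are taken from the example above:

```python
# Build the column-list argument for aws_s3.table_import_from_s3.
columns = [
    "Year",
    "Industry_aggregation_NZSIOC",
    "Industry_code_NZSIOC",
    "Industry_name_NZSIOC",
    "Units",
]
column_list = ", ".join(columns)
print(column_list)
```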

4. How to deal with constraints

If a constraint violation occurs during execution, the query stops. To avoid this, you can copy the data into a temporary table first, then insert everything from the temporary table into the destination table while ignoring conflicting rows.

CREATE TEMP TABLE tmp_table
    (LIKE my_table INCLUDING DEFAULTS)
    ON COMMIT DROP;

-- copy from S3 into tmp_table here

INSERT INTO my_table
    SELECT *
    FROM tmp_table
    ON CONFLICT DO NOTHING;
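The steps above can be sketched as a single transaction in Python; note that ON COMMIT DROP requires all three statements to run inside one transaction, and import_query stands for the aws_s3.table_import_from_s3 call built earlier, here assumed to target tmp_table:

```python
def staged_import(conn, import_query):
    """Load into a temp table first, then insert with conflicts ignored."""
    with conn.cursor() as cur:
        cur.execute(
            "CREATE TEMP TABLE tmp_table "
            "(LIKE my_table INCLUDING DEFAULTS) ON COMMIT DROP"
        )
        cur.execute(import_query)  # aws_s3.table_import_from_s3 into tmp_table
        cur.execute(
            "INSERT INTO my_table SELECT * FROM tmp_table "
            "ON CONFLICT DO NOTHING"
        )
    conn.commit()  # the temp table is dropped here
```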

That's it! If you also need to know how to save data to a CSV file using Python, you can refer to the following:

How to save pandas Dataframe to a file and read back

