Migrating data from SQL Server to PostgreSQL

Prerequisites

  • access to both source and target databases
  • install Python packages
    • sqlalchemy
    • pymssql
    • psycopg2-binary
    • python-dotenv
  • databases and schema already exist in the target

Steps

  • create target database and schema(s)
  • create python virtual environment – python -m venv ~/apps/.myssql_psql
  • activate python – $ source ~/apps/.myssql_psql/bin/activate
  • install python packages
  • run python script to copy data from source

Python Code for testing connection

import os
import pymssql
import psycopg2
import pandas as pd
from sqlalchemy import create_engine, text
from dotenv import load_dotenv

# Required to load the previously defined environment variables in .env file
load_dotenv()
"""
SQL Server and PostgreSQL using native connection and SqlAlchemy
python -m pip install pymssql sqlalchemy psycopg2 python-dotenv
"""
conn = pymssql.connect(
    server=os.environ.get('MS_HOST'),
    user=os.environ.get('MS_USER'),
    password=os.environ.get('MS_PASSWORD'),
    database='AdventureWorksLT2022',
)

SQL_QUERY = """
SELECT @@Version;
"""
cursor = conn.cursor()
cursor.execute(SQL_QUERY)
records = cursor.fetchall()
print(records)
cursor.close()
# second query
cursor = conn.cursor()
cursor.execute('SELECT * FROM SalesLT.Address')
cols = [c[0] for c in cursor.description]
df = pd.DataFrame.from_records(cursor.fetchall(), columns=cols)
cursor.close()
print(df.head())
conn.close()

# pymssql
engine = create_engine(
f"mssql+pymssql://{os.environ.get('MS_USER')}:{os.environ.get('MS_PASSWORD')}@{os.environ.get('MS_HOST')}:1433/AdventureWorksLT2022"
)
query = text("""SELECT * FROM SalesLT.Address""")
df = pd.read_sql_query(query, con=engine)

# postgresql
pgengine = create_engine(
    f"postgresql+psycopg2://{os.environ.get('PG_USER')}:{os.environ.get('PG_PASSWORD')}@{os.environ.get('PG_HOST')}:5432/ellodb"
)
df.to_sql(schema='ve', name='address', con=pgengine, chunksize=5000, index=False, index_label=False, if_exists='replace')

query = text("SELECT version();")
df = pd.read_sql(query, pgengine)
print(df.head())
pgengine.dispose()

Sample content of .env file

PG_HOST=localhost
PG_PORT=5432
PG_USER=user
PG_PASSWORD=password
PG_DATABASE=database

MS_HOST=localhost
MS_PORT=1433
MS_USER=user
MS_PASSWORD=password
MS_DATABASE=database

Installed python packages

Use requirements.txt file to install necessary packages. Content of the file listed below:

#pip freeze
greenlet==3.0.3
numpy==2.0.1
pandas==2.2.2
psycopg2-binary==2.9.9
pymssql==2.3.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
six==1.16.0
SQLAlchemy==2.0.32
typing_extensions==4.12.2
tzdata==2024.1