Tag: sqlalchemy

  • Migrating data from SQL Server to PostgreSQL

    Prerequisites

    • access to both source and target databases
    • install Python packages
      • sqlalchemy
      • pymssql
      • psycopg2-binary
      • python-dotenv
    • databases and schema already exist in the target

    Steps

    • create target database and schema(s)
    • create python virtual environment – python -m venv ~/apps/.myssql_psql
    • activate python – $ source ~/apps/.myssql_psql/bin/activate
    • install python packages
    • run python script to copy data from source

    Python Code for testing connection

    import os
    import pymssql
    import psycopg2
    import pandas as pd
    from sqlalchemy import create_engine, text
    from dotenv import load_dotenv
    
    # Required to load the previously defined environment variables in .env file
    load_dotenv()
    """
    SQL Server and PostgreSQL using native connection and SqlAlchemy
    python -m pip install pymssql sqlalchemy psycopg2 python-dotenv
    """
    conn = pymssql.connect(
        server=os.environ.get('MS_HOST'),
        user=os.environ.get('MS_USER'),
        password=os.environ.get('MS_PASSWORD'),
        database='AdventureWorksLT2022',
    )
    
    SQL_QUERY = """
    SELECT @@Version;
    """
    cursor = conn.cursor()
    cursor.execute(SQL_QUERY)
    records = cursor.fetchall()
    print(records)
    cursor.close()
    # second query
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM SalesLT.Address')
    cols = [c[0] for c in cursor.description]
    df = pd.DataFrame.from_records(cursor.fetchall(), columns=cols)
    cursor.close()
    print(df.head())
    conn.close()
    
    # pymssql
    engine = create_engine(
    f"mssql+pymssql://{os.environ.get('MS_USER')}:{os.environ.get('MS_PASSWORD')}@{os.environ.get('MS_HOST')}:1433/AdventureWorksLT2022"
    )
    query = text("""SELECT * FROM SalesLT.Address""")
    df = pd.read_sql_query(query, con=engine)
    
    # postgresql
    pgengine = create_engine(
        f"postgresql+psycopg2://{os.environ.get('PG_USER')}:{os.environ.get('PG_PASSWORD')}@{os.environ.get('PG_HOST')}:5432/ellodb"
    )
    df.to_sql(schema='ve', name='address', con=pgengine, chunksize=5000, index=False, index_label=False, if_exists='replace')
    
    query = text("SELECT version();")
    df = pd.read_sql(query, pgengine)
    print(df.head())
    pgengine.dispose()

    Sample content of .env file

    PG_HOST=localhost
    PG_PORT=5432
    PG_USER=user
    PG_PASSWORD=password
    PG_DATABASE=database
    
    MS_HOST=localhost
    MS_PORT=1433
    MS_USER=user
    MS_PASSWORD=password
    MS_DATABASE=database

    Installed python packages

    Use requirements.txt file to install necessary packages. Content of the file listed below:

    #pip freeze
    greenlet==3.0.3
    numpy==2.0.1
    pandas==2.2.2
    psycopg2-binary==2.9.9
    pymssql==2.3.0
    python-dateutil==2.9.0.post0
    python-dotenv==1.0.1
    pytz==2024.1
    six==1.16.0
    SQLAlchemy==2.0.32
    typing_extensions==4.12.2
    tzdata==2024.1