Category: articles

  • Ticket sales and trend analysis using fake data, Amazon Q and QuickSight

    Another article compiled with the help of Amazon Q, this time focusing on football ticket sales and trend analysis. The intention is to get some help understanding what dataset is required to build a quick dashboard using QuickSight. football ticket sales and trend analysis using fake data, amazon q, quicksight, sqlalchemy 2.0 and postgresql Response from…

  • Customer segmentation analysis using fake data and QuickSight with Amazon Q

    I asked Amazon Q the question “How to use QuickSight Amazon Q for customer segmentation?” and followed up with several further questions. The scripts needed a little bit of tweaking to get working. In most cases the results were quite accurate and are listed below: Using sample data Certainly! I can help you create a sample dataset in Python that includes attributes useful for customer segmentation. This dataset will include demographics, purchase history, and behavior patterns. Here’s a Python script to generate this sample data: Generated Python Script Generated code for…

  • Migrating data from SQL Server to PostgreSQL

    Prerequisites Steps Python code for testing connection Sample content of .env file Installed Python packages Use the requirements.txt file to install the necessary packages. The content of the file is listed below:

  • Data layer security and permission in a multi-tenancy data warehouse

    In this article we will discuss various approaches for achieving data isolation, or data layer security, in a multi-tenancy data warehouse. I will refer to a multi-tenancy data warehouse as an architecture where one instance of an OLAP database is shared by many customers. In a multi-tenancy platform, data isolation (row-level data access) is usually performed in…
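
To make the idea of row-level data isolation concrete, here is a minimal sketch in Python. It is an illustration only, not the approach taken in the article: it assumes a hypothetical `orders` table with a `tenant_id` column, uses an in-memory SQLite database as a stand-in for a shared OLAP instance, and enforces isolation by binding the tenant identifier as a query parameter.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database shared by several tenants (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "tenant_id TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders (order_id, tenant_id, amount) VALUES (?, ?, ?)",
        [(1, "tenant_a", 10.0), (2, "tenant_b", 25.5), (3, "tenant_a", 7.25)],
    )
    conn.commit()
    return conn


def fetch_orders_for_tenant(conn, tenant_id):
    """Return only the rows that belong to the given tenant.

    The tenant filter is always applied here, so callers never
    query the shared table directly.
    """
    cur = conn.execute(
        "SELECT order_id, amount FROM orders "
        "WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cur.fetchall()
```

Calling `fetch_orders_for_tenant(build_demo_db(), "tenant_a")` returns only the rows for that tenant. A real warehouse would more likely enforce this with database-side features such as views or row-level security policies rather than in application code.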

  • AWS Data Wrangler (AWS SDK for Pandas)

    AWS Data Wrangler (awswrangler), now called AWS SDK for Pandas, is a Python library which makes it easier to integrate AWS services for ETL (Extract, Transform, Load) work. I will use Google Colab and AWS SDK for Pandas for creating: Prerequisites Steps Let’s get started. Verify your table Go to the AWS Console and check…

  • Changing int data type to bigint in SQL Server

    The challenges involved in changing an int identity column (i.e. int identity(1,1)) to bigint depend on the size of the database and on references to the identity key in other tables and objects such as user-defined procedures/functions, sequences and indexes (clustered and non-clustered). Let’s try changing the identity column data type from int to bigint using SQL Server AdventureWorksLT2022…

  • Reading large XML file using Pentaho and Apache Hop

    Refer to my previous article – Processing GLEIF data in JSON format. I wanted to refresh my knowledge of the Pentaho Data Integration tool and see if I could process a huge XML file without running into a Java OOM (Out of Memory) problem. Pentaho offers an input step called XML Input Stream (StAX – Streaming API…
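
For readers who want to see the streaming idea outside Pentaho, the sketch below shows an analogous technique in Python using the standard library's `xml.etree.ElementTree.iterparse`. It is not the article's method. The function name `iter_records` and the `record` tag are hypothetical, and the sketch assumes each record is a flat element whose children hold text values.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per record element without loading the whole file.

    `source` is a file path or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # The "end" event fires once the element and its children are complete.
            yield {child.tag: child.text for child in elem}
            # Drop the element's children and text so they can be garbage collected.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<records>"
        b"<record><id>1</id><name>A</name></record>"
        b"<record><id>2</id><name>B</name></record>"
        b"</records>"
    )
    for record in iter_records(sample, "record"):
        print(record)
```

Because records are handed to the caller one at a time and cleared afterwards, memory use stays roughly proportional to a single record rather than to the whole document. The root element still keeps references to the emptied record elements, so truly huge files may need extra cleanup.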

  • Data validation check in streaming data

    In this article, I will share my approach (see my previous article for further information on data validation) for carrying out data validation checks using streaming data from a Kafka topic. Checking data is very important, yet this task often gets overlooked. One cannot trust source data, especially if you are ingesting data for…

  • Looking at Companies House Profile stream data

    Companies House provides various streaming endpoints to obtain data related to companies. The https://stream.companieshouse.gov.uk/companies endpoint provides the latest changes recorded about a company. Not all data points will be present for a company. Data capture and storage options Processing data using AWS DynamoDB Local DynamoDB local stores its data in a SQLite database file and therefore it…

  • Using DynamoDB local on Google Colab

    Recently, I set up a process on a Raspberry Pi server and configured it to run in the background and listen for events from the Companies House streaming API. It has been running without any issues. I want to use DynamoDB local for storing event data from the streaming API. Steps for deploying DynamoDB local Create a…