Category: articles

  • Google CoLab and relational database

    We will show how to access a relational database from Google CoLab. We will use both a Python module and PySpark to access a MySQL database and write data to it. See other articles related to BigQuery and Google Colab. Prerequisites: For example, you will only need mysql-connector-python for accessing a MySQL database. Connect to MySQL database: Assuming you…

  • Load data into BigQuery using Terraform

    We will load data into BigQuery using Terraform. Our task: We will use Google Cloud Shell, a tool for managing resources hosted on Google Cloud Platform. It provides a VM with a 5 GB home directory that persists across sessions, but the VM itself is ephemeral and is reset approximately 20 minutes after your session ends. Terraform…

  • Hive array operations

    In this article we describe how to add and remove items from an array in Hive using PySpark. We are going to use two array functions to accomplish the given task: UPDATE: output might contain duplicate rows. Use groupBy with agg to deal with multiple columns. You may need to use the following function: Ref:…

  • Add or remove items from array using PySpark

    In this article, we will use Hive and PySpark to manipulate a complex datatype, i.e. array<string>. We show how to add or remove items from an array using PySpark. We will use datasets consisting of three units, representing paye, crn and vat units. For sample data see https://broadoakdata.uk/synthetic-data-creation-linking-records/ We need to link and unlink a few units…

  • Fedora Workstation on GEO GeoBook 2E

    Installed the open source Linux distribution Fedora from Red Hat. This laptop is configured with minimal software for United Technical College at Mirpur, Jagannathpur, Sunamganj District of Bangladesh. GeoBook 2E is for learning, at any time in any place. A lightweight 12.5-inch design ensures easy portability for users of all ages while an Intel Quad or Dual…

  • How to Create Pivot Table in MySQL

    MySQL does not have a built-in PIVOT function for creating pivot tables. Pivot tables are useful for data analysis because they display row values as columns, which makes insights easier to spot. Therefore you need to write a SQL query to create a pivot table in MySQL. You can use SUM and IF or CASE statements to create…
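
    The SUM-with-CASE pattern mentioned above can be sketched as follows. This is a minimal illustration, not the article's own code: the `sales` table, its columns and the sample rows are hypothetical, and an in-memory SQLite database stands in for MySQL so the snippet runs without a server.

```python
import sqlite3

# Assumption: SQLite is used here only so the example is self-contained;
# the table name, columns and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Q1", 100),
        ("North", "Q1", 50),
        ("North", "Q2", 150),
        ("South", "Q1", 80),
        ("South", "Q2", 120),
    ],
)

# Conditional aggregation: each CASE keeps the amount only for the quarter
# that the output column represents, so SUM yields one column per quarter.
PIVOT_SQL = """
SELECT region,
       SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_total,
       SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_total
FROM sales
GROUP BY region
ORDER BY region
"""

pivot_rows = conn.execute(PIVOT_SQL).fetchall()
for row in pivot_rows:
    print(row)
```

    The same query shape should carry over to MySQL, where `SUM(IF(quarter = 'Q1', amount, 0))` is an equivalent way to write each pivoted column.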

  • Google Sheets in CoLab using service account key file

    Using Google Drive, Google Sheets and Python in Google Colaboratory to access a Google Sheet with a service account JSON key file. Useful links, Prerequisites, Steps, Code Snippets

  • Search for a dissolved company

    We will show how to search for a dissolved company using the Companies House Public API. We use Databricks to call the API and then create a DataFrame using PySpark. Steps: See "How to get Companies House data using REST API". Code snippets

  • Install PostgreSQL on AWS CloudShell

    To install PostgreSQL on AWS CloudShell, please use the command below:

  • Synthetic data creation and linking records

    In this article we will show how to generate synthetic data and link records on a particular field. Approach: We will use three dataframes and link records by matching the value of a variable. See previous article https://broadoakdata.uk/generating-synthetic-data-using-hive-sql/ for reference. Code snippets. Generating synthetic data using PostgreSQL