Author: broadoakdata

  • Saving data as parquet table with partition key

    We will show how to add the latest changes to a Parquet table with a partition key. Sections: approach, code snippets, CDC implementation, synthetic data, and example code for using the functions.
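
The core CDC step (replay change rows onto the current table, keep the newest version of each key, then group rows by partition key) can be sketched in plain Python. This is an illustrative sketch, not the post's code: the column names `id`, `updated_at`, `region` and the `op` flag ("I", "U", "D") are hypothetical.

```python
from collections import defaultdict


def apply_cdc(current, changes, key="id", ts="updated_at", partition="region"):
    """Merge CDC change rows into the current rows, grouped by partition key.

    current: list of dicts (the existing table).
    changes: list of dicts with an extra hypothetical "op" field: "I", "U" or "D".
    Returns {partition_value: [rows]} so each group can be written to its
    own partition directory (for example region=EU/).
    """
    latest = {row[key]: row for row in current}
    # Replay changes oldest first so the newest version of each key wins.
    for change in sorted(changes, key=lambda r: r[ts]):
        existing = latest.get(change[key])
        if existing is not None and change[ts] < existing[ts]:
            continue  # stale change: older than the version we already hold
        if change.get("op") == "D":
            latest.pop(change[key], None)
        else:
            latest[change[key]] = {k: v for k, v in change.items() if k != "op"}

    partitions = defaultdict(list)
    for row in latest.values():
        partitions[row[partition]].append(row)
    # Sort within each partition so the output is deterministic.
    return {p: sorted(rows, key=lambda r: r[key]) for p, rows in partitions.items()}
```

With pandas and pyarrow installed, each result could then be written with `pd.DataFrame(rows).to_parquet(path, partition_cols=["region"])`, rewriting only the partitions that received changes.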

  • Loading JSON data into Neo4j database

    Following on from a recent article that describes a method for using a Neo4j database on Google Colab, we will show how to load JSON data into a Neo4j database. Sections: prerequisites; implementation, where we use the Pandas and py2neo packages for the task at hand; output.
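
A common pattern for this kind of load is to flatten each JSON record (Neo4j properties cannot hold nested maps) and send the whole batch through one parameterised `UNWIND ... MERGE` query. The sketch below assumes that pattern; the label `Person`, key `id` and connection details are hypothetical. `pandas.json_normalize` does similar flattening, using "." as its default separator.

```python
import json


def flatten(record, parent="", sep="_"):
    """Flatten nested dicts, since Neo4j properties cannot hold maps."""
    flat = {}
    for k, v in record.items():
        name = f"{parent}{sep}{k}" if parent else k
        if isinstance(v, dict):
            flat.update(flatten(v, name, sep))
        else:
            flat[name] = v
    return flat


def load_rows(json_text):
    """Parse JSON text (one object or a list) into flat property dicts."""
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    return [flatten(r) for r in records]


def build_merge_query(label, key):
    """Cypher that upserts one node per row, matched on `key`."""
    # label and key are interpolated, so only pass trusted identifiers.
    return (
        f"UNWIND $rows AS row "
        f"MERGE (n:{label} {{{key}: row.{key}}}) "
        f"SET n += row"
    )


# With py2neo the batch could then be sent like this (needs a running Neo4j;
# URL and credentials are placeholders):
#   from py2neo import Graph
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   graph.run(build_merge_query("Person", "id"), rows=load_rows(text))
```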

  • Running Neo4j on Google Cloud Shell

    Sections: installing Neo4j from a tarball; installing Neo4j on Google Colab (example); installing Elasticsearch on Google Colab; notes. Notes: run both Elasticsearch and Neo4j without a login user ID and password. To do this, uncomment or edit a few directives in config/elasticsearch.yml (Elasticsearch) and conf/neo4j.conf (Neo4j).
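
Uncommenting or editing a directive can be scripted in a notebook cell. This sketch assumes the relevant directives are `dbms.security.auth_enabled=false` in conf/neo4j.conf and `xpack.security.enabled: false` in config/elasticsearch.yml; names can differ between releases, so check the files shipped with your version.

```python
import re


def set_directive(text, key, value, sep):
    """Set `key` to `value` in config text, uncommenting the line if needed.

    sep is "=" for neo4j.conf style and ": " for elasticsearch.yml style.
    If no line (commented or not) mentions the key, it is appended.
    """
    # [ \t]* rather than \s* so the match never swallows preceding newlines.
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*[:=].*$", re.MULTILINE
    )
    line = f"{key}{sep}{value}"
    if pattern.search(text):
        # A function replacement avoids backslash handling in `line`.
        return pattern.sub(lambda m: line, text, count=1)
    return text.rstrip("\n") + "\n" + line + "\n"
```

Usage would be along the lines of reading conf/neo4j.conf, passing its text through `set_directive(text, "dbms.security.auth_enabled", "false", "=")`, and writing it back before starting the server.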

  • Google Translation API

    We will show how to use the Google Translation API from Google Colab. Sections: prerequisites, steps, code to translate text, and translating an HTML file.
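
For reference, the same call can be made against the Cloud Translation v2 REST endpoint using only the standard library. The post may well use a client library instead; this sketch assumes an API key (placeholder `KEY`) is used for authentication. Passing `fmt="html"` tells the API to preserve markup, which is what translating an HTML file needs.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(texts, target, api_key, fmt="text", source=None):
    """Build (but do not send) a POST request to the v2 translate method."""
    body = {"q": list(texts), "target": target, "format": fmt}
    if source:
        body["source"] = source  # omit to let the API detect the language
    url = ENDPOINT + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_translations(response_json):
    """Extract translated strings from a v2 response body."""
    return [t["translatedText"] for t in response_json["data"]["translations"]]


# Sending it needs a real key and network access:
#   with urllib.request.urlopen(build_translate_request(["Hello"], "fr", KEY)) as resp:
#       print(parse_translations(json.load(resp)))
```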

  • Generating delta for Synthetic data

    How do you include delta records in a new synthetic dataset? In a previous article we described how to generate synthetic data, and we will use the same method to answer this question. Sections: approach, code for generating the delta (see the Google Colab notebook), implementation code, and screenshots.
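
One way to produce a delta is to pick some existing keys to update or delete, generate a few brand-new records, and flag every affected row with an operation code. The sketch below assumes that approach; `make_record` is a hypothetical callback standing in for the synthetic-data generator from the earlier article.

```python
import random


def generate_delta(base, n_updates, n_inserts, n_deletes, make_record, seed=0):
    """Return (new_dataset, delta) where each delta row carries an "op" flag.

    base: list of dicts with a unique integer "id".
    make_record(new_id, rng): hypothetical generator for one synthetic record.
    """
    rng = random.Random(seed)  # seeded so the delta is reproducible
    ids = [r["id"] for r in base]
    chosen = rng.sample(ids, n_updates + n_deletes)
    update_ids = set(chosen[:n_updates])
    delete_ids = set(chosen[n_updates:])

    new_data, delta = [], []
    for row in base:
        if row["id"] in delete_ids:
            delta.append({**row, "op": "D"})
        elif row["id"] in update_ids:
            changed = {**make_record(row["id"], rng), "id": row["id"]}
            new_data.append(changed)
            delta.append({**changed, "op": "U"})
        else:
            new_data.append(row)

    # New records get ids after the current maximum.
    next_id = max(ids, default=0) + 1
    for i in range(n_inserts):
        rec = {**make_record(next_id + i, rng), "id": next_id + i}
        new_data.append(rec)
        delta.append({**rec, "op": "I"})
    return new_data, delta
```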

  • Read and write data using SQLAlchemy and Pandas dataframe

    SQLAlchemy makes it easy to read data from and write data to databases. I am using Google Colab and free databases to store and manipulate data. In this article, I will read data from a MySQL database and save the result to a PostgreSQL database using the Pandas DataFrame to_sql method. Steps. Prerequisites: the following Python packages are needed – Google…
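
The read-then-write flow is simple to illustrate. To keep the sketch runnable without MySQL or PostgreSQL, it swaps in two in-memory `sqlite3` databases from the standard library; the comments show the pandas/SQLAlchemy calls that play the same roles. Table names and the query are hypothetical.

```python
import sqlite3


def copy_table(src, dst, query, target_table):
    """Read rows with `query` from src and append them to target_table in dst.

    Roughly the stdlib equivalent of:
        df = pd.read_sql(query, mysql_engine)
        df.to_sql(target_table, pg_engine, if_exists="append", index=False)
    """
    cur = src.execute(query)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # target_table is interpolated into SQL, so it must be a trusted name.
    dst.execute(f"CREATE TABLE IF NOT EXISTS {target_table} ({col_list})")
    dst.executemany(
        f"INSERT INTO {target_table} ({col_list}) VALUES ({placeholders})", rows
    )
    dst.commit()
    return len(rows)
```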

  • Getting data from MySQL database using PHP

    db4free.net provides a testing service for the latest version of the MySQL Server. You can easily create a free account and test your applications. db4free.net is also a good resource for education and for familiarizing yourself with the MySQL database via phpMyAdmin. For a more advanced example, please read this article – Contact Form using HTML, JavaScript, CSS,…

  • Scheduling Python Script to run on GCP

    I came across the article Setting up a Recurring Google Cloud Function With Terraform and used a serverless approach to schedule a Python script that runs periodically, invokes the Companies House PSC stream to get data, and saves it into a free PostgreSQL database. I am using Google Cloud Functions and Cloud Scheduler to accomplish this task without the need for…
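
With an HTTP-triggered Cloud Function, Cloud Scheduler simply calls the function's URL on a cron schedule, so the entry point only has to run the job and return a status code. This is a minimal sketch under that assumption; `fetch` and `save` are hypothetical stand-ins for the PSC stream reader and the PostgreSQL writer.

```python
import json
import time


def run_job(fetch, save):
    """Fetch a batch of records and save them; return how many were saved."""
    records = fetch()
    save(records)
    return len(records)


def make_entry_point(fetch, save):
    """Build an HTTP handler; Cloud Functions passes it a Flask request."""

    def main(request):
        started = time.time()
        try:
            count = run_job(fetch, save)
        except Exception as exc:
            # A non-2xx status lets the Scheduler job record a failed attempt.
            return json.dumps({"status": "error", "detail": str(exc)}), 500
        body = {"status": "ok", "saved": count,
                "seconds": round(time.time() - started, 3)}
        return json.dumps(body), 200

    return main


# In main.py the deployed entry point must be a module-level name, e.g.:
#   main = make_entry_point(fetch_psc_batch, save_to_postgres)  # hypothetical helpers
```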

  • Processing Companies House PSC stream data

    We need to access the PSC stream API and extract information about persons with significant control. However, the result contains personal information such as name, part of the date of birth, nationality, and address with postcode. Please read the following articles, Processing UK Companies House PSC Data and Companies House Stream API, for more detailed information on Companies House PSC…
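
The stream delivers newline-delimited JSON events, so processing comes down to parsing each line and keeping only the fields you need. The sketch assumes an event shape with `resource_kind`, `event.timepoint` and a `data` object holding `name`, `date_of_birth` (month and year), `nationality` and `address.postal_code`; verify these against the API documentation. Blank lines are treated as keep-alive heartbeats, and only the outward part of the postcode is kept to reduce the personal data stored.

```python
import json


def parse_psc_events(lines):
    """Yield trimmed PSC records from newline-delimited stream events."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumed keep-alive heartbeat
        event = json.loads(line)
        data = event.get("data", {})
        dob = data.get("date_of_birth", {})
        postcode = data.get("address", {}).get("postal_code", "")
        yield {
            "resource_kind": event.get("resource_kind"),
            "timepoint": event.get("event", {}).get("timepoint"),
            "name": data.get("name"),
            "birth_year": dob.get("year"),
            "birth_month": dob.get("month"),
            "nationality": data.get("nationality"),
            # Outward code only, e.g. "SW1A" from "SW1A 1AA".
            "postcode_district": postcode.split()[0] if postcode.strip() else None,
        }
```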

  • Save Google Sheets data into BigQuery and MySQL

    Read data from Google Sheets and save a worksheet into BigQuery and MySQL. Steps: writing data into MySQL (see the previous article, Google CoLab and relational database, for Spark Context details and how to get data from a MySQL database); getting data from Google Sheets; saving the dataframe to BigQuery; reading data from BigQuery. Ref: Google CoLab
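
Between reading the sheet and writing to BigQuery or MySQL, the worksheet values usually need turning into records. gspread's `Worksheet.get_all_values()` returns a list of rows with the header first; this sketch assumes that shape and converts it into dicts, which `pd.DataFrame(records)` can then take before `bigquery.Client().load_table_from_dataframe(df, table_id)` or `df.to_sql(...)`. The table and column names would be your own.

```python
def sheet_values_to_records(values):
    """Turn worksheet values (list of rows, header first) into dicts.

    Short rows are padded with "" so every record has every column,
    and fully blank rows are skipped.
    """
    if not values:
        return []
    header = [h.strip() for h in values[0]]
    records = []
    for row in values[1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip fully blank rows
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records
```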