Author: broadoakdata

  • Process Companies House data using Google Colab

    In this article we will discuss how to process Companies House data using Google Colab. In previous articles we demonstrated ETL processes using the Companies House monthly snapshot data. Most of the processing was carried out on a local machine. The most recent monthly snapshot contains just over 5 million records. We are interested in building an ETL pipeline for…

  • Simple ETL pipeline using Apache Nifi

    In this article, we will compose a simple ETL pipeline using Apache NiFi. We want to read an Excel file and convert it to a CSV file using the Apache NiFi ConvertExcelToCSV processor. Prerequisites: Apache NiFi installed (we used $ brew install nifi); basic knowledge of ETL concepts and data integration tools. Data Pipeline: Apache NiFi is…

  • Merge multiple rows sharing id into one row

    UPDATE – 07/07/2022: This can be achieved using a few lines of PySpark code. See below: In this article, we will show how to merge multiple rows sharing an id into one row using PySpark. We will use the Companies House dataset for this article. You may find previous articles about how to get Companies House data…

  • Locality Sensitive Hashing for finding similar Company names

    In this article we will use Locality Sensitive Hashing to find similar company names, using data from Companies House as mentioned in the previous article. We will use a PySpark pipeline to streamline the process of finding similar company names. Background: Fuzzy (approximate) matching of two strings means calculating how similar the two strings are and one…

  • Hexo blog framework

    Hexo is a fast, simple and powerful blog framework. You write posts in Markdown (or other markup languages) and Hexo generates static files with a beautiful theme in seconds. (Ref: https://hexo.io/docs/) Installation: According to the Hexo blog framework website, it only takes a few minutes to set up Hexo. Requirements: Installing Hexo is quite easy and only requires…

  • Install Google Cloud CLI in Termux

    Install the gcloud CLI to access Google Cloud Shell via SSH on Android using Termux. First, run: curl https://sdk.cloud.google.com | bash (Note: this will fail when trying to install components. Ignore this.) Then run: $PREFIX/google-cloud-sdk/install.sh --override-components (without specifying components), which will add gcloud to $PATH. And then: gcloud components install gsutil Finally: gcloud --console-only Using the --console-only flag is useful if you’re running…

  • Companies House API using Node.js

    We will show how to call the Companies House API using Node.js. We will use the ES6 Node.js node-fetch library with the Companies House BETA API. UPDATE – 24/12/2022: You can use the node-fetch library to fetch data from the Companies House API. See code snippets below. If you encounter the following error message: Then use: Instead of Installation instructions: Run npm…

  • Hello world in different computer languages

    We will write Hello world in different computer languages: PHP, Python, C and Java. The examples will be implemented using the Termux Linux emulator on an Android smartphone. PHP: PHP is a server-side programming language. Many well-known applications have been written using PHP, such as Meta’s Facebook, WordPress, Drupal, SugarCRM,…

  • Compile and run C programs in Termux

    In this article, we will show how to compile and run C programs in Termux. We will write a hello.c program, then compile and run it on an Android phone using the Termux Linux emulator. Prerequisites: We assume you already have the latest C compiler installed on your device. Just check it using: Write a program…

  • Running WordPress on a Android phone

    In this article, we will show how to run WordPress on an Android phone. Prerequisites: Termux. Termux is a terminal emulator and Linux environment app that runs on Android phones. It combines powerful terminal emulation with an extensive Linux package collection. Download it from the Play Store and install it on your phone. Update and install packages: Use…