Category: Python

  • Using DynamoDB local on Google Colab

    Recently, I set up a process on a Raspberry Pi server and configured it to run in the background and listen for events from the Companies House streaming API. It has been running without any issues. I want to use DynamoDB local for storing event data from the streaming API. Steps for deploying DynamoDB local Create a…

  • Companies House Stream API

    The Companies House streaming API gives you access to real-time changes to the information held at Companies House. It pushes data changes to your client. You need API credentials to use the Companies House Stream API. UPDATE: 10/06/2023 We look into the persons with significant control API. To access streaming data using Python. End points…

  • Processing GLEIF data in JSON format

    UPDATE – 13/04/2024 – There is a workaround which involves updating the JSON file using Linux tools such as sed and awk. The file contains an array of JSON objects. The issue is that Apache Spark reads the whole contents instead of processing each JSON object. Problem Get dataset and unzip the dataset i.e. after…

  • Working with Companies House snapshot data

    Get a snapshot of the latest live (excluding dissolved companies) basic company data from http://download.companieshouse.gov.uk/en_output.html. The latest file (http://download.companieshouse.gov.uk/BasicCompanyDataAsOneFile-2022-01-01.zip) contains over 5 million records. It’s a CSV file; however, there are records (22 companies) which contain commas and quotes in the data. You might have to do extra work to parse those records correctly. It is…

  • How to get Companies House data using REST API

    Companies House provides a REST API, which lets you retrieve information about limited companies in the UK. In this article, we will use a Python script to get data from Companies House using the REST API. Prerequisites We need to register with Companies House to access the REST API. Follow the steps below: Create a developer account Create an application…