Category: Python
-
Using DynamoDB local on Google Colab
Recently, I set up a process on a Raspberry Pi server and configured it to run in the background and listen for events from the Companies House streaming API. It has been running without any issues. I want to use DynamoDB local for storing event data from the streaming API. Steps for deploying DynamoDB local Create a…
-
Companies House Stream API
The Companies House streaming API gives you access to real-time changes to the information held at Companies House. It pushes data changes to your client. You need API credentials to use the Companies House streaming API. UPDATE: 10/06/2023 We look into the persons with significant control API. To access streaming data using Python. End points…
-
Processing GLEIF data in JSON format
UPDATE: 13/04/2024. There is a workaround that involves updating the JSON file using Linux tools such as sed and awk. The file contains an array of JSON objects. The issue is that Apache Spark reads the whole contents instead of processing each JSON object. Problem Get dataset and unzip the dataset i.e. after…
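The excerpt above mentions sed and awk as the workaround. As an alternative sketch in Python (not the method used in the post), a file holding one JSON array can be rewritten as JSON Lines, one object per line, which line-oriented readers can then process record by record. The function name and file paths here are hypothetical, and this version loads the whole array into memory, so it only suits files that fit in RAM.

```python
import json


def json_array_to_lines(src_path, dst_path):
    """Rewrite a file containing one JSON array as JSON Lines.

    Hypothetical helper for illustration: returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # loads the entire array into memory

    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # One compact JSON object per output line
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")

    return len(records)
```

For very large files, a streaming parser or the command-line tools mentioned in the excerpt may be a better fit than loading everything at once.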
-
Working with Companies House snapshot data
Get a snapshot of the latest live basic company data (excluding dissolved companies) from http://download.companieshouse.gov.uk/en_output.html. The latest file (http://download.companieshouse.gov.uk/BasicCompanyDataAsOneFile-2022-01-01.zip) contains over 5 million records. It is a CSV file; however, 22 company records contain commas and quotes in the data, so you might have to do extra work to parse those records correctly. It is…
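As a minimal sketch of the parsing concern mentioned above: Python's built-in csv module handles commas and doubled quotes inside quoted fields, provided the records follow standard CSV quoting. The column names and the sample row below are made up for illustration and are not taken from the real file; records with malformed quoting would still need custom handling.

```python
import csv
import io


def parse_company_csv(text_stream):
    """Parse CSV text into a list of dicts keyed by the header row.

    Hypothetical helper: relies on standard CSV quoting rules.
    """
    return list(csv.DictReader(text_stream))


# Invented sample: the company name contains both a comma and quotes.
sample = (
    "CompanyName,CompanyNumber\n"
    '"ACME, ""THE BEST"" LTD",01234567\n'
)

rows = parse_company_csv(io.StringIO(sample))
print(rows[0]["CompanyName"])
```

For the real download, the same function could be given a file object opened with `newline=""`, as the csv module documentation recommends.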
-
How to get Companies House data using REST API
Companies House provides a REST API that lets you retrieve information about limited companies in the UK. In this article, we will use a Python script to get Companies House data through the REST API. Prerequisites We need to register with Companies House to access the REST API. Follow the steps below: Create a developer account Create an application…