Author: broadoakdata

  • Running PostgreSQL database in Termux

    UPDATE – 06/01/2024: added a few useful commands. UPDATE – 10/03/2023: added bytes JSON data using the dataset and psycopg2-binary Python packages. UPDATE – 25/02/2023: after a database update, PostgreSQL may not start and might throw error messages. In this article, we will show how to run a PostgreSQL database in Termux on an Android device. Create skeleton…

  • Change Data Capture Implementation using PySpark

    In this article, we will describe an approach to implementing Change Data Capture using PySpark. All the functions are included in the example, together with test data. This is an updated version of Change data capture ETL pipelines. UPDATE – 10/06/2023: using HIVE SQL to implement find_changes will take less time than processing the dataframe using PySpark.…

  • Saving data into BigQuery

    In this article, we will show how to export data to BigQuery. You may use Databricks or Google Colab to write a PySpark ETL script for saving data into BigQuery. We will use Databricks and Google Cloud Platform for saving data into BigQuery. Prerequisites: Set up an account to use Databricks Community Edition. Google Cloud Platform –…

  • Handling errors and warnings in PySpark

    In this article, we will describe how errors and warnings in PySpark can be handled using a dataframe. We will use Companies House data to implement methods for handling errors and warnings. The basic ask is to write errors and warnings out to a Hive table, or to an error or warning file. Requirements…

  • Generating synthetic data using HIVE SQL

    UPDATE – 07/10/2022: using sparkContext.range(). You can use the sparkContext.range() function to generate rows and then use withColumn to add variables to a dataframe. It generates a column named 'id'. If you do not need it, drop it using df.drop('id'). In this article, we will show a way of generating synthetic data using HIVE SQL. You can…

  • Data Catalogs

    Data Catalogs and Datasets. Datasets are the files and tables that data workers need to find and access. They may reside in a data lake, warehouse, master data repository, or any other shared data resource. Data Catalogs help organise all of your metadata in one place. It is combined with data management and search tools…

  • Companies House Data – Free and Paid (Part 3 of 3)

    What it does not include (only available in XML / Application Programming Interface, API): Registered office address, Company profile, Search, Officers, Registers, Charges, Filing history, Insolvency, Exemptions, Officer disqualifications, Officer appointments, UK Establishments, Persons with significant control (PSC). For programming technical officers, we provide a link to the how-to: https://developer-specs.company-information.service.gov.uk/companies-house-public-data-api/reference Web address, Directors Details,…

  • Running Jupyter Notebook on Android phone

    Running Jupyter Notebook on an Android phone is doable. We will show you the steps to run Jupyter Notebook on your Android phone. Steps to follow: You have Termux installed. Open Termux, then run the following commands at the prompt: $ apt install clang python fftw libzmq freetype libpng pkg-config libcrypt $ LDFLAGS="-lm -lcompiler_rt" pip install…

  • Companies House Data Free and Paid (Pt. 2 of 3)

    2. Accounts Data product. In the same way that Company Basic information is provided in downloadable ZIP files from Companies House, the accounts data is free and can be downloaded in this format, as well as in other, more machine-readable options for digital transformation purposes. The data type is iXBRL, or inline eXtensible Business Reporting Language, and…

  • Companies House Data – Free and Paid (1/3)

    Companies House data is divided into: A) basic company information as a free service, or B) complex (paid) access with a monthly fee for more demanding, regular (automated) calls upon the entire data set. Companies House provide great, easy-to-read documentation for developers where option B is applicable. Companies House provide the data in various…