Author: broadoakdata
-
Contact Form using HTML, JavaScript, CSS, PHP and MySQL
Creating a contact form using HTML and saving the data in MySQL. Covers: prerequisites, creating a table in the MySQL database, the HTML form, MySQL connection settings, processing the form data, and screenshots.
-
Import data from Google Sheets to a MySQL table
Saving data from Google Sheets to a MySQL table using Apps Script. Covers: prerequisites, the MySQL table, the Apps Script to insert data into MySQL, and the data in Google Sheets.
-
Companies House Stream API
The Companies House streaming API gives you access to real-time changes to the information held at Companies House; it pushes data changes to your client. You need API credentials to use the Companies House Stream API. UPDATE 10/06/2023: we look into the persons with significant control API. Accessing streaming data using Python. Endpoints…
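As a rough illustration of the Python approach mentioned above, here is a minimal sketch of reading one of the streams with the `requests` library; the endpoint URL and the basic-auth scheme (API key as username, blank password) are assumptions based on the public Companies House documentation, and the key itself is a placeholder.

```python
import json
import requests

API_KEY = "your-api-key"  # hypothetical placeholder for your stream API key
STREAM_URL = "https://stream.companieshouse.gov.uk/persons-with-significant-control"

# The stream API is assumed to use HTTP basic auth: API key as username, empty password.
with requests.get(STREAM_URL, auth=(API_KEY, ""), stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:            # blank heartbeat lines keep the connection alive
            continue
        event = json.loads(line)
        print(event.get("resource_kind"), event.get("resource_uri"))
```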
-
Upgrading PHP 7.4 to 8.2 on Raspberry Pi
The latest version of WordPress requires PHP 8.0+. Install PHP 8.2 and common modules.
-
Processing UK Companies House PSC Data
We will look into the people with significant control (PSC) snapshot data, which can be downloaded from the Companies House website. The snapshot data file is provided in JSON format and can be downloaded as a single file or as multiple files for ease of downloading. We will use the single file in Google Colab to carry out…
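A minimal sketch of loading the snapshot in a Colab notebook with PySpark, assuming the file has already been downloaded and unzipped and that each line of the file holds one JSON record; the file name and the selected field names are assumptions, not taken from the article.

```python
from pyspark.sql import SparkSession

# In Colab, PySpark may first need installing, e.g. `pip install pyspark`.
spark = SparkSession.builder.appName("psc-snapshot").getOrCreate()

# Placeholder file name for the unzipped snapshot; one JSON record per line is assumed.
psc = spark.read.json("persons-with-significant-control-snapshot.txt")

psc.printSchema()
# The field names below are assumptions about the snapshot schema, for illustration only.
psc.select("company_number", "data.kind", "data.name").show(10, truncate=False)
```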
-
Synthetic data creation using mimesis and PySpark
In the previous article we described how to generate synthetic data using Hive SQL. In this one we will use the Python package mimesis together with PySpark to generate synthetic data. Covers: prerequisites for generating synthetic data and creating a DataFrame with 10 rows. For example, we will create a DataFrame with 10 rows. See the code snippets below: Synthetic data…
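A minimal sketch of the idea, assuming the `mimesis` package and a local Spark session; the column names are illustrative rather than taken from the article.

```python
from mimesis import Address, Person
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("synthetic-data").getOrCreate()

person = Person()    # default (English) locale
address = Address()

# Generate 10 synthetic rows; the column names here are illustrative only.
rows = [(i, person.full_name(), person.email(), address.city()) for i in range(1, 11)]
df = spark.createDataFrame(rows, ["id", "full_name", "email", "city"])
df.show(truncate=False)
```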
-
Processing Charity Data using Google Colab
Data related to registered charities in England and Wales can be downloaded from https://register-of-charities.charitycommission.gov.uk/register/full-register-download. Charity data and running PySpark under Google Colab: we will use Google Colab to download publicly available data from the Charity Commission website. Transformed or enriched data will be saved in Google BigQuery. Please read the data definitions before ingesting and carrying out exploratory…
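A minimal sketch of this kind of Colab workflow; the download URL, file names and BigQuery table are placeholders, and writing to BigQuery assumes the spark-bigquery connector is available.

```python
import urllib.request
import zipfile

from pyspark.sql import SparkSession

# Placeholder URL and file names for one of the Charity Commission extract files.
EXTRACT_URL = "https://example.com/publicextract.charity.zip"
urllib.request.urlretrieve(EXTRACT_URL, "charity.zip")
with zipfile.ZipFile("charity.zip") as zf:
    zf.extractall("charity_data")

spark = SparkSession.builder.appName("charity-data").getOrCreate()
charities = spark.read.json("charity_data")   # this sketch assumes a JSON extract
charities.printSchema()

# Writing to BigQuery needs the spark-bigquery connector; the names below are assumptions.
# charities.write.format("bigquery") \
#     .option("table", "my_project.charity.charities") \
#     .option("temporaryGcsBucket", "my-temp-bucket") \
#     .save()
```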
-
Reading GLEIF data in XML format and storing it in BigQuery
In the previous article, Processing GLEIF data in JSON format, we described how to ingest data into a Databricks (Community Edition) data lake using PySpark. However, we were unable to process the GLEIF Golden Copy JSON file due to a memory issue and the complexity of the nested JSON objects. The input file size (after unzipping the file size was…
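A minimal sketch of the XML route, assuming the spark-xml package and the spark-bigquery connector; the row tag, file name, package versions and table name are assumptions rather than values from the article.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gleif-xml")
    # spark-xml and the BigQuery connector must be on the classpath; the
    # package coordinates and versions here are illustrative.
    .config(
        "spark.jars.packages",
        "com.databricks:spark-xml_2.12:0.17.0,"
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1",
    )
    .getOrCreate()
)

lei_records = (
    spark.read.format("xml")
    .option("rowTag", "lei:LEIRecord")     # assumed row tag for the LEI records
    .load("gleif-goldencopy-lei2.xml")     # placeholder file name
)

lei_records.write.format("bigquery") \
    .option("table", "my_project.gleif.lei_records") \
    .option("temporaryGcsBucket", "my-temp-bucket") \
    .save()
```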
-
Processing GLEIF data in JSON format
UPDATE 13/04/2024: there is a workaround which involves updating the JSON file using Linux tools such as sed and awk. The file contains an array of JSON objects; the issue is that Apache Spark reads the whole contents instead of processing each JSON object. Problem: get the dataset and unzip it, i.e. after…
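The article's workaround uses sed and awk; as an alternative sketch in Python, the `ijson` package can stream the array one element at a time and rewrite it as newline-delimited JSON, which Spark can then read record by record. File names are placeholders; the top-level array structure is taken from the article's description.

```python
import json
import ijson

# Stream the large JSON array one record at a time and write newline-delimited JSON.
with open("gleif-goldencopy-lei2.json", "rb") as src, \
     open("gleif-goldencopy-lei2.ndjson", "w") as dst:
    # use_float avoids Decimal values that the json module cannot serialise.
    for record in ijson.items(src, "item", use_float=True):
        dst.write(json.dumps(record) + "\n")

# PySpark can then read it line by line:
# spark.read.json("gleif-goldencopy-lei2.ndjson")
```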