Tag: kafka
-
Data validation check in streaming data
In this article, I will share my approach (see my previous article for further information on data validation) for carrying out data validation checks on streaming data from a Kafka topic. Checking data is very important, yet this task is often overlooked. One cannot trust source data, especially if you are ingesting data for…
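To illustrate what a per-message validation check can look like, here is a minimal sketch in plain Python. It is not the method from the article. It assumes each Kafka message value is a JSON object. The schema (field names and types) is hypothetical and only for illustration. Each raw value is parsed, required fields are checked for presence, exact type and non-empty strings, and messages are then split into valid records and rejects with reasons.

```python
import json

# Hypothetical schema for illustration only: field name -> required Python type.
SCHEMA = {
    "company_number": str,
    "company_name": str,
    "incorporation_year": int,
}


def validate_message(raw):
    """Return a list of validation errors for one Kafka message value (empty list = valid)."""
    try:
        record = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["message is not a JSON object"]

    errors = []
    for field, expected_type in SCHEMA.items():
        value = record.get(field)
        if value is None:
            errors.append(f"missing or null field: {field}")
        elif type(value) is not expected_type:
            # Exact type check, so that e.g. True is not accepted as an int.
            errors.append(f"wrong type for {field}: {type(value).__name__}")
        elif isinstance(value, str) and not value.strip():
            errors.append(f"empty field: {field}")
    return errors


def partition_messages(messages):
    """Split raw message values into valid records and rejected messages with reasons."""
    valid, rejected = [], []
    for raw in messages:
        errors = validate_message(raw)
        if errors:
            rejected.append({"raw": raw, "errors": errors})
        else:
            valid.append(json.loads(raw))
    return valid, rejected
```

In a consumer loop (for example with the kafka-python library, an assumption here), you would call validate_message(msg.value) on each record. Rejected messages could then be routed to a separate "dead letter" topic instead of being silently dropped.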
-
Using Apache Flink to process data from Kafka and MySQL database
I need to use Apache Flink to process data stored in Kafka and MySQL. In my previous article I shared my notes on how to use a free MySQL server instance (db4free.net) for development work. Apache Flink is a good processing engine and has nice features for manipulating data using batch and/or streaming processing. I…
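One common way to combine the two sources in PyFlink is to register the Kafka topic and the MySQL table as Flink SQL tables and then join them. Below is a minimal sketch of that idea, not the article's own code. Every table, column, topic, database and credential name is a hypothetical placeholder. It also assumes the Flink Kafka and JDBC connector JARs, plus the MySQL JDBC driver, are on Flink's classpath.

```python
# Minimal PyFlink Table API sketch: register a Kafka topic and a MySQL table
# as Flink SQL tables, then join them. All names and credentials below are
# hypothetical placeholders.

KAFKA_DDL = """
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    amount DOUBLE
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'flink-demo',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
)
"""

MYSQL_DDL = """
CREATE TABLE customers (
    customer_id INT,
    name STRING,
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://db4free.net:3306/your_database',
    'table-name' = 'customers',
    'username' = 'your_user',
    'password' = 'your_password'
)
"""

JOIN_QUERY = """
SELECT o.order_id, c.name, o.amount
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
"""


def build_join(t_env):
    """Register both source tables in the given TableEnvironment and return the joined Table."""
    t_env.execute_sql(KAFKA_DDL)
    t_env.execute_sql(MYSQL_DDL)
    return t_env.sql_query(JOIN_QUERY)
```

Usage might look like this: create t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode()) from pyflink.table, then call build_join(t_env).execute().print(). A regular join keeps both inputs in state. For a large, slowly changing MySQL table, Flink's lookup join (FOR SYSTEM_TIME AS OF) is usually a better fit.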
-
Read data from and write data to Kafka using PySpark
I use Google Colab for my development work. I have set up a Kafka server on my local machine (Raspberry Pi). The task in hand is to use PySpark to read and write data. You can use either a batch or a streaming query. Spark has extensive documentation on its website under the heading – Structured Streaming + Kafka Integration…
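As a rough illustration of the batch variant, the two helpers below read a topic into a DataFrame and write one back using Spark's built-in "kafka" source and sink. This is a sketch, not the article's code. The broker address and topic names are placeholders you pass in. It also assumes the spark-sql-kafka-0-10 package is available to your Spark session.

```python
# Minimal sketch: batch read from and write to Kafka with PySpark.
# Assumes the spark-sql-kafka-0-10 package is on the Spark classpath;
# broker address and topic names are supplied by the caller.


def read_kafka_batch(spark, bootstrap_servers, topic):
    """Read all currently available records from a Kafka topic as a DataFrame."""
    return (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .option("endingOffsets", "latest")
        .load()
        # Kafka keys and values arrive as binary; cast them to strings.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )


def write_kafka_batch(df, bootstrap_servers, topic):
    """Write a DataFrame with string 'key' and 'value' columns to a Kafka topic."""
    (
        df.selectExpr("key", "value")
        .write.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .save()
    )
```

For a streaming query, the same options apply with spark.readStream and df.writeStream instead. The write side then also needs a checkpointLocation option and a call to .start() rather than .save().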
-
Streaming data processing using Apache Kafka and Flink
We will use Apache Kafka and Apache Flink to process data from the Companies House Stream API. First, we will set up Apache Kafka and Flink on the Google Colab platform. You can download the script that installs both tools from here – https://gitlab.com/akkasali910/companieshouse-data . Or you can use the following bash code: After running the above…
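Before Flink can process anything, the stream events have to land in a Kafka topic. The helper below is a minimal sketch of that step, not the article's code. It assumes the Stream API delivers one JSON event per line and sends blank lines as keep-alive heartbeats. It turns each event into a (key, value) byte pair for a Kafka producer. Keying by resource_id is a hypothetical choice made for illustration.

```python
import json


def events_from_stream(lines):
    """Turn a line-oriented JSON event stream into (key, value) byte pairs for Kafka.

    Assumptions: one JSON event per line, and blank lines are keep-alive heartbeats.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # heartbeat: nothing to forward
        event = json.loads(line)
        # Hypothetical keying choice: key by resource_id so that events for the
        # same company go to the same Kafka partition (and keep their order).
        key = str(event.get("resource_id", ""))
        yield key.encode("utf-8"), json.dumps(event).encode("utf-8")
```

With the requests and kafka-python libraries, this could be wired up roughly as follows. First open resp = requests.get(stream_url, auth=(stream_key, ""), stream=True). Then call producer.send("companies", key=key, value=value) for each pair from events_from_stream(resp.iter_lines()). Here stream_url, stream_key and the companies topic are placeholders. Passing the stream key as the basic-auth username is an assumption.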