Category: PDI

  • Installing jdk 1.8 on Raspberry Pi 2

    The embedded Pentaho engine requires Java 8. Follow the steps below to install JDK 1.8 on a Raspberry Pi: get JDK 1.8 for Linux ARM v6/v7 Hard Float ABI from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html sudo tar zxvf jdk-8u271-linux-arm32-vfp-hflt.tar.gz -C /opt Set the default java and javac to the newly installed JDK 8. sudo update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_271/bin/javac 1 sudo update-alternatives --install /usr/bin/java…
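
    A minimal sketch of those steps, assuming the jdk-8u271 tarball named in the excerpt (the second update-alternatives line and the verification step follow the same pattern and are not spelled out in the excerpt):

        # extract the JDK tarball into /opt (archive name taken from the excerpt)
        sudo tar zxvf jdk-8u271-linux-arm32-vfp-hflt.tar.gz -C /opt

        # register the new JDK with update-alternatives and make it the default
        sudo update-alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_271/bin/javac 1
        sudo update-alternatives --install /usr/bin/java java /opt/jdk1.8.0_271/bin/java 1
        sudo update-alternatives --config java

        # verify the active version
        java -version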

  • Embedded PDI and use of Avro format file

    As you may recall from the last article (Embedded PDI and big-data-plugin), it is difficult to configure the big-data-plugin for use from the embedded PDI engine. However, you can use the deprecated big-data-plugin steps without any problem. Avro is a data format that bundles serialized data with the data’s schema in the same file. Avro is the preferred format…
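
    As a quick illustration of the schema being bundled with the data, the Apache avro-tools jar can read both back out of a single file (the file name below is hypothetical, and avro-tools.jar stands for whichever avro-tools release you have downloaded):

        # print the schema embedded in an Avro file
        java -jar avro-tools.jar getschema output.avro

        # dump the serialized records as JSON for a quick inspection
        java -jar avro-tools.jar tojson output.avro | head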

  • Embedded PDI and big-data-plugin

    How do you make the pentaho-big-data-plugin available to your application? From version 6, PDI has added support for Karaf, an OSGi platform that allows developers to create dynamically bound modules which can be added to and removed from a running platform without a restart. This feature makes it difficult to use…

  • Deploy Embedded PDI in a Kubernetes cluster

    Suppose you have a Kubernetes cluster up and running. You want to deploy a Docker container (https://hub.docker.com/repository/docker/aliakkas/karatepdi). KaratePDI is a Spring Boot REST application for automating ETL testing using the Behaviour Driven Development framework Karate and the embedded Pentaho Data Integration engine. You can use it to execute a PDI job/transformation or run a Karate feature file to produce…
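
    A minimal sketch of deploying that image with kubectl, assuming the container listens on the default Spring Boot port 8080 (the deployment and service names are illustrative, not from the article):

        # create a deployment from the published image
        kubectl create deployment karatepdi --image=aliakkas/karatepdi

        # expose it inside the cluster on port 8080
        kubectl expose deployment karatepdi --port=8080

        # check that the pod comes up
        kubectl get pods -l app=karatepdi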

  • Monitor Kubernetes cluster with Prometheus and Grafana

    Suppose you have a Kubernetes cluster up and running. How do you monitor it? The obvious choice is to use Prometheus (to gather cluster metrics) and Grafana (to visualise the metrics in a dashboard). Prerequisites: a working Kubernetes cluster, the kubectl and helm commands installed and configured locally, and RBAC authorisation already set up. Setting up…
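
    The article may well use different charts, but one common way to install both with Helm is the kube-prometheus-stack chart; the release and namespace names below are illustrative:

        # add the community chart repository
        helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
        helm repo update

        # install Prometheus and Grafana together into their own namespace
        helm install monitoring prometheus-community/kube-prometheus-stack \
          --namespace monitoring --create-namespace

        # the Grafana admin password is stored in a secret named after the release
        kubectl get secret -n monitoring monitoring-grafana \
          -o jsonpath='{.data.admin-password}' | base64 -d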

  • How to copy BigQuery tables between locations with PDI

    Suppose you have accidentally created a dataset in the US region instead of the EU. Your dataset has a few tables with a large amount of data. How do you copy or move the data from one region to another? You could use the command line tool bq cp: unfortunately, the copy command does not support cross-region copies. You can…
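
    For illustration, the failing cross-region copy would look like this (the dataset and table names are hypothetical):

        # attempt to copy a table from a US dataset to an EU dataset
        bq cp us_dataset.my_table eu_dataset.my_table
        # BigQuery rejects this because the source and destination datasets
        # are in different locations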

  • Writing AVRO file using PDI

    Pentaho provides a number of Hadoop distributions (shims) which are included in the distribution. You need to enable a shim in order to write and read data in Avro or Parquet format: locate the pentaho-big-data-plugin and shims directory ${PENTAHO}/data-integration/plugins/pentaho-big-data-plugin, edit plugin.properties and set active.hadoop.configuration=hdp30, and cross-check the shim name under the pentaho-big-data-plugin/hadoop-configurations directory. You need the google-bigquery plugin and give…
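
    A minimal sketch of those steps on the command line, assuming ${PENTAHO} points at your Pentaho installation as in the excerpt:

        # open the big-data-plugin configuration directory
        cd ${PENTAHO}/data-integration/plugins/pentaho-big-data-plugin

        # set the active shim to hdp30, as in the excerpt
        sed -i 's/^active.hadoop.configuration=.*/active.hadoop.configuration=hdp30/' plugin.properties

        # cross-check that a matching shim directory exists
        ls hadoop-configurations/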

  • No joy with Google App Engine

    Deploying KaratePDI (a Spring Boot application) to Google App Engine was an expensive mistake for a number of reasons: unable to integrate Google Cloud Storage with the embedded Pentaho Data Integration engine without refactoring the code; Google App Engine charges by the hour per instance – you get charged even when not using the app! What a scam!!!!…

  • Copy data from local MySQL database to Google Cloud MySQL

    The main job performs two tasks: insert the latest records into the target database, and update the target database with delta records. PDI crashes if it tries to load more than a few million records in a single run. To overcome this limitation, the approach taken is to process the data in smaller chunks. How it works: get…
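
    A hypothetical sketch of the chunking idea using PDI’s kitchen.sh job runner; the job file and parameter names are made up for illustration and are not taken from the article:

        # process the source table in fixed-size chunks, one job run per chunk
        # (copy_mysql_chunk.kjb, ROW_OFFSET and ROW_LIMIT are hypothetical names)
        CHUNK=1000000
        for OFFSET in 0 1000000 2000000 3000000; do
          ./kitchen.sh -file=copy_mysql_chunk.kjb \
            "-param:ROW_OFFSET=${OFFSET}" "-param:ROW_LIMIT=${CHUNK}"
        done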

  • Access to Google Cloud SQL (MySQL) using DBeaver

    Prerequisites: get a Google Service Account private key file in JSON and set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the full path of the private key file. Driver settings: set up a new connection, select the MySQL 8+ database, and edit the driver settings. Add the following files to Libraries: mysql-socket-factory-connector-j-8-1.0.15-jar-with-dependencies.jar and mysql-connector-java-8.0.19.jar. Add new properties: socketFactory com.mysql.cj.protocol.StandardSocketFactory cloudSqlInstance google-project-name:region:instanceName
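
    For the prerequisites, setting the credentials variable from a shell looks like this (the key file path is hypothetical; DBeaver must be launched from an environment where the variable is visible):

        # point the Google client libraries at the service account key file
        export GOOGLE_APPLICATION_CREDENTIALS=/home/user/keys/my-service-account.json

        # start DBeaver from the same shell so it inherits the variable
        dbeaver &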