Components

Overview of Pentaho Pro components.

Pentaho Client / Server Architecture

Pentaho's client/server architecture forms the basis of its data integration and business analytics suite, providing a flexible and scalable platform for enterprise data management and analysis. The architecture is designed to support various data integration, reporting, and analytics needs across an organization.

Client / Server

Port Number    Description
5432           PostgreSQL Server
8080           Pentaho Server Tomcat Web Server Startup Port
8012           Pentaho Server Shutdown Port
9001           HSQL Server Port
9092           Embedded H2 Database
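
To confirm which of the default ports listed above are reachable on a given machine, a small TCP check can help. The following is a minimal sketch, assuming the services use the default port numbers from the table and run on `localhost`; the function names `is_port_open` and `check_ports` are illustrative and are not part of Pentaho.

```python
import socket

# Default ports, copied from the port table in this section.
DEFAULT_PORTS = {
    5432: "PostgreSQL Server",
    8080: "Pentaho Server Tomcat Web Server Startup Port",
    8012: "Pentaho Server Shutdown Port",
    9001: "HSQL Server Port",
    9092: "Embedded H2 Database",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports=DEFAULT_PORTS) -> dict:
    """Map each port number to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port) for port in ports}


if __name__ == "__main__":
    # "localhost" is an assumption; substitute the machine that runs your server.
    for port, is_open in check_ports("localhost").items():
        state = "open" if is_open else "closed"
        print(f"{port:>5}  {state:<6}  {DEFAULT_PORTS[port]}")
```

An open port only shows that something is listening there; it does not prove that the listener is the service named in the table.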

Key components include:

Pentaho Client Tools

The Pentaho Client, a key component of the Pentaho suite, encompasses several user-facing tools for data management and analytics. These include Pentaho Data Integration (PDI), which handles extract, transform, and load (ETL) operations; Spoon, the graphical user interface for designing ETL processes; Designer, for drag-and-drop pipeline design; Scheduler, which connects to Quartz for job scheduling; Repository Browser, for managing ETL assets; and Database Explorer, for database operations.

The client also offers tools such as Metadata Editor and Schema Workbench for advanced data manipulation. Together, these tools let users process and analyze data efficiently within the Pentaho ecosystem.

Data Integration

Pentaho Data Integration (PDI), also known as Kettle, is an open-source data integration tool that extracts, transforms, and loads (ETL) data into databases, data warehouses, and business applications. It handles a wide variety of data sources, including traditional relational databases, unstructured data formats, and cloud-based storage. PDI is composed of several key components that work together to provide a comprehensive ETL solution.
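
For readers new to ETL, the three stages can be illustrated outside PDI. The following is a conceptual sketch in plain Python, not PDI's API: the sample data, the `sales` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from a file, database, or API.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: trim and normalise names, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append(
            (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Load: write the cleaned rows to the target table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory target, for illustration only
    loaded = load(transform(extract(RAW_CSV)), connection)
    print(f"loaded {loaded} rows")
```

In PDI the same stages are typically assembled visually as steps in a transformation rather than written by hand.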


Spoon

Spoon is the graphical user interface (GUI) for designing and testing PDI jobs and transformations. It allows users to visually create, edit, and manage ETL processes without writing code.

Designer

Drag and drop objects to design your pipelines and workflows.

Scheduler

Connects to the Quartz scheduler on the server. Jobs and transformations must be uploaded to the repository before they can be scheduled.

Repository Browser

The repository is a central storage area for PDI resources such as jobs, transformations, and database connections. It facilitates collaboration among team members by allowing them to share and manage ETL assets efficiently.

These components collectively make PDI a powerful tool for data integration, enabling businesses to cleanse, integrate, and analyze data from diverse sources more effectively.

The Repository Browser connects to the Apache Jackrabbit content repository, which is backed by one of the following supported databases:

  • PostgreSQL

  • Microsoft SQL Server

  • Oracle

  • MySQL

  • MariaDB
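
Each of the databases listed above is normally reached through a JDBC connection URL. The following is a minimal sketch that builds URLs in the commonly used form for each database family, using each vendor's usual default port. The helper name `jdbc_url`, the host `localhost`, and the database name `jackrabbit` are assumptions for illustration; check your own installation for the actual values.

```python
# Commonly used JDBC URL shapes and default ports for each database family.
# For Oracle, the {database} placeholder is the service name.
JDBC_TEMPLATES = {
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", 1521),
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "mariadb": ("jdbc:mariadb://{host}:{port}/{database}", 3306),
}


def jdbc_url(kind: str, host: str, database: str, port=None) -> str:
    """Build a JDBC URL for one of the supported database families."""
    try:
        template, default_port = JDBC_TEMPLATES[kind]
    except KeyError:
        raise ValueError(f"unsupported database kind: {kind!r}") from None
    # Fall back to the vendor's usual default port when none is given.
    return template.format(host=host, port=port or default_port, database=database)


if __name__ == "__main__":
    # Hypothetical host and database name, for illustration only.
    for kind in JDBC_TEMPLATES:
        print(f"{kind:<11} {jdbc_url(kind, 'localhost', 'jackrabbit')}")
```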

DB Explorer

The Database Explorer lets you perform basic database operations.
