Components
Overview of Pentaho Pro components.
Pentaho's client/server architecture forms the basis of its data integration and business analytics suite, providing a flexible and scalable platform for enterprise data management and analysis. The architecture is designed to support various data integration, reporting, and analytics needs across an organization.
Port    Service
5432    PostgreSQL Server
8080    Pentaho Server Tomcat Web Server Startup Port
8012    Pentaho Server Shutdown Port
9001    HSQL Server Port
9092    Embedded H2 Database
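As a quick way to see which of the ports listed above are in use on a given machine, the following minimal sketch attempts a TCP connection to each one. The port numbers and descriptions come from the list above; the names `PENTAHO_PORTS`, `is_port_open`, and `check_ports`, and the default host `localhost`, are illustrative assumptions and not part of any Pentaho tooling.

```python
import socket

# Ports and descriptions as listed above.
PENTAHO_PORTS = {
    5432: "PostgreSQL Server",
    8080: "Pentaho Server Tomcat Web Server Startup Port",
    8012: "Pentaho Server Shutdown Port",
    9001: "HSQL Server Port",
    9092: "Embedded H2 Database",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str = "localhost") -> dict:
    """Map each listed port to whether something accepts connections on it."""
    return {port: is_port_open(host, port) for port in PENTAHO_PORTS}


if __name__ == "__main__":
    for port, is_open in check_ports().items():
        state = "open" if is_open else "closed"
        print(f"{port:<6}{PENTAHO_PORTS[port]:<48}{state}")
```

An open port only shows that some process is listening there; it does not confirm that the process is the service named in the list.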
Key components include:
The Pentaho Client encompasses several user-facing tools for data management and analytics. These include the Data Integration tool (PDI), which is central to extract, transform, and load (ETL) operations; Spoon, a graphical user interface for designing ETL processes; Designer for drag-and-drop pipeline design; Scheduler, linked to Quartz, for job scheduling; Repository Browser for managing ETL assets; and Database Explorer for database operations.
Additionally, it offers tools like Metadata Editor and Schema Workbench for advanced data manipulation. Together, these tools empower users to efficiently process and analyze data within the Pentaho ecosystem.
Pentaho Data Integration (PDI), also known as Kettle, is an open-source data integration tool that extracts, transforms, and loads (ETL) data into databases, data warehouses, and business applications. It handles a wide variety of data sources, including traditional relational databases, unstructured data formats, and cloud-based storage. PDI is composed of several key components that work together to provide a comprehensive ETL solution.
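To make the extract, transform, and load stages concrete, here is a minimal sketch of the pattern in plain Python. It does not use PDI or any Pentaho API; the sample data, the `sales` table, and the function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real source would be a file, database, or API.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,
3,Carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and title-case names, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run all three stages and return what ended up in the target table."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

In PDI the same three stages are assembled visually as steps in a transformation instead of being written as functions.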
Spoon is the graphical user interface (GUI) for designing and testing PDI jobs and transformations. It allows users to visually create, edit, and manage ETL processes without writing code.
Designer: Drag and drop 'objects' to design your pipelines and workflows.
Scheduler: Connects to the Quartz scheduler on the server. Jobs and transformations must be uploaded to the repository before they can be scheduled.
The repository is a central storage area for PDI resources such as jobs, transformations, and database connections. It facilitates collaboration among team members by allowing them to share and manage ETL assets efficiently.
These components collectively make PDI a powerful tool for data integration, enabling businesses to cleanse, integrate, and analyze data from diverse sources more effectively.
Repository Browser: Connects to the Apache Jackrabbit content repository, which points to one of the supported databases:
PostgreSQL
MSSQL Server
Oracle
MySQL
MariaDB
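For orientation, the sketch below builds a conventional JDBC connection URL for each database in the list above. The URL shapes and default ports follow the common conventions of each vendor's JDBC driver; the host `localhost`, the database name `jackrabbit`, and the helper `jdbc_url` are assumptions for illustration and are not taken from this page or from Pentaho's configuration files.

```python
# Conventional JDBC URL shapes and default ports for each listed database.
# Host and database name are placeholders, not values confirmed by this page.
JDBC_TEMPLATES = {
    "PostgreSQL": ("jdbc:postgresql://{host}:{port}/{db}", 5432),
    "MSSQL Server": ("jdbc:sqlserver://{host}:{port};databaseName={db}", 1433),
    # For Oracle, {db} is the service name in the thin-driver URL form.
    "Oracle": ("jdbc:oracle:thin:@//{host}:{port}/{db}", 1521),
    "MySQL": ("jdbc:mysql://{host}:{port}/{db}", 3306),
    "MariaDB": ("jdbc:mariadb://{host}:{port}/{db}", 3306),
}


def jdbc_url(database, host="localhost", db="jackrabbit", port=None):
    """Build a JDBC URL for one of the listed databases (hypothetical helper)."""
    template, default_port = JDBC_TEMPLATES[database]
    return template.format(host=host, port=port or default_port, db=db)


if __name__ == "__main__":
    for name in JDBC_TEMPLATES:
        print(f"{name:<14}{jdbc_url(name)}")
```

The actual connection settings for a given installation should be taken from its own configuration, since ports and database names are often changed from these conventions.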
Database Explorer: Enables you to conduct minimal database operations.