Building Open Source ETL Solutions with Pentaho Data Integration
By Matt Casters, Roland Bouman, Jos van Dongen
Final Release Date: August 2010
The ultimate resource on building and deploying data integration solutions with Kettle
Kettle is a scalable and extensible open source ETL and data integration tool that lets you extract data from databases, flat and XML files, web services, ERP systems, and OLAP cubes. It provides over 120 built-in transformation steps to validate, cleanse, and conform data, as well as numerous options to load data into data warehouses and many other targets. Kettle is a comprehensive, low-cost alternative to traditional data integration tools like Informatica PowerCenter, IBM InfoSphere DataStage, and BusinessObjects Data Integrator.
This book explains in detail how to use Kettle to create, test, and deploy your own ETL and data integration solutions. You'll learn to use Kettle's programs to create transformations and jobs, use version control, audit data, and schedule your ETL solution. Then you'll progress to more advanced concepts such as clustering and cloud computing, real-time data integration, loading a Data Vault model, and extending Kettle by building your own plugins. In addition, you'll find hands-on examples and case studies that show exactly how to put Kettle's features into practice.
Explore the components of the Kettle ETL toolset
Discover how to install and configure Kettle and connect it to various data sources and targets
Design and build every aspect of an ETL solution using Kettle
Learn how to load a data warehouse with Kettle
Understand the steps for deploying and scheduling ETL solutions
Gain the skills to integrate Kettle with third-party products
Learn to extend Kettle and build your own plugins
Use clustering and cloud computing to scale and improve the performance of your Kettle ETL solutions
Find out how to use Kettle for real-time data integration