Introduction:
The “Get Data From XML” step can read data from three kinds of sources (files, a stream, and URLs) in two modes: files and URLs can be defined statically, or dynamically from a field of the input stream.
Options Available:
- Files Tab
- Content Tab
- Fields Tab
- Additional Output Fields
Make data easy with Helical Insight. Helical Insight is world's best open source business intelligence tool.
The “Get data from XML” step can be found in the Design tab of Pentaho Data Integration (Spoon).
Step Name: A name for the step, which must be unique within a single transformation.
Files Tab:
- XML source from field: Declares how the data will be read.
- XML source is defined in a field: The XML data itself is held in a field of the input stream.
- XML source is a filename: A field in the input stream holds file names, and those files are read.
- Read source as URL: A field in the input stream holds URLs, and those URLs are read.
- Get XML source from a field: The field from which the XML data, file name, or URL is read.
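The three ways of reading from a field can be sketched in Python with the standard library. This is an illustration only. read_xml_source() and its mode strings are hypothetical names, not PDI settings or APIs.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical helper: the mode strings below are illustrative, not PDI option names.
def read_xml_source(row, field, mode):
    """Return the root element for one input row.

    mode == "xml"      -> the field holds the XML text itself
    mode == "filename" -> the field holds a path to an XML file
    mode == "url"      -> the field holds a URL to fetch
    """
    value = row[field]
    if mode == "xml":
        return ET.fromstring(value)
    if mode == "filename":
        return ET.parse(value).getroot()
    if mode == "url":
        with urllib.request.urlopen(value) as resp:
            return ET.fromstring(resp.read())
    raise ValueError(f"unknown mode: {mode}")

# Usage: the XML arrives inside a field of the incoming row.
row = {"payload": "<orders><order id='1'/></orders>"}
root = read_xml_source(row, "payload", "xml")
print(root.tag, len(root))  # orders 1
```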
File or directory: Specifies the location and/or name of the input file.
Regular expression: Specifies the regular expression used to select files in the specified directory.
Exclude regular expression: Specifies the regular expression used to exclude files in the specified directory.
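File selection with an include pattern and an exclude pattern can be sketched as follows. select_files() is a hypothetical helper. The sketch also assumes that each pattern must match the whole file name; whether PDI matches whole names or partial ones is not stated here.

```python
import os
import re
import tempfile

def select_files(directory, include=None, exclude=None):
    """Return paths of regular files in `directory` whose names match `include`
    and do not match `exclude`. Assumption: each pattern must match the whole name."""
    selected = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        if include and not re.fullmatch(include, name):
            continue
        if exclude and re.fullmatch(exclude, name):
            continue
        selected.append(path)
    return selected

# Usage: build a throwaway directory and pick every .xml file except backups.
with tempfile.TemporaryDirectory() as d:
    for n in ["a.xml", "b.xml", "b_backup.xml", "notes.txt"]:
        open(os.path.join(d, n), "w").close()
    names = [os.path.basename(p) for p in select_files(d, r".*\.xml", r".*_backup\.xml")]

print(names)  # ['a.xml', 'b.xml']
```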
Selected files: Contains the list of selected files and, for each one, a property specifying whether the file is required. If a required file is not found, an error is generated; otherwise, the file name is skipped.
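The required/optional rule for the selected files reduces to a small check. The sketch below uses a hypothetical resolve_selected_files() that takes (path, required) pairs, which mirror the rows of the selected-files list.

```python
import os
import tempfile

def resolve_selected_files(entries):
    """entries: (path, required) pairs, like rows of the selected-files list.
    A missing required file raises an error; a missing optional file is skipped."""
    found = []
    for path, required in entries:
        if os.path.isfile(path):
            found.append(path)
        elif required:
            raise FileNotFoundError(f"Required file not found: {path}")
        # Missing but not required: skip the file name silently.
    return found

# Usage: one file that exists (required) and one that does not (optional).
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    existing = tmp.name

missing = os.path.join(tempfile.gettempdir(), "no_such_file_12345.xml")
result = resolve_selected_files([(existing, True), (missing, False)])
os.remove(existing)
print(result == [existing])  # True
```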
Sample XML file:
Getting XML file:
Content Tab:
Settings:
Loop XPath: For every “Loop XPath” location found in the XML file(s), one row of data is output. This is the main specification used to flatten the XML file(s). You can use the “Get XPath nodes” button to search for possible repeating nodes in the XML document. Note that this can take a while if the XML document is large.
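The Loop XPath idea (one output row per matched node) can be sketched with Python's standard xml.etree.ElementTree. This is an illustration, not PDI's implementation. ElementTree supports only a subset of XPath, so the loop path here is written relative to the root element (./order rather than /orders/order). The sample document, the flatten() helper, and the "@" prefix for attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = """<orders>
  <order id="1"><customer>Asha</customer><amount>120.50</amount></order>
  <order id="2"><customer>Ravi</customer><amount>75.00</amount></order>
</orders>"""

def flatten(xml_text, loop_path, fields):
    """Emit one row (dict) per node matched by loop_path.

    fields maps an output column name to a path relative to the loop node;
    a leading '@' reads an attribute instead of a child element's text.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iterfind(loop_path):
        row = {}
        for name, path in fields.items():
            if path.startswith("@"):
                row[name] = node.get(path[1:])
            else:
                row[name] = node.findtext(path)
        rows.append(row)
    return rows

rows = flatten(SAMPLE, "./order", {"id": "@id", "customer": "customer", "amount": "amount"})
for r in rows:
    print(r)
# {'id': '1', 'customer': 'Asha', 'amount': '120.50'}
# {'id': '2', 'customer': 'Ravi', 'amount': '75.00'}
```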
Encoding: The encoding of the XML files, used when none is specified in the XML documents themselves (yes, those still exist).
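Here is a minimal sketch of the encoding fallback. It assumes the setting applies only when a document has no encoding in its XML declaration. parse_with_fallback() is a hypothetical helper built on the encoding argument of ElementTree's XMLParser. That argument overrides whatever the document declares, so the helper passes it only when no declaration is found. A file that starts with a byte-order mark but has no declaration would still be forced to the fallback, which this sketch does not handle.

```python
import re
import xml.etree.ElementTree as ET

# Matches an encoding declared in the XML prolog, e.g. <?xml version="1.0" encoding="UTF-8"?>
_DECLARED = re.compile(rb'^\s*<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def parse_with_fallback(data: bytes, fallback: str = "ISO-8859-1"):
    """Parse raw bytes, applying `fallback` only when the document declares no encoding."""
    if _DECLARED.match(data):
        return ET.fromstring(data)  # the parser honours the declared encoding
    parser = ET.XMLParser(encoding=fallback)  # overrides the parser's UTF-8 default
    return ET.fromstring(data, parser=parser)

raw = "<name>Café</name>".encode("latin-1")  # no declaration, not valid UTF-8
print(parse_with_fallback(raw).text)  # Café

utf8 = '<?xml version="1.0" encoding="UTF-8"?><name>Café</name>'.encode("utf-8")
print(parse_with_fallback(utf8).text)  # Café
```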
Namespace aware: Check this to make the XML document namespace aware.
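The sketch below shows what namespace awareness changes when paths are matched. It uses ElementTree, which is always namespace-aware, on a small Atom-style document. Stripping the namespace from every tag is just one way to imitate a parser that is not namespace-aware. That step is an assumption for illustration, not how PDI implements the option.

```python
import xml.etree.ElementTree as ET

DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = ET.fromstring(DOC)

# Namespace-aware: tags carry the namespace URI, so a bare name finds nothing...
plain = root.findall("entry")
print(len(plain))  # 0

# ...until a prefix is mapped to that URI.
ns = {"a": "http://www.w3.org/2005/Atom"}
titles = [t.text for t in root.findall("a:entry/a:title", ns)]
print(titles)  # ['First', 'Second']

# Imitating a non-aware parser: drop the "{uri}" part so only local names remain.
for el in root.iter():
    el.tag = el.tag.split("}", 1)[-1]
print(len(root.findall("entry")))  # 2
```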
Ignore comments: Ignore all comments in the XML document while parsing.
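The effect of ignoring comments can be illustrated with ElementTree's TreeBuilder. Its insert_comments flag (Python 3.8+) controls whether comment nodes are kept in the parsed tree. count_comments() is a hypothetical helper, not part of PDI.

```python
import xml.etree.ElementTree as ET

DOC = "<config><!-- legacy setting --><timeout>30</timeout></config>"

def count_comments(xml_text, ignore_comments=True):
    """Parse xml_text and count the comment nodes that survive in the tree."""
    # TreeBuilder(insert_comments=...) requires Python 3.8+.
    builder = ET.TreeBuilder(insert_comments=not ignore_comments)
    root = ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))
    return sum(1 for node in root.iter() if node.tag is ET.Comment)

print(count_comments(DOC, ignore_comments=True))   # 0
print(count_comments(DOC, ignore_comments=False))  # 1
```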
Validate XML: Validate the XML prior to parsing.
Use token: A token is not related to XML parsing but to PDI. Use a token when you want to substitute a value dynamically into an XPath field value. A token is written between @_ and - (for example, @_fieldname-).
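Token substitution can be sketched as a regular-expression replacement. Each @_fieldname- token in an XPath is swapped for the value of that field in the current row. resolve_tokens() and the sample field name region are hypothetical. The sketch also assumes, following the description above, that a token starts with @_ and ends at the first hyphen. Field names used in tokens therefore cannot contain hyphens.

```python
import re

# A token starts with "@_" and runs up to the next "-".
TOKEN = re.compile(r"@_([^-]+)-")

def resolve_tokens(xpath, row):
    """Replace each @_fieldname- token with that field's value from the current row.
    A KeyError is raised if a token names a field that the row does not have."""
    return TOKEN.sub(lambda m: str(row[m.group(1)]), xpath)

row = {"region": "EU"}
resolved = resolve_tokens("/orders/order[@region='@_region-']", row)
print(resolved)  # /orders/order[@region='EU']
```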
Ignore empty file: An empty file is not a valid XML document. Check this if you want to ignore such files altogether.
Do not raise an error if no files: Don’t raise a stink if no files are found.
Limit: Limits the number of rows to this number (zero (0) means all rows).
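The last three settings interact when several files are read. The sketch below models them in one hypothetical read_rows() helper that takes already-loaded document strings. Its parameter names are illustrative, not PDI field names, and each loop path is matched relative to the document's root element.

```python
import xml.etree.ElementTree as ET

def read_rows(documents, loop_path, ignore_empty=True, raise_if_no_files=True, limit=0):
    """documents: XML strings, one per file that was found.
    ignore_empty      -> skip blank documents instead of failing
    raise_if_no_files -> fail when there is nothing to read
    limit             -> maximum number of rows; 0 means all rows"""
    if not documents and raise_if_no_files:
        raise FileNotFoundError("No files found")
    rows = []
    for text in documents:
        if not text.strip():
            if ignore_empty:
                continue
            raise ValueError("An empty file is not a valid XML document")
        for node in ET.fromstring(text).iterfind(loop_path):
            rows.append(dict(node.attrib))
            if limit and len(rows) == limit:
                return rows
    return rows

docs = ["<r><i n='1'/><i n='2'/></r>", "", "<r><i n='3'/></r>"]
first = read_rows(docs, "i", limit=2)
second = read_rows([], "i", raise_if_no_files=False)
print(first)   # [{'n': '1'}, {'n': '2'}]
print(second)  # []
```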
Fields Tab:
Click Get fields to retrieve all the fields present in the XML file, then preview the data.
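Here is a rough sketch of the kind of field discovery Get fields performs, assuming only candidate names are needed. It collects the attributes and direct child elements found under each loop node. discover_fields() is hypothetical, and it does not attempt to infer data types.

```python
import xml.etree.ElementTree as ET

def discover_fields(xml_text, loop_path):
    """Collect candidate field paths under every loop node: '@name' for attributes
    and the tag of each direct child element, in first-seen order."""
    fields = []
    for node in ET.fromstring(xml_text).iterfind(loop_path):
        for attr in node.attrib:
            if "@" + attr not in fields:
                fields.append("@" + attr)
        for child in node:
            if child.tag not in fields:
                fields.append(child.tag)
    return fields

# Hypothetical sample: the second order adds an attribute the first one lacks.
SAMPLE = """<orders>
  <order id="1"><customer>Asha</customer><amount>120.50</amount></order>
  <order id="2" status="paid"><customer>Ravi</customer></order>
</orders>"""

fields = discover_fields(SAMPLE, "order")
print(fields)  # ['@id', 'customer', 'amount', '@status']
```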
After preview:
This is how we read XML files in Pentaho.
Thank You
Bolle Vani
BI Developer
Helical IT Solutions Pvt Ltd
To extract data from XML in Pentaho, you can utilize the built-in “Get Data from XML” step in Pentaho Data Integration (PDI). This tool allows you to define the structure of your XML source and specify the data you wish to extract. For enhanced flexibility and more complex data transformations, consider using Sonra Flexter. Flexter simplifies XML parsing and can efficiently handle large XML files, seamlessly integrating with Pentaho to streamline your data processing workflows.