How to Get Data from XML in Pentaho

Posted by admin in Pentaho

Introduction:

“Get Data From XML” can read data from three kinds of sources (files, streams, and URLs) in two modes: files and URLs can either be defined statically or supplied dynamically.

Options Available:

  1. Files Tab
  2. Content Tab
  3. Fields Tab
  4. Additional Output Fields
Make data easy with Helical Insight.
Helical Insight is world's best open source business intelligence tool.
Click Here to Free Download

The “Get data from XML” step is available under Design in Pentaho.

Step Name : The name of the step, which must be unique within a single transformation.

Files Tab:

  • XML Source from field : declares how the data will be read.
    1. XML source is defined in a field : the XML data itself is supplied in a field of the input stream.
    2. XML source is a filename : filenames are supplied in a field of the input stream, and those files are read.
    3. Read source as URL : URLs are supplied in a field of the input stream, and those URLs are read.
    4. Get XML source from a field : specifies the field from which to read the XML, filename, or URL.
  • File or Directory : specifies the location and/or name of the input XML file.

  • Regular Expression : specifies the regular expression used to select files in the specified directory.

  • Exclude Regular Expression : specifies a regular expression; files in the specified directory that match it are excluded from the selection.

  • Selected Files : contains the list of selected files, each with a property specifying whether the file is required. If a required file is not found, an error is generated; a missing file that is not required is skipped.
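The file selection described in the Files tab can be pictured with a small Python sketch. This is only an illustration of the idea, not how PDI implements it: the function names `select_files` and `check_required` are hypothetical, and the sketch assumes each regular expression must match the whole file name.

```python
import re
from pathlib import Path


def select_files(directory, include_pattern, exclude_pattern=None):
    """Return the sorted file names in `directory` that match the include
    pattern and do not match the (optional) exclude pattern."""
    include = re.compile(include_pattern)
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    selected = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Assumption: a pattern has to match the complete file name.
        if not include.fullmatch(path.name):
            continue
        if exclude is not None and exclude.fullmatch(path.name):
            continue
        selected.append(path.name)
    return selected


def check_required(path, required):
    """Mimic the 'required' property: a missing required file is an error,
    a missing optional file is simply skipped (returns False)."""
    if Path(path).exists():
        return True
    if required:
        raise FileNotFoundError(f"required file not found: {path}")
    return False
```

For example, `select_files("/data/in", r".*\.xml", r".*_backup\.xml")` would keep every `.xml` file in the (hypothetical) `/data/in` directory except those ending in `_backup.xml`.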

      Sample XML file:

      Getting XML file:

      Content Tab:

      Settings:

      Loop XPath : For every “Loop XPath” location found in the XML file(s), the step outputs one row of data. This is the main setting used to flatten the XML file(s). You can use the “Get XPath nodes” button to search for the possible repeating nodes in the XML document. Note that this can take a while if the XML document is large.
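The "one row per Loop XPath location" idea can be sketched in Python with the standard library. This is a rough illustration under stated assumptions, not PDI's implementation: the sample document, the `flatten` function, and the field mapping are all hypothetical, and because `xml.etree.ElementTree` supports only a subset of XPath, the loop path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the one used in this article.
SAMPLE = """<employees>
  <employee id="1"><name>Asha</name><city>Pune</city></employee>
  <employee id="2"><name>Ravi</name><city>Hyderabad</city></employee>
</employees>"""


def flatten(xml_text, loop_xpath, fields):
    """Emit one dict (row) per node matched by `loop_xpath`.

    `fields` maps an output field name to a path relative to the loop node;
    a path starting with '@' reads an attribute of the loop node itself.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(loop_xpath):
        row = {}
        for name, rel_path in fields.items():
            if rel_path.startswith("@"):
                row[name] = node.get(rel_path[1:])
            else:
                row[name] = node.findtext(rel_path)
        rows.append(row)
    return rows


rows = flatten(SAMPLE, "./employee", {"id": "@id", "name": "name", "city": "city"})
for row in rows:
    print(row)
```

Each `employee` element becomes one row, which mirrors how the repeating node chosen as Loop XPath determines the row count of the output.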

      Encoding : the XML file encoding to use in case none is specified in the XML document (yes, such documents still exist).

      Namespace aware : check this to make the XML document namespace aware.

      Ignore comments : Ignore all comments in the XML document while parsing.

      Validate XML : Validate the XML prior to parsing.

      Use token : Use a token when you want to dynamically replace part of an XPath field value. A token is written between @_ and - (@_fieldname-). A token is not related to XML parsing but to PDI.
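Token replacement can be thought of as a plain string substitution performed before the XPath is evaluated. The sketch below illustrates that idea only; the `resolve_tokens` function, the field name `emp_id`, and the assumption that field names consist of letters, digits, and underscores are all hypothetical and not taken from PDI.

```python
import re

# Assumption: a token looks like @_fieldname- and the field name contains
# only letters, digits, and underscores.
TOKEN = re.compile(r"@_([A-Za-z0-9_]+)-")


def resolve_tokens(xpath, row):
    """Replace each @_fieldname- token with the value of that field in `row`."""
    def substitute(match):
        field = match.group(1)
        if field not in row:
            raise KeyError(f"no field named {field!r} in the input row")
        return str(row[field])
    return TOKEN.sub(substitute, xpath)


# The incoming row supplies the value that is spliced into the XPath.
resolved = resolve_tokens("employee[@id='@_emp_id-']/name", {"emp_id": 2})
print(resolved)  # employee[@id='2']/name
```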

      Ignore empty file : an empty file is not a valid XML document. Check this if you want to ignore those altogether.

      Do not raise an error if no file: Don’t raise a stink if no files are found.

      Limit : Limits the number of rows to this number (zero (0) means all rows).
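The “Ignore empty file” and “Limit” settings can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not PDI's behavior in detail: `read_rows` is a hypothetical function, documents are passed in as strings rather than files, and each child element of a loop node is treated as one field.

```python
import itertools
import xml.etree.ElementTree as ET


def read_rows(documents, loop_xpath, limit=0, ignore_empty=True):
    """Read rows from several XML documents given as strings.

    limit=0 means "all rows"; empty documents are skipped when
    ignore_empty is True and cause an error otherwise.
    """
    def generate():
        for text in documents:
            if not text.strip():
                if ignore_empty:
                    continue  # skip the empty document altogether
                raise ValueError("an empty document is not valid XML")
            root = ET.fromstring(text)
            for node in root.findall(loop_xpath):
                # One row per loop node; each child element becomes a field.
                yield {child.tag: child.text for child in node}

    rows = generate()
    if limit > 0:
        rows = itertools.islice(rows, limit)  # stop after `limit` rows
    return list(rows)
```

Because the rows are produced lazily, a limit stops the reading early instead of parsing every document first.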

      Fields Tab:

      Click “Get fields” to retrieve all the fields present in the XML file, then preview the result.

      After preview:

      This is how we read XML files in Pentaho.

      Thank You
      Bolle Vani
      BI Developer
      Helical IT Solutions Pvt Ltd


    1 Comment

    To extract data from XML in Pentaho, you can utilize the built-in “Get Data from XML” step in Pentaho Data Integration (PDI). This tool allows you to define the structure of your XML source and specify the data you wish to extract. For enhanced flexibility and more complex data transformations, consider using Sonra Flexter. Flexter simplifies XML parsing and can efficiently handle large XML files, seamlessly integrating with Pentaho to streamline your data processing workflows.