Loading Data from CSV 2.0

Posted by Sohail, in Pentaho

Loading data from a CSV file into a database in Pentaho Data Integration is very easy, because only three things are required:

  1. A Created Table
  2. Text File Input/CSV Input Component
  3. Table Output

Now, what if we don't have the table created? If we try loading the CSV into a non-existent table, the load will fail. But what if I told you we can still do it? When we read the CSV with the Text File Input/CSV Input step, the step exposes the metadata of the CSV.

Hence we can use that metadata to our advantage: generate a SQL CREATE file from it, and then use the SQL job entry, which has the ability to run a SQL file created from the metadata.

Below is an overview of the whole job:

  1. getFileName: get the file name and use it for the table name
  2. getCSVMeta: use the metadata to create a SQL CREATE file
  3. The manual work you might have to do is:

    1. Setting the file path
    2. Getting the fields from the CSV

    Below is the code in the Modified Java Script step:

    // get instance of RowMetaInterface, which represents metadata of your input row:
    var rowMetaInterface = getInputRowMeta();
    // get List of ValueMetaInterface objects, which represent each of your input fields:
    var valueMetaList = rowMetaInterface.getValueMetaList();
    // get this object to be able to iterate over the list:
    var iterator = valueMetaList.iterator();
    //Alert(valueMetaList);
    // TABLENAME is set earlier by the getFileName step and initially holds the file name
    var filename = getVariable("TABLENAME", "notablename");
    var tablename = filename.replace(".csv", "");
    setVariable("TABLENAME", tablename, "r");
    
    var allFieldsName = [];
    var allFieldsType = [];
    var allFieldsLent = [];
    var allFieldsTpre = [];
    
    while (iterator.hasNext()) {
        // get current ValueMetaInterface object:
        var valueMetaInterface = iterator.next();
        // The name of current field:
        var fieldName = valueMetaInterface.getName();
        // getType() returns just an internal integer number of the type:
        var fieldLength = valueMetaInterface.getLength();
        //Alert(fieldLength);
        var typeIndex = valueMetaInterface.getType();
        //Alert(typeIndex)
        // typeName will hold the actual String name of the type:
        var typeName = org.pentaho.di.core.row.ValueMetaInterface.typeCodes[typeIndex];
        //Alert(typeName);
        var typePrecision = valueMetaInterface.getPrecision();
    
        // wrap the Java Strings into JavaScript Strings so they can be handled consistently later
        allFieldsName.push(String(fieldName));
        allFieldsType.push(String(typeName));
        allFieldsLent.push(String(fieldLength));
        allFieldsTpre.push(String(typePrecision));
    
    }
    
    
    var createScript = "Create Table " + tablename +" (";
    
    // RowNumber (appended by the input) is assumed to be the last field and is
    // skipped, so the last real column sits at index allFieldsName.length - 2.
    for (var i = 0; i < allFieldsName.length; i++) {
        if (allFieldsName[i] != "RowNumber") {
            var prec = "";
    
    
            if (allFieldsTpre[i] > 0) {
                if (i == allFieldsName.length-2) {
                    prec = "(" + allFieldsLent[i] + "," + allFieldsTpre[i] + ")";
                } else {
                    prec = "(" + allFieldsLent[i] + "," + allFieldsTpre[i] + "),";
                }
            } else if (allFieldsTpre[i] < 0 || allFieldsType[i] == "Integer") {
                if (i == allFieldsName.length-2) {
                    prec = "";
                } else {
                    prec = ",";
                }
            } else {
                if (i == allFieldsName.length-2) {
                    prec = "(" + allFieldsLent[i] + ")";
                } else {
                    prec = "(" + allFieldsLent[i] + "),";
                }
            }
            var DataType; 
            if (allFieldsType[i] == "String") {
                DataType = "Text";
            } else if (allFieldsType[i] == "Number") {
                DataType = "Numeric";
            } else if (allFieldsType[i] == "Integer") {
                DataType = "BIGINT";
            } else {
                DataType = allFieldsType[i];
            }
    		//Alert(i);
    		//Alert(allFieldsName.length - 1);
            createScript += "\n" + allFieldsName[i] + " " + DataType + prec;
    		if (i == allFieldsName.length-2) {
    			createScript +=  "\n" + ")";
    		} 
        }
    } // NOTE: the CREATE script generated above targets a PostgreSQL database
    
  4. LoadCSVdata: load the CSV data into the newly created table
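To see what the CREATE-script loop in step 3 actually emits, here is a standalone version of the same logic (with the comma handling folded into one place) run against made-up metadata for a small CSV. The column names, lengths, and table name here are purely illustrative:

```javascript
// Hypothetical metadata for a CSV with columns name, price, qty,
// plus the trailing RowNumber field appended by the input step.
var allFieldsName = ["name", "price", "qty", "RowNumber"];
var allFieldsType = ["String", "Number", "Integer", "Integer"];
var allFieldsLent = ["50", "10", "9", "9"];
var allFieldsTpre = ["0", "2", "0", "0"];
var tablename = "products";

var createScript = "Create Table " + tablename + " (";
for (var i = 0; i < allFieldsName.length; i++) {
    if (allFieldsName[i] != "RowNumber") {
        var prec = "";
        if (allFieldsTpre[i] > 0) {
            // decimal type with precision, e.g. Numeric(10,2)
            prec = "(" + allFieldsLent[i] + "," + allFieldsTpre[i] + ")";
        } else if (allFieldsTpre[i] < 0 || allFieldsType[i] == "Integer") {
            prec = "";
        } else {
            prec = "(" + allFieldsLent[i] + ")";
        }
        // append a comma after every column except the last real one
        if (i != allFieldsName.length - 2) { prec += ","; }

        var DataType;
        if (allFieldsType[i] == "String") { DataType = "Text"; }
        else if (allFieldsType[i] == "Number") { DataType = "Numeric"; }
        else if (allFieldsType[i] == "Integer") { DataType = "BIGINT"; }
        else { DataType = allFieldsType[i]; }

        createScript += "\n" + allFieldsName[i] + " " + DataType + prec;
        if (i == allFieldsName.length - 2) { createScript += "\n)"; }
    }
}
// createScript now holds:
// Create Table products (
// name Text(50),
// price Numeric(10,2),
// qty BIGINT
// )
```

The SQL job entry then runs this script, so the table exists before LoadCSVdata executes.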

Finally, I use an "If field value is null" step to replace NULL values. This is not generally advisable, but I kept it because the data had a lot of irregularities.
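For clarity, here is a rough plain-JavaScript analogue of what that null-replacement accomplishes; the field names and default values are invented for the example:

```javascript
// Invented example: substitute a default when a field value is null or empty,
// roughly what the "If field value is null" step is used for above.
function defaultIfNull(value, fallback) {
    return (value === null || value === undefined || value === "") ? fallback : value;
}

var row = { name: "widget", price: null, qty: "" };
var cleaned = {
    name: defaultIfNull(row.name, "unknown"),
    price: defaultIfNull(row.price, 0),
    qty: defaultIfNull(row.qty, 0)
};
// cleaned = { name: "widget", price: 0, qty: 0 }
```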

In case you have any queries, please reach us at support@helicaltech.com.

Thank You
Sohail Izebhijie
Helical IT Solutions Pvt Ltd


Best Open Source Business Intelligence Software Helical Insight Here


A Business Intelligence Framework

