Loading data from a CSV into a database in Pentaho Data Integration is very easy, because only 3 things are required:
- A Created Table
- Text File Input/CSV Input Component
- Table Output
Now, what if we don't have a table created yet? If we try loading the CSV into a table that does not exist, the load will fail. But what if I told you we can still do it? When we read our CSV data with the Text File Input/CSV Input step, it builds metadata describing the CSV.
Hence we can use that to our advantage: create a SQL CREATE file from that metadata, and then use the SQL job component, which has the ability to run a SQL file.
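To make the idea concrete, here is the kind of CREATE file the job ends up generating. The table and column names below are purely illustrative (a hypothetical sales.csv with an id, a name and an amount column); the actual column types come from the metadata mapping in the script further down:

Create Table sales (
id BIGINT,
name Text(50),
amount Numeric(10,2)
)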
Below is an overview of the whole job:
- getFileName: get the file name and use it for the table name (see the sketch after this list)
- getCSVMeta: use the metadata to create a SQL CREATE file
  - Putting in the file path
  - Getting the fields from the CSV
- LoadCSVdata: load the CSV data into the table
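To pass the table name downstream, the getFileName step only has to put the CSV file name into the TABLENAME variable, which the getCSVMeta script later reads with getVariable("TABLENAME", ...). Below is a minimal sketch of that Modified Java Script step; it assumes the file name arrives in an input field called short_filename (for example from a Get File Names step), which is an assumption about the setup and not part of the original job.

// Minimal sketch: pass the CSV file name (e.g. "sales.csv") to the rest of the job.
// "short_filename" is an assumed input field name coming from a Get File Names step.
var filename = short_filename;
// Store it at root job level so getCSVMeta can read it with getVariable("TABLENAME", ...):
setVariable("TABLENAME", filename, "r");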
The manual work you might have to do is in the Modified Java Script step. Below is its code:
// Get the RowMetaInterface object, which represents the metadata of the input row:
var rowMetaInterface = getInputRowMeta();
// Get the list of ValueMetaInterface objects, one per input field:
var valueMetaList = rowMetaInterface.getValueMetaList();
// Get an iterator to walk over the list:
var iterator = valueMetaList.iterator();
//Alert(valueMetaList);

// TABLENAME was set to the CSV file name by the getFileName step; strip the extension to get the table name:
var filename = getVariable("TABLENAME", "notablename");
var tablename = filename.replace(".csv", "");
setVariable("TABLENAME", tablename, "r");

var allFieldsName = [];
var allFieldsType = [];
var allFieldsLent = [];
var allFieldsTpre = [];

while (iterator.hasNext()) {
    // Get the current ValueMetaInterface object:
    var valueMetaInterface = iterator.next();
    // The name of the current field:
    var fieldName = valueMetaInterface.getName();
    // The length of the current field:
    var fieldLength = valueMetaInterface.getLength();
    //Alert(fieldLength);
    // getType() returns just an internal integer code of the type:
    var typeIndex = valueMetaInterface.getType();
    //Alert(typeIndex);
    // typeName holds the actual String name of the type (String, Integer, Number, ...):
    var typeName = org.pentaho.di.core.row.ValueMetaInterface.typeCodes[typeIndex];
    //Alert(typeName);
    var typePrecision = valueMetaInterface.getPrecision();

    // Wrap the Java Strings into JavaScript String objects so they behave as plain JS values:
    allFieldsName.push(String(fieldName));
    allFieldsType.push(String(typeName));
    allFieldsLent.push(String(fieldLength));
    allFieldsTpre.push(String(typePrecision));
}

// Build the CREATE TABLE statement.
// The last input field is assumed to be RowNumber (added by the input step), so it is skipped
// and the column at index length-2 is treated as the last real column.
var createScript = "Create Table " + tablename + " (";
for (var i = 0; i < allFieldsName.length; i++) {
    if (allFieldsName[i] != "RowNumber") {
        // Work out the length/precision suffix for this column:
        var prec = "";
        if (allFieldsTpre[i] > 0) {
            if (i == allFieldsName.length - 2) {
                prec = "(" + allFieldsLent[i] + "," + allFieldsTpre[i] + ")";
            } else {
                prec = "(" + allFieldsLent[i] + "," + allFieldsTpre[i] + "),";
            }
        } else if (allFieldsTpre[i] < 0 || allFieldsType[i] == "Integer") {
            if (i == allFieldsName.length - 2) {
                prec = "";
            } else {
                prec = ",";
            }
        } else {
            if (i == allFieldsName.length - 2) {
                prec = "(" + allFieldsLent[i] + ")";
            } else {
                prec = "(" + allFieldsLent[i] + "),";
            }
        }

        // Map the PDI type name to a PostgreSQL data type:
        var DataType;
        if (allFieldsType[i] == "String") {
            DataType = "Text";
        } else if (allFieldsType[i] == "Number") {
            DataType = "Numeric";
        } else if (allFieldsType[i] == "Integer") {
            DataType = "BIGINT";
        } else {
            DataType = allFieldsType[i];
        }
        //Alert(i);
        //Alert(allFieldsName.length - 1);

        createScript += "\n" + allFieldsName[i] + " " + DataType + prec;
        // Close the statement after the last real column:
        if (i == allFieldsName.length - 2) {
            createScript += "\n" + ")";
        }
    }
}
// THE ABOVE CREATE SCRIPT IS FOR A POSTGRES DB
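Note that at the end of the script createScript only exists as a JavaScript variable; it still has to reach the SQL job component as a file. One way to wire this (an assumption about the setup, not dictated by the script itself) is to expose createScript as an output field of the Modified Java Script step and write it out to a .sql file with a Text file output step, which the SQL job component then reads and runs.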
I also use an "If field value is null" step to replace null values (not advisable in general, but I kept it because the data had a lot of irregularities).
In case you have any queries, please reach us at support@helicaltech.com
Thank You
Sohail Izebhijie
Helical IT Solutions Pvt Ltd
Best Open Source Business Intelligence Software Helical Insight is Here
A Business Intelligence Framework