Insert file contents into the database using Talend
There was a requirement to read a dynamically generated text file and put its contents inside the database which could be MySQL, Postgres or SQL server etc. In Talend you can’t change components, metadata and connections seamlessly from one database to another. Talend have provided independent component for respective database to perform various tasks. In order to solve this problem, I have used Talend JDBC component and created a project to perform this task. Whenever you want to change the database, you just need to change the database connection properties like JDBC URL, JDBC class name, JDBC database name etc. After providing the details Talend would automatically connect to respective database and perform its task.
Which database datatype to be used?
Another issue is related to datatype of target database and Talend component metadata. Following different datatypes are used by few databases
- MySQL supports Text/blob,
- Postgres has typea or Text,
- SQL server have deprecated text datatype and recommending varchar(max).
For above scenario, I have used Text as a datatype for mysql, varchar(max) for sql server and text for Postgres. At the Talend side, set the output metadata to string.
How to read the file?
File reading needs metadata creation in repository, and an additional component to read the file. This extra work is avoided by using Talend Routine which reads file contents from input file path and returns all the contents in the string format to the job.
package routines;
import java.nio.file.Files;
import java.nio.file.Paths;
/*
This routine reads a file and covert it into the Byte code and return it to the function.
*/
public class read_csv_file {
public static String ByteArrayFromFile(String filepath) {
try{
byte[] incoming_file_data = Files.readAllBytes(Paths.get(filepath));
return new String(incoming_file_data);
}
catch(Exception err){
err.printStackTrace();return null;
}
}
}
Above routine reads the file and returns its contents to the function.
A fixed flow component can call this function to flow data to tJDBCoutput component. This would automatically insert all the contents of text file into the respective database column.
Following value is used in tFixedFlow component for respective database column
read_csv_file.ByteArrayFromFile((String)globalMap.get(“fileName”))
Advantages of this method
- This is database independent method for file content insertion
- Seamless integration with multiple database
- Platform independent
- Supports almost all relational JDBC databases
- Easy to implement
As a conclusion, using Talend JDBC component, it is possible to connect to multiple databases and same project/design could be re-used for different types of databases without making any change to the project.
Vaibhav Waghmare
Helical IT Solutions
Best Open Source Business Intelligence Software Helical Insight is Here