Parallel Execution of Jobs in Talend

Posted by admin, in Talend

Since Talend is a Java code generator, jobs and subjobs can be run in multiple threads to reduce the runtime of a job.
There are several techniques for executing Talend jobs in parallel.

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.


What is parallelization:

By default, Talend executes subjobs sequentially even when they do not depend on each other, i.e., it waits for one subjob to finish before starting the next. This can take a long time, depending on the number of subjobs to run. Talend therefore allows the work in a job to be split across multiple threads. These threads execute in parallel, which can significantly reduce the runtime of the job.

Parallelization can be achieved in three ways:

  1. Enable multi-thread execution.
  2. Use the tParallelize component (available only in the Enterprise Edition of Talend).
  3. Use parallel execution for an execution plan (TAC).

This blog discusses parallelization using the multi-thread execution option available in Talend Open Studio.

Enable multi-thread execution

This Talend feature allows multiple jobs or subjobs to execute in parallel, provided they are not interdependent.

In the Job tab of the job settings, enable the “Multi-thread execution” option to execute the subjobs in parallel.

Below is a simple job that demonstrates how parallel execution is achieved by enabling the “Multi-thread execution” option, and how the job behaves without it.

The sample job has two subjobs, each of which captures the timestamp at which it starts executing and displays it using a tLogRow component. The first subjob captures the timestamp, waits for 3 seconds using a tSleep component, and then displays it. The second subjob captures the timestamp and displays it immediately.

Sample job

  • Without enabling the Multi-thread execution option, executing the job generates the following timestamps:

    The timestamp generated by the first subjob is 2018-10-10 13:06:47
    The timestamp generated by the second subjob is 2018-10-10 13:06:50

    Hence the second subjob starts only after the first subjob has completed.

  • After enabling the Multi-thread execution option, executing the job generates the following timestamps:

    The timestamp generated by the first subjob is 2018-10-10 13:16:54
    The timestamp generated by the second subjob is 2018-10-10 13:16:54

    Hence the second subjob executes in parallel with the first subjob.
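The behaviour observed in the sample job can be imitated in plain Java. The sketch below is only an illustration and is not the code Talend generates: the class name `SubjobTimestampDemo`, the method `startGapMillis` and the `pause` helper are hypothetical names chosen for this example. It runs two stand-in "subjobs" either one after the other or on two threads, and reports how far apart their start timestamps are.

```java
import java.time.Duration;
import java.time.Instant;

class SubjobTimestampDemo {

    // Hypothetical helper standing in for tSleep: pause the current thread.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs two stand-in subjobs and returns (second start - first start) in milliseconds.
    static long startGapMillis(boolean multiThread, long sleepMillis) {
        Instant[] starts = new Instant[2];

        // First subjob: capture the timestamp, wait, then display it.
        Runnable first = () -> {
            starts[0] = Instant.now();
            pause(sleepMillis);
            System.out.println("first subjob started at  " + starts[0]);
        };
        // Second subjob: capture the timestamp and display it immediately.
        Runnable second = () -> {
            starts[1] = Instant.now();
            System.out.println("second subjob started at " + starts[1]);
        };

        if (multiThread) {
            Thread t1 = new Thread(first);
            Thread t2 = new Thread(second);
            t1.start();
            t2.start();
            try {
                // join() also makes the timestamps written by the threads visible here.
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            // Sequential: the second subjob cannot start until the first returns.
            first.run();
            second.run();
        }
        return Duration.between(starts[0], starts[1]).toMillis();
    }

    public static void main(String[] args) {
        System.out.println("sequential gap (ms): " + startGapMillis(false, 3000));
        System.out.println("parallel gap (ms):   " + startGapMillis(true, 3000));
    }
}
```

With a 3-second pause, the sequential run should report a gap of roughly 3000 ms, while the two-thread run should report a gap close to zero, which mirrors the timestamps shown above.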


NOTE:

  • If you are going to run two subjobs in parallel, consider the dependencies between them. There may also be subsequent subjobs that depend on the completion of both.
  • This feature works best when the number of threads does not exceed the number of processors on the machine used for parallel execution. Otherwise, some subjobs have to wait until a processor is freed up.
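Both points in the note can be sketched in plain Java. This is an illustration under stated assumptions, not Talend's generated code: the class `DependentSubjobDemo`, the methods `loadCustomers`, `loadOrders` and `runPipeline`, and the row counts are all hypothetical. The thread pool is capped at the number of available processors, and the dependent step runs only after both independent steps have finished.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DependentSubjobDemo {

    // Hypothetical independent subjobs; each returns a made-up row count.
    static int loadCustomers() {
        return 120;
    }

    static int loadOrders() {
        return 450;
    }

    static int runPipeline() {
        // Never use more threads than the machine has processors.
        int threads = Math.min(2, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture<Integer> customers =
                    CompletableFuture.supplyAsync(DependentSubjobDemo::loadCustomers, pool);
            CompletableFuture<Integer> orders =
                    CompletableFuture.supplyAsync(DependentSubjobDemo::loadOrders, pool);

            // The dependent step: thenCombine fires only once both futures are complete.
            return customers.thenCombine(orders, Integer::sum).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Rows available to the dependent subjob: " + runPipeline());
    }
}
```

If the machine has a single processor, the pool has one thread and the two loads simply run one after the other, which is the waiting behaviour the note describes.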

If you have any queries, please reach us at support@helicaltech.com

Thanks
Rajitha
Helical IT Solutions Pvt Ltd
