Performance Improvements in Apache Drill

Posted on December 10, 2019 by Satya Gopi, in Business Intelligence

Prerequisites: Apache Drill

A query we ran in Apache Drill was easily taking 3 minutes to fetch just one column from a table. To overcome this, we applied two performance improvements:

Partition pruning
Parquet metadata caching

Partition Pruning

Partition pruning lets a query engine determine and read only the smallest dataset needed to answer a given query. Reading less data means fewer I/O cycles and fewer CPU cycles spent processing it.

Example:

CREATE TABLE dfs.tmp.inputcontrolsinfo
PARTITION BY (`displayDate`, airport_code, location) AS
SELECT DISTINCT
  `displayDate`,
  fields[3].control.`modelvalue` AS airport_code,
  fields[4].control.`modelvalue` AS location
FROM `observation`;

The table above is partitioned by displayDate, airport_code, and location. We can now query it as below:

SELECT * FROM dfs.tmp.inputcontrolsinfo;

Partitioning works much like indexing: Drill can skip the files that cannot match a filter on the partition columns.
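
As a minimal sketch of a query that can benefit from pruning, the statements below filter on the partition column airport_code and then ask Drill for the query plan. The value 'JFK' is a hypothetical airport code used only for illustration; it does not come from the original data.

```sql
-- Filter on a partition column so Drill can skip non-matching files.
-- 'JFK' is a hypothetical value used only for illustration.
SELECT *
FROM dfs.tmp.inputcontrolsinfo
WHERE airport_code = 'JFK';

-- Show the plan for the same query so the scan can be inspected.
EXPLAIN PLAN FOR
SELECT *
FROM dfs.tmp.inputcontrolsinfo
WHERE airport_code = 'JFK';
```

If pruning applies, the Parquet scan in the plan should list fewer files than the table contains. In the Drill versions we are aware of, the scan reports this as numFiles, but treat the exact attribute name as an assumption and check it against your version.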

Parquet Metadata Caching

Drill can cache Parquet metadata. Once the metadata is cached, it can be refreshed as needed, depending on how frequently the datasets change in the environment.

Command to generate the metadata cache:

REFRESH TABLE METADATA dfs.tmp.inputcontrolsinfo;

You only have to run the REFRESH TABLE METADATA command against a table once to generate the initial metadata cache file. Thereafter, Drill automatically refreshes stale cache files when you issue queries against the table. An automatic refresh is triggered when the data has been modified. The query planner uses the timestamps of the cache file and the table to determine whether the cache file is stale.
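
To check whether a query actually reads the cache, one option is to inspect its plan. This is a minimal sketch that reuses the table created earlier; the column selected is only an example.

```sql
-- Generate (or regenerate) the metadata cache for the table.
REFRESH TABLE METADATA dfs.tmp.inputcontrolsinfo;

-- Inspect the plan of a query against the same table.
EXPLAIN PLAN FOR
SELECT `displayDate`
FROM dfs.tmp.inputcontrolsinfo;
```

In the Drill versions we are aware of, the Parquet scan entry in the plan includes usedMetadataFile=true when the cache file was read. Treat that attribute name as an assumption and verify it against the version you run.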

If you have any questions, please reach us at support@helicaltech.com

Thanks,
Satya Gopi
BI Developer
Helical IT Solutions Pvt Ltd
