How to Use External Python Libraries in AWS Glue Job

Posted on May 24, 2019 by admin, in AWS

Python extension modules and libraries can be used with AWS Glue ETL scripts as long as they are written in pure Python.

Python library used in this example: pg8000

Zipping Libraries for Inclusion

Libraries used in an AWS Glue job should be packaged as a .zip archive (for Spark jobs) or as an .egg file (for Python Shell jobs).

If a library consists of a single Python module in one .py file, it can be used directly instead of using a zip archive.
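The packaging step can be sketched in Python as follows. This is a minimal illustration, not the author's exact procedure: the package name mylib and the helper zip_package are hypothetical, and the sketch assumes the package folder has to sit at the root of the archive so that it can be imported by name.

```python
import os
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Zip package_dir so that the package folder sits at the archive root."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for filename in files:
                if filename.endswith(".pyc"):
                    continue  # skip compiled bytecode
                full_path = os.path.join(root, filename)
                # Store paths relative to the parent, e.g. "mylib/__init__.py".
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Demo with a throwaway package named "mylib" (hypothetical).
workdir = tempfile.mkdtemp()
pkg = os.path.join(workdir, "mylib")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg, "core.py"), "w") as f:
    f.write("def hello():\n    return 'hello'\n")

archive_path = zip_package(pkg, os.path.join(workdir, "mylib.zip"))
with zipfile.ZipFile(archive_path) as zf:
    names = sorted(zf.namelist())
print(names)
```

Listing the archive contents before uploading is a quick way to confirm that the entries start with the package folder name rather than with an extra wrapper folder.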


Loading Python libraries into AWS Glue job

The libraries are imported in different ways in AWS Glue Spark job and AWS Glue Python Shell job.

Importing Python Libraries into AWS Glue Spark Job (.zip archive):

The libraries should be packaged in a .zip archive.

Upload the zip file of the libraries to S3.
Open the job in which the external libraries are to be used.
Click on Action and then Edit Job.
Click on "Security configuration, script libraries, and job parameters (optional)", browse for the zip file in S3 under Python library path, and click Save.
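The console setting described in the steps above corresponds to the Glue special job parameter --extra-py-files, which takes a comma-separated list of S3 paths. The sketch below only builds that argument; the helper name and the bucket path are hypothetical, and how the dictionary is submitted (console, CLI or SDK) is left open.

```python
def extra_py_files_argument(s3_paths):
    """Build the job-argument dictionary for extra Python libraries.

    s3_paths is an iterable of S3 URIs pointing at .zip, .egg or .py files.
    """
    paths = [p.strip() for p in s3_paths if p.strip()]
    for p in paths:
        if not p.startswith("s3://"):
            raise ValueError("expected an s3:// URI, got: " + p)
    # Glue expects the paths joined by commas, with no spaces.
    return {"--extra-py-files": ",".join(paths)}


# Hypothetical bucket and key, for illustration only.
arguments = extra_py_files_argument(["s3://my-example-bucket/libs/pg8000.zip"])
print(arguments)
```

A dictionary of this shape could, for example, be supplied as the job's default arguments when the job is created or updated programmatically instead of through the console.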

Open the job and import the packages in the following format:

from package import module as myname

Example: from pg8000 import pg8000 as pg

Use the alias defined in the previous step as a prefix when calling a function from the package.

Example: pg.connect(…), where connect is a function in the library.
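The import form above can be tried locally without Glue. The following sketch builds a tiny zip archive, puts it on sys.path and imports a module from it under an alias. The package demo_pkg, its module greeter and the connect function are all made up for illustration, and the assumption that Glue exposes the archive to the script in a comparable way is not confirmed by this article.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive containing a hypothetical package "demo_pkg".
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "demo_libs.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "")
    archive.writestr(
        "demo_pkg/greeter.py",
        "def connect(name):\n    return 'connected to ' + name\n",
    )

# Python can import directly from a zip archive that is on sys.path.
sys.path.insert(0, zip_path)

# Same form as in the article: from package import module as myname
from demo_pkg import greeter as gr

# Functions are then called through the alias.
result = gr.connect("example")
print(result)
```

If the import fails with ModuleNotFoundError, a common cause is that the package folder is nested one level too deep inside the archive.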

The above steps work for an AWS Glue Spark job. To do the same in a Python Shell job, an .egg file is used instead of a .zip archive.

Importing Python Libraries into AWS Glue Python Shell Job (.egg file)

Libraries should be packaged in an .egg file.


Creating .egg file of the libraries to be used

Create a new folder and put the libraries to be used inside it.
Then create a setup.py file in that folder (the parent directory of the library folders) with the following contents:

from setuptools import setup

setup(
    name="pg8000",
    version="0.1",
    # Folder name of each library to include in the .egg
    packages=["pg8000"],
)

Note: If multiple libraries are to be packaged into one .egg, list the folder name of each library in packages, separated by commas.

Example: packages = ["libraries", "comma", "separated"]
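When several library folders sit next to setup.py, the packages list can be generated instead of typed by hand. The helper below is a hypothetical stdlib-only sketch that treats every sub-folder containing an __init__.py file as a package; the folder names lib_a, lib_b and notes are invented for the demo. The find_packages function from setuptools serves a similar purpose and also handles nested sub-packages.

```python
import os
import tempfile


def list_package_folders(base_dir):
    """Return sorted names of sub-folders of base_dir that contain __init__.py."""
    names = []
    for entry in sorted(os.listdir(base_dir)):
        candidate = os.path.join(base_dir, entry)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            names.append(entry)
    return names


# Demo layout: two packages and one plain folder (all names hypothetical).
base = tempfile.mkdtemp()
for name in ("lib_a", "lib_b"):
    os.makedirs(os.path.join(base, name))
    open(os.path.join(base, name, "__init__.py"), "w").close()
os.makedirs(os.path.join(base, "notes"))  # no __init__.py, so not a package

packages = list_package_folders(base)
print(packages)
```

The resulting list could be pasted into setup.py as the value of packages.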

To create the .egg file, run the following from the command line in the folder that contains setup.py:

	python setup.py bdist_egg

This will generate three new folders: build, dist and pg8000.egg-info.

The .egg file is written to the dist folder with a name such as pg8000-0.1-py2.7.egg, where 2.7 is the version of the Python interpreter that ran the command.
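Because the .egg file name carries the Python version it was built with, it can be worth checking the name before uploading. The sketch below parses a file name of the form name-version-pyX.Y.egg; the helper parse_egg_name is hypothetical and assumes a pure-Python egg with no platform suffix in the name.

```python
import re

# name-version-pyX.Y.egg, e.g. pg8000-0.1-py2.7.egg (pure-Python eggs only)
EGG_NAME = re.compile(
    r"^(?P<name>.+?)-(?P<version>[^-]+)-py(?P<python>\d+(?:\.\d+)?)\.egg$"
)


def parse_egg_name(filename):
    """Split an egg file name into package name, version and Python version."""
    match = EGG_NAME.match(filename)
    if match is None:
        raise ValueError("not a recognised .egg file name: " + filename)
    return match.groupdict()


info = parse_egg_name("pg8000-0.1-py2.7.egg")
print(info)
```

Comparing the parsed Python version with the version selected for the Python Shell job can help catch an egg that was built with a different interpreter.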

Upload the .egg file of the libraries to S3.
In the job properties, use this .egg file in place of the .zip file.

If you have any queries, please reach us at support@helicaltech.com

Thank You
Rajitha
Helical IT Solutions Pvt Ltd


1 Comment


Chirag

4 years ago

Hi Team,

Thanks for the article. I am getting this error: ModuleNotFoundError 'pg8000'. Any idea?

Also, could you share the library or the .zip file that you used?
