The Data Catalog module goes through the connected data sources and prepares a catalog. This work is done by a crawler. The catalog holds granular metadata, such as the tables and columns present and their data types. A catalog can be built over various kinds of data sources, including relational databases, AWS RDS, S3 files, and CSV, TSV, JSON and Parquet files. A single data catalog can also hold metadata for multiple data sources.
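As a rough illustration of the crawler-to-catalog flow described above, the sketch below uses the boto3 Glue client to register a crawler over an S3 path and, once it has run, to list the tables and column types recorded in the catalog. It is a minimal sketch under stated assumptions, not a description of any particular implementation: the helper names, crawler name, IAM role ARN, database name and bucket path are all hypothetical placeholders, and boto3 with valid AWS credentials is assumed for the functions that call AWS.

```python
def build_crawler_request(name, role_arn, database, s3_paths):
    """Build the keyword arguments for the Glue create_crawler call.

    All values passed in are placeholders chosen by the caller.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


def summarize_tables(get_tables_page):
    """Map each table name to its (column name, data type) pairs.

    Expects a dict shaped like one page of a Glue get_tables response.
    """
    summary = {}
    for table in get_tables_page.get("TableList", []):
        columns = table.get("StorageDescriptor", {}).get("Columns", [])
        summary[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
    return summary


def register_and_start_crawler(request):
    """Create the crawler and start it. The crawl itself runs asynchronously."""
    import boto3  # imported here so the pure helpers above work without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def list_catalog(database):
    """Read back what the crawler recorded, after the crawl has finished."""
    import boto3

    glue = boto3.client("glue")
    catalog = {}
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        catalog.update(summarize_tables(page))
    return catalog


if __name__ == "__main__":
    # Hypothetical names; replace with values from your own AWS account.
    request = build_crawler_request(
        name="sales-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="sales_catalog",
        s3_paths=["s3://example-bucket/sales/"],
    )
    print(request)
```

Because a crawl runs asynchronously, list_catalog would normally be called only after the crawler has finished; the two pure helpers can be exercised without any AWS access.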

At present AWS Glue has relatively few prebuilt components, so transformation work often requires Python code. We at Helical have excellent knowledge of ETL and DW concepts as well as Python, and we have been part of various AWS Glue ETL projects end to end. Please get in touch to schedule a call with our AWS Certified consultants, see a demo of our implementations, and organize a free POC.