MongoDB Connector for Apache Spark
Spark provides a connector for MongoDB, through which we can execute SQL queries on MongoDB data as well as perform Spark RDD transformations on it.
With the connector, you have access to all Spark libraries for use with MongoDB datasets, including Datasets for analysis with SQL (benefiting from automatic schema inference).
The MongoDB Connector for Spark is compatible with:
1. MongoDB 2.6 or later
2. Apache Spark 1.6.x or later
To implement the MongoDB Spark Connector in Java, we need to add the mongo-spark-connector dependency (group ID org.mongodb.spark, available from Maven Central) to the project.
Following are examples of the Spark MongoDB Connector implemented in Java:
MongoDB Spark RDD Example
- Below is an example of a MongoDB Spark RDD transformation.
- We connect to a MongoDB collection (table) through Spark.
- We perform RDD operations on the MongoDB data, such as converting the first row to JSON format and counting the number of rows in the collection/table.
- Use your own MongoDB connection details (e.g. hostname, port number, database name, collection name); a minimal sketch is given after the download link below.
Note: To execute the provided sample project, you need to supply MongoDB connection details for your own MongoDB instance.
Refer to and download the attached sample Java project for Spark RDD:
SparkMongoDB-RDD
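For illustration, here is a minimal sketch of such a program (not the attached project itself), assuming the mongo-spark-connector Java API (MongoSpark.load) and placeholder connection details that you would replace with your own:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.Document;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

public class SparkMongoRddExample {
    public static void main(String[] args) {
        // Placeholder connection details: replace host, port, database and collection with your own.
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("SparkMongoDB-RDD")
                .set("spark.mongodb.input.uri", "mongodb://localhost:27017/foodmart.employee_test");

        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Load the collection as an RDD of BSON Documents.
        JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);

        // Convert the first row to JSON and count the rows in the collection/table.
        System.out.println("First document as JSON: " + rdd.first().toJson());
        System.out.println("Number of documents: " + rdd.count());

        jsc.close();
    }
}

The spark.mongodb.input.uri property tells the connector which MongoDB instance, database and collection to read from.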
MongoDB Spark SQL Example
Below is an example of MongoDB Spark SQL queries.
We connect to a MongoDB collection (table) through Spark.
We can dynamically pass the database name, collection/table name, and query to the application.
The query is then executed and the result set is returned.
When you run the project, you need to pass the database name and collection/table name as arguments; a minimal sketch is given after the download link below.
Note: To execute the provided sample project, you need to supply MongoDB connection details for your own MongoDB instance.
Refer to and download the attached sample Java project for Spark SQL:
SparkMongoDB
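For illustration, here is a minimal sketch of such a program (not the attached project itself), assuming Spark 2.x and the mongo-spark-connector Java API; the connection details and default arguments are placeholders. The collection is registered as the temporary view tmp, which is the table name used in the sample queries that follow:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import com.mongodb.spark.MongoSpark;

public class SparkMongoSqlExample {
    public static void main(String[] args) {
        // Placeholder defaults: database name, collection/table name and query are passed as arguments.
        String dbName = args.length > 0 ? args[0] : "foodmart";
        String collection = args.length > 1 ? args[1] : "employee_test";
        String query = args.length > 2 ? args[2] : "SELECT * FROM tmp";

        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("SparkMongoDB-SQL")
                .config("spark.mongodb.input.uri",
                        "mongodb://localhost:27017/" + dbName + "." + collection)
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Load the collection as a DataFrame (the schema is inferred automatically)
        // and register it as the temporary view "tmp" used by the sample queries.
        Dataset<Row> df = MongoSpark.load(jsc).toDF();
        df.createOrReplaceTempView("tmp");

        // Execute the query that was passed in and print the result set.
        spark.sql(query).show();

        spark.stop();
    }
}

The database name, collection/table name and query can then be supplied as program arguments when the job is submitted.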
For reference, you can use the queries below on different collections/tables.
Following are some sample Spark SQL MongoDB queries:
DBname : foodmart
Table Name : employee_test
1. SELECT name FROM tmp
2. SELECT name.first FROM tmp
3. SELECT name FROM tmp WHERE name.first = 'Merry'
4. SELECT name.last AS lastname FROM tmp
5. SELECT department FROM tmp LIMIT 1
6. SELECT * FROM tmp
7. SELECT * FROM tmp WHERE _id > 1.0
8. SELECT * FROM tmp WHERE city = 'New York'
9. SELECT * FROM tmp ORDER BY _id DESC
10. SELECT name FROM tmp GROUP BY name
11. SELECT SUM(_id) FROM tmp
DBname : foodmart
Table Name : customer_sales
1. SELECT MAX(store_sales) FROM tmp
DBname : foodmart
Table Name : employee
1. SELECT full_name, COUNT(employee_id) FROM tmp WHERE salary > 50000 GROUP BY full_name
2. SELECT * FROM tmp WHERE education_level = 'Graduate Degree'
3. SELECT MONTH(birth_date) FROM tmp LIMIT 2
4. SELECT YEAR(birth_date) FROM tmp WHERE full_name = 'Sheri Nowmer'
5. SELECT full_name FROM tmp WHERE full_name LIKE 'She%'
DBname : foodmart
Table Name : totalSalary
1. SELECT value.FirstName FROM tmp
Thanks,
Sayali Mahale