Dynamic Schema Processor: Implementing Security in Pentaho


In my earlier post, I explained what a Dynamic Schema Processor (DSP) is and why we need it. Suppose you have a Mondrian schema uploaded to the BA Server and you would like to modify it at runtime. For instance, in our project we modify it for every user authenticated to the platform in order to apply security restrictions. This modification limits the records that users can see in reports based on that Mondrian schema. So, basically, a DSP is used to implement security on an uploaded Mondrian schema.

1. How to implement a DSP?

The solution is to create a JAR file containing a class that extends one of the dynamic schema processors from the Mondrian library. I will explain the implementation of a DSP using an example:

  • Pentaho Server: Pentaho BA Server 6
  • Database Name: MySQL/sampledata
  • Schema Name: Sales

The aim is to show only the data for the country to which the logged-in user belongs. Below is the result set of the users belonging to various countries.

[Image: result set of users and their countries]

Now, on top of this data, a Mondrian schema is created using Schema Workbench.

[Image: the Sales schema in Schema Workbench]

  • Original Schema:

[Image: the original schema]

Let's upload the schema to the BA Server. Click the Manage Data Sources button on the Home page, then choose Import Analysis from the drop-down menu next to the New Data Source button. Select the file with our Mondrian schema and a data source that links to our database. Alternatively, we can publish the schema from Schema Workbench using the Publish option in the File menu.

Now we have the schema uploaded on the server. The Mondrian cubes can be accessed using Pentaho Analyzer, jPivot, or the Saiku plugin. At this point the schema displays the data for all dimensions and facts, with no restriction.

  • Modifying the schema:

Now let's modify the schema slightly by adding a SQL dialect to the XML schema where the CUSTOMER_W_TER table is referenced. (You can also apply this modification from the DSP itself, but for now let's make the change in the schema.)

[Image: the modified schema]

Note: The '%PLACE_HOLDER%' value is a WHERE-clause condition that is defined in the properties file. It is based on the column you want to filter the data on; here that is the country in which the user is present. We have to make sure there is a user table that has a relationship with the column we are filtering the data on.
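Since the screenshot can be hard to read, here is a minimal sketch of what the modified fragment could look like, assuming the Market dimension reads from CUSTOMER_W_TER and the filter column is COUNTRY (the authoritative version is the one in the screenshot):

<Table name="CUSTOMER_W_TER">
  <!-- Replaced at runtime by the DSP with a condition
       such as COUNTRY = 'USA', or 1=1 for Admin -->
  <SQL dialect="generic">%PLACE_HOLDER%</SQL>
</Table>

Mondrian treats a SQL element nested inside a Table as an extra filter condition on that table's WHERE clause.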

2. Creating the Dynamic Schema Processor:

Let's create the Dynamic Schema Processor that will modify the Mondrian schema file so that when we use the Market dimension, only the records for the user's country are displayed. To implement this, we need to create a class that implements the DynamicSchemaProcessor interface (in practice we extend FilterDynamicSchemaProcessor or LocalizingDynamicSchemaProcessor, which implement the interface). Below is the code that extends it:

[Image: DSPCode — the class extending the schema processor]
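Since the class appears here only as a screenshot, below is a minimal sketch of what such a processor can look like, calling the Connectiontest helper shown next. Treat it as an illustration: the class name follows the example, but the properties-file name (dsp.properties) and the admin check are assumptions of this sketch, and the authoritative code is the one in the screenshot.

import java.io.InputStream;
import java.util.Properties;

import mondrian.i18n.LocalizingDynamicSchemaProcessor;
import mondrian.olap.Util;

import org.pentaho.platform.engine.core.system.PentahoSessionHolder;
import org.pentaho.platform.engine.security.SecurityHelper;

public class ChangeDynamicSchemaProcessor extends LocalizingDynamicSchemaProcessor {

    @Override
    public String filter(String schemaUrl, Util.PropertyList connectInfo, InputStream stream)
            throws Exception {
        // Let the parent class read the schema file into a string.
        String originalschema = super.filter(schemaUrl, connectInfo, stream);

        // Load the configuration (db connection, queries, where-clause template)
        // from a properties file on the server classpath (WEB-INF/classes).
        Properties props = new Properties();
        props.load(getClass().getClassLoader().getResourceAsStream("dsp.properties"));

        // The user currently logged in to the BA Server.
        String username = PentahoSessionHolder.getSession().getName();

        // Determine the user's role; this admin check is one possibility.
        String role = SecurityHelper.getInstance()
                .isPentahoAdministrator(PentahoSessionHolder.getSession()) ? "Admin" : "User";

        // Ask the helper class for the filter condition for this user,
        // e.g. COUNTRY = 'USA', or 1=1 for Admin.
        String filter = new Connectiontest().getdetails(props, username, role);

        // Inject the condition where the schema says %PLACE_HOLDER%.
        return originalschema.replaceAll("%PLACE_HOLDER%", filter);
    }
}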

The variable filter holds the WHERE-clause condition obtained via the properties file, which is also kept in the server folder. The function that fetches these details is defined in another class:

[Image: DSPCode1 — the helper class, part 1]

[Image: DSPCode3 — the helper class, part 2]
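Again as an illustration of the screenshots rather than a verbatim copy (the property keys db.url, db.user, db.password, and country.query are invented for this sketch):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class Connectiontest {

    // Returns the WHERE-clause condition for the given user: "1=1" for Admins,
    // otherwise a country filter looked up from the user table.
    public String getdetails(Properties props, String username, String role) throws Exception {
        // Admins see everything, so return a condition that is always true.
        if ("Admin".equalsIgnoreCase(role)) {
            return "1=1";
        }
        // Run the query defined in the properties file, e.g.
        // SELECT COUNTRY FROM USERS WHERE USERNAME = ?
        try (Connection con = DriverManager.getConnection(
                     props.getProperty("db.url"),
                     props.getProperty("db.user"),
                     props.getProperty("db.password"));
             PreparedStatement ps = con.prepareStatement(props.getProperty("country.query"))) {
            ps.setString(1, username);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                // Build the condition that will replace %PLACE_HOLDER%.
                return "COUNTRY = '" + rs.getString(1) + "'";
            }
        }
    }
}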


3. How the code works:

  • Once a user logs in to the server, the class "ChangeDynamicSchemaProcessor" is called; the schema is passed into the code and assigned to the variable "originalschema".
  • All the properties are loaded from a properties file (a sample is sketched after this list).
  • The properties file configures the database connection, the WHERE clause that will be added to the schema file, and the query that returns the country for the given user.
  • Once the properties are loaded, the class calls the function "getdetails" (present in the class Connectiontest, which is initialized first), passing the properties along with the username and role of that user.
  • Inside the Connectiontest class, the user's role is checked. If the role is Admin, it returns "1=1"; otherwise it executes the query defined in the properties file, which returns the country name for that user.
  • Once the country name is returned, control returns to the main class, where the schema is modified and the WHERE clause is substituted for the "%PLACE_HOLDER%" variable.
  • The modified schema is then used, and only the data for that country is displayed.
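For reference, the properties file described above might look like this — every key and value here is hypothetical, shaped only by the description in this list:

# Database that holds the user/country mapping (example values)
db.url=jdbc:mysql://localhost:3306/sampledata
db.user=pentaho_user
db.password=secret

# Query that returns the country for the logged-in user
country.query=SELECT COUNTRY FROM USERS WHERE USERNAME = ?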


4. Configuring the Dynamic Schema Processor on the BA Server:

Once the code is complete, package the class as a JAR and upload it to the server. To use the newly created Dynamic Schema Processor, add the JAR to /biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib/. Then edit the configuration of the schema in the BA Server to specify that this particular DSP should be used for this schema: go again to Manage Data Sources, select the schema, click Edit from the drop-down menu, and add a new parameter called DynamicSchemaProcessor whose value is the fully qualified name of your class (see picture below). The properties file also needs to be added to /biserver-ee/tomcat/webapps/pentaho/WEB-INF/classes. Once complete, restart the server.
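For example, if the processor lives in a package com.example.dsp (a placeholder for your own), the parameter would be:

DynamicSchemaProcessor = com.example.dsp.ChangeDynamicSchemaProcessor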

[Image: data source location in Manage Data Sources]

Add the class name to the parameter list so that whenever the cube is executed, the class is called; see the image below.

[Image: data source configuration with the DynamicSchemaProcessor parameter]

Now if we create the same report, we will get different results than at the beginning.

5. Configuring Saiku Analytics to work with the DSP:

The above configuration works with Pentaho Analyzer. In the Community Edition, however, the Saiku plugin is required to run the analysis, and since Saiku ships its own Mondrian engine, it must be configured as well.

For it to work, add the JAR file to Saiku's lib folder at pentaho-solutions/system/saiku/lib. Once the JAR is in place, one more change is needed to make the plugin work.

A file named plugin.spring.xml in the pentaho-solutions/system/saiku folder contains the Dynamic Schema Processor configuration property, commented out by default. We just need to remove the comments so that Saiku activates the DSP in its engine. Uncomment the dynamic schema processor property as shown below.
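Illustratively, the change looks like the following — the exact bean and property names vary between Saiku versions, so uncomment whichever entry your own plugin.spring.xml ships with rather than copying this verbatim:

<!-- Before: the property ships commented out -->
<!--
<property name="dynamicSchemaProcessor" value="com.example.dsp.ChangeDynamicSchemaProcessor"/>
-->

<!-- After: uncommented, so Saiku's Mondrian engine applies the DSP -->
<property name="dynamicSchemaProcessor" value="com.example.dsp.ChangeDynamicSchemaProcessor"/>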

[Image: Saiku plugin.spring.xml configuration]

Once this is completed, restart the BA Server. Saiku Analytics will now work with the DSP.

6. Checking errors and logging:

The catch with a DSP is that if there is any error, the cube will not be displayed on the BA Server at all. To verify and check the errors, we need to look directly into the log files in the log folder of the Pentaho Server. We will not be able to see the cube if there is any issue in the code; if the code is working properly, the modified schema is written to the pentaho.log file.
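One way to make the modified schema show up in pentaho.log is to log it from the processor just before returning it — a fragment using log4j, which the BA Server ships with (the placement shown is illustrative, continuing the sketch from section 2):

import org.apache.log4j.Logger;

// Inside ChangeDynamicSchemaProcessor:
private static final Logger LOGGER = Logger.getLogger(ChangeDynamicSchemaProcessor.class);

// ...at the end of filter(), just before returning:
String modifiedSchema = originalschema.replaceAll("%PLACE_HOLDER%", filter);
LOGGER.info("Modified schema: " + modifiedSchema);
return modifiedSchema;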

7. Disadvantages and things to take care of:

The Dynamic Schema Processor is an added capability that Pentaho provides so that we can have enhanced security on reports. However, there are some disadvantages to a DSP:

  • DSP doesn't support jPivot: in the Pentaho Server Community Edition, jPivot is used for analysis reports, but the DSP does not work with jPivot reports. We did all kinds of research, but it did not work. Instead, one can configure the Saiku plugin from the Marketplace and then use the DSP on a CE server.
  • Performance issues: if the data is huge, reports might run slowly, since the underlying class is called whenever the schema is loaded.
  • Spell the class name in the data source parameter correctly; one minor mistake and the cube will not display on the BA Server.
  • It is advisable to extend the LocalizingDynamicSchemaProcessor class instead of FilterDynamicSchemaProcessor, because in some cases the latter does not work.


8. Conclusion:

The DSP is a powerful facility that Pentaho provides for reports. It helps us maintain enhanced security, and it can be applied to cubes, PRPT reports, and interactive reports. The reports are modified dynamically when a user logs in to the BA Server.

Introduction to Dynamic Schema Processor- Modifying Mondrian Schema in runtime


What is Mondrian?

Mondrian is an open-source OLAP server written in Java. It supports the MDX (Multi-Dimensional Expressions) query language and the XML for Analysis (XMLA) and olap4j interface specifications. It reads from SQL and other data sources and aggregates data in a memory cache.
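For example, an MDX query against the Sales schema used earlier in this post might look like this (the measure and level names are illustrative):

SELECT {[Measures].[Quantity]} ON COLUMNS,
       [Market].[Country].Members ON ROWS
FROM [Sales]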

What is Dynamic Schema Processor?

A Dynamic Schema Processor is used to change a Mondrian schema dynamically at runtime. Once a Mondrian schema is uploaded to the BA Server, modifying it at runtime requires implementing a Dynamic Schema Processor. For instance, if we have to show each user only the data related to them when they log in to the server, we have to modify the schema dynamically. This modification limits the records that users can see in reports based on the Mondrian schema.

Why do we need Dynamic Schema Processor(DSP)?

Mondrian allows us to define user roles in the schema itself: we can add various roles and restrict the data based on them. But this does not add data restrictions dynamically, it is a lengthy process, and it is not advisable when we have many users. Hence, Mondrian provides the Dynamic Schema Processor, which changes the schema dynamically at runtime. To use it, we create a class that implements the DynamicSchemaProcessor interface (in practice we extend FilterDynamicSchemaProcessor or LocalizingDynamicSchemaProcessor, which implement the interface). Once the class is created, we upload the JAR to the server and reference the class in the configuration of the data source where the schema is uploaded.
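For reference, the interface in question is essentially the following (paraphrased from the Mondrian library — check the exact signature in your Mondrian version):

// mondrian.spi.DynamicSchemaProcessor
public interface DynamicSchemaProcessor {
    // Receives the schema location and connection properties, and returns the
    // schema XML (possibly modified) that Mondrian should actually use.
    String processSchema(String schemaUrl, mondrian.olap.Util.PropertyList connectInfo)
        throws Exception;
}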

Architecture of Dynamic Schema Processor:

[Image: DSP architecture diagram]


Above is the basic flow of how the Dynamic Schema Processor works in the Pentaho Server:

  1. The user logs in to the Pentaho BA Server.
  2. The JAR loaded inside the server is called internally by the server.
  3. The DSP class is invoked.
  4. The session variable supplies the username, which is passed into the SQL query.
  5. The username is passed to the SQL query defined in the properties file (uploaded to the classes folder), which fetches the corresponding ID from the user table.
  6. The query runs and the ID of the corresponding user is fetched.
  7. The DSP returns the ID to the schema.
  8. The user then accesses the Analyzer report inside the BA Server.
  9. The cube calls the DSP class (the class name is defined in the data source configuration).
  10. The DSP modifies the cube by adding a WHERE-clause condition to the schema, filtering the data by the ID of the corresponding user.
  11. The data for that particular ID is generated.
  12. The Analyzer report is generated for the logged-in user.

Exception: in the case of the Admin user, the WHERE clause is not added to the schema file, and the user has access to all the data.

Conclusion

So this is how the Dynamic Schema Processor works in a Pentaho server. It is an added feature that Pentaho provides for enhanced security on the data shown in reports. The reports are modified dynamically based on the user who logs in to the server.

Regards,

Nitish Kumar Mishra

What is a virtual cube and what is its significance in Mondrian


Before going into virtual cubes, let's first refresh our memory about what a cube is. A cube is a multi-dimensional view of data that provides querying and analytical components to clients. It is a data-processing unit composed of dimensions and fact tables, and it acts as a business layer between the end user and the data warehouse. A cube works on the mechanism of a star schema: there is one fact table and multiple dimension tables, and all the dimensions are related to the fact.

Virtual Cubes:

We now know that each cube has one fact table, and all the dimensions present in the cube have a relationship with that fact table. There are scenarios, however, in which we have two different cubes and need to establish a relationship between them. For those scenarios we have the virtual cube concept. A virtual cube is a cube in which we can use the dimensions and measures from multiple cubes and generate an analysis report on top of them. If we need to analyze data across multiple cubes, or need to combine information from two fact tables on the same subject but with different granularity, then we must create a virtual cube.

Significance of Virtual cubes:

1. A virtual cube can also be based on a single cube to expose only selected subsets of its measures and dimensions.

2. A virtual cube can include normal or linked cubes as component cubes.

3. Since virtual cubes store only their definitions and not the data of their component cubes, they require virtually no physical storage space. You can use virtual cubes to create combinations and variants of existing cubes without using significant additional storage.

4. A virtual cube can provide a valuable security function by limiting the access of some users when viewing the underlying cubes. If some of a cube’s information is sensitive and not suitable for all users, you can create a virtual cube from the existing cube and omit the sensitive information.

5. Virtual cubes can also be used to implement the concept of a snowflake schema: we can combine information from fact tables on the same subject, and also information between two dimensions from different cubes. This improves performance and consumes less memory.

Implementation of Virtual Cubes:

1. Dimensions defined outside a cube (shared, conformed dimensions) can be used directly inside the virtual cube. Dimensions defined inside a cube, however, are accessed by adding the cube name to the VirtualCubeDimension element.

2. The VirtualCubeMeasure element imports a measure from one of the constituent cubes under the same name. If you want to apply a formula or rename a measure as you import it, use the CalculatedMember element instead.
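As a minimal sketch, a virtual cube definition in a Mondrian schema can look like this (the cube, dimension, and measure names are invented for illustration):

<VirtualCube name="Sales and Warehouse">
  <!-- Shared (conformed) dimension: no cubeName needed -->
  <VirtualCubeDimension name="Time"/>
  <!-- Dimension defined inside a cube: qualified with cubeName -->
  <VirtualCubeDimension cubeName="Sales" name="Customers"/>
  <!-- Measures imported from the component cubes under the same names -->
  <VirtualCubeMeasure cubeName="Sales" name="[Measures].[Store Sales]"/>
  <VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Units Shipped]"/>
</VirtualCube>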

However, virtual cubes have a disadvantage when the number of elements inside them is large: processing a virtual cube automatically triggers processing of all underlying cubes that need to be processed, which can add significant time. So it is advisable to include fewer components, and only those that are actually required, in the virtual cube.

Overcome Jasper Reports Complexities using PLPGSQL

This document helps users overcome Jasper Reports complexities using PL/pgSQL (PostgreSQL stored functions). We will look into the complexity of input controls of type multi-select query and how to cascade them.

We start by creating a stored function in PL/pgSQL. As we are dealing with a multi-select query, we need to create the function with an array parameter; PL/pgSQL also offers VARIADIC parameters, but using an array is much easier.

Complexity 1:

Example: how to create a stored function for a multi-select-query input control.

CREATE OR REPLACE FUNCTION multiple_m1(text[])
RETURNS TABLE(comp_id integer, comp_name text, comp_sort text) AS
$$
BEGIN
    IF 'Khosla' = ANY($1) THEN
        RETURN QUERY SELECT id::int AS comp_id, ltrim(company.compname)::text AS compname, ltrim(company.compname)::text AS sort
        FROM company WHERE deleteflag = '0' AND active = '1'
        AND id IN (270,394,376,396,403);
    ELSEIF '' = ANY($1) THEN
        RETURN QUERY (SELECT id::int AS comp_id, ltrim(company.compname)::text AS compname, ltrim(company.compname)::text AS sort
        FROM company WHERE deleteflag = '0' AND active = '1'
        UNION SELECT -1::int AS id, 'All'::text AS compname, ' ' AS sort) ORDER BY sort;
    ELSE
        RETURN QUERY SELECT id::int AS comp_id, ltrim(company.compname)::text AS compname, ltrim(company.compname)::text AS sort
        FROM company WHERE deleteflag = '0' AND active = '1'
        AND (compname LIKE ANY($1) OR '' LIKE ANY($1));
    END IF;
    RETURN;
END;
$$ LANGUAGE plpgsql;


How to call the function in an input control?

SELECT * FROM multiple_m1('{""}');

Here '{""}' is an array containing a single empty string, which the function treats as an empty selection (the ELSEIF branch above, returning all companies plus an 'All' row).

Create another input control of type multi-select query. Here the comp_id column value of multiple_m1 (the first input control) is the input to my second input control, so the second input control runs based on the comp_id value of the first. This is nothing but cascading.


CREATE OR REPLACE FUNCTION mulitple_m2(integer, integer[])
RETURNS TABLE(id integer, reference integer, job_title text, status text) AS
$$
BEGIN
    IF -1 = $1 THEN
        RETURN QUERY SELECT jobs.id::int, jobs.reference::int, jobs.job_title::text,
            ltrim(substring(jobs.status, 3, char_length(jobs.status)))::text AS status
        FROM jobs, company c
        WHERE jobs."DeleteFlag" = '0'
        AND c.id = jobs.id AND c."DeleteFlag" = '0' AND c.active = '1'
        AND (jobs.id = ANY($2) OR -1 = ANY($2))
        ORDER BY jobs.job_title;
    ELSEIF 1 = $1 THEN
        RETURN QUERY SELECT jobs.id::int, jobs.reference::int, jobs.job_title::text,
            ltrim(substring(jobs.status, 3, char_length(jobs.status)))::text AS status
        FROM jobs, company c
        WHERE jobs."DeleteFlag" = '0' AND jobs.status = '1: Open Req'
        AND c.id = jobs.id AND c."DeleteFlag" = '0' AND c.active = '1'
        AND (jobs.id = ANY($2) OR -1 = ANY($2))
        ORDER BY jobs.job_title;
    ELSE
        RETURN QUERY SELECT jobs.id::int, jobs.reference::int, jobs.job_title::text,
            ltrim(substring(jobs.status, 3, char_length(jobs.status)))::text AS status
        FROM jobs, company c
        WHERE jobs."DeleteFlag" = '0' AND jobs.status = '0: Inactive'
        AND c.id = jobs.id AND c."DeleteFlag" = '0' AND c.active = '1'
        AND (jobs.id = ANY($2) OR -1 = ANY($2))
        ORDER BY jobs.job_title;
    END IF;
    RETURN;
END;
$$ LANGUAGE plpgsql;

Cascading the value of parameter 1 into parameter 2:

SELECT * FROM mulitple_m2($1,$2);

SELECT * FROM mulitple_m2($1,ARRAY[$P!{mulitple_m1}]::integer[]);


Complexity 2:

Suppose your input control of type multi-select query has string values, and these values are passed into your SQL query. The query does not work with $X{IN,...}:

$X{IN, column_name, parameter} does not work with PL/pgSQL functions. It throws an error saying that column "column_name" does not exist. Hence $X{} does not work in JasperReports Server when dealing with stored functions.

One workaround is to pass $P{} in your SQL instead of $X{}, but this throws an error saying "parameter type not supported in query : parameter_name class java.util.Collection". Hence $P{} cannot be used either.

Another workaround is to pass $P{} with an exclamation mark, as $P!{parameter}, but this throws an error saying "org.postgresql.util.PSQLException: syntax error at or near '['".

What is this error? Where is the syntax error? The reason is that JasperReports Server passes the string values as [AJ, alexandra, Amanda, Amy, andrea, Angie, API], and SQL does not accept string values without single quotes. So we add single quotes to the parameter values on the Java side and then pass the result into the SQL query.

How is this done?

1. Add another parameter to the report.

[Image: Jasper Reports using PLPGSQL — adding a parameter to the report]

2. In the expression field, use a join function to wrap each value in single quotes and build an ARRAY literal; pass this parameter into the SQL query.

EX:

"ARRAY['" + $P{parameter_m1}.join("','") + "']"


3. Now your SQL query looks like:

SELECT * FROM Nofsubmitted_report($P!{parameter_2});

Complexity 3:

Suppose your input control of type multi-select query has integer values. How do we pass this integer array (collection) when calling the PL/pgSQL function?

As we know, $X{} and $P{} do not work in JasperReports Server when dealing with stored functions; we have seen above which errors they throw.

The solution I found is to pass the value inside an ARRAY literal:

SELECT * FROM function_name(ARRAY[$P!{parameter_m1}]);

If anything is unclear, please get in touch with us at Helical IT Solutions