Mondrian Cache Refresh Automation Mechanism in Pentaho BA Server

REFRESH MONDRIAN CACHE ON THE BA SERVER

One of the strengths of Mondrian’s design is that you don’t need to do any processing to populate special data structures before you start running OLAP queries. More than a few people have observed that this makes Mondrian an excellent choice for ‘real-time OLAP’: running multi-dimensional queries on a database which is constantly changing.

The cache stores query results in memory, which greatly improves performance. However, when you have a cube with (near-)real-time data, the caching can get in your way: if, for instance, your data is updated every ten minutes but the data from an hour ago is still cached, your dashboard won’t show your new data.

There are two ways to clear the cache on the server.

Option 1: After you log in to the BA Server, go to Tools -> Refresh, where you can manually clear the cache.

Option 2: The Mondrian cache can also be cleared from a Kettle job that runs after you finish inserting or updating the new data. First create a transformation. Inside it, add an HTTP Client step (preceded by a Generate Rows step, since the HTTP Client step only runs when it receives a row from a previous step) and point it at the URL http://localhost:8080/pentaho/api/system/refresh/mondrianSchemaCache, supplying the user credentials. This clears the Mondrian cache every time the ETL runs on the server. It flushes the cache for every cube on the server so the cubes are ready for the fresh data, avoiding the cache-related issues that are always a headache with real-time data.
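
If you prefer to trigger the same REST call outside of Kettle, the following is a minimal Java sketch of the request that the HTTP Client step issues. The host, port, and credentials below are placeholders and must be adapted to your own BA Server:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MondrianCacheRefresher {

    public static void main(String[] args) throws IOException {
        // Placeholder host and credentials; adapt to your BA Server installation.
        String endpoint = "http://localhost:8080/pentaho/api/system/refresh/mondrianSchemaCache";
        String credentials = "admin:password";

        HttpURLConnection connection = (HttpURLConnection) new URL(endpoint).openConnection();
        connection.setRequestMethod("GET");

        // The refresh API is protected, so send HTTP Basic authentication.
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        connection.setRequestProperty("Authorization", "Basic " + encoded);

        // A 200 response means the Mondrian schema cache was flushed.
        System.out.println("Response code: " + connection.getResponseCode());
        connection.disconnect();
    }
}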

Similarly, the same mechanism can be used to clear the cache for the reporting metadata and for the repository cache; the procedure is the same.

 

Regards,

Nitish Kumar Mishra

Dynamic Schema Processor: How to implement security in Pentaho using Dynamic Schema Processor

Implementation of Security in Pentaho using Dynamic Schema Processor

In my earlier post, I explained what a Dynamic Schema Processor (DSP) is and why we need it. Suppose you have a Mondrian schema uploaded to the BA Server and you would like to modify it at run time; for instance, in our project we change it for every user authenticated to the platform in order to apply security restrictions. This modification aims to limit the records that users are able to see in reports based on that Mondrian schema. So, basically, a DSP is used to implement security on the uploaded Mondrian schema.

1. How to implement DSP?

The solution is to create a JAR file containing a class that extends a Dynamic Schema Processor from the Mondrian library. I will explain the implementation of a DSP using an example:

  • Pentaho Server: Pentaho BA Server 6
  • Database Name: MySQL/sampledata
  • Schema Name: Sales

The aim is to show data based on the country to which the logged-in user belongs. Below is the result set of the users belonging to various countries.

Resultset

Now, on top of this data, a Mondrian schema is created using Schema Workbench.

schema

  • Original Schema:

Originalschema

Let’s upload the schema to the BA Server. Click the Manage Data Sources button on the home page, then choose Import Analysis from the drop-down menu under the icon to the left of the New Data Source button. Select the file with our Mondrian schema and a data source that links to our database. Alternatively, we can publish it from Schema Workbench using the Publish option in the File menu.

Now we have the schema uploaded on the server. The Mondrian cubes can be accessed using the Pentaho Analyzer, JPivot, or Saiku plug-ins. At this point the schema displays the data for all dimensions and facts.

  • Modifying the schema:

Now let’s modify the schema slightly by adding a dialect-specific SQL statement to the XML schema where the CUSTOMER_W_TER table is referenced (you can also make this modification from the DSP itself, but for now let’s do it in the schema).

 

modifiedSchema

Note: The ‘%PLACE_HOLDER%’ token stands for a WHERE-clause fragment that is defined in the properties file. It refers to the column you want to filter the data on; here that column is the country the user belongs to. Make sure there is a user table and that it has a relationship with the column on which we are filtering the data.

2. Creating Dynamic Schema Processor:

Let’s create the Dynamic Schema Processor that will modify the Mondrian schema file so that when we use the Market dimension, only the records for the specified country are displayed. To implement this, we need to create a class that implements the DynamicSchemaProcessor interface (in practice we extend FilterDynamicSchemaProcessor or LocalizingDynamicSchemaProcessor, which already implement the interface). Below is the code of the class that extends the processor:

 

DSPCode
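
Since the code above is shown only as an image, here is a minimal sketch (not the original implementation) of what such a processor might look like. It assumes an illustrative package name com.example.dsp, a properties file named dsp.properties on the classpath, and the helper class Connectiontest described next; it extends Mondrian’s FilterDynamicSchemaProcessor so the base class takes care of reading the schema file:

package com.example.dsp; // illustrative package name

import java.io.InputStream;
import java.util.Properties;

import mondrian.olap.Util;
import mondrian.spi.impl.FilterDynamicSchemaProcessor;

import org.pentaho.platform.api.engine.IPentahoSession;
import org.pentaho.platform.engine.core.system.PentahoSessionHolder;

public class ChangeDynamicSchemaProcessor extends FilterDynamicSchemaProcessor {

    @Override
    public String processSchema(String schemaUrl, Util.PropertyList connectInfo)
            throws Exception {
        // Let the base class read the schema file; this is the "originalschema".
        String originalSchema = super.processSchema(schemaUrl, connectInfo);

        // Load the properties file kept on the server (file name is illustrative).
        Properties props = new Properties();
        try (InputStream in = getClass().getClassLoader()
                .getResourceAsStream("dsp.properties")) {
            if (in != null) {
                props.load(in);
            }
        }

        // The logged-in user is taken from the Pentaho session.
        IPentahoSession session = PentahoSessionHolder.getSession();
        String userName = session != null ? session.getName() : "";

        // Ask the helper class for the WHERE clause fragment for this user
        // (the original post also passes the user's role).
        Connectiontest connectionTest = new Connectiontest();
        String filter = connectionTest.getdetails(props, userName);

        // Substitute the WHERE clause for the placeholder in the schema.
        return originalSchema.replace("%PLACE_HOLDER%", filter);
    }
}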

The variable filter holds the WHERE-clause fragment fetched from the properties file, which is also kept in a server folder. The function that fetches these details is defined in another class:

DSPCode1

DSPCode3
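
Again, because the helper code is shown only as images, the following is a rough sketch of the Connectiontest class and its getdetails function. The property keys, table, and column names are assumptions made for illustration; the real values come from your own properties file and user table:

package com.example.dsp; // illustrative package name

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class Connectiontest {

    // Returns the WHERE clause fragment for the given user: "1=1" for the
    // Admin user (no restriction), otherwise a filter on the user's country.
    public String getdetails(Properties props, String userName) throws Exception {
        // The original post checks the user's role here; comparing the user
        // name to "admin" is a simplification for this sketch.
        if ("admin".equalsIgnoreCase(userName)) {
            return "1=1";
        }

        // Example property keys (illustrative only):
        //   db.url=jdbc:mysql://localhost:3306/sampledata
        //   db.user=pentaho
        //   db.password=secret
        //   user.query=SELECT COUNTRY FROM USERS WHERE USERNAME = ?
        String url = props.getProperty("db.url");
        String dbUser = props.getProperty("db.user");
        String dbPassword = props.getProperty("db.password");
        String query = props.getProperty("user.query");

        // Requires the JDBC driver (here MySQL) to be on the server classpath.
        try (Connection connection = DriverManager.getConnection(url, dbUser, dbPassword);
             PreparedStatement statement = connection.prepareStatement(query)) {
            statement.setString(1, userName);
            try (ResultSet resultSet = statement.executeQuery()) {
                if (resultSet.next()) {
                    // Table and column names are illustrative; they must match the schema.
                    return "CUSTOMER_W_TER.COUNTRY = '" + resultSet.getString(1) + "'";
                }
            }
        }
        // No match found: return a clause that filters out all rows.
        return "1=0";
    }
}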

 

3. How does the code work?

  • When a user logs into the server, the class “ChangeDynamicSchemaProcessor” is called; the schema is passed into the code and assigned to the variable “originalschema”.
  • All the properties are loaded from a properties file.
  • The properties file configures the database connection, the WHERE clause that will be added to the schema file, and the query that returns the country for the logged-in user.
  • Once the properties are loaded, the class calls the function “getdetails” (present in the class Connectiontest, which is initialised first), passing the properties along with the username and the role of the user.
  • Inside the Connectiontest class, the role of the user is checked: if the role is Admin, “1=1” is returned; otherwise the query defined in the properties file is executed and returns the country name for that user.
  • The country name is returned to the main class, where the schema is modified and the WHERE clause is substituted for the “%PLACE_HOLDER%” token.
  • Once the schema is modified, it is used to display only the data for that country.

 

4. Configuring Dynamic Schema Processor on BA Server:

Once the code is complete, package the class into a JAR file so it can be deployed on the server. To use the newly created Dynamic Schema Processor, copy the JAR to /biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib/. Then edit the configuration of the schema in the BA Server to specify that this particular DSP should be used for this schema: go to Manage Data Sources again, select the schema, click Edit in the drop-down menu, and add a new parameter called DynamicSchemaProcessor whose value is the fully-qualified name of your class (see the picture below). The properties file also needs to be added to /biserver-ee/tomcat/webapps/pentaho/WEB-INF/classes. Once this is complete, restart the server.

DatasourceLocation

Add the class name to the parameter list so that the class is called whenever the cube is executed. Please see the image below.

datasourceConfig
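
As an illustration, the parameter added in the dialog above could look like this, assuming the illustrative package name used in the sketches earlier (use the fully-qualified name of your own class):

DynamicSchemaProcessor = com.example.dsp.ChangeDynamicSchemaProcessor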

Now, if we create the same report, we will get different results than at the beginning.

 

5. Configuring Saiku Analytics to work with DSP:

The configuration above works with Pentaho Analyzer. In the Community Edition, however, the Saiku plugin is required to run the analysis, and since Saiku has its own Mondrian engine, the Saiku configuration needs to be adjusted as well.

Because Saiku has its own Mondrian engine, we have to add the JAR file to Saiku’s lib folder, pentaho-solutions/system/saiku/lib. Once the JAR is in place, one more change is needed to make the plugin work.

A file named plugin.spring.xml in the pentaho-solutions/system/saiku folder contains the Dynamic Schema Processor configuration property, which is commented out by default. We just need to remove the comments so that Saiku activates the DSP in its engine. Uncomment the dynamic schema processor property as shown below:

SaikuConfig

Once this is done, restart the BA Server; Saiku Analytics will now apply the DSP.

6. Checking errors and logging:

The problem with a DSP is that any error in it prevents the cube from being displayed on the BA Server. To verify and check the errors, look directly at the log files in the log folder of the Pentaho Server. If there is any issue in the code, the cube will not be visible; if the code works properly, the modified schema is written to the pentaho.log file.

7. Disadvantages and things to take care of:

The Dynamic Schema Processor is an additional mechanism provided by Pentaho so that we can have enhanced security on reports. However, there are some disadvantages with DSP:

  • DSP doesn’t support JPivot: in the Pentaho Server Community Edition, JPivot is used for analysis reports, but DSP does not work with JPivot reports; despite a lot of research we could not make it work. Instead, you can install the Saiku plugin from the marketplace and use DSP on a CE server with it.
  • Performance issues: if the data volume is huge, the report may run slowly, since the underlying DSP class is called every time the schema is loaded.
  • Mention the class name exactly: even a minor mistake means the cube will not be displayed on the BA Server.
  • It is advisable to extend the LocalizingDynamicSchemaProcessor class instead of FilterDynamicSchemaProcessor, because in some cases the code does not work otherwise.

 

8. Conclusion:

The Dynamic Schema Processor is a powerful facility that Pentaho provides for securing reports. It helps us maintain enhanced security and can be applied to cubes, PRPT reports, and interactive reports. The reports are modified dynamically when a user logs into the BA Server.

Introduction to Dynamic Schema Processor: Modifying a Mondrian Schema at Runtime


What is Mondrian?

Mondrian is an open-source OLAP server written in Java. It supports the MDX (MultiDimensional eXpressions) query language and the XML for Analysis (XMLA) and olap4j interface specifications. It reads from SQL and other data sources and aggregates data in a memory cache.

What is Dynamic Schema Processor?

A Dynamic Schema Processor is used to change a Mondrian schema dynamically at runtime. If a Mondrian schema has been uploaded to the BA Server and you would like to modify it at runtime, you need to implement a Dynamic Schema Processor. For instance, if we have to show each user only the data related to that user when they log into the server, we have to modify the schema dynamically to achieve this. The modification limits the records that users can see in reports based on the Mondrian schema.

Why do we need Dynamic Schema Processor(DSP)?

Mondrian allows us to define user roles in the schema itself: we can add various roles and then restrict the data based on them. However, we cannot add the data restriction dynamically this way, the process is lengthy, and it is not advisable when there are many users. Hence, Mondrian provides the Dynamic Schema Processor, which changes the schema dynamically at runtime. To use it, we need to create a class that implements the DynamicSchemaProcessor interface (in practice we extend FilterDynamicSchemaProcessor or LocalizingDynamicSchemaProcessor, which already implement the interface). Once the class is created, we upload the JAR to the server and reference the class in the data source configuration where the schema is uploaded.
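
For reference, a skeletal processor might look like the sketch below. It extends Mondrian’s FilterDynamicSchemaProcessor, whose base implementation reads the schema file, so the subclass only has to post-process the resulting XML; the class and package names are illustrative:

package com.example.dsp; // illustrative package name

import mondrian.olap.Util;
import mondrian.spi.impl.FilterDynamicSchemaProcessor;

public class ExampleSchemaProcessor extends FilterDynamicSchemaProcessor {

    @Override
    public String processSchema(String schemaUrl, Util.PropertyList connectInfo)
            throws Exception {
        // The base class reads the schema XML from schemaUrl.
        String schema = super.processSchema(schemaUrl, connectInfo);
        // Apply user-specific modifications here, e.g. replace a placeholder
        // token with a WHERE clause for the logged-in user.
        return schema;
    }
}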

Architecture of Dynamic Schema Processor:

DSP_Architecture

 

Above is the basic flow of how a Dynamic Schema Processor works in the Pentaho Server:

  1. The user logs in to the Pentaho BA Server.
  2. The JAR loaded inside the server is invoked by the server internally.
  3. The DSP class is called.
  4. The session variable provides the username, which is passed into the SQL query.
  5. The username is passed to the SQL query defined in the properties file (upload the properties file to the classes folder), and the query fetches the corresponding ID from the user table.
  6. The query is executed and the ID of the corresponding user is fetched.
  7. The DSP returns the ID to the schema.
  8. The user then accesses the Analyzer report inside the BA Server.
  9. The cube calls the DSP class (the class name is defined inside the data source configuration).
  10. The DSP modifies the cube by adding a WHERE clause to the schema, filtering the data based on the ID of the corresponding user.
  11. The data for that particular ID is generated.
  12. The Analyzer report is generated for the logged-in user.

Exception: For the Admin user, the WHERE clause is not added to the schema file, so the user has access to all of the data.

Conclusion

So this is how the Dynamic Schema Processor works in a Pentaho server. It is an added feature that Pentaho provides for enhanced security on the data shown in reports. The reports are modified dynamically based on the user who logs into the server.

Regards,

Nitish Kumar Mishra

CUBES


The data structures used in OLAP are multidimensional data cubes, or OLAP cubes:

image1

An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications.

An OLAP cube is an array of data understood in terms of its zero or more dimensions. OLAP is an acronym for online analytical processing, a computer-based technique for analyzing business data in the search for business intelligence.

So, in simple terms, it is the presentation of data in multiple dimensions.

Example: A company needs a survey of its products all over the world.

For this it requires answers to questions like:

– Sales in the USA?

– Sales in the USA of different products?

– Sales in the USA of different products in the last 6 months?

– Sales in other countries, and total sales?

– What type of consumers are attracted by this product?

– Product details, with sales?

…etc.

In this case we need one master table, or fact table, that stores the commonly used measures (like last month’s sales) and references the other tables through foreign keys (customer details, USA sales), organized so that the data can be retrieved as fast as possible.

 

The OLAP cube consists of facts, also called measures, categorized by dimensions (there can be many more than 3 dimensions; dimensions are referenced from the fact table by foreign keys). Measures are derived from the records in the fact table, and dimensions are derived from the dimension tables, where each column represents one attribute (also called a dictionary; a dimension can have many attributes).

Facts and Measures

A fact is the most detailed information that can be measured.

In simple terms, the fact table works like a central control that holds information about the whole cube, i.e. where to find each piece of data.

image2

Common OLAP operations include slice and dice, drill down, roll up, and pivot:

 

Slice:

A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

image3

 

Dice:

The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

image4

 

Drill Down/Up:

Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

image5

 

Roll-up:

(Aggregate, Consolidate) A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.

image6

 

Pivot:

This operation is also called rotate operation. It rotates the data in order to provide an alternative presentation of data – the report or page display takes a different dimensional orientation.

image7

 

Summary

An OLAP cube is really helpful when we are talking about billions of rows. It is very easy and straightforward to translate business questions into multidimensional queries.

This can greatly improve query performance as well as make the data easier to understand.

Cubes can also be reused, which makes this approach more scalable.

Thanks for reading : Pushpraj Kumar ( Helical IT Solutions )

Debug Mondrian Cubes

This blog explains how to debug Mondrian cubes.

Question: How can we look at the queries that Mondrian generates while the user is navigating the OLAP cube? It is really useful to look at the Mondrian log files because they give us a lot of useful information about how our system is behaving. We can
o look at SQL statements and MDX queries,
o get profiling information on the queries that are executed,
o get other useful debugging information.

The following steps illustrate how to enable Mondrian debug logging by adding some properties to the Mondrian configuration file. After that, we’ll configure two new log4j appenders so that the desired log files are properly written to our file system.

Step 1: Enable Mondrian debug logging. Mondrian has a large set of configuration settings that can be modified. In our case, to enable Mondrian debug information, open the mondrian.properties file located in <bi-server_home>/pentaho-solutions/system/mondrian and add the following line:
mondrian.rolap.generate.formatted.sql=true

Example:
mondrian.properties file location:
D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\pentaho-solutions\system\mondrian

Debug Mondrian Cube

Step 2: Update log4j configuration
At this point we modify the log4j configuration file, adding the required appenders and categories so that our logging information is written properly. Open the log4j.xml file located in <bi-server_home>/tomcat/webapps/pentaho/WEB-INF/classes. Based on what you want to log, add one or both of the following blocks to the file. They create two new RollingFileAppenders; you are free to use whichever kind of appender you prefer.

Example: Location of log4j.xml file

D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\tomcat\webapps\pentaho\WEB-INF\classes. Add the following code.

NOTE: The code is already present in the file; we just need to uncomment it.

<appender name="MDXLOG" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="../logs/mondrian_mdx.log"/>
<param name="Append" value="false"/>
<param name="MaxFileSize" value="500KB"/>
<param name="MaxBackupIndex" value="1"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
</layout>
</appender>

<category name="mondrian.mdx">
<priority value="DEBUG"/>
<appender-ref ref="MDXLOG"/>
</category>

<!-- ========================================================= -->
<!-- Special Log File specifically for Mondrian SQL Statements -->
<!-- ========================================================= -->

<appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
<param name="File" value="../logs/mondrian_sql.log"/>
<param name="Append" value="false"/>
<param name="MaxFileSize" value="500KB"/>
<param name="MaxBackupIndex" value="1"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
</layout>
</appender>

<category name="mondrian.sql">
<priority value="DEBUG"/>
<appender-ref ref="SQLLOG"/>
</category>

 

Step 3: Enable the new log settings. To have the new log settings take effect, restart the Pentaho BI Server instance.

Log file location:
D:\Installation Softwares\Pentaho\biserver-ce-4.8.0-stable\biserver-ce\tomcat\logs
After restarting the server, run an OLAP cube and you will find the following two files in the above location.

The files are:
i) mondrian_mdx.log
ii) mondrian_sql.log

Debug Mondrian Cube 2

Now, enjoy analysing the SQL queries generated while performing run actions in tools such as Pentaho Analyzer and Saiku Analytics.

Get in touch with us at Helical IT Solutions