WITH clause in MDX

WITH clause in MDX

In MDX queries, the WITH clause has an important role to play. WITH clauses are used to define some members which will be used in the SELECT clause for analysis purposes. The WITH clause contains either delarations of multiple sets and multiple members. Also, it is used to perform certain calculations like calculating profits,calculating prev year’s information,etc.

Below are some examples of the MDX queries using the WITH clause,

1. MDX Query to calculate the Total Sales for each year:

WITH member [Measures].[Total Sales] as ‘([Measures].[Sales] * [Measures].[Quantity])’
SELECT NON EMPTY {[Measures].[Sales], [Measures].[Quantity], [Measures].[Total Sales]} ON COLUMNS,
NON EMPTY [Time].[Years].members ON ROWS
FROM [SteelWheelsSales]

 

In the above example, the WITH clause was used to calculate the total sales value by multiplying the measures “Sales” and “Quantity”. The WITH clause creates a member [Measures].[Total Sales] (it can be used to create a new measure or a new dimension member to calculate based on the existing measures). After the WITH Clause, the member “Measures.Total Sales” is used as a measure in the SELECT clause and is displayed in the column axis whereas the dimensional data is displayed in the row axis.

Result:

With1

2. MDX Query to set a format string in a WITH Clause:

WITH member [Measures].[Total Sales] as ‘([Measures].[Sales] * [Measures].[Quantity])’,
format_string=’$#,##0.00′
SELECT {[Measures].[Sales], [Measures].[Quantity], [Measures].[Total Sales]} ON COLUMNS,
[Time].[Years].members ON ROWS
FROM [SteelWheelsSales]

In the same query, if we add a format_string keyword and specify the format in which you want to display the values in that particular member, then we would get the desired result in the output. Alternatively, you can use the name(for example, here its currency).
Result:
With2

3. MDX query to generate result using SET in WITH CLAUSE:

WITH member [Measures].[Total Sales] as ‘([Measures].[Sales] * [Measures].[Quantity])’, format_string = “$#,##0.00”
SET [Not 2005] as ‘Filter({[Time].[Years].Members}, ([Time].[Years].CurrentMember.Name <> “2005”))’
SELECT {[Measures].[Sales], [Measures].[Quantity], [Measures].[Total Sales]} ON COLUMNS,
[Not 2005] ON ROWS
FROM [SteelWheelsSales]

Now lets suppose you want to show all the data except for 2015. You can use the SET Function which will filter the data (as shown in the above example) and will show you the data for all except 2005.

Result:

With3

4. MDX Query to compare data for current year and prior year:

WITH member [Measures].[Total Sales] as ‘([Measures].[Sales] * [Measures].[Quantity])’
member [Measures].[Last Period] as ‘([Time].[Years].CurrentMember.PrevMember, [Measures].[Sales])’
SELECT {[Measures].[Sales],[Measures].[Last Period]} ON ROWS,
[Time].[Years].Members ON COLUMNS
FROM [SteelWheelsSales]

In the above expression, the member [Measures].[Last Period] is used to generate Sales information of the previous year. The CurrentMember.PrevMember function is used to generate the Previous Year information(you can do it similarly for any time member) based upon the Measure that you have specified. This is also done using WITH Clause.

Result:
With4

 

Conclusion:

The WITH Clause in MDX is very similar to the WITH Clause we use in SQL SERVER/Postgres. It helps us in understand the functionality better and is also used for faster processing. The WITH CACHE clause is used for optimization which helps in faster query execution.

 

 

Thanks,

Nitish Kumar Mishra

Difference between SQL and MDX

Difference between SQL and MDX

The MDX (Multi Dimensional Expressions) syntax appears very similar to the SQL syntax and also functions somewhat similar to the way SQL performs. But there are many differences, basically conceptual differences, between SQL and MDX.

The basic difference between MDX and SQL is the ability of MDX to reference multiple dimensions.

  1. Data Representation:

SQL refers to only two dimensions, rows and columns, when processing queries since SQL was designed to handle tabular data in two dimensional formats. Here rows and columns have some meaning in SQL syntax. However, in MDX Queries, it can refer to multiple dimensions distributed along different axis. Here rows and columns are different to that of SQL. They are depicted as Horizontal (rows) and vertical (column) axis.

  1. Difference in Select and Where clause:

In SQL, the Select clause is used to depict the column information i.e. it defines the column layout for a query, while the where clause is used to define the row layout. In MDX, the select clause is used to define several axis dimensions, while the where clause is used to restrict the data to a specific dimension or a member.

  1. Visualization:

In SQL, the visualization of data is in a two dimensional format consisting of rows and columns. It is more like a tabular form. In MDX, the visualization of data is multi dimensional on horizontal and vertical axis. It is more like in a cross tabular form.

  1. JOINS:

MDX supports all the joins that SQL supports. The join condition between the fact and the dimension will be specified while designing the cube. Following are the examples in which a join is specified,

  1. Inner Join:

In SQL,


SELECT table1.<column_name>,table2.<column_name>FROM table1 inner join table2

ON table1.column1 = table2.column2;

 

 

In MDX,

SELECT NON EMPTY {[DimensionName].[LevelName].Members} ON ROWS,
NON EMPTY {[Measures].[MeasureName]} ON COLUMNS
FROM [CubeName];

 

  1. Left Outer Join:

In SQL,

SELECT table1.<column_name>,table2.<column_name>
FROM table1 left join table2
ON table1.column1 = table2.column2;

 

In MDX,

SELECT {[DimensionName].[LevelName].Members} ON ROWS,
NON EMPTY {[Measures].[MeasureName]} ON COLUMNS
FROM [CubeName];

 

  1. Right Outer Join:

In SQL,

SELECT table1.<column_name>,table2.<column_name>
FROM table1 right join table2
ON table1.column1 = table2.column2;

 

In MDX,

SELECT NON EMPTY {[DimensionName].[LevelName].Members} ON ROWS,
{[Measures].[MeasureName]} ON COLUMNS
FROM [CubeName];

 

  1. Full Outer Join:

In SQL,

SELECT table1.<column_name>,table2.<column_name>
FROM table1 right join table2
ON table1.column1 = table2.column2;

 

In MDX,

SELECT {[DimensionName].[LevelName].Members} ON ROWS,
{[Measures].[MeasureName]} ON COLUMNS
FROM [CubeName];

 

  1. FUNCTIONS:

 

MDX supports a lot of functions which makes its execution much simpler. In SQL, generating the desired result makes query very complex. For example, generating the Sales value for current year and previous year along with the YTD value is very easy in MDX since there are some predefined functions like Parallel period and YTD which you need to define in the WITH clause of the MDX statement. In SQL, it becomes complicated since you need to create aliases for each value (Current Year, Previous Year and YTD) and then you have to join all the aliases to generate the desired result. It makes query very complex.

 

  • Example in MDX:
WITH member [Measures].[Last Period] as ‘([Time].[Years].CurrentMember.PrevMember, [Measures].[Sales])’ member [Measures].[Growth] as ‘[Measures].[Sales] – [Measures].[Last Period]’
SELECT CROSSJOIN({[Time].[Years].Members},{[Measures].[Sales],[Measures].[Last Period],[Measures].[Growth]}) ON COLUMNS FROM [SteelWheelsSales];

 

CONCLUSION:

MDX and SQL performs the same execution but the results are portrayed in different ways. SQL shows data in a tabular two dimensional form whereas MDX shows data in a multidimensional form.

 

Regards,

Nitish Kumar Mishra

What is a virtual cube and what is its significance in Mondrian

Virtual Cubes and their significance in Mondrian

Before going into the Virtual cubes, first lets just refresh our memory about what is a cube. A cube is basically a multi dimensional views of data,querying and analytical compoenets to clients. Its is a data processing unit composed of dimensions and fact tables. It acts as a business layer between an end user and the DW. A cube works on the mechanism of Star schema where there can be a fact table and multiple dimension tables and all the dimensions are related to the fact.

Virtual Cubes:

We all now know that each cube can have one fact table and all the dimensions present in the cube are having a relationship with that fact table. Now there are some scenarios in which we have two different cubes and we need to establish a relationship between them. In those kind of scenarios, we have the virtual cubes concept. A virtual cube is a cube in which we can use the dimensions and measures from multiple cubes and generate a analysis report using that cube. if we need to analyze data across multiple cubes, or need to combine information from two fact tables on the same subject but with different granularity — then you must create a virtual cube.

Significance of Virtual cubes:

1. A virtual cube can also be based on a single cube to expose only selected subsets of its measures and dimensions.

2. A virtual cube can include normal or linked cubes as component cubes.

3. Since virtual cubes store only their definitions and not the data of their component cubes, they require virtually no physical storage space. You can use virtual cubes to create combinations and variants of existing cubes without using significant additional storage.

4. A virtual cube can provide a valuable security function by limiting the access of some users when viewing the underlying cubes. If some of a cube’s information is sensitive and not suitable for all users, you can create a virtual cube from the existing cube and omit the sensitive information.

5. Virtual cubes can also be used to implement the concept of Snowflake schema. In virtual cubes, we can combine the information from the fact table on the same subject and also information between two dimension from different cubes. it improves the performance and consumes lesser memory.

implementation of Virtual cube:

1. Dimensions which are defined outside the cube (non-confirmed dimensions) can be used directly inside the virtual cube. However, the dimensions which are defined inside the cube can be accessed by adding the cube name inside the VirtualCubeDimension component.

2. The VirtualCubeMeasure component imports a measure from one of the constituent cubes. It is imported with the same name. If you want to create a formula or rename a measure as you import it, use the CalculatedMember component instead.

However, virtual cubes has a disadvantage if the elements present inside the cube is large in number. Processing a virtual cube automatically triggers processing of all underlying cubes that need to be processed, which can add significant time. So it is advisable to use lesser amount of components and only those components that are actually required in the virtual cube.

Pentaho Analysis: Workflow

OLAP Analysis: Workflow

 

To prepare data for use with the Pentaho/Jaspersoft OLAP Analysis (and Reporting, to a certain extent) client tools, you should follow this basic workflow :

Design a Star or Snowflake Schema

The entire process starts with a data warehouse.The end result should be data model in the star or snowflake schema pattern. You don’t have to worry too much about getting the model exactly right on your first try. Just cover all of your anticipated business needs; part of the process is coming back to the data warehouse design step and making changes to your initial data model after you’ve discovered what your operational needs are.

Populate the Star/Snowflake Schema

Once your data model is designed, the next step is to populate it with actual data, thereby creating your data warehouse. We can use  Pentaho Data Integration or Talend for ETL.

Build a Mondrian Schema

Now that your initial data warehouse project is complete, you must build a Mondrian schema to organize and describe it in terms that Pentaho/Jasper Analysis can understand. This is also accomplished through Pentaho Data Integration by using the Agile BI plugin. Just connect to your data warehouse and auto-populate your schema with the Modeler graphical interface. Alternatively, you can use Schema Workbench to create an analysis schema through a manual process.

Initial Testing

At this point you should have a multi-dimensional data structure with an appropriate metadata layer. You can now start using Pentaho Analyzer, Jasper Ad-hoc and JPivot to drill down into your data and see if your first attempt at data modeling was successful. In all likelihood, it will need some adjustment, so take note of all of the schema limitations that you’re unhappy with during this initial testing phase.

Do not be concerned with performance issues at this time – just concentrate on the completeness and comprehensiveness of the data model.

Adjust and Repeat Until Satisfied

Use the notes you took during the testing phase to redesign your data warehouse and Mondrian schema appropriately. Adjust hierarchies and relational measure aggregation methods. Create virtual cubes for analyzing multiple fact tables by conforming dimensions. Re-test the new implementation and continue to refine the data model until it matches your business needs perfectly.

Test for Performance

Once you’re satisfied with the design and implementation of your data model, you should try to find performance problems and address them by tuning your data warehouse database, and by creating aggregation tables. The testing can only be reasonably done by hand, using Pentaho Analyzer, Jaspersoft Ad-hoc or JPivot. Take note of all of the measures that take an unreasonably long time to calculate. Also, enable SQL logging and locate slow-performing queries, and build indexes for optimizing query performance.

Create Aggregation Tables

Using your notes as a guide, create aggregation tables using Pentaho Aggregation Designer or talend to store frequently computed analysis views. Re-test and create new aggregation tables as necessary.

If you are working with a relatively small data warehouse or a limited number of dimensions, you may not have a real need for aggregation tables. However, be aware of the possibility that performance issues may come up in the future.

Check in with your users occasionally to see if they have any particular concerns about the speed of their BI content.

Deploy to Production

Your data warehouse and Mondrian schema have been created, tested, and refined. You’re now ready to put it all into production. OLAP reports help users to do multi-dimensional and cross functional analysis, slice and dice, drill-up and drill-through the data. It gives power in the hands of business users to generate ad-hoc reports and get better insight of their business and and make better decisions based on OLAP analysis.