Tuning a slow PostgreSQL Function
The problem I faced:
I had a postgreSQL function which was getting executed from a BI Report. It was called several times in a loop with different parameters for a single report execution. The function performed poorly.
The postgreSQL function had 2 “select” queries which returned a resultset based on the various parameters that were passed to them. It had some IF ELSE conditions, which determined which of the query is to be executed.
It looked something like this –
CREATE OR REPLACE FUNCTION testfunction(IN integer, IN text) RETURNS TABLE(id text,....) AS $BODY$ BEGIN IF '-1' = $1 THEN RETURN QUERY SELECT .... FROM .. WHERE abc = ANY($2)..; ELSE RETURN QUERY SELECT .... FROM .. WHERE abc = ANY($2) and efg = ..; END IF; RETURN; END; $BODY$ LANGUAGE plpgsql VOLATILE
What was observed is, the function got executed fast a couple of times and then the performance degraded. If each of the queries was tested separately, outside the function, they executed in less than 1 sec and the function took about 20-30 secs. Now, if the function was called 10 times from the report, my report would take 3-4 minutes to load!
The thing that worked!
After trying several things, converting the SQL statements in the function to dynamic SQL did the trick. It turned out, the optimizer failed to optimize the execution plan according to the parameter values. The postgreSQL function was using a cached query plan which may not always be optimal for different parameter values.
When we execute a simple query, the plan of the query is determined at run time and is optimized as per the parameters used. To replicate this behaviour, we can dynamically generate the SQL statements inside the functions and then EXECUTE them.
I changed my function to look like this –
CREATE OR REPLACE FUNCTION testfunction(IN a integer, IN b text) RETURNS TABLE(id text....) AS $BODY$ BEGIN IF '-1' = $1 THEN RETURN QUERY EXECUTE 'SELECT .... FROM .. WHERE abc = ANY($2)..' USING a,b; ELSE RETURN QUERY EXECUTE 'SELECT .... FROM .. WHERE abc = ANY($2) and efg = ..' USING a,b; END IF; RETURN; END; $BODY$ LANGUAGE plpgsql VOLATILE
Other Things I learnt along the way
1) Function Volatility Categories – IMMUTABLE , STABLE, VOLATILE – The volatility category is a promise to the optimizer about the behavior of the function. Marking your function appropriately could be a very important performance tweak. IMMUTABLE functions always return same value for a set of parameters (involves no DB lookups). STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. VOLATILE indicates that the function value can change even within a single table scan, so no optimizations can be made.
You can read in detail about the different categories here.
2) Language name – SQL v/s PGPLSQL – If your function is nothing but a list of SQL Statements or a single SQL statement, mark the language as SQL and not PGPLSQL. This does impact performance.
3) Match your argument types to the database columns – Sometimes, the reason behind your function executing slowly might be as simple as mismatching argument types. If say, a database column is of type numeric and we use a char type argument, the query will give correct results, but indexes defined on the column will not be utilized.
Hope this helps !!
Helical IT Solutions
Best Open Source Business Intelligence Software Helical Insight is Here