Pages

Saturday 22 February 2014

OBIA 7964 - Need To Tune Lengthy ETL - Analyze Table ETL step is performing very slowly


Performance of daily incremental load is vital with increasing demand for more and more data from structural and non-structural data sources. There is always a desire to limit the daily incremental load window to accommodate more data acquisition routines to enhance analytical capabilities of business users. A reporting solution spanning across different time zones make it more difficult.

There is no substitute for a good code. This blog is an attempt to explain the overall incremental/full load process, different phases of incremental/full load, performance bottlenecks across these phases in general, and how one can improve performance when Analyze Table ETL step is performing very slowly.  This blog is based on an oracle tech note published around analyze table configuration.

OBIA Load Phases

OBIA Load Process has different phases such as dimension load, fact load, post load process etc. I would like to categories them into following areas based on loading phase and activities performed by tasks.




  • Change Capture Phase
  • SDE – Source Dependent Extract
  • SIL – Source Independent Loading
  • PLP – Post Load Process
  • Analyze Tables
  • Drop and Create Indexes 
To investigate issues around performance it is very important to look into current extraction and loading timings around these categories. The performance issue might be anywhere in the ETL spectrum drawn.

Analyze Table ETL step runs after SIL or PLP data load, the purpose of this step is to gather statistics of reporting table which offers benefits in performance while executing reports against the reporting tables. This blog is an attempt to discuss the necessity of analyze tables, its cost VS benefits analysis and also discuss number of options to improve overall performance of the analyze table ETL step.

Why Analyze Tables? Costs Vs Benefits 

Whether or not execute "analyze table" is a decision taken based on costs and benefits of performance. The recommendation is to analyze because, the database is gathering statistics about the data. It uses the statistics to access the data more efficiently, both during ETL and during BI reporting i.e. when your end-users execute BI reports or dashboards.

If you do less or no analysis of db statistics, then ETL and reporting performance will be degraded. How much is projecting dependent, but the recommendation is to do it and that is therefore the default setting.

Exclude Table from Analyze Table?

Is it essential to do analyze all reporting and aggregate tables? Is there any option by which one can exclude table from analyze table step. One can turn off Analyze for all tables or turn it on for truncated tables only, in DAC UI.

How frequently should it happen?

In short it is beneficial to analyze table for ETL and reporting performance then next question is how frequently analyze table should be executed. Is there any way to perform the analyze step less frequently, e.g. not every day. The wisdom says if there is no significant change in data volumes of table, analyze table won’t help at all.

DAC support configuration to perform the analyze step less frequently, the steps to achieve these are as follows 

Setup View, in Physical Data Sources Tab: Analyze Frequencies Subtab Analyze frequencies in days by table type.

You can see the possible table types by looking in the Design View at the Tables Tab; each table has a type like Aggregate, Dimension...etc

You will need to rebuild your execution plan when you make changes, because this rebuild is where DAC decides which Tasks and sub-Tasks to execute

Scope of Analyze Table?

The default syntax DAC uses for analyzing tables is as follows. The custom.xml file, location in the <DAC_Config_Location>\CustomSQLs directory, contains the default syntax DAC

<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL COLUMNS SIZE AUTO',cascade => true ) </SqlQuery>

The estimate percent, method opt and cascade are the parameters which decide the scope of analyze table.  It is important to choose right values for the mentioned parameters as it determine the execution time for analyze table step. For estimation percent 100% it is bound to take much longer than if you set it to 15% or 20%. While method opt may have option to choose all columns or indexed columns only.

Global - Changing Behavior of Analyze Table ETL Step

For changing behavior of analyze table ETL step, one has to change the customsql.xml file. Note that modifying the cusomsql.xml affect the syntax on a global level means it is applicable to all tables chosen for analyze and for all execution plan.

To edit the Analyze Table syntax, the steps are as follows  

Open the customsql.xml file located in the <DAC_Config_Location>\CustomSQLs directory.
Locate the Analyze Table syntax for the appropriate database type.

For example, to gather statistics for only the indexes column, one can edit the syntax as follows

<SqlQuery name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname => '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt => 'FOR ALL INDEXED COLUMNS',cascade => true ) </SqlQuery>

The Action functionality can be used to change behavior of the analyze table ETL step for a selected table.

Analyze but Less Frequently

Is there any way by which one can set the analyze table but less frequently rather than executing it every day. In DAC 10.1.3.4.1 and above, in Setup View, in Physical Data Sources Tab: Analyze Frequencies Subtab Analyze frequencies in days by table type. It is done for the specific table type e.g. Aggregate Dimension etc.

Once done, the execution plan is to rebuild to take this change in to effect. 


Careful about Configurations

It is very essential to make sure that you set configuration right, the wrong configuration parameters might add in your performance vows.

Check CreateQueryIndexesAtTheEnd parameter if when this property is set to True, these tables will be analyzed twice.

If any indices are marked as Query type indices and are used by ETL processes it can adversely affect the performance of the ETL process.


The hope you find this information useful. 

Thursday 6 February 2014

OBIEE 11g – Advanced Prompts for Financial and Other Data such as ‘Last Year’, Last Month’ or ‘Last 6 Month’



OBIEE 11g supports number of different prompts such as column prompt, currency prompt , image prompt and variable prompt. It supports number of  user input such as text, choice list, slider, check boxes, radio buttons and list boxes. It has been improved a lot from previous release and going stronger with every new release.

 Today I came across a customer request; customer would like to use prompts to choose as user friendly time parameters such as Last Year, Last Month or Last 6 Month to drive underline report. I do understand why customer would like these type of parameter specific dashboard filters to drive reports. I am struggling to find out why Oracle does not offer a straight forward option to provide global filter which caters time series specific parameters based filter so user can choose Last Year, Last Month or Last 6 Month are filters to drive underline report.

Is there any work around? I have started looking into the option to achieve this within the boundaries of what OBIEE 11g offer. Good news is that yes I have achieved something to show and tell. This blog is an attempt to explain how we can model to get prompt specified.


Dashboard Prompt and Presentation Variable

The first hurdle is to define a prompt with options such as Last Year, Last Month or Last 6 Month within dashboard prompt so user can choose the one of the option and apply to run report.


Type : - Column Prompt

Choose Any Column – should be of type Var Char if not make sure it is casted

Label  - Choose Filter

User Input – Choice List, you can choose any of the user input supported

SQL – Which generate parameters e,g Last Year, Last Month etc

SELECT  CASE WHEN "Time"."Month" IS NULL THEN  'Last Year' END FROM "Human Resources - Workforce Profile"
UNION ALL
SELECT  CASE WHEN "Time"."Month" IS NULL THEN  'Last Month' END FROM "Human Resources - Workforce Profile" UNION ALL

SELECT  CASE WHEN "Time"."Month" IS NULL THEN  'Last 6 Month' END FROM "Human Resources - Workforce Profile"

Set Variable – Presentation Variable
Name – q1



Report Definition

A simple report with three columns Year, Month and Headcount , from a standard OBIA Application.


Subject Area – Human Resources – Workforce Profile

Selected Columns
"Time"."Year"
"Time"."Month"
"Workforce Profile Facts"."Headcount"



Filter

This is most critical part of the report; the filter should be driven by presentation variable value selected from dashboard prompt. If user choose ‘Last Year’ it should show all data for last 12 months since today , if user choose ‘Last 6 Month’ the report should show all data for last 6 months since today.



"Time"."Date" >

CASE WHEN '@{q1}{Last Year}'='Last Year'   --------------------- Presentation variable is ‘Last Year’
THEN
TIMESTAMPADD(SQL_TSI_YEAR, - 1, CURRENT_DATE )  ------------ Date >  Current Date – 1 Year

WHEN  '@{q1}{Last Year}'='Last Month' THEN   ---------------Presentation variable is ‘Last Year’
TIMESTAMPADD(SQL_TSI_MONTH, - 1, CURRENT_DATE )   ---------Date >  Current Date – 1 Month

WHEN '@{q1}{Last Year}'='Last 6 Month' THEN   ---------------Presentation variable is ‘Last Year’
TIMESTAMPADD(SQL_TSI_MONTH, - 6, CURRENT_DATE )   ------ Date >  Current Date – 6 Months

END


 
Dashboard Page


Choose Filter – Last Year




Report
Headcount for Last 12 Months
Choose Filter – Last Month




Report
Headcount for Last 30 Days
Choose Filter – Last 6 Month




Report
Headcount for Last 6 Month


Time Stamp Add Usage

Some other expressions which might be useful for wider audience are as follows.

First Date of Week:
TIMESTAMPADD(SQL_TSI_DAY, (DAYOFWEEK(Current_date)*-1)+1, Current_date)

Lat Date of Week:
TIMESTAMPADD(SQL_TSI_DAY,DAYOFWEEK(Current_Date)*-1,TIMESTAMPADD(SQL_TSI_WEEK, 1,
Current_date))

First Date of Year:
TIMESTAMPADD( SQL_TSI_DAY , EXTRACT( DAY_OF_YEAR FROM CURRENT_DATE) * -(1) + 1,
CURRENT_DATE)

Last Date of Year:
TIMESTAMPADD(SQL_TSI_YEAR, 1, TIMESTAMPADD( SQL_TSI_DAY , -1,
TIMESTAMPADD( SQL_TSI_DAY , EXTRACT( DAY_OF_YEAR FROM CURRENT_DATE) * -(1) + 1,
CURRENT_DATE)))

Inference

Oh yes, it worked as expected. It is a shame that why OBIEE as product could not support such a genuine requirement. I have seen it is very common requirement to have an option to filer data for ‘Last Year’, ‘Last Month’, Last 6 Months’, … ‘Last 3 Years’ etc,

I hope you find this information useful. Please do comment if you like it or not.