Tuesday, April 20, 2010

The Oracle CBO statistical time bomb

Ever have a query or process that runs fine for years and years and years but then one day just slows to a crawl, going from seconds to hours in execution time? If so, you have run into what I call the Oracle CBO statistical time bomb.

Oracle 10g/11g has a daily job that queries for "stale" and missing optimizer statistics on Oracle table/index objects. Those optimizer statistics are used by the optimizer when choosing the best access plan, join types, index and table scans to use for a SQL query. The default for "stale" is if 10 percent of the data have changed (static in 10g, can be changed in 11g).

Let's assume we have this really, really big invoice table that grows at a constant rate over time. This table has an index on the financial quarter. Each quarter roughly 25,000 new rows are inserted, and then updated 3-5 times each (as status changes, etc. whatever), and some idiot needs to clean up a couple hundred bogus rows each month. Here is the table for 2010:

Financial    Total  Insert  Update  Delete  Changed  Percent
Quarter Rows Rows Rows Rows Rows Changed
2010 Q1 1,000,000 25,000 80,000 169 105,169 %10.51 - daily job gathers statistics
2010 Q2 1,024,836 25,000 80,000 169 105,169 %10.26 - daily job gathers statistics
2010 Q3 1,049,672 25,000 80,000 169 105,169 %10.01 - daily job gathers statistics
2010 Q4 1,074,508 25,000 80,000 169 105,169 %09.78 - not stale yet, still have 2010 Q3 statistics

Without current statistics, the Oracle cost-based optimizer thinks one index and access method would be more efficient than a different index and access method (when it really, really, really is not).

I'll need an Oracle Guru to correct me but here is what happens in 2010 Q4:

Your super-optimized, meticulously coded query, queries the invoices by quarter and by another column existing in a massive subquery.

For 2010 Q1,Q2,Q3 the CBO says there are 25,000 rows reading the quarter index and likely less rows using an index on another column (customer number perhaps) if it does the massive subquery first. The CBO got it right and the massive subquery executes first then reads the table via index on the other column.

For 2010 Q4 the CBO says there are no/few rows for Q4 (when there are really 25,000) so read the invoices using the quarter index and then do a nested loop each time on the massive subquery because the cost of the read by quarter index times the cost of the massive subquery is less than the plan it took above (when it really is not). Instead of doing that nested loop on a couple rows (yawn) it does that nested loop massive subquery 25,000 times. Yikes.

What to do? Plan ahead.

Go to Tahiti and research DBMS_STATS in the Oracle Database PL/SQL Packages and Types Reference. Know your tables and diffuse the time bomb before it goes off. Let the daily job handle the stale statistics. Tables that contain an index on a period of time (quarter, year, etc.) or code based on a period of time (i.e. the Spring 2010 line of Mountain Dew with Ginseng and Xanax) should be researched as candidates for statistics gathering outside of Oracle's daily job. Gather table statistics before the big processing periods.

References

Oracle and SSIS; ORA-00903 during verification

Can you tell I have been suppressing my pain and hate up into a nice little ball, instead of talking about my feelings while laying in some lush, peaceful green meadow, sipping Chamomile tea and popping anti-depressants like candy.

This issue is one of those classic, what the **** is going on here errors.

Here is what I am suffering with:

  • Windows XP and Windows Server 2003
  • SQL Server 2008 Integration Services
  • Native OLE DB\Oracle Provider for OLE DB, Oracle 11g R1 or Oracle 10g
  • OLE DB Source (Oracle) using a query.

The Problem

Getting ORA-00903 table or view not found during execution on verification step on Data Flow OLE DB Oracle source; even though at design time (in the OLE DB source editor dialog) the SQL parses correctly AND pulls back test data in the source AND the query runs successfully everywhere else using the same credentials.

The Solution

Drink.

Smoke.

Break something.

Then:

  • in Oracle, CREATE a VIEW for the query
  • if the query contains a user function you will also need to in Oracle, GRANT EXECUTE ON schema.function TO view_schema_owner WITH GRANT OPTION
  • in Oracle, GRANT SELECT on VIEW to the USER used by SSIS (if necessary)
  • in SSIS, use the Advanced Editor on the OLD DB source; Component Properties; AccessMode = OpenRowset, OpenRowset = SCHEMA.view (standard editor will not pull up the view and you are limited to a drop-down)

There that was painless, right?

I have been told that shutting down Visual Studio sometimes helps (it did not). Shutting down Visual Studio works if you switch databases in the connection manager and then get ORA-00903 table or view not found in the OLE DB source. One co-worker stated he rebooted his machine and then things worked (it did not).

I'm sure I was doing something stupid like trying to use a subquery or SYSDATE but WTF; it parsed, it pulled back test data, it works everywhere but the verification step when debugging it.

****!

Oracle and SSIS; SQL tasks and procedures

I am posting today to hopefully prevent others from the hell I have been through. I really have no idea why I continue to do this as my psyche at this point is so eroded that I should be deriving pleasure from the suffering of others like some serial killer in the making.

Getting Oracle and Microsoft to play nice is like trying to get Sean Hannity and Nancy Pelosi to share a quiet evening together making babies while discussing health care and what to spend their money on. If that reference is dated then consider the choice of wiping with toilet paper or the large grain sandpaper and then using the sandpaper; or perhaps shoving a sea urchin up your nose then pulling it out the other nostril.

Here is what I am suffering with:

  • Windows XP and Windows Server 2003
  • SQL Server 2008 Integration Services
  • Native OLE DB\Oracle Provider for OLE DB, Oracle 11g R1 or Oracle 10g
  • Execute SQL Task trying to execute a PROCEDURE or PACKAGE.PROCEDURE that has input or output parameters

Connection Manager

The only non-default property is RetainSameConnection set to True (connections to Oracle are expensive). Personally, I see no need for resource pooling but that is another post.

Execute SQL Task

General

ConnectionType
OLE DB
SQLSourceType
SQLStatement:
{call your_stupid_package.dumb_ass_procedure(?,?)}
or
BEGIN your_stupid_package.dumb_ass_procedure(?,?); END;
IsQueryStoredProcedure
False (usually greyed out, but can be set via Properties, but has no bearing on anything)

Parameter Mapping; Add each parameter:

Variable Name
whatever User::variable you have defined already
Direction
Input or Output
Data Type
VARCHAR (I have not been able to get anything else to work yet)
Parameter Name
the left to right sequence number of the parameter starting at zero (so, if you only have one the parameter name is 0)
Parameter Size
the size of the variable

Conclusion

I would cite references but that vast majority of searches dead-end with single posts in forums where some poor slob went begging for help and found none; or sometimes found a reply from another kindred spirit also suffering through the same agony. Those who did have solutions were not passing parameters or were using an Expression to build the query string with parameters (doing their own SQL injection; which also does not help those using output parameters).

The above works, with a couple warnings, but works.