Wednesday, October 08, 2008

Why SQL generators suck

Assume the following table:

CREATE TABLE craps
( craps_id NUMBER
, crap_size NUMBER
, crap_color VARCHAR2(16)
, crap_when_taken DATE DEFAULT SYSDATE
, crap_when_flushed DATE
, crap_rating NUMBER
, CONSTRAINT craps_pk PRIMARY KEY ( craps_id )
) storage_clauses, logging_clauses, santa_clauses ....

To get a list of craps and their size and color for a time interval by time it was flushed or if that was null (i.e. it was a proud poop that someone wanted everyone to see) the time it hit the bowl; the efficient SQL would be:

SELECT crap_size
, crap_color
, NVL(crap_when_flushed,crap_when_taken) crap_flushed_taken
FROM craps
WHERE NVL(crap_when_flushed,crap_when_taken) BETWEEN :1 and :2
ORDER BY crap_flushed_taken ASC NULLS FIRST;

Oracle would do a full table scan, efficient blocked I/O, and then poop out the results. Oh, wait. We're using a canned, record-based, RDBMS independant system that generates SQL that selects a list of primary keys and then reads each record by primary key. So we have something more like this:

SELECT craps_id
FROM craps
WHERE NVL(crap_when_flushed,crap_when_taken) BETWEEN :1 and :2
ORDER BY NVL(crap_when_flushed,crap_when_taken);

and for each selected primary key value:

SELECT crap_size
, crap_color
, NVL(crap_when_flushed,crap_when_taken) crap_flushed_taken
FROM craps
WHERE craps_id = :1;

Oh, wait. That NVL on those two fields is business logic that needs separated from the data layer; so we create a function:

CREATE OR REPLACE FUNCTION 
crap_flushed_taken (i_crap_when_flushed DATE, i_crap_when_taken DATE)
RETURN DATE AS
BEGIN
RETURN NVL(i_crap_when_flushed,i_crap_when_taken);
END;

No, no, no, stupid. That layer shouldn't need to know to pass two dates in it should work off the record/object primary key identifier:

CREATE OR REPLACE FUNCTION 
crap_flushed_taken (i_craps_id NUMBER)
RETURN DATE
AS
v_crap_when_flushed DATE;
v_crap_when_taken DATE;
BEGIN
SELECT crap_when_flushed, crap_when_taken
INTO v_crap_when_flushed, v_crap_when_taken
FROM craps
WHERE craps_id = i_craps_id;
RETURN NVL(v_crap_when_flushed,v_crap_when_taken);
END;

Actually, this is how they did it

CREATE OR REPLACE FUNCTION crap_flushed_taken (i_craps_id NUMBER)
RETURN DATE
AS as language java name 'com.crap.name.name1.CrapFlushedTaken(java.lang.Integer) return java.lang.Date';

That's a Java class that basically does the exact same thing that little slice of PL/SQL was doing. The final, generated SQL statement went something along the lines of this:

SELECT craps_id
FROM (
SELECT q1.craps_id
FROM craps q1
WHERE NVL(crap_flushed_taken(q1.craps_id),TO_DATE(1,'YYYY')) >= TO_DATE('????????','YYYYMMDD')
AND NVL(crap_flushed_taken(q1.craps_id),TO_DATE(1,'YYYY')) <= TO_DATE('????????','YYYYMMDD')
) t0
LEFT JOIN craps t1 ON t0.craps_id = t1.craps_id
ORDER BY WHERE crap_flushed_taken(t1.craps_id) ASC NULLS FIRST, t1.craps_id ASC NULLS FIRST;

Then the list program read each one of the selected primary key values and did this:

SELECT crap_size
, crap_color
, crap_when_flushed
, crap_when_taken
FROM craps
WHERE craps_id = :1;

Holy slang for feces! Oracle does an index fast full scan on the ID, which it would see as the most efficient way of going about this because in the optimizer's sane mind it would never think anyone would do a single read of the same table inside of a function. So for each primary key in that table (because we don't know the result of the function so no way to limit the working set) we are doing at least three (select, order by, list program), random (record location likely won't match the maintained order of the index), entire table reads using an index (read index block, then lookup corresponding block). Poop on me and call me a sundae...

I tried to simulate the above with inline select statements but the 10g optimizer was smart enough to figure it all out and come up with a plan five times the cost of the optimized plan for proper, hand-tuned SQL. I could try running the thing to get real statistics but unfortunately this thing runs for nine hours. My efficient SQL above runs in 30 seconds.

There are options available. First would be to rewrite the canned code and do the selection process with the optimized hand-written SQL. Another option would be to create an index on the function (haven't tried it yet) so theoretically Oracle will see the SELECT crap matches an index and use the index instead of reading the whole table one row at a time through the primary key index.

I know this is probably an extreme example (or one of many) but I think the same thoughts can be applied to code generators and to the object oriented purists that believe no business logic whatsoever belongs in the data tier.

No comments: