Monday, September 28, 2009

CRUD with the Oracle MERGE statement

Most of this post came about after a lengthy search of the clicker-web for information on Oracle materialized views. There are three or fifty situations at work where such a construct might help the performance and overall health of the system (like when five different folks are running the same query over and over and over again).

Most situations that could benefit from a materialized view are "complex materialized views" according to the Oracle manuals and guides (i.e. I am doing something more complicated than "hello world" in SQL). Others have documented that the materialized view refresh is really a MERGE behind the scenes. I am quite familiar with MERGE, and it was actually my first choice before I thought to myself "there is probably something already created for this complicated crap". Why recreate the square wheel, right? In this case the square wheel might be more efficient in the long run.

First, I will create a quick table and add some constraints to it:

CREATE TABLE table_full_of_crud AS
SELECT
LEVEL AS id
, TRUNC(SYSDATE,'YEAR')+(LEVEL-1) AS some_stupid_date
, CASE MOD(LEVEL,7) WHEN 0 THEN NULL ELSE TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'DAY"," MONTH D YYYY') END AS varchar_crud
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 120;
ALTER TABLE table_full_of_crud
ADD (CONSTRAINT pk_table_full_of_crud PRIMARY KEY(id))
MODIFY (some_stupid_date NOT NULL);
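
A quick sanity check on the generated rows; every seventh varchar_crud (ids 7, 14, 21, ...) should come back NULL:

SELECT * FROM table_full_of_crud WHERE id <= 8 ORDER BY id;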

Here is what I consider to be a well-formed, structured MERGE that will handle new rows, updated rows and rows that need to be deleted:

MERGE /*+ append */ INTO table_full_of_crud  m
USING (
SELECT
NVL(v1.id,t1.id) AS id
, NVL(v1.some_stupid_date,TRUNC(SYSDATE)-12345) AS some_stupid_date
, v1.varchar_crud
FROM table_full_of_crud t1
FULL JOIN (
SELECT * FROM (
SELECT
LEVEL AS id
, TRUNC(SYSDATE,'YEAR')+(LEVEL-1) AS some_stupid_date
, CASE MOD(LEVEL,20)
WHEN 0 THEN NULL
WHEN 7 THEN TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'DAY"," MONTH D YYYY')
ELSE TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'MM/DD/YYYY')
END AS varchar_crud
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 160
)
WHERE MOD(id,69) > 0
) v1 ON v1.id = t1.id
) u
ON (m.id = u.id)
WHEN MATCHED THEN
UPDATE SET
m.some_stupid_date = u.some_stupid_date
, m.varchar_crud = u.varchar_crud
WHERE NOT(
(m.some_stupid_date = u.some_stupid_date)
AND DECODE(m.varchar_crud,u.varchar_crud,'EQ') IS NOT NULL
)
DELETE
WHERE m.some_stupid_date = (TRUNC(SYSDATE)-12345)
AND m.varchar_crud IS NULL
WHEN NOT MATCHED THEN
INSERT (m.id, m.some_stupid_date, m.varchar_crud)
VALUES (u.id, u.some_stupid_date, u.varchar_crud);

Here is a line-by-line explanation of all this crap (the numbers refer to the lines of the MERGE statement above):

  1. The start of the MERGE statement, with append hint (direct path, etc.) INTO table; I use a table alias of "m" to indicate this is the table having data merged into it (makes sense)
  2. USING; we will be using a subquery
  3. SELECT (our subquery)
  4. The NVL of the primary key is required due to the FULL [OUTER] JOIN; if the row needs to be deleted then the merge data primary key will be NULL, if it needs to be inserted then the target table primary key will be NULL, and if it needs to be updated then neither will be NULL
  5. The NVL of this column satisfies the NOT NULL constraint and doubles as our "delete trigger"; a "delete trigger" should be a value that will never appear in the column
  6. VARCHAR2 column
  7. FROM the table that will be the target of the merge;
  8. FULL [OUTER] JOIN is used to return the rows that exist in either table
  9. SELECT the merge data; this subquery will contain new rows and rows that might need updating in the merge target table
  10. data
  11. data
  12. data
  13. data
  14. data; modify some rows to NULL values
  15. data; keep some rows the same
  16. data; modify the rest
  17. data
  18. data
  19. data
  20. data
  21. data; filter out ids divisible by 69 (69 and 138 here) to simulate rows removed from the merge data
  22. data
  23. End of using subquery; assign table alias of "u" (again, makes sense)
  24. ON is where (usually) primary key values are matched back to each other; the result determines which of the next two sections fires, i.e. what to do when things do or do not match
  25. If the condition above evaluates to TRUE then the "WHEN MATCHED THEN" section is performed
  26. UPDATE the target table
  27. SET column value
  28. SET column value
  29. I use the WHERE clause of the merge update clause to determine if the row columns have changed or not; if they have NOT changed then why have the undo and redo overhead (???)
  30. If a column cannot be NULL then a simple comparison will work
  31. If a column can be NULL then DECODE can be used on both columns in the comparison
  32. End of NOT(
  33. DELETE from the target table
  34. The WHERE clause checks for the "delete trigger(s)", in this case TRUNC(SYSDATE)-12345
  35. AND check for null columns (the FULL JOIN will produce NULL values in columns without corresponding rows in the merge data)
  36. The "WHEN NOT MATCHED THEN" is performed when merge data cannot be matched to rows in the merge target
  37. So INSERT (columns)
  38. With new VALUES (from the merge subquery data)
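
If my arithmetic is right, the table should hold 158 rows after the merge: the original 120 minus the deleted id 69, plus new ids 121 through 160 except 138 (both multiples of 69 were filtered out of the merge data). A quick check:

--- row count should be 158
SELECT COUNT(*) AS row_count FROM table_full_of_crud;
--- should return no rows
SELECT id FROM table_full_of_crud WHERE MOD(id,69) = 0;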

The USING sub-query will almost always contain a complex piece of SQL to collect new and updated rows. To handle deletes of rows in the merge target table, it needs to be joined to the query data with a FULL OUTER JOIN. If deletes are not needed, then neither is the FULL JOIN. Here is a visual representation of the FULL OUTER JOIN:

ID1  X1    Y1          ID2  X2    Y2
1    data  data        1    data  data
2    data  data        2    data  data
3    data  data        4    data  data
4    data  data        5    data  data
5    data  data        6    data  data

After FULL JOIN

ID1   ID2   X2    Y2
1     1     data  data
2     2     data  data
3     NULL  NULL  NULL
4     4     data  data
5     5     data  data
NULL  6     data  data

The non-primary key columns in rows that need to be deleted will be NULL. This can be used in the topmost section of the USING sub-query (via NVL) to create delete trigger data, which brings me to the DELETE mechanism.
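
Here is a tiny throwaway query that shows the pattern, with inline views standing in for the real tables:

SELECT NVL(v1.id,t1.id) AS id
, t1.id AS target_id
, v1.id AS merge_id
FROM (SELECT LEVEL AS id FROM DUAL CONNECT BY ROWNUM BETWEEN 1 AND 5) t1
FULL JOIN (SELECT LEVEL+1 AS id FROM DUAL CONNECT BY ROWNUM BETWEEN 1 AND 5) v1
ON v1.id = t1.id
ORDER BY 1;

Row 1 comes back with a NULL merge_id (a delete candidate) and row 6 with a NULL target_id (an insert candidate).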

The hardest part of the above statement was getting the DELETE clause to work. Reading the Oracle 10g Database SQL Reference helped somewhat; especially this point:

The only rows affected by this clause are those rows in the destination table that are updated by the merge operation. That is, the DELETE WHERE condition evaluates the updated value, not the original value that was evaluated by the UPDATE SET ... WHERE condition. If a row of the destination table meets the DELETE condition but is not included in the join defined by the ON clause, then it is not deleted.

This means the DELETE does not use column data from the USING clause sub-query; very important. Some column in the UPDATE must actually be updated for the DELETE to work. To handle this, set one or more columns to values that would never appear (in combination) in the data and use those values as the delete trigger.

I always add a WHERE clause to the UPDATE to determine if anything has changed. If nothing has changed then why generate the undo and redo data? Simply compare NOT NULL columns directly and use a null-safe comparison (the DECODE above) for nullable columns to determine if the data have changed.
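
The DECODE trick works because DECODE treats two NULLs as equal, which a plain = comparison does not. A throwaway query against DUAL shows the behavior:

SELECT
DECODE('a','a','EQ') AS both_same
, DECODE(NULL,NULL,'EQ') AS both_null
, DECODE('a',NULL,'EQ') AS one_null
FROM DUAL;

The first two columns return 'EQ' and the third returns NULL, so DECODE(m.col,u.col,'EQ') IS NOT NULL works as a null-safe equality test.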

The INSERT clause is the same as the INSERT statement we all know and love.

One discussion I came across on the Internet was about using a UNION ALL instead of the FULL OUTER JOIN. The UNION ALL performs better (one full index scan) than the FULL JOIN (two index scans followed by a UNION-ALL). UNION ALL is a set operator, however, and updated rows would appear twice (one row from the merge target table and one from the query data). Something could probably be written (see the sketch below), but the FULL JOIN is simple, easy to maintain and worth whatever fractional overhead (compared to the actual merge) it incurs.
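
For what it is worth, here is an untested sketch of how the UNION ALL version might look, with simplified merge data and a WITH clause so the merge data query is only written once. Note the anti-join needed to manufacture the delete-trigger rows, which means the merge data is still referenced twice:

WITH merge_data AS (
SELECT
LEVEL AS id
, TRUNC(SYSDATE,'YEAR')+(LEVEL-1) AS some_stupid_date
, TO_CHAR(TRUNC(SYSDATE,'YEAR')+(LEVEL-1),'MM/DD/YYYY') AS varchar_crud
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 160
)
SELECT id, some_stupid_date, varchar_crud
FROM merge_data
UNION ALL
SELECT t1.id, TRUNC(SYSDATE)-12345, NULL
FROM table_full_of_crud t1
WHERE NOT EXISTS (SELECT 1 FROM merge_data v1 WHERE v1.id = t1.id);

This would drop into the USING clause in place of the FULL JOIN version, and even then the FULL JOIN reads as simpler to me.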

Friday, September 25, 2009

Drugs, doctors, dumb gripe

I heard the most interesting conversation outside the door of the examination room I was waiting in. I was there for what felt like an upper respiratory infection (or more likely the remnants of one) and a follow-up examination. A pharmaceutical representative was talking to the doctor about a drug. A normal human would have been shocked by the conversation but with my current mindset I was able to shrug it off and not be surprised.

Why? This crap is everywhere. There is not a single organization or area that does not have these back door shenanigans.

Welcome to the world.

People pile poop on the big drug companies and they deserve every last ounce, but they also deserve credit for developing life-saving and life-enhancing medicines. What I find equally offensive are the people who poop on those pharmaceutical companies but support natural, organic and alternative products. They are doing the same thing – making money off a product and potentially causing harm to individuals where medicine is needed to prevent illness or death.

What was said? Drug rep asks about drug. Doctor says there is too much conflicting information so he prescribes the drug; if it benefits the patient he lets them stay on it. Drug rep wants to target a one-year life cycle per patient for the drug. Other things were said. Mentions of the conversation getting on the Internet were made. It was humorous to them.

A co-worker once said they have a drug to wake you up, a drug to keep you awake, one to put you to sleep, one to make you happy, one to make you sad, one so you just do not care anymore, one to get you hard, one to make you stay hard, one to make the pain go away and a drug to make you – well, they have a drug for everything. (I think Ozzy Osbourne also said this)...

Monday, September 21, 2009

Oracle Data Pump for data refreshes part 1

This post will be the first in a multi-part post about the Oracle data pump utility. My goal for this series of posts is to (hopefully) provide a road map for Oracle database users to use the data pump utilities to refresh one or more test/development databases from a periodic snapshot of production data.

The data pump import and export utilities (impdp and expdp) were added alongside the older export and import utilities in version 10g of Oracle Database. These utilities can be used to make logical backups of metadata and data, from the entire database down to an individual table.

The first thing I noted when doing a snapshot of a production database at work is that when enough parallel processes are given to the data pump it smokes the pants off the previous utilities. What took hours now took minutes. That made me happy and I am sure my arteries will be happy without the 4am runs to White Castle while the export chugged along. Note, I have not done an import of production data yet and there is a reason for that – the unknown.

Like any good administrator I wanted to understand this new tool before I just started restoring data to databases. What about jobs? How do I handle sequences? Unfortunately the Oracle documentation is painfully lacking in that area. The 10g documentation for the old export has a section called "considerations when exporting database objects" that explains how to handle sequences, LOB columns, synonyms, etc. It also has a lengthier section on "considerations when importing database objects" which was also helpful. The 10g and 11g documentation has examples but lacks the depth of information the older utilities' documentation had. The 10g documentation has more useful information about the old utilities than it does about the fancy new ones, and the 11g data pump documentation is about the same as the 10g version (11g has a legacy mode to support the old utility parameters, btw). This meant I had to set up a database to test the following items that could be affected by a data refresh:

  • constraints
  • referential constraints (i.e. foreign keys)
  • indexes
  • statistics
  • triggers
  • sequences

There are other objects like dimensions and quotas but those items will wait until a later time and date.

The first major difference between the old and new utilities is the use of DIRECTORY objects in the data pump utilities. The system DBA will need to create a directory object and grant privileges on it, or use the default directory object named DATA_PUMP_DIR (I recommend creating your own so you have more control over who and what and how much). So, as someone with the appropriate privileges:

CREATE USER test_schema IDENTIFIED BY "testing"
PROFILE DEFAULT
DEFAULT TABLESPACE users
QUOTA UNLIMITED ON users
ACCOUNT UNLOCK;
GRANT CONNECT,DBA TO test_schema;
CREATE OR REPLACE DIRECTORY test_dp_folder AS '/u69/oraldata/testdpump';
GRANT READ,WRITE ON DIRECTORY test_dp_folder TO test_schema;
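
With the directory object in place, a schema-level export can reference it by name. This is just an illustration (the file names are made up); the %U substitution numbers the dump files so each parallel worker can write its own:

expdp test_schema/testing SCHEMAS=test_schema DIRECTORY=test_dp_folder DUMPFILE=test_schema_%U.dmp LOGFILE=test_schema_exp.log PARALLEL=4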

Next create tables.

--- Time periods
CREATE TABLE time_periods
AS
SELECT
time_date
, EXTRACT(YEAR FROM time_date) AS time_year
, EXTRACT(MONTH FROM time_date) AS time_month
, EXTRACT(DAY FROM time_date) AS time_day
, TO_NUMBER(TO_CHAR(time_date,'d')) AS time_day_of_week
FROM (
SELECT TRUNC(SYSDATE,'YEAR')+(LEVEL - 1) AS time_date
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND ADD_MONTHS(TRUNC(SYSDATE,'YEAR'),36)-TRUNC(SYSDATE,'YEAR')
);
ALTER TABLE time_periods ADD (CONSTRAINT pk_time_periods PRIMARY KEY(time_date));
--- Customer table
CREATE TABLE customers
AS
SELECT
LEVEL AS customer_id
, 'Test customer #'||TO_CHAR(LEVEL,'000000000') AS customer_name
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 100;
ALTER TABLE customers ADD (CONSTRAINT pk_customers PRIMARY KEY(customer_id));
--- Purchase order table
--- A sequence
CREATE SEQUENCE seq_pk_purchase_orders;
SELECT seq_pk_purchase_orders.NEXTVAL FROM dual;
--- Table with foreign keys
CREATE TABLE purchase_orders
( po_id NUMBER
, po_date DATE
, po_cust NUMBER
, po_tag VARCHAR2(64)
, po_tot_items NUMBER
, po_tot_value NUMBER
, CONSTRAINT pk_purchase_orders PRIMARY KEY(po_id)
, CONSTRAINT fk_po_date FOREIGN KEY(po_date) REFERENCES time_periods(time_date)
, CONSTRAINT fk_po_cust FOREIGN KEY(po_cust) REFERENCES customers(customer_id)
);
--- A trigger that modifies the PO tag
CREATE OR REPLACE TRIGGER tr_po_tag
BEFORE INSERT ON purchase_orders FOR EACH ROW
BEGIN
:new.po_tag := CASE WHEN :new.po_tag IS NULL THEN 'SGPO' ELSE 'CUST '||:new.po_tag END;
END;
/

This SQL inserts test data into the purchase orders table. Note that Oracle will silently ignore the APPEND hint here and fall back to a conventional insert, since direct-path inserts are not allowed on tables with enabled triggers or referential integrity constraints:

INSERT /*+ APPEND */ INTO purchase_orders
SELECT
seq_pk_purchase_orders.NEXTVAL
, TRUNC(TRUNC(SYSDATE,'YEAR')+DBMS_RANDOM.value(0,(SELECT COUNT(time_date) FROM time_periods)))
, TRUNC(DBMS_RANDOM.value(1,(SELECT COUNT(customer_id) FROM customers)))
, CASE MOD(r1,69) WHEN 0 THEN NULL ELSE DBMS_RANDOM.string('A',12) END
, CEIL(r2)
, CEIL(r2*123456)/100
FROM (
SELECT LEVEL AS r1, ABS(DBMS_RANDOM.normal) AS r2
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 123
) v1
;
COMMIT;
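
As a quick check that the trigger did its job: exactly one row (the one generated with a NULL tag) should come back with the bare 'SGPO' default, and the rest should carry the 'CUST ' prefix.

--- should return exactly one row
SELECT po_id, po_tag FROM purchase_orders WHERE po_tag = 'SGPO';
--- should return 122
SELECT COUNT(*) AS cust_tags FROM purchase_orders WHERE po_tag LIKE 'CUST %';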