Monday, September 21, 2009

Oracle Data Pump for data refreshes part 1

This post is the first in a multi-part series about the Oracle Data Pump utility. My goal for this series is to (hopefully) provide a road map for Oracle database users who want to use the Data Pump utilities to refresh one or more test/development databases from a periodic snapshot of production data.

The Data Pump import and export utilities were added alongside the older export and import utilities in version 10g of Oracle Database. These utilities can be used to make logical backups of metadata and data, from the entire database down to an individual table.

The first thing I noticed when taking a snapshot of a production database at work is that, when given enough parallel processes, Data Pump smokes the pants off the previous utilities. What took hours now takes minutes. That made me happy, and I am sure my arteries will be happy without the 4am runs to White Castle while the export chugged along. Note that I have not done an import of production data yet, and there is a reason for that: the unknown.
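For reference, parallelism is requested with the PARALLEL parameter, paired with a %U wildcard in the dump file name so each worker process gets its own file. This is a sketch, not the exact command I ran against production; the credentials and file names are placeholders:

```
REM Full export with four parallel workers; %U expands to a per-file sequence number.
REM DATA_PUMP_DIR is the default directory object (more on directories below).
expdp system/password FULL=y DIRECTORY=DATA_PUMP_DIR DUMPFILE=prod_full_%U.dmp PARALLEL=4 LOGFILE=prod_full_exp.log
```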

Like any good administrator, I wanted to understand this new tool before I just started restoring data to databases. What about jobs? How do I handle sequences? Unfortunately, the Oracle documentation is painfully lacking in that area. The 10g documentation for the old export has a section called "considerations when exporting database objects" that explains how to handle sequences, LOB columns, synonyms, etc. It also has a lengthier section, "considerations when importing database objects", which was also helpful. The 10g and 11g Data Pump documentation has examples but lacks the depth of information the previous two utilities had. In fact, the 10g documentation has more useful information about the old utilities than it does about the fancy new ones, and the 11g Data Pump documentation is about the same as 10g's (11g does add a legacy mode to support the old utility parameters, by the way). This meant I had to set up a database to test the following items that could be affected by a data refresh:

  • constraints
  • referential constraints (i.e. foreign keys)
  • indexes
  • statistics
  • triggers
  • sequences

There are other objects, like dimensions and quotas, but those will have to wait for a later post.

The first major difference between the old and new utilities is Data Pump's use of DIRECTORY objects. The system DBA will need to create a directory object and grant privileges on it, or use the default directory object named DATA_PUMP_DIR (I recommend creating your own so you have more control over the who, the what, and the how much). So, as someone with the appropriate privileges:

CREATE USER test_schema IDENTIFIED BY "testing"
PROFILE DEFAULT
DEFAULT TABLESPACE users
QUOTA UNLIMITED ON users
ACCOUNT UNLOCK;
GRANT CONNECT,DBA TO test_schema;
CREATE OR REPLACE DIRECTORY test_dp_folder AS '/u69/oraldata/testdpump';
GRANT READ,WRITE ON DIRECTORY test_dp_folder TO test_schema;
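As a quick sanity check, the directory object and its path can be verified in the data dictionary (this assumes access to DBA_DIRECTORIES; note that directory names are stored in uppercase):

```
-- Confirm the directory object exists and points at the intended path
SELECT directory_name, directory_path
FROM dba_directories
WHERE directory_name = 'TEST_DP_FOLDER';
```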

Next, create the test tables.

--- Time periods
CREATE TABLE time_periods
AS
SELECT
time_date
, EXTRACT(YEAR FROM time_date) AS time_year
, EXTRACT(MONTH FROM time_date) AS time_month
, EXTRACT(DAY FROM time_date) AS time_day
, TO_NUMBER(TO_CHAR(time_date,'d')) AS time_day_of_week
FROM (
SELECT TRUNC(SYSDATE,'YEAR')+(LEVEL - 1) AS time_date
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND ADD_MONTHS(TRUNC(SYSDATE,'YEAR'),36)-TRUNC(SYSDATE,'YEAR')
);
ALTER TABLE time_periods ADD (CONSTRAINT pk_time_periods PRIMARY KEY(time_date));
--- Customer table
CREATE TABLE customers
AS
SELECT
LEVEL AS customer_id
, 'Test customer #'||TO_CHAR(LEVEL,'000000000') AS customer_name
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 100;
ALTER TABLE customers ADD (CONSTRAINT pk_customers PRIMARY KEY(customer_id));
--- Purchase order table
--- A sequence
CREATE SEQUENCE seq_pk_purchase_orders;
SELECT seq_pk_purchase_orders.NEXTVAL FROM dual;
--- Table with foreign keys
CREATE TABLE purchase_orders
( po_id NUMBER
, po_date DATE
, po_cust NUMBER
, po_tag VARCHAR2(64)
, po_tot_items NUMBER
, po_tot_value NUMBER
, CONSTRAINT pk_purchase_orders PRIMARY KEY(po_id)
, CONSTRAINT fk_po_date FOREIGN KEY(po_date) REFERENCES time_periods(time_date)
, CONSTRAINT fk_po_cust FOREIGN KEY(po_cust) REFERENCES customers(customer_id)
);
--- A trigger that modifies the PO tag
CREATE OR REPLACE TRIGGER tr_po_tag
BEFORE INSERT ON purchase_orders FOR EACH ROW
BEGIN
:new.po_tag := CASE WHEN :new.po_tag IS NULL THEN 'SGPO' ELSE 'CUST '||:new.po_tag END;
END;
/

This SQL inserts test data into the purchase orders table:

INSERT /*+ APPEND */ INTO purchase_orders
SELECT
seq_pk_purchase_orders.NEXTVAL
, TRUNC(TRUNC(SYSDATE,'YEAR')+DBMS_RANDOM.value(0,(SELECT COUNT(time_date) FROM time_periods)))
, TRUNC(DBMS_RANDOM.value(1,(SELECT COUNT(customer_id) FROM customers)))
, CASE MOD(r1,69) WHEN 0 THEN NULL ELSE DBMS_RANDOM.string('A',12) END
, CEIL(r2)
, CEIL(r2*123456)/100
FROM (
SELECT LEVEL AS r1, ABS(DBMS_RANDOM.normal) AS r2
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 123
) v1
;
COMMIT;
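With the schema built and populated, a schema-level export through the directory object created above might look like the following. This is a sketch; the dump file and log file names are my own choices:

```
REM Schema-level export of test_schema using the test_dp_folder directory object
expdp test_schema/testing DIRECTORY=test_dp_folder DUMPFILE=test_schema.dmp LOGFILE=test_schema_exp.log SCHEMAS=test_schema
```

The corresponding import uses impdp with a largely parallel set of parameters; how the import handles the sequences, triggers, and constraints above is exactly the unknown this series will dig into.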
