Saturday, January 21, 2017

Still here?

I'm still surprised this blog is still here, getting occasional comments and prompting people to contact me directly through Gmail. It is mostly about Oracle stuff, and I don't do Oracle stuff anymore. Sorry, I've been prodded kicking and screaming into SQL Server and MySQL/MariaDB. And it's 2017, so here is the obligatory 2017 post. See you in 2018.

Tuesday, March 08, 2016

Beginning cleanup

I haven't posted in this dump for a while. I think the "art" of blogging is a dying technology for a couple of reasons. One - it has been replaced by Twitter, Pinterest, image boards, etc. Two - the technology was inundated by everyone, and the cream that rose to the top was licked off by a cat or something, leaving behind the bitter curds, or coffee grounds, or however you wish to stretch this metaphor. Thankfully there are Google filters to help weed out the prolific idiots that post as much as they can to sell their consulting services. You know who they are for Oracle, SQL Server, Windows and Red Hat. Yeah, those guys.

Anything no longer valid or usable will go bye bye.

Anything requiring an update might get it (depending upon how I'm feeling that day).

Good rants will be kept; at least 0.00000000000000000000017 percent of the planet likes to hear me complain.

The really good stuff will get published again.

Friday, January 31, 2014

Dealing with depression

I was going through my old, old posts on this blog and feel I need to update the "Dealing with Depression" part of one of them.

I have been diagnosed with a mild case of major depressive disorder. I know that sounds strange (mild/major), but it is mostly procedural: you need the MDD diagnosis code to get treatment covered by insurance.

Dealing with Depression

If you are like me, and (according to some news sources) like over thirty percent of the population in the United States, you deal with depression on a (sometimes) daily basis. These feelings usually manifest as hopelessness, or you might feel tired all the time.
What not to do.

Drugs are NOT the Answer.
When I say drugs I am describing anything from prescription drugs like Prozac to binge drinking. Drugs are only a temporary solution to the problem and usually make things worse. I have seen a couple of co-workers on the drug Prozac and they are both psychos, one step away from full-out fruitcake. I have family members on different prescriptions and they aren't any better. One has been on various prescriptions for twenty (yes, 20) years. The physicians just keep giving him the line "let's try this one". These quacks have no idea what they are doing; they just continue to milk the system. One of my friends smoked marijuana (i.e. pot, reefer, weed) to escape his depression. That loser is still depressed and now half his tenth-grade brain is gone. The common thread is that all of these people are still taking drugs and they are all still just as depressed, if not more depressed, than they ever were.

At least three new classes of antidepressants have entered the market since this was written. There is a blood test to determine the levels of monoamine neurotransmitters (dopamine, norepinephrine, and serotonin) in your body, and there are DNA tests to determine which drugs will work (or not work). A good, modern psychiatrist will offer at least the blood test, maybe the DNA test, but both tests are expensive and not all insurers are covering them yet. Once the results are in, the psychiatrist will prescribe the drugs most likely to help with your depression. Drugs affect people differently; what might work for one individual might be a nightmare for another. When starting most of these drugs, expect the first one to four weeks to be hell: insomnia, diarrhea, anxiety, weight gain, weight loss, etc. After that breaking-in period the medication should start working. I still believe there are plenty of quacks out there that simply prescribe a drug from the short list of what has worked best for their patients, hope it works, and if it does not, try another one until some combination of the forty or so of these drugs works. The goal of the blood and DNA tests is to take the guesswork out of the equation (so, insurance: would you rather pay for the tests and get it right, or pay for a dozen quack visits?).

Shrinks are NOT the Answer
If you are a rock star or some wealthy Hollywood name, then a "shrink" might work for you. For the rest of us that need to work for a living, you might as well just stop someone on the street or sit next to someone at a fast food joint, because you are likely to get the same answers. Basically you pay a "shrink" to listen and make observations, but every single session boils down to a prescription (see above). They do not have the answers, or if they do, they are canned answers you could probably get from a box of cereal or Miss Cleo. I saw a psychiatrist once, and that was enough. There are no easy answers to anything, but there is something that can be prescribed.

Shrinks are necessary. The psychiatrist prescribes the drugs and the psychologist does the talk therapy. You usually need a psychologist referral to see a psychiatrist. My opinion on psychologists is still mostly the same: a therapist is good for three or four sessions before the information, the dialogue, the time off from life, etc. start becoming repetitive and, in my opinion, not of much use. If your employer has a wellness program with therapy sessions, then by all means use them. The first session should include the paperwork to refer you to a psychiatrist. If you suffer from chronic depression you likely need medication, so get this started as soon as possible so you can have medical and verbal therapy working at the same time; the psychologist cannot prescribe medication. Use the first session to get all of your history with depression, and any trauma at any point in your life, out in the open so the psychologist can assess your situation and give you some first steps. The following sessions should clarify your situation, and your psychologist should give you "tools" you can use to help your situation. Try doing what your psychologist says. Yes, they are guessing, but they work with common solutions to common problems: fix what is most common, then work on the particulars. Eventually though, in my opinion, things start getting repetitive and the therapy starts getting beyond the scope of the psychologist. Obviously, if you have suffered a major trauma in your life, do not listen to me; do therapy until you do not need it, even if that takes decades. I am speaking about dealing with typical depression.

Another route is seeing a family doctor first and letting your doctor do the referral if needed. I do not have the professional training to say otherwise, but here is what your doctor will do. They will prescribe one of the most commonly used generic medications first (Prozac, etc.). If that does not work they will either add a second medication (affecting a different neurotransmitter) or switch you to whatever brand medication they are being asked to pimp out (you will be given samples, a coupon card and a prescription). You might also be placed on Abilify (currently a very popular add-on drug; not sure if it is at the pimp stage at family practices yet, but I wouldn't doubt it). If you are still depressed then you get the referral to a shrink. In my opinion, go the therapy route. Family doctors are great for keeping you healthy, but there are reasons why there are specialists.

Is God the answer?
Maybe… Part of the problem with religion or other social groups is that they remove your self-esteem by placing everything in someone else's hands. Instead of depending upon yourself, being responsible, and understanding your strengths and weaknesses, everything is now controlled by an external entity (i.e. God, or insert religious icon here). It works for you if you can give yourself over completely. For most people it works temporarily, until that first bout of doubt sets in; then there is the possibility (finishing my incomplete quote) of relapse, and you might need to go to the doctors anyway.

I really have no evidence to support or refute this. My history with religion is personal and will not be discussed here. The best advice would be to not rely solely upon a deity to fix depression. If you need medication for depression then see the right people and get on it; unless your deity is going to fix your brain chemistry for you.

My Solution?
Be depressed.
Seriously. Believe it or not, depression is a completely normal state of being just as happiness, sadness, ecstasy, and indifference are. It's okay to be depressed. If you are depressed, chances are good there is a reason why you are depressed. A couple of days later, you might still be depressed or you might do a one-hundred percent turn-around and be glowing with joy. It might even take a month or a year or your entire puberty. It's part of life. Depression makes happy days happier but can be overwhelming at times. Most people that "suffer" from depression have succumbed to it. Don't succumb to it. When the day is done, go over to the calendar and mark off the day because it's over.

Wow, that was a simple solution. I still believe the first part. Depression is normal, and for most people it is not necessary to be happy all the time. If you had a rough day at work, or had a family member leave or pass away, then you should feel depressed. That is natural. When depression overwhelms the day-to-day, then it is time to look to therapy and medicine; depression can require medication to manage. I still believe the second part, because I think way too many people let depression (anger, anxiety, etc.) overwhelm them and immediately go to something other than their own built-in mechanisms for dealing with it. I also think many people want that quick fix.

I would also like to apologize for trivializing a lot of my previous statements. Depression is a serious condition. I was able to overcome depression in my early years by realizing that occasional depression is acceptable as part of life, that chances are good most of what is making you depressed is out of your control and no medication or therapy is going to fix what you cannot control, and that when I was depressed I should have been depressed. After dealing with friends and family members dealing with depression, suicide, and attempted suicide, I have come to learn a lot more about depression. Depression is a health issue, and the people who have it are not fruitcakes. Most of them suffer from mental health issues, are in treatment, and with successful treatment are competent members of society.

The state of mental health care in central Ohio is not good. There are not enough facilities, emergency beds, and treatment options available to individuals with mental health issues. Recently, one patient had to spend three days in an ER hallway waiting for a bed to open in the mental health ward. I had to wait six months to get into a psychiatrist's office.

Here is a link I found useful during my life with depression:

Wednesday, August 11, 2010

Oracle FULL JOIN or UNION ALL

Someone was kind enough to argue with me recently about doing a UNION ALL instead of a FULL JOIN in Oracle (in a MERGE). This same poop chute also posted a spam link that probably led to some Trojan downloader.

To make a long post short, Oracle usually does what it thinks is the best way of fetching data and it is spot on most of the time. Hints and fresh statistics can be used when mother optimizer is not right.

Consider this table:

CREATE TABLE megacrap AS
SELECT
LEVEL AS mega_id
, TO_CHAR(TRUNC(SYSDATE)-LEVEL,'MM/DD/YYYY') AS mega_varchar
, TRUNC(SYSDATE,'YEAR')+LEVEL AS mega_date
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 4000;
ALTER TABLE megacrap ADD(CONSTRAINT pk_megacrap PRIMARY KEY(mega_id));

Updated with this MERGE:

MERGE INTO megacrap m
USING (
SELECT
NVL(v1.mega_id,t1.mega_id) AS mega_id
, v1.mega_varchar
, v1.mega_date
FROM megacrap t1
FULL JOIN (
SELECT
LEVEL AS mega_id
, TO_CHAR(TRUNC(SYSDATE)-LEVEL,'MM/DD/YYYY') AS mega_varchar
, DECODE(LEVEL,420,NULL,TRUNC(SYSDATE,'YEAR')+LEVEL) AS mega_date
FROM DUAL
CONNECT BY ROWNUM BETWEEN 1 AND 4100 --- 100 new rows
) v1 ON v1.mega_id = t1.mega_id
WHERE v1.mega_id IS NULL OR v1.mega_id NOT IN(1,2,69) --- delete 1,2,69
) u
ON (m.mega_id = u.mega_id)
WHEN MATCHED THEN
UPDATE SET
m.mega_varchar = u.mega_varchar
, m.mega_date = u.mega_date
WHERE DECODE(m.mega_varchar,u.mega_varchar,1,0) = 0 OR DECODE(m.mega_date,u.mega_date,1,0) = 0
DELETE WHERE m.mega_varchar IS NULL
WHEN NOT MATCHED THEN
INSERT VALUES (
u.mega_id
, u.mega_varchar
, u.mega_date
);

Guess what the plan looks like (MERGE STATEMENT at the root):

  • MERGE MEGACRAP
    • VIEW
      • HASH JOIN OUTER
        • VIEW
          • UNION ALL
            • HASH JOIN RIGHT OUTER
              • VIEW
                • COUNT
                  • CONNECT BY WITHOUT FILTERING
                    • FAST DUAL
              • INDEX PK_MEGACRAP FAST FULL SCAN
            • NESTED LOOPS ANTI
              • VIEW
                • COUNT
                  • CONNECT BY WITHOUT FILTERING
                    • FAST DUAL
              • INDEX PK_MEGACRAP UNIQUE SCAN
        • TABLE ACCESS MEGACRAP FULL

Mother optimizer decided a UNION ALL was the best option. Now, which SQL statement do you want to code and maintain: the FULL JOIN above, a UNION ALL with GROUP BY, or a UNION ALL with GROUP BY and an ANTI JOIN?
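
For comparison, here is a rough, illustrative sketch of what the hand-written UNION ALL version of just the USING query would look like (outer join plus anti join, the same shape the optimizer built for itself; the extra filtering from the MERGE is left out for brevity):

--- Illustrative sketch only: the FULL JOIN rewritten as LEFT JOIN plus anti join.
SELECT NVL(v1.mega_id,t1.mega_id) AS mega_id, v1.mega_varchar, v1.mega_date
FROM megacrap t1
LEFT JOIN (
  SELECT
  LEVEL AS mega_id
  , TO_CHAR(TRUNC(SYSDATE)-LEVEL,'MM/DD/YYYY') AS mega_varchar
  , DECODE(LEVEL,420,NULL,TRUNC(SYSDATE,'YEAR')+LEVEL) AS mega_date
  FROM DUAL
  CONNECT BY ROWNUM BETWEEN 1 AND 4100
) v1 ON v1.mega_id = t1.mega_id
UNION ALL
SELECT v2.mega_id, v2.mega_varchar, v2.mega_date
FROM (
  SELECT
  LEVEL AS mega_id
  , TO_CHAR(TRUNC(SYSDATE)-LEVEL,'MM/DD/YYYY') AS mega_varchar
  , DECODE(LEVEL,420,NULL,TRUNC(SYSDATE,'YEAR')+LEVEL) AS mega_date
  FROM DUAL
  CONNECT BY ROWNUM BETWEEN 1 AND 4100
) v2
WHERE NOT EXISTS (SELECT 1 FROM megacrap t1 WHERE t1.mega_id = v2.mega_id);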

Oracle Database SQL Language Reference 11g Release 2 (11.2)

Tuesday, April 20, 2010

The Oracle CBO statistical time bomb

Ever have a query or process that runs fine for years and years and years but then one day just slows to a crawl, going from seconds to hours in execution time? If so, you have run into what I call the Oracle CBO statistical time bomb.

Oracle 10g/11g has a daily job that queries for "stale" and missing optimizer statistics on Oracle table/index objects. Those optimizer statistics are used by the optimizer when choosing the best access plan, join types, index and table scans to use for a SQL query. The default for "stale" is if 10 percent of the data have changed (static in 10g, can be changed in 11g).
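
If you want to see how close a table is getting to that threshold, a query along these lines works (a sketch: USER_TAB_MODIFICATIONS holds the DML counters the job looks at, and the FLUSH call just makes the in-memory counters visible):

--- Sketch: how much DML has piled up since statistics were last gathered.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

SELECT m.table_name
, m.inserts, m.updates, m.deletes
, t.num_rows
, ROUND(100*(m.inserts+m.updates+m.deletes)/NULLIF(t.num_rows,0),2) AS pct_changed
FROM user_tab_modifications m
JOIN user_tables t ON t.table_name = m.table_name
ORDER BY pct_changed DESC NULLS LAST;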

Let's assume we have this really, really big invoice table that grows at a constant rate over time. This table has an index on the financial quarter. Each quarter roughly 25,000 new rows are inserted, and then updated 3-5 times each (as status changes, etc. whatever), and some idiot needs to clean up a couple hundred bogus rows each month. Here is the table for 2010:

Financial  Total      Insert  Update  Delete  Changed  Percent
Quarter    Rows       Rows    Rows    Rows    Rows     Changed
2010 Q1    1,000,000  25,000  80,000  169     105,169  10.51%  - daily job gathers statistics
2010 Q2    1,024,836  25,000  80,000  169     105,169  10.26%  - daily job gathers statistics
2010 Q3    1,049,672  25,000  80,000  169     105,169  10.01%  - daily job gathers statistics
2010 Q4    1,074,508  25,000  80,000  169     105,169   9.78%  - not stale yet, still has 2010 Q3 statistics

Without current statistics, the Oracle cost-based optimizer thinks one index and access method would be more efficient than a different index and access method (when it really, really, really is not).

I'll need an Oracle Guru to correct me but here is what happens in 2010 Q4:

Your super-optimized, meticulously coded query, queries the invoices by quarter and by another column existing in a massive subquery.

For 2010 Q1, Q2 and Q3 the CBO says there are 25,000 rows when reading the quarter index, and likely fewer rows using an index on another column (customer number, perhaps) if it does the massive subquery first. The CBO gets it right: the massive subquery executes first, then the table is read via the index on the other column.

For 2010 Q4 the CBO says there are no/few rows for Q4 (when there are really 25,000), so it reads the invoices using the quarter index and then does a nested loop into the massive subquery each time, because the cost of the read by quarter index times the cost of the massive subquery is less than the cost of the plan above (when it really is not). Instead of doing that nested loop on a couple of rows (yawn), it does that nested-loop massive subquery 25,000 times. Yikes.

What to do? Plan ahead.

Go to Tahiti and research DBMS_STATS in the Oracle Database PL/SQL Packages and Types Reference. Know your tables and defuse the time bomb before it goes off. Let the daily job handle the stale statistics, but tables that contain an index on a period of time (quarter, year, etc.) or a code based on a period of time (i.e. the Spring 2010 line of Mountain Dew with Ginseng and Xanax) should be researched as candidates for statistics gathering outside of Oracle's daily job. Gather table statistics before the big processing periods.
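
As a sketch (the schema and table names here are made up), gathering statistics right after the quarterly load might look like this; the SET_TABLE_PREFS call is the 11g knob for the 10 percent staleness threshold mentioned above:

--- Hypothetical schema/table; run right after the big quarterly load completes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER'
  , tabname => 'INVOICES'
  , cascade => TRUE);     --- refresh the index statistics too
  --- 11g only: lower the staleness threshold from the default 10 percent
  DBMS_STATS.SET_TABLE_PREFS('APP_OWNER','INVOICES','STALE_PERCENT','5');
END;
/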

References

Oracle and SSIS; ORA-00903 during verification

Can you tell I have been compressing my pain and hate into a nice little ball instead of talking about my feelings while lying in some lush, peaceful green meadow, sipping chamomile tea and popping antidepressants like candy?

This issue is one of those classic "what the **** is going on here" errors.

Here is what I am suffering with:

  • Windows XP and Windows Server 2003
  • SQL Server 2008 Integration Services
  • Native OLE DB\Oracle Provider for OLE DB, Oracle 11g R1 or Oracle 10g
  • OLE DB Source (Oracle) using a query.

The Problem

Getting ORA-00903 (invalid table name) during the verification step at execution time on a Data Flow OLE DB (Oracle) source, even though at design time (in the OLE DB Source Editor dialog) the SQL parses correctly AND pulls back test data AND the query runs successfully everywhere else using the same credentials.

The Solution

Drink.

Smoke.

Break something.

Then:

  • in Oracle, CREATE a VIEW for the query (the Oracle side is sketched below)
  • if the query contains a user function, also (in Oracle) GRANT EXECUTE ON schema.function TO the view owner WITH GRANT OPTION
  • in Oracle, GRANT SELECT on the VIEW to the USER used by SSIS (if necessary)
  • in SSIS, use the Advanced Editor on the OLE DB source; Component Properties; AccessMode = OpenRowset, OpenRowset = SCHEMA.view (the standard editor will not pull up the view because you are limited to its drop-down list)
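
Here is a sketch of the Oracle side of that workaround; every object name below is hypothetical, so substitute your own schemas and objects:

--- 1. Wrap the problem query in a view.
CREATE OR REPLACE VIEW rpt_owner.ssis_extract_v AS
SELECT i.invoice_id
, i.invoice_date
, fn_owner.format_amount(i.amount) AS amount_disp
FROM rpt_owner.invoices i
WHERE i.invoice_date >= TRUNC(SYSDATE) - 30;

--- 2. If the query calls another schema's function, the view owner needs
---    EXECUTE WITH GRANT OPTION so the privilege can be passed along.
GRANT EXECUTE ON fn_owner.format_amount TO rpt_owner WITH GRANT OPTION;

--- 3. Let the SSIS login read the view.
GRANT SELECT ON rpt_owner.ssis_extract_v TO ssis_user;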

There that was painless, right?

I have been told that shutting down Visual Studio sometimes helps (it did not for me). Shutting down Visual Studio does work if you switch databases in the connection manager and then get ORA-00903 in the OLE DB source. One co-worker stated he rebooted his machine and then things worked (that did not work for me either).

I'm sure I was doing something stupid like trying to use a subquery or SYSDATE, but WTF; it parsed, it pulled back test data, and it works everywhere except the verification step when debugging.

****!

Oracle and SSIS; SQL tasks and procedures

I am posting today to hopefully prevent others from going through the hell I have been through. I really have no idea why I continue to do this, as my psyche at this point is so eroded that I should be deriving pleasure from the suffering of others, like some serial killer in the making.

Getting Oracle and Microsoft to play nice is like trying to get Sean Hannity and Nancy Pelosi to share a quiet evening together making babies while discussing health care and what to spend their money on. If that reference is dated, then consider the choice of wiping with toilet paper or large-grain sandpaper and then choosing the sandpaper; or perhaps shoving a sea urchin up your nose and pulling it out the other nostril.

Here is what I am suffering with:

  • Windows XP and Windows Server 2003
  • SQL Server 2008 Integration Services
  • Native OLE DB\Oracle Provider for OLE DB, Oracle 11g R1 or Oracle 10g
  • Execute SQL Task trying to execute a PROCEDURE or PACKAGE.PROCEDURE that has input or output parameters

Connection Manager

The only non-default property is RetainSameConnection set to True (connections to Oracle are expensive). Personally, I see no need for resource pooling but that is another post.

Execute SQL Task

General

  • ConnectionType: OLE DB
  • SQLSourceType: Direct input
  • SQLStatement:
    {call your_stupid_package.dumb_ass_procedure(?,?)}
    or
    BEGIN your_stupid_package.dumb_ass_procedure(?,?); END;
    (a sketch of the matching Oracle package follows below)
  • IsQueryStoredProcedure: False (usually greyed out; it can be set via Properties, but it has no bearing on anything)

Parameter Mapping; Add each parameter:

  • Variable Name: whatever User::variable you have defined already
  • Direction: Input or Output
  • Data Type: VARCHAR (I have not been able to get anything else to work yet)
  • Parameter Name: the left-to-right sequence number of the parameter, starting at zero (so, if you only have one, the parameter name is 0)
  • Parameter Size: the size of the variable
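
For reference, here is a minimal sketch of the kind of Oracle package those two ? placeholders map to. The package and procedure names come from the statement above; the parameter names and body are made up. The first ? maps to Parameter Name 0 (Direction Input) and the second ? to Parameter Name 1 (Direction Output).

CREATE OR REPLACE PACKAGE your_stupid_package AS
  PROCEDURE dumb_ass_procedure(p_in IN VARCHAR2, p_out OUT VARCHAR2);
END your_stupid_package;
/
CREATE OR REPLACE PACKAGE BODY your_stupid_package AS
  PROCEDURE dumb_ass_procedure(p_in IN VARCHAR2, p_out OUT VARCHAR2) IS
  BEGIN
    p_out := 'You sent: '||p_in;   --- placeholder logic
  END dumb_ass_procedure;
END your_stupid_package;
/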

Conclusion

I would cite references, but the vast majority of searches dead-end with single posts in forums where some poor slob went begging for help and found none, or sometimes found a reply from another kindred spirit suffering through the same agony. Those who did have solutions were not passing parameters, or were using an Expression to build the query string with the parameter values embedded (doing their own SQL injection, which also does not help those using output parameters).

The above works, with a couple warnings, but works.

Thursday, February 25, 2010

Oracle MERGE, type 2 dimension

I decided to look into this topic while learning SQL Server Integration Services and debating with my caffeine-soaked brain whether the slowly changing dimension task container would be more efficient than a T-SQL MERGE statement. I began wondering why the hell we were planning on shipping about 40 gigabytes per day out of Oracle into tables in SQL Server that are truncated (emptied) and rebuilt, then queried to determine the 2 kilobytes worth of changes that took place that day. Why on earth would we want to stress out that multi-core, refrigerator-sized server when an underpowered server, network and storage can do all the work?

Fact tables and dimensions are part of data warehousing. A slowly changing dimension is a dimension that slowly changes over time (duh). There are three types of slowly changing dimensions:

  • Type 1: Update the old values with the new values (do not give a crap about history)
  • Type 2: Create a new row in the dimension with a new primary (surrogate) key, and optionally update existing rows
  • Type 3: Update the old values with the new values, and add additional data to the table (date of the change, keep one level of history)

You would figure type 3 would come before type 2, but who cares? This dimension uses a generated surrogate key for uniqueness (a sequence) and dates to keep track of when the row was effective; a NULL end date indicates that the row is the current version of the foreign primary key value, and all other rows contain historical data and attributes.

The same logic used to maintain slowly changing dimensions can also be applied to tracking historical changes to tables.

The code that follows assumes only additions and changes are taking place; after all, nobody ever deletes data, do they? The dimension table is simple (no clustering, partitioning, bitmap indexes, or compression) and is purely for example:

CREATE TABLE crap_dimension
( dim_pk_surrogate NUMBER
, dim_version_beg_date DATE
, dim_version_end_date DATE
, dim_pk_foreign VARCHAR2(10)
, dim_description VARCHAR2(60)
, dim_bulb VARCHAR2(05)
, dim_sum VARCHAR2(08)
);
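
The MERGE below pulls surrogate keys from a sequence the table script above does not create; a minimal definition for completeness (assuming it simply starts at 1):

--- Assumed definition; the MERGE below references this sequence for surrogate keys.
CREATE SEQUENCE dim_pk_surrogate_seq START WITH 1 INCREMENT BY 1;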

The MERGE SQL statement is best used for inserts and updates. Here is the single statement used for the entire dimension update:

MERGE /*+ APPEND */
INTO crap_dimension m
USING (
--- Step 1 BEG :: Select new and changed data for the USING section of the MERGE
SELECT
DECODE(v2.dim_upsert_level,1,v2.dim_pk_surrogate,0) AS dim_pk_surrogate
, v1.dim_pk_foreign
, v1.dim_description
, v1.dim_bulb
, v1.dim_sum
FROM (
--- Step 1a BEG :: Collect data from your OLTP or ODS or POS
SELECT
crap_id AS dim_pk_foreign
, crap_desc AS dim_description
, crap_poop AS dim_bulb
, crap_turd AS dim_sum
FROM crap_oltp_or_ods_source
--- Step 1a END
--- Step 1b BEG :: Remove identical data attributes already in the dimension
MINUS
SELECT
dim_pk_foreign
, dim_description
, dim_bulb
, dim_sum
FROM crap_dimension
WHERE dim_version_end_date IS NULL
--- Step 1b END
) v1
--- Step 1c BEG :: changed rows need two rows (update the existing row, insert the new version); LEFT JOIN to a Cartesian product of the current rows and levels 1 and 2
LEFT JOIN (
SELECT
dim_pk_surrogate
, dim_pk_foreign
, dim_upsert_level
FROM crap_dimension
, (SELECT LEVEL AS dim_upsert_level FROM DUAL CONNECT BY LEVEL BETWEEN 1 AND 2)
WHERE dim_version_end_date IS NULL
) v2 ON v2.dim_pk_foreign = v1.dim_pk_foreign
--- Step 1c END
--- Step 1 END
) u
ON (m.dim_pk_surrogate = u.dim_pk_surrogate)
--- Step 2 BEG :: If update, mark the end date
WHEN MATCHED THEN UPDATE
SET m.dim_version_end_date = SYSDATE
--- Step 2 END
--- Step 3 BEG :: Insert new data and the new versions of changed data
WHEN NOT MATCHED THEN INSERT
VALUES (
dim_pk_surrogate_seq.NEXTVAL
, SYSDATE
, NULL
, u.dim_pk_foreign
, u.dim_description
, u.dim_bulb
, u.dim_sum
)
--- Step 3 END
;

I tried to break down the statement into steps.

Step 1a collects data from your ODS or OLTP and can be a simple query or a gigantic, complicated parallel query with a hundred joins and aggregate steps. The important thing to remember is we only want to do this expensive step once. When 1a completes we will have a fairly large result set.

Step 1b subtracts all identical data attribute rows that already exist in the dimension from the previous result set. This should significantly reduce the working set of rows at the expense of a full table scan through our dimension. When 1b completes the working set will contain new data and changed data from the source tables.

Step 1c creates the data needed for updates. If data in the source has changed, two rows are needed for the dimension: one row to update the existing data, making that row historical, and one row to insert the new data, making it the current row. The surrogate and foreign primary keys for all current rows are Cartesian-joined to the values 1 and 2 (one full table scan and a hash join instead of two full table scans) and then left-joined to the previous result set. New data from the source will not match and will remain as one row with a NULL surrogate primary key. Changes from the source will match and join to the two generated rows, creating the two rows needed to do the update and the insert. The surrogate primary key will be 0 for inserts (assuming the sequence started at 1 or higher) and populated for updates.

Step 1 finishes with rows containing changes to be applied to the dimension. Step 2 updates historical rows simply setting the end date to the current date. Step 3 inserts new data from the source and the new current version of the row for changed data.
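
As a usage note, the current version of any foreign key value is simply the row with the NULL end date, and point-in-time lookups use the date range. A sketch (the bind variables are hypothetical):

--- Current version of one foreign key value
SELECT * FROM crap_dimension
WHERE dim_pk_foreign = :some_id AND dim_version_end_date IS NULL;

--- Version that was current as of a given date
SELECT * FROM crap_dimension
WHERE dim_pk_foreign = :some_id
AND dim_version_beg_date <= :as_of_date
AND (dim_version_end_date IS NULL OR dim_version_end_date > :as_of_date);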

Tuesday, December 29, 2009

Yahoo! sucks on my netbook

I tried the new Yahoo! Mail today. I switched back to "classic" mail immediately. The first issue:

There seems to be a screen resolution problem. Your screen resolution is set below our minimum recommendation. When it's set under 1024 x 768 pixels, the all-new Yahoo! Mail won't look as good. You can continue with your current settings, but you won't be able to enjoy the full effect.

Really? My netbook has a 1024x600 resolution and I can go to 1024x768 with a scrolling desktop. The scrolling desktop is annoying and makes things unusable. Applications should scroll, not desktops.

The new mail does not render well in Opera. I use Opera because I think the scaling feature is superior to Firefox, IE and Chrome. It is nice to be able to size the contents of the web page to 70 to 90 percent while maintaining usability and readability. That is a nice feature when you have that 1024 x 600 netbook display. At 100 percent it looks like crap and does not get any better at zoomed levels.

I tried using Firefox and IE to see the full effects I should be enjoying. There are fixed frames and an annoying status bar, menu, message scroll, etc. I can see why they want a large fixed resolution.

I have been a Yahoo! mail user since 1998. That is over a decade of using Yahoo! One of the reasons why I liked Yahoo! was simplicity. While other web sites were going through the phases of hipness like massive image-filled pages, Java applets, tons of unwanted JavaScript, and excessive use of CSS and everything else, Yahoo! kept most things simple. As the years passed, however, all of their services and applications got fatter. Messenger got fatter, so I ended up using Gaim (Pidgin). Mail started getting fatter, so I started using Google Mail more. Workstations are more powerful now than years ago and are able to handle newer technologies, but it seems like some applications and web sites are forgetting that the hardware using them is getting smaller or coming from non-traditional sources (like game consoles and media devices).

Maybe I am just getting old.

(I also realize the reason for some of the new Web 2.0 stuff is for smaller devices like phones and hand-helds).

Monday, November 23, 2009

Oracle Data Pump for data refreshes part 4

Part one, part two and part three.

This is a follow-up to the third part of this series with some real-world examples to make corrections and to justify some points.

I ran into a known 10g bug when attempting a DATA_ONLY export using parallel processing (ORA-00001: unique constraint () violated). The bug was fixed in 11g (Oracle forums). The source database was small, so I ran with parallel=1 to get it to work. For larger databases that might not be feasible. The next test would be to modify the first export to get data and metadata and then do the DATA_ONLY on the first import step.

Other than the error above, the export and import for the first refresh went smoothly. The second import was more of a complete account copy, so I could not use exactly what I specified in part three.

I did one export and one import. The export went quickly and without error. The import ground to a halt at the TABLE_STATISTICS step due to a bug in the Oracle Data Pump when schema and/or tablespace remapping is used. I had to terminate the import. Forums suggest using EXCLUDE=STATISTICS in the parameter file to avoid these two incredibly slow steps and then rebuilding the statistics after the import (through Enterprise Manager or whatever). The export of 57 GB of data took roughly 12 minutes. The import of that data took roughly 20 minutes for just the data. Source objects seemed to take a while and, as mentioned, the statistics took forever.
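
As a sketch of that workaround (the schema name and parallel degree are assumptions): add EXCLUDE=STATISTICS to the import parameter file, then rebuild the optimizer statistics once the import finishes, for example from SQL*Plus:

--- Rebuild optimizer statistics after an import that ran with EXCLUDE=STATISTICS.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'TEST_SCHEMA'
  , degree  => 4
  , cascade => TRUE);
END;
/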

The data pump is a remarkable improvement over the previous import and export utilities. What took hours before can now be accomplished in less than an hour.

More to come.

Thursday, October 29, 2009

Oracle Data Pump for data refreshes part 3

Part one and part two.

This part gives you the UNIX scripts and SQL that seem to work best in the environments I work in. The standard disclaimers apply: this likely is not the best way of going about it (or it might be), and I provide zero support for it. If it works for you, fantastic; send me some beer. If it frustrates you to no end because most of it works but you constantly find little annoyances and the Oracle manuals are absolutely no help; welcome to my world.

The following UNIX steps assume the user on the server has appropriate privileges to execute Oracle binaries and has the environment set correctly.

prod_expdp.sh

This shell script sets the ORACLE_SID environment variable and then executes the data pump export utility twice; once for data and once for sequence metadata.

#!/bin/sh
# prod_expdp.sh - run datapump exports for a complete refresh of PROD to TEST
#
ORACLE_SID=prod
export ORACLE_SID   # make the SID visible to the expdp child processes
expdp test_schema/testing PARFILE=prod_expdata.par
expdp test_schema/testing PARFILE=prod_expmeta.par

prod_expdata.par

The next two files are parameter files used by the export. The contents of the first file, for the data extract, are self-explanatory. The PARALLEL parameter was not used but is included here because you will want some level of parallelism for optimal extract performance. The %U on the DUMPFILE parameter assigns a unique number to each extract file for parallel processing (4 files in this example). This parameter file should also contain any EXCLUDE statements for tables that do not need to be exported.

DIRECTORY=test_dp_folder
DUMPFILE=prod_data_%U.dmp
LOGFILE=prod_data_exp.log
CONTENT=DATA_ONLY
SCHEMAS=test_schema
PARALLEL=4

prod_expmeta.par

The second parameter file is for the sequence metadata. Why two exports? I have two reasons why, in my opinion, there should be two. The main reason was to separate the metadata (small) from the table data (massive). You cannot do a metadata and data import in the same import job without doing all objects, and I did not want the second step processing a gigantic data extract file just to get the microscopic slice of metadata. The second reason: INCLUDE and EXCLUDE are mutually exclusive; only one can be used per execution, and most extracts will want to exclude table data (if any).

DIRECTORY=test_dp_folder
DUMPFILE=prod_meta_%U.dmp
LOGFILE=prod_meta_exp.log
CONTENT=METADATA_ONLY
SCHEMAS=test_schema
INCLUDE=SEQUENCE

Extracted files will need to be relocated to the test server.

test_impdp.sh

This shell script sets the ORACLE_SID environment variable, executes SQL*Plus to build some SQL, executes the data pump import utility twice (once for data and once for sequence metadata), and then executes SQL*Plus to run the post-import SQL and compile invalid objects.

#!/bin/sh
# test_impdp.sh - run pre/post SQL create SQL, datapump imports for a complete refresh of PROD to TEST
#
ORACLE_SID=test
export ORACLE_SID   # make the SID visible to the sqlplus/impdp child processes
sqlplus -s test_schema/testing @test_impsql.sql
impdp test_schema/testing PARFILE=test_impdata.par
impdp test_schema/testing PARFILE=test_impmeta.par
sqlplus -s test_schema/testing @test_imp_post.sql
sqlplus -s test_schema/testing @compile_invalid.sql
sqlplus -s test_schema/testing @compile_invalid.sql
sqlplus -s test_schema/testing @compile_invalid.sql

test_impdata.par

The data import parameter file is similar to the data export parameter file but with the extra TABLE_EXISTS_ACTION parameter instructing the data pump to truncate (empty) the table if it exists.

DIRECTORY=test_dp_folder
DUMPFILE=prod_data_%U.dmp
LOGFILE=prod_data_imp.log
CONTENT=DATA_ONLY
SCHEMAS=test_schema
PARALLEL=4
TABLE_EXISTS_ACTION=TRUNCATE

test_impmeta.par

The metadata import parameter file is self-explanatory.

DIRECTORY=test_dp_folder
DUMPFILE=prod_meta_%U.dmp
LOGFILE=prod_meta_imp.log
CONTENT=METADATA_ONLY
SCHEMAS=test_schema

test_impsql.sql

This SQL script is where all the magic happens. This is where all of the pain-in-the-ass, damned annoying, why-oh-why-do-you-punish-us-Oracle-you-gigantic-corporate-fudge-packers stuff gets taken care of. The script dynamically builds two SQL scripts: a pre-import script and a post-import script. All enabled triggers and referential constraints (foreign keys) must be queried so they can be disabled and then re-enabled. Do not enable a constraint or trigger that was already disabled; it was probably disabled for a reason, likely by some developer behind your back. The script must also query all sequences so they can be dropped before import. The data pump import does not have a parameter or option to replace an existing sequence, so sequences must be dropped pre-import. Dropping the sequences causes two problems: privileges granted on those sequences will be lost, and all source types referencing the sequences will become invalid. The post-import SQL must re-grant those privileges, so this SQL script also queries the existing privileges (the schema owner must be able to query the all_tab_privs view).

SET FEEDBACK OFF TERMOUT OFF TRIMSPOOL ON VERIFY OFF PAGESIZE 0 LINESIZE 200
SPOOL test_imp_pre.sql
SELECT 'ALTER TRIGGER '||trigger_name||' DISABLE;'
FROM user_triggers
WHERE base_object_type = 'TABLE' AND status = 'ENABLED'
ORDER BY trigger_name;
SELECT 'ALTER TABLE '||table_name||' DISABLE CONSTRAINT '||constraint_name||';'
FROM user_constraints
WHERE constraint_type = 'R' AND status = 'ENABLED'
ORDER BY table_name;
SELECT 'DROP SEQUENCE '||sequence_name||';'
FROM user_sequences
ORDER BY sequence_name;
SPOOL OFF
SPOOL test_imp_post.sql
SELECT 'ALTER TRIGGER '||trigger_name||' ENABLE;'
FROM user_triggers
WHERE base_object_type = 'TABLE' AND status = 'ENABLED'
ORDER BY trigger_name;
SELECT 'ALTER TABLE '||table_name||' ENABLE CONSTRAINT '||constraint_name||';'
FROM user_constraints
WHERE constraint_type = 'R' AND status = 'ENABLED'
ORDER BY table_name;
SELECT 'GRANT '||privilege||' ON '||object_name||' TO '||grantee||';'
FROM user_objects
JOIN all_tab_privs ON table_schema = USER AND table_name = object_name
WHERE object_type = 'SEQUENCE'
ORDER BY object_name;
SPOOL OFF
@test_imp_pre.sql
QUIT

compile_invalid.sql

This generic SQL script will query invalid source objects (made invalid by the sequences being dropped), create SQL to compile those invalid source objects, and then execute that SQL. The script will need to be executed at least once, but possibly up to six times, because objects can be invalidated again when a dependency is compiled. For example: if procedure A calls procedure B, and B uses sequence C, both A and B will be invalid when the sequence is dropped. If A and then B is compiled, A will be valid only until B is compiled and then becomes invalid again. The second execution of the compile script will compile A, and since nothing depends upon A, we are done.

SET FEEDBACK OFF TERMOUT OFF TRIMSPOOL ON VERIFY OFF PAGESIZE 0 LINESIZE 200
SPOOL compile_invalid_pass.sql
SELECT 'ALTER '||DECODE(SUBSTR(object_type,1,7),'PACKAGE','PACKAGE',object_type)||' '||object_name||' COMPILE'||DECODE(object_type,'PACKAGE BODY',' BODY;','PACKAGE',' PACKAGE;',';')
FROM user_objects
WHERE object_type IN ('TRIGGER','PACKAGE BODY','PACKAGE','FUNCTION','PROCEDURE') AND status = 'INVALID'
ORDER BY object_name, object_type;
SPOOL OFF
@compile_invalid_pass.sql
QUIT

Sequence of Events

  1. run export script
    1. export table data
    2. export metadata for sequences
  2. move exported data to test server (if not on the production server)
  3. run import script
    1. run SQL to create pre and post-import SQL
      1. query referential constraints that need to be disabled
      2. query triggers that need to be disabled
      3. query sequences that need to be dropped
      4. query referential constraints that need to be enabled
      5. query triggers that need to be enabled
      6. query privileges to sequences that need to be granted on sequences that were dropped and then imported
      7. run the pre-import SQL
        1. disable referential constraints
        2. disable triggers
        3. drop sequences
    2. import table data
    3. import metadata for sequences
    4. run the post-import SQL
      1. enable referential constraints
      2. enable triggers
      3. grant privileges on sequences to users
    5. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the sequence drop
    6. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the last compile
    7. run compile SQL
      1. query for invalid objects
      2. compile source types made invalid after the last compile

You are probably thinking to yourself "holy fudging fudge, all of that bullfrog for a stupid little sequence?" I know I am. Hundreds of other individuals are thinking the same thing. There are other ways of handling sequences. Generate the DDL for the sequence and then drop and create it; but that has the same side effects as above (loss of privileges and invalid objects). Select MAX() and use that value to build SQL to drop and create the sequence; same problems as above. Select MAX(), alter the INCREMENT BY property of the sequence, get the next value, then restore the INCREMENT BY property. The last two would require prior knowledge of the table(s) and column(s) using the sequence. One of our production databases is a vendor package that has over forty sequences, so some up-front work would be required to match sequence to column to table. The last solution is the cleanest because it does not require recompilation of source objects or restoration of privileges. The whole issue could be avoided if the value of the sequence were considered a data object instead of a metadata object.
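
Here is a sketch of that last trick for a single sequence (the sequence name and numbers are hypothetical, and it assumes nothing else is pulling values from the sequence while you do it):

--- Bump the test sequence up to the production value without dropping it.
--- Assume MAX() in production came back 234,567 higher than the test sequence.
ALTER SEQUENCE invoice_id_seq INCREMENT BY 234567;
SELECT invoice_id_seq.NEXTVAL FROM DUAL;          --- consume one value to jump ahead
ALTER SEQUENCE invoice_id_seq INCREMENT BY 1;     --- restore the normal increment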

The next part will hopefully deal with the dirty details like how much undo/redo is generated, what about performance, etc.