SQL Tuning Question

Dear all,

I know it's not a Toad-related issue, but I would like to ask the experts' help in
tuning an Update process. One Update suddenly ran for 2 hours versus the 10+ minutes
it normally takes. To my knowledge, nothing was changed (of course). Following are the
SQL and execution plan. The DB is on 10g. Any help will be greatly appreciated.
I am planning to rebuild the indexes and then gather new stats, just for the sake of
better performance.

UPDATE prembr_834_data_KY pd1
   SET mbr_age = trunc(to_number(file_effective_date -
                       to_date(mbr_ymdbirth, 'yyyymmdd')) / 365.25),
       class_value =
         (SELECT DISTINCT trim(lockin_grp_new) || trim(lockin_div_new)
            FROM prembr_834_lockupdt_KY pu1
           WHERE pu1.mbr_medicaid_id = pd1.mbr_medicaid_id
             AND trim(pu1.class_value_834) = trim(pd1.class_value))
 WHERE mbr_medicaid_id IN
         (SELECT mbr_medicaid_id
            FROM prembr_834_lockupdt_KY);

Execution Plan:

UPDATE STATEMENT Optimizer=ALL_ROWS (Cost=758)
  UPDATE OF PREMBR_834_DATA_KY
    HASH JOIN (RIGHT SEMI) (Cost=758 Cardinality=3992 Bytes=331336)
      INDEX (FAST FULL SCAN) OF IX_PREMBR_834_LOCKUPDT_01 (INDEX) (Cost=5 Cardinality=2347 Bytes=35205)
      TABLE ACCESS (FULL) OF PREMBR_834_DATA_KY (TABLE) (Cost=752 Cardinality=3992 Bytes=271456)
    HASH (UNIQUE) (Cost=11 Cardinality=1 Bytes=53)
      TABLE ACCESS (BY INDEX ROWID) OF PREMBR_834_LOCKUPDT_KY (TABLE) (Cost=10 Cardinality=1 Bytes=53)
        INDEX (RANGE SCAN) OF IX_PREMBR_834_LOCKUPDT_01 (INDEX) (Cost=1 Cardinality=9)

Thanks,

Lisa

Good idea on stats. It is very possible that the tables and/or indexes grew, crossed
a threshold, and performance suffered. DBAs on 10g need to create jobs to keep key
schema stats updated, lest these kinds of things happen.
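
For example, something along these lines would refresh the stats on both tables and
their indexes. This is only a sketch; 'YOURSCHEMA' is a placeholder and you may prefer
a different sample size or a scheduled job:

BEGIN
  -- Gather fresh optimizer stats on each table and, via CASCADE, its indexes.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'YOURSCHEMA',            -- placeholder schema name
    tabname          => 'PREMBR_834_DATA_KY',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);

  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'YOURSCHEMA',
    tabname          => 'PREMBR_834_LOCKUPDT_KY',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/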

One possible big hit is the full table scan on the table you are updating. That
would be dependent on the number of rows in your update table.

The explain plan shows prembr_834_data_KY table doing a full table scan with an
index usage against the prembr_834_lockupdt_KY table. That seems like a likely
candidate for your issue.

It would make more sense if your update table had fewer rows in it than your
primary table. After you update the stats, this part of the explain plan may
reverse itself and correct the issue.

Roger S.

Thank you for all the tips. I added a hint to get rid of the FTS. However, the cost
went up to 3 times higher. Because it's a production DB, I haven't tested the run
time with the hint. The Dev and Test DBs are fine. I'll rebuild the indexes and update
the stats during the maintenance window tonight, and keep you updated.
Thanks again,

Lisa

Keep in mind that it's a common misconception that a higher cost is always worse –
that is not always true. I have no idea in your case, but don't just assume that a
higher cost means worse performance. :)

Morning Lisa,

On 27/02/12 22:45, Lisa Hu wrote:

Dear all,
I know it's not a Toad-related issue, but I would like to ask the experts' help
in tuning an Update process. One Update suddenly ran for 2 hours versus the 10+
minutes it normally takes. To my knowledge, nothing was changed (of course).

I've had a similar problem on 10g myself, but it was not related to the SQL or
any changes. In fact, it was quite a pain to track down. However, just in case
it's relevant:

  • The problem wasn't always seen, only on occasions;
  • The database is in archive log mode;
  • The archives are written to the Flash Recovery Area (FRA);
  • The parameter DB_RECOVERY_FILE_DEST_SIZE is set (too small!!!);
  • The database is backed up nightly by RMAN;
  • RMAN has an archive log deletion policy set to "backed up three times";
  • RMAN backs up to disc outside the FRA;
  • The code affected runs at night;

Because the recovery file dest size parameter was set too small, archiving hung
on occasions during the code run, as the code was creating a large number of
archived logs.

The archiver stopped, wrote a message to the alert log and the job hung.

After a while, RMAN kicked in and after the backups, did some tidying up
of old archived logs thus freeing space in the FRA.

The freed space took the usage in FRA down below the recover file dest
size setting and so the archiver un-hung and the job completed.

Everyone was looking at the SQL for problems, at table row counts, stats,
etc. It was only when I was looking in the alert log for something else that
I noticed these alerts and, on checking the times and dates, saw the
connection!

We had an alerting system that monitored the alert log for problems but
it was missing this one somehow. (I don't particularly like or rate our
monitoring system for this and similar reasons that I will not go in to
here!)

Interesting one, that one, though! Once we increased the "broken"
DB_RECOVERY_FILE_DEST_SIZE setting, it just worked, and no more problems since.
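
If anyone wants to check for the same situation, something like this shows how
full the FRA is; the eventual fix was simply to raise the limit (the 50G value
below is only an example):

-- How much of the flash recovery area is used, and how much is reclaimable?
SELECT name, space_limit, space_used, space_reclaimable, number_of_files
  FROM v$recovery_file_dest;

-- Give the FRA more headroom (the size shown is just an example).
ALTER SYSTEM SET db_recovery_file_dest_size = 50G;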

Following are the SQL and execution plan. The DB is on 10g. Any help
will be greatly appreciated. I am planning to rebuild the indexes and
then gather new stats, just for the sake of better performance.

I agree on the stats front, but rebuilding indexes? That almost never
brings any performance benefits. As soon as you build them, Oracle will
start splitting blocks etc. as the index is used. Oracle is pretty good
at maintaining indexes.

--
Cheers,
Norm. [TeamT]

Hi Lisa,

On 28/02/12 00:03, Lisa Hu wrote:

Thank you for all the tips. I added a hint to get rid of the FTS. However,
the cost went up to 3 times higher.

I suspect the FTS is part of your main WHERE clause:

WHERE mbr_medicaid_id in
(
SELECT mbr_medicaid_id
FROM prembr_834_lockupdt_KY
);

Getting rid of that FTS might well increase the execution times which
would account for the cost going up.

I would suggest that the best way to find out what is really happening is to
add an alter session set events '10046 trace name context forever' before the
statement and an alter session set events '10046 trace name context off' after
it.

The trace file generated will show you exactly where the time is being spent.
You can test it manually in a Toad session just by running the alter, update,
alter, along the lines of the sketch below.
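
For example (level 12 is my own preference as it also captures wait and bind
details; the trace file appears under user_dump_dest and can be formatted with
tkprof):

ALTER SESSION SET tracefile_identifier = 'upd_834';  -- optional: tags the trace file name
ALTER SESSION SET events '10046 trace name context forever, level 12';

-- run the UPDATE statement here, then:

ALTER SESSION SET events '10046 trace name context off';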

HTH

--
Cheers,
Norm. [TeamT]

I rebuilt all the indexes and ran GATHER_TABLE_STATS. The Update ran 30 minutes
with a similar execution plan, which is still longer than normal. Because it's a
delivered patch process, I can't run the 10046 trace on the Update process. I did
run the AWR, ADDM, and ASH reports on the database and found that there were many
Commit processes which used 80% of system resources during the poor performance
period. I don't know where these Commits come from.
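
If it helps, this is the kind of query I plan to run next to see who is doing
the commits (a sketch only; 'log file sync' is the commit wait event, and the
time window is just an example):

-- Which sessions/programs were waiting on commits during the slow window?
SELECT session_id, program, COUNT(*) AS ash_samples
  FROM v$active_session_history
 WHERE event = 'log file sync'
   AND sample_time > SYSDATE - 1/24   -- last hour; adjust to the slow period
 GROUP BY session_id, program
 ORDER BY ash_samples DESC;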

Thanks,

Lisa

A few count queries to run, if it's OK for you to provide us with the counts.
Always be safe with regard to any non-disclosure type agreement you may have
signed.

SELECT COUNT(*) FROM prembr_834_lockupdt_ky;

SELECT COUNT(DISTINCT mbr_medicaid_id) FROM prembr_834_lockupdt_ky;

SELECT COUNT(*) FROM prembr_834_data_ky;

SELECT COUNT(DISTINCT mbr_medicaid_id) FROM prembr_834_data_ky;

Roger S.

Yeah, the hard part of this is your statement that "I can't run the 10046 trace
on the Update process".

Given that, we’re mostly limited to guessing. We can’t trace the actual detailed
resource use.

How similar is the test environment? Do you have the option of getting real data
volumes over there, and then doing a trace or using SQL*Optimizer to test out a
bunch of query rewrites? You might find a hint that would help.

Is it possible to run the process on production when there's not much else going
on, so you can see if the delay is because of a change in this process or its
data, vs. a change in something else in the environment ("Oh, Joe just added 3
big jobs to this processing window")?

I'm assuming you don't have a stored outline from the period when things were
running fine? That would be great, but it never happens in real life. Hardly
anybody manages to keep that sort of data; I know we don't.
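
If you ever did want to start keeping that sort of thing going forward, capturing
one only takes a statement like this (the outline and category names here are made
up, and you need the CREATE ANY OUTLINE privilege):

-- Capture the current plan of the statement as a stored outline.
CREATE OR REPLACE OUTLINE upd_834_baseline
  FOR CATEGORY prembr_834
  ON UPDATE prembr_834_data_KY pd1
        SET mbr_age = trunc(to_number(file_effective_date -
                            to_date(mbr_ymdbirth, 'yyyymmdd')) / 365.25),
            class_value =
              (SELECT DISTINCT trim(lockin_grp_new) || trim(lockin_div_new)
                 FROM prembr_834_lockupdt_KY pu1
                WHERE pu1.mbr_medicaid_id = pd1.mbr_medicaid_id
                  AND trim(pu1.class_value_834) = trim(pd1.class_value))
      WHERE mbr_medicaid_id IN
            (SELECT mbr_medicaid_id FROM prembr_834_lockupdt_KY);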

Can you maybe take the update statement, create a pure query version, and run
that on production with trace turned on? Might tell you something.

Hard to do much more remotely. Except to say we’ve all been there. Good luck.

– jim

SELECT COUNT(*) FROM prembr_834_lockupdt_ky;                         -- 2979

SELECT COUNT(DISTINCT mbr_medicaid_id) FROM prembr_834_lockupdt_ky;  -- 2483

SELECT COUNT(*) FROM prembr_834_data_ky;                             -- 2966

SELECT COUNT(DISTINCT mbr_medicaid_id) FROM prembr_834_data_ky;      -- 2648

Thanks,

Lisa

prembr_834_lockupdt_ky: 2,979 rows, 2,483 distinct mbr_medicaid_id

prembr_834_data_ky: 2,966 rows, 2,648 distinct mbr_medicaid_id

Ouchies!

With a half hour run time, I was expecting, at the very least, hundreds of
thousands of rows.

Even with a hash join and a full table scan of both, I wouldn’t expect a half
hour for merging a total of 6k rows.

I think I'm going to have to agree with Norm that your issue lies outside this
query… unless, I suppose, you're running this on a server with very few
resources available.

Perhaps there’s another process that’s locking the rows that your query is
trying to update and so your query is in a waiting state for a large portion of
the time.

If you set timing on and run the following query, how quickly does it come back
with the count?

SELECT COUNT(*)
  FROM (SELECT mbr_age, class_value, pu1.mbr_medicaid_id
          FROM prembr_834_data_ky pd1, prembr_834_lockupdt_ky pu1
         WHERE pu1.mbr_medicaid_id = pd1.mbr_medicaid_id
           AND TRIM(pu1.class_value_834) = TRIM(pd1.class_value));

This would be a pure query time, but it would process all the rows that the
update query would touch.

Roger S.