Sunday, May 30, 2010

Identifying Execution Plan Problems

(original at http://sites.google.com/site/embtdbo/tuner/oracle-tcf-query-and-lios-per-row)
Based on the idea by Wolfgang Breitling that we can identify cost and cardinality errors of Oracle's optimizer by looking at the discrepancies between the ESTIMATED rows and the ACTUAL rows in each line of the execution plan.
I also added in a calculation to see how many IOs we were doing per row returned.
In order to get much out of this script you have to have run the query with the hint

/*+ gather_plan_statistics */
or
set statistics_level=all
or
sql_trace=true (only way on 9i other than an underscore parameter)

col cn format 99
col ratio format 99
col ratio1 format A6
--set pagesize 1000
set linesize 140
break on sql_id on cn
col lio_rw format 999
col "operation" format a60
col a_rows for 999,999,999
col e_rows for 999,999,999
col elapsed for 999,999,999

Def v_sql_id=&SQL_ID

select
-- sql_id,
--hv,
childn cn,
--ptime, stime,
case when stime - nvl(ptime ,0) > 0 then
stime - nvl(ptime ,0)
else 0 end as elapsed,
nvl(trunc((lio-nvl(plio,0))/nullif(a_rows,0)),0) lio_rw,
--id,
--parent_id,
--starts,
--nvl(ratio,0) ratio,
' '||case when ratio > 0 then
rpad('-',ratio,'-')
else
rpad('+',ratio*-1 ,'+')
end as ratio1,
starts*cardinality e_rows,
a_rows,
--nvl(lio,0) lio, nvl(plio,0) parent_lio,
"operation"
from (
SELECT
stats.LAST_ELAPSED_TIME stime,
p.elapsed ptime,
stats.sql_id sql_id
, stats.HASH_VALUE hv
, stats.CHILD_NUMBER childn
, to_char(stats.id,'990')
||decode(stats.access_predicates,null,null,'A')
||decode(stats.filter_predicates,null,null,'F') id
, stats.parent_id
, stats.CARDINALITY cardinality
, LPAD(' ',depth)||stats.OPERATION||' '||
stats.OPTIONS||' '||
stats.OBJECT_NAME||
DECODE(stats.PARTITION_START,NULL,' ',':')||
TRANSLATE(stats.PARTITION_START,'(NRUMBE','(NR')||
DECODE(stats.PARTITION_STOP,NULL,' ','-')||
TRANSLATE(stats.PARTITION_STOP,'(NRUMBE','(NR') "operation",
stats.last_starts starts,
stats.last_output_rows a_rows,
(stats.last_cu_buffer_gets+stats.last_cr_buffer_gets) lio,
p.lio plio,
trunc(log(10,nullif
(stats.last_starts*stats.cardinality/
nullif(stats.last_output_rows,0),0))) ratio
FROM
v$sql_plan_statistics_all stats
, (select sum(last_cu_buffer_gets + last_cr_buffer_gets) lio,
sum(LAST_ELAPSED_TIME) elapsed,
child_number,
parent_id,
sql_id
from v$sql_plan_statistics_all
group by child_number,sql_id, parent_id) p
WHERE
stats.sql_id='&v_sql_id' and
p.sql_id(+) = stats.sql_id and
p.child_number(+) = stats.child_number and
p.parent_id(+)=stats.id
)
order by sql_id, childn , id
/
clear breaks

output like

Enter value for sql_id: g2w9n4gksyys6
old 59: stats.sql_id='&v_sql_id' and
new 59: stats.sql_id='g2w9n4gksyys6' and

ELAPSED LIO_RW RATIO1 E_ROWS A_ROWS operation
---------- ------ ------ ------- ------ ---------------------------------
0 0 1 SELECT STATEMENT
5,720,456 0 1 1 HASH GROUP BY
29,711 0 1,909 NESTED LOOPS
0 0 +++ 1 1,909 NESTED LOOPS
1,969,304 0 +++ 1 1,909 NESTED LOOPS
0 0 +++ 1 2,027 NESTED LOOPS
7,939,649 0 +++ 1 1,656 NESTED LOOPS
716,054 0 +++ 1 1,657 NESTED LOOPS
270,201 0 ++ 39 23,171 HASH JOIN
23 0 5 1 JOIN FILTER CREATE :BF00
31 1 5 1 TABLE ACCESS BY INDEX R
14 2 5 1 INDEX RANGE SCAN PS0PA
141,467 0 18,503 23,171 VIEW VW_SQ_1
3,032,120 0 18,503 23,171 HASH GROUP BY
152,564 0 163,420 33,020 JOIN FILTER USE :BF000
407,746 0 163,420 33,020 MERGE JOIN
55 0 5 1 SORT JOIN
12 2 5 1 INDEX RANGE SCAN PS
79,435 0 40,000 33,020 SORT JOIN
119,852 0 40,000 40,000 INDEX FAST FULL SCA
2,959,031 13 - 23,171 1,657 TABLE ACCESS BY INDEX ROW
944,887 1 23,171 23,174 INDEX RANGE SCAN WB_JOB
102,650 0 1,657 1,656 VIEW PUSHED PREDICATE VW_
73,769 0 1,657 1,657 SORT AGGREGATE
25,617 0 1,657 1,657 FIRST ROW
225,497 1 1,657 1,657 INDEX RANGE SCAN (MIN/M
357,872 0 3,312 2,027 TABLE ACCESS BY INDEX ROWID
3,655,774 1 3,312 2,027 INDEX RANGE SCAN WB_RETROP
199,884 0 2,027 1,909 TABLE ACCESS BY INDEX ROWID
317,793 1 2,027 1,909 INDEX RANGE SCAN PS_RETROPA
71,534 0 1,909 1,909 INDEX RANGE SCAN PS#RETROPAYP
18,396 0 1,909 1,909 TABLE ACCESS BY INDEX ROWID PS

I can see that are 5 lines where the optimizer only expect 1 row and the actual results were over 1000, ie 3 orders of magnitude difference.
These are the three lines with
+++
There is one line with
-
where actual was an order of magnitude smaller. On that same line wee see it's one of the slower lines almost 3 seconds and that the were 13 lio's per row returned, which is sign of inefficiency.

No comments:

Post a Comment