DB Optimizer: Scalar Subqueries

Monday, February 8, 2010

Scalar Subqueries

Better formatted at http://tinyurl.com/yfrjbwx

Query 2
The VST diagram looks like

There are two interesting things about this diagram.

Every thing is OUTER JOINED to F_OUTER
There are correlated subqueries in the select

There are two things to check - what is the effect of the OUTER JOINS. The OUTER JOINS can easily mean that all joins into F_OUTER don't change the result set size. Let's confirm by looking at the TTJ sizes:

The only thing that bounds the number of rows returned by F_OUTER is it's self join (the line going in and out of F_OUTER) on the bottom left of F_OUTER.
What this means is that it doesn't matter what order we join tables into F_OUTER.
Now we can turn to the other interesting thing about the query. There are 4 correlated subselects in the select clause. These queries in the select clause are called "scalar subqueries." These scalar subqueries can be a bad idea or a good idea depending on how mainly on how many distinct values are used as input into the them. ~~AFAIK, Oracle doesn't merge subqueries in the select, and certainly not the ones in this query because they are embedded in cases statements~~ - looks like I might have spoken too soon! more reseach to do but looks like Oracle can merge these scalar subqueries even with the case statement. I will try to run some more tests . To be continued)
In the worst case scenario the scalar subqueries are going to be run 845,012 times - that is one heck of a lot of work which would be a BAD IDEA.

Looking at the above diagram, and the 4 scalar subqueries in the select clause, the top red number is how many times the scalar subquery will be run (based on the case statement in the select clause) and the orange highlight is how many distinct values will be used in the scalar subquery where clause. P3 will benefit for scalar subquery caching but F won't because there are too many distinct values. On the other hand for P1 and P2 could if there are no collisions (see CBOF p216-217) and the scalar subquery caching is actually supports 1024 values. ( see http://www.oratechinfo.co.uk/scalar_subqueries.html#scalar3 for a great write up on analysis of scalar subquery caching - the analysis on this page seems to show that caching maxes out way before 1024 but that might be because of collisions)

The subqueries in the select clause look like

select CASE WHEN F.f1 IS NULL

THEN NULL

ELSE (SELECT X.f2

FROM X

WHERE code_vl = F.f1)

END AS f0

from F;

and could be merged into the query like:

select CASE WHEN F.f1 IS NULL

THEN NULL

ELSE ( X.f2)

END AS f0

from F , X

where code_vl(+) = F.f1;

( NOTE: the first query will break if the correlated sub query returns more than one value where as the second query will return the mulitple rows.)

The VST diagram can't tell you that this is the solution, but they can point out that the place to spend your time is the subqueries in the select.

2 comments:

SydOracleFebruary 8, 2010 at 5:08 PM
"we have 4 queries in th select clause that are going to be run 845,012 times"
Not necessarily true as the select is embedded in a CASE and will only be run if the relevant value is not null. Also they can benefit from scalar subqeury caching. http://www.oratechinfo.co.uk/scalar_subqueries.html#scalar3
Finally, the first of those four queries selects an aggregate which is trickier to accommodate in the main query.
ReplyDelete
Replies
Kyle HaileyFebruary 9, 2010 at 10:01 PM
Hi Gary,
Good points - I added a 3rd diagram showing the number of times the scalar subqueries are run and the number of distinct values used in the scalar subqueries.
For the case of 12 distinct values, node P3, the performance is better with the case statement.
I will try and put up execution stats soon if I have time.
Thanks for the link - great analysis and nice to have the table of of ranges for the scalar subquery caching.
Jonathan Lewis also gives an interesting example using integer fields in CBOF p216-217 where the the cached values can have collisions (not just the size of the cache) and thus ruin the effect of the cache. Lot to keep track of. In the case above the fields are varchar2
ReplyDelete
Replies

Add comment

Kyle Hailey

I'm Kyle Hailey and I'm the designer and product manager behind Embarcadero's

DB Optimzer

Before working at Embarcadero I redesigned the performance pages in Oracles OEM 10g (see a before and after example). I am on the patent from the team for our tuning methodology (Patents: 20060059205) . I freely provide graphical monitoring tools such as ASHMON (TCL/TK) as well as scripts to profile database load such as S-ASH ie Simulated ASH (PLSQL). S-ASH, ASHMON and more are available at my old site http://ashmasters.com/ . I have made available C code to directly read profiling data out of Oracle's memory structures, called Direct Memory Access. I speak at conferences and give classes around the world.This blog has some of my new ideas on Visual SQL Tuning (VST), work on DB Optimizer, possibly some past material on DB performance monitoring as well as lots of references and pointers to other information on the web that I find important and worth emphasizing.

DB Optimizer

Monday, February 8, 2010

Scalar Subqueries

2 comments:

The Oaktable

Oracle "deep dive" search by T. Pystynen

My Blog List

Facebook

Kyle Hailey