The magic LIKE optimization


It appears that there's a special optimization for predicates of the form:

col like 'pq%'

where the number of leading characters is 2. Consider the following example (8174 and 9204 show the same behaviour).

SQL> create table o3 as
  2  select chr(trunc(rownum/1000)+65)||to_char(rownum,'fm000000000000000') x,
  3      'pppppppppppppppppp' padding
  4  from all_Objects
  5  where rownum <= 26000
  6  order by dbms_random.value;

Table created.

So we've got a table with 1000 rows for each letter of the alphabet,and we index on col X

SQL> create index o3x on o3 ( x );

Index created.

SQL> analyze table o3 compute statistics;

Table analyzed.

Test 1: Single leading char
---------------------------
SQL> set autotrace traceonly explain
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=300 Card=1002 Bytes=34068)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=300 Card=1002 Bytes=34068)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=5 Card=1002)

Test 2: Two leading chars
---------------------------
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J0%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=5 Bytes=170)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=3 Card=5  Bytes=170)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=2 Card=5)

Wow! Look at the "brilliant" cost

Test 3: Three leading chars
--------------------------
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J00%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=326 Card=1085 Bytes=36890)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=326 Card=1085 Bytes=36890)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=6 Card=1085)

Test 4: Four leading chars
--------------------------
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J000%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=173 Card=575 Bytes=19550)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=173 Card=575 Bytes=19550)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=4 Card=575)


Test 5: Five leading chars
--------------------------
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J0000%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=92 Card=305 Bytes=10370)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=92 Card=305 Bytes=10370)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=3 Card=305)

Test 6: Six leading chars
--------------------------
SQL> select /*+ INDEX(o3) */ * from o3 where x like 'J00000%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=162 Bytes=5508)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'O3' (Cost=49 Card=162 Bytes=5508)
   2    1     INDEX (RANGE SCAN) OF 'O3X' (NON-UNIQUE) (Cost=2 Card=162)

With all the tests, as you increase the number of leading chars, the index gets cheaper which seems reasonable. The exception is the "magical" 2-leading char case which always comes ultra-cheap. A 10053 trace showed the computed cardinality to about spot-on for a single leading char (ie cmptcd=1000), but this drops to 5 for the 2-char case.

A friend has hypothesized that this may be linked to the fact that for years Oracle Apps would do case insensitive searching by generating predicates such as:

   col1 like 'AB%' 
or col1 like 'aB%' 
or col1 like 'Ab%' 
or col1 like 'ab%' 
and upper (col1) like upper('AbcdE%')'

in the times before function-based indexes became available. This allowed for the use of four index probes rather than a scan to do case insensitve searching.