Subtle stats collection


Just the smallest changes in how you work can have a significant impact on how your optimizer plans work out...

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Prod
PL/SQL Release 10.2.0.2.0 - Production
CORE    10.2.0.2.0      Production
TNS for 32-bit Windows: Version 10.2.0.2.0 - Production
NLSRTL Version 10.2.0.2.0 - Production

SQL> drop table T;

Table dropped.

SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

50033 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

50033 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
Plan hash value: 22936165

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |  1014 |   101K|   189   (0)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |  1014 |   101K|   189   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN          | TX   |   405 |       |   182   (0)| 00:00:03 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(UPPER("X")=:B1)

14 rows selected.

Now we change from "gather table stats, create index, gather index stats" to "create index, gather table stats, gather index stats" and watch what happens


SQL>
SQL> drop table T;

Table dropped.

SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

50035 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

50035 rows created.

SQL> commit;

Commit complete.

SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 98835 |  9941K|   337   (3)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| T    | 98835 |  9941K|   337   (3)| 00:00:05 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("X")=:B1)

13 rows selected.

SQL>

The secret is when hidden column statistics are calculated. In the first example above, the table column statistics look like this:

COLUMN_NAME                    NUM_DISTINCT LOW_VALUE    HIGH_VALUE      DENSITY
------------------------------ ------------ ------------ ------------ ----------
X                                         2 58           78                   .5
Y                                         1 7A2020202020 7A2020202020          1
SYS_NC00003$

so the all important stats on the hidden column are missing. Whereas in the second example, the table column statistics look like this:

COLUMN_NAME                    NUM_DISTINCT LOW_VALUE    HIGH_VALUE      DENSITY
------------------------------ ------------ ------------ ------------ ----------
X                                         2 58           78                   .5
Y                                         1 7A2020202020 7A2020202020          1
SYS_NC00003$                              1 58           58                    1