DBMS_STATS in 9i


As Oracle has evolved through versions 7, 8 and 8i, the commands for generating statistics have also evolved to allow better collection of statistics for the cost based optimizer, culminating in the DBMS_STATS package from 8.1.5 onwards. However, one shortcoming was that if you wanted to use the "global" routines (that is, collection at a schema or database level), you were committed to using the same granularity for every segment. That is, you could ESTIMATE for all segments, or COMPUTE for all segments, but not a combination of the two. This is a pain when you would like to COMPUTE for small segments but only ESTIMATE for larger segments (or perhaps vary the estimation percentage depending on segment size).

Most of us thus wrote our own routines to cycle through DBA_SEGMENTS and perform the required work ourselves. Oracle have tried to address this in 9i with the new DBMS_STATS.AUTO_SAMPLE_SIZE facility, under which Oracle will try to determine an appropriate sample size for each segment it encounters.
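For illustration, such a home-grown routine might have looked something like the sketch below. The 50MB threshold and the 5 percent sample are arbitrary values chosen purely to show the idea, not anything Oracle recommends.

REM
REM Sketch only: COMPUTE the small tables, ESTIMATE the large ones
REM
begin
  for t in (select owner, segment_name, bytes
            from   dba_segments
            where  segment_type = 'TABLE'
            and    owner = 'MCDONAC') loop
    if t.bytes < 50*1024*1024 then
      -- small segment: passing NULL for estimate_percent means a full compute
      dbms_stats.gather_table_stats(
        ownname          => t.owner,
        tabname          => t.segment_name,
        estimate_percent => null);
    else
      -- large segment: settle for a 5% sample
      dbms_stats.gather_table_stats(
        ownname          => t.owner,
        tabname          => t.segment_name,
        estimate_percent => 5);
    end if;
  end loop;
end;
/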

Here is an example:

REM
REM First we create a table that we will fill to roughly 100M. It is called THIN because
REM each row is very thin; the reason for this will become apparent later in this note
REM
create table THIN ( x number(2)) pctfree 0;

REM
REM Stick about 100M of data in there
REM
declare
  byt number;
begin
loop
  insert /*+ APPEND */ 
  into THIN
  select mod(rownum,10)
  from sys.source$;
  commit;
  select bytes into byt from user_segments
  where segment_name = 'THIN';
  exit when byt > 100000000;
end loop;
end;
/

REM
REM And now use the 'auto' sample size option
REM
begin
dbms_stats.GATHER_TABLE_STATS(
 OWNNAME=>'MCDONAC',
 TABNAME=>'THIN',
 ESTIMATE_PERCENT=>dbms_stats.auto_sample_size);
end;
/
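We can see what Oracle settled on via the dictionary (NUM_ROWS and SAMPLE_SIZE in USER_TABLES; the percentage below is simply derived from the two):

REM
REM SAMPLE_SIZE is the number of rows sampled, so the percentage is just derived from NUM_ROWS
REM
select num_rows, sample_size,
       round(sample_size/num_rows*100,2) pct_sampled
from   user_tables
where  table_name = 'THIN';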

Sure enough, USER_TABLES shows that Oracle chose a sample size of roughly 0.07%. In fact, if we trace the operation, we can see the following work being done:

select  /*+ */ count(*)  from "MCDONAC"."THIN" sample block (.001)
select  /*+ */ count(*)  from "MCDONAC"."THIN" sample block (.1)

followed by the actual analyze work

select  /*+ */  count(*),count("X"),count(distinct "X"),sum(vsize("X")),
                         substrb(dump(min("X"),16,0,32),1,120),
                         substrb(dump(max("X"),16,0,32),1,120) 
            from "MCDONAC"."THIN" sample (.0715773034877667881311816762103071317022) t 

So the first thing to take heed of is that, in order to determine the sample size, DBMS_STATS does extra work against your table (the preliminary block-sample counts above), and this carries some performance overhead.
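If you want to observe this recursive work on your own system, a simple approach (sketched here) is to wrap the gather call with sql_trace and then examine the resulting trace file:

REM
REM Sketch: trace the gather call to capture the recursive SQL it issues
REM
alter session set sql_trace = true;

begin
  dbms_stats.gather_table_stats(
    ownname          => 'MCDONAC',
    tabname          => 'THIN',
    estimate_percent => dbms_stats.auto_sample_size);
end;
/

alter session set sql_trace = false;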

However, let us now try a "wide" table, also populated to around 100M:

create table wide ( x number(2), p char(1000)) pctfree 50;

REM
REM Stick about 100M of data in there. In fact, this table ended up closer to 200M, because
REM the segment was still just under the 100M threshold when the loop ran one final (large) insert
REM

declare
  byt number;
begin
loop
  insert /*+ APPEND */ 
  into wide
  select mod(rownum,10),'x'
  from sys.source$;
  commit;
  select bytes into byt from user_segments
  where segment_name = 'WIDE';
  exit when byt > 100000000;
end loop;
end;
/

REM
REM And now use the 'auto' sample size option
REM
begin
dbms_stats.GATHER_TABLE_STATS(
 OWNNAME=>'MCDONAC',
 TABNAME=>'WIDE',
 ESTIMATE_PERCENT=>dbms_stats.auto_sample_size);
end;
/

In this case, even though the table is nearly twice the size, the sample percentage used was 5.5 (nearly 80 times that used for table THIN). Also, the trace file reveals even more preliminary work being done:

 select  /*+ */ count(*)  from "MCDONAC"."WIDE" sample block (.001)
 select  /*+ */ count(*)  from "MCDONAC"."WIDE" sample block (.1)
 select  /*+ */ count(*)  from "MCDONAC"."WIDE" sample block (10)

and then the actual analyze work

select  /*+ */  count(*),count("X"),count(distinct "X"),sum(vsize("X")),
             substrb(dump(min("X"),16,0,32),1,120),substrb(dump(max("X"),16,0,32),1,120),
             count("P"),count(distinct "P"),substrb(dump(min(substrb("P",1,32)),16,0,32),1,120),
              substrb(dump(max(substrb("P",1,32)),16,0,32),1,120) 
          from "MCDONAC"."WIDE" sample (5.50220088035214085634253701480592236895) t 

Thus it would appear that Oracle attempts to determine an approximate row count for the table in order to choose an appropriate sample size. Whilst this is a reasonable strategy, it should probably take the size of the segment (or at least an approximation of it) into account as well.
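One way to see this on your own system is to compare the two tables side by side in the dictionary. If the sampling really is driven by row counts rather than segment size, the absolute SAMPLE_SIZE figures should be in a similar ballpark even though the percentages differ wildly (a sketch, using the same dictionary columns as before):

REM
REM Compare the two tables: row counts, rows sampled, and the derived percentage
REM
select table_name, num_rows, sample_size,
       round(sample_size/num_rows*100,2) pct_sampled
from   user_tables
where  table_name in ('THIN','WIDE');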

So whilst this feature looks attractive, be warned that determining the sample size costs additional queries against each segment before the real analyze work starts, and that the sample chosen appears to be driven by the approximate row count rather than the size of the segment.