|
直方图
当某列数据分布不均衡,为了让CBO能生成最佳的执行计划,我们可能需要对表收集直方图,直方图最大的桶数(Bucket)是254。收集直方图是一个很耗时的过程,如无必要,千万别去收集直方图。Oracle的直方图有两种:一种是频率直方图(FREQUENCY HISTOGRAM),当列中Distinct_keys 较少(小于254),如果不手工指定直方图桶数(BUCKET),Oracle就会自动的创建频率直方图,并且桶数(BUCKET)等于Distinct_Keys。一种是高度平衡直方图(HEIGHT BALANCED),当列中Distinct_keys大于254,如果不手工指定直方图桶数(BUCKET),Oracle就会自动的创建高度平衡直方图。直方图用在什么情况下? 列的值分布非常不均衡的时候,并且where条件中经常用到这个列。直方图都准吗? 不一定。如果一个字段distinct值的个数非常多,基本接近主键的distinct值的个数,就没必要做直方图,直方图也不一定100%准确。
- SQL> drop table a;
- 表已删除。
- SQL> create table a as select * from dba_objects where rownum<=10000;
- 表已创建。
- SQL> @anatab --常规的表分析
- 输入 ownname 的值: ggs
- 输入 tabname 的值: a
- 输入 estimate_percent 的值: 100
- 输入 skewonly_repeat_auto 的值: auto
- 输入 degree 的值: 4
- PL/SQL 过程已成功完成。
- 已用时间: 00: 00: 00.26
- SQL> @getcolstat --字段的直方图
- 输入 owner 的值: ggs
- 输入 table_name 的值: a
- COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS LAST_ANALYZED
- ---------------- ---------- ----------- ----------- --------------------- --------------
- SECONDARY 10000 1 .01 NONE 1 28-7月 -14
- GENERATED 10000 2 .02 NONE 1 28-7月 -14
- TEMPORARY 10000 2 .02 NONE 1 28-7月 -14
- STATUS 10000 1 .01 NONE 1 28-7月 -14
- TIMESTAMP 10000 350 3.5 NONE 1 28-7月 -14
- LAST_DDL_TIME 10000 385 3.85 NONE 1 28-7月 -14
- CREATED 10000 303 3.03 NONE 1 28-7月 -14
- OBJECT_TYPE 10000 34 .34 NONE 1 28-7月 -14
- DATA_OBJECT_ID 10000 1836 18.36 NONE 1 28-7月 -14
- OBJECT_ID 10000 10000 100 NONE 1 28-7月 -14
- SUBOBJECT_NAME 10000 27 .27 NONE 1 28-7月 -14
- OBJECT_NAME 10000 7725 77.25 NONE 1 28-7月 -14
- OWNER 10000 9 .09 NONE 1 28-7月 -14
- 已选择13行。
- SQL>
- SQL> select object_type,count(*) from a group by object_type;
- OBJECT_TYPE COUNT(*)
- ------------------- ----------
- INDEX 946
- JOB CLASS 2
- CONTEXT 2
- TYPE BODY 82
- PROCEDURE 50
- RESOURCE PLAN 3
- RULE 1
- SCHEDULE 1
- TABLE PARTITION 52
- WINDOW 2
- WINDOW GROUP 1
- TABLE 841
- TYPE 1088
- VIEW 2953
- LIBRARY 113
- FUNCTION 68
- TRIGGER 5
- PROGRAM 3
- CLUSTER 10
- SYNONYM 2458
- PACKAGE BODY 470
- QUEUE 21
- CONSUMER GROUP 5
- EVALUATION CONTEXT 8
- RULE SET 11
- DIRECTORY 2
- UNDEFINED 6
- OPERATOR 15
- SEQUENCE 102
- LOB 128
- PACKAGE 485
- JOB 6
- INDEX PARTITION 59
- LOB PARTITION 1
- 已选择34行。
- SQL> explain plan for select count(*) from a where object_type='INDEX';
- 已解释。
- SQL> @getplan
- 'general,outline,starts'
- Enter value for plan type:general
- PLAN_TABLE_OUTPUT
- ---------------------------------------------------------------------------------------------------------------------------------------
- Plan hash value: 2223038180
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 7 | 25 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 7 | | |
- |* 2 | TABLE ACCESS FULL| A | 294 | 2058 | 25 (0)| 00:00:01 | --跟实际不一致,上面查出来的是946
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("OBJECT_TYPE"='INDEX')
- SQL> select 10000/34 from dual; --说明rows中的294是 估算值=总行数/字段distinct值的个数
- 10000/34
- ----------
- 294.117647
- 已选择 1 行。
- SQL>
- SQL> @anatab_col
- 输入 owner 的值: ggs
- 输入 table_name 的值: a
- 输入 columns 的值: object_type --做object_type字段的直方图
- PL/SQL 过程已成功完成。
- SQL> explain plan for select count(*) from a where object_type='INDEX';
- 已解释。
- SQL> @getplan
- 'general,outline,starts'
- Enter value for plan type:general
- PLAN_TABLE_OUTPUT
- ---------------------------------------------------------------------------------------------------------------------------------------
- Plan hash value: 2223038180
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 7 | 25 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 7 | | |
- |* 2 | TABLE ACCESS FULL| A | 946 | 6622 | 25 (0)| 00:00:01 | --这里返回的是真实的行数,做object_type字段的直方图后,执行计划非常准。
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("OBJECT_TYPE"='INDEX')
- SQL>
- SQL>
- SQL>
- SQL>
- SQL> @getcolstat
- 输入 owner 的值: ggs
- 输入 table_name 的值: a
- COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS LAST_ANALYZED
- ---------------- ---------- ----------- ----------- --------- ----------- --------------
- SECONDARY 10000 1 .01 NONE 1 28-7月 -14
- GENERATED 10000 2 .02 NONE 1 28-7月 -14
- TEMPORARY 10000 2 .02 NONE 1 28-7月 -14
- STATUS 10000 1 .01 NONE 1 28-7月 -14
- TIMESTAMP 10000 350 3.5 NONE 1 28-7月 -14
- LAST_DDL_TIME 10000 385 3.85 NONE 1 28-7月 -14
- CREATED 10000 303 3.03 NONE 1 28-7月 -14
- OBJECT_TYPE 10000 34 .34 FREQUENCY 34 28-7月 -14 --刚好等于distinct值
- DATA_OBJECT_ID 10000 1836 18.36 NONE 1 28-7月 -14
- OBJECT_ID 10000 10000 100 NONE 1 28-7月 -14
- SUBOBJECT_NAME 10000 27 .27 NONE 1 28-7月 -14
- OBJECT_NAME 10000 7725 77.25 NONE 1 28-7月 -14
- OWNER 10000 9 .09 NONE 1 28-7月 -14
-
- 已选择13行。
- SQL> select count(distinct object_name) from a; --总共才10000行,可以看出object_name的选择性是比较高的
- COUNT(DISTINCTOBJECT_NAME)
- --------------------------
- 7725
- 已选择 1 行。
- SQL> @anatab_col
- 输入 owner 的值: ggs
- 输入 table_name 的值: a
- 输入 columns 的值: object_name
- PL/SQL 过程已成功完成。
- SQL> @getcolstat
- 输入 owner 的值: ggs
- 输入 table_name 的值: a
- COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS LAST_ANALYZED
- ---------------- ---------- ----------- ----------- --------------- ----------- --------------
- SECONDARY 10000 1 .01 NONE 1 28-7月 -14
- GENERATED 10000 2 .02 NONE 1 28-7月 -14
- TEMPORARY 10000 2 .02 NONE 1 28-7月 -14
- STATUS 10000 1 .01 NONE 1 28-7月 -14
- TIMESTAMP 10000 350 3.5 NONE 1 28-7月 -14
- LAST_DDL_TIME 10000 385 3.85 NONE 1 28-7月 -14
- CREATED 10000 303 3.03 NONE 1 28-7月 -14
- OBJECT_TYPE 10000 34 .34 FREQUENCY 34 28-7月 -14
- DATA_OBJECT_ID 10000 1836 18.36 NONE 1 28-7月 -14
- OBJECT_ID 10000 10000 100 NONE 1 28-7月 -14
- SUBOBJECT_NAME 10000 27 .27 NONE 1 28-7月 -14
- OBJECT_NAME 10000 7725 77.25 HEIGHT BALANCED 75 28-7月 -14
- OWNER 10000 9 .09 NONE 1 28-7月 -14
- 已选择13行。
- SQL> select count(*) from a where object_name like '%A%';
- COUNT(*)
- ----------
- 6404
- 已选择 1 行。
- SQL> explain plan for select count(*) from a where object_name like '%A%';
- 已解释。
- SQL> @getplan
- 'general,outline,starts'
- Enter value for plan type:general
- PLAN_TABLE_OUTPUT
- --------------------------------------------------------------------------------------------------------------
- Plan hash value: 2223038180
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 19 | 25 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 19 | | |
- |* 2 | TABLE ACCESS FULL| A | 500 | 9500 | 25 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("OBJECT_NAME" LIKE '%A%') --LIKE '%A%'对于cbo而言太复杂了,没有真正跑的话,cbo根本不知道真正返回多少行。
- SQL>
- 已选择13行。
- SQL> col OBJECT_NAME for a30
- SQL> select OBJECT_NAME,count(*) from a group by OBJECT_NAME having count(*)>3 order by count(*) desc;
- OBJECT_NAME COUNT(*)
- ------------------------------ ----------
- DBMS_REPCAT_AUTH 5
- 已选择 1 行。
- SQL> explain plan for select count(*) from a where OBJECT_NAME='DBMS_REPCAT_AUTH';
- 已解释。
- SQL> @getplan
- 'general,outline,starts'
- Enter value for plan type:general
- PLAN_TABLE_OUTPUT
- ---------------------------------------------------------------------------------------------------------
- Plan hash value: 2223038180
- ---------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
- ---------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 19 | 25 (0)| 00:00:01 |
- | 1 | SORT AGGREGATE | | 1 | 19 | | |
- |* 2 | TABLE ACCESS FULL| A | 1 | 19 | 25 (0)| 00:00:01 |
- ---------------------------------------------------------------------------
- Predicate Information (identified by operation id):
- ---------------------------------------------------
- 2 - filter("OBJECT_NAME"='DBMS_REPCAT_AUTH') --这个不复杂了吧,rows=1,一样不准,直方图也不可能保证100%准确的
- 所以说,并不是所有字段都适合做直方图。distinct值非常多的,根本不适合做直方图,默认的桶数也装不下。
- 只有字段值倾斜非常严重,distinct值少,而且用到的sql中where条件包含了这个字段。如果sql中都没有用到这个字段,那也没必要做直方图,
- 因为做直方图是非常cpu性能的。
- @脚本
- --anatab.sql
- set timing on
- BEGIN
- DBMS_STATS.GATHER_TABLE_STATS(ownname => '&ownname',
- tabname => '&tabname' ,
- estimate_percent => &estimate_percent,
- method_opt => 'for all columns size &skewonly_repeat_auto',
- no_invalidate => FALSE,
- degree => °ree,
- cascade => TRUE);
- END;
- /
- set timing off
- --anatab_col.sql
- BEGIN
- DBMS_STATS.GATHER_TABLE_STATS(ownname => '&owner',
- tabname => '&table_name',
- estimate_percent => 100,
- method_opt => 'for columns &columns ', --such as:col1,col2,col3...
- no_invalidate => FALSE,
- degree => 4,
- granularity => 'ALL',
- cascade => TRUE);
- END;
- /
- --getcolstat.sql
- col COLUMN_NAME for a30
- select a.column_name,
- b.num_rows,
- a.num_distinct Cardinality,
- round(a.num_distinct / b.num_rows * 100, 2) selectivity,
- a.histogram,
- a.num_buckets,
- a.last_analyzed
- from dba_tab_col_statistics a, dba_tables b
- where a.owner = b.owner
- and a.table_name = b.table_name
- and a.owner = upper('&owner')
- and a.table_name = upper('&table_name');
- --getplan.sql
- set feedback off
- pro 'general,outline,starts'
- pro
- acc type prompt 'Enter value for plan type:' default 'general'
- select * from table(dbms_xplan.display) where '&&type'='general';
- select * from table(dbms_xplan.display(null, null,'advanced -projection')) where '&&type'='outline';
- SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS LAST')) where '&&type'='starts';
- set feedback on
- undef type
复制代码
|
|