什么是Clustering Factor

发布时间：2021-11-05 10:57:11 来源：亿速云阅读：566 作者：iii 栏目：关系型数据库

这篇文章主要介绍“什么是Clustering Factor”，在日常操作中，相信很多人在什么是Clustering Factor问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”什么是Clustering Factor”的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

一、什么是聚簇因子

Clustering Factor（CF）聚簇因子，意味着对索引的表的群集度,用来比较索引的顺序程度与表的杂乱排序程度的一个度量。

集群因子用于判断索引回表需要消耗的物理 I/O 次数。

CF是在假设内存大小只能载入一个块，这是伴随索引扫描所需要的表扫描次数的计算值。

更准确地说，他表示沿着索引的叶块，在ROWID值上代表块编号的第1~15位的值与之前ROWID作出比较后被更替的次数。

如：

有五个块组成的索引和由五个块组成的表，而一个块里有4行记录，那么总行数就是20行。

顺序的扫描索引的同时读取相对应的表，这时可能有两个极端情况：

1、CF最低时

如果一个索引块所包含的ROWID都被一个表块所包含，则通过索引扫描表时，只扫描索引5次和表5次，就可以得到想要的数据。

此时CF=5（表块的扫描次数），CF的最小值与表的块数相同

2、CF最高时

如果一个索引块所包含的ROWID分别被不同的表块所包含，则通过索引扫描表时，

必须扫描20次[4(索引块数)+4(索引块数)+4(索引块数)+4(索引块数)+4(索引块数) = 20次]才能获得想要的数据。

这时CF=20（表块的扫描次数），CF最大值与表的行数相同

二、聚簇因子计算过程

聚簇因子大致的计算方法顺序如下：

1、进行一次索引全扫描

2、检查索引块中的rowid信息。比较与前一个rowid是否指向同一个数据块。若不同，则聚簇因子加1.

3、当完成整个的索引扫面后，即得到该索引的聚簇因子的数值。

计算过程

SELECT *
FROM   (SELECT ID
              ,ROWID
        FROM   TMP_CF
        ORDER  BY ID)
WHERE  ROWNUM <= 5;        
        ID ROWID
---------- ------------------
         2 AAAVqxAAEAAAAfDACL
         3 AAAVqxAAEAAAAfDAAf
         3 AAAVqxAAEAAAAfDAJR
         4 AAAVqxAAEAAAAfDAGZ
         4 AAAVqxAAEAAAAfDAKP

首先我们比较 2、3 对应的 ROWID 是否在同一个数据块，如果在同一个数据块，Clustering Factor +0；

如果不在同一个数据块，那么 Clustering Factor 值加 1。

然后我们比较 3、4 对应的 ROWID 是否在同一个数据块，如果在同一个数据块，Clustering Factor 值不变；

如果不在同一个数据块，那么 Clustering Factor 值加 1。

接下来我们比较4、5对应的ROWID是否在同一个数据块，如果在同一个数据块，Clustering Factor +0；

如果不在同一个数据块，那么 Clustering Factor 值加 1。

像上面步骤一样，一直这样有序地比较下去，直到比较完索引中最后一个键值。

表中的数据块数 <= 聚簇因子的取值范围 <= 表中的记录数

三、聚簇因子的影响

如果集群因子与块数接近，说明表的数据基本上是有序的，而且其顺序基本与索引顺序一样。

这样在进行索引范围或者索引全扫描的时候，回表只需要读取少量的数据块就能完成。

如果集群因子与表记录数接近，说明表的数据和索引顺序差异很大，在进行索引范围扫描或者索引全扫描的时候，回表会读取更多的数据块。

集群因子只会影响索引范围扫描（INDEX RANGE SCAN）以及索引全扫描（INDEX FULL SCAN），因为只有这两种索引扫描方式会有大量数据回表。

集群因子不会影响索引唯一扫描（INDEX UNIQUE SCAN），因为索引唯一扫描只返回一条数据。

集群因子更不会影响索引快速全扫描（INDEX FAST FULL SCAN），因为索引快速全扫描不回表。

集群因子影响的是索引回表的物理 I/O 次数。

我们假设索引范围扫描返回了 1 000 行数据，如果 buffer cache 中没有缓存表的数据块，

假设这 1000行数据都在同一个数据块中，那么回表需要耗费的物理 I/O 就只需要一个；

假设这 1000 行数据都在不同的数据块中，那么回表就需要耗费 1 000 个物理 I/O。

因此，集群因子影响索引回表的物理 I/O 次数。

假设I/O不使用内存，CF越高读取表块的次数就越多，因此物理I/O也会增加。

即，CF越高，通过ROWID读取表块的次数就会越多，db file sequential read也会相应的增加。

通过高速缓冲区读取过一次的块因为不在发生物理I/O，因此CF高未必一定会降低SQL语句的性能。

但是大量扫描CF值高的索引，则需要读取的表块增加，因此可能给性能带来严重问题。

比如当ORACLE计算索引范围扫描（IRS）及其回表时的成本时，

使用如下的公式：

IRS COST=I/O COST + CPU COST

其中 I/O COST = INDEX ACCESS I/O COST + TABLE ACCESS I/O COST

其中:

INDEX ACCESS I/O COST = BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)

TABLE ACCESS I/O COST = CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)

索引的聚簇因子越大

--》从该索引回表的成本越大

--》使用该索引的综合成本越大

--》使用该索引的可能越小

对索引收集统计信息，则索引的聚簇因子会被记录在 DBA_INDEXES的CLUSTING_FACTOR列。

如果CF与表的块数相当就是“好现象”；如果CF和表的行数相当就不是“好现象”。

四、如何优化

如果SQL语句的性能问题被判断位由于聚簇因子导致，如何进行优化？

1、将索引扫描换成全表扫描

2、使用其他索引

3、按照索引的排列顺序重新创建表(最后的选择)

create table new_table as select ... from old_table order by indexd_column

4、某些情况，设置table_cached_blocks参数

5、设置set_index_stats的聚簇因子值clstfct

注意：
当索引范围扫描，索引全扫描不回表或者返回数据量很少的时候，不管集群因子多大，对 SQL 查询性能几乎没有任何影响。
在进行 SQL 优化的时候，往往会建立合适的组合索引消除回表，或者建立组合索引尽量减少回表次数。

五、聚簇因子对执行计划的影响的实验：

1、聚簇因子高对执行计划的影响

2、聚簇因子低对执行计划的影响

3、调整聚簇因子

3.1、根据索引列重新创建表

3.2、设置table_cached_blocks参数

3.3、设置set_index_stats的聚簇因子值clstfct

1、聚簇因子高对执行计划的影响

/* 创建测试数据 */
/* 为了制造出聚簇因子较高的索引，对生成的ID数值取随机数1至1000。*/
SQL> create table tmp_cf as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000;
Table created.
/* 在ID列上创建索引 */
SQL> create index ind_id on tmp_cf(id);
Index created.
/* 收集统计信息 */
SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf');
PL/SQL procedure successfully completed.
/* 开启autotrace查看执行计划 */
SQL> set autot traceonly
/* 执行查询，条件为ID<=6 */
SQL> select * from tmp_cf where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     6 |    36 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     6 |    36 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        676  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 从执行计划中可见，在只返回6行的情况下，估算出的走索引并回表的成本仍然大于全表扫描。 */

/* 尝试改条件为 id<=5 ，id<=4,id<=3 进一步减少返回的行数。 */

SQL> select * from tmp_cf where id<=5;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     5 |    30 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     5 |    30 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=5)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
SQL> select * from tmp_cf where id<=4;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     4 |    24 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     4 |    24 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=4)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
SQL> select * from tmp_cf where id<=3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     3 |    18 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     3 |    18 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=3)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        652  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
SQL> select * from tmp_cf where id<=2;
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     2 |    12 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |     2 |    12 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |     2 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=2)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        592  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

/* 可见，当返回的行数为1行时，才选择了用索引并回表的访问路径。*/

2、聚簇因子低对执行计划的影响

/* 创建测试数据 */
/* 创建测试表，但这次ID列的值为顺序产生。 */
SQL> create table tmp_cf_2 as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;
Table created.
/* 在ID列上创建索引 */
SQL> create index ind_id_2 on tmp_cf_2(id);
Index created.
/* 执行查询，条件为ID<=6 */
SQL> select * from tmp_cf_2 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     6 |   390 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |     6 |   390 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
         23  consistent gets
          1  physical reads
          0  redo size
        986  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 从上面可见，此时已经采用了先索引再回表的访问路径，说明此时其计算出来的成本，是小于全表扫描的。*/

/* 继续执行查询，将条件依次为ID<=200,ID<=222,ID<=223,逐步增加返回的行数 */

SQL> select * from tmp_cf_2 where id<=200;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   200 | 13000 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |   200 | 13000 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |   200 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=200)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         48  consistent gets
          0  physical reads
          0  redo size
      14949  bytes sent via SQL*Net to client
        667  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        200  rows processed
SQL> select * from tmp_cf_2 where id<=222;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   222 | 14430 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |   222 | 14430 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |   222 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=222)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         50  consistent gets
          0  physical reads
          0  redo size
      16455  bytes sent via SQL*Net to client
        678  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        222  rows processed
SQL> select * from tmp_cf_2 where id<=223;
Execution Plan
----------------------------------------------------------
Plan hash value: 2467544485
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   223 | 14495 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF_2 |   223 | 14495 |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=223)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         43  consistent gets
          0  physical reads
          0  redo size
      15679  bytes sent via SQL*Net to client
        678  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        223  rows processed

/* 从上可见，当返回223行时，才开始采用全表扫描的方式。*/

/* 查看一下两个索引的聚簇因子 */

SQL>  select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2');
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
IND_ID                                       443
IND_ID_2                                       9

当索引的聚簇因子较高时，会导致该索引回表的成本成大，使得该索引的综合成本越大，因此使用该索引的可能性越低。

但是，如果在执行计划出现错误，使用了该索引，并且该索引的聚簇因子很大，那么通过索引回表的次数就会很大，同时会造成大量的db file sequential read等待事件。

3.1、根据索引列重新创建表

/* 根据tmp_cf的表索引列排完序在创建新表tmp_cf_3 */

SQL> create table tmp_cf_3 as select * from tmp_cf order by id;
Table created.
/* 创建索引 */
SQL> create index ind_id_3 on tmp_cf_3(id);
Index created.
/* 执行查询 id<=6 */
SQL> select * from tmp_cf_3 where id <= 6;
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2546856486
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     6 |    96 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_3 |     6 |    96 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_3 |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
         15  consistent gets
          1  physical reads
          0  redo size
        692  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 从上面可见，此时已经采用了先索引再回表的访问路径，说明此时其计算出来的成本，是小于全表扫描的。*/

再次查看聚簇因子
SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2','IND_ID_3');
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
IND_ID                                       443
IND_ID_2                                       9
IND_ID_3                                       2

根据索引列排完序创建的表，然后在原索引列创建索引，此时的聚簇因子很小。

3.2、设置table_cached_blocks参数

在设置该参数前，有一种方式可以在创建索引前先计算出该索引的聚簇因子。

/* 创建测试数据 */
SQL> create table t1 as select * from dba_objects;
Table created.
计算聚簇因子（需要在object_name列创建索引）
SELECT SUM(CASE
             WHEN BLOCK#1 = BLOCK#2 AND FILE#1 = FILE#2 THEN
              0
             ELSE
              1
           END) CLUSTERING_FACTOR
FROM   (SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) FILE#1
              ,LEAD(DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) FILE#2
              ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#1
              ,LEAD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) BLOCK#2
        FROM   T1
        WHERE  OBJECT_NAME IS NOT NULL);
或者
/*更推荐使用下面*/
SELECT SUM(COUNT_CHG)
FROM   (SELECT CASE
                 WHEN SUBSTRB(ROWID, 1, 15) <>
                      LAG(SUBSTRB(ROWID, 1, 15), 1, '000000000')
                  OVER(ORDER BY OBJECT_NAME, ROWID) THEN
                  1
                 ELSE
                  0
               END COUNT_CHG
        FROM   T1
        WHERE  OBJECT_NAME IS NOT NULL);
SUM(COUNT_CHG)
--------------
         44076

/* 实际创建索引并且查看聚簇因子 */

SQL> create index t1_i1 on t1(object_name);
Index created.
SQL> select  index_name,clustering_factor from user_indexes  where  table_name = 'T1' and index_name = 'T1_I1';
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
T1_I1                                      44076

/* 可以看到计算的聚簇因子的值和实际创建完索引的聚簇因子的值是一样的 */

参数 TABLE_CACHED_BLOCKS：

1、该参数自12C引入(11g也有效)。

2、告诉索引统计信息收集工具，假设索引所在的基表上有多少个数据块已经在内存中。

3、取值范围为1至255，默认值为1。

4、对小表上索引的聚簇因子影响大，而对于大表上索引的聚簇因子影响小。

/* 接此前tmp_cf表的实验 */
/* 查看ID列上索引的聚簇因子 */
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000               443
查看当前的TABLE_CACHED_BLOCKS的值
SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual;
DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1
修改为16
SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF','TABLE_CACHED_BLOCKS',16);
PL/SQL procedure successfully completed.
SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual;
DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16
查看聚簇因子
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000               443
收集索引的统计信息
SQL> exec dbms_stats.gather_index_stats('DEMO','IND_ID');
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000                 2

/*此时，再执行查询，条件为ID<=6 看此时的表现*/

SQL> set autot traceonly
SQL> select * from tmp_cf where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 3869640664
----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |     6 |   330 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF |     6 |   330 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IND_ID |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          3  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
        876  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

/* 从执行计划中可见，原来走全表扫描，现在变成走索引范围扫描了。 */

/* 尝试改条件为 id<=100 ，id<=200,id<=300 逐步增大返回的行数。看看何时不走索引了*/

SQL> select * from tmp_cf where id<=100;
110 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   100 |   600 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   100 |   600 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   100 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=100)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         59  consistent gets
          0  physical reads
          0  redo size
       3034  bytes sent via SQL*Net to client
        601  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        110  rows processed
SQL> select * from tmp_cf where id<=200;
206 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   200 |  1200 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   200 |  1200 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   200 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=200)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        118  consistent gets
          0  physical reads
          0  redo size
       5205  bytes sent via SQL*Net to client
        667  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        206  rows processed
SQL> select * from tmp_cf where id<=300;
286 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3869640664
----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   300 | 16500 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF |   300 | 16500 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IND_ID |   300 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=300)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        271  consistent gets
          0  physical reads
          0  redo size
      21347  bytes sent via SQL*Net to client
        816  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        286  rows processed
SQL> select * from tmp_cf where id<=332;  
336 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   333 |  1998 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   333 |  1998 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   333 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=332)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        195  consistent gets
          0  physical reads
          0  redo size
       8307  bytes sent via SQL*Net to client
        766  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        336  rows processed
SQL> select * from tmp_cf where id<=333;
339 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   334 |  2004 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |   334 |  2004 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=333)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
       7592  bytes sent via SQL*Net to client
        766  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        339  rows processed

/* 如上实验，我们可以看到，TABLE_CACHED_BLOCKS对聚簇因子的大小是有影响的。而聚簇因子变小后，CBO会更倾向于使用该索引，而不是全表扫描。*/

参数 TABLE_CACHED_BLOCKS是如何影响聚簇因子的计算过程的：

下面以设置TABLE_CACHED_BLOCKS为3来描述计算过程：

准备一个存储三条记录的数组，每条记录有两列，一列存储块ID，一列存储顺序号。

按顺序遍历索引，并依此顺序为每个索引条目提供一个顺序号。

从当前索引条目中提取块id并在数组中搜索匹配的块ID。

如果你找到匹配的，就用当前的顺序号更新数组的顺序号；

如果没有找到匹配项，那么用当前块ID和顺序号替换数组中顺序号最小的记录，并增加clustering_factor的计数。

/* 创建测试用表 */
/* 为了构造一个行数为10行，但会占用多个数据块的表，所以，我们把其中一个列C1，用长度为2000个字节的字符填充。同时，ID列中的值也是随机的，非顺序的。*/
SQL> create table tmp_cf_4 as select ceil(dbms_random.value(1,10)) id,lpad('x',2000,'x') c1 from dual connect by rownum<=10;
Table created.
/* 在ID列上创建索引 */
SQL> create index ind_tmp_cf_4_id on tmp_cf_4(id) tablespace demo;
Index created.
/* 查看此时该索引的聚簇因子值 */
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_4_ID                        10                 8
/* 设置测试表TMP_CF_4上的TABLE_CACHED_BLOCKS的参数为3后，收集索引上的统计信息，并查看其聚簇因子的值*/
SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF_4','TABLE_CACHED_BLOCKS',3);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_index_stats('DEMO','IND_TMP_CF_4_ID');
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_4_ID                        10                 5
/* 此时，我们可以看到其聚簇因子的值由此前的8变小到了5. */
/* 下面，我们来看一下，这个新值是如何计算出来的？ */
/* 按ID列在索引中的顺序显示其值，顺序号以及对应行所在的块号 */
SELECT ROWNUM RN
      ,ID
      ,BLOCK_NUM
FROM   (SELECT ID
              ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK_NUM
        FROM   TMP_CF_4
        ORDER  BY ID
                 ,ROWID);
        RN         ID  BLOCK_NUM
---------- ---------- ----------
         1          2       4859
         2          3       4862
         3          4       4860
         4          5       4861
         5          6       4859
         6          6       4859
         7          6       4861
         8          8       4860
         9          8       4860
        10         10       4861
10 rows selected.
/* 打开一个EXCEL表，模拟一个存储3行记录的数组，并计算聚簇因子的值 */
        RN         ID  BLOCK_NUM     块ID顺序号
---------- ---------- ----------     
         1          2       4859     104861
         2          3       4862     64859
         3          4       4860     94860
         4          5       4861     
         5          6       4859     
         6          6       4859     
         7          6       4861     
         8          8       4860     CF5
         9          8       4860
        10         10       4861

3.3、设置set_index_stats的聚簇因子值clstfct

exec dbms_stats.set_index_stats

('DEMO', #索引的属主

'IND_TAB_CACHED_1_ID', #索引名

clstfct=>100 #聚簇因子值

);

注：设置后，要把该索引所在的基表统计信息加锁

（dbms_stats.lock_table_stats）。以避免下次收集统计时，被新值覆盖。（dbms_stats包无单独针对索引统计信息加锁的方法）

/* 创建测试数据 */
/* 为了制造出聚簇因子较高的索引，对生成的ID数值取随机数1至1000。*/
SQL> create table tmp_cf_5 as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000;
Table created.
/* 在ID列上创建索引 */
SQL> create index ind_tmp_cf_5_id on tmp_cf_5(id);
Index created.
/* 收集统计信息 */
SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf_5');
PL/SQL procedure successfully completed.
/* 开启autotrace查看执行计划 */
SQL> select * from tmp_cf_5 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 4131497868
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     7 |    42 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF_5 |     7 |    42 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=6)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
当前的索引的聚簇因子
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_5_ID                      1000               438
修改当前的聚簇因子值
SQL> exec dbms_stats.set_index_stats('DEMO','IND_TMP_CF_5_ID',clstfct=>10);
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_5_ID                      1000                10
再次查看执行计划
SQL> select * from tmp_cf_5 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 4231491781
-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |     7 |    42 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_5        |     7 |    42 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_TMP_CF_5_ID |     7 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

/* 从上面可见，此时已经采用了先索引再回表的访问路径，说明此时其计算出来的成本，是小于全表扫描的。*/

到此，关于“什么是Clustering Factor”的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注亿速云网站，小编会继续努力为大家带来更多实用的文章！

向AI问一下细节

什么是Clustering Factor

一、什么是聚簇因子

二、聚簇因子计算过程

三、聚簇因子的影响

四、如何优化

五、聚簇因子对执行计划的影响的实验：

1、聚簇因子高对执行计划的影响

2、聚簇因子低对执行计划的影响

3.1、根据索引列重新创建表

3.2、设置table_cached_blocks参数

参数 TABLE_CACHED_BLOCKS：

参数 TABLE_CACHED_BLOCKS是如何影响聚簇因子的计算过程的：

猜你喜欢

最新资讯

相关推荐

相关标签