Example of Getting Index Usage Statistics

This section assumes that you have already run the steps in Example of Debugging a Slow Query.

Using the PLAYERS table, we can look at index statistics.

Don't forget ANALYZE! Remember to run ANALYZE TABLE PLAYERS to be sure index statistics are generated.
# Don't forget to analyze your table
ANALYZE TABLE players;

# Note the DISTINCTCOUNT for our IDX_PLAYERS_FIRSTNAME index - not too good...
SELECT * FROM system.indexstatistics WHERE tablename = 'PLAYERS';
 OBJECTID  CATALOGID  SCHEMA  TABLENAME        INDEXNAME       KEYCOUNT
 --------- ---------- ------- ---------- --------------------- ---------
     1        3567    HOCKEY   PLAYERS   PLAYERS..PRIMARY_KEY     7520
    23        3567    HOCKEY   PLAYERS   IDX_PLAYERS_FIRSTNAME    7520

SELECT * FROM system.indexprefixstatistics WHERE objectid = 23 and catalogid = 3567;
  OBJECTID  CATALOGID  PREFIXLENGTH  DISTINCTCOUNT  AVERAGEKEYLENGTH    SOURCE
 --------- ---------- ------------- -------------- ----------------- ----------
     23        3567          1             845           5.32        statistics

SELECT * FROM system.indexhistograms where objectid = 23 and catalogid = 3567;
 OBJECTID  CATALOGID  HISTOGRAMID  FIELDCOUNT  MAXRESOLUTION
 --------- ---------- ------------ ----------- --------------
    23        3567         1            1            10

# The selectivity of our histogram buckets leaves a lot to be desired...
SELECT * FROM system.indexhistogrambuckets WHERE objectid = 23 AND catalogid = 3567;
 OBJECTID  CATALOGID  HISTOGRAMID  BUCKETINDEX     BOUNDARY
 --------- ---------- ------------ ------------ ---------------
    23        3567         1             0      [0]
    23        3567         1             1      [Dan]
    23        3567         1             2      [Jack]
    23        3567         1             3      [Matt]
    23        3567         1             4      [Mike]
    23        3567         1             5      [Mike]
    23        3567         1             6      [Mike]
    23        3567         1             7      [Mike]
    23        3567         1             8      [Mike]
    23        3567         1             9      [Ray]
    23        3567         1            10      [Ziggy]

Now we see that there are eleven buckets in the histogram for the index IDX_PLAYERS_FIRSTNAME. The 0th bucket contains one element and that is the lowest key in the index, in this case a [0] zero length key. We can easily see that we do not have good selectivity on this index and that most buckets are filled with keys equal to “Mike”.
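To make the bucket listing concrete, here is a minimal Python sketch of one way a selectivity estimate could be read off these boundaries. The model is an assumption for illustration, not NuoDB's documented algorithm: it counts the buckets whose lower and upper boundary both equal the searched value and divides by the number of buckets. The name count_full_buckets is hypothetical. With the boundaries above, this simple model happens to give the same 40% selectivity and 3008 rows as the earlier optimizer estimate quoted later in this section.

```python
# Boundaries copied from the system.indexhistogrambuckets listing above.
# Entry 0 is the lowest key in the index (shown there as the zero-length key [0]).
BOUNDARIES = ["", "Dan", "Jack", "Matt", "Mike", "Mike",
              "Mike", "Mike", "Mike", "Ray", "Ziggy"]
KEY_COUNT = 7520  # KEYCOUNT from system.indexstatistics


def count_full_buckets(boundaries, value):
    """Count buckets whose lower and upper boundary both equal value.

    Assumed, simplified model for illustration only.
    """
    return sum(
        1
        for lower, upper in zip(boundaries, boundaries[1:])
        if lower == value and upper == value
    )


resolution = len(BOUNDARIES) - 1           # 10 buckets; entry 0 only marks the lowest key
full = count_full_buckets(BOUNDARIES, "Mike")
selectivity = full / resolution
estimated_rows = full * KEY_COUNT // resolution  # integer arithmetic avoids float rounding

print(f"full buckets: {full}, selectivity: {selectivity:.1%}, rows: {estimated_rows}")
```

Because only whole buckets are counted, any “Mike” keys that fall in the partially filled buckets on either side (Matt to Mike, and Mike to Ray) are invisible to this kind of estimate, which is why a coarse histogram can understate a spike.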

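In most cases, the system default of 10 histogram buckets per index yields cost estimates that are good enough for the optimizer to choose the most efficient path for retrieving a query's result set. However, as a table grows and the distribution of its data values becomes highly skewed, only 10 histogram buckets can hide the spikes in that distribution. This can lead to suboptimal cost estimates from the index statistics. For this reason, the number of histogram buckets for an index can be increased when the index is created.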
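The effect of resolution on a skewed distribution can be sketched with synthetic data. The Python example below is a rough illustration under stated assumptions: it builds equal-height bucket boundaries over a sorted key list and estimates selectivity by counting buckets that lie entirely inside the run of one value. The total of 7520 keys and the 4093 “Mike” keys come from this section; the split of the remaining keys into "Alex" and "Zed", the function names, and the estimation model itself are invented for the sketch and do not describe NuoDB internals.

```python
def bucket_boundaries(sorted_keys, resolution):
    """Return resolution + 1 boundaries: the lowest key, then the last key
    of each equal-height bucket. Assumes len(sorted_keys) >= resolution."""
    n = len(sorted_keys)
    return [sorted_keys[0]] + [
        sorted_keys[i * n // resolution - 1] for i in range(1, resolution + 1)
    ]


def estimate_selectivity(sorted_keys, resolution, value):
    """Fraction of buckets whose lower and upper boundary both equal value."""
    bounds = bucket_boundaries(sorted_keys, resolution)
    full = sum(1 for lo, hi in zip(bounds, bounds[1:]) if lo == value and hi == value)
    return full / resolution


# Synthetic, already sorted keys: only the total (7520) and the "Mike" count
# (4093) are taken from the text; the other two groups are made up.
keys = ["Alex"] * 1500 + ["Mike"] * 4093 + ["Zed"] * 1927
actual = keys.count("Mike") / len(keys)

error_by_resolution = {}
for resolution in (10, 300):
    estimate = estimate_selectivity(keys, resolution, "Mike")
    error_by_resolution[resolution] = abs(estimate - actual)
    print(f"resolution {resolution}: estimate {estimate:.2%}, actual {actual:.2%}")
```

With more buckets, each bucket covers fewer rows, so the partially filled buckets at the edges of the spike account for a smaller share of the keys and the estimate moves closer to the true fraction.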

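For example, using the IDX_PLAYERS_FIRSTNAME index from above, let's see how the cost estimates change when we increase the number of histogram buckets for this index.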

# Create the index with an increased number of histogram buckets
DROP INDEX idx_players_firstname;
CREATE INDEX idx_players_firstname ON players(firstname) WITH (RESOLUTION 300);

# Analyze the new index to update statistics
ANALYZE INDEX idx_players_firstname;

SELECT * FROM system.indexhistogrambuckets WHERE objectid = 24 AND catalogid = 3567;
 OBJECTID  CATALOGID  HISTOGRAMID  BUCKETINDEX     BOUNDARY
 --------- ---------- ------------ ------------ ---------------
    24        3567         1             0      [0]
    24        3567         1             1      [Al]
    24        3567         1             2      [Aleksey]
    24        3567         1             3      [Alexander]
    24        3567         1             4      [Alexsandr]
    24        3567         1             5      [Andre]
...
    24        3567         1            93      [Mikael]
    24        3567         1            94      [Mike]
    24        3567         1            95      [Mike]
...
    24        3567         1           256      [Mike]
    24        3567         1           257      [Mikhail]
    24        3567         1           258      [Morris]
...
    24        3567         1           299      [Wayne]
    24        3567         1           300      [Ziggy]
EXPLAIN (opt_estimates on) SELECT firstname, lastname, playerID FROM players WHERE firstname='Mike';
 Select
  List
    Field HOCKEY.PLAYERS.FIRSTNAME (1)
    Field HOCKEY.PLAYERS.LASTNAME (1)
    Field HOCKEY.PLAYERS.PLAYERID (1)
  Boolean sieve
    Eql
      Field HOCKEY.PLAYERS.FIRSTNAME (1)
      "Mike" (varchar)
    Inversion HOCKEY.PLAYERS  (1)
      Bitmap index IDX_PLAYERS_FIRSTNAME [cost: 12182.4, selectivity: 54.000%, rows: 4060]
        "Mike" (varchar)

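The cost estimates now being returned are much more in line with the actual data values. Spreading the data across more histogram buckets reduces the number of rows in each bucket, which better reflects the spikes in the data distribution.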

Previously, the optimizer cost estimates for retrieving rows with a FIRSTNAME equal to “Mike” were:

Bitmap index FIRST_IDX [cost: 9024.0, selectivity: 40.000%, rows: 3008]

After increasing the number of histogram buckets for the index, the new cost estimates are:

Bitmap index FIRST_IDX [cost: 12182.4, selectivity: 54.000%, rows: 4060]

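Both the selectivity and the row count are now closer to the actual values. There are 4093 “Mike” rows out of 7520 total rows, or 54% of the total. However, collecting and storing these statistics has a cost of its own. That cost must be outweighed by the benefit gained in query performance.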
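The size of the improvement can be checked with a few lines of arithmetic. The Python sketch below uses only figures quoted in this section; labeling the earlier estimate as "resolution 10" assumes it was produced with the default histogram shown above (MAXRESOLUTION 10).

```python
TOTAL_ROWS = 7520    # KEYCOUNT for the index
ACTUAL_ROWS = 4093   # rows with FIRSTNAME = 'Mike'

# Estimated row counts from the two EXPLAIN outputs, keyed by histogram resolution.
# Assumption: the earlier estimate was produced with the default resolution of 10.
ESTIMATED_ROWS = {10: 3008, 300: 4060}

actual_selectivity = ACTUAL_ROWS / TOTAL_ROWS
print(f"actual selectivity: {actual_selectivity:.2%}")

# Absolute error of each estimate, in rows.
errors = {res: ACTUAL_ROWS - rows for res, rows in ESTIMATED_ROWS.items()}
for res, err in errors.items():
    print(f"resolution {res}: off by {err} rows ({err / ACTUAL_ROWS:.1%} of the actual count)")
```

Under these figures the default histogram underestimates the “Mike” rows by more than a thousand, while the higher-resolution histogram misses by a few dozen.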