Google Cloud Platform: Why does Spanner do a full table scan with `_` in LIKE but use the index with `%`?


google-cloud-platform


In a query, if I use LIKE with `%` on the primary key, it uses the index and performs well:

Operator | Rows returned | Executions | Latency
-- | -- | -- | --
Serialize Result | 32 | 1 | 1.80 ms
Sort | 32 | 1 | 1.78 ms
Hash Aggregate | 32 | 1 | 1.73 ms
Distributed union | 32 | 1 | 1.61 ms
Hash Aggregate | 32 | 1 | 1.56 ms
Distributed union | 128 | 1 | 1.34 ms
Compute | - | - | -
FilterScan | 128 | 1 | 1.33 ms
Table Scan: <tablename> | 128 | 1 | 1.30 ms

However, using LIKE with `_` performs a full table scan:

Operator | Rows returned | Executions | Latency
-- | -- | -- | --
Serialize Result | 32 | 1 | 76.27 s
Sort | 32 | 1 | 76.27 s
Hash Aggregate | 32 | 1 | 76.27 s
Distributed union | 32 | 1 | 76.27 s
Hash Aggregate | 32 | 2 | ~72.18 s
Distributed union | 128 | 2 | ~72.18 s
Compute | - | - | -
FilterScan | 128 | 2 | ~72.18 s
Table Scan: <tablename> (full scan: true) | 13802624 | 2 | ~69.97 s

The query looks like this:

SELECT
    'aggregated-quadkey AS quadkey' AS quadkey, day,
    SUM(a_value_1), SUM(a_value_2), AVG(a_value_3), SUM(a_value_4), SUM(a_value_5), AVG(a_value_6), AVG(a_value_6), AVG(a_value_7), SUM(a_value_8), SUM(a_value_9), AVG(a_value_10), SUM(a_value_11), SUM(a_value_12), AVG(a_value_13), AVG(a_value_14), AVG(a_value_15), SUM(a_value_16), SUM(a_value_17), AVG(a_value_18), SUM(a_value_19), SUM(a_value_20), AVG(a_value_21), AVG(a_value_22), AVG(a_value_23)
FROM <tablename>
WHERE quadkey LIKE '03201012212212322_'
GROUP BY quadkey, day ORDER BY day

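For a prefix-matching pattern such as `column LIKE 'xxx%'`, the query optimizer internally converts the condition into `STARTS_WITH(column, 'xxx')` and then uses the index.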
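As a rough mental model (not a description of Spanner's actual implementation), a prefix predicate on a sorted key can be answered by seeking to the first key >= the prefix and reading forward until keys stop matching, instead of touching every row. The hypothetical `prefix_range_scan` below sketches that idea over an in-memory sorted list; the keys are made up for illustration.

```python
import bisect


def prefix_range_scan(sorted_keys, prefix):
    """Return the keys that start with `prefix`, using a seek instead of a scan.

    Hypothetical helper: models how a sorted primary key lets a prefix
    predicate be served as a key range rather than a full table scan.
    """
    # Every key starting with `prefix` sorts at or after `prefix` itself.
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # Matching keys are contiguous, so nothing later can match.
        result.append(key)
    return result


keys = sorted(["0320", "03201", "03202", "0321", "1000", "032"])
print(prefix_range_scan(keys, "0320"))
```

Only the keys inside the prefix range are read; `"0321"` ends the scan and `"1000"` is never visited.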

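So the likely reason is that the query optimizer is not smart enough to convert an exact-length prefix-matching pattern such as

`column LIKE 'xxx_'`

into the combined condition:

`(STARTS_WITH(column, 'xxx') AND CHAR_LENGTH(column)=4)`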
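A minimal sketch of doing this rewrite yourself while generating the SQL, assuming the pattern is a plain literal prefix followed only by trailing `_` wildcards. The helper name `rewrite_trailing_underscores` is hypothetical; it returns `None` for any other pattern shape (other wildcards, LIKE escapes, quotes, or an empty prefix) rather than guessing.

```python
def rewrite_trailing_underscores(column, pattern):
    """Rewrite `column LIKE 'prefix_..._'` into STARTS_WITH + CHAR_LENGTH.

    Hypothetical helper: only handles a plain literal prefix followed by one
    or more trailing '_' wildcards; returns None for anything else.
    """
    prefix = pattern.rstrip("_")
    wildcards = len(pattern) - len(prefix)
    # Reject other wildcards, LIKE escapes and quotes instead of handling them.
    if wildcards == 0 or not prefix or any(c in prefix for c in "%_\\'"):
        return None
    # Each '_' matches exactly one character, so the total length is fixed.
    return (f"(STARTS_WITH({column}, '{prefix}') "
            f"AND CHAR_LENGTH({column}) = {len(pattern)})")


print(rewrite_trailing_underscores("quadkey", "03201012212212322_"))
```

For the question's filter this produces `(STARTS_WITH(quadkey, '03201012212212322') AND CHAR_LENGTH(quadkey) = 18)`. Rejecting unusual patterns keeps the helper from emitting a condition that is not equivalent to the original LIKE.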

Similarly, a pattern such as

`column LIKE 'abc%def'`

is not optimized into the combined condition:

`(STARTS_WITH(column,'abc') AND ENDS_WITH(column,'def'))`.
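A similar sketch for a single inner `%`, again assuming a plain literal pattern; the helper `rewrite_inner_percent` is hypothetical. It adds a `CHAR_LENGTH` lower bound that the two-condition form omits. For `'abc%def'` the prefix and suffix cannot overlap, so the two conditions alone are already exact, but for a pattern like `'ab%bc'` the string `'abc'` starts with `ab` and ends with `bc` without matching the LIKE. The loop at the end checks the rewritten predicate against Python's `fnmatch.fnmatchcase`, where `*` stands in for `%`.

```python
import fnmatch


def rewrite_inner_percent(column, pattern):
    """Rewrite `column LIKE 'prefix%suffix'` into STARTS_WITH + ENDS_WITH.

    Hypothetical helper: handles exactly one '%' with a non-empty literal on
    both sides; returns None for anything else.
    """
    if pattern.count("%") != 1 or any(c in pattern for c in "_\\'"):
        return None
    prefix, suffix = pattern.split("%")
    if not prefix or not suffix:
        return None
    # The length bound keeps the rewrite exact when prefix and suffix overlap.
    return (f"(STARTS_WITH({column}, '{prefix}') AND ENDS_WITH({column}, '{suffix}') "
            f"AND CHAR_LENGTH({column}) >= {len(prefix) + len(suffix)})")


def rewritten_matches(value, prefix, suffix):
    # Python mirror of the rewritten SQL predicate.
    return (value.startswith(prefix) and value.endswith(suffix)
            and len(value) >= len(prefix) + len(suffix))


print(rewrite_inner_percent("column", "abc%def"))

# 'abc' starts with 'ab' and ends with 'bc', but LIKE 'ab%bc' needs >= 4 chars.
for value in ["abc", "abbc", "abXYZbc", "abd"]:
    assert fnmatch.fnmatchcase(value, "ab*bc") == rewritten_matches(value, "ab", "bc")
```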

You can work around this by applying the conditions above yourself when generating the SQL.

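This assumes the LIKE pattern is a string literal in the query rather than a parameter. LIKE with a parameter cannot be optimized this way, because the pattern is not known when the query is compiled.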

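Thanks for the report! I've added this rewrite to our to-do list. In the meantime, you can work around the issue with `STARTS_WITH` and `CHAR_LENGTH`, as RedPandaCurios suggested.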

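You could add some details to support your answer; it would be interesting to know how you arrived at it.

yongchul lists Google Cloud Spanner in his profile.

Yes, that was my assumption. I just wanted to know whether this is only an optimizer limitation or whether there is some inherent problem I'm not seeing: `_` is more restrictive than `%`, so it shouldn't perform any worse. PS: Yes, the pattern is a value, not a parameter.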
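The intuition that `_` is more restrictive than `%` can be checked directly: every value matching `'xxx_'` also matches `'xxx%'`, so in principle the `_` pattern could be served by the same prefix range plus a residual filter. The small LIKE-to-regex translator below is an illustrative assumption (it ignores LIKE escape sequences) and says nothing about how Spanner evaluates LIKE internally; the sample keys are made up.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex.

    '%' matches any run of characters and '_' matches exactly one character.
    Escape sequences are deliberately not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    # LIKE must match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None


samples = ["0320", "03201", "032012", "0321", "032"]
underscore = [s for s in samples if like(s, "0320_")]
percent = [s for s in samples if like(s, "0320%")]
print(underscore)
print(percent)
assert set(underscore) <= set(percent)
```

Because the `_` matches are a subset of the `%` matches, this is consistent with the answer's reading that the full scan comes from the optimizer not extracting the prefix rather than from the semantics of `_`.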