优化SQL:如何重写此查询以提高性能? (使用子查询,要摆脱GROUP BY吗?)

我正在使用 MySQL 5.7.18-16

我使用的表格:

CREATE TABLE `invoice` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,`transaction_id` bigint(20) unsigned NOT NULL,`transaction_name` varchar(50) NOT NULL,`unit_price` decimal(19,5) DEFAULT NULL,`quantity` decimal(19,`customer_name` varchar(50) DEFAULT NULL,`date` bigint(20) NOT NULL,PRIMARY KEY (`ID`),KEY `i_transaction_id` (`transaction_id`),KEY `i_date` (`date`)
)


CREATE TABLE `transaction` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,....
)


CREATE TABLE `hierarchy` (
  `PRODUCT_ID` int(11) unsigned NOT NULL,`PRODUCT_NAME` varchar(255) NOT NULL,`PRODUCT_FAMILY_ID` int(11) unsigned NOT NULL,`PRODUCT_FAMILY_NAME` varchar(255) NOT NULL,`ORG_ID` int(11) unsigned NOT NULL,`ORG_NAME` varchar(255) NOT NULL
...
)


CREATE TABLE `product` (
  `ID` int(11) unsigned NOT NULL AUTO_INCREMENT,`PRODUCT_NAME` varchar(50) NOT NULL,`COMPONENT_NAME` varchar(50) NOT NULL,...
)

每个发票记录都与交易和客户名称相关,而每笔交易都与产品和组件相关。每个产品都属于一个产品家族,每个产品家族都属于一个组织。

我的要求:

我需要根据指定的发票日期计算不同层次结构级别(组织/产品系列/产品/组件)下每个客户名称的成本和数量,并为每个客户名称按成本排序。

我当前的查询以获取每个组织下每个客户的成本/数量:

SELECT  
    h.org_id,h.org_name,h.product_family_id,h.product_family_name,h.product_id,h.product_name,p.component_id,p.component_name,i.transaction_id,i.customer_name,sum(CASE WHEN i.transaction_name = 'TEST' THEN i.quantity END) AS records,sum(i.unit_price * i.quantity) AS cost
FROM invoice i
    LEFT JOIN transaction t
        ON i.transaction_id = t.transaction_id
    JOIN hierarchy h
        ON t.product_id = h.product_id
    JOIN product p
        ON t.product_id = p.id
    WHERE i.date >= 1514764800000
    AND i.date <= 1543622400000
    GROUP BY h.org_id,i.customer_name
    ORDER by i.cost DESC;

对于其他级别的计算,我只需更改WHERE和GROUP BY:

    //By product_family under one specific org
    WHERE h.org_id = 9
    AND i.date >= 1514764800000
    AND i.date <= 1543622400000
    GROUP BY h.product_family_id,i.customer_name
    ORDER by i.cost DESC;

    //By product under one specific product family
    WHERE h.product_family_id = 2011
    AND i.date >= 1514764800000
    AND i.date <= 1543622400000
    GROUP BY h.product_id,i.customer_name
    ORDER by i.cost DESC;

    //By component under one specific product
    WHERE h.product_id = 101
    AND i.date >= 1514764800000
    AND i.date <= 1543622400000
    GROUP BY p.component_name,i.customer_name
    ORDER by i.cost DESC;

在生产数据库上运行大约3.5秒用于组织级别的计算,这太慢了。一个主要原因是“发票”表无法使用任何索引。 (我为'i.date'创建了索引,但是由于日期范围太大,因此不使用该索引。)

有没有可能重写此查询以优化速度的方法?

tongfang001 回答:优化SQL:如何重写此查询以提高性能? (使用子查询,要摆脱GROUP BY吗?)

我建议您创建一个视图或cte以获取每个客户的产品总数,然后加入层次结构和产品进行过滤并获取其他信息

with customer_products as (
select t.product_id,i.customer_name
    sum(CASE WHEN i.transaction_name = 'TEST' THEN i.quantity END) AS records,sum(i.unit_price * i.quantity) AS cost
FROM invoice i
LEFT JOIN transaction t
ON i.transaction_id = t.transaction_id
WHERE i.date >= 1514764800000
AND i.date <= 1543622400000
group by t.product_id,i.customer_name)
SELECT
h.org_id,h.org_name,h.product_family_id,h.product_family_name,h.product_id,h.product_name,p.component_id,p.component_name,cp.customer_name,cp.records,cp.cost
FROM customer_products cp
JOIN hierarchy h
ON cp.product_id = h.product_id
JOIN product p
ON cp.product_id = p.id

按家庭分组

SELECT
    h.org_id,sum(cp.records) as records,sum(cp.cost) as cost
FROM customer_products cp
JOIN hierarchy h
ON cp.product_id = h.product_id
JOIN product p
ON cp.product_id = p.id
group by h.org_id,cp.customer_name

按组件分组

SELECT
h.org_id,sum(cp.cost) as cost
FROM customer_products cp
JOIN hierarchy h
ON cp.product_id = h.product_id
JOIN product p
ON cp.product_id = p.id
group by     h.org_name,cp.customer_name

或仍使用窗口函数在同一查询中获得所有结果

SELECT
h.org_id,cp.cost,sum(cp.records) over (partition by h.org_id,cp.customer_name) as familyRecord,sum(cp.cost) over (partition by h.org_id,cp.customer_name) as familyCost,sum(cp.records) as (parititon by h.org_name,cp.customer_name) as componentRecord,sum(cp.cost) over (partition by h.org_name,cp.customer_name) as costComponent
FROM customer_products cp
JOIN hierarchy h
ON cp.product_id = h.product_id
JOIN product p
ON cp.product_id = p.id

,
  

使用子查询

通常,这是最糟糕的结果,而不是更好的结果。顾名思义,关系数据库可以很好地处理关系(又名JOIN)。

很可能是索引设置不正确。为了显示它,有一个EXPLAIN命令,只需在查询的开头写下这个词,然后看看优化器必须告诉什么。

https://dev.mysql.com/doc/refman/8.0/en/using-explain.html

然后需要进行一些挖掘才能在架构中设置索引。您也可以将EXPLAIN结果粘贴到您的问题中。

本文链接:https://www.f2er.com/2846507.html

大家都在问