MySQL: get the change between consecutive rows of cumulative results, per identifier

Tags: mysql, sql, subquery, window-functions, cumulative-sum

I am running MySQL Community Server version 8.0.19.

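While working with publicly available COVID-19 data, I have been struggling with the following problem. The dataset I am using is reliable and of high quality, but the data (total confirmed) is reported as a cumulative total rather than as a daily count of infections: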

+----------------+---------------------+-----------------+
| country_region | date                | total_confirmed |
+----------------+---------------------+-----------------+
| Afghanistan    | 2020-04-05 00:00:00 |             349 |
| Afghanistan    | 2020-04-06 00:00:00 |             367 |
| Afghanistan    | 2020-04-07 00:00:00 |             423 |
| Albania        | 2020-04-05 00:00:00 |             361 |
| Albania        | 2020-04-06 00:00:00 |             377 |
| Albania        | 2020-04-07 00:00:00 |             383 |
| Algeria        | 2020-04-05 00:00:00 |            1320 |
| Algeria        | 2020-04-06 00:00:00 |            1423 |
| Algeria        | 2020-04-07 00:00:00 |            1468 |
+----------------+---------------------+-----------------+
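My requirement is to have both the cumulative count and the daily new cases. There is a nice solution for doing this, and it works like a charm on my dataset as long as I look at only one country (in this example I used a table populated only with Afghanistan's data):

Output: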

+----------------+---------------------+-----------+-----------------+
| country_region | DateCreated         | new_cases | total_confirmed |
+----------------+---------------------+-----------+-----------------+
| Afghanistan    | 2020-04-05 00:00:00 |         0 |             349 |
| Afghanistan    | 2020-04-06 00:00:00 |        18 |             367 |
| Afghanistan    | 2020-04-07 00:00:00 |        56 |             423 |
+----------------+---------------------+-----------+-----------------+
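However, it fails completely when the data contains more than one country/region, and my understanding of SQL is not deep enough to figure out what I need to change: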

+----------------+---------------------+-----------+-----------------+
| country_region | DateCreated         | new_cases | total_confirmed |
+----------------+---------------------+-----------+-----------------+
| Afghanistan    | 2020-04-05 00:00:00 |         0 |             349 |
| Afghanistan    | 2020-04-06 00:00:00 |      -953 |             367 |
| Afghanistan    | 2020-04-07 00:00:00 |     -1000 |             423 |
| Albania        | 2020-04-05 00:00:00 |        12 |             361 |
| Albania        | 2020-04-06 00:00:00 |        10 |             377 |
| Albania        | 2020-04-07 00:00:00 |       -40 |             383 |
| Algeria        | 2020-04-05 00:00:00 |       959 |            1320 |
| Algeria        | 2020-04-06 00:00:00 |      1046 |            1423 |
| Algeria        | 2020-04-07 00:00:00 |      1085 |            1468 |
+----------------+---------------------+-----------+-----------------+
Desired output:

+----------------+---------------------+-----------+-----------------+
| country_region | DateCreated         | new_cases | total_confirmed |
+----------------+---------------------+-----------+-----------------+
| Afghanistan    | 2020-04-05 00:00:00 |         0 |             349 |
| Afghanistan    | 2020-04-06 00:00:00 |        18 |             367 |
| Afghanistan    | 2020-04-07 00:00:00 |        56 |             423 |
| Albania        | 2020-04-05 00:00:00 |         0 |             361 |
| Albania        | 2020-04-06 00:00:00 |        16 |             377 |
| Albania        | 2020-04-07 00:00:00 |         6 |             383 |
| Algeria        | 2020-04-05 00:00:00 |         0 |            1320 |
| Algeria        | 2020-04-06 00:00:00 |       103 |            1423 |
| Algeria        | 2020-04-07 00:00:00 |        45 |            1468 |
+----------------+---------------------+-----------+-----------------+

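Any assistance would be greatly appreciated. Obviously, in the real-world dataset the new_cases value for 2020-04-05 would not be 0, but for this sample dataset it is correct.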

If you are running MySQL 8.0, you can use the window function lag().
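To make the lag() suggestion concrete, here is a minimal, self-contained sketch. It is not the answerer's original query. It assumes a table named so_confirmed with columns country_region, datecreated and total_confirmed (names taken from the question's output and the other queries in this thread), and it needs MySQL 8.0 or later for window functions. It uses the two-argument-free form of lag() wrapped in coalesce(), so the first row of each country yields 0.

```sql
-- Hypothetical setup: the table and column names are assumptions
-- based on the sample output shown in the question.
create table so_confirmed (
    country_region  varchar(64) not null,
    datecreated     datetime    not null,
    total_confirmed int         not null,
    primary key (country_region, datecreated)
);

-- Sample rows copied from the question.
insert into so_confirmed (country_region, datecreated, total_confirmed) values
    ('Afghanistan', '2020-04-05',  349),
    ('Afghanistan', '2020-04-06',  367),
    ('Afghanistan', '2020-04-07',  423),
    ('Albania',     '2020-04-05',  361),
    ('Albania',     '2020-04-06',  377),
    ('Albania',     '2020-04-07',  383),
    ('Algeria',     '2020-04-05', 1320),
    ('Algeria',     '2020-04-06', 1423),
    ('Algeria',     '2020-04-07', 1468);

-- "partition by country_region" restarts the window for every country,
-- so lag() never reads a row belonging to a different country.
-- lag() returns NULL on the first row of each partition; coalesce turns
-- the resulting NULL difference into 0.
select country_region,
       datecreated,
       coalesce(total_confirmed
                - lag(total_confirmed) over (partition by country_region
                                             order by datecreated),
                0) as new_cases,
       total_confirmed
from so_confirmed
order by country_region, datecreated;
```

With the nine sample rows, new_cases should come out as 0, 18, 56 for Afghanistan, 0, 16, 6 for Albania and 0, 103, 45 for Algeria, which matches the desired output in the question. The partition clause is the piece that a single-country query does not need and a multi-country query does.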


You can use the three-argument form of lag():

select sc.*,
       -- the third argument is the default for the first row of each country,
       -- so that row's difference is 0
       (total_confirmed -
        lag(total_confirmed, 1, total_confirmed) over (partition by country_region order by datecreated)
       ) as new_cases
from so_confirmed sc;

In older versions of MySQL, you can use a join, provided there are no missing dates:

select sc.*,
       coalesce(sc.total_confirmed - sc_prev.total_confirmed, 0) as new_cases
from so_confirmed sc left join
     so_confirmed sc_prev
     on sc_prev.country_region = sc.country_region and
        sc_prev.datecreated = sc.datecreated - interval 1 day;

Comments:

Which version of MySQL are you running?

I am running MySQL Community Server version 8.0.19. I will also update the original post to reflect this.
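A note on the "no missing dates" condition of the join: it matches only the row dated exactly one day earlier, so if a country skips a day, the next row finds no match and coalesce() reports 0 new cases. One possible workaround for versions without window functions is a correlated subquery that picks the most recent earlier row for the same country. This is a sketch under assumptions, not part of the original answer: it assumes the same so_confirmed table, assumes (country_region, datecreated) is unique, and has not been run against the poster's full dataset.

```sql
-- Sketch: gap-tolerant variant for MySQL versions without window functions.
-- For each row, look up the latest earlier total for the same country.
select sc.*,
       coalesce(sc.total_confirmed -
                (select sc_prev.total_confirmed
                 from so_confirmed sc_prev
                 where sc_prev.country_region = sc.country_region
                   and sc_prev.datecreated < sc.datecreated
                 order by sc_prev.datecreated desc
                 limit 1),
                0) as new_cases   -- 0 when the country has no earlier row
from so_confirmed sc;
```

After a gap, new_cases here covers all days since the previous available row rather than a single day. The subquery runs once per row, so an index on (country_region, datecreated) is likely to matter on a larger table.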