Python 2.7 SQLite: window function to get a difference and a product


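I am using a window function to get the difference in the value of a column (downloads) between two dates. I would also like to multiply that difference by the file size, to get the number of bytes downloaded during the period.

With the help of this community I was able to get the number of downloads, but I can't find the right syntax to get the product downloads * size.

Table "files"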

+---------------+------------------------+------+-----------+------------+
|      site     |       full_path        | size | downloads | date_stamp |
+---------------+------------------------+------+-----------+------------+
| Lawrenceville | lr1/dir1/subdir1/file1 | 1000 |     7     | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file1 | 1010 |     9     | 2019-08-15 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 1213 |     5     | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 2000 |     5     | 2019-08-15 |
| Lawrenceville | lr1/dir2/subdir1/file1 | 2213 |     5     | 2019-08-15 |
| Rennes        | rr1/dir1/subdir1/file3 | 200  |     3     | 2019-08-08 |
| Rennes        | rr1/dir1/subdir1/file3 | 201  |     4     | 2019-08-15 |
+---------------+------------------------+------+-----------+------------+
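
For anyone who wants to reproduce the problem, here is a minimal sketch that loads the rows above into an in-memory database. The column types and the `make_db` helper name are assumptions for illustration (the question only shows the column names). Note that window functions need SQLite 3.25 or newer, and the SQLite library bundled with a Python 2.7 install may be older than that.

```python
import sqlite3

# Sample rows copied from the "files" table in the question.
ROWS = [
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 1213, 5, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 2000, 5, "2019-08-15"),
    ("Lawrenceville", "lr1/dir2/subdir1/file1", 2213, 5, "2019-08-15"),
    ("Rennes", "rr1/dir1/subdir1/file3", 200, 3, "2019-08-08"),
    ("Rennes", "rr1/dir1/subdir1/file3", 201, 4, "2019-08-15"),
]


def make_db():
    """Return an in-memory database holding the sample "files" table."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the question only lists the column names.
    conn.execute(
        "CREATE TABLE files ("
        "site TEXT, full_path TEXT, size INTEGER, "
        "downloads INTEGER, date_stamp TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


conn = make_db()
row_count = conn.execute("SELECT count(*) FROM files").fetchone()[0]
print(row_count)
# Version of the SQLite library actually linked into this Python build.
print(sqlite3.sqlite_version)
```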

My current query produces the following result:

+---------------+-----------+
| site          | downloads |
+---------------+-----------+
| Lawrenceville |     2     |
| Rennes        |     1     |
+---------------+-----------+
I tried:

SELECT site, sum(diff), sum(sum(diff)*bytes) FROM (SELECT site, downloads - lag(downloads, 1), size OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff, bytes FROM files WHERE date_stamp IN ('2019-08-15', '2019-08-08')) group by site

sqlite3.OperationalError: near "(": syntax error
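
A likely cause of the syntax error, as far as I can tell: in the attempt, `OVER (...)` follows the plain column `size` instead of the `lag(downloads, 1)` call, and SQLite only accepts an OVER clause directly after a function call. (Separately, the table has no `bytes` column, and `sum(sum(diff)*bytes)` nests one aggregate inside another.) The sketch below uses two rows from the table and a hypothetical `try_query` helper to compare the two placements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; only two sample rows are needed here.
conn.execute(
    "CREATE TABLE files ("
    "site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
        ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
    ],
)

WINDOW = "OVER (PARTITION BY site, full_path ORDER BY date_stamp)"


def try_query(sql):
    """Return the rows, or the error text if SQLite rejects the statement."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# OVER placed after the plain column "size", as in the attempt: rejected.
bad = try_query(
    "SELECT site, downloads - lag(downloads, 1), size " + WINDOW
    + " AS diff FROM files"
)

# OVER placed directly after lag(...): accepted on SQLite 3.25 or newer.
good = try_query(
    "SELECT site, downloads - lag(downloads, 1) " + WINDOW
    + " AS diff, size FROM files"
)

print(bad)
print(good)
```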

Ideally, I would like the following output:

+---------------+-----------+----------+
| site          | downloads | bytes    |
+---------------+-----------+----------+
| Lawrenceville |     2     | 2020     |
| Rennes        |     1     | 201      |
+---------------+-----------+----------+
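Lawrenceville downloaded file lr1/dir1/subdir1/file1 2 times at 1010 bytes each (its size on 2019-08-15), which gives 2020 bytes. File lr1/dir1/subdir1/file2 was not downloaded during this period. It would be nice to include files lr1/dir1/subdir1/file2 and lr1/dir2/subdir1/file1, but they get excluded by the window function. I can get them with a separate query.

Rennes downloaded file rr1/dir1/subdir1/file3 1 time.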

If your current query works, you only need to add the max() window function in the subquery:

SELECT site, sum(diff) downloads, sum(diff) * size bytes
FROM (
  SELECT 
    site, 
    downloads - lag(downloads, 1) OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff,
    max(size) OVER (PARTITION BY site, full_path) AS size
  FROM files 
  WHERE date_stamp IN ('2019-08-15', '2019-08-08')
) 
group by site
Result:

| site          | downloads | bytes |
| ------------- | --------- | ----- |
| Lawrenceville | 2         | 2020  |
| Rennes        | 1         | 201   |

This works great for the dataset I provided, but it breaks down when I add another row to the dataset.

I don't have your data. This code answers the question you posted.

Yes, sorry, I tried to show another row in a comment, without success. If you add a row with these values, the result changes:
| Rennes | rr1/dir1/file6 | 1 | 5 | 2019-08-08 |
I would have expected this new row to be left out of the "window", since no other row has the same site/full_path combination, so there would be no lag(downloads, 1) row for it. It does seem to exclude the rows that should be excluded where site = 'Lawrenceville'.

The function I added to your query is max(). You already used lag() in the code you had and posted. I assumed it was working. Isn't it?
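
On the extra-row problem raised in the comments, one possible explanation (a guess, not confirmed in the thread): in the outer SELECT, `size` is neither aggregated nor listed in GROUP BY, so SQLite takes its value from an arbitrary row of each site group, and the new file6 row can supply size 1. A minimal sketch of one way around that is to multiply per row and then sum, i.e. `sum(diff * size)`. Rows whose `diff` is NULL (files with a single date) contribute nothing to either sum. The column types are assumed, the extra Rennes row is the one from the comments, and `ORDER BY site` is added only to make the output order predictable.

```python
import sqlite3

# Rows from the question plus the extra Rennes row from the comments.
ROWS = [
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 1213, 5, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 2000, 5, "2019-08-15"),
    ("Lawrenceville", "lr1/dir2/subdir1/file1", 2213, 5, "2019-08-15"),
    ("Rennes", "rr1/dir1/subdir1/file3", 200, 3, "2019-08-08"),
    ("Rennes", "rr1/dir1/subdir1/file3", 201, 4, "2019-08-15"),
    ("Rennes", "rr1/dir1/file6", 1, 5, "2019-08-08"),
]

# Same subquery as the answer; only the outer product changes:
# sum(diff * size) multiplies per file before summing per site.
QUERY = """
SELECT site, sum(diff) AS downloads, sum(diff * size) AS bytes
FROM (
  SELECT
    site,
    downloads - lag(downloads, 1) OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff,
    max(size) OVER (PARTITION BY site, full_path) AS size
  FROM files
  WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
GROUP BY site
ORDER BY site
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    "site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", ROWS)

rows = conn.execute(QUERY).fetchall()
for site, downloads, nbytes in rows:
    print("%s %s %s" % (site, downloads, nbytes))
# Lawrenceville 2 2020
# Rennes 1 201
```

This keeps max(size) from the answer, which matches the size on the later date only as long as a file's size never shrinks between the two dates.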