Python 2.7 sqlite window function to get difference and product
I am using a window function to get the difference in the values of a column (downloads) between two dates. I would also like to multiply that difference by the size of the file, to get the number of bytes downloaded during the period.
With the help of this community I was able to get the number of downloads, but I cannot find the right syntax to get the product downloads * size.
Table "files":
+---------------+------------------------+------+-----------+------------+
| site | full_path | size | downloads | date_stamp |
+---------------+------------------------+------+-----------+------------+
| Lawrenceville | lr1/dir1/subdir1/file1 | 1000 | 7 | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file1 | 1010 | 9 | 2019-08-15 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 1213 | 5 | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 2000 | 5 | 2019-08-15 |
| Lawrenceville | lr1/dir2/subdir1/file1 | 2213 | 5 | 2019-08-15 |
| Rennes | rr1/dir1/subdir1/file3 | 200 | 3 | 2019-08-08 |
| Rennes | rr1/dir1/subdir1/file3 | 201 | 4 | 2019-08-15 |
+---------------+------------------------+------+-----------+------------+
The query I have so far produces the following result:
+---------------+-----------+
| site | downloads |
+---------------+-----------+
| Lawrenceville | 2 |
| Rennes | 1 |
+---------------+-----------+
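The query behind this result is not included in the post. A minimal sketch of what it presumably looks like, run through Python's sqlite3 module against an in-memory copy of the table, is shown here. The in-memory database and the `rows` list are assumptions made for illustration, and window functions require SQLite 3.25 or newer, which an older Python 2.7 build may not bundle.

```python
import sqlite3

# Sample data copied from the "files" table in the question.
rows = [
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 1213, 5, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 2000, 5, "2019-08-15"),
    ("Lawrenceville", "lr1/dir2/subdir1/file1", 2213, 5, "2019-08-15"),
    ("Rennes", "rr1/dir1/subdir1/file3", 200, 3, "2019-08-08"),
    ("Rennes", "rr1/dir1/subdir1/file3", 201, 4, "2019-08-15"),
]

# Assumption: an in-memory database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)

# lag() returns NULL for the first row of each (site, full_path) partition,
# so diff is NULL there and sum() ignores it.
query = """
SELECT site, sum(diff) AS downloads
FROM (
    SELECT site,
           downloads - lag(downloads, 1) OVER (
               PARTITION BY site, full_path ORDER BY date_stamp
           ) AS diff
    FROM files
    WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
GROUP BY site
ORDER BY site
"""
result = conn.execute(query).fetchall()
print(result)
```

Files with only one row in the date range contribute a NULL difference, which is why they do not affect the totals.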
I tried:
SELECT site, sum(diff), sum(sum(diff)*bytes) FROM (SELECT site, downloads - lag(downloads, 1), size OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff, bytes FROM files WHERE date_stamp IN ('2019-08-15', '2019-08-08')) group by site
sqlite3.OperationalError: near "(": syntax error
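The error comes from the position of the OVER clause: it has to be attached to the window function call itself, but here `, size` sits between `lag(downloads, 1)` and `OVER`. Two further problems would surface once that is fixed: `bytes` is not a column of `files`, and `sum(sum(diff)*bytes)` nests one aggregate inside another. The sketch below reproduces only the first problem, using an empty in-memory table (an assumption, just enough for the statement to be parsed) and a reduced form of the inner SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)

# As in the attempt: ", size" separates lag(...) from its OVER clause.
broken = """
SELECT site,
       downloads - lag(downloads, 1), size OVER (
           PARTITION BY site, full_path ORDER BY date_stamp
       ) AS diff
FROM files
"""

# OVER directly follows the window function; size becomes its own column.
fixed = """
SELECT site,
       downloads - lag(downloads, 1) OVER (
           PARTITION BY site, full_path ORDER BY date_stamp
       ) AS diff,
       size
FROM files
"""

try:
    conn.execute(broken)
    broken_error = None
except sqlite3.OperationalError as exc:
    broken_error = str(exc)

fixed_rows = conn.execute(fixed).fetchall()  # empty table, so no rows
print(broken_error)
print(fixed_rows)
```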
Ideally, I would like the following output:
+---------------+-----------+----------+
| site | downloads | bytes |
+---------------+-----------+----------+
| Lawrenceville | 2 | 2020 |
| Rennes | 1 | 201 |
+---------------+-----------+----------+
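Lawrenceville had 2 downloads of file lr1/dir1/subdir1/file1, at 1010 bytes (2019-08-15). File lr1/dir1/subdir1/file2 was not downloaded during this period. Ideally files lr1/dir1/subdir1/file2 and lr1/dir2/subdir1/file1 would be included, but they are excluded by the window function. I can get them with a separate query.
Rennes had 1 download of file rr1/dir1/subdir1/file3.
If your current query works, you only need to use the max() window function in the subquery: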
SELECT site, sum(diff) downloads, sum(diff) * size bytes
FROM (
SELECT
site,
downloads - lag(downloads, 1) OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff,
max(size) OVER (PARTITION BY site, full_path) AS size
FROM files
WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
group by site
See the demo.
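This works very well for the data set I provided, but it breaks when I add another row to the data set.
I can't tell why, because I don't have your data. This code answers the question you posted.
Yes, sorry, I tried to show the other row in a comment but did not succeed. If you add a row with these values, the result changes: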
|Rennes | rr1/dir1/file6 | 1 | 5 | 2019-08-08 |
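I would have expected this new row to be excluded from the "window", because no other row has the same site/full_path combination and so there is no lag(downloads, 1) row for it. The query does seem to exclude the rows that should be excluded where site='Lawrenceville'.
The function I added to your query is max(). You used lag() in the code you already had and posted. I assumed it was working. Isn't it?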
Results of the answer's query on the originally posted data:
| site | downloads | bytes |
| ------------- | --------- | ----- |
| Lawrenceville | 2 | 2020 |
| Rennes | 1 | 201 |
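A likely reason the extra row changes the result: in the outer query, `size` is a bare column next to `GROUP BY site`, so SQLite takes its value from an arbitrary row of each group and `sum(diff) * size` depends on which row that happens to be. One way around this, sketched below under the assumption that bytes should be counted per file, is to multiply inside the aggregate with `sum(diff * size)`. The `totals` helper and the in-memory database are illustrative and not part of the original posts. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Sample data copied from the "files" table in the question.
rows = [
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 1213, 5, "2019-08-08"),
    ("Lawrenceville", "lr1/dir1/subdir1/file2", 2000, 5, "2019-08-15"),
    ("Lawrenceville", "lr1/dir2/subdir1/file1", 2213, 5, "2019-08-15"),
    ("Rennes", "rr1/dir1/subdir1/file3", 200, 3, "2019-08-08"),
    ("Rennes", "rr1/dir1/subdir1/file3", 201, 4, "2019-08-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)
conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", rows)

# diff * size is computed per row before grouping; rows whose diff is NULL
# (no earlier row for that file) yield NULL and are skipped by sum().
QUERY = """
SELECT site, sum(diff) AS downloads, sum(diff * size) AS bytes
FROM (
    SELECT site,
           downloads - lag(downloads, 1) OVER (
               PARTITION BY site, full_path ORDER BY date_stamp
           ) AS diff,
           max(size) OVER (PARTITION BY site, full_path) AS size
    FROM files
    WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
GROUP BY site
ORDER BY site
"""


def totals(connection):
    """Hypothetical helper: run the per-site totals query."""
    return connection.execute(QUERY).fetchall()


before = totals(conn)

# The extra row from the comments: a file with a single entry in the range.
conn.execute(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    ("Rennes", "rr1/dir1/file6", 1, 5, "2019-08-08"),
)
after = totals(conn)

print(before)
print(after)
```

On the posted data this gives 2 downloads and 2020 bytes for Lawrenceville and 1 download and 201 bytes for Rennes, both before and after the extra row is inserted.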