Python: rows to columns using the Peewee ORM
Tags: python, sql, sqlite, peewee
I have a Peewee model that looks like this:
class Status(peewee.Model):
    host = peewee.ForeignKeyField(
        Host,
        backref='checks',
        on_delete='CASCADE')
    check_date = peewee.DateTimeField()
    status = peewee.TextField()
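This records the results of running some service checks on a number of hosts. Each row holds a single result for a single host, including the check date and the status. The resulting table looks like this: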
+----+---------+----------------------------+---------+
| id | host_id | check_date                 | status  |
+----+---------+----------------------------+---------+
| 1  | 123     | 2020-02-04 17:52:28.716036 | UP      |
| 2  | 321     | 2020-02-04 17:52:28.716036 | REFUSED |
| 3  | 555     | 2020-02-04 17:52:28.716036 | UP      |
...
| 50 | 123     | 2020-02-04 21:21:48.319062 | TIMEOUT |
| 51 | 321     | 2020-02-04 21:21:48.319062 | UNKNOWN |
| 52 | 555     | 2020-02-04 21:21:48.319062 | UP      |
+----+---------+----------------------------+---------+
I would like to generate a summary view like this:
+----------------------------+-----+---------+---------+---------+-------+
| check_date                 | UP  | REFUSED | TIMEOUT | UNKNOWN | TOTAL |
+----------------------------+-----+---------+---------+---------+-------+
| 2020-02-04 17:52:28.716036 | 221 | 34      | 10      | 2       | 267   |
| 2020-02-04 21:21:48.319062 | 230 | 30      | 15      | 4       | 279   |
+----------------------------+-----+---------+---------+---------+-------+
I can do this in SQL:
select
    check_date,
    count(*) filter (where status = 'UP') as UP,
    count(*) filter (where status = 'REFUSED') as REFUSED,
    count(*) filter (where status = 'TIMEOUT') as TIMEOUT,
    count(*) filter (where status = 'UNKNOWN') as UNKNOWN,
    count(*) as TOTAL
from status
group by check_date
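How would I construct a similar query using Peewee? I know I can access SQL functions through the peewee.fn namespace, but I am not sure whether that syntax can express those filter clauses.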
For now, I have worked around this by first running:
status_summary = (
    Status.select(Status.check_date,
                  Status.status,
                  peewee.fn.Count(Status.id).alias('count'))
    .group_by(Status.check_date, Status.status)
    .order_by(Status.check_date, Status.status)
)
This gives me:
+----------------------------+---------+-------+
| check_date                 | status  | count |
+----------------------------+---------+-------+
| 2020-02-04 17:52:28.716036 | UP      | 221   |
| 2020-02-04 17:52:28.716036 | REFUSED | 34    |
| 2020-02-04 17:52:28.716036 | TIMEOUT | 10    |
| 2020-02-04 17:52:28.716036 | UNKNOWN | 2     |
| 2020-02-04 21:21:48.319062 | UP      | 230   |
| 2020-02-04 21:21:48.319062 | REFUSED | 30    |
| 2020-02-04 21:21:48.319062 | TIMEOUT | 15    |
| 2020-02-04 21:21:48.319062 | UNKNOWN | 4     |
+----------------------------+---------+-------+
Then I use itertools.groupby in Python:
status_summary = itertools.groupby(status_summary, lambda x: x.check_date)
status_summary = [
    {'date': date, 'summary': {x.status: x.count for x in results}}
    for date, results in status_summary
]
Which gives me:
[
  {
    "date": "2020-02-04 17:52:28.716036",
    "summary": {
      "OPEN": 538,
      "REFUSED": 13,
      "TIMEOUT": 41,
      "UNKNOWN": 4,
      "UNREACHABLE": 2
    }
  },
  {
    "date": "2020-02-04 17:55:22.655965",
    "summary": {
      "OPEN": 533,
      "REFUSED": 15,
      "TIMEOUT": 42,
      "UNKNOWN": 5,
      "UNREACHABLE": 3
    }
  },
  {
    "date": "2020-02-04 18:51:31.937254",
    "summary": {
      "OPEN": 541,
      "REFUSED": 11,
      "TIMEOUT": 41,
      "UNKNOWN": 4,
      "UNREACHABLE": 1
    }
  },
  {
    "date": "2020-02-04 21:21:48.319062",
    "summary": {
      "OPEN": 544,
      "REFUSED": 9,
      "TIMEOUT": 39,
      "UNKNOWN": 4,
      "UNREACHABLE": 2
    }
  },
  {
    "date": "2020-02-05 00:11:23.377746",
    "summary": {
      "OPEN": 547,
      "REFUSED": 8,
      "TIMEOUT": 37,
      "UNKNOWN": 5,
      "UNREACHABLE": 1
    }
  }
]
This is effectively what I want, but the process of getting there feels unnecessarily complex.
I would use CASE, but COUNT/FILTER should work for Postgres.
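As a minimal sketch of the CASE variant mentioned above, the following uses Python's built-in sqlite3 module with an in-memory table. The sample rows and the lowercase column aliases are made up for illustration. SUM(CASE ...) produces the same counts as COUNT(*) FILTER (WHERE ...) and also runs on databases without FILTER support (SQLite only gained FILTER on aggregates in version 3.30).

```python
import sqlite3

# In-memory database holding a cut-down copy of the status table (illustrative data).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE status ('
    'id INTEGER PRIMARY KEY, host_id INTEGER, check_date TEXT, status TEXT)')
conn.executemany(
    'INSERT INTO status (host_id, check_date, status) VALUES (?, ?, ?)',
    [
        (123, '2020-02-04 17:52:28', 'UP'),
        (321, '2020-02-04 17:52:28', 'REFUSED'),
        (555, '2020-02-04 17:52:28', 'UP'),
        (123, '2020-02-04 21:21:48', 'TIMEOUT'),
        (321, '2020-02-04 21:21:48', 'UNKNOWN'),
        (555, '2020-02-04 21:21:48', 'UP'),
    ])

# Each SUM(CASE ...) adds 1 for rows with the given status and 0 otherwise,
# which turns the per-row status values into per-date columns.
PIVOT_SQL = """
SELECT
    check_date,
    SUM(CASE WHEN status = 'UP' THEN 1 ELSE 0 END) AS up,
    SUM(CASE WHEN status = 'REFUSED' THEN 1 ELSE 0 END) AS refused,
    SUM(CASE WHEN status = 'TIMEOUT' THEN 1 ELSE 0 END) AS timeout,
    SUM(CASE WHEN status = 'UNKNOWN' THEN 1 ELSE 0 END) AS unknown,
    COUNT(*) AS total
FROM status
GROUP BY check_date
ORDER BY check_date
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Each tuple is laid out as (check_date, up, refused, timeout, unknown, total), matching the column order of the summary view in the question.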
Peewee should support the following:
(Status
 .select(
     fn.date_trunc('day', Status.check_date).alias('date'),
     fn.COUNT(Status.id).filter(Status.status == 'UP').alias('up'),
     fn.COUNT(Status.id).filter(Status.status == 'REFUSED').alias('refused'))
 .group_by(fn.date_trunc('day', Status.check_date)))
But really, you can rely on GROUP BY to give you something simpler:
(Status
 .select(
     fn.date_trunc('day', Status.check_date).alias('date'),
     Status.status,
     fn.COUNT(Status.id).alias('count'))
 .group_by(fn.date_trunc('day', Status.check_date), Status.status))
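This gives you one row per date and status, but it is more flexible, since you do not have to hardcode all the statuses.
Thanks for the answer! But if I understand correctly, this is the same as the query in my question (status_summary = (Status.select(Status.check_date, Status.status, peewee.fn.Count(Status.id).alias('count')).group_by(Status.check_date, Status.status).order_by(Status.check_date, Status.status))), except that it uses date_trunc to change how the date is displayed.
The first query puts the summary in a single row; the second returns one row per date and status. Just use the first query.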
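To make the difference between the two result shapes concrete, here is a minimal sketch that folds one-row-per-date-and-status results into a single summary per date in plain Python, without hardcoding the statuses. The Row tuple, the pivot function, and the sample counts are hypothetical stand-ins for what the grouped query returns.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in for the rows returned by the grouped query.
Row = namedtuple('Row', ['date', 'status', 'count'])

rows = [
    Row('2020-02-04', 'REFUSED', 34),
    Row('2020-02-04', 'UP', 221),
    Row('2020-02-05', 'TIMEOUT', 15),
    Row('2020-02-05', 'UP', 230),
]


def pivot(rows):
    """Fold (date, status, count) rows into one summary dict per date.

    itertools.groupby only merges consecutive items, so the rows are
    sorted by date first (the ORDER BY in the question serves this purpose).
    """
    ordered = sorted(rows, key=lambda r: r.date)
    result = []
    for date, group in itertools.groupby(ordered, key=lambda r: r.date):
        counts = {r.status: r.count for r in group}
        result.append(
            {'date': date, 'summary': counts, 'total': sum(counts.values())})
    return result


summary = pivot(rows)
for entry in summary:
    print(entry)
```

Statuses that never occur on a given date are simply absent from that date's summary, so callers may want counts.get(status, 0) when rendering fixed columns.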