Python Postgres: setting a column to its percentile (Python, PostgreSQL, psycopg2) - Fatal编程技术网

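I have a table of transactions and want to add a percentile column that gives, based on the amount column, the percentile of each transaction within its month.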

Using quartiles instead of percentiles as an example:

Sample input:

id | month | amount
1  |   1   |   1
2  |   1   |   2
3  |   1   |   5
4  |   1   |   3
5  |   2   |   1
6  |   2   |   3
1  |   2   |   5
1  |   2   |   7
1  |   2   |   9
1  |   2   |   11
1  |   2   |   15
1  |   2   |   16

Sample output:

id | month | amount |  quartile
1  |   1   |   1    |      25
2  |   1   |   2    |      50
3  |   1   |   5    |      100
4  |   1   |   3    |      75
5  |   2   |   1    |      25
6  |   2   |   3    |      25
1  |   2   |   5    |      50
1  |   2   |   15   |      100
1  |   2   |   9    |      75
1  |   2   |   11   |      75
1  |   2   |   7    |      50
1  |   2   |   16   |      100
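Currently I use Postgres's percentile_cont function to find the amount at each percentile cut-off point, then loop over the results and update the percentile column accordingly. Unfortunately this is too slow, because I have many different months and the loop issues a separate UPDATE for every month and bucket. Any ideas on how to do this faster, ideally by combining the percentile calculation and the update into a single SQL statement?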
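
For reference, percentile_cont(f) WITHIN GROUP (ORDER BY amount) returns a continuous percentile: it sorts the group's amounts and, when the position f × (n - 1) falls between two rows, linearly interpolates between them. The following is a minimal pure-Python sketch of that rule, not PostgreSQL's own code; the helper name percentile_cont is ours, and it assumes a non-empty list of non-null numbers (PostgreSQL itself skips NULLs). It computes the quartile cut-offs for month 1 of the sample input:

```python
import math


def percentile_cont(values, fraction):
    """Mimic PostgreSQL's percentile_cont(fraction) WITHIN GROUP (ORDER BY value).

    Hypothetical helper: sorts the values and linearly interpolates at
    position fraction * (n - 1). Assumes a non-empty list of numbers.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two neighbouring rows (lo == hi on an exact row).
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


# Quartile cut-offs for month 1 of the sample input (amounts 1, 2, 5, 3).
month_1 = [1, 2, 5, 3]
print([percentile_cont(month_1, q / 4) for q in range(1, 5)])  # [1.75, 2.5, 3.5, 5.0]
```

These cut-offs explain the expected output: amount 3 is above the 0.5 cut-off (2.5) but not above the 0.75 cut-off (3.5), so it gets 75.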

My code:

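# Assumes `connection` is an open psycopg2 connection and `cursor = connection.cursor()`.
num_buckets = 10

for i in range(num_buckets):
    # Lower and upper bounds of this bucket as fractions: 0.0-0.1, 0.1-0.2, ...
    decimal_percentile = (i + 1) * (1.0 / num_buckets)
    prev_decimal_percentile = i * 1.0 / num_buckets
    # Integer label (10, 20, ..., 100) computed exactly, avoiding float truncation.
    percentile = (i + 1) * 100 // num_buckets
    # Triple quotes are needed for a multi-line SQL string in Python.
    cursor.execute("""SELECT month,
                             percentile_cont(%s) WITHIN GROUP (ORDER BY amount ASC),
                             percentile_cont(%s) WITHIN GROUP (ORDER BY amount ASC)
                      FROM transactions GROUP BY month;""",
                   (prev_decimal_percentile, decimal_percentile))
    iter_cursor = connection.cursor()
    for data in cursor:
        # One UPDATE per month per bucket; an amount exactly on a cut-off
        # matches two buckets, and the later (higher) bucket wins.
        iter_cursor.execute("""UPDATE transactions SET percentile = %s
                               WHERE month = %s
                                 AND amount >= %s AND amount <= %s;""",
                            (percentile, data[0], data[1], data[2]))
connection.commit()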

You can do this in a single query, for example with 4 buckets:

update transactions t
set percentile = calc_percentile
from (
    select distinct on (month, amount) 
        id, 
        month, 
        amount, 
        calc_percentile
    from transactions
    join (
        select 
            bucket,
            month as calc_month, 
            percentile_cont(bucket*1.0/4) within group (order by amount asc) as calc_amount,
            bucket*100/4 as calc_percentile
        from transactions 
        cross join generate_series(1, 4) bucket
        group by month, bucket
        ) s on month = calc_month and amount <= calc_amount
    order by month, amount, calc_percentile 
    ) s
where t.month = s.month and t.amount = s.amount;
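
To check the query's logic without a database, here is a pure-Python sketch that mirrors it step by step. The function names are hypothetical, it assumes rows are (id, month, amount) tuples with non-null numeric amounts, and it re-implements percentile_cont's linear interpolation rather than calling PostgreSQL. It computes one cut-off per month and bucket (the inner GROUP BY month, bucket), then gives each row the smallest bucket whose cut-off is at least its amount (the DISTINCT ON ... ORDER BY calc_percentile step). On the sample input it reproduces the quartiles in the question's expected output.

```python
import math
from collections import defaultdict


def percentile_cont(values, fraction):
    # Same interpolation rule as PostgreSQL's percentile_cont: position fraction * (n - 1).
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def assign_percentiles(rows, num_buckets=4):
    """rows: list of (id, month, amount); returns (id, month, amount, percentile) tuples."""
    amounts_by_month = defaultdict(list)
    for _row_id, month, amount in rows:
        amounts_by_month[month].append(amount)
    # One (label, cut-off) pair per month and bucket; the label uses integer
    # division like bucket*100/4 in the SQL.
    cutoffs = {
        month: [(bucket * 100 // num_buckets, percentile_cont(amounts, bucket / num_buckets))
                for bucket in range(1, num_buckets + 1)]
        for month, amounts in amounts_by_month.items()
    }
    result = []
    for row_id, month, amount in rows:
        # Smallest bucket whose cut-off is >= amount; the last cut-off is the
        # month's maximum, so a match always exists.
        label = next(pct for pct, cut in cutoffs[month] if amount <= cut)
        result.append((row_id, month, amount, label))
    return result


sample = [(1, 1, 1), (2, 1, 2), (3, 1, 5), (4, 1, 3),
          (5, 2, 1), (6, 2, 3), (1, 2, 5), (1, 2, 7),
          (1, 2, 9), (1, 2, 11), (1, 2, 15), (1, 2, 16)]
for row in assign_percentiles(sample):
    print(row)
```

Passing num_buckets=10 gives the deciles from the original question (labels 10, 20, ..., 100); in the SQL, the same change means replacing each of the three 4s with 10.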
By the way, id should be the primary key; the outer join could then use id instead of month and amount, which should perform better.

select *
from transactions
order by month, amount;

 id | month | amount | percentile 
----+-------+--------+------------
  1 |     1 |      1 |         25
  2 |     1 |      2 |         50
  4 |     1 |      3 |         75
  3 |     1 |      5 |        100
  5 |     2 |      1 |         25
  6 |     2 |      3 |         25
  1 |     2 |      5 |         50
  1 |     2 |      7 |         50
  1 |     2 |      9 |         75
  1 |     2 |     11 |         75
  1 |     2 |     15 |        100
  1 |     2 |     16 |        100
(12 rows)