SQL dynamic join pivot table

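Maybe the community can advise me on this PostgreSQL 9.5 problem.

There is a large flightlog table of 1.5 million rows with these columns: action (4 action types), timestamp and user_id. The users table has 6K rows.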

**flightlog**
user_id, time, action
2301    "2016-10-25 14:13:25.74668" "View"
8   "2016-04-25 15:02:13.916204"    "Download"
8   "2016-04-25 15:01:20.553475"    "Download"
8   "2016-04-25 14:57:02.430493"    "Download"
8   "2016-04-25 14:57:02.160002"    "Download"
8   "2016-04-25 14:57:01.397602"    "Download"
26  "2016-10-25 16:01:25.005285"    "View"
216 "2016-10-24 14:46:16.035242"    "View"
2182    "2016-10-24 14:47:43.713"   "View"
243 "2016-10-24 12:10:12.187181"    "View"
26  "2016-10-24 15:01:26.269981"    "View"
26  "2016-10-24 15:01:28.122361"    "View"

**users**
user_id, email
8 "ndoe@mysite.com"
26  "jdoe@mysite.com"
2301 "kdoe@mysite.com"

**subscriptions**
user_id, expires
8    "2017-08-30 15:48:06.827258"
26   "2017-08-10 00:00:00"
2301 "2017-09-28 09:09:17.56549"
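I need a statistics table with a count of each of the 4 different actions per user per month. So it would be the user, then 4 actions per month, repeated 12 times. The columns would look like this: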

user1  period1_action1 period1_action2 period1_action3 period1_action4 period2_action1 etc
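To make it more complicated, these 12 months should be dynamic for each user: they run from the subscription date in the subscriptions table (10K rows) to +12 months.

So far I have been able to build a pivot on action using FILTER, and join it to the user title and subscription start selected through CTEs: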

with counters ( <doing counts using windowing functions>),
     pivot1   ( <pivoting counters using FILTER>
              ...sum(times) filter (where action = 'action1')...
              ),
     recent_subscription (<picking latest subscription for a user>),
     titles   (<using previous cte and adding more info from info table>)

select t.user, t.id, t.subscription_starts, t.expires_at, t.title, email,
                p."action1", p."action2", p."action3", p."action4"
      from titles t
      join pivot1 p
...

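But now the challenge is to pivot around this once more, to get the 12 period / 4 action combinations. It could perhaps be just 12 columns if done as follows, using json_object_aggr keyed by period and then holding the 4 actions: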

 --using the piece above as another CTE called merged 
 --this code does not work :(
select 
        email, id, ends, subs, info, 
        json_object_aggr(starts, s1,v1,p1,d1 ORDER BY starts) as P1,
        json_object_aggr(starts, s2,v2,p2,d2 ORDER BY starts) as P2,
        json_object_aggr(starts, s3,v3,p3,d3 ORDER BY starts) as P3,
        json_object_aggr(starts, s4,v4,p4,d4 ORDER BY starts) as P4,
        json_object_aggr(starts, s5,v5,p5,d5 ORDER BY starts) as P5,
        json_object_aggr(starts, s6,v6,p6,d6 ORDER BY starts) as P6,
        json_object_aggr(starts, s7,v7,p7,d7 ORDER BY starts) as P7,
        json_object_aggr(starts, s8,v8,p8,d8 ORDER BY starts) as P8,
        json_object_aggr(starts, s9,v9,p9,d9 ORDER BY starts) as P9,
        json_object_aggr(starts, s10,v10,p10,d10 ORDER BY starts) as P10,
        json_object_aggr(starts, s11,v11,p11,d11 ORDER BY starts) as P11,
        json_object_aggr(starts, s12,v12,p12,d12 ORDER BY starts) as P12
    from 
            (select email, id,  starts, ends, subs, info, starts, 
  sum("action1") as s1,sum("action2") as v1,sum("action3") as 
  p1,sum("action4") 
  as d1 
              from merged
              group by email, id,  starts, ends, subs, info, starts

            ) m
    group by email, id,  starts, ends, subs, info
    order by email, id,  starts, ends, subs, info 
Could this be done with json_object_agg, holding the 4 actions for each period? Can I get some help on how to pivot this?

Thanks.

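Is there a reason you can't use a series of CASE conditions that evaluate to 1 or 0 and then sum them? That would make the whole thing much simpler.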

WITH subs AS (
      SELECT s.user_id, u.email, MAX(s.sub_date) AS recent_sub_date 
      FROM subscriptions s 
      JOIN users u ON s.user_id = u.user_id
      GROUP BY s.user_id, u.email
)
SELECT s.user_id,
       SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action1,
       SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action2,
       SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action3,
       SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '1 month' THEN 1 ELSE 0 END) AS period1_action4,
       SUM(CASE WHEN f.action = 'action1' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action1,
       SUM(CASE WHEN f.action = 'action2' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action2,
       SUM(CASE WHEN f.action = 'action3' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action3,
       SUM(CASE WHEN f.action = 'action4' AND f.time <= s.recent_sub_date + INTERVAL '2 months' THEN 1 ELSE 0 END) AS period2_action4,
       ...
FROM flightlog f
JOIN subs s ON s.user_id = f.user_id 
WHERE f.time > s.recent_sub_date
AND f.time <= DATE_TRUNC('month', s.recent_sub_date + INTERVAL '13 months') -- end of the 12 months after sub
GROUP BY s.user_id;

Note: if your dates are not indexed, this will probably be very slow no matter how you write the query.
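
One thing to watch in the query above: each period's CASE has only an upper bound, so the `period2_*` columns also count every row already counted in `period1_*`. Below is a minimal sketch of a bounded variant. It assumes the same `sub_date` column the answer uses (the sample `subscriptions` table only shows `expires`) and the placeholder action names `action1`..`action4`; the index name is made up. `FILTER` is available from PostgreSQL 9.4.

```sql
-- Hypothetical index name; column names follow the flightlog sample.
CREATE INDEX flightlog_user_time_idx ON flightlog (user_id, time);

WITH subs AS (
    -- sub_date is an assumed column, as in the answer above.
    SELECT s.user_id, MAX(s.sub_date) AS recent_sub_date
    FROM subscriptions s
    GROUP BY s.user_id
)
SELECT s.user_id,
       -- Period 1: (start, start + 1 month]
       COUNT(*) FILTER (WHERE f.action = 'action1'
                          AND f.time >  s.recent_sub_date
                          AND f.time <= s.recent_sub_date + INTERVAL '1 month')  AS period1_action1,
       -- Period 2: (start + 1 month, start + 2 months]
       COUNT(*) FILTER (WHERE f.action = 'action1'
                          AND f.time >  s.recent_sub_date + INTERVAL '1 month'
                          AND f.time <= s.recent_sub_date + INTERVAL '2 months') AS period2_action1
       -- repeat the same pattern for the remaining action/period combinations
FROM flightlog f
JOIN subs s ON s.user_id = f.user_id
WHERE f.time >  s.recent_sub_date
  AND f.time <= s.recent_sub_date + INTERVAL '12 months'
GROUP BY s.user_id;
```

With both bounds in place, each log row falls into exactly one period.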

Starting from the working query above, which produces results in the following form:

user1 ... 1st_period_4user1 action1 action2 action3 action4
user1 ... 2nd_period_4user1 action1 action2 action3 action4
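I concatenated the last 5 columns into a single "period + 4 actions" string. Then I ranked this table, so that the ranks run from 1 to 12. Then, using the concatenated column as the value and the rank as the column, I pivoted again, doing this 12 times: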

array_agg(stf) FILTER (where rnk = 1) AS period_1
array_agg(stf) FILTER (where rnk = 2) AS period_2
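to get the user id with 12 data columns. The drawback of this approach is that if a month is skipped, the following month is still listed as the next period.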
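
The skipped-month problem comes from ranking only the periods that actually have rows. One possible way around it, sketched here under assumptions, is to generate all 12 periods per user with `generate_series` and LEFT JOIN the log onto them, so that an empty month still occupies its own slot with zero counts. The `sub_date` column and the `action1`..`action4` names are placeholders, not taken from the real schema.

```sql
WITH subs AS (
    -- sub_date is an assumed column holding the subscription start.
    SELECT user_id, MAX(sub_date) AS starts
    FROM subscriptions
    GROUP BY user_id
),
periods AS (
    -- One row per user and period number 1..12, whether or not activity exists.
    SELECT s.user_id,
           g.n AS period,
           s.starts + (g.n - 1) * INTERVAL '1 month' AS period_start,
           s.starts + g.n * INTERVAL '1 month'       AS period_end
    FROM subs s
    CROSS JOIN generate_series(1, 12) AS g(n)
)
SELECT p.user_id,
       p.period,
       COUNT(*) FILTER (WHERE f.action = 'action1') AS action1,
       COUNT(*) FILTER (WHERE f.action = 'action2') AS action2,
       COUNT(*) FILTER (WHERE f.action = 'action3') AS action3,
       COUNT(*) FILTER (WHERE f.action = 'action4') AS action4
FROM periods p
LEFT JOIN flightlog f
       ON f.user_id = p.user_id
      AND f.time >= p.period_start
      AND f.time <  p.period_end
GROUP BY p.user_id, p.period
ORDER BY p.user_id, p.period;
```

Because the period number comes from the generated series rather than from a rank, pivoting on `period` afterwards keeps every month in its own column.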

Then I join the other info/organization tables to add more information for the user id.

This takes 67 lines of code and 4 CTEs... I am still hoping for a more polished solution.
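
If one JSON column per user is acceptable instead of many separate columns, `json_object_agg` can collapse a long per-period result. It takes exactly two arguments (a key and a value), so the four counts have to be wrapped into one value first, for example with `json_build_object`. A rough sketch follows; the `period_counts` CTE and its values are made up for illustration and stand in for the real per-period counts.

```sql
WITH period_counts (user_id, period, action1, action2, action3, action4) AS (
    -- Made-up illustration rows, not real data.
    VALUES (8,  1, 5, 0, 0, 0),
           (8,  2, 0, 0, 0, 0),
           (26, 1, 0, 3, 0, 0)
)
SELECT user_id,
       json_object_agg(
           'period_' || period,                    -- key: one entry per period
           json_build_object('action1', action1,   -- value: the 4 counts
                             'action2', action2,
                             'action3', action3,
                             'action4', action4)
           ORDER BY period
       ) AS periods
FROM period_counts
GROUP BY user_id;
```

Both functions are available from PostgreSQL 9.4, so this should also apply to 9.5.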


This answer is for reference only.

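I find crosstabs easier to do without that function. Just use conditional aggregation: count(..) filter (...)

@a_horse_with_no_name Thanks, I redid it with filter... Now I need to somehow pivot around this once more...

I will check again and get back to you.

I like your approach, but for some reason I can't make it work for me, because the time comparison may produce wrong results. Here is what I have. This is what the script calculates for a specific user. Given the start of the period, I expected views to show up in periods 1 and 3, or in 2 and 4, depending on what is set in interval + month. But with your script I get them in 1, 2, 3 and so on... Logically, something is wrong here...

I was able to make your approach work by adding AND f.time > subdate + interval, with a 1-month difference from the interval used in the existing f.time comparison.

OK, it seems you are the only one helping here. Thanks, I am accepting your answer, since I can use this approach to solve the problem.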