SQL: Difference between consecutive rows

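The table has 3 columns: order id, member id, order date.

I need to pull the distribution of orders broken down by the number of days between 2 consecutive orders for each member id.

What I have so far: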

SELECT 
  a1.member_id,
  count(distinct a1.order_id) as num_orders, 
  a1.order_date, 
  DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1 
inner join orders as a2 
  on a2.member_id = a1.member_id+1;
This doesn't quite get me there, because the output I need is:

You can use lag to get the date of the same customer's previous order:

select o.*,
    datediff(
        order_date,
        lag(order_date) over(partition by member_id order by order_date, order_id)
    ) days_diff
from orders o
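When two rows share the same date, the one with the smallest order id is considered first. Also note that I fixed the datediff syntax: in Hive, the function takes just two dates and no unit argument.

I just don't understand the logic you want for computing num_orders.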
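
To get from the per-row gaps to the distribution asked for in the question, one option is to wrap the lag query in a subquery and count rows per gap. This is only a sketch: the alias gaps and the column name num_gaps are made up for illustration, and it assumes the first order of each member (where lag returns NULL) should be left out of the distribution.

```sql
-- Sketch: histogram of the gap (in days) between consecutive orders.
-- "gaps" and "num_gaps" are illustrative names, not part of the original schema.
select
    days_diff,
    count(*) as num_gaps
from (
    select
        member_id,
        datediff(
            order_date,
            lag(order_date) over (partition by member_id order by order_date, order_id)
        ) as days_diff
    from orders
) gaps
where days_diff is not null   -- a member's first order has no previous order
group by days_diff
order by days_diff;
```

Each output row then reads as "this many consecutive-order pairs were days_diff days apart".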

Maybe something like this:

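SELECT 
  a1.member_id,
  a1.order_id,
  a1.order_date, 
  -- Hive's datediff takes (end_date, start_date) and no unit argument
  DATEDIFF(a1.order_date, a2.order_date) as days_since_last_order
from orders as a1 
inner join orders as a2 
  on a2.member_id = a1.member_id
  and a2.order_date < a1.order_date  -- a2 is an earlier order of the same member
where not exists (
      -- no order by the same member falls strictly between a2 and a1
      select 1
      from orders as intermediate_order 
      where intermediate_order.member_id = a1.member_id
        and intermediate_order.order_date < a1.order_date
        and intermediate_order.order_date > a2.order_date);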

What is the logic for computing num_orders? I don't understand it, sorry. For example, for customer 22222, how do you get 2 for orders 1212 and 1215, and 1 for the following two orders?

It's an item-type order id. A person can order two of the same thing, say two coffees. The order id for coffee is the same, e.g. 138, but num_orders is 2. That part isn't mandatory, though; what I want to compute is the days-since-last-order column for each member.

Thanks for sharing. I'm not familiar with the lag function, so I tried a self-join. Your query actually helped, and it works. I think I'll compute num_orders on top of this query/table.
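
For the num_orders follow-up, here is one possible sketch. It rests on an assumption that the thread never confirms: that ordering the same item twice shows up as two rows with the same member_id, order_id and order_date, so num_orders is just the row count of that group. The alias per_order is made up for illustration.

```sql
-- Sketch under the assumption above: collapse duplicate rows first,
-- then compute the gap to the member's previous order on the collapsed rows.
select
    member_id,
    order_id,
    order_date,
    num_orders,
    datediff(
        order_date,
        lag(order_date) over (partition by member_id order by order_date, order_id)
    ) as days_since_last_order
from (
    select member_id, order_id, order_date, count(*) as num_orders
    from orders
    group by member_id, order_id, order_date
) per_order;
```

Aggregating before applying lag keeps two coffees bought on the same day from producing a spurious 0-day gap between them. If num_orders is defined differently in your data, only the inner query needs to change.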