SQL: querying DAU/MAU over time (daily) (tags: sql, postgresql)

I have a daily sessions table with user_id and date columns. I would like to chart DAU/MAU (daily active users / monthly active users) on a per-day basis. For example:

Date         MAU      DAU     DAU/MAU
2014-06-01   20,000   5,000   20%
2014-06-02   21,000   4,000   19%
2014-06-03   20,050   3,050   17%
...          ...      ...     ...
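Calculating daily actives is simple, but calculating monthly actives (that is, the number of users who logged in during the trailing 30 days) is causing problems. How can this be achieved without a left join for each day?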


Edit: I am using Postgres.

You haven't shown us the complete table definition, but maybe something like this:

select date,
       count(*) over (partition by date_trunc('day', date) order by date) as dau,
       count(*) over (partition by date_trunc('month', date) order by date) as mau
from sessions
order by date;
To get the percentage without repeating the window functions, just wrap the query in a derived table:

select date, 
       dau,
       mau,
       dau::numeric / (case when mau = 0 then null else mau end) as pct
from (
    select date,
           count(*) over (partition by date_trunc('day', date) order by date) as dau,
           count(*) over (partition by date_trunc('month', date) order by date) as mau
    from sessions
) t
order by date;
Here is a sample output:

postgres=> select * from sessions;
 session_date | user_id
--------------+---------
 2014-05-01   |       1
 2014-05-01   |       2
 2014-05-01   |       3
 2014-05-02   |       1
 2014-05-02   |       2
 2014-05-02   |       3
 2014-05-02   |       4
 2014-05-02   |       5
 2014-06-01   |       1
 2014-06-01   |       2
 2014-06-01   |       3
 2014-06-02   |       1
 2014-06-02   |       2
 2014-06-02   |       3
 2014-06-02   |       4
 2014-06-03   |       1
 2014-06-03   |       2
 2014-06-03   |       3
 2014-06-03   |       4
 2014-06-03   |       5
(20 rows)

postgres=> select session_date,
postgres->        dau,
postgres->        mau,
postgres->        round(dau::numeric / (case when mau = 0 then null else mau end),2) as pct
postgres-> from (
postgres(>     select session_date,
postgres(>            count(*) over (partition by date_trunc('day', session_date) order by session_date) as dau,
postgres(>            count(*) over (partition by date_trunc('month', session_date) order by session_date) as mau
postgres(>     from sessions
postgres(> ) t
postgres-> order by session_date;
 session_date | dau | mau | pct
--------------+-----+-----+------
 2014-05-01   |   3 |   3 | 1.00
 2014-05-01   |   3 |   3 | 1.00
 2014-05-01   |   3 |   3 | 1.00
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-05-02   |   5 |   8 | 0.63
 2014-06-01   |   3 |   3 | 1.00
 2014-06-01   |   3 |   3 | 1.00
 2014-06-01   |   3 |   3 | 1.00
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-02   |   4 |   7 | 0.57
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
 2014-06-03   |   5 |  12 | 0.42
(20 rows)

postgres=>
Assuming there are values for every day, you can get the total counts using a subquery and a RANGE BETWEEN window frame:

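Unfortunately, I think you want distinct users rather than just a count of session rows. That makes the problem harder, especially because Postgres does not support COUNT(DISTINCT ...) as a window function.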
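To make the difference concrete, here is a minimal Python sketch (not part of the original answer) with made-up data: a single hypothetical user, `user_1`, who has one session on each of 30 consecutive days. Counting rows gives 30, while counting distinct users gives 1.

```python
from datetime import date, timedelta

# Hypothetical data: one user with a session on each of 30 consecutive days.
start = date(2014, 6, 1)
sessions = [(start + timedelta(days=i), "user_1") for i in range(30)]

# Counting rows (what count(*) does) versus counting distinct users.
session_count = len(sessions)
distinct_users = len({user_id for _, user_id in sessions})

print(session_count, distinct_users)
```

A monthly active user figure is the second number, which is why a plain running count over the month overstates it.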

I think you have to do some sort of self join for this. Here is one method:

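with dau as (
      -- distinct users per day
      select date, count(distinct user_id) as dau
      from dailysessions
      group by date
     )
select d.date, d.dau,
       -- distinct users seen in the 30 days ending on d.date;
       -- the outer date must be qualified, otherwise it resolves to ds.date
       (select count(distinct ds.user_id)
        from dailysessions ds
        where ds.date between d.date - 29 * interval '1 day' and d.date
       ) as mau
from dau d
order by d.date;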
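The same trailing-window logic can be sketched outside the database. The following Python is only an illustration under stated assumptions: the function name `dau_mau` and the `window_days` parameter are invented here, and the input is the 20-row sample sessions table from the psql transcript earlier in the thread, rebuilt as `(session_date, user_id)` tuples.

```python
from collections import defaultdict
from datetime import date, timedelta


def dau_mau(sessions, window_days=30):
    """Return (day, dau, mau, dau/mau) rows; mau counts distinct users
    seen in the window_days days ending on that day (inclusive)."""
    users_by_day = defaultdict(set)
    for day, user_id in sessions:
        users_by_day[day].add(user_id)

    rows = []
    for day in sorted(users_by_day):
        window_start = day - timedelta(days=window_days - 1)
        monthly = set()
        # Union the user sets of every day inside the trailing window.
        for other_day, users in users_by_day.items():
            if window_start <= other_day <= day:
                monthly |= users
        dau, mau = len(users_by_day[day]), len(monthly)
        rows.append((day, dau, mau, round(dau / mau, 2)))
    return rows


# Sample data from the transcript: users 1..n logged in on each listed day.
sessions = [
    (date(2014, month, day), user_id)
    for month, day, n in [(5, 1, 3), (5, 2, 5), (6, 1, 3), (6, 2, 4), (6, 3, 5)]
    for user_id in range(1, n + 1)
]

for row in dau_mau(sessions):
    print(row)
```

On this sample, 2014-06-03 gets an MAU of 5 (the distinct users in the preceding 30 days), not the cumulative session count of 12 that the window-function query reports.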

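This one uses COUNT DISTINCT to get a rolling 30-day DAU/MAU:

It computes reddit user engagement in BigQuery (EXACT_COUNT_DISTINCT, SEC_TO_TIMESTAMP and the bracketed table names are BigQuery legacy SQL), but the SQL is standard enough to be adapted to other databases.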

SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
  SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
  FROM (
    SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
    FROM [fh-bigquery:reddit_comments.2015_09]
    WHERE subreddit='AskReddit') a
  JOIN (
    SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
    FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
    CROSS JOIN (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
      FROM [fh-bigquery:reddit_comments.2015_09]
      GROUP BY 1
    ) b
    WHERE subreddit='AskReddit'
    AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
    GROUP BY 1
  ) b
  ON a.day=b.stopday
  GROUP BY 1
)
ORDER BY 1

I discussed this further elsewhere; the link did not survive in this copy.

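As you noticed, DAU is easy. The MAU problem can be solved by first creating a view with boolean flags that mark when a user becomes active and when they become inactive, like this: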

CREATE OR REPLACE VIEW "vw_login" AS
SELECT *
     -- day this login stops counting: the next login or 30 days later, whichever is first
     , LEAST(LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
     -- first login ever for this user
     , CASE WHEN LAG("date") OVER w IS NULL THEN true ELSE false END AS "activated"
     -- no later login, or the next login is more than 30 days away
     , CASE
         WHEN LEAD("date") OVER w IS NULL THEN true
         WHEN LEAD("date") OVER w - "date" > 30 THEN true
         ELSE false
       END AS "churned"
     -- returning after a gap of more than 30 days
     , CASE
         WHEN LAG("date") OVER w IS NULL THEN false
         WHEN "date" - LAG("date") OVER w <= 30 THEN false
         WHEN row_number() OVER w > 1 THEN true
         ELSE false
       END AS "resurrected"
  FROM "login"
WINDOW w AS (PARTITION BY "user_id" ORDER BY "date");
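Finally, compute the running total of active monthly users by taking the cumulative sum of those columns. You need to join vw_activity twice, because the second join attaches the churn event to the day the user becomes inactive (that is, 30 days after their last login).

I included a date series to make sure every date is present in the result. You can do without it, but then days with no logins will be missing from the output.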

SELECT
  d."date"
  -- running total: activations, minus churn events (dated 30 days after the login), plus resurrections
  , SUM(COALESCE(a."activated"::int, 0)
      - COALESCE(a2."churned"::int, 0)
      + COALESCE(a."resurrected"::int, 0)) OVER w AS "mau"
  , a."activated", a2."churned", a."resurrected"
-- the column alias is required: without it the series column is named d, not "date"
FROM generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) AS d("date")
  LEFT OUTER JOIN vw_activity a ON d."date" = a."date"
  LEFT OUTER JOIN vw_activity a2 ON d."date" = (a2."date" + INTERVAL '30 days')::date
WINDOW w AS (ORDER BY d."date")
ORDER BY d."date";

Of course, you could do all of this in a single query, but splitting it up makes the structure easier to understand.
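The event-based idea behind these views can be sketched in Python as a rough illustration rather than a translation of the SQL. The names `mau_by_events` and `WINDOW` and the sample logins are assumptions made for this example. Each run of logins with gaps of at most 30 days contributes +1 on its first day (activation or resurrection) and -1 thirty days after its last login (churn); a running sum of those deltas gives the number of users active in the trailing 30 days.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = 30  # a login keeps a user active for 30 days, as in the views above


def mau_by_events(logins, start, end):
    """logins: iterable of (user_id, day). Returns {day: active users}."""
    days_by_user = defaultdict(set)
    for user_id, day in logins:
        days_by_user[user_id].add(day)

    delta = defaultdict(int)
    for days in days_by_user.values():
        ordered = sorted(days)
        for i, day in enumerate(ordered):
            prev_day = ordered[i - 1] if i > 0 else None
            next_day = ordered[i + 1] if i + 1 < len(ordered) else None
            # Activated (first login) or resurrected (gap of more than 30 days).
            if prev_day is None or (day - prev_day).days > WINDOW:
                delta[day] += 1
            # Churned: no later login within 30 days, effective 30 days later.
            if next_day is None or (next_day - day).days > WINDOW:
                delta[day + timedelta(days=WINDOW)] -= 1

    # Start from any events dated before the requested range.
    running = sum(change for day, change in delta.items() if day < start)
    result = {}
    day = start
    while day <= end:
        running += delta.get(day, 0)
        result[day] = running
        day += timedelta(days=1)
    return result


# Hypothetical logins: user 1 returns after a 34-day gap, user 2 logs in once.
logins = [(1, date(2014, 6, 1)), (1, date(2014, 7, 5)), (2, date(2014, 6, 10))]
mau = mau_by_events(logins, date(2014, 6, 1), date(2014, 7, 10))
print(mau[date(2014, 6, 30)], mau[date(2014, 7, 1)], mau[date(2014, 7, 5)])
```

In this example user 1 drops out on 2014-07-01 (30 days after the first login) and comes back on 2014-07-05, which is the churn and resurrection bookkeeping the views encode with boolean flags.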

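Are you using MySQL or Postgres?

Looks great. One question: is MAU the calendar month, or the month preceding each day? Ideally it would be the month preceding that day.

I have run this query, but it doesn't work. MAU increases from 0 at the start of the month to the cumulative total of users at the end of the month. Also, the wrapper needs a GROUP BY date, dau, mau.

@DavidBailey: Then you need to provide more details, especially the table structure and more sample data. No, the wrapper does not need a GROUP BY, because I am using a window function that produces a cumulative count. Unfortunately SQLFiddle isn't working right now, otherwise I would provide a live example.

I have added a transcript of a psql session that shows my sample table.

The data structure you assumed is pretty accurate. In the live example, the monthly active users are 3 on May 1, 8 on May 2, and 3 on June 1. It is unclear what that represents...

The sessions table has three entries for May 1 and five for May 2. So the result shows a DAU of 3 on May 1 and 5 on May 2, but MAU is a cumulative count, which means 8 sessions as of May 2.

When I run this query over multiple days, the MAU column in the result set does not change. Ideally it should, because each day's MAU should be different. Any suggestions on how to fix this?

@Patthebug: Try asking a new question with sample data and desired results.

Here is a new question:

@GordonLinoff, as usual, a very elegant solution, thank you!

The premise of this solution ignores that MAU is not the sum of the DAUs within a month. Otherwise, if the same single user were active every day of a month, MAU would be 30, when it should actually be 1.
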
The vw_activity view referenced by the final query aggregates the vw_login flags per date:

CREATE OR REPLACE VIEW "vw_activity" AS
SELECT
    SUM("activated"::int) "activated"
  , SUM("churned"::int) "churned"
  , SUM("resurrected"::int) "resurrected"
  , "date"
  FROM "vw_login"
  GROUP BY "date"
  ;