Postgres: counting rows with a LEFT JOIN

Tags: sql, postgresql

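I'm trying to do some analytics with Postgres, and I have two tables: predictionstate and pageviews.

predictionstate table:

This table holds the results of our algorithm, with the following structure:

  • id ({company_identifier}:{user_identifier})
  • model (a reference string value)
  • prediction (a float between 0.0 and 1.0)

pageviews table:

This table contains user information, with the following structure:

  • company_identifier
  • user_identifier
  • pageview_current_url_type

Problem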

WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange as r_min, myrange + 0.1 as r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
  SPLIT_PART(p.id, ':', 1) as company_identifier,
  p.model,
  r.segment,
  COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
  COUNT(b.*) as "converted_users"
FROM
  ranges r
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
INNER JOIN (
  SELECT users.company_identifier, COUNT(users.user_identifier) AS n
  FROM pageviews
  INNER JOIN (
    SELECT SPLIT_PART(ps.id, ':', 2) AS user_identifier,
           SPLIT_PART(ps.id, ':', 1) AS company_identifier
    FROM predictionstate ps
    WHERE provider_id=47 AND
          prediction > 0.7
   ) users ON (
      pageviews.user_identifier=users.user_identifier AND
      pageviews.company_identifier=users.company_identifier
    )
  WHERE pageview_current_url_type='BUYSUCCESS'
  GROUP BY users.company_identifier
) AS b
ON (
  b.company_identifier = company_identifier
)
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
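I'm trying to fetch data based on our best model and analyze its accuracy. Basically, I need to know how to create the segments and count how many members each one has. The following code does that: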

WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange as r_min, myrange + 0.1 as r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
  SPLIT_PART(p.id, ':', 1) as company_identifier,
  p.model,
  r.segment,
  COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users"
FROM
  ranges r
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
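But my problem is that I don't know exactly how to do the next step: for each (company, model, segment) I need the accuracy data, which means querying the pageviews table and identifying the rows with pageview_current_url_type = 'BUYSUCCESS'.

I tried this, but without success: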

WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange as r_min, myrange + 0.1 as r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
  SPLIT_PART(p.id, ':', 1) as company_identifier,
  p.model,
  r.segment,
  COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
  b.n as "converted_users"
FROM
  ranges r,
  (
    SELECT COUNT(DISTINCT(pvs.user_identifier)) as n
    FROM pageviews pvs
    INNER JOIN (
        SELECT
            SPLIT_PART(id, ':', 1) as company_identifier,
            SPLIT_PART(id, ':', 2) as user_identifier
        FROM predictionstate ps
        WHERE prediction BETWEEN r.r_min AND r.r_max ) users
        ON (
            pvs.user_identifier = users.user_identifier AND
            pvs.company_identifier= users.company_identifier) 
        WHERE pageview_current_url_type = 'BUYSUCCESS'

  ) b
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
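TL;DR: I need to count rows from a join based on the users of the main query.

Edit:

I added a SQL Fiddle.

What I want to know is, for those segment_users, how many of them have pageview_current_url_type = 'BUYSUCCESS', adding one more column to the result: segmented_really_bought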

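Edit 2: I tried again (the first query at the top of this question), still without success (ERROR: column "p.id" must appear in the GROUP BY clause or be used in an aggregate function)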
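
A possible explanation for this error, assuming it comes from the attempt that joins the subquery aliased b: that subquery exposes its own company_identifier column, and PostgreSQL resolves an ambiguous GROUP BY name to an input column before an output alias. GROUP BY company_identifier would then group by b.company_identifier and leave SPLIT_PART(p.id, ':', 1) ungrouped. A minimal sketch on made-up inline data (the VALUES rows are hypothetical), grouping by the expression instead:

```sql
-- Hypothetical inline data standing in for predictionstate (p) and the subquery (b).
WITH p(id) AS (
  VALUES ('c1:u1'), ('c1:u2')
), b(company_identifier) AS (
  VALUES ('c1')
)
SELECT
  SPLIT_PART(p.id, ':', 1) AS company_identifier,
  COUNT(*) AS n
FROM p
JOIN b ON b.company_identifier = SPLIT_PART(p.id, ':', 1)
-- GROUP BY company_identifier would pick b.company_identifier and raise the error;
-- grouping by the expression (or by position, GROUP BY 1) avoids the ambiguity.
GROUP BY SPLIT_PART(p.id, ':', 1);
```

The same reasoning would apply to the ON clause of that attempt, where an unqualified company_identifier can only refer to the column of b.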

Edit 3: added the desired output

Generated with the following code:


It's hard to tell what you need without sample output, but I think this is what you're after:

WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange as r_min, myrange + 0.1 as r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
  p.company_identifier,
  p.model,
  r.segment,
  COUNT(DISTINCT(p.user_identifier)) as "segment_users",
  COUNT(CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS' THEN 1 END) AS segmented_really_bought
FROM
  ranges r
INNER JOIN (
  SELECT
    SPLIT_PART(id, ':', 1) as company_identifier,
    SPLIT_PART(id, ':', 2) as user_identifier,
    model,
    prediction
  FROM
    predictionstate
  ) p ON p.prediction BETWEEN r.r_min AND r.r_max
LEFT JOIN pageviews pv ON 
  p.company_identifier = pv.company_identifier
  AND p.user_identifier = pv.user_identifier
GROUP BY p.company_identifier, p.model, r.segment
ORDER BY p.company_identifier, p.model, r.segment;
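Changes to the fiddle query:

  • predictionstate is replaced with a subquery that we join to; inside it we apply the SPLIT_PART logic to get the company and user identifiers as separate columns
  • a LEFT JOIN to pageviews using those identifiers
  • added segmented_really_bought, a COUNT with a CASE expression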
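
One caveat worth checking against the expected output: the LEFT JOIN returns one row per matching pageview, so the conditional COUNT counts BUYSUCCESS pageviews, not users. If a user can have several such rows, a per-user count could be obtained by combining COUNT(DISTINCT ...) with FILTER. The following is a minimal sketch on made-up inline data; the VALUES rows and the column aliases buy_pageviews and buying_users are hypothetical, not taken from the fiddle:

```sql
-- Hypothetical sample data: u1 bought twice, u2 only viewed, u3 has no pageviews.
WITH p(company_identifier, user_identifier) AS (
  VALUES ('c1', 'u1'), ('c1', 'u2'), ('c1', 'u3')
), pv(company_identifier, user_identifier, pageview_current_url_type) AS (
  VALUES ('c1', 'u1', 'BUYSUCCESS'),
         ('c1', 'u1', 'BUYSUCCESS'),
         ('c1', 'u2', 'PRODUCT')
)
SELECT
  p.company_identifier,
  COUNT(DISTINCT p.user_identifier) AS segment_users,
  -- counts every joined BUYSUCCESS row
  COUNT(CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS' THEN 1 END) AS buy_pageviews,
  -- counts each buying user once
  COUNT(DISTINCT p.user_identifier)
    FILTER (WHERE pv.pageview_current_url_type = 'BUYSUCCESS') AS buying_users
FROM p
LEFT JOIN pv
  ON p.company_identifier = pv.company_identifier
 AND p.user_identifier = pv.user_identifier
GROUP BY p.company_identifier;
```

On this sample the single result row has segment_users = 3, buy_pageviews = 2 and buying_users = 1. The user without any pageviews still counts towards segment_users, which is the reason for using a LEFT JOIN rather than an INNER JOIN.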

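Notes on the markers A, B and C in the query that defines the pstate CTE:

A: I really recommend splitting your id column into two columns for better handling. It would save you a lot of time spent splitting strings (both when writing and when executing queries), and it is more readable. That is why I added the second CTE.

B: COUNT(DISTINCT) counts the distinct users in the group.

C: Counts all users (not distinct), but filters for the expected state before counting.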
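
If changing the table itself is not an option, a view can hide the string splitting recommended against in note A. This is only a sketch: the view name predictionstate_split is made up for illustration, and the column list assumes the table structure described in the question.

```sql
-- Hypothetical view name; exposes the two halves of id as separate columns.
CREATE VIEW predictionstate_split AS
SELECT
  SPLIT_PART(id, ':', 1) AS company_identifier,
  SPLIT_PART(id, ':', 2) AS user_identifier,
  model,
  prediction
FROM predictionstate;

-- Queries can then group and join on plain columns:
SELECT company_identifier, model, COUNT(DISTINCT user_identifier) AS users
FROM predictionstate_split
GROUP BY company_identifier, model;
```

A view only tidies up the queries; the split still runs on every read, so storing the two values as real columns is what would actually avoid that cost.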



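I wonder what happens if a prediction is exactly on a threshold, e.g. 0.3. With the BETWEEN clause, that row would be joined to both the 0.2-0.3 range and the 0.3-0.4 range (because BETWEEN is equivalent to r_min <= x <= r_max). It would be better to define the ranges as r_min <= x < r_max or r_min < x <= r_max.

I created the join you mentioned in the example, but I would rather change it. I still don't know which way to go.

1. Why is your ID a concatenated string? It would be much easier in code if you had two columns as the primary key. 2. This seems complicated. Can you add sample tables and the expected output?

@S-Man I created it here:

What is the expected result for the example you posted? Please add it to your question.

@KamilGosciminski I added the desired output and the code that generates it. Sorry.

My answer seems to be exactly what you want, although I don't know why your output has fewer segments than the data produces.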
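
A sketch of the half-open variant suggested in the comments, run on made-up prediction values. With an exclusive upper bound, a prediction of exactly 1.0 would match no range at all, so the last range keeps its upper bound inclusive here; that special case is an assumption, not something stated in the thread.

```sql
WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange AS r_min, myrange + 0.1 AS r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
), p(prediction) AS (
  -- Hypothetical values sitting exactly on range boundaries.
  VALUES (0.3), (1.0)
)
SELECT p.prediction, r.segment
FROM p
JOIN ranges r
  ON p.prediction >= r.r_min
     -- exclusive upper bound, except for the last range, which also takes 1.0
 AND (p.prediction < r.r_max OR (r.r_max = 1.0 AND p.prediction = 1.0))
ORDER BY p.prediction;
```

Each sample value now matches exactly one segment: 0.3 falls only into 0.3-0.4, and 1.0 only into 0.9-1.0. With BETWEEN, 0.3 would have matched both 0.2-0.3 and 0.3-0.4 and been counted in two segments.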
WITH ranges AS (
  SELECT
    myrange::text || '-' || (myrange + 0.1)::text AS segment,
    myrange as r_min, myrange + 0.1 as r_max
  FROM generate_series(0.0, 0.9, 0.1) AS myrange
), pstate AS (                                         -- A
  SELECT 
    SPLIT_PART(ps.id, ':', 1) AS company_identifier,
    SPLIT_PART(ps.id, ':', 2) AS user_identifier,
    model,
    prediction
  FROM predictionstate ps
)
SELECT 
  company_identifier, model, segment,
  COUNT(DISTINCT user_identifier) as segment_users,    -- B
  -- C: 
  COUNT(user_identifier) FILTER (WHERE pageview_current_url_type = 'BUYSUCCESS') as really_bought
FROM pstate ps
LEFT JOIN ranges r 
ON prediction BETWEEN r_min AND r_max
LEFT JOIN pageviews pv 
USING (company_identifier, user_identifier)
GROUP BY company_identifier, model, segment
ORDER BY company_identifier, model, segment