SQL Postgres: counting rows using a LEFT JOIN
I am trying to do some analytics with Postgres. I have two tables: predictionstate and pageviews.
predictionstate table:
This table holds the results of our algorithm, with the following structure:
- id ({company_identifier}:{user_identifier})
- model (a reference string value)
- prediction (a float between 0.0 and 1.0)
pageviews table:
This table holds user information, with the following structure:
- company_identifier
- user_identifier
- pageview_current_url_type
This is the attempt mentioned under Edit 2 below; it fails with the GROUP BY error quoted there:
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
    COUNT(b.*) as "converted_users"
FROM
    ranges r
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
INNER JOIN (
    SELECT users.company_identifier, COUNT(users.user_identifier) AS n
    FROM pageviews
    INNER JOIN (
        SELECT SPLIT_PART(ps.id, ':', 2) AS user_identifier,
               SPLIT_PART(ps.id, ':', 1) AS company_identifier
        FROM predictionstate ps
        WHERE provider_id=47 AND
              prediction > 0.7
    ) users ON (
        pageviews.user_identifier=users.user_identifier AND
        pageviews.company_identifier=users.company_identifier
    )
    WHERE pageview_current_url_type='BUYSUCCESS'
    GROUP BY users.company_identifier
) AS b
ON (
    b.company_identifier = company_identifier
)
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
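I am trying to take the data from our best model and analyze its accuracy. Basically, I need to know how to create segments and count how many members each one has. The following code does that: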
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users"
FROM
    ranges r
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
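My problem is that, for each (company, model, segment), I also need the accuracy data, which means querying the pageviews table and identifying the rows where pageview_current_url_type = 'BUYSUCCESS'. I do not know exactly how to do that.
I tried this, but it did not work: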
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    SPLIT_PART(p.id, ':', 1) as company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(SPLIT_PART(p.id, ':', 2))) as "segment_users",
    b.n as "converted_users"
FROM
    ranges r,
    (
        SELECT COUNT(DISTINCT(pvs.user_identifier)) as n
        FROM pageviews pvs
        INNER JOIN (
            SELECT
                SPLIT_PART(id, ':', 1) as company_identifier,
                SPLIT_PART(id, ':', 2) as user_identifier
            FROM predictionstate ps
            WHERE prediction BETWEEN r.r_min AND r.r_max
        ) users
        ON (
            pvs.user_identifier = users.user_identifier AND
            pvs.company_identifier = users.company_identifier)
        WHERE pageview_current_url_type = 'BUYSUCCESS'
    ) b
INNER JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
GROUP BY company_identifier, p.model, r.segment
ORDER BY company_identifier, p.model, r.segment;
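TL;DR: I need to count the rows of a join based on the users from the main query.
Edit:
I added a SQL Fiddle.
What I want to know is, for those segment_users, how many of them have pageview_current_url_type = 'BUYSUCCESS', added to the result as one more column: segmented_really_bought.
Edit 2: Tried again without success (ERROR: column "p.id" must appear in the GROUP BY clause or be used in an aggregate function).
Edit 3: Added the desired output and the code that generates it.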
Without sample output it is hard to tell what you need, but I think this is what you are after:
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
)
SELECT
    p.company_identifier,
    p.model,
    r.segment,
    COUNT(DISTINCT(p.user_identifier)) as "segment_users",
    COUNT(CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS' THEN 1 END) AS segmented_really_bought
FROM
    ranges r
INNER JOIN (
    SELECT
        SPLIT_PART(id, ':', 1) as company_identifier,
        SPLIT_PART(id, ':', 2) as user_identifier,
        model,
        prediction
    FROM
        predictionstate
) p ON p.prediction BETWEEN r.r_min AND r.r_max
LEFT JOIN pageviews pv ON
    p.company_identifier = pv.company_identifier
    AND p.user_identifier = pv.user_identifier
GROUP BY p.company_identifier, p.model, r.segment
ORDER BY p.company_identifier, p.model, r.segment;
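Changes to the fiddle query:
- Replaced predictionstate with a subquery that we join to; inside it we apply the SPLIT_PART logic to expose the company and user identifiers as separate columns
- LEFT JOINed pageviews to predictionstate using those identifiers
- Added the column segmented_really_bought, a COUNT over a CASE expression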
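To illustrate why pageviews is attached with a LEFT JOIN rather than an INNER JOIN, here is a minimal sketch. It rests on several assumptions: it runs on Python's sqlite3 with an in-memory database instead of Postgres, the table p is a hypothetical stand-in for the subquery aliased p above, and the two users are made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; only standard SQL is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the subquery aliased "p" (id already split).
    CREATE TABLE p (company_identifier TEXT, user_identifier TEXT);
    INSERT INTO p VALUES ('acme', 'u1'), ('acme', 'u2');

    CREATE TABLE pageviews (
        company_identifier TEXT,
        user_identifier TEXT,
        pageview_current_url_type TEXT
    );
    -- Only u1 has any pageview at all; u2 has none.
    INSERT INTO pageviews VALUES ('acme', 'u1', 'BUYSUCCESS');
""")

query = """
    SELECT COUNT(DISTINCT p.user_identifier),
           COUNT(CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS' THEN 1 END)
    FROM p
    {join} pageviews pv
        ON p.company_identifier = pv.company_identifier
       AND p.user_identifier = pv.user_identifier
"""

left_counts = conn.execute(query.format(join="LEFT JOIN")).fetchone()
inner_counts = conn.execute(query.format(join="INNER JOIN")).fetchone()

print(left_counts)   # (2, 1): u2 is kept and still counts as a segment user
print(inner_counts)  # (1, 1): u2 disappears because it has no pageviews row
```

With the INNER JOIN, users who never produced a pageview drop out of segment_users, which would overstate the conversion rate of a segment.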
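Notes on the markers B and C in the last query on this page (the one with the pstate CTE):
B: COUNT(DISTINCT) counts the distinct users in the group.
C: Counts all users (not distinct), but filters for the expected status before counting.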
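The difference between counting rows and counting distinct users matters once a user has more than one matching pageview. The sketch below shows it under the same assumptions as any toy example: Python's sqlite3 in place of Postgres, a hypothetical table p standing in for the split predictionstate rows, and invented data. The third column, buying_users, is not part of either answer; it is only one possible variant for the case where each buyer should be counted once.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (company_identifier TEXT, user_identifier TEXT);
    INSERT INTO p VALUES ('acme', 'u1'), ('acme', 'u2');

    CREATE TABLE pageviews (
        company_identifier TEXT,
        user_identifier TEXT,
        pageview_current_url_type TEXT
    );
    -- u1 bought twice and also visited another page; u2 never bought.
    INSERT INTO pageviews VALUES
        ('acme', 'u1', 'BUYSUCCESS'),
        ('acme', 'u1', 'BUYSUCCESS'),
        ('acme', 'u1', 'HOME'),
        ('acme', 'u2', 'HOME');
""")

row = conn.execute("""
    SELECT
        COUNT(DISTINCT p.user_identifier) AS segment_users,
        -- counts matching pageview rows, so u1 contributes 2
        COUNT(CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS'
                   THEN 1 END) AS buy_pageviews,
        -- hypothetical variant: counts each buying user once
        COUNT(DISTINCT CASE WHEN pv.pageview_current_url_type = 'BUYSUCCESS'
                            THEN p.user_identifier END) AS buying_users
    FROM p
    LEFT JOIN pageviews pv
        ON p.company_identifier = pv.company_identifier
       AND p.user_identifier = pv.user_identifier
""").fetchone()

print(row)  # (2, 2, 1)
```

Which of the two numbers is wanted depends on whether the metric is purchases per segment or buyers per segment.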
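I am wondering what happens if a prediction falls exactly on a threshold, for example 0.3. With the BETWEEN clause, it would join to both the 0.2-0.3 range and the 0.3-0.4 range, because BETWEEN is equivalent to r_min <= x <= r_max. It would be better to define the ranges as r_min <= x < r_max or r_min < x <= r_max.
I created the join you mention in your example, but I would prefer to change it. I still do not know which direction to take.
1. Why are your IDs concatenated strings? Having two columns as the primary key would make the code much easier. 2. This seems complicated. Can you add sample tables and the expected output?
@S-Man I created it here:
What is the expected result for the sample you posted? Please add it to your question.
@KamilGosciminski I added the desired output and the code that generates it. Sorry.
My answer seems to be exactly what you want, although I do not know why your output has fewer segments than the data produces.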
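The boundary concern from the comments can be reproduced with a small sketch. The assumptions: Python's sqlite3 replaces Postgres, the ranges table is filled by hand with two of the segments instead of using generate_series, and the single prediction row is invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; two hand-written segments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ranges (segment TEXT, r_min REAL, r_max REAL);
    INSERT INTO ranges VALUES ('0.2-0.3', 0.2, 0.3), ('0.3-0.4', 0.3, 0.4);

    CREATE TABLE predictionstate (id TEXT, prediction REAL);
    -- A prediction sitting exactly on the shared boundary.
    INSERT INTO predictionstate VALUES ('acme:u1', 0.3);
""")

# Closed interval on both ends: the boundary value matches two segments.
inclusive = conn.execute("""
    SELECT r.segment
    FROM ranges r
    JOIN predictionstate p ON p.prediction BETWEEN r.r_min AND r.r_max
    ORDER BY r.segment
""").fetchall()

# Half-open interval [r_min, r_max): the boundary value matches one segment.
half_open = conn.execute("""
    SELECT r.segment
    FROM ranges r
    JOIN predictionstate p
        ON p.prediction >= r.r_min AND p.prediction < r.r_max
    ORDER BY r.segment
""").fetchall()

print(inclusive)  # [('0.2-0.3',), ('0.3-0.4',)]
print(half_open)  # [('0.3-0.4',)]
```

One caveat with the half-open form: a prediction of exactly 1.0 would match no segment unless the last range is closed on the right.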
WITH ranges AS (
    SELECT
        myrange::text || '-' || (myrange + 0.1)::text AS segment,
        myrange as r_min, myrange + 0.1 as r_max
    FROM generate_series(0.0, 0.9, 0.1) AS myrange
), pstate AS ( -- A
    SELECT
        SPLIT_PART(ps.id, ':', 1) AS company_identifier,
        SPLIT_PART(ps.id, ':', 2) AS user_identifier,
        model,
        prediction
    FROM predictionstate ps
)
SELECT
    company_identifier, model, segment,
    COUNT(DISTINCT user_identifier) as segment_users, -- B
    -- C:
    COUNT(user_identifier) FILTER (WHERE pageview_current_url_type = 'BUYSUCCESS') as really_bought
FROM pstate ps
LEFT JOIN ranges r
    ON prediction BETWEEN r_min AND r_max
LEFT JOIN pageviews pv
    USING (company_identifier, user_identifier)
GROUP BY company_identifier, model, segment
ORDER BY company_identifier, model, segment