SQL: How do I join to this statistical data?
Tags: sql, postgresql, join, postgresql-9.1

First off, I apologize for the title of this question. I don't know the statistics terminology, and I don't know what this kind of join problem is called.

I have a query* with which I basically generate three things: a random_sex, a random_first, and a random_last. I am now trying to join on them.

Basically, the census data sits in a table like this:
name | freq | cumfreq | rank | name_type
------------+-------+---------+------+-----------
SMITH | 1.006 | 1.006 | 1 | LAST
JOHNSON | 0.81 | 1.816 | 2 | LAST
WILLIAMS | 0.699 | 2.515 | 3 | LAST
JONES | 0.621 | 3.136 | 4 | LAST
BROWN | 0.621 | 3.757 | 5 | LAST
DAVIS | 0.48 | 4.237 | 6 | LAST
MILLER | 0.424 | 4.66 | 7 | LAST
WILSON | 0.339 | 5 | 8 | LAST
MOORE | 0.312 | 5.312 | 9 | LAST
TAYLOR | 0.311 | 5.623 | 10 | LAST
ANDERSON | 0.311 | 5.934 | 11 | LAST
THOMAS | 0.311 | 6.245 | 12 | LAST
JACKSON | 0.31 | 6.554 | 13 | LAST
WHITE | 0.279 | 6.834 | 14 | LAST
HARRIS | 0.275 | 7.109 | 15 | LAST
MARTIN | 0.273 | 7.382 | 16 | LAST
THOMPSON | 0.269 | 7.651 | 17 | LAST
GARCIA | 0.254 | 7.905 | 18 | LAST
MARTINEZ | 0.234 | 8.14 | 19 | LAST
In this case, the query returns:
random_sex | random_first | random_last
male | 47.7101715711225 | 24.3833348881337
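I want it to join like this (procedurally):
So this gentleman's name should be Silver Harper. I have never met one in my life, but ...
I want the query above to return "Silver" and "Harper" instead of the random numbers. How can I get it to work like that?
Footnote *: simplified for brevity: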
SELECT
CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
, RANDOM() * 90.020 AS random_first -- dataset is 90% of most popular
, RANDOM() * 90.483 AS random_last
FROM generate_series(1,10,1);
Actually, I don't know statistics either, but I think this is what you want. Let's name the table that returns the random columns RANDOMS:
WITH RANDOMS AS
(
    SELECT
        CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
        , RANDOM() * 90.020 AS random_first
        , RANDOM() * 90.483 AS random_last
    FROM generate_series(1,10,1)
)
SELECT (
        -- first name: the row with the smallest cumfreq above the draw,
        -- taken from the list that matches the generated sex
        SELECT A.name
        FROM census.names A
        WHERE A.cumfreq > R.random_first
        AND A.name_type = upper(R.random_sex) || '_FIRST'  -- 'MALE_FIRST' or 'FEMALE_FIRST'
        ORDER BY A.cumfreq ASC LIMIT 1
    ) AS first_name,
    (
        -- last name: same lookup against the 'LAST' list
        SELECT A.name
        FROM census.names A
        WHERE A.cumfreq > R.random_last
        AND A.name_type = 'LAST'
        ORDER BY A.cumfreq ASC LIMIT 1
    ) AS last_name
FROM RANDOMS R;
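To see the lookup rule in isolation, here is a minimal sketch that swaps census.names for an inline stand-in built from four rows of the question's sample table. The CTE name sample_names and the fixed draw of 2.0 are made up for illustration; they are not part of the original answer.

```sql
-- Stand-in for census.names: four LAST rows copied from the question's sample table.
WITH sample_names (name, cumfreq, name_type) AS (
    VALUES ('SMITH',    1.006, 'LAST'),
           ('JOHNSON',  1.816, 'LAST'),
           ('WILLIAMS', 2.515, 'LAST'),
           ('JONES',    3.136, 'LAST')
)
SELECT name
FROM sample_names
WHERE name_type = 'LAST'
  AND cumfreq > 2.0        -- hypothetical draw, standing in for RANDOM() * 90.483
ORDER BY cumfreq
LIMIT 1;
-- returns WILLIAMS: 2.515 is the smallest cumfreq above 2.0
```

Each name owns the interval between the previous row's cumfreq and its own, so a uniform draw lands on a name with probability proportional to its freq.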
Correlated sub-queries:
-- PostgreSQL concatenates strings with ||, and random_sex is lower case
-- while name_type is upper case, hence UPPER(...) || '_FIRST'.
SELECT
    *
FROM
    yourRandomTable
INNER JOIN
    census.names AS first_name
        ON  first_name.cumfreq = (SELECT MIN(cumfreq)
                                  FROM census.names
                                  WHERE cumfreq > yourRandomTable.random_first
                                  AND name_type = UPPER(yourRandomTable.random_sex) || '_FIRST')
        AND first_name.name_type = UPPER(yourRandomTable.random_sex) || '_FIRST'
INNER JOIN
    census.names AS last_name
        ON  last_name.cumfreq = (SELECT MIN(cumfreq)
                                 FROM census.names
                                 WHERE cumfreq > yourRandomTable.random_last
                                 AND name_type = 'LAST')
        AND last_name.name_type = 'LAST'
You can vary this pattern quite a lot. Which variant to pick depends on how you have set up your indexes.
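As one illustration of the index point, the sketch below assumes the census.names table and columns shown in the question; the index name is made up. A composite index with name_type first and cumfreq second matches the shape of these lookups (equality on name_type, then the smallest cumfreq above a value), so the planner can usually answer each one with a short index range scan. It is worth confirming with EXPLAIN on your own data.

```sql
-- Assumption: table and column names as in the question; the index name is hypothetical.
CREATE INDEX names_type_cumfreq_idx
    ON census.names (name_type, cumfreq);
```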
This is what I ended up with:
EXPLAIN ANALYZE SELECT  -- drop EXPLAIN ANALYZE to get the names instead of the query plan
    r.sex
    , COALESCE(
        -- only the branch whose sex matches r.sex can return a row
        (SELECT name FROM census.names AS mf WHERE r.sex = 'male' AND mf.name_type = 'MALE_FIRST' AND mf.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
        , (SELECT name FROM census.names AS ff WHERE r.sex = 'female' AND ff.name_type = 'FEMALE_FIRST' AND ff.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
    ) AS first
    , (SELECT name FROM census.names AS l WHERE l.name_type = 'LAST' AND l.cumfreq > r.last ORDER BY cumfreq LIMIT 1) AS last
FROM (
    SELECT
        RANDOM() * 90.020 AS first
        , RANDOM() * 90.483 AS last
        , CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS sex
    FROM generate_series(1,10,1)
) AS r;
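The multipliers 90.020 and 90.483 look like the highest cumfreq values in the first-name and last-name lists (the question's comment says the dataset covers about 90% of the most popular names). Assuming that reading is right, a sketch like the following would read the bounds from census.names instead of hard-coding them; upper_bound is a made-up alias.

```sql
-- Assumption: the largest cumfreq per name_type is the right upper limit for the random draw.
SELECT name_type
     , max(cumfreq) AS upper_bound
FROM census.names
GROUP BY name_type
ORDER BY name_type;
```

If the male and female first-name lists turn out to have different maxima, each draw would need to be scaled by the bound of its own list.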