Grouping a raw SQL query by a column when the same name appears as different strings


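I'm new to writing more complex SQL statements. I'm trying to group by a name, where the same name can appear in different forms. For example, the name can be "Kane, Patrick", "P. Kane, Patrick", or "Kane, Patrick*".

So far I have the query below, which returns about 7000 rows: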

SELECT 
SUM(games_played) as games_played,
SUM(goals) as goals,
SUM(points) as points,
player_name
FROM player_stats
GROUP BY player_name;
Example of the resulting JSON:

[
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "P. Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick*"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "A. Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex*"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]
The question is how to get the sum of each column grouped by similar players, so that the result looks more like this:

[
{games_played: 369, goals: 36, points: 120, player_name: "Kane, Patrick"},
{games_played: 369, goals: 36, points: 120, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]

Even better if I can get a knex.js query, but I have no problem using a raw query here. The DB is PostgreSQL.

Thanks in advance.

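You need to do something to convert the names to a consistent form: string replacement, splitting on the period and taking only the second part, removing special characters, and so on. There is no AI that will go "oh, p kane, Patrick is obviously the same as PatrickKane*"; you have to do that yourself. You could even have a two-column table that maps each name variant to one consistent name, then join on the variant name and group by the consistent name.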
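
The two-column mapping idea can be tried out in application code before committing to a table. This is only a minimal sketch in JavaScript (the language knex.js runs in); the `aliases` entries and the `canonicalName` helper are hypothetical names chosen for illustration, and in practice the pairs would be loaded from the mapping table rather than hard-coded:

```javascript
// Hypothetical variant -> consistent-name pairs, taken from the sample data in the question.
const aliases = new Map([
  ['P. Kane, Patrick', 'Kane, Patrick'],
  ['Kane, Patrick*', 'Kane, Patrick'],
  ['A. Nylander, Alex', 'Nylander, Alex'],
  ['Nylander, Alex*', 'Nylander, Alex'],
]);

// Return the consistent spelling, or the name unchanged when no alias is registered.
function canonicalName(name) {
  return aliases.has(name) ? aliases.get(name) : name;
}

console.log(canonicalName('P. Kane, Patrick')); // Kane, Patrick
console.log(canonicalName('Lemieux, Mario'));   // Lemieux, Mario
```

An explicit lookup like this never merges two different players by accident, at the cost of having to register every variant by hand.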

I think my first step would be to tidy up the data:

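-- Remove the asterisk marker
UPDATE player_stats 
SET player_name = REPLACE(player_name, '*', '');

-- Remove a leading initial such as 'P. ' (LTRIM drops the space left after the period)
UPDATE player_stats 
SET player_name = LTRIM(SUBSTRING(player_name FROM 3))
WHERE player_name LIKE '_.%';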
You could stop here and just keep re-running this forever to keep clearing junk out of the table, adding more rules as more variants turn up.
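
If you would rather not modify the table yet, the same two rules can be applied to the fetched rows in application code. The following is a rough sketch, not a definitive implementation: `normalizePlayerName` and `sumByPlayer` are hypothetical helpers, and it assumes the rows are plain objects with numeric `games_played`, `goals` and `points` fields, as in the sample JSON from the question:

```javascript
// Hypothetical helper mirroring the two cleanup rules:
// drop '*' and strip a leading "X." initial plus any following whitespace.
function normalizePlayerName(name) {
  return name.replace(/\*/g, '').replace(/^.\.\s*/, '').trim();
}

// Sum the stat columns per normalized name, keeping first-seen order.
function sumByPlayer(rows) {
  const totals = new Map();
  for (const row of rows) {
    const name = normalizePlayerName(row.player_name);
    if (!totals.has(name)) {
      totals.set(name, { games_played: 0, goals: 0, points: 0, player_name: name });
    }
    const total = totals.get(name);
    total.games_played += row.games_played;
    total.goals += row.goals;
    total.points += row.points;
  }
  return Array.from(totals.values());
}

// Sample rows taken from the question.
const rows = [
  { games_played: 123, goals: 12, points: 40, player_name: 'Kane, Patrick' },
  { games_played: 123, goals: 12, points: 40, player_name: 'P. Kane, Patrick' },
  { games_played: 123, goals: 12, points: 40, player_name: 'Kane, Patrick*' },
  { games_played: 123, goals: 12, points: 40, player_name: 'Lemieux, Mario' },
];

console.log(sumByPlayer(rows));
```

With the sample rows, the three Kane variants collapse into a single entry and Lemieux stays separate. Any variant the two rules do not cover will still show up as its own entry, which makes it easy to spot which rule to add next.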

But you really should create a new table for the players:

-- uuid_generate_v4() is provided by the uuid-ossp extension
SELECT uuid_generate_v4() AS player_id, player_name 
INTO players
FROM (SELECT DISTINCT player_name FROM player_stats) x;

ALTER TABLE players ADD PRIMARY KEY (player_id);
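Then add a column to the stats table to hold the id:

ALTER TABLE player_stats ADD COLUMN player_id UUID;
Copy the data over:

-- PostgreSQL does not allow the target column in SET to be qualified with the table alias
UPDATE player_stats d
SET player_id = s.player_id
FROM players s
WHERE s.player_name = d.player_name;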
Set up the foreign key:

ALTER TABLE player_stats
ADD CONSTRAINT fk_playersstats_playerid__players_playerid FOREIGN KEY (player_id) REFERENCES players(player_id);
Finally, drop the name column:

ALTER TABLE player_stats DROP COLUMN player_name;

Then fix whatever program is filling the table with all this assorted junk in the first place :)


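If you really must do it this way, you could try the following:

SELECT 
SUM(games_played) as games_played,
SUM(goals) as goals,
SUM(points) as points,
 CASE 
      when player_name like '%Patr%' then 'Kane, Patrick'
      when player_name like '%Alex%' then 'Nylander, Alex'
      when player_name like '%Mar%' then 'Lemieux, Mario'
      when player_name like '%Wayn%' then 'Gretzky, Wayne'
 ELSE NULL -- every unmatched name is lumped together under NULL
 END as player_name
FROM player_stats
-- Group by the CASE expression (the 4th output column); the bare player_name column
-- cannot be selected unless it is itself part of the GROUP BY
GROUP BY 4;

But you should take Caius Jard's advice…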


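You should create a new table with unique IDs and populate it with these names.
Agreed, Georgy; I've added more detail to my answer to show how.
I'm going to go with Caius's suggestion. This wouldn't work for my scenario since there are 7000+ records, so I'd be writing a whole bunch of cases. Thanks :)
Let me know if you hit a syntax error; it was written on a phone and I couldn't run anything to test it.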