SQL: conditionally relabel row values
Tags: sql, amazon-redshift

I want to write a script that looks at the values in data_id and data_raw_digits. For rows where those values match, it should take the first non-null value from the data_user_name column and relabel every row associated with that data_id with the same value.

This is what I have now:
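| data_id | data_raw_digits | data_user_name | data_ended_at | event_sequence |
| ------- | --------------- | -------------- | ------------- | -------------- |
| 1 | 0000 | abc | 112 | 1 |
| 1 | 0000 |  |  | 2 |
| 1 | 0000 |  |  | 3 |
| 1 | 0000 |  |  | 4 |
| 2 | 1111 |  |  | 1 |
| 2 | 1111 | ccc | 212 | 2 |
| 3 | 2222 |  |  | 1 |
| 3 | 2222 | ddd |  | 2 |
| 3 | 2222 |  | 303 | 3 |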
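Expected output:

| data_id | data_raw_digits | data_user_name | data_ended_at | event_sequence |
| ------- | --------------- | -------------- | ------------- | -------------- |
| 1 | 0000 | abc | 112 | 1 |
| 1 | 0000 | abc | 112 | 2 |
| 1 | 0000 | abc | 112 | 3 |
| 1 | 0000 | abc | 112 | 4 |
| 2 | 1111 | ccc | 212 | 1 |
| 2 | 1111 | ccc | 212 | 2 |
| 3 | 2222 | ddd | 303 | 1 |
| 3 | 2222 | ddd | 303 | 2 |
| 3 | 2222 | ddd | 303 | 3 |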
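I would proceed as follows:
- For each column to be processed (data_user_name, data_ended_at), use a window function in a subquery to rank the records whose relevant field is not null, within each group of records sharing the same data_id and data_raw_digits.
- Join these results with the original table using LEFT JOIN, and use COALESCE to replace null values with the value from the first record of the corresponding group.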
SELECT
    t.data_id,
    t.data_raw_digits,
    COALESCE(t.data_user_name, t_user_name.data_user_name) data_user_name,
    COALESCE(t.data_ended_at, t_ended_at.data_ended_at) data_ended_at,
    t.event_sequence
FROM mytable t
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_user_name IS NOT NULL
) t_user_name
    ON t_user_name.rn = 1
    AND t_user_name.data_id = t.data_id
    AND t_user_name.data_raw_digits = t.data_raw_digits
LEFT JOIN (
    SELECT
        t.*,
        ROW_NUMBER() OVER(PARTITION BY data_id, data_raw_digits ORDER BY event_sequence) rn
    FROM mytable t
    WHERE data_ended_at IS NOT NULL
) t_ended_at
    ON t_ended_at.rn = 1
    AND t_ended_at.data_id = t.data_id
    AND t_ended_at.data_raw_digits = t.data_raw_digits;
Result:
| data_id | data_raw_digits | event_sequence | data_user_name | data_ended_at |
| ------- | --------------- | -------------- | -------------- | ------------- |
| 1 | 0 | 1 | abc | 112 |
| 1 | 0 | 2 | abc | 112 |
| 1 | 0 | 3 | abc | 112 |
| 1 | 0 | 4 | abc | 112 |
| 2 | 1111 | 1 | ccc | 212 |
| 2 | 1111 | 2 | ccc | 212 |
| 3 | 2222 | 1 | ddd | 303 |
| 3 | 2222 | 2 | ddd | 303 |
| 3 | 2222 | 3 | ddd | 303 |
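
For anyone who wants to try the LEFT JOIN / ROW_NUMBER / COALESCE approach without a database server, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the original answer: the use of SQLite (which needs version 3.25 or later for window functions), the shortened aliases `u` and `e`, the helper name `fill_groups`, the added ORDER BY, and the exact placement of the empty cells in the sample rows, which is inferred from the question's table.

```python
import sqlite3

# Sample rows from the question; None stands for an empty cell.
# Columns: data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]

# Same technique as the answer: rank the non-null rows per group,
# join back the first one (rn = 1), and COALESCE the gaps.
QUERY = """
SELECT
    t.data_id,
    t.data_raw_digits,
    COALESCE(t.data_user_name, u.data_user_name) AS data_user_name,
    COALESCE(t.data_ended_at, e.data_ended_at) AS data_ended_at,
    t.event_sequence
FROM mytable t
LEFT JOIN (
    SELECT data_id, data_raw_digits, data_user_name,
           ROW_NUMBER() OVER (PARTITION BY data_id, data_raw_digits
                              ORDER BY event_sequence) AS rn
    FROM mytable
    WHERE data_user_name IS NOT NULL
) u ON u.rn = 1
   AND u.data_id = t.data_id
   AND u.data_raw_digits = t.data_raw_digits
LEFT JOIN (
    SELECT data_id, data_raw_digits, data_ended_at,
           ROW_NUMBER() OVER (PARTITION BY data_id, data_raw_digits
                              ORDER BY event_sequence) AS rn
    FROM mytable
    WHERE data_ended_at IS NOT NULL
) e ON e.rn = 1
   AND e.data_id = t.data_id
   AND e.data_raw_digits = t.data_raw_digits
ORDER BY t.data_id, t.event_sequence
"""


def fill_groups():
    """Load the sample rows into SQLite and return the filled-in rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable ("
        "data_id INTEGER, data_raw_digits TEXT, data_user_name TEXT, "
        "data_ended_at INTEGER, event_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in fill_groups():
        print(row)
```

Storing data_raw_digits as TEXT keeps the leading zeros ("0000"); the result table above shows 0 for those rows, which suggests the fiddle stored the column as a number.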
Note: this was tested in a MySQL fiddle because, as far as I know, there is no public AWS Athena fiddle on the internet. However, this is standard SQL syntax and should work on most RDBMS, including yours.

I think you can do this with window functions:
select data_id, data_raw_digits,
max(data_user_name) over (partition by data_id, data_raw_digits) as data_user_name,
max(data_ended) over (partition by data_id, data_raw_digits) as data_ended,
row_number() over (partition by data_id, data_raw_digits order by data_id) as event_sequence
from t;
Note: in the result set, only event_sequence distinguishes the rows. The key point is that the order of the original rows is not preserved, but there is no way to tell what that order was. SQL tables represent unordered sets: there is no ordering unless a column explicitly holds that information, and you don't seem to have such a column.

How about a good old self-join with aggregation?
SELECT
a.data_id as data_id,
a.data_raw_digits as data_raw_digits,
a.event_sequence as event_sequence,
max(b.data_user_name) as data_user_name,
max(b.data_ended_at) as data_ended_at
from your_table a left join your_table b on a.data_raw_digits = b.data_raw_digits
group by a.data_id, a.data_raw_digits, a.event_sequence;
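
The two MAX-based ideas (the window function and the self-join with aggregation) can be compared side by side on the question's sample data. The sketch below is an illustration under assumptions, not the answerers' exact queries: it runs on in-memory SQLite from Python (window functions need SQLite 3.25 or later), it keeps the existing event_sequence column instead of regenerating it, it uses the column name data_ended_at, and it adds `a.data_id = b.data_id` to the join condition so that two different data_id values sharing the same data_raw_digits would not be mixed. The helper name `run` is hypothetical. Also note that MAX returns the largest non-null value in a group, which matches "first non-null" only when each group holds a single distinct non-null value, as it does here.

```python
import sqlite3

# Sample rows from the question; None stands for an empty cell.
# Columns: data_id, data_raw_digits, data_user_name, data_ended_at, event_sequence
ROWS = [
    (1, "0000", "abc", 112, 1),
    (1, "0000", None, None, 2),
    (1, "0000", None, None, 3),
    (1, "0000", None, None, 4),
    (2, "1111", None, None, 1),
    (2, "1111", "ccc", 212, 2),
    (3, "2222", None, None, 1),
    (3, "2222", "ddd", None, 2),
    (3, "2222", None, 303, 3),
]

# Window version: MAX over the whole partition ignores NULLs.
WINDOW_MAX = """
SELECT data_id, data_raw_digits, event_sequence,
       MAX(data_user_name) OVER (PARTITION BY data_id, data_raw_digits) AS data_user_name,
       MAX(data_ended_at) OVER (PARTITION BY data_id, data_raw_digits) AS data_ended_at
FROM your_table
ORDER BY data_id, event_sequence
"""

# Self-join version: each row of a is paired with every row of its group in b,
# then collapsed back to one row per (data_id, data_raw_digits, event_sequence).
SELF_JOIN = """
SELECT a.data_id, a.data_raw_digits, a.event_sequence,
       MAX(b.data_user_name) AS data_user_name,
       MAX(b.data_ended_at) AS data_ended_at
FROM your_table a
LEFT JOIN your_table b
       ON a.data_id = b.data_id            -- assumption: added to the original join
      AND a.data_raw_digits = b.data_raw_digits
GROUP BY a.data_id, a.data_raw_digits, a.event_sequence
ORDER BY a.data_id, a.event_sequence
"""


def run(query):
    """Load the sample rows into SQLite and return the rows produced by query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table ("
        "data_id INTEGER, data_raw_digits TEXT, data_user_name TEXT, "
        "data_ended_at INTEGER, event_sequence INTEGER)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    window_rows = run(WINDOW_MAX)
    join_rows = run(SELF_JOIN)
    print("same result:", window_rows == join_rows)
    for row in window_rows:
        print(row)
```

On large tables the window version is usually the simpler choice, since the self-join pairs every row with every other row in its group before aggregating.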