Collapse rows in Hive and keep non-null values (sql, hive, hiveql) - Fatal编程技术网

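I have a table in Hive where the athr_name and post_date fields are 90% null (shown as "?" in Hive). I want to query the table and group by athr_name, post_date, page_nm and visit_date to get the number of visits and visitors. However, I also want to merge the null rows into the rows where athr_name and post_date are not null, replacing the nulls with those values (page_nm contains unique values, so a page can only have the correct athr_name or a null).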

In other words, I have:

   athr_name post_date         page_nm visit_date visit visitors
1      Steve  9/1/2019 /page1/content/   20191014    45       11
2      Steve  9/1/2019 /page1/content/   20191015    62       38
3      Steve  9/1/2019 /page1/content/   20191016    28       49
4      Steve  9/1/2019 /page1/content/   20191207    54       70
5      Steve  9/1/2019 /page1/content/   20191208    39       26
6          ?         ? /page1/content/   20191014    28       24
7          ?         ? /page1/content/   20191015    17       63
8          ?         ? /page1/content/   20191016    48       40
9          ?         ? /page1/content/   20191017    47       14
10         ?         ? /page1/content/   20191018    33        1

I want to collapse this data into a result like this:

  athr_name post_date         page_nm visit_date visit visitors
1     Steve  9/1/2019 /page1/content/   20191014    73       35
2     Steve  9/1/2019 /page1/content/   20191015    79      101
3     Steve  9/1/2019 /page1/content/   20191016    76       89
4     Steve  9/1/2019 /page1/content/   20191017    47       14
5     Steve  9/1/2019 /page1/content/   20191018    33        1
6     Steve  9/1/2019 /page1/content/   20191207    54       70
7     Steve  9/1/2019 /page1/content/   20191208    39       26

If these were columns rather than rows, I could handle it with the COALESCE function. Any help is much appreciated.

Is this what you want?

select max(athr_name), max(post_date), page_nm, 
       visit_date, sum(visit), sum(visitors)
from t
group by page_nm, visit_date;
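A minimal sketch for checking the query above outside Hive: it loads the question's sample rows into an in-memory SQLite database through Python's standard `sqlite3` module and runs the same statement. Assumptions: the "?" values are real NULLs (modelled as `None`), SQLite's `max()` and `sum()` behave like Hive's here (both ignore NULLs), and the `order by visit_date` is added only for a deterministic row order. Because the grouping is by `page_nm, visit_date`, `max()` can only pick up an author from another row with the same visit date.

```python
import sqlite3

# Sample rows from the question; the "?" shown by Hive is modelled as NULL (None).
ROWS = [
    ("Steve", "9/1/2019", "/page1/content/", "20191014", 45, 11),
    ("Steve", "9/1/2019", "/page1/content/", "20191015", 62, 38),
    ("Steve", "9/1/2019", "/page1/content/", "20191016", 28, 49),
    ("Steve", "9/1/2019", "/page1/content/", "20191207", 54, 70),
    ("Steve", "9/1/2019", "/page1/content/", "20191208", 39, 26),
    (None, None, "/page1/content/", "20191014", 28, 24),
    (None, None, "/page1/content/", "20191015", 17, 63),
    (None, None, "/page1/content/", "20191016", 48, 40),
    (None, None, "/page1/content/", "20191017", 47, 14),
    (None, None, "/page1/content/", "20191018", 33, 1),
]

# The answer's query; "order by" is added only to get a stable output order.
QUERY = """
select max(athr_name), max(post_date), page_nm,
       visit_date, sum(visit), sum(visitors)
from t
group by page_nm, visit_date
order by visit_date
"""


def collapse(rows):
    """Load rows into an in-memory table t and return the grouped result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (athr_name text, post_date text, page_nm text,"
            " visit_date text, visit integer, visitors integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in collapse(ROWS):
        print(row)
```

With this data the 20191014 group comes back as `('Steve', '9/1/2019', '/page1/content/', '20191014', 73, 35)`, but the 20191017 and 20191018 groups contain only null-author rows, so they come back as `(None, None, '/page1/content/', '20191017', 47, 14)` and `(None, None, '/page1/content/', '20191018', 33, 1)` rather than with "Steve" as in the desired output.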

First, you need to fill in the null values. Your query could look like this:

SELECT athr_name, 
       post_date, 
       page_nm, visit_date, 
       sum(visit), 
       sum(visitors)
from (
    select nvl(athr_name, LAST_VALUE(athr_name, TRUE)
                                              OVER (ORDER BY page_nm, athr_name NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as athr_name,
           nvl(post_date, LAST_VALUE(post_date, TRUE)
                                              OVER (ORDER BY page_nm, post_date NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) as post_date,
           page_nm,
           visit_date,
           visit,
           visitors
    from your_table) as tmp_view
GROUP BY athr_name, post_date, page_nm, visit_date;
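The fill step can be modelled in plain Python to see what it does. This is a sketch of the semantics only, not Hive code: the hypothetical helper `fill_nulls` sorts by `page_nm` with nulls last, as the window's `ORDER BY page_nm, col NULLS LAST` does, and carries the last non-null value forward, mimicking `nvl(col, LAST_VALUE(col, TRUE) OVER (...))`. NULLs are modelled as `None`, and the extra page `/page2/content/` is hypothetical (not in the question). Because the window has no `PARTITION BY page_nm`, that author-less page inherits the previous page's values.

```python
from collections import defaultdict

COLS = ("athr_name", "post_date", "page_nm", "visit_date", "visit", "visitors")
RAW = [
    ("Steve", "9/1/2019", "/page1/content/", "20191014", 45, 11),
    ("Steve", "9/1/2019", "/page1/content/", "20191015", 62, 38),
    ("Steve", "9/1/2019", "/page1/content/", "20191016", 28, 49),
    ("Steve", "9/1/2019", "/page1/content/", "20191207", 54, 70),
    ("Steve", "9/1/2019", "/page1/content/", "20191208", 39, 26),
    (None, None, "/page1/content/", "20191014", 28, 24),
    (None, None, "/page1/content/", "20191015", 17, 63),
    (None, None, "/page1/content/", "20191016", 48, 40),
    (None, None, "/page1/content/", "20191017", 47, 14),
    (None, None, "/page1/content/", "20191018", 33, 1),
    # Hypothetical page with no author or post date (not in the question).
    (None, None, "/page2/content/", "20191014", 5, 2),
]
ROWS = [dict(zip(COLS, r)) for r in RAW]


def fill_nulls(rows, col):
    """Model nvl(col, LAST_VALUE(col, TRUE) OVER (ORDER BY page_nm, col NULLS LAST
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))."""
    # ORDER BY page_nm, col NULLS LAST: within a page, non-null values come first.
    ordered = sorted(rows, key=lambda r: (r["page_nm"], r[col] is None, r[col] or ""))
    # Last non-null value in the frame (TRUE = skip nulls). There is no
    # PARTITION BY, so this is never reset when page_nm changes.
    last_seen = None
    filled = []
    for r in ordered:
        if r[col] is not None:
            last_seen = r[col]
        # nvl(): keep the row's own value, else the carried-forward one.
        filled.append({**r, col: r[col] if r[col] is not None else last_seen})
    return filled


def collapse(rows):
    """Fill both columns, then GROUP BY the four keys and sum the counts."""
    filled = fill_nulls(fill_nulls(rows, "athr_name"), "post_date")
    totals = defaultdict(lambda: [0, 0])
    for r in filled:
        key = (r["athr_name"], r["post_date"], r["page_nm"], r["visit_date"])
        totals[key][0] += r["visit"]
        totals[key][1] += r["visitors"]
    # Order by page_nm, visit_date (never null) for readable output.
    return sorted(((*k, v, n) for k, (v, n) in totals.items()),
                  key=lambda row: (row[2], row[3]))


if __name__ == "__main__":
    for row in collapse(ROWS):
        print(row)
```

For `/page1/content/` this reproduces the desired table from the question, including "Steve" on 20191017 and 20191018, while the hypothetical `/page2/content/` row comes out as `('Steve', '9/1/2019', '/page2/content/', '20191014', 5, 2)`.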

Update:

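If some pages may have no corresponding author name or post date at all, it is better to use this query, which preserves that information (such pages keep NULL):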

SELECT athr_name, post_date, page_nm, visit_date, sum(visit), sum(visitors)
from (
         select name_view.athr_name as athr_name,
                date_view.post_date as post_date,
                main.page_nm,
                main.visit_date,
                main.visit,
                main.visitors
         from your_table main
                  LEFT JOIN (select athr_name, page_nm, row_number() over (PARTITION BY page_nm) as rn
                             from your_table
                             where athr_name is not null) name_view
                            ON main.page_nm = name_view.page_nm AND name_view.rn = 1
                  LEFT JOIN (select post_date, page_nm, row_number() over (PARTITION BY page_nm) as rn
                             from your_table
                             where post_date is not null) date_view
                            ON main.page_nm = date_view.page_nm AND date_view.rn = 1) as tmp_view
GROUP BY athr_name, post_date, page_nm, visit_date;
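The update's join logic, sketched in plain Python under the same assumptions (NULLs as `None`, plus a hypothetical `/page2/content/` with no author). The hypothetical helper `one_value_per_page` stands in for the `name_view` and `date_view` subqueries, and `dict.get` stands in for the LEFT JOIN. Since `row_number()` there has no `ORDER BY`, which row gets `rn = 1` is not guaranteed; the sketch keeps the first non-null value it sees, which is enough when each page has only one distinct author, as the question states.

```python
from collections import defaultdict

COLS = ("athr_name", "post_date", "page_nm", "visit_date", "visit", "visitors")
RAW = [
    ("Steve", "9/1/2019", "/page1/content/", "20191014", 45, 11),
    ("Steve", "9/1/2019", "/page1/content/", "20191015", 62, 38),
    ("Steve", "9/1/2019", "/page1/content/", "20191016", 28, 49),
    ("Steve", "9/1/2019", "/page1/content/", "20191207", 54, 70),
    ("Steve", "9/1/2019", "/page1/content/", "20191208", 39, 26),
    (None, None, "/page1/content/", "20191014", 28, 24),
    (None, None, "/page1/content/", "20191015", 17, 63),
    (None, None, "/page1/content/", "20191016", 48, 40),
    (None, None, "/page1/content/", "20191017", 47, 14),
    (None, None, "/page1/content/", "20191018", 33, 1),
    # Hypothetical page with no author or post date (not in the question).
    (None, None, "/page2/content/", "20191014", 5, 2),
]
ROWS = [dict(zip(COLS, r)) for r in RAW]


def one_value_per_page(rows, col):
    """Model of name_view / date_view: one non-null value of col per page_nm (rn = 1)."""
    picked = {}
    for r in rows:
        if r[col] is not None:
            # Keep the first non-null value seen for this page.
            picked.setdefault(r["page_nm"], r[col])
    return picked


def collapse(rows):
    """LEFT JOIN each row to its page's author/date, then GROUP BY and sum."""
    names = one_value_per_page(rows, "athr_name")
    dates = one_value_per_page(rows, "post_date")
    totals = defaultdict(lambda: [0, 0])
    for r in rows:
        page = r["page_nm"]
        # LEFT JOIN: dict.get returns None when the page has no match.
        key = (names.get(page), dates.get(page), page, r["visit_date"])
        totals[key][0] += r["visit"]
        totals[key][1] += r["visitors"]
    # Order by page_nm, visit_date (never null) for readable output.
    return sorted(((*k, v, n) for k, (v, n) in totals.items()),
                  key=lambda row: (row[2], row[3]))


if __name__ == "__main__":
    for row in collapse(ROWS):
        print(row)
```

The `/page1/content/` rows match the desired table, and the author-less page comes out as `(None, None, '/page2/content/', '20191014', 5, 2)` instead of borrowing another page's values.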

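No, I believe this would work in SQL, but it doesn't seem to work in Hive. I get the same results with and without the max() statements.

@P5C768. If you can remove the max()s, then your GROUP BY is not the same as the one in this answer.

Thanks @Lyashko, I found that I have some underlying data issues, but I believe this solution will work.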