Hive query: is there a good way to optimize these unions?

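I'm new to Hive and to query optimization in general.

I have 3 unions that are all more or less exactly the same query. The only reason these unions exist is that my source table has no weekend or holiday dates, and for those missing dates I need to carry forward some base values from the previous calendar day that does exist in the source table. The date_add call is really the only point of difference between the 3 unions: 1 day, 2 days, or 3 days.

Is there a way to combine these 3 queries into one, or simply a more efficient way to do this?

I'm a bit stuck, but I have already cut the whole process from 45 minutes down to 4.5 minutes. I just don't know how to optimize these unions. Please help :/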

   UNION ALL 

--ADDING 1 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,1) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1

UNION ALL 

--ADDING 2 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,2) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1

UNION ALL 

--ADDING 3 DAYS To Holidays
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,3) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where

a.portfolio_code IN ('1994',1998,2523)
and  a.year=2020 and a.month=09
and performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000')
) a
where row_no= 1

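If it is exactly as you wrote and the only difference is the date_add parameter, you can take the SQL from one of the unions and cross join it with a union of the constants 1, 2 and 3. A cross join may be more efficient than the unions; it also depends on the number of rows in the source. You can also filter on the row number before doing the cross join, so that fewer rows are joined. In the example below I did not filter on the row number.

The query would look like this: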

SELECT a.portfolio_name, 
       Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS 
       performance_end_date, 
       a.car_return, 
       a.nav, 
       a.nav_id, 
       a.performance_end_date, 
       a.row_no 
FROM   (SELECT a.portfolio_name, 
               -- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date, 
               Cast(0.0000000 AS STRING)   AS car_return, 
               a.nav, 
               a.nav_id, 
               a.performance_end_date, 
               Row_number() 
                 OVER ( 
                   partition BY a.portfolio_code, a.performance_end_date 
                   ORDER BY a.nav_id DESC) AS row_no 
        FROM   carsales a 
        WHERE  a.portfolio_code IN ( '1994', 1998, 2523 ) 
               AND a.year = 2020 
               AND a.month = 09 
               AND Dayofweek(performance_end_date) = 6) a 
       CROSS JOIN (SELECT 1 crs 
                   UNION ALL 
                   SELECT 2 
                   UNION ALL 
                   SELECT 3) crs 

Edit 1: regarding the comment about date1 or date2, you can do it the way you wrote it. In the WHERE clause, put date_column = something OR date_column = something.

SELECT a.portfolio_name, 
       Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS 
       performance_end_date, 
       a.car_return, 
       a.nav, 
       a.nav_id, 
       a.performance_end_date, 
       a.row_no 
FROM   (SELECT a.portfolio_name, 
               -- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date, 
               Cast(0.0000000 AS STRING)   AS car_return, 
               a.nav, 
               a.nav_id, 
               a.performance_end_date, 
               Row_number() 
                 OVER ( 
                   partition BY a.portfolio_code, a.performance_end_date 
                   ORDER BY a.nav_id DESC) AS row_no 
        FROM   carsales a 
        WHERE  a.portfolio_code IN ( '1994', 1998, 2523 ) 
               AND a.year = 2020 
               AND a.month = 09 
               AND (Dayofweek(performance_end_date) = 6 or performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000'))
       ) a 
       CROSS JOIN (SELECT 1 crs 
                   UNION ALL 
                   SELECT 2 
                   UNION ALL 
                   SELECT 3) crs
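
The answer mentions filtering on the row number before the cross join but does not show it. A minimal sketch of that step follows, assuming a Hive version that accepts nested subqueries in FROM; the aliases `c` and `t` are introduced here for illustration only. The duplicate `a.performance_end_date` output column is dropped so that the column list lines up with the original UNION ALL branches. Note that, like the answer's first query, this emits +1, +2 and +3 for every Friday, which is not identical to the original three branches.

```sql
SELECT a.portfolio_name,
       CAST(date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.row_no
FROM   (SELECT t.*
        FROM   (SELECT c.portfolio_name,
                       c.performance_end_date,
                       CAST(0.0000000 AS STRING) AS car_return,
                       c.nav,
                       c.nav_id,
                       row_number() OVER (
                         PARTITION BY c.portfolio_code, c.performance_end_date
                         ORDER BY c.nav_id DESC) AS row_no
                FROM   carsales c
                WHERE  c.portfolio_code IN ('1994', 1998, 2523)
                       AND c.year = 2020
                       AND c.month = 09
                       AND dayofweek(c.performance_end_date) = 6) t
        -- Keep only the latest nav_id per portfolio and date
        -- before the rows are multiplied by the cross join.
        WHERE  t.row_no = 1) a
       CROSS JOIN (SELECT 1 AS crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) crs
```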

In addition to @F.Lazarescu's answer, you can also rewrite the cross join subquery.

Instead of this:

CROSS JOIN (SELECT 1 crs 
                   UNION ALL 
                   SELECT 2 
                   UNION ALL 
                   SELECT 3) crs 

use the stack UDTF, which will perform faster:

CROSS JOIN (SELECT stack(3, 1,2,3) as crs) crs 
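
As a quick check of what that `stack` call yields, assuming a Hive version that allows SELECT without a FROM clause: `stack(n, v1, ..., vk)` spreads its k values over n rows, so three values over three rows gives a single column.

```sql
-- One column (crs), three rows: 1, 2, 3.
SELECT stack(3, 1, 2, 3) AS crs;
```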

To be more specific, for some holidays I may only need to add one row, and it looks like the cross join and stack will take every date and add 3 rows, right? I'm wondering whether there is a way to keep the same setup but be more flexible about how many rows get added depending on the date.

@Amateurhour35 see my post, Edit 1. I incorporated the condition you wanted.

Thank you, but there is one thing where we may have a disconnect. This solution brings in 3 rows for every Friday and for every date I specify. What it does not do is let me specify how many rows to bring in for certain dates, which may need only 1 or 2, if that makes sense. Take Thanksgiving as an example: I only need to carry the day before Thanksgiving forward by 1 day. Add 1, not 3. Something conditional along these lines, at least: cross join (select case when Dayofweek(a.performance_end_date) = 6 then stack 2, when a.performance_end_date in ('2020-09-04 00:00:00.000') then stack 1,3 as crs end) crs

@Amateurhour35 Do you want to parameterize the number of rows generated?
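
One possible way to get a per-date row count, sketched under assumptions rather than taken from either answer: compute how many days each source row should be carried forward, cross join with the offsets 1 to 3, and keep only the offsets up to that count. The column `days_to_fill`, the alias `c`, and the date `'2020-09-16 00:00:00.000'` are hypothetical placeholders introduced for illustration; only `'2020-09-04 00:00:00.000'` comes from the question.

```sql
SELECT a.portfolio_name,
       CAST(date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.row_no
FROM   (SELECT c.portfolio_name,
               c.performance_end_date,
               CAST(0.0000000 AS STRING) AS car_return,
               c.nav,
               c.nav_id,
               -- Hypothetical helper column: number of following calendar
               -- days that have no row of their own in the source table.
               CASE
                 WHEN c.performance_end_date IN ('2020-09-04 00:00:00.000') THEN 3  -- Friday before a Monday holiday
                 WHEN dayofweek(c.performance_end_date) = 6 THEN 2                  -- ordinary Friday
                 WHEN c.performance_end_date IN ('2020-09-16 00:00:00.000') THEN 1  -- placeholder: day before a one-day holiday
                 ELSE 0
               END AS days_to_fill,
               row_number() OVER (
                 PARTITION BY c.portfolio_code, c.performance_end_date
                 ORDER BY c.nav_id DESC) AS row_no
        FROM   carsales c
        WHERE  c.portfolio_code IN ('1994', 1998, 2523)
               AND c.year = 2020
               AND c.month = 09) a
       CROSS JOIN (SELECT stack(3, 1, 2, 3) AS crs) crs
WHERE  a.row_no = 1
       -- Offsets above days_to_fill are dropped, so a row with
       -- days_to_fill = 0 produces no output at all.
       AND crs.crs <= a.days_to_fill
```

The CASE branches are checked in order, so a listed date that is also a Friday gets 3 rows rather than 2. If the source table is large, repeating the Friday and date-list conditions in the inner WHERE clause would cut the rows fed to the window function.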