Hive query: is there a good way to optimize these unions?
I'm not very familiar with Hive or with query optimization in general. I have 3 unions that are all more or less the same query. The only reason the unions exist is that my source table has no rows for weekend or holiday dates, and for those missing dates I need to carry forward some base values from the previous calendar day that does exist in the source table. The date_add call is really the only difference between the 3 unions: it adds 1, 2, or 3 days.

Is there a way to combine these 3 queries into a single query, or otherwise do this more efficiently? I'm a bit stuck. I have already brought the overall process down from 45 minutes to 4.5 minutes, but I don't know how to optimize these unions. Please help.
UNION ALL
--ADDING 1 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,1) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL
--ADDING 2 DAYS TO FRIDAYS--
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,2) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and a.year=2020 and a.month=09
and DAYOFWEEK(performance_end_date) = 6
) a
where row_no= 1
UNION ALL
--ADDING 3 DAYS To Holidays
select * from
(
SELECT a.portfolio_name, cast(date_add(performance_end_date,3) as timestamp) as performance_end_date, cast(0.0000000 as string) as car_return, a.nav, a.nav_id
,row_number() over (partition by a.portfolio_code,a.performance_end_date order by a.nav_id desc) as row_no
FROM carsales a
where
a.portfolio_code IN ('1994',1998,2523)
and a.year=2020 and a.month=09
and performance_end_date in ('2020-09-04 00:00:00.000','2020-10-09 00:00:00.000')
) a
where row_no= 1
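If it is exactly as you wrote and the only difference is the date_add parameter, you can take the SQL from one of the unions and cross join it with a union of the constants 1, 2 and 3. The cross join may be more efficient than the unions; that also depends on the number of rows in the source. You can also filter on the row number before the cross join, so that fewer rows are joined. In the example below I did not filter on the row number. The query would look like this: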
SELECT a.portfolio_name,
       Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.performance_end_date,
       a.row_no
FROM   (SELECT a.portfolio_name,
               -- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date,
               Cast(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               a.performance_end_date,
               Row_number()
                 OVER (
                   partition BY a.portfolio_code, a.performance_end_date
                   ORDER BY a.nav_id DESC) AS row_no
        FROM   carsales a
        WHERE  a.portfolio_code IN ( '1994', 1998, 2523 )
               AND a.year = 2020
               AND a.month = 09
               AND Dayofweek(performance_end_date) = 6) a
       CROSS JOIN (SELECT 1 crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) crs
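The query above keeps every row of the subquery, whereas the unions in the question kept only `row_no = 1`. A minimal sketch of filtering before the cross join, assuming the same `carsales` columns as in the question. The alias `src_performance_end_date` and the inner alias `ranked` are hypothetical names, introduced here only so that the output does not contain two columns called `performance_end_date`:

```sql
SELECT a.portfolio_name,
       CAST(date_add(a.src_performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id
FROM   (SELECT ranked.*
        FROM   (SELECT c.portfolio_name,
                       CAST(0.0000000 AS STRING) AS car_return,
                       c.nav,
                       c.nav_id,
                       c.performance_end_date AS src_performance_end_date,
                       row_number() OVER (PARTITION BY c.portfolio_code, c.performance_end_date
                                          ORDER BY c.nav_id DESC) AS row_no
                FROM   carsales c
                WHERE  c.portfolio_code IN ('1994', 1998, 2523)
                       AND c.year = 2020
                       AND c.month = 09
                       AND dayofweek(c.performance_end_date) = 6) ranked
        -- keep only the row with the highest nav_id per portfolio and date,
        -- so the cross join multiplies as few rows as possible
        WHERE  ranked.row_no = 1) a
       CROSS JOIN (SELECT 1 AS crs
                   UNION ALL
                   SELECT 2 AS crs
                   UNION ALL
                   SELECT 3 AS crs) crs
```

Whether this is faster than the three unions depends on the data volume and the Hive version, so it is worth comparing the plans with `EXPLAIN` before switching.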
Edit 1: regarding the comment about date1 or date2, you can do it the way you wrote it. In the WHERE clause, put date_column = something OR date_column = something:
SELECT a.portfolio_name,
       Cast(Date_add(a.performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id,
       a.performance_end_date,
       a.row_no
FROM   (SELECT a.portfolio_name,
               -- Cast(Date_add(performance_end_date, 1) AS TIMESTAMP) AS performance_end_date,
               Cast(0.0000000 AS STRING) AS car_return,
               a.nav,
               a.nav_id,
               a.performance_end_date,
               Row_number()
                 OVER (
                   partition BY a.portfolio_code, a.performance_end_date
                   ORDER BY a.nav_id DESC) AS row_no
        FROM   carsales a
        WHERE  a.portfolio_code IN ( '1994', 1998, 2523 )
               AND a.year = 2020
               AND a.month = 09
               AND ( Dayofweek(performance_end_date) = 6
                      OR performance_end_date IN ( '2020-09-04 00:00:00.000', '2020-10-09 00:00:00.000' ) )
       ) a
       CROSS JOIN (SELECT 1 crs
                   UNION ALL
                   SELECT 2
                   UNION ALL
                   SELECT 3) crs
In addition to @F.Lazarescu's answer, you can also rewrite the CROSS JOIN subquery. Instead of this:
CROSS JOIN (SELECT 1 crs
UNION ALL
SELECT 2
UNION ALL
SELECT 3) crs
use the stack UDTF, which will run faster:
CROSS JOIN (SELECT stack(3, 1,2,3) as crs) crs
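For readers unfamiliar with `stack`, here is a small sketch of what the call produces on its own, followed by a different way to generate the same offsets with `LATERAL VIEW explode`. The second form is not from the answer above; it is shown only as an alternative under the assumption that the `carsales` columns are as in the question, and the alias `offsets` is a hypothetical name.

```sql
-- stack(n, v1, ..., vk) spreads k values over n rows;
-- with n = 3 and three values this yields three rows of one column: 1, 2, 3
SELECT stack(3, 1, 2, 3) AS crs;

-- Alternative sketch: explode a literal array through a lateral view,
-- which also pairs every Friday row with the offsets 1, 2 and 3
SELECT a.nav_id,
       CAST(date_add(a.performance_end_date, offsets.crs) AS TIMESTAMP) AS performance_end_date
FROM   carsales a
       LATERAL VIEW explode(array(1, 2, 3)) offsets AS crs
WHERE  a.year = 2020
       AND a.month = 09
       AND dayofweek(a.performance_end_date) = 6;
```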
Comments:

To be more specific, for some holidays I may only need to add one row, and it looks like the cross join and stack will take each date and add 3 rows, correct? I'm wondering whether there is a way to keep the same setup but define more flexibly, per date, how many rows need to be added.

@Amateurhour35 see my post, Edit 1. I integrated the condition you wanted.

Thank you, but I think we may be disconnected on one thing. This solution brings in 3 rows for every Friday and for every date I specify. What it doesn't do is let me specify how many rows to bring in for particular dates, some of which may only need 1 or 2, if that makes sense. Take Thanksgiving as an example: I only need to carry the day before Thanksgiving forward by 1 day. Add 1, not 3. Something conditional, roughly along these lines: cross join (select case when Dayofweek(a.performance_end_date) = 6 then stack 2, when a.performance_end_date in ('2020-09-04 00:00:00.000') then stack 1, 3 as crs end) crs

@Amateurhour35 do you want to parameterize the number of rows generated?
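One possible way to get a per-date row count, sketched under assumptions rather than taken from the thread: compute in the subquery how many days each source row should be carried forward, generate the offsets 1 to 3 as before, and keep only the offsets up to that count. The column `days_forward`, the alias `src_performance_end_date`, the date '2020-11-25 00:00:00.000' (the day before Thanksgiving 2020) and the widened month filter are all illustrative and not part of the original question.

```sql
SELECT a.portfolio_name,
       CAST(date_add(a.src_performance_end_date, crs.crs) AS TIMESTAMP) AS performance_end_date,
       a.car_return,
       a.nav,
       a.nav_id
FROM   (SELECT c.portfolio_name,
               CAST(0.0000000 AS STRING) AS car_return,
               c.nav,
               c.nav_id,
               c.performance_end_date AS src_performance_end_date,
               -- days_forward (hypothetical): number of following calendar days
               -- that should reuse this row's values
               CASE
                 WHEN c.performance_end_date IN ('2020-09-04 00:00:00.000',
                                                 '2020-10-09 00:00:00.000') THEN 3
                 WHEN c.performance_end_date IN ('2020-11-25 00:00:00.000') THEN 1
                 WHEN dayofweek(c.performance_end_date) = 6 THEN 2
                 ELSE 0
               END AS days_forward,
               row_number() OVER (PARTITION BY c.portfolio_code, c.performance_end_date
                                  ORDER BY c.nav_id DESC) AS row_no
        FROM   carsales c
        WHERE  c.portfolio_code IN ('1994', 1998, 2523)
               AND c.year = 2020
               AND c.month IN (9, 10, 11)  -- assumption: widened so every listed date can match
       ) a
       CROSS JOIN (SELECT stack(3, 1, 2, 3) AS crs) crs
WHERE  a.row_no = 1
       AND crs.crs <= a.days_forward  -- a row with days_forward = 1 keeps only offset 1
```

With this shape a listed date always gets offsets 1 through its count, so a date marked with 3 produces three rows. That matches the original unions only when the listed holiday dates are themselves Fridays, as the two dates in the question are.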