SQL: How to join on the most recent record?

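I have two tables. Table A contains daily information on corporate bond transactions from 2004 to 2012, and Table B contains bond rating information as of specific dates. I need to join the two tables so that every transaction in Table A is matched with the most recent rating for that bond as of the transaction date.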

Table A: daily_transactions
--------------------------------------------
DATE        |BOND    |PRICE
--------------------------------------------
20110401    |AES     |100
20110402    |AES     |101
20110403    |AES     |102
20110404    |AES     |103
20110401    |BPP     |99
20110402    |BPP     |98


Table B: bond_ratings
--------------------------------------------
DATE        |BOND    |RATING
--------------------------------------------
20110401    |AES     |AAA
20110403    |AES     |BB
20110401    |BPP     |CCC


Table C: joined_data
--------------------------------------------
DATE        |BOND    |PRICE   |RATING
--------------------------------------------
20110401    |AES     |100     |AAA
20110402    |AES     |101     |AAA
20110403    |AES     |102     |BB
20110404    |AES     |103     |BB
20110401    |BPP     |99      |CCC
20110402    |BPP     |98      |CCC

I have about 1,000,000 records in Table A and 14,000 records in Table B.

Update:

This is what I have so far:

create table test_merge as
SELECT a.date, b.date, a.bond, a.price, b.rating
FROM   daily_transactions  a
LEFT   JOIN bond_ratings b ON a.bond = b.bond AND b.date <= a.date
WHERE  NOT EXISTS (
   SELECT 1 FROM bond_ratings b1
   WHERE  b1.bond = a.bond
   AND b1.date <= a.date
   AND b1.date >  b.date
   );
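
It seems to produce the right result, but with the amount of data I have it runs very slowly: about 2 hours. Is there any way to optimize it so that it runs faster?

I am new to SQL and would greatly appreciate any help.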

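I suspect that in your case the correlated subquery is what is killing performance.

The following approach avoids the subquery: it first turns each rating into a validity range (from_date, to_date), so the join becomes a simple range match.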

/*sample data:*/
DATA daily_transactions;
/*INFILE is placed before INPUT so its options are in effect when the first record is read*/
infile datalines dsd delimiter = '|';
informat date yymmdd8.;
format date yymmddn8.;
input date bond $ price;
datalines;
20110401|AES|100
20110402|AES|101
20110403|AES|102
20110404|AES|103
20110401|BPP|99
20110402|BPP|98
;
run;

DATA bond_ratings;
/*INFILE is placed before INPUT so its options are in effect when the first record is read*/
infile datalines dsd delimiter = '|';
informat date yymmdd8.;
format date yymmddn8.;
input date bond $ rating $;
datalines;
20110401|AES |AAA
20110403|AES |BB
20110401|BPP |CCC
;
run;

/*Modify the bond_ratings dataset such that for each record we can specify up till when that rating is valid*/
/*essentially we will have two date fields (from_date, to_date)
from_date   bond    rating  to_date
20110401       AES      AAA     20110402
20110403       AES      BB           .
20110401       BPP      CCC          .
*/

/*since there is no LEAD function in SAS, we sort in descending order by date and apply the LAG function - in effect getting the leading value*/
PROC SORT DATA = bond_ratings OUT = bond_ratings_sorted;
by bond descending date;
run; 
/*capture the to_date by using lag function on the date.*/
data bond_ratings_lookup(rename = (date=from_date));
set bond_ratings_sorted;
by bond descending date;
format to_date yymmddn8.;
/*note: LAG is called on every row, outside the if-else group below, because it returns the value stored by the previous execution of the function; a conditional call would return stale values*/
lag_date = lag(date);
if first.bond and first.date then to_date =.;
else to_date=lag_date-1;/*-1, so that to_date is set to 1 day less the next available bond rating date*/
drop lag_date;
run;
/*this sort is not necessary, but it is useful if you want to verify the output*/
proc sort data = bond_ratings_lookup out = bond_ratings_lookup_sorted;
by bond from_date;
run;

/*final query:*/
proc sql;
create table joined as 
select a.*, b.rating, b.from_date as bond_rating_start_period, b.to_date as bond_rating_end_period
from daily_transactions as a 
left join bond_ratings_lookup_sorted as b
on a.bond = b.bond and
(
b.to_date  ne . and (a.date >=b.from_date and a.date<= b.to_date )
or
b.to_date  = . and (a.date >=b.from_date )
)
order by a.bond, a.date, b.from_date
;
quit;
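
The to_date step above depends on LAG being called on every row. A minimal sketch of why, using a hypothetical three-row dataset (lag_demo and its variable names are made up for illustration and are not part of the original answer):

```sas
/* Hypothetical demo data: x = 1, 2, 3 */
data lag_demo;
    input x;
    datalines;
1
2
3
;
run;

data lag_compare;
    set lag_demo;
    /* Unconditional call: the LAG queue is updated on every row */
    lag_always = lag(x);
    /* Conditional call: this LAG has its own queue, updated only when x is not 2 */
    if x ne 2 then lag_conditional = lag(x);
run;

proc print data = lag_compare;
run;
```

On the third row lag_always is 2, but lag_conditional is 1, because the conditional call was skipped on the second row and its queue still holds the value from the first row.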

I got the run time down to 5 minutes by creating an index on the bond column.
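
The post does not show the statement used or say which table was indexed. A minimal sketch, assuming the index was created on bond_ratings (the table probed by the subquery) with PROC SQL; the composite index follows the suggestion made in the comments, and its name bond_date is an arbitrary choice:

```sas
proc sql;
    /* Simple index on one column: in SAS the index name must match the column name */
    create index bond on bond_ratings(bond);

    /* Composite index on bond and date, as suggested in the comments;
       bond_date is a hypothetical index name */
    create index bond_date on bond_ratings(bond, date);
quit;
```

Whether SAS actually uses an index for a given query depends on the optimizer, so it is worth timing the join with and without it.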


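For an approach that is more SAS-based than SQL-based, you can turn Table B into a SAS format, which may speed things up. A format is just a lookup table that maps anything between START and END to a LABEL. For example, load this table as a format: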

fmtname   |  START       | END         | LABEL
-----------------------------------------------------------
$bondRate |  AES20110401 | AES20110403 | AAA

This maps any text string between START and END to the label. So AES20110402 -> AAA.

Here is the full code, using Table B from above. It assumes DATE is a numeric SAS date; if it is stored as text, use input(DATE, yymmdd8.) to convert it to a number first:

PROC SORT DATA = TABLE_B;
    by bond descending date;
run;

/*Use lag function to get the start and end date on one line*/
data bond_ratings_fmt;
    set TABLE_B;
    by bond descending date;

    START_DT = put(date, yymmddn8.);      *Character date like '20110401' (DATE is a numeric SAS date, so a date format is needed);
    END_DT = put(lag(date)-1, yymmddn8.); *1 day before the start of the next rating;
    *first.bond is the most recent rating for each bond;
    *setting the END_DT to some future date in this case.;
    if first.bond then END_DT= '20991231';

    START = cats(BOND,START_DT);*Cats concatenates and trims spaces, makes AES20110401;
    END = cats(BOND,END_DT);
    LABEL = Rating;
    fmtName='$bondRate';    
run;
*Load the format, using CNTLIN (Control Table In);
proc format cntlin=bond_ratings_fmt;
run;

*Apply the format;
data TableC_withRating (drop=_:);
    set TableA;
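    _DateChar = put(DATE, yymmddn8.);
    *cats trims trailing blanks from BOND so the key matches the START/END values built above;
    Rating = put(cats(BOND,_DateChar),$bondRate.);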
run;

You can get fancier by adding an "other" case to the format; there are many good examples online of using CNTLIN with PROC FORMAT.
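
One way to add such an "other" case is to append a record with HLO='O' to the control dataset before loading it. This is a sketch, assuming bond_ratings_fmt was built as in the code above; the dataset name bond_ratings_fmt_other and the label 'NR' are hypothetical choices:

```sas
/* Append a catch-all record so unmatched keys get a fixed label
   instead of passing the incoming value through */
data bond_ratings_fmt_other;
    set bond_ratings_fmt end=last_row;
    length HLO $1;
    HLO = ' ';              /* ordinary range records */
    output;
    if last_row then do;
        HLO   = 'O';        /* O marks the "other" range */
        START = ' ';
        END   = ' ';
        LABEL = 'NR';       /* hypothetical label for "no rating found" */
        output;
    end;
run;

proc format cntlin=bond_ratings_fmt_other;
run;
```

With this record in place, a transaction dated before the bond's first rating is labelled NR rather than returning the lookup key itself.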

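Comments:

What query have you tried? It would be great to know what you have attempted so far.

@LearningNeverStops, please see the updated question. Thanks.

@Danielfries What database is this?

@hashbrown. As mentioned, I am new to databases and SQL, but the query is executed with SAS.

Good stuff. I don't think the subquery is helping. Creating a composite index on bond and date might work better.

Thanks @sashikanthdardy. I saw your answer after I had found this solution. Thank you for your input and time.

Is there a way to define START and END automatically, so that START is set to the rating date and END is set to the date of the next rating?

Yes - edited to add more code. The first step is very similar to Sashikanth Dareddy's code.

I would definitely recommend adding an "other" record with hlo='o'. Every format should have an other record unless you explicitly want other to pass the incoming value through.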