R merge is adding extra rows

r merge

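I have two very large data frames (x = 379,638 rows, y [routes] = 4,103,141 rows). I want to merge y into x, and I do this using every variable the two data frames have in common. However, even though I am using every variable I can, my merge code still adds roughly 5,000 extra rows (merged df = 384,586 rows). My code uses the default all.x = FALSE, so I'm not sure what is going on here, since I am merging on every variable in each df. Does anyone know what I am doing wrong?

Below are two small samples from my two data frames (just copy and paste them into the console to view them), along with my code.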

df<-merge(x,routes, by=c('hai_dispense_number', 'hai_age', 'sex', 'date_of_claim', 'quantity', 'hai_ddd', 'hai_strength', 'eligibility_end_date', 'hai_atc'))

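It's good that you provided example data sets that can be pasted directly into R. However, with these data sets your problem is not reproducible, because none of the records in them match. Without a reproducible example we can only guess. My guess is that you have duplicates (on the merge keys) in both x and y. That generates all combinations of those records; see, for example, merge(data.frame(a=c(1,1)), data.frame(a=c(1,1,1))).

@JanvanderLaan - nuts, sorry about the example, I should have spotted that. As for your explanation about dups in x and y, I think you're right. Thanks. :)

Try using join from the plyr package. It usually gives more intuitive results.
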
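
The duplicate-key explanation from the comments can be sketched with the same toy data frames. This is a minimal illustration in base R, not the asker's actual data; the names left, right, n_left, n_right, and expected_rows are made up for the example. It shows that merge() pairs every matching row on one side with every matching row on the other, and that the merged row count can be predicted from the per-key counts.

```r
# Toy data frames with a repeated key value (same values as in the comment).
left  <- data.frame(a = c(1, 1))
right <- data.frame(a = c(1, 1, 1))

# merge() joins on the shared column "a" and returns every left/right pairing.
merged <- merge(left, right)
nrow(merged)

# Predict the inner-join row count: for each key present on both sides,
# multiply the number of left rows by the number of right rows, then sum.
n_left  <- table(left$a)
n_right <- table(right$a)
common  <- intersect(names(n_left), names(n_right))
expected_rows <- sum(as.numeric(n_left[common]) * as.numeric(n_right[common]))
expected_rows
```

With two rows on the left and three on the right sharing the key, both counts come out as 2 * 3 = 6, which is how a merge can end up with more rows than either duplicate-free input would give.
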
x<- read.table(header=T, text=" hai_dispense_number sex hai_age eligibility_end_date quantity date_of_claim hai_atc hai_strength hai_ddd
13   PatientHAI0000092   F      42           2011-02-28        9    2010-06-16 N05BA01         2.00    10.0
14   PatientHAI0000092   F      42           2011-02-28        3    2010-06-16 N05CF02         5.00    10.0
41   PatientHAI0000110   F      31           2011-07-31       10    2010-09-09 N05BA12       250.00     1.0
72   PatientHAI0000360   F      58           2014-10-31       30    2010-04-21 N05CF02        10.00    10.0
82   PatientHAI0000360   F      58           2014-10-31       30    2010-07-19 N05CF02        10.00    10.0
111  PatientHAI0000522   M      38           2012-08-31       10    2010-07-06 N05CF01         7.50     7.5
134  PatientHAI0000731   F      28           2010-12-29        7    2010-06-15 N05CF01         7.50     7.5
137  PatientHAI0000731   F      28           2010-12-29       15    2010-08-18 N05BA12       500.00     1.0
139  PatientHAI0000731   F      29           2012-02-12       42    2010-09-10 N05BA12         0.25     1.0
159  PatientHAI0000798   F      41           2011-08-31       14    2010-06-30 N05CF01         7.50     7.5
") 

routes<- read.table(header=T, text="hai_dispense_number sex hai_age quantity date_of_claim hai_atc hai_roa hai_strength hai_ddd eligibility_end_date
1   PatientHAI0217603   F      75       14    2010-04-16 N05BA12     O           0.25    1.00           2016-04-30
2   PatientHAI1614296   F      74       30    2010-04-28 N05CD06     O           1.00    1.00           2015-11-30
3   PatientHAI0408690   F      91       28    2010-04-15 N05BA12     O           0.25    1.00           2013-06-30
4   PatientHAI0050917   M      67       56    2010-04-15 N05BE01     O          10.00   30.00           2020-12-31
5   PatientHAI0143945   M      64       30    2010-04-14 N05BA01     O           5.00   10.00           2010-07-31
8   PatientHAI2149890   M      72       84    2010-04-27 N05BA08     O           1.50   10.00           2011-06-30
10  PatientHAI1903034   F      80       45    2010-04-01 N05CD07     O          20.00   20.00           2020-12-31
11  PatientHAI0205229   F      80       56    2010-04-22 N05CD07     O          20.00   20.00           2020-12-31
13  PatientHAI0317751   F      71       30    2010-04-26 N05CD05     O           0.25    0.25           2016-11-30
14  PatientHAI1986979   M      22       15    2010-04-19 N05BA01     O          10.00   10.00           2012-11-30
") 

natural_dups<- read.table(header=T, text="  hai_dispense_number sex hai_age eligibility_end_date quantity date_of_claim hai_atc hai_strength hai_ddd
1597868  PatientHAI0002446   F      82           2011-08-31       42    2010-08-25 N05BA01            2      10
5495829  PatientHAI0002446   F      83           2011-08-31       30    2010-11-25 N05BA01            2      10
5580466  PatientHAI0002446   F      83           2011-08-31       30    2010-11-05 N05BA01            2      10
5686765  PatientHAI0002446   F      83           2011-08-31       30    2010-12-22 N05BA01            2      10
6146708  PatientHAI0002446   F      83           2011-08-31       30    2011-02-23 N05BA01            2      10
6351254  PatientHAI0002446   F      83           2013-05-31       28    2011-03-23 N05BA01            2      10
6686613  PatientHAI0002446   F      83           2013-05-31       28    2011-05-20 N05BA01            2      10
6686620  PatientHAI0002446   F      83           2013-05-31       28    2011-05-20 N05BA01            2      10
") 

merged_dups<- read.table(header=T, text=" hai_dispense_number hai_age sex date_of_claim quantity hai_ddd hai_strength eligibility_end_date hai_atc hai_roa
184  PatientHAI0002446      83   F    2011-05-20       28      10            2           2013-05-31 N05BA01     O  
185  PatientHAI0002446      83   F    2011-05-20       28      10            2           2013-05-31 N05BA01     O  
186  PatientHAI0002446      83   F    2011-05-20       28      10            2           2013-05-31 N05BA01     O  
")
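
The last two rows of natural_dups are identical on every column, which is the kind of key duplication suspected in the comments. A rough sketch of how such rows could be flagged before merging is given below. It uses a reduced, hypothetical copy of the sample (only three of the merge keys and the last three rows, with the ID written without a space), and the names dups, keys, is_dup, and deduped are made up for the example.

```r
# Hypothetical reduced copy of the sample: three key columns, last three rows.
dups <- data.frame(
  hai_dispense_number = rep("PatientHAI0002446", 3),
  date_of_claim       = c("2011-03-23", "2011-05-20", "2011-05-20"),
  quantity            = c(28, 28, 28),
  stringsAsFactors    = FALSE
)
keys <- c("hai_dispense_number", "date_of_claim", "quantity")

# Flag every row whose key combination occurs more than once.
# duplicated() alone marks only the later copies, so scan from both ends.
is_dup <- duplicated(dups[keys]) | duplicated(dups[keys], fromLast = TRUE)
dups[is_dup, ]

# One possible fix, if the repeats are true duplicates:
# keep a single row per key combination.
deduped <- dups[!duplicated(dups[keys]), ]
nrow(deduped)
```

Whether dropping the repeats is appropriate depends on the data: two identical dispensing records on the same day may be genuine, in which case a unique record identifier would be needed as an extra merge key instead.
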