Efficient programming in R
I have a data frame like the following:
author_id paper_id confirmed author_name1 author_affiliation1 author_name
826 25733 1 Emanuele Buratti Genetic engineering Emanuele Buratti
826 25733 1 Emanuele Buratti International center Emanuele Buratti
826 47276 1 Emanuele Buratti Emanuele Buratti
826 77012 1 Emanuele Buratti Emanuele Buratti
826 77012 1 Emanuele Buratti Emanuele Buratti
826 79468 1 Emanuele Buratti Emanuele Buratti
author_affiliation
Genetic enginereing
The International Centre for Genetic Engineering and Biotechnology, Padriciano 66,
Trieste, Italy
International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34149
Trieste, Italy
Now, for each row, I have to compute the stringdist between author_name and author_name1 (name_dist), and between author_affiliation and author_affiliation1 (aff_dist).
I am using:
name_dist<-vector()
aff_dist<-vector()
for(i in 1:nrow(mer1))
{
name_dist[i]<-stringdist(mer1$author_name1[i],mer1$author_name[i],method="lv")
aff_dist[i]<-stringdist(mer1$author_affiliation1[i],mer1$author_affiliation[i],method="lv")
}
You can vectorize it directly:
i=1:nrow(mer1)
name_dist<-stringdist(mer1$author_name1[i],mer1$author_name[i],method="lv")
aff_dist<-stringdist(mer1$author_affiliation1[i],mer1$author_affiliation[i],method="lv")
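Because stringdist() is vectorized over its two string arguments, the same result can be obtained without any index. A minimal sketch, assuming a small made-up mer1 (the rows below are illustrative, not the asker's data):

```r
library(stringdist)

# Hypothetical stand-in for the asker's mer1 data frame
mer1 <- data.frame(
  author_name1        = c("Emanuele Buratti", "Emanuele Buratti"),
  author_name         = c("Emanuele Buratti", "Emanuel Buratti"),
  author_affiliation1 = c("Genetic engineering", "International center"),
  author_affiliation  = c("Genetic enginereing", "International centre"),
  stringsAsFactors = FALSE
)

# Whole columns go in; element k of the first argument is compared
# with element k of the second, so no loop and no [i] are needed.
name_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")
aff_dist  <- stringdist(mer1$author_affiliation1, mer1$author_affiliation,
                        method = "lv")
```

Both results have one entry per row of mer1, so they can be assigned back as new columns if that is convenient.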
You can use sapply (or some other vectorized approach), as follows:
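library(stringdist)
a = letters[1:5] # stands in for your mer1$author_name1
b = LETTERS[1:5] # stands in for your mer1$author_name
# Compare a[k] with b[k] only; sapply(a, stringdist, b) would compare every a with every b
name_dist = sapply(seq_along(a), function(k) stringdist(a[k], b[k], method="lv"))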
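A general remark about R: many loops are slow because the vector is grown dynamically. Preallocating space by passing a length argument to vector() can gain a lot of efficiency.
This is not quite correct code. You should drop the index entirely.
@MarkvanderLoo Good point. Since we are processing the whole data set here, it is better not to use i and the [i]s.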
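The preallocation remark above can be sketched as follows. This is only an illustration under stated assumptions: the vectors a and b are made-up stand-ins for the two name columns, and base R's utils::adist() (generalized Levenshtein distance) is swapped in for stringdist() so the sketch runs without extra packages.

```r
# Hypothetical stand-ins for mer1$author_name1 and mer1$author_name
a <- c("Emanuele Buratti", "Emanuele Buratti", "Emanuele Buratti")
b <- c("Emanuele Buratti", "Emanuel Buratti", "E. Buratti")

n <- length(a)

# Preallocate the full result instead of growing it inside the loop;
# numeric(n) is equivalent to vector("numeric", n).
name_dist <- numeric(n)

for (i in seq_len(n)) {
  # adist() returns a 1 x 1 matrix here, so take its single entry
  name_dist[i] <- utils::adist(a[i], b[i])[1, 1]
}
```

When the distance function is already vectorized, as stringdist() is, dropping the loop altogether is usually simpler than preallocating.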