How to quickly add a column in R based on some condition?

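I have a data frame with a single column, and I want to generate a second column based on some conditions on the first one. Here is the script I have written so far. It works, but it is very slow because the data has about 500,000 rows.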

 data <- read.table("~/Documents/git_repos/Aspen/Reference_genome/Potrs01-genome_mod_id.txt")
> dim(data) # [1] 509744      1
> head(data)
           V1
1 Potrs000004
2 Potrs000004
3 Potrs000004
4 Potrs000004
5 Potrs000004
6 Potrs000004

test <- paste("Potrs00000", seq(000001,10000,by=1), sep ="")
length(test) # [1] 10000
> head(test)
[1] "Potrs000001" "Potrs000002" "Potrs000003" "Potrs000004" "Potrs000005"
[6] "Potrs000006"

test.m <- matrix("NA", nrow = 509744, ncol = 2 )
dim(test.m) # [1] 509744      2
> head(test.m)
     [,1] [,2]
[1,] "NA" "NA"
[2,] "NA" "NA"
[3,] "NA" "NA"
[4,] "NA" "NA"
[5,] "NA" "NA"
[6,] "NA" "NA"

 for (i in test) {
   for (j in data$V1) {
     if (i == j)
       test.m[,1] = j
       test.m[,2] = "chr9"
      }
    }
test.d <- as.data.frame(test.m)
> head(test.d)
           V1   V2
1 Potrs000004 chr9
2 Potrs000004 chr9
3 Potrs000004 chr9
4 Potrs000004 chr9
5 Potrs000004 chr9
6 Potrs000004 chr9


Is there a way to modify the code to make it faster?

It looks like you want the values of V1 in data that match elements of test. I would do this with data.table:
library(data.table)
setDT(data)
data[,.(V1[V1 %in% test], "chr9")]

Note that the result is already a data.table (which is also a data.frame).
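For comparison, the same filter can be sketched in base R without data.table. This is only a sketch under assumptions: the lookup IDs are built with sprintf zero-padding (the paste call in the question only yields six-digit IDs like those in the data for the first nine values), and the small data frame below is made up for illustration.

```r
# Hypothetical lookup IDs, zero-padded to six digits: "Potrs000001" ... "Potrs010000"
test <- sprintf("Potrs%06d", 1:10000)

# Made-up stand-in for the one-column data frame from the question
data <- data.frame(
  V1 = c("Potrs000004", "A", "Potrs000006", "B", "Potrs000004"),
  stringsAsFactors = FALSE
)

# Vectorised membership test: a single pass instead of two nested loops
keep <- data$V1 %in% test

# Keep only the matching rows, then add the constant second column
result <- data[keep, , drop = FALSE]
result$V2 <- rep("chr9", nrow(result))
print(result)
```

The point of the sketch is that %in% does the matching for the whole column at once, so no explicit loop over the rows or over the lookup vector is needed.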
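Please post sample data and the desired output.
PRO tip: you don't need to specify "NA" in matrix, it is the default, so you can just write matrix(nrow=…, ncol=2)
@VeerendraGadekar, I have added sample data and the desired output.
@upendra you can try library(data.table); setDT(data)[V1 %in% test, V2 := "chr9"]
@VeerendraGadekar's code works fine. Note that the first row of the result of data[V1 %in% test, V2 := "chr9"] is 1: D NA
This also works fine, but I have to write it to another object, data2
@upendra yes, see the comment above. If having NA in data is OK, then the other approach works too. If all of data$V1 is somewhere in test, then your question is moot and you should just write data$V2
@MichaelChirico to me, using := looks like a standard approach. The non-matching values will be NAs, which can easily be removed in the next step.
@VeerendraGadekar if they cannot be removed (without considerable effort) without copying them (see), then why not remove them in the first step, as I did?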
Sample data and output for the data.table answer:
set.seed(10239)
data<-data.frame(V1=sample(c(test[1:10],LETTERS[1:10]),10))
> data
            V1
1            D
2            A
3            E
4  Potrs000006
5  Potrs000001
6  Potrs000007
7  Potrs000008
8  Potrs000003
9            B
10 Potrs000002
setDT(data)
> data[,.(V1[V1 %in% test], "chr9")]
            V1   V2
1: Potrs000006 chr9
2: Potrs000001 chr9
3: Potrs000007 chr9
4: Potrs000008 chr9
5: Potrs000003 chr9
6: Potrs000002 chr9
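
The := alternative discussed in the comments can be sketched as follows. It adds V2 by reference and leaves NA in the rows that do not match; dropping those rows afterwards creates a new object. The tiny data and test values here are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical lookup IDs and a made-up one-column table
test <- sprintf("Potrs%06d", 1:10)
data <- data.table(V1 = c("D", "Potrs000006", "A", "Potrs000001"))

# Add V2 by reference; rows whose V1 is not in `test` are left as NA
data[V1 %in% test, V2 := "chr9"]

# Dropping the NA rows returns a new data.table rather than modifying `data`
data2 <- data[!is.na(V2)]
print(data2)
```

Which variant fits depends on whether the non-matching rows are still needed: := keeps every row and marks the matches, while filtering inside the first call drops the non-matches immediately.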