R: How do I replace strings with numbers based on their position in a data frame?
I have a character vector in the following form:
strings <- c("UUDBK", "KUVEB", "YVCYE")
Similar to @AntoniosK's idea, this uses hashmap to map the strings to their values. hashmap is implemented internally with Rcpp, so it is very fast:
library(hashmap)
library(tidyr)
search_replace = separate_rows(dataframe, searchhere)
search_hash = hashmap(search_replace[,2], search_replace[,1])
search_hash[[strings]]
Result:
> search_hash
## (character) => (numeric)
## [KHUDN] => [+2.000000]
## [KUEBN] => [+2.000000]
## [UGEVB] => [+4.000000]
## [KUVEB] => [+4.000000]
## [IYVEK] => [+8.000000]
## [IHVYV] => [+8.000000]
## [...] => [...]
> search_hash[[strings]]
[1] 8 4 8
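
For readers who cannot install hashmap, the same idea can be sketched in base R with a named vector. This is a different technique from the hashmap package, not the answer's own method. The data frame below is hypothetical: the column name `value`, the comma-separated layout of `searchhere`, and the grouping of strings are assumptions, chosen only so that the lookup reproduces the values shown above.

```r
# Hypothetical lookup table; column names and contents are assumptions.
dataframe <- data.frame(
  value = c(2, 4, 8),
  searchhere = c("KHUDN,KUEBN", "UGEVB,KUVEB", "IYVEK,IHVYV,UUDBK,YVCYE"),
  stringsAsFactors = FALSE
)

# Split each comma-separated cell into its individual strings.
pieces <- strsplit(dataframe$searchhere, ",", fixed = TRUE)

# Named vector: names are the strings, elements are the numbers.
# Each value is repeated once per string found in its cell.
lookup <- setNames(rep(dataframe$value, lengths(pieces)), unlist(pieces))

strings <- c("UUDBK", "KUVEB", "YVCYE")

# Indexing by name returns one value per query string, in query order.
result <- unname(lookup[strings])
result
# [1] 8 4 8
```

Name-based indexing on a vector is a linear scan rather than a hash lookup, so it will likely be slower than hashmap on large tables, but it needs no extra packages.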
Benchmarks:
> OP_func = function(){sapply(as.character(strings), function(x)
    as.numeric(dataframe[grep(x, dataframe$searchhere), 1]))}
Unit: microseconds
                           expr     min       lq      mean   median      uq      max neval
                      OP_func() 121.191 124.9410 190.36472 129.8760 151.193 3370.047   100
 d[d$searchhere %in% strings, ]  36.714  40.6605  52.85093  43.8185  61.583  147.246   100
         search_hash[[strings]]  14.212  18.1590  25.05212  21.5150  29.608   58.820   100
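Also note that @AntoniosK's solution does not work if strings contains duplicates, whereas hashmap returns the correct mapping for each element, in the correct position.
Example: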
> strings_large = sample(search_replace$searchhere, 100, replace = TRUE)
> strings_large
[1] "YVCYE" "KUVEB" "KUYVE" "KHUDN" "KUYVE" "KHUDN" "KUEBN" "UUDBK" "KHUDN" "YVCYE" "IYVEK"
[12] "KUEBN" "KHUDN" "IHBEJ" "YVCYE" "KHUDN" "KUEBN" "UGEVB" "UUDBK" "KUYVE" "KHUDN" "IHBEJ"
[23] "IHVYV" "KUVEB" "IYVEK" "KHUDN" "KHUDN" "KUYVE" "YVCYE" "UUDBK" "KUYVE" "IHVYV" "KUYVE"
[34] "KUEBN" "KUYVE" "UUDBK" "KUYVE" "KUVEB" "KUVEB" "YVCYE" "KUYVE" "KHUDN" "KUVEB" "YVCYE"
[45] "IHBEJ" "YVCYE" "KHUDN" "UUDBK" "KUEBN" "IYVEK" "IHVYV" "UUDBK" "KUYVE" "KUEBN" "YVCYE"
[56] "UGEVB" "YVCYE" "KUYVE" "IHVYV" "KUEBN" "IHVYV" "IHBEJ" "KUVEB" "IHVYV" "KUYVE" "KUEBN"
[67] "IYVEK" "KUVEB" "KUEBN" "UGEVB" "KUEBN" "KUVEB" "IHBEJ" "KUYVE" "YVCYE" "YVCYE" "IHVYV"
[78] "YVCYE" "KHUDN" "KHUDN" "YVCYE" "IYVEK" "KUYVE" "KHUDN" "UGEVB" "YVCYE" "IHVYV" "KUVEB"
[89] "IYVEK" "KUEBN" "UGEVB" "UUDBK" "IYVEK" "IHBEJ" "IHBEJ" "UUDBK" "KUVEB" "UGEVB" "IYVEK"
[100] "IYVEK"
> search_hash[[strings_large]]
[1] 8 4 8 2 8 2 2 8 2 8 8 2 2 2 8 2 2 4 8 8 2 2 8 4 8 2 2 8 8 8 8 8 8 2 8 8 8 4 4 8 8 2 4 8
[45] 2 8 2 8 2 8 8 8 8 2 8 4 8 8 8 2 8 2 4 8 8 2 8 4 2 4 2 4 2 8 8 8 8 8 2 2 8 8 8 2 4 8 8 4
[89] 8 2 4 8 8 2 2 8 4 4 8 8
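
To see why duplicates matter, here is a minimal base-R sketch contrasting `%in%`-style subsetting (the form that appears in the benchmark above) with a positional lookup via `match()`. The table `d` and the vector `strings_dup` are hypothetical, and `match()` stands in for the hashmap lookup; it is not the package's API.

```r
# Hypothetical long-format lookup table (one string per row).
d <- data.frame(
  value = c(2, 4, 8, 8),
  searchhere = c("KHUDN", "KUVEB", "UUDBK", "YVCYE"),
  stringsAsFactors = FALSE
)

# Query with a repeated element, in a different order from the table.
strings_dup <- c("YVCYE", "KUVEB", "YVCYE", "KHUDN")

# %in% filters rows of the table: the result follows table order and
# each matching row appears at most once, so repeats are lost.
by_in <- d[d$searchhere %in% strings_dup, "value"]
by_in
# [1] 2 4 8

# match() returns one table position per query element, in query order,
# so repeats and ordering are preserved.
by_match <- d$value[match(strings_dup, d$searchhere)]
by_match
# [1] 8 4 8 2
```

The `%in%` result has three elements for a four-element query, which is the failure mode described above; the positional lookup returns one value per input string.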
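final only shows UUDBK KUVEB YVCYE 8. Am I missing something?
@AntoniosK Are you referring to the variable final in my original question? That is the desired output: a vector containing the replaced values.
That was a reply to @RichScriven, because he mentioned something earlier. Does my code work for you?
It doesn't work, because my strings contain duplicates, as @useR mentioned above. Thanks for your help!
You're welcome. Happy to help. Your question/example should be representative of your real data; you didn't mention anything about duplicates. However, there is a way to modify the lookup table so that it has unique string values, depending on how you want to handle the duplicates.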
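
As a rough illustration of that last comment, the sketch below shows two possible ways to give a lookup table unique string values in base R. The table `lookup_df` is hypothetical, and both policies (keep the first value, or keep the largest value) are assumptions; the thread does not say which rule the commenter had in mind.

```r
# Hypothetical lookup table in which "KUVEB" appears under two values.
lookup_df <- data.frame(
  value = c(2, 4, 8, 8),
  searchhere = c("KUVEB", "KUVEB", "UUDBK", "YVCYE"),
  stringsAsFactors = FALSE
)

# Policy A (assumed): keep the first value seen for each string.
first_only <- lookup_df[!duplicated(lookup_df$searchhere), ]

# Policy B (assumed): keep the largest value for each string.
# aggregate() returns one row per string, sorted by string.
max_value <- aggregate(value ~ searchhere, data = lookup_df, FUN = max)

first_only$value
# [1] 2 8 8
max_value$value
# [1] 4 8 8
```

Either table can then be used to build the lookup, since each string now maps to exactly one value.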