R: Partial string matching on each row of a data frame


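Using the iris data to illustrate my problem: I want to do a partial match on, say, ".5" and get the index of the position (in my real data, the ".5" is actually the string "_").

I plan to loop over each row, perform the partial match, and get the index of the first match. I have tried the following approaches: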

idx = regexpr(pattern, txt[i,], ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)[1]

idx = regexec(pattern, txt[i,], ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)[1]

gregexpr(pattern, txt[j,], ignore.case = FALSE, perl = FALSE,
          fixed = FALSE, useBytes = FALSE)

stri_locate_first_regex(txt[i,], pattern)

str_detect(txt[i,], pattern)

The sample data look like this:

library(ggplot2)
txt = iris
pattern=".5"

The expected result is the index of the first match.

Have you tried combining which with grepl?
which(grepl("0.5", iris$Petal.Width))[1]

Edit
Following your comment, here is another attempt. It returns a vector of the indices of all rows that contain a partial match:
library(tidyverse)

iris %>%
  mutate(row_index = as.numeric(rownames(.))) %>%
  filter_all(any_vars(grepl("0.5", .))) %>%
  pull()

I am not sure this is the simplest way, though.

Does the following produce the output you want?
grep(pattern, txt[i,])[1]
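
The one-row call above can be extended to all rows at once. The following is only a minimal sketch under one assumption about the question: that "index of the first match" means, for each row, the position of the first column whose value contains the literal text ".5". The names chr, hits and first_col are illustrative and do not come from the original posts.

```r
# Assumption: we want, per row, the index of the first column that
# contains the literal text ".5" (not the regex "any character then 5").
txt <- iris
pattern <- ".5"

# Convert every column to character so numbers and factors are matched as text.
chr <- lapply(txt, as.character)

# Logical matrix: one row per data row, one column per variable.
hits <- sapply(chr, function(x) grepl(pattern, x, fixed = TRUE))

# Index of the first matching column in each row; NA when nothing matches.
first_col <- unname(apply(hits, 1, function(r) which(r)[1]))

head(first_col)
```

Rows without any match come back as NA, so which(!is.na(first_col)) gives the rows that contain the pattern somewhere.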

Replace every value that matches with TRUE:

df <- iris
# [] notation preserves structure
df[] <- lapply(X = df, function(x) {
    grepl(pattern = ".5",
          x = as.character(x),
          fixed = TRUE)
})

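Result
Note
There are many ways to solve this problem. I like this solution because the result is very readable, but I think it is largely a matter of taste.

You can use rapply with grepl to search each cell for the pattern and replace the cell value with TRUE or FALSE. Then use rowSums to add up the TRUE (1) and FALSE (0) cells in each row and check whether there is at least one match (a sum >= 1).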

rowSums(rapply(iris, function(x) grepl(pattern = ".5", x, fixed = TRUE), how = "replace")) >= 1

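Here I assume you want the "." to match a literal period rather than any character (in a regular expression it matches any character unless it is escaped, i.e. \., or fixed = TRUE is set). Also note that if you are searching for a literal string, such as the "_" in your real data, fixed = TRUE will be faster on larger data sets.

And, to use it for subsetting: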
idx <- rowSums(rapply(iris, function(x) grepl(pattern = ".5", x, fixed = TRUE), how = "replace")) >= 1
head(iris[idx, ])

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
8           5.0         3.4          1.5         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
16          5.7         4.4          1.5         0.4  setosa
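
To see why the period needs to be escaped or matched with fixed = TRUE, here is a small sketch on a made-up character vector (x is purely illustrative and not part of the iris data). It compares the three ways of writing the pattern and also shows the character position that regexpr reports, which is one of the functions tried in the question.

```r
# Illustrative input: only the first element contains a literal ".5".
x <- c("1.5", "150", "0.2")

# Unescaped regex: "." matches any character, so "15" in "150" also matches.
regex_hits <- grepl(".5", x)

# Escaped period: only a literal "." followed by "5" matches.
escaped_hits <- grepl("\\.5", x)

# fixed = TRUE: the pattern is treated as plain text, no regex at all.
fixed_hits <- grepl(".5", x, fixed = TRUE)

# regexpr gives the character position of the first match, or -1 if none.
pos <- as.integer(regexpr(".5", x, fixed = TRUE))

print(rbind(regex_hits, escaped_hits, fixed_hits))
print(pos)
```

With the unescaped pattern, "150" is reported as a match, which is usually not what is wanted when searching for a decimal such as ".5".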

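Actually, I am looking for any partial match of ".5" across all columns in each row.

Great answer! If the OP wants something they can use to subset, another option is an lapply statement with pmax (i.e., do.call(pmax, results_from_lapply)).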
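
The lapply with pmax idea from the comment could look roughly like the sketch below. It is one possible reading of that suggestion, not the commenter's own code; the names hits and idx are illustrative, and the result is wrapped in as.logical so it can be used directly for subsetting.

```r
# One logical vector per column: TRUE where the cell contains a literal ".5".
hits <- lapply(iris, function(x) grepl(".5", as.character(x), fixed = TRUE))

# pmax takes the element-wise maximum across the columns, so a row is
# flagged as soon as any one column matched. unname() keeps the column
# names from being passed to pmax as argument names.
idx <- as.logical(do.call(pmax, unname(hits)))

# Row numbers of the first few matching rows, then the rows themselves.
head(which(idx))
head(iris[idx, ])
```

This flags the same rows as the rowSums(...) >= 1 approach; which of the two reads better is mostly a matter of preference.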