How to count the number of "," per column in R


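I have a strange question. Given some sentences, I want to count how many "," each sentence contains, and create a new variable number equal to the number of "," plus 1. How can I do that? The result should look something like this:

The sample data can be generated with the following code: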

df<-structure(list(Outcome = c("Happy, New", "Year, to, you", "this", 
"is, a , very", "strange, question")), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

It is easier to count the number of words using str_count
library(stringr)
library(dplyr)
df %>% 
    mutate(Number = str_count(Outcome, "\\w+"))

-output
# A tibble: 5 x 2
#  Outcome           Number
#  <chr>              <int>
#1 Happy, New             2
#2 Year, to, you          3
#3 this                   1
#4 is, a , very           3
#5 strange, question      2


Or, in base R, use strsplit and lengths
df$Number <- lengths(strsplit(df$Outcome, ",\\s*"))
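
Splitting and counting separators can disagree on edge cases, which matters if the data may contain trailing commas or empty strings. The following is a minimal sketch on a made-up vector x (not part of the original data) comparing the two approaches:

```r
# Hypothetical edge cases: a normal string, a trailing comma, an empty string.
x <- c("Happy, New", "trailing,", "")

# strsplit() drops a trailing empty field and returns character(0) for "".
n_split <- lengths(strsplit(x, ",", fixed = TRUE))

# Counting the commas and adding 1 treats every comma as a separator.
n_comma <- nchar(gsub("[^,]", "", x)) + 1L

n_split
# [1] 2 1 0
n_comma
# [1] 2 2 1
```

Which result is "right" depends on whether an empty field after a trailing comma should count, so it is worth checking which rule the data needs.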

# Remove all characters except commas and count
nchar(gsub('[^,]', '', df$Outcome)) + 1
#[1] 2 3 1 3 2
Another base R option is to use lengths + gregexpr, e.g.
transform(
  df,
  Number = lengths(gregexpr("\\w+", Outcome))
)

gives
            Outcome Number
1        Happy, New      2
2     Year, to, you      3
3              this      1
4      is, a , very      3
5 strange, question      2
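
One caveat with lengths(gregexpr(...)): when there is no match, gregexpr returns -1, which still has length 1. This makes no difference for the sample data, but a string with no words would be counted as 1. A small sketch, assuming a made-up vector x that includes an empty string, shows how regmatches avoids this:

```r
# Hypothetical input that includes an empty string.
x <- c("Happy, New", "this", "")

# gregexpr() returns -1 for "no match", so lengths() still reports 1.
raw_count <- lengths(gregexpr("\\w+", x))

# regmatches() keeps only the real matches, so the empty string yields 0.
word_count <- lengths(regmatches(x, gregexpr("\\w+", x)))

raw_count
# [1] 2 1 1
word_count
# [1] 2 1 0
```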

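The count.fields function in base R is used by functions such as read.table to determine the number of columns needed for the resulting data.frame. You can use it here as well, although count.fields is designed to work with files or connections: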
count.fields(textConnection(df$Outcome), ",")
# [1] 2 3 1 3 2
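
Because count.fields is meant for reading files, its defaults treat quotes and # specially, so free text containing apostrophes or # may be miscounted. The wrapper below is only a sketch: the name count_fields_chr and the test strings are made up for illustration. It turns those defaults off and closes the text connection afterwards.

```r
# Hypothetical helper: count comma-separated fields in a character vector.
count_fields_chr <- function(x, sep = ",") {
  con <- textConnection(x)
  on.exit(close(con))            # textConnection() opens the connection, so close it
  count.fields(
    con,
    sep = sep,
    quote = "",                  # treat ' and " as ordinary characters
    comment.char = "",           # treat # as an ordinary character
    blank.lines.skip = FALSE     # do not skip blank strings
  )
}

res <- count_fields_chr(c("Happy, New", "it's, a, test", "a # b, c"))
res
# [1] 2 3 2
```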

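Given that this function is used so often, it is quite efficient. However, if you are working with a very large vector of strings, you may want to use stri_count_fixed from the stringi package.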

Here are some tests:
fun_cf <- function(x = df$Outcome) count.fields(textConnection(x), ",")
fun_gs <- function(x = df$Outcome) nchar(gsub('[^,]', '', x)) + 1
fun_sc <- function(x = df$Outcome) stringr::str_count(x, ",") + 1
fun_ss <- function(x = df$Outcome) lengths(strsplit(x, ",", TRUE))
fun_scf <- function(x = df$Outcome) stringi::stri_count_fixed(x, ",") + 1

string <- rep(c(df$Outcome, paste(df$Outcome, df$Outcome, sep = ",")), 1e5)
length(string)
# [1] 1000000

bench::mark(fun_cf(string), fun_gs(string), fun_sc(string),
            fun_ss(string), fun_scf(string))
## # A tibble: 5 x 13
##   expression           min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
##   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
## 1 fun_cf(string)  792.64ms 792.64ms     1.26     11.6MB     0        1     0
## 2 fun_gs(string)     5.28s    5.28s     0.189    19.1MB     0        1     0
## 3 fun_sc(string)  840.17ms 840.17ms     1.19     11.4MB     1.19     1     1
## 4 fun_ss(string)  830.35ms 830.35ms     1.20     11.4MB     0        1     0
## 5 fun_scf(string) 154.86ms 155.44ms     6.24     11.4MB     1.56     4     1
## # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
## #   time <list>, gc <list>

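What if the text between the "," is not a single word but two or three words, like "this is, a strange question"?
@Stataq You can count the "," instead, i.e. df %>% mutate(Number = str_count(Outcome, ",") + 1)
or, if there are spaces as well, df %>% mutate(Number = str_count(Outcome, "[,]") + 1)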
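
To make the difference concrete, here is a base R sketch (the vector x is made up, and base functions stand in for str_count) that contrasts counting words with counting comma-separated fields when a field contains several words:

```r
# Hypothetical input where one field holds more than one word.
x <- c("this is, a strange question", "Happy, New", "this")

# Word count: split on runs of non-word characters.
n_words <- lengths(strsplit(x, "\\W+"))

# Field count: number of literal commas (length lost when they are removed) plus 1.
n_fields <- nchar(x) - nchar(gsub(",", "", x, fixed = TRUE)) + 1L

n_words
# [1] 5 2 1
n_fields
# [1] 2 2 1
```

The two counts agree only when every field is a single word, which is why counting the separator is the safer choice for the question as asked.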