R: Ordering and evaluating repeated values in a vector

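I am trying to create a variable that identifies whether a string in a vector is its first occurrence, falls within its first three occurrences, or comes after its third occurrence. For example:

In the data set below I have name (there will be more names), text, and a dup variable. I want dup to identify whether the text is occurring for the first time (origin), is within its first three occurrences (FirstThree), or has occurred more than three times (MoreThanThree). I also need to do this for each person... but I think I can figure that part out. Thanks in advance for your help.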

name =c("T","T","T","T","T","T","T","T","T","T")
text =c("a","b","a","a","b","c","a","a","b","a")
dup =c("origin","origin","FirstThree","FirstThree","FirstThree","origin","MoreThanThree","MoreThanThree","FirstThree","MoreThanThree")
dfA = data.frame(name,text,dup)

   name text           dup
1     T    a        origin
2     T    b        origin
3     T    a    FirstThree
4     T    a    FirstThree
5     T    b    FirstThree
6     T    c        origin
7     T    a MoreThanThree
8     T    a MoreThanThree
9     T    b    FirstThree
10    T    a MoreThanThree

You can use data.table::rowid together with two ifelse checks:

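library(data.table)
setDT(dfA)  # := needs a data.table, not a plain data.frame

# rowid(text) numbers each occurrence of a value within the current name group
dfA[, ict := {
        r <- rowid(text)
        ifelse(r == 1, 'origin', 
        ifelse(r <= 3, 'FirstThree', 
               'MoreThanThree'))}
    , by = name]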

dfA
#     name text           dup           ict
#  1:    T    a        origin        origin
#  2:    T    b        origin        origin
#  3:    T    a    FirstThree    FirstThree
#  4:    T    a    FirstThree    FirstThree
#  5:    T    b    FirstThree    FirstThree
#  6:    T    c        origin        origin
#  7:    T    a MoreThanThree MoreThanThree
#  8:    T    a MoreThanThree MoreThanThree
#  9:    T    b    FirstThree    FirstThree
# 10:    T    a MoreThanThree MoreThanThree
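
To see what the classification is built on, it can help to look at the running occurrence counter on its own. The sketch below is not the answer's code: it swaps in base R's ave() with seq_along to get the same kind of per-value counter without loading any package, and the variable names occurrence and dup_check are made up for illustration. It ignores name because the sample data has only one person.

```r
# Same text vector as in the question
text <- c("a", "b", "a", "a", "b", "c", "a", "a", "b", "a")

# Running counter: the k-th time a value appears gets the number k.
# ave() splits the positions by value and numbers each group 1, 2, 3, ...
occurrence <- ave(seq_along(text), text, FUN = seq_along)

# Same two nested ifelse checks as above, applied to the counter
dup_check <- ifelse(occurrence == 1, "origin",
             ifelse(occurrence <= 3, "FirstThree",
                    "MoreThanThree"))

print(data.frame(text, occurrence, dup_check))
```

For this vector the counter comes out as 1 1 2 3 2 1 4 5 3 6, which is why rows 7, 8 and 10 (the fourth, fifth and sixth "a") are labelled MoreThanThree.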

With dplyr, we can compare row_number() inside a case_when statement:

library(dplyr)

# Use group_by(name, text) instead to restart the count for each person
dfA %>%
  group_by(text) %>%
  mutate(row = row_number(), 
         dup = case_when(row == 1 ~ "origin", 
                         row <= 3 ~ "FirstThree", 
                         TRUE ~ "MoreThanThree"))

#   name  text    row dup          
#   <fct> <fct> <int> <chr>        
# 1 T     a         1 origin       
# 2 T     b         1 origin       
# 3 T     a         2 FirstThree   
# 4 T     a         3 FirstThree   
# 5 T     b         2 FirstThree   
# 6 T     c         1 origin       
# 7 T     a         4 MoreThanThree
# 8 T     a         5 MoreThanThree
# 9 T     b         3 FirstThree   
#10 T     a         6 MoreThanThree

Nice! I didn't know about rowid and would still have used seq_len(.N) with by = .(name, text) for this, or, in base R:
with(dfA, cut(ave(seq_along(text), text, name, FUN = seq_along), c(0, 1, 3, Inf), labels = c('origin', 'FirstThree', 'MoreThanThree')))

I didn't know that use of ave. I got the same solution with data.table:
dt[, dup_cut := cut(x = 1:.N, breaks = c(0, 1, 3, Inf), include.lowest = TRUE, labels = c("origin", "FirstThree", "MoreThanThree")), by = .(name, text)]
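
The cut() idea from these comments can be tried in base R alone. The following is a minimal sketch under stated assumptions: the second person "S" and the short vectors are invented here to show the count restarting per name, and they are not part of the question's data.

```r
# Hypothetical two-person sample (not from the question)
name <- c("T", "T", "T", "T", "T", "S", "S")
text <- c("a", "a", "a", "a", "b", "a", "a")

# Occurrence counter within each name/text combination
occurrence <- ave(seq_along(text), name, text, FUN = seq_along)

# Intervals are (0,1], (1,3], (3,Inf]: 1 is the first occurrence,
# 2 and 3 are the next two, anything above 3 is "more than three"
dup <- cut(occurrence,
           breaks = c(0, 1, 3, Inf),
           labels = c("origin", "FirstThree", "MoreThanThree"))

print(data.frame(name, text, occurrence, dup))
```

Because the grouping includes name, the first "a" for S is labelled origin even though T has already used "a" four times. Note that cut() returns a factor, so wrap it in as.character() if a character column is wanted.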