R: Parsing text with a delimiter based on the text structure


My data frame:

>datasetM
                                 Mean
ENSORLG00000001933:tex11     2500.706       
ENSORLG00000010797:         44225.330       
ENSORLG00000003008:pabpc1a  11788.555       
ENSORLG00000001973:sept6     3100.493      
ENSORLG00000000997:          5418.796
Desired output:

>out
[1] "tex11" "ENSORLG00000010797" "pabpc1a" "sept6" "ENSORLG00000000997"
I tried this, but it only retrieves the part before the delimiter:

titles <- rownames(datasetM)
vapply(strsplit(titles,":"), `[`, 1, FUN.VALUE=character(1))

Here is a solution using base R:

sapply(strsplit(rownames(df), ":"), function(x) x[length(x)])
# [1] "tex11"              "ENSORLG00000010797" "pabpc1a"            "sept6"             
# [5] "ENSORLG00000000997"
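
To see why taking the last piece works, it may help to look at what strsplit returns for each kind of row name. The sketch below uses two sample identifiers copied from the question; the variable names ids, parts and last_piece are illustrative only. A delimiter at the end of a string does not produce a trailing empty element, so the last element is the gene symbol when one is present and the identifier otherwise.

```r
# Two sample row names from the question (one with a symbol, one without)
ids <- c("ENSORLG00000001933:tex11", "ENSORLG00000010797:")

# Split on the literal colon; a trailing match yields no empty last element
parts <- strsplit(ids, ":", fixed = TRUE)

# Number of pieces per row name: 2 when a symbol follows the colon, 1 otherwise
n_pieces <- lengths(parts)

# Type-safe variant of the sapply() call: always returns a character vector
last_piece <- vapply(parts, function(x) x[length(x)], character(1))
print(last_piece)
```

Using vapply instead of sapply is a design choice rather than a requirement; it guarantees a character vector even for empty input.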
Another solution with sub, which may be simpler:

sub("^\\w+:(?=\\w)|:", "", rownames(df), perl = TRUE)
# [1] "tex11"              "ENSORLG00000010797" "pabpc1a"            "sept6"             
# [5] "ENSORLG00000000997"
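
The pattern combines two alternatives: the first removes the identifier and colon only when a word character follows (the lookahead), and the second removes a lone colon. As a rough illustration, the same result can be obtained in two separate steps; the names ids, step1, step2 and one_step are assumptions made for this sketch.

```r
# Two sample row names from the question
ids <- c("ENSORLG00000001933:tex11", "ENSORLG00000010797:")

# Step 1: drop "identifier:" only when a gene symbol follows the colon
step1 <- sub("^\\w+:(?=\\w)", "", ids, perl = TRUE)

# Step 2: drop a colon left at the end of the string (no symbol present)
step2 <- sub(":$", "", step1)

# The one-line pattern from the answer, for comparison
one_step <- sub("^\\w+:(?=\\w)|:", "", ids, perl = TRUE)

print(step2)
print(identical(step2, one_step))
```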
Data:

df = read.table(text = "                                 Mean
ENSORLG00000001933:tex11     2500.706       
ENSORLG00000010797:         44225.330       
ENSORLG00000003008:pabpc1a  11788.555       
ENSORLG00000001973:sept6     3100.493      
ENSORLG00000000997:          5418.796", header = TRUE, row.names = 1)

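Here is a vectorized approach that uses a regular expression to identify the last character of each row name. Row names that do not end in a colon are replaced by the text after the colon: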

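# TRUE for row names whose last character is a colon
ends_with_colon <- sub('.*(?=.$)', '', rownames(df), perl = TRUE) == ':'
# For the remaining row names, keep only the text after the colon
rownames(df)[!ends_with_colon] <- sub('.*:', '', rownames(df)[!ends_with_colon])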
Data

dput(df)
structure(list(V2 = c(2500.706, 44225.33, 11788.555, 3100.493, 
5418.796)), .Names = "V2", row.names = c("tex11", "ENSORLG00000010797:", 
"pabpc1a", "sept6", "ENSORLG00000000997:"), class = "data.frame")

Note that you can then remove the remaining colons from the row names with:

rownames(df) <- sub(':', '', rownames(df))
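
Put together, the two steps of this answer might look like the sketch below. It builds a small two-row data frame from values in the question, and it swaps in base R's endsWith (available since R 3.3.0) for the last-character regular expression; that substitution is an assumption of this sketch, not what the answer itself uses.

```r
# Small data frame with two of the rows from the question
df <- data.frame(
  Mean = c(2500.706, 44225.330),
  row.names = c("ENSORLG00000001933:tex11", "ENSORLG00000010797:")
)

# Flag row names that end in a colon (no gene symbol available)
ends_with_colon <- endsWith(rownames(df), ":")

# Keep only the symbol where one exists
rownames(df)[!ends_with_colon] <- sub(".*:", "", rownames(df)[!ends_with_colon])

# Strip the trailing colon from the identifiers that remain
rownames(df) <- sub(":", "", rownames(df), fixed = TRUE)

print(rownames(df))
```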

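So when there is nothing after the colon, you need the ENSORLG identifier, is that right?

Yes, exactly. I hope that is clear.

These are all row names, right? Not a column?

Yes, sorry for not mentioning it. I will update the question.

I tried another approach:
sapply(strsplit(rownames(df), split = ":"), function(x) { ifelse(length(x) == 2, x[2], x[1]) })

@amrrs Sure, that works too, but don't you think mine is simpler? :)

Indeed, which is why I did not add it as a separate answer :p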