R: Adding a vector-valued column to a data frame gives a confusing summary(df)
I have a vector of strings, each of which is a comma-separated list of ids. I want to split each string into a list and store both the length and the set of ids as two new columns in the data frame. Here is an example:
df = data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""), stringsAsFactors=FALSE)
ids = sapply(df$ids, function (s) unlist(strsplit(as.character(s), ",")))
df$num.ids = sapply(ids, length)
df$ids.vec = sapply(ids, unlist)
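As an aside, `strsplit()` is already vectorized and returns a list with one character vector per input string, so the `sapply()`/`unlist()` wrapping is not strictly needed. Below is a minimal sketch that builds the same two columns, assuming R >= 3.2.0 (where `lengths()` is available). It is an alternative construction, not the asker's code:

```r
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per input string;
# an empty string "" yields character(0)
ids <- strsplit(df$ids, ",")

# lengths() returns the length of each list element as an integer vector
df$num.ids <- lengths(ids)

# Assigning a list of the right length with $<- stores it as a list column
df$ids.vec <- ids

str(df)
```

Unlike the `sapply()` version, this list is unnamed, so `str()` will not show the `a,b,c`-style element names. The column is still a list, so `summary()` behaves the same way.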
So far, this looks fine:
> df
    ids num.ids ids.vec
1 a,b,c       3 a, b, c
2     d       1       d
3     e       1       e
4             0
5   f,g       2    f, g
6             0
7     h       1       h
8     i       1       i
9             0
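But when I type summary(df), I get a strange-looking column for ids.vec. More importantly, summary does not compute a summary for it; it lists every row instead (which is a problem when I apply it to my real dataset).
Any idea what I'm doing wrong?
Thanks!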
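Kevin

You are not doing anything wrong. As @joran mentioned, the real question is what information you expect to get from summary(). What you are seeing is a combination of two summaries: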
# df1 is df less ids.vec; df2 is only ids.vec
df1 <- df[,names(df) != "ids.vec"]
df2 <- df[,names(df) == "ids.vec"]
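The split above works because `summary()` on a data frame summarizes each column separately, and for a list column it produces one Length/Class/Mode row per element, as `summary(df2)` shows below. A minimal sketch that detects list columns and summarizes only the atomic ones (the name `is_list_col` is a hypothetical helper variable, not part of the original answer):

```r
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)
ids <- strsplit(df$ids, ",")
df$num.ids <- lengths(ids)
df$ids.vec <- ids

# Named logical vector: TRUE for columns stored as lists
is_list_col <- vapply(df, is.list, logical(1))
print(is_list_col)

# Summary of the atomic columns only, the same content as summary(df1)
print(summary(df[!is_list_col]))
```

Selecting columns with a logical vector keeps the result a data frame, so this scales to any number of list columns without naming them.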
> summary(df1) # summary for a data frame
     ids               num.ids
 Length:9           Min.   :0
 Class :character   1st Qu.:0
 Mode  :character   Median :1
                    Mean   :1
                    3rd Qu.:1
                    Max.   :3
> summary(df2) # summary for a list
      Length Class  Mode
a,b,c 3      -none- character
d     1      -none- character
e     1      -none- character
      0      -none- character
f,g   2      -none- character
      0      -none- character
h     1      -none- character
i     1      -none- character
      0      -none- character
Also note that df2 is a list:
> str(df2)
List of 9
$ a,b,c: chr [1:3] "a" "b" "c"
$ d : chr "d"
$ e : chr "e"
$ : chr(0)
$ f,g : chr [1:2] "f" "g"
$ : chr(0)
$ h : chr "h"
$ i : chr "i"
$ : chr(0)
and that this list is one column of the original data frame:
> str(df)
'data.frame': 9 obs. of 3 variables:
$ ids : chr "a,b,c" "d" "e" "" ...
$ num.ids: int 3 1 1 0 2 0 1 1 0
$ ids.vec:List of 9
..$ a,b,c: chr "a" "b" "c"
..$ d : chr "d"
..$ e : chr "e"
..$ : chr
..$ f,g : chr "f" "g"
..$ : chr
..$ h : chr "h"
..$ i : chr "i"
..$ : chr
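What exactly were you expecting? You have added a column to your data frame that is a list, not an atomic vector. That is going to make things look a bit "weird".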
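If the goal is a compact summary of the list column itself, one option is to summarize atomic vectors derived from it: the number of ids per row and how often each id occurs. A sketch, assuming the data from the question (the name `id_counts` is illustrative only):

```r
df <- data.frame(ids = c("a,b,c", "d", "e", "", "f,g", "", "h", "i", ""),
                 stringsAsFactors = FALSE)
df$ids.vec <- strsplit(df$ids, ",")

# A list column is a list, not an atomic vector
print(c(is_list = is.list(df$ids.vec), is_atomic = is.atomic(df$ids.vec)))

# Per-row counts: an ordinary integer vector that summary() handles normally
print(summary(lengths(df$ids.vec)))

# Frequency of each id across all rows
id_counts <- table(unlist(df$ids.vec))
print(id_counts)
```

Both derived vectors are atomic, so `summary()` and `table()` return aggregate results rather than one line per row.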