R: tidying a data set by gathering multiple columns?
I would like to tidy my data set by reshaping data that looks like this:
age gender education previous_comp_exp tutorial_time qID.1 time_taken.1 qID.2 time_taken.2
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343 sor8 16.15616
into this:
age gender education previous_comp_exp tutorial_time qID time_taken
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206
18 Male Undergraduate casual gamer 62.17926 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343
24 Male Undergraduate casual gamer 85.01288 sor8 16.15616
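I have tried using gather(), but I could only get it to work for one column, and I keep getting the following warning:
Warning message: attributes are not identical across measure variables;
they will be dropped
Any ideas?

Use melt from data.table (see ?patterns):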
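A minimal sketch of such a melt call, assuming the data frame is named df and holds the values from the dput() in this thread. The regular expressions and the output column names are chosen here for illustration; only melt() and patterns() come from the answer itself.

```r
library(data.table)

# Rebuild the example data (values taken from the dput() in this thread)
df <- data.frame(
  age = c(18L, 24L),
  gender = "Male",
  education = "Undergraduate",
  previous_comp_exp = "casual_gamer",
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = "sor9", time_taken.1 = c(39.61206, 50.92343),
  qID.2 = "sor8", time_taken.2 = c(19.4892, 16.15616),
  stringsAsFactors = TRUE
)

# patterns() groups the measure columns by regex, one group per output column;
# as.data.table() makes a copy, so df itself is left unchanged
long <- melt(
  as.data.table(df),
  measure.vars = patterns("^qID", "^time_taken"),
  value.name = c("qID", "time_taken")
)
print(long)
```

The extra variable column records which column pair (1 or 2) each row came from.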
Result:
age gender education previous_comp_exp tutorial_time variable qID time_taken
1: 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206
2: 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343
3: 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920
4: 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616
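Or using tidyr:
library(dplyr)
library(tidyr)
df %>%
  # stack the four measure columns into key/value pairs
  gather(variable, value, qID.1:time_taken.2) %>%
  # strip the ".1"/".2" suffix so the keys become "qID" and "time_taken"
  mutate(variable = sub("\\.\\d$", "", variable)) %>%
  # number the rows within each key so spread() has a unique identifier
  group_by(variable) %>%
  mutate(ID = row_number()) %>%
  # spread back to one column per key; convert = TRUE restores the types
  spread(variable, value, convert = TRUE) %>%
  select(-ID)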
Result:
# A tibble: 4 x 7
age gender education previous_comp_exp tutorial_time qID time_taken
<int> <fctr> <fctr> <fctr> <dbl> <chr> <dbl>
1 18 Male Undergraduate casual_gamer 62.17926 sor9 39.61206
2 18 Male Undergraduate casual_gamer 62.17926 sor8 19.48920
3 24 Male Undergraduate casual_gamer 85.01288 sor9 50.92343
4 24 Male Undergraduate casual_gamer 85.01288 sor8 16.15616
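In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer(), which can do this reshape in one step. This is a sketch, not part of the original answer: the column name set is a hypothetical label for the numeric suffix, and the data is rebuilt from the values in this thread.

```r
library(tidyr)

# Rebuild the example data (values taken from the dput() in this thread)
df <- data.frame(
  age = c(18L, 24L),
  gender = "Male",
  education = "Undergraduate",
  previous_comp_exp = "casual_gamer",
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = "sor9", time_taken.1 = c(39.61206, 50.92343),
  qID.2 = "sor8", time_taken.2 = c(19.4892, 16.15616),
  stringsAsFactors = TRUE
)

# ".value" tells pivot_longer() that the part of each name before the "."
# ("qID" or "time_taken") becomes an output column; the part after the "."
# goes into the (hypothetically named) "set" column
long <- pivot_longer(
  df,
  cols = qID.1:time_taken.2,
  names_to = c(".value", "set"),
  names_sep = "\\."
)
print(long)
```

Rows come out grouped by the original row, so each person's two question records stay adjacent.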
Data:
df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L,
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L,
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926,
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"),
time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L,
1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892,
16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp",
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))
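In base R, you can use the powerful reshape() function to convert the data from wide to long format in a single statement:
# dx is the wide data frame (the same data that the other answer calls df)
reshape(dx, direction = "long",
        # one vector of column positions per output variable
        varying = list(grep("qID", colnames(dx)),
                       grep("time_taken", colnames(dx))),
        v.names = c("qID", "time_taken"))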
age gender education previous_comp_exp tutorial_time time qID time_taken id
1.1 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206 1
2.1 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343 2
1.2 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920 1
2.2 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616 2
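Comments:
This is not an error, it is a warning telling you that the two columns being stacked have different attributes (probably both are factors, but with different levels), so those attributes are dropped in the output. Given that you have pairs of columns to stack, this may help: Reshaping wide to long.
Very nice use of melt, @user.
In the tidyr approach, the age variable disappears. Any idea why?
@stenfeio Thanks for catching that! I had actually read the data in incorrectly. Because I used read.table, casual and gamer were treated as separate columns, and the first column became the row names. It did not throw an error because the number of columns happened to match. See my edit.
I didn't know about the patterns function, nice one.
@user One suggestion on your solution: you can use convert=TRUE in spread instead of setting the types manually in mutate afterwards. I think you also read the data in incorrectly. You can use the new dput in my answer.
@user Bingo. Fixed now.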
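The warning discussed in the comments can be illustrated in base R. This sketch uses a hypothetical two-column miniature of the question's data, not the full data set, and shows one possible way to avoid the attribute mismatch by converting factors to character before stacking.

```r
# Hypothetical miniature of the question's data: two factor columns
# whose level sets differ ("sor9" vs "sor8")
df <- data.frame(
  qID.1 = factor(c("sor9", "sor9")),
  qID.2 = factor(c("sor8", "sor8"))
)

# The attributes (here, the levels) are not identical, which is what
# gather() warns about before dropping them
same_attrs <- identical(attributes(df$qID.1), attributes(df$qID.2))
print(same_attrs)

# Converting factor columns to character removes the mismatch
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)

# Stacking the two character columns now loses no information
stacked <- c(df$qID.1, df$qID.2)
print(stacked)
```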