How to convert from long format to wide format in R
I need a data frame, df_wide, with the following columns:
userID SAT GRE task_conf task_chall active_conf active_chall sleep_conf sleep_chall morn_conf morn_chall
30798 A 1400 2 3 5 2 6 1 4 2
30895 A 1200 6 2 5 3 5 2 5 3
32678 B 1000 5 3 6 3 6 2 5 2
34679 A 1300 4 3 4 2 6 1 6 3
35999 A 1400 2 2 2 2 2 2 2 2
Some information about the variables:
The variables '_conf' and '_chall' contain integer values between 1 and 6
'userID's can be factors or integers but they are not continuous numbers
SAT represents the grade of that 'userID'
GRE represents the score of that 'userID'
SAT and GRE always stay the same for a given 'userID'
My original data, df_long, is currently in the following format:
userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 active conf 5
30798 A 1400 active chall 2
30798 A 1400 sleep conf 6
30798 A 1400 sleep chall 1
30798 A 1400 morn conf 4
30798 A 1400 morn chall 2
30895 A 1200 task conf 6
30895 A 1200 task chall 2
30895 A 1200 active conf 5
30895 A 1200 active chall 3
30895 A 1200 sleep conf 5
30895 A 1200 sleep chall 2
30895 A 1200 morn conf 5
30895 A 1200 morn chall 3
32678 B 1000 task conf 5
32678 B 1000 task chall 3
32678 B 1000 active conf 6
32678 B 1000 active chall 3
32678 B 1000 sleep conf 6
32678 B 1000 sleep chall 2
32678 B 1000 morn conf 5
32678 B 1000 morn chall 2
34679 A 1300 task conf 4
34679 A 1300 task chall 3
34679 A 1300 active conf 4
34679 A 1300 active chall 2
34679 A 1300 sleep conf 6
34679 A 1300 sleep chall 1
34679 A 1300 morn conf 6
34679 A 1300 morn chall 3
35999 A 1400 task conf 2
35999 A 1400 task chall 2
35999 A 1400 active conf 2
35999 A 1400 active chall 2
35999 A 1400 sleep conf 2
35999 A 1400 sleep chall 2
35999 A 1400 morn conf 2
35999 A 1400 morn chall 2
I tried the following code, but the output is incorrect in both cases:
library(reshape2)
df_wide = recast(df_long, userID ~ c('action','confChall','vals'),
                 id.var = c("userID", "SAT", "GRE"))
df_wide = dcast(df_long, userID + SAT + GRE ~ c(action + ConfChall), value.var = "vals")
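I tried to follow the example code on the following pages, but I am having trouble applying it to my problem. Any suggestions would be greatly appreciated.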
You can use pivot_wider from the tidyr package (part of the tidyverse suite of packages) to reshape with multiple category columns and multiple value columns.
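The pivot_wider call described above can be sketched as follows. This is a minimal illustration, assuming tidyr 1.0.0 or later is installed; df_small and df_wide_small are hypothetical names introduced here, and the data are just the first two users from the question's df_long.

```r
library(tidyr)  # provides pivot_wider() and re-exports the %>% pipe

# First two users from the question's df_long (df_small is a hypothetical name)
df_small <- read.table(text = "userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 active conf 5
30798 A 1400 active chall 2
30798 A 1400 sleep conf 6
30798 A 1400 sleep chall 1
30798 A 1400 morn conf 4
30798 A 1400 morn chall 2
30895 A 1200 task conf 6
30895 A 1200 task chall 2
30895 A 1200 active conf 5
30895 A 1200 active chall 3
30895 A 1200 sleep conf 5
30895 A 1200 sleep chall 2
30895 A 1200 morn conf 5
30895 A 1200 morn chall 3", header = TRUE)

# Column names are built from action and ConfChall, joined with "_";
# every column not named here (userID, SAT, GRE) identifies a row.
df_wide_small <- df_small %>%
  pivot_wider(names_from = c(action, ConfChall), values_from = vals)

print(df_wide_small)
```

Because each combination of userID, action and ConfChall occurs only once in this sample, every cell of the result holds a single value.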
reshape2 is an older package that, as far as I know, is no longer under active development and has been superseded by the tidyverse packages.
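If you would rather stay with reshape2, the sketch below shows one way the dcast attempt from the question might be written: the two category columns are joined with `+` on the right-hand side of the formula instead of being wrapped in c(). It assumes reshape2 is installed; df_small and df_wide_small are hypothetical names, and the data are a subset of the question's df_long.

```r
library(reshape2)

# Subset of the question's df_long (df_small is a hypothetical name)
df_small <- read.table(text = "userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 active conf 5
30798 A 1400 active chall 2
30895 A 1200 task conf 6
30895 A 1200 task chall 2
30895 A 1200 active conf 5
30895 A 1200 active chall 3", header = TRUE)

# Left of ~ : columns that identify a row.
# Right of ~ : columns whose combined values become the new column names,
# joined with "_" (for example task_conf).
df_wide_small <- dcast(df_small, userID + SAT + GRE ~ action + ConfChall,
                       value.var = "vals")

print(df_wide_small)
```

The order of the new columns may differ from the order requested in the question, so select or reorder them afterwards if the order matters.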
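To address the warning you mentioned in your comment: you get that result whenever any cell of the wide data frame would hold more than one value. In your case, that happens when more than one row has the same userID, SAT, GRE, action and ConfChall or, more generally, when a combination of the row and column categories appears in more than one row. This does not happen in your data sample, but it evidently does in your real data.
So let's add a duplicated row to the data sample: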
df_long = read.table(text="userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 task chall 4 # added row to create a duplicate
30798 A 1400 active conf 5
30798 A 1400 active chall 2
30798 A 1400 sleep conf 6
30798 A 1400 sleep chall 1
30798 A 1400 morn conf 4
30798 A 1400 morn chall 2
30895 A 1200 task conf 6
30895 A 1200 task chall 2
30895 A 1200 active conf 5
30895 A 1200 active chall 3
30895 A 1200 sleep conf 5
30895 A 1200 sleep chall 2
30895 A 1200 morn conf 5
30895 A 1200 morn chall 3
32678 B 1000 task conf 5
32678 B 1000 task chall 3
32678 B 1000 active conf 6
32678 B 1000 active chall 3
32678 B 1000 sleep conf 6
32678 B 1000 sleep chall 2
32678 B 1000 morn conf 5
32678 B 1000 morn chall 2
34679 A 1300 task conf 4
34679 A 1300 task chall 3
34679 A 1300 active conf 4
34679 A 1300 active chall 2
34679 A 1300 sleep conf 6
34679 A 1300 sleep chall 1
34679 A 1300 morn conf 6
34679 A 1300 morn chall 3", header=TRUE)
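Now let's reshape to wide again. Note that we get a warning, and that one of the list-column cells has two values instead of one:
library(tidyverse)
df_long %>%
  pivot_wider(names_from=c(action, ConfChall), values_from=vals)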
Warning message:
Values in `vals` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(vals = list)` to suppress this warning.
* Use `values_fn = list(vals = length)` to identify where the duplicates arise
* Use `values_fn = list(vals = summary_fun)` to summarise duplicates
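To see which rows trigger this warning before reshaping, one option is to count the rows per key combination. The sketch below assumes dplyr is installed; df_dup and dups are hypothetical names, and the data are a few rows of the sample above, including the added duplicate.

```r
library(dplyr)

# A few rows from the sample, including the added duplicate (hypothetical name)
df_dup <- read.table(text = "userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 task chall 4
30798 A 1400 active conf 5", header = TRUE)

# Count rows per combination of identifying and category columns,
# then keep only the combinations that occur more than once.
dups <- df_dup %>%
  count(userID, SAT, GRE, action, ConfChall) %>%
  filter(n > 1)

print(dups)
```

Any combination listed in dups would end up as a cell with several values in the wide data frame.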
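?reshape2::melt
Thanks @eipi10. I tried this and received the following warning message: Warning message: Values in vals are not uniquely identified; output will contain list-cols. * Use values_fn = list(vals = list) to suppress this warning. * Use values_fn = list(vals = length) to identify where the duplicates arise * Use values_fn = list(vals = summary_fun) to summarise duplicates. Do I need action and ConfChall to be factors?
In the real data, does at least one combination of userID, SAT and GRE appear more than once?
Yes, in my original df_long data, most userIDs appear on multiple rows. SAT and GRE stay the same for a given userID. I have updated my question, please take a look.
Sorry, what I said above was incorrect. Let me start over: if any cell in the wide data frame would hold more than one value, you get the result you got. That happens when more than one row has the same userID, SAT, GRE, action and ConfChall. This does not happen in the data sample, but it does in the real data.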
With the duplicated row in df_long (the same data as above), the call returns list-columns. The number in brackets is the number of values in each cell, so task_chall for userID 30798 holds two values:
df_long %>%
  pivot_wider(names_from=c(action, ConfChall), values_from=vals)
userID SAT GRE task_conf task_chall active_conf active_chall sleep_conf sleep_chall morn_conf morn_chall
<int> <fct> <int> <list<int>> <list<int>> <list<int>> <list<int>> <list<int>> <list<int>> <list<int>> <list<int>>
1 30798 A 1400 [1] [2] [1] [1] [1] [1] [1] [1]
2 30895 A 1200 [1] [1] [1] [1] [1] [1] [1] [1]
3 32678 B 1000 [1] [1] [1] [1] [1] [1] [1] [1]
4 34679 A 1300 [1] [1] [1] [1] [1] [1] [1] [1]
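Unnesting the list-columns shows the extra row for userID 30798:
df_long %>%
  pivot_wider(names_from=c(action, ConfChall), values_from=vals) %>%
  unnest(cols=task_conf:morn_chall)  # tidyr 1.0.0+ expects cols to be named explicitly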
userID SAT GRE task_conf task_chall active_conf active_chall sleep_conf sleep_chall morn_conf morn_chall
<int> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 30798 A 1400 2 3 5 2 6 1 4 2
2 30798 A 1400 2 4 5 2 6 1 4 2
3 30895 A 1200 6 2 5 3 5 2 5 3
4 32678 B 1000 5 3 6 3 6 2 5 2
5 34679 A 1300 4 3 4 2 6 1 6 3
To get a single value per cell instead, supply a summary function such as mean:
df_long %>%
  pivot_wider(names_from=c(action, ConfChall), values_from=vals,
              values_fn=list(vals=mean))
userID SAT GRE task_conf task_chall active_conf active_chall sleep_conf sleep_chall morn_conf morn_chall
<int> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 30798 A 1400 2 3.5 5 2 6 1 4 2
2 30895 A 1200 6 2 5 3 5 2 5 3
3 32678 B 1000 5 3 6 3 6 2 5 2
4 34679 A 1300 4 3 4 2 6 1 6 3
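An alternative to values_fn is to collapse the duplicates yourself before reshaping, which keeps the summary step visible in the pipeline. The sketch below assumes dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.0.0 or later; df_dup and df_wide_mean are hypothetical names, and mean is used only because it matches the example above.

```r
library(dplyr)
library(tidyr)

# A few rows from the sample, including the added duplicate (hypothetical name)
df_dup <- read.table(text = "userID SAT GRE action ConfChall vals
30798 A 1400 task conf 2
30798 A 1400 task chall 3
30798 A 1400 task chall 4
30798 A 1400 active conf 5", header = TRUE)

# Reduce each key combination to one value first, then reshape.
# After this step every wide cell is uniquely identified, so pivot_wider()
# has no duplicates to place in list-columns.
df_wide_mean <- df_dup %>%
  group_by(userID, SAT, GRE, action, ConfChall) %>%
  summarise(vals = mean(vals), .groups = "drop") %>%
  pivot_wider(names_from = c(action, ConfChall), values_from = vals)

print(df_wide_mean)
```

Whether mean, max, the first value, or something else is the right summary depends on why the real data contain repeated rows.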