R: How do I use cast on a data frame?
I have a data frame that looks like this:
year income group
1 2008 27907 Under25
2 2009 25522 Under25
3 2010 26777 Under25
4 2008 58809 Age25_34
5 2009 57239 Age25_34
6 2010 58558 Age25_34
7 2008 75677 Age35_44
8 2009 74900 Age35_44
9 2010 74136 Age35_44
10 2008 78537 Age45_54
11 2009 77460 Age45_54
12 2010 76266 Age45_54
13 2008 69009 Age55_64
14 2009 67586 Age55_64
15 2008 44402 Age65_74
16 2009 46147 Age65_74
17 2010 48595 Age65_74
18 2008 32747 Over75
19 2009 31272 Over75
20 2010 31638 Over75
> str(df)
'data.frame': 20 obs. of 3 variables:
$ year : int 2008 2009 2010 2008 2009 2010 2008 2009 2010 2008 ...
$ income: int 27907 25522 26777 58809 57239 58558 75677 74900 74136 78537 ...
$ group : Factor w/ 7 levels "Age25_34","Age35_44",..: 7 7 7 1 1 1 2 2 2 3 ...
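I want to use cast to find the mean for each group. I also want to build a wide data.frame from this df, where the first column is the year and the remaining columns hold the income of each group. For example: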
year under25 Age25_34 Age35_44 Age45_54 ...
2008 27907 58809 75677 78537 ...
2009 25522 57239 74900 77460 ...
...
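When I try cast, I get the following error:
> cast(df, income ~ group, mean)
Using group as value column.  Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) :
  undefined columns selected
What am I doing wrong with the cast command?
How can I convert the data to the wide format shown in the example?
My R version information is listed below: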
> unlist(R.Version())
platform arch os
"x86_64-pc-mingw32" "x86_64" "mingw32"
system status major
"x86_64, mingw32" "" "2"
minor year month
"13.1" "2011" "07"
day svn rev language
"08" "56322" "R"
version.string
"R version 2.13.1 (2011-07-08)"
Why not use tapply?
with(df, tapply(income, list(year, group), mean))
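tapply returns a matrix with the years as row names rather than a data.frame with a year column. A minimal base-R sketch of turning that matrix into the wide data.frame the question asks for follows. The names wide_matrix and wide_df are illustrative assumptions, and df is rebuilt here from the values in the question so the snippet runs on its own.

```r
# Rebuild the question's data (values copied from the listing in the question)
df <- data.frame(
  year   = c(2008, 2009, 2010, 2008, 2009, 2010, 2008, 2009, 2010, 2008,
             2009, 2010, 2008, 2009, 2008, 2009, 2010, 2008, 2009, 2010),
  income = c(27907, 25522, 26777, 58809, 57239, 58558, 75677, 74900, 74136, 78537,
             77460, 76266, 69009, 67586, 44402, 46147, 48595, 32747, 31272, 31638),
  group  = c(rep("Under25", 3), rep("Age25_34", 3), rep("Age35_44", 3),
             rep("Age45_54", 3), rep("Age55_64", 2), rep("Age65_74", 3),
             rep("Over75", 3))
)

# Year-by-group matrix of mean income; combinations with no rows become NA
wide_matrix <- with(df, tapply(income, list(year, group), mean))

# Move the row names into a proper year column (wide_df is a hypothetical name)
wide_df <- data.frame(year = as.integer(rownames(wide_matrix)),
                      wide_matrix,
                      check.names = FALSE)
rownames(wide_df) <- NULL

wide_df
```

Because each year/group pair has at most one row here, the mean simply reproduces the original income value. The group columns come out in alphabetical order, not in the order shown in the question.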
(Thanks to Ramnath for the helpful comment.) Use cast:
cast(df, year ~ group, mean, value = 'income')
year Age25_34 Age35_44 Age45_54 Age55_64 Age65_74 Over75 Under25
1 2008 58809 75677 78537 69009 44402 32747 27907
2 2009 57239 74900 77460 67586 46147 31272 25522
3 2010 58558 74136 76266 NaN 48595 31638 26777
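The NaN in the 2010 row reflects a year/group combination (Age55_64 in 2010) that has no rows in the data. As far as I understand cast's default behaviour, an empty cell is filled with the result of applying the aggregation function to a zero-length vector, and mean(numeric(0)) is NaN. The base-R sketch below shows that, plus one way to turn NaN into NA afterwards. wide_2010 is a hypothetical one-row stand-in for part of the output above, not an object that cast creates.

```r
# The mean of a zero-length vector is NaN, which matches the empty cell above
empty_cell <- mean(numeric(0))
is.nan(empty_cell)

# Hypothetical stand-in for part of the 2010 row of the wide output
wide_2010 <- data.frame(year = 2010, Age45_54 = 76266,
                        Age55_64 = NaN, Age65_74 = 48595)

# Replace NaN with NA column by column (every column here is numeric)
wide_2010[] <- lapply(wide_2010, function(col) {
  col[is.nan(col)] <- NA
  col
})

wide_2010
```

Whether NA or NaN is the better marker for "no data" depends on what the table is used for afterwards; is.na() returns TRUE for both.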
Create the data frame:
year<-c(2008,2009, 2010,2008,2009, 2010, 2008,2009, 2010,2008, 2009, 2010, 2008, 2009, 2008, 2009, 2010, 2008,2009,2010)
income<-c(27907,25522, 26777,58809, 57239, 58558, 75677,74900, 74136, 78537,77460,76266, 69009,67586, 44402, 46147,48595,32747, 31272,31638)
group<-c("Under25","Under25","Under25","Age25_34","Age25_34","Age25_34","Age35_44","Age35_44","Age35_44","Age45_54","Age45_54","Age45_54","Age55_64","Age55_64","Age65_74","Age65_74","Age65_74","Over75","Over75","Over75")
demographic_data<-data.frame(year, income,group)
demographic_data
str(demographic_data)
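This does not give the correct output, because it computes the mean across years. Look at the sample output provided, as well as the description: …the first column is year…
Perfect! I would use a with statement to avoid repeating df several times.
Yes, this works very well for getting the means I wanted. I asked about cast because I use melt a lot and thought it would be a natural complement, but this works fine. Thanks.
Thanks! I did try that, but without quotes around income. That was my problem!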
library(reshape)
melted_demographic_data<-melt(demographic_data,id=c("group","year"))
melted_demographic_data
groupmeans<-cast(melted_demographic_data,group~variable, mean)
groupmeans
yearmeans<-cast(melted_demographic_data,year~variable, mean)
yearmeans
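As a cross-check that does not depend on the reshape package, the same per-group and per-year means can be computed with base R's aggregate. This is only a sketch: group_means_base and year_means_base are made-up names, and the data is rebuilt from the values above so the snippet runs on its own.

```r
# Same data as demographic_data above, rebuilt so this snippet is self-contained
demographic_data <- data.frame(
  year   = c(2008, 2009, 2010, 2008, 2009, 2010, 2008, 2009, 2010, 2008,
             2009, 2010, 2008, 2009, 2008, 2009, 2010, 2008, 2009, 2010),
  income = c(27907, 25522, 26777, 58809, 57239, 58558, 75677, 74900, 74136, 78537,
             77460, 76266, 69009, 67586, 44402, 46147, 48595, 32747, 31272, 31638),
  group  = c(rep("Under25", 3), rep("Age25_34", 3), rep("Age35_44", 3),
             rep("Age45_54", 3), rep("Age55_64", 2), rep("Age65_74", 3),
             rep("Over75", 3))
)

# Mean income per group, averaged over the years available for that group
group_means_base <- aggregate(income ~ group, data = demographic_data, FUN = mean)

# Mean income per year, averaged over the groups present in that year
year_means_base <- aggregate(income ~ year, data = demographic_data, FUN = mean)

group_means_base
year_means_base
```

Note that Age55_64 has no 2010 row, so its group mean covers two years and the 2010 mean covers six groups instead of seven.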