Computing summary statistics over concatenated strings in R
Suppose I have a data frame like this:
X. Name Type Total HP Attack Defense Sp..Atk Sp..Def Speed
795 718 Zygarde50% Forme Dragon/Ground 600 108 100 121 81 95 95
796 719 Diancie Rock/Fairy 600 50 100 150 100 150 50
797 719 DiancieMega Diancie Rock/Fairy 700 50 160 110 160 110 110
798 720 HoopaHoopa Confined Psychic/Ghost 600 80 110 60 150 130 70
799 720 HoopaHoopa Unbound Psychic/Dark 680 80 160 60 170 130 80
800 721 Volcanion Fire/Water 600 80 110 120 130 90 70
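I want to compute the average stats (Total, HP, Attack, Defense, and so on) for each individual type (Dragon, Ground, Rock, Fairy, etc.) rather than for each combined type such as Dragon/Ground or Rock/Fairy. How should I proceed? The stats of a Pokémon that belongs to two types should count towards the averages of both of those types.
I have already written code using functions from the dplyr package: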
summaryStats_byType<- summarise(byType,
count = n(),
averageTotal = mean(Total, na.rm = T),
averageHP = mean(HP, na.rm = T),
averageDefense = mean(Defense, na.rm = T),
averageSpAtk = mean(Sp..Atk, na.rm = T),
averageSpDef = mean(Sp..Def, na.rm = T),
averageSpeed = mean(Speed, na.rm = T))
summaryStats_byType

One approach is to split the Type column into long format (I chose cSplit from splitstackshape for this) and then group_by as usual, i.e.
library(splitstackshape)
library(dplyr)
df1 <- cSplit(df, 'Type', sep = '/', 'long')
df1 %>%
group_by(Type) %>%
summarise_each(funs(mean), -c(X., Name))
# A tibble: 9 × 8
# Type Total HP Attack Defense Sp..Atk Sp..Def Speed
# <fctr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Dark 680 80 160 60 170 130 80
#2 Dragon 600 108 100 121 81 95 95
#3 Fairy 650 50 130 130 130 130 80
#4 Fire 600 80 110 120 130 90 70
#5 Ghost 600 80 110 60 150 130 70
#6 Ground 600 108 100 121 81 95 95
#7 Psychic 640 80 135 60 160 130 75
#8 Rock 650 50 130 130 130 130 80
#9 Water 600 80 110 120 130 90 70
This of course gives the same result.
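As an alternative sketch of the same split-then-group idea, the long-format step can also be done with tidyr instead of splitstackshape. This is not part of the original answer: it assumes tidyr's separate_rows() and dplyr 1.0 or later for across(), rebuilds the data by hand with only three of the stat columns to stay short, and the name by_single_type is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-typed copy of the example data (only three stat columns, for brevity).
df <- data.frame(
  X.     = c(718L, 719L, 719L, 720L, 720L, 721L),
  Name   = c("Zygarde50% Forme", "Diancie", "DiancieMega Diancie",
             "HoopaHoopa Confined", "HoopaHoopa Unbound", "Volcanion"),
  Type   = c("Dragon/Ground", "Rock/Fairy", "Rock/Fairy",
             "Psychic/Ghost", "Psychic/Dark", "Fire/Water"),
  Total  = c(600L, 600L, 700L, 600L, 680L, 600L),
  HP     = c(108L, 50L, 50L, 80L, 80L, 80L),
  Attack = c(100L, 100L, 160L, 110L, 160L, 110L),
  stringsAsFactors = FALSE
)

by_single_type <- df %>%
  # One row per (Pokemon, single type): "Rock/Fairy" becomes a Rock row and a Fairy row.
  separate_rows(Type, sep = "/") %>%
  group_by(Type) %>%
  # Count the rows in each type and average the selected stat columns.
  summarise(count = n(),
            across(c(Total, HP, Attack), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

print(by_single_type)
```

Because every dual-type row is duplicated before grouping, each Pokémon contributes to both of its types, which is the behaviour the question asks for.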
Data
dput(df)
structure(list(X. = c(718L, 719L, 719L, 720L, 720L, 721L), Name = structure(c(6L,
1L, 2L, 3L, 4L, 5L), .Label = c("Diancie", "DiancieMega_Diancie",
"HoopaHoopa_Confined", "HoopaHoopa_Unbound", "Volcanion", "Zygarde50%_Forme"
), class = "factor"), Type = structure(c(1L, 5L, 5L, 4L, 3L,
2L), .Label = c("Dragon/Ground", "Fire/Water", "Psychic/Dark",
"Psychic/Ghost", "Rock/Fairy"), class = "factor"), Total = c(600L,
600L, 700L, 600L, 680L, 600L), HP = c(108L, 50L, 50L, 80L, 80L,
80L), Attack = c(100L, 100L, 160L, 110L, 160L, 110L), Defense = c(121L,
150L, 110L, 60L, 60L, 120L), Sp..Atk = c(81L, 100L, 160L, 150L,
170L, 130L), Sp..Def = c(95L, 150L, 110L, 130L, 130L, 90L), Speed = c(95L,
50L, 110L, 70L, 80L, 70L)), .Names = c("X.", "Name", "Type",
"Total", "HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed"
), class = "data.frame", row.names = c("795", "796", "797", "798",
"799", "800"))
Can you dput your initial data frame? (Just add the output of dput(yourdataframe) to your post.)
Sorry, could you explain how dput is used in this case? I don't really understand it.
@卡蒂特隆 See the last part of my answer: dput generates a reproducible example of the data frame.
I didn't know about this; I was looking for another way to split!
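To illustrate the dput point from the comments, here is a minimal sketch of the round trip: dput() prints R code that rebuilds an object, so pasting that output into a post lets others recreate the data frame exactly. The names small, code_text, and rebuilt are hypothetical and used only for this example.

```r
# A tiny stand-in data frame (hypothetical, two rows only).
small <- data.frame(Type  = c("Rock/Fairy", "Fire/Water"),
                    Total = c(600L, 600L),
                    stringsAsFactors = FALSE)

# dput() writes R code for the object; capture it as text instead of printing it.
code_text <- capture.output(dput(small))

# Evaluating that text reconstructs an equivalent data frame.
rebuilt <- eval(parse(text = code_text))
stopifnot(isTRUE(all.equal(small, rebuilt)))
```

In a post, the captured text is what you would paste, typically assigned to a name such as df so that readers can run the answer code against it.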