R 计算数值的平均值和字符串值的唯一连接
我有一个df,如下所示,现在我想要计算数值的平均值和字符串值的唯一连接 尝试以下代码R 计算数值的平均值和字符串值的唯一连接,r,R,我有一个df,如下所示,现在我想要计算数值的平均值和字符串值的唯一连接 尝试以下代码 out = dcast(df, Date+Name+class~gender,fun.aggregate = mean,value.var='value') 输入数据帧如下所示 Date = c("8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019") Name = c("A
out = dcast(df, Date+Name+class~gender,fun.aggregate = mean,value.var='value')
输入数据帧如下所示
Date = c("8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019","8/20/2019")
Name = c("ABC","ABC","CBC","CBC","XYLEM","XYLEM","XYLEM","XYLEM")
class = c("one","one","two","two","three","three","three","three")
gender = c("M","M","F","F","M","M","F","F")
value = c("1","2","top","topper","low","lower","1","3")
df = data.frame(Date,Name,class,gender,value)
输出数据框如下所示,共有五列
Date = c("8/20/2019","8/20/2019","8/20/2019")
Name = c("ABC","CBC","XYLEM")
class = c("one","two","three")
M=c("1.5","NA","low,lower")
F = c("NA","top,topper","2")
out = data.frame(Date,Name,class,M,F)
老实说,我很难理解为什么数据是这样格式化的,因为我们似乎在单个列中混合了数字值和字符串值。但这是可以做到的:
library(dplyr)
library(tidyr)
df.out = df %>%
# Figure out which cells contain numbers and which contain strings, and store
# them in separate columns.
mutate(numeric.value = as.numeric(as.character(value)),
string.value = ifelse(is.na(numeric.value), as.character(value), NA)) %>%
# Group by date, name, class, and gender so we can summarize.
group_by(Date, Name, class, gender) %>%
# For each group, get the mean of the numeric values and the concatenation of
# string values.
summarize(numeric.mean = mean(numeric.value, na.rm = T),
string.concat = gsub("^,|,,|,$", "", gsub("NA", "", paste0(string.value, collapse = ",")))) %>%
# Fill in NAs where appropriate.
mutate(numeric.mean = ifelse(is.nan(numeric.mean), NA, numeric.mean),
string.concat = ifelse(string.concat == "", NA, string.concat)) %>%
# The form of the desired output suggests that each group will have *either*
# numeric values *or* string values, but not both. So let's put the two
# summary values back into a single column.
mutate(summary.by.gender = coalesce(as.character(numeric.mean), string.concat)) %>%
# Pivot so that each gender gets its own column.
select(Date, Name, class, gender, summary.by.gender) %>%
spread(gender, summary.by.gender)