How to use a loop to replace values with a mean based on another column in an R data frame
If I have the following data frame in R:
Pitcher Pitch.Spin..rpm.
A 2350
A 2400
A 2233
A 1100
B 2145
B 2200
B 2340
B 1050
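I want to write a loop in R that replaces the low values for A and B with their respective means, excluding the bad readings, so that the output would be: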
A 2350
A 2400
A 2233
A 2328
B 2145
B 2200
B 2340
B 2228
How can I do this? Below is my attempt. My problem comes from not knowing how to correctly reference the Pitcher value in a specific row.
for (i in 1:nrow(data)){
if (data$Pitch.Spin..rpm. < 1500)
data$Pitch.Spin..rpm. <- mean(data$Pitch.Spin..rpm.[Pitcher == {i}],na.rm = TRUE)
}
We can do this with a grouped operation. After grouping by 'Pitcher', mutate 'Pitch.Spin..rpm.' and replace the elements below 1500 with the group mean of that column.
library(dplyr)
data <- data %>%
group_by(Pitcher) %>%
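mutate(`Pitch.Spin..rpm.` = replace(`Pitch.Spin..rpm.`,
`Pitch.Spin..rpm.` < 1500,
# average only the valid readings, so the bad reading does not pull the mean down
mean(`Pitch.Spin..rpm.`[`Pitch.Spin..rpm.` >= 1500], na.rm = TRUE)))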
Another approach uses dplyr and ifelse() to replace the values:
library(dplyr)
#Data
df <- structure(list(Pitcher = c("A", "A", "A", "A", "B", "B", "B",
"B"), Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L,
2200L, 2340L, 1050L)), class = "data.frame", row.names = c(NA,
-8L))
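The pipeline for this approach is not shown alongside the data above. What follows is a minimal sketch of what a dplyr and ifelse() version could look like, not necessarily the answerer's exact code. It assumes the 1500 rpm cutoff from the question and that the mean should be taken over the remaining valid readings only; the result name `res` is illustrative.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 2200L, 2340L, 1050L)
)

res <- df %>%
  group_by(Pitcher) %>%
  mutate(Pitch.Spin..rpm. = ifelse(
    Pitch.Spin..rpm. < 1500,                           # bad reading?
    mean(Pitch.Spin..rpm.[Pitch.Spin..rpm. >= 1500]),  # mean of the pitcher's valid readings
    Pitch.Spin..rpm.                                   # otherwise keep the value
  )) %>%
  ungroup()
```

Because the data is grouped first, the mean inside mutate() is computed per pitcher, so no explicit row indexing is needed.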
A base R solution with ave():
ave(df$`Pitch.Spin..rpm.`, df$Pitcher, FUN = function(x){
i <- x < 1500
if(any(i)) x[i] <- mean(x[!i])
x
})
#[1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000
#[8] 2228.333
# A tibble: 8 x 2
# Groups: Pitcher [2]
Pitcher Pitch.Spin..rpm.
<chr> <dbl>
1 A 2350
2 A 2400
3 A 2233
4 A 2328.
5 B 2145
6 B 2200
7 B 2340
8 B 2228.
#Unique pitcher
val <- unique(df$Pitcher)
#Create empty list
List <- list()
#Loop
for(i in val)
{
#Isolate data
data1 <- subset(df,Pitcher==i)
#Compute mean
meanval <- mean(data1$Pitch.Spin..rpm.[!data1$Pitch.Spin..rpm.<1500])
#Replace
data1$Pitch.Spin..rpm.[data1$Pitch.Spin..rpm.<1500]<-meanval
#Save in list
List[[i]] <- data1
}
#Now bind the list
newdf <- do.call(rbind,List)
rownames(newdf) <- NULL
Pitcher Pitch.Spin..rpm.
1 A 2350.000
2 A 2400.000
3 A 2233.000
4 A 2327.667
5 B 2145.000
6 B 2200.000
7 B 2340.000
8 B 2228.333
# Assign the result back to the column to update df in place
df$Pitch.Spin..rpm. <- ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = function(x){
i <- x < 1500
if(any(i)) x[i] <- mean(x[!i])
x
})
df
# Pitcher Pitch.Spin..rpm.
#1 A 2350.000
#2 A 2400.000
#3 A 2233.000
#4 A 2327.667
#5 B 2145.000
#6 B 2200.000
#7 B 2340.000
#8 B 2228.333
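Since the question asked specifically how to refer to the pitcher of a given row inside a loop, here is a sketch of a row-by-row version that stays close to the original attempt. It assumes the same 1500 rpm cutoff; `threshold`, `spin`, `bad`, and `same_pitcher` are illustrative names. The key point is indexing with `[i]`: `df$Pitcher[i]` is the pitcher for row i, whereas `df$Pitcher` alone is the whole column.

```r
# Same sample data as in the question
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350, 2400, 2233, 1100, 2145, 2200, 2340, 1050)
)

threshold <- 1500
spin <- df$Pitch.Spin..rpm.   # untouched copy, so means always use the original readings
bad <- spin < threshold       # TRUE for readings treated as bad

for (i in seq_len(nrow(df))) {
  if (bad[i]) {
    # rows belonging to the same pitcher as row i
    same_pitcher <- df$Pitcher == df$Pitcher[i]
    # replace the bad reading with the mean of that pitcher's valid readings
    df$Pitch.Spin..rpm.[i] <- mean(spin[same_pitcher & !bad])
  }
}
```

The grouped versions above are shorter. This form is mainly useful for seeing how a single row's pitcher is referenced inside the loop.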