How to use a loop to replace values with a mean based on another column in an R data frame

Suppose I have the following data frame in R:

Pitcher Pitch.Spin..rpm. 
A     2350
A     2400
A     2233
A     1100
B     2145
B     2200
B     2340
B     1050
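
I want to write a loop in R that replaces the low values for A and B with their respective means, excluding the bad readings, so that the output would be: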

A     2350
A     2400
A     2233
A     2328
B     2145
B     2200
B     2340
B     2228
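
How would I do this? Below is my attempt. My problem is that I am not sure how to correctly reference the Pitcher value in a particular row.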

for (i in 1:nrow(data)){
  if (data$Pitch.Spin..rpm. < 1500)
  data$Pitch.Spin..rpm. <- mean(data$Pitch.Spin..rpm.[Pitcher == {i}],na.rm = TRUE)
}
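
We can do this with a grouped operation. After grouping by 'Pitcher', mutate 'Pitch.Spin..rpm.' to replace the elements that are less than 1500 with the mean of the remaining values in that group, so the bad readings do not pull the mean down.

library(dplyr)
data <- data %>%
   group_by(Pitcher) %>%
   mutate(`Pitch.Spin..rpm.` = replace(`Pitch.Spin..rpm.`,
        `Pitch.Spin..rpm.` < 1500,
        # average only the readings at or above 1500, as the question asks
        mean(`Pitch.Spin..rpm.`[`Pitch.Spin..rpm.` >= 1500], na.rm = TRUE)))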

Another option is to use dplyr and ifelse() to replace the values:

library(dplyr)

#Data
df <- structure(list(Pitcher = c("A", "A", "A", "A", "B", "B", "B", 
"B"), Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 
2200L, 2340L, 1050L)), class = "data.frame", row.names = c(NA, 
-8L))
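
The call that produced the grouped tibble output in this answer did not survive in the thread. The following is a minimal sketch of what such a call could look like, assuming group_by() followed by mutate() with ifelse(). The name out is illustrative, and this is a guess at the approach rather than the answerer's confirmed code:

```r
library(dplyr)

# Same data as in the answer, repeated so the block runs on its own
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 2200L, 2340L, 1050L)
)

# Assumed reconstruction: within each pitcher, replace readings below 1500
# with the mean of that pitcher's remaining readings
out <- df %>%
  group_by(Pitcher) %>%
  mutate(Pitch.Spin..rpm. = ifelse(
    Pitch.Spin..rpm. < 1500,
    mean(Pitch.Spin..rpm.[Pitch.Spin..rpm. >= 1500]),
    Pitch.Spin..rpm.
  ))

out
```

Because the mean is computed inside the grouped mutate(), each pitcher's low reading is filled from that pitcher's own good readings only.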

# A tibble: 8 x 2
# Groups:   Pitcher [2]
  Pitcher Pitch.Spin..rpm.
  <chr>              <dbl>
1 A                  2350 
2 A                  2400 
3 A                  2233 
4 A                  2328.
5 B                  2145 
6 B                  2200 
7 B                  2340 
8 B                  2228.

Alternatively, with a loop over the pitchers:

# Unique pitchers
val <- unique(df$Pitcher)
# Create empty list
List <- list()
# Loop over pitchers
for (i in val) {
  # Isolate this pitcher's rows
  data1 <- subset(df, Pitcher == i)
  # Compute the mean of the readings that are not below 1500
  meanval <- mean(data1$Pitch.Spin..rpm.[!data1$Pitch.Spin..rpm. < 1500])
  # Replace the low readings with that mean
  data1$Pitch.Spin..rpm.[data1$Pitch.Spin..rpm. < 1500] <- meanval
  # Save in list
  List[[i]] <- data1
}
# Now bind the list
newdf <- do.call(rbind, List)
rownames(newdf) <- NULL
newdf
  Pitcher Pitch.Spin..rpm.
1       A         2350.000
2       A         2400.000
3       A         2233.000
4       A         2327.667
5       B         2145.000
6       B         2200.000
7       B         2340.000
8       B         2228.333

A base R solution with ave():

ave(df$`Pitch.Spin..rpm.`, df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})
#[1] 2350.000 2400.000 2233.000 2327.667 2145.000 2200.000 2340.000
#[8] 2228.333

Assigning the result back to the column updates the data frame:

df$Pitch.Spin..rpm. <- ave(df$Pitch.Spin..rpm., df$Pitcher, FUN = function(x){
  i <- x < 1500
  if(any(i)) x[i] <- mean(x[!i])
  x
})

df
#  Pitcher Pitch.Spin..rpm.
#1       A         2350.000
#2       A         2400.000
#3       A         2233.000
#4       A         2327.667
#5       B         2145.000
#6       B         2200.000
#7       B         2340.000
#8       B         2228.333
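
For comparison with the attempt in the question, a row-by-row loop can also work once the test and the assignment are both indexed with [i]. This is a minimal sketch under the same 1500 cutoff; spin and good are illustrative names that do not appear in the original posts:

```r
df <- data.frame(
  Pitcher = rep(c("A", "B"), each = 4),
  Pitch.Spin..rpm. = c(2350L, 2400L, 2233L, 1100L, 2145L, 2200L, 2340L, 1050L)
)

# Work on a numeric copy so replaced values keep their decimals
spin <- as.numeric(df$Pitch.Spin..rpm.)
# Flag good readings once, before any value is overwritten
good <- spin >= 1500

for (i in seq_len(nrow(df))) {
  if (!good[i]) {
    # df$Pitcher[i] is the pitcher in row i; average only that pitcher's good readings
    spin[i] <- mean(spin[good & df$Pitcher == df$Pitcher[i]])
  }
}

df$Pitch.Spin..rpm. <- spin
df
```

The key difference from the original attempt is that data$Pitch.Spin..rpm. < 1500 tests the whole column at once, whereas spin[i] and df$Pitcher[i] refer to the single row the loop is currently on.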