
R: How can a loop help center data on its maximum value?


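I am trying to write some code that iterates over my data and centers the values on the maximum of each column.

Here is an example of what I am trying to do:

Input

Output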

|  A   |   B  |       
|:-----|-----:|
|   4  |   4  |
|   5  |   4  |
|   9  |   7  |
|   8  |   5  |
|   4  |   3  |
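
As suggested, I am now using the code below, which works very well for the example above. It still fails on my real data, though. The error I get is:

Error in (the_max - 4):(the_max + 4) : argument of length 0

Any help? I am completely stuck.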

df_cent<-apply(df, 2, function(x) {
  the_max<-which.max(x == max(x))
  return(x[(the_max-4):(the_max+4)])
})

An example with a vector:

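x = sample(c(4,5,9,8,4))          # example values in random order
x.ordered = sort(x)               # ascending order
middle = floor(length(x)/2)       # size of the lower half
below = x.ordered[1:middle]       # lower half, ascending
above = rev(x.ordered[(middle+1):length(x)])  # upper half, descending from the maximum
new.x = c(below, above)           # values rise to the maximum, then fall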

To get the n values above or below the maximum:

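above[1:n]       # the maximum and the values that follow it in descending order
tail(below, n)   # the n values of the lower half closest to the maximum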

The idea is to find the middle position and then use the sorted vector together with its reverse (function `rev`).

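As mentioned in the comments, you can simplify this. Use `apply` on each column, find the position of the `max`, and keep the values within a specified width on either side of it (in this case, `n_width` is 2). The result is a matrix, but it can be converted as needed.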

Note: if a column contains missing values (`NA`), make sure to pass `na.rm = TRUE` when determining the `max` value.
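
Why the note above matters can be seen on a single vector. The following is a small illustration with made-up values (not the asker's data). It shows how an `NA` in a column leads to the "argument of length 0" error, which is consistent with the cause reported in the comments (columns of different lengths padded with `NA`).

```r
x <- c(2, 7, 4, NA, NA)   # made-up column, padded with NA at the end

max(x)                    # NA, because na.rm defaults to FALSE

# Every comparison with NA is NA, so which.max() has nothing to pick from
bad_idx <- which.max(x == max(x))
length(bad_idx)           # 0

# Building the window from an empty index is what raises the error
msg <- tryCatch((bad_idx - 4):(bad_idx + 4),
                error = function(e) conditionMessage(e))

# With na.rm = TRUE the maximum is found and its position is returned
good_idx <- which.max(x == max(x, na.rm = TRUE))   # 2

# which.max() applied to x directly also discards NA values
direct_idx <- which.max(x)                         # 2
```
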

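n_width <- 2  # number of values to keep on each side of the maximum

apply(df, 2, function(x) {
  # Index of the first occurrence of the column maximum, ignoring NAs
  the_max <- which.max(x == max(x, na.rm = TRUE))
  if (the_max < n_width + 1) {
    # Too few values before the maximum: pad the top with NAs
    return(c(rep(NA, n_width - the_max + 1),
             x[1:(the_max + n_width)]))
  } else {
    return(x[(the_max - n_width):(the_max + n_width)])
  }
})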
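
Edit: for the second data set, if there are fewer than `n_width` observations before the maximum, the result needs to be padded with `NA`s. For example: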

n_width <- 5
Data

dput(raw[1:20,1:5]) structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599, 3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273, 2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797, 2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655, 2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179, 2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91, 2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829, 2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042, 3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875, 2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966, 2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982, 2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857, 2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802, 2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409, 2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944, 2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")
# First data set
df <- structure(list(A = c(4, 5, 9, 8, 4, 3, 2), B = c(3, 2, 4, 7, 
5, 3, 1)), class = "data.frame", row.names = c(NA, -7L))

# Second data set
df <- structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599, 
3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273, 
2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797, 
2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655, 
2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179, 
2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91, 
2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829, 
2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042, 
3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875, 
2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966, 
2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982, 
2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857, 
2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802, 
2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409, 
2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944, 
2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")

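Can you explain a bit more? What does centering the values relative to each column's maximum mean? Why are there two `4`s in the output for `B`?

Share your data using `dput()`. From your table we cannot tell whether you are working with a matrix, a data frame, or something else. You can probably simplify your code a lot. For example, `apply(x1, 2, which.max)` will give you the index of the maximum in each column.

The different columns correspond to separate analyses of different cells, each of which gives a concave curve, so I need to center the data on the peak (the maximum) in order to compare them. For the example I just used random numbers.

Once I have found the maximum with a for loop to detect its position, I simply select 50 rows above and below that value (the top and bottom vectors). What I need is a loop that iterates over the data. This is my input: class = c("tbl_df", "tbl", "data.frame"))

I don't want to sort the data, I just want to select the n values above and below the maximum (keeping the input order!)

It works great for the example, but it still does not work for my real data. How should I adjust the margin in the apply function?

Oops! Sorry! Thanks a lot, you have been a great help. But I cannot get it to work. It still says: Error in (the_max+4):(the_max-4) : argument of length 0. I am afraid the_max is not correct. It works great for the example, but not for my data. I am importing an Excel sheet through the RStudio interface and then converting it to a data frame using df2

I have solved it!!!! The input columns had different lengths (padded with NA values). The input data should not contain NA values. Thank you @Ben
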
Result for the second data set with n_width <- 5:
            Y0       Y1       Y2       Y3       Y4
 [1,] 3178.701       NA 2928.439       NA       NA
 [2,] 3224.385       NA 2942.857       NA       NA
 [3,] 3304.599       NA 2949.042       NA       NA
 [4,] 3427.954       NA 3007.030       NA       NA
 [5,] 3564.216       NA 3025.960 2451.421 2350.779
 [6,] 3663.065 2678.572 3028.522 2454.053 2361.946
 [7,] 3607.685 2647.732 3019.542 2448.346 2354.645
 [8,] 3416.442 2624.185 3006.743 2430.966 2339.802
 [9,] 3213.872 2617.655 3020.229 2425.783 2257.112
[10,] 3082.273 2589.248 3023.875 2429.053 2230.763
[11,] 2967.310 2559.836 2985.960 2416.686 2235.095
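
The same windowing can also be written so that both ends are padded in one step, and the matrix converted back to a data frame. This is only a sketch: `center_on_max` is a hypothetical helper name that does not appear in the original answer, and it assumes every column has at least one non-`NA` value.

```r
# Hypothetical helper: keep n_width values on each side of a column's maximum,
# filling positions that fall outside the column with NA.
center_on_max <- function(x, n_width) {
  the_max <- which.max(x)                        # first maximum; NA values are discarded
  idx <- (the_max - n_width):(the_max + n_width) # window of positions around the maximum
  idx[idx < 1 | idx > length(x)] <- NA           # out-of-range positions become NA
  x[idx]                                         # an NA index yields an NA value
}

# First data set from the answer
df <- data.frame(A = c(4, 5, 9, 8, 4, 3, 2),
                 B = c(3, 2, 4, 7, 5, 3, 1))

centered <- sapply(df, center_on_max, n_width = 2)
# A: 4 5 9 8 4
# B: 2 4 7 5 3

wide <- sapply(df, center_on_max, n_width = 4)
# A: NA NA 4 5 9 8 4 3 2
# B: NA 3 2 4 7 5 3 1 NA

# The result is a matrix; convert it if a data frame is needed
centered_df <- as.data.frame(centered)
```

Input order is preserved and the maximum of every column ends up in row `n_width + 1`, so the columns can be compared row by row.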