Conditional replacement of values in a data frame column based on multiple other columns - R

r dataframe

My data frame looks like this:

> tornado_frame
         tornado_names Level      value
1     node per cluster   low  -34.72222
2          TB per node   low  -52.08333
3  expense per cluster   low -104.16667
4             Total TB   low  -62.50000
5  revenue per cluster   low  -52.08333
6     node per cluster  high   20.83333
7          TB per node  high   41.66667
8  expense per cluster  high   52.08333
9             Total TB  high  145.83333
10 revenue per cluster  high  156.25000

I want to turn the table into this:

> tornado_frame
         tornado_names Level      value
1     node per cluster   low   34.72222
2          TB per node   low   52.08333
3  expense per cluster   low  104.16667
4             Total TB   low  -62.50000
5  revenue per cluster   low  -52.08333
6     node per cluster  high  -20.83333
7          TB per node  high  -41.66667
8  expense per cluster  high  -52.08333
9             Total TB  high  145.83333
10 revenue per cluster  high  156.25000

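The sign of value should be flipped wherever the absolute value at the "low" Level is larger than the absolute value at the "high" Level for the same tornado_names. In that case both rows of the pair change sign.

I tried some nested ifs, but that got too messy for me. Any help would be appreciated.

Here is my data: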

> dput(tornado_frame)
structure(list(tornado_names = structure(c(2L, 4L, 1L, 5L, 3L, 
2L, 4L, 1L, 5L, 3L), .Label = c("expense per cluster", "node per cluster", 
"revenue per cluster", "TB per node", "Total TB"), class = "factor"), 
    Level = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L
    ), .Label = c("high", "low"), class = "factor"), value = c(-34.72222, 
    -52.08333, -104.16667, -62.5, -52.08333, 20.83333, 41.66667, 
    52.08333, 145.83333, 156.25)), .Names = c("tornado_names", 
"Level", "value"), class = "data.frame", row.names = c(NA, -10L
))

Here is a possible data.table solution:

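library(data.table)
# df is the question's tornado_frame; within each name the "low" row must come before the "high" row
setDT(df)[, value := if (diff(abs(value)) < 0) -value else value,
          by = tornado_names]
df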
#           tornado_names Level     value
#  1:    node per cluster   low  34.72222
#  2:         TB per node   low  52.08333
#  3: expense per cluster   low 104.16667
#  4:            Total TB   low -62.50000
#  5: revenue per cluster   low -52.08333
#  6:    node per cluster  high -20.83333
#  7:         TB per node  high -41.66667
#  8: expense per cluster  high -52.08333
#  9:            Total TB  high 145.83333
# 10: revenue per cluster  high 156.25000

This checks your condition within each tornado_names group and flips the sign of the values only in the groups that satisfy it.
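For readers who prefer to avoid data.table, the same per-group check can be sketched in base R. This is only an illustration, not the answerer's method: the helper names abs_low, abs_high, flip and result are made up here, and the input is rebuilt from the first table printed in the question.

```r
# Rebuild the input from the question's first printed table.
tornado_frame <- data.frame(
  tornado_names = rep(c("node per cluster", "TB per node", "expense per cluster",
                        "Total TB", "revenue per cluster"), times = 2),
  Level = rep(c("low", "high"), each = 5),
  value = c(-34.72222, -52.08333, -104.16667, -62.5, -52.08333,
            20.83333, 41.66667, 52.08333, 145.83333, 156.25),
  stringsAsFactors = FALSE
)

# Split by Level and build lookup vectors of |value| keyed by name.
low  <- tornado_frame[tornado_frame$Level == "low", ]
high <- tornado_frame[tornado_frame$Level == "high", ]
abs_low  <- setNames(abs(low$value),  low$tornado_names)
abs_high <- setNames(abs(high$value), high$tornado_names)

# TRUE for every row whose name has |low| > |high|.
flip <- unname(abs_low[tornado_frame$tornado_names] >
               abs_high[tornado_frame$tornado_names])

# Flip the sign of both rows of each qualifying pair.
result <- tornado_frame
result$value <- ifelse(flip, -result$value, result$value)
result
```

Because the low and high values are looked up by name, this version does not depend on the order of the rows, whereas diff(abs(value)) assumes the "low" row precedes the "high" row in each group.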

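What if I want to add a second condition to the if statement, one that checks a column of another data frame?

Then just add it with &&, something like if (cond1 && cond2).

Is there a way to force the if statement to look at all elements of the cond2 data frame? I get "the condition has length > 1 and only the first element will be used", and I don't want to switch to ifelse.

Yes, you could do something like if (value[1] > 1) or similar. It is hard to say more without knowing your specific case.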
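The length issue raised in these comments can be illustrated with a small sketch. An if() condition has to be a single TRUE or FALSE, so a vector-valued check needs to be reduced first, for example with all(), any(), or by picking one element. The objects cond1 and cond2 below are hypothetical stand-ins, since the second data frame was never shown.

```r
# Hypothetical stand-ins: cond1 is the per-group check from the answer,
# cond2 is a vector-valued check on a column of some other data frame.
cond1 <- TRUE
cond2 <- c(TRUE, FALSE, TRUE)

# Reduce cond2 to length one before combining it with &&.
res_all   <- if (cond1 && all(cond2)) "flip" else "keep"  # every element must hold
res_any   <- if (cond1 && any(cond2)) "flip" else "keep"  # at least one must hold
res_first <- if (cond1 && cond2[1])   "flip" else "keep"  # only the first element

c(all = res_all, any = res_any, first = res_first)
```

Which reduction is appropriate depends on what the second condition is meant to express. Note that in recent R versions (4.2.0 and later) a condition of length greater than one in if() is an error rather than a warning, so reducing it explicitly is the safer habit.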