R: get the minimum value grouped by unique combinations of two columns


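I'm trying to achieve the following in R: given a table (a data frame in my case), I want the lowest price for each unique combination of two columns.

For example, given the following table: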

+-----+-----------+-------+----------+----------+
| Key | Feature1  | Price | Feature2 | Feature3 |
+-----+-----------+-------+----------+----------+
| AAA |         1 |   100 | whatever | whatever |
| AAA |         1 |   150 | whatever | whatever |
| AAA |         1 |   200 | whatever | whatever |
| AAA |         2 |   110 | whatever | whatever |
| AAA |         2 |   120 | whatever | whatever |
| BBB |         1 |   100 | whatever | whatever |
+-----+-----------+-------+----------+----------+

I would like to get the following result:

+-----+-----------+-------+----------+----------+
| Key | Feature1  | Price | Feature2 | Feature3 |
+-----+-----------+-------+----------+----------+
| AAA |         1 |   100 | whatever | whatever |
| AAA |         2 |   110 | whatever | whatever |
| BBB |         1 |   100 | whatever | whatever |
+-----+-----------+-------+----------+----------+

So I'm working on a solution along these lines:

s <- lapply(split(data, list(data$Key, data$Feature1)), function(chunk) { 
        chunk[which.min(chunk$Price),]})
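
The split/lapply attempt above returns a list of one-row data frames rather than a single data frame. A minimal base R sketch of how it could be completed, assuming the sample data from the question (the names `chunks`, `cheapest`, and `result` are illustrative only):

```r
# Sample data mirroring the table in the question
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# drop = TRUE skips combinations that never occur (e.g. BBB with Feature1 == 2)
chunks <- split(data, list(data$Key, data$Feature1), drop = TRUE)

# Keep the single cheapest row of each chunk
cheapest <- lapply(chunks, function(chunk) chunk[which.min(chunk$Price), ])

# Stack the one-row data frames back into one data frame
result <- do.call(rbind, cheapest)
rownames(result) <- NULL
result
```

Note that `which.min` keeps only the first row when several rows share the lowest price.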

You can use the dplyr package:
library(dplyr)

data %>% group_by(Key, Feature1) %>%
         slice(which.min(Price))
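
For reference, a self-contained sketch of the same idea on the sample data from the question. It uses `slice_min()`, which is available in dplyr 1.0.0 and later, as an alternative to `slice(which.min(Price))`; this is a swapped-in variant, not the code from the answer above. Setting `with_ties = FALSE` is an assumption that only one row per group is wanted.

```r
library(dplyr)

# Sample data mirroring the table in the question
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(Key, Feature1) %>%
  # One row per group: the one with the lowest Price, all columns kept
  slice_min(Price, n = 1, with_ties = FALSE) %>%
  ungroup()

result
```

With `with_ties = TRUE` (the default), every row tied for the lowest price in a group would be returned instead.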

Since you mentioned the data.table package, here is a solution using that package:
library(data.table)
setDT(df)[,.(Price=min(Price)),.(Key, Feature1)] #initial question
setDT(df)[,.SD[which.min(Price)],.(Key, Feature1)] #updated question
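
To make the difference between the two lines above concrete, here is a small sketch on the sample data from the question. The variable names are illustrative, and the `.I` variant at the end is an additional commonly used idiom, not part of the original answer.

```r
library(data.table)

# Sample data mirroring the table in the question
dt <- data.table(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever"
)

# Only the grouping columns plus the minimum price (3 columns)
min_only <- dt[, .(Price = min(Price)), by = .(Key, Feature1)]

# The whole row holding the minimum price in each group (all 5 columns)
full_rows <- dt[, .SD[which.min(Price)], by = .(Key, Feature1)]

# Alternative: collect the row numbers of the minima, then subset once
idx <- dt[, .I[which.min(Price)], by = .(Key, Feature1)]$V1
full_rows_by_index <- dt[idx]
```

The `.I` form avoids materializing `.SD` for every group, which may matter on large tables.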

where df is your example data.frame.
Update: tested with the mtcars data:
df<-mtcars
library(data.table)
setDT(df)[,.SD[which.min(mpg)],by=am]
   am  mpg cyl disp  hp drat   wt  qsec vs gear carb
1:  1 15.0   8  301 335 3.54 3.57 14.60  0    5    8
2:  0 10.4   8  472 205 2.93 5.25 17.98  0    3    4

A base R solution would be aggregate(Price ~ Key + Feature1, data, FUN = min).
Using base R aggregate:

> aggregate(Price~Key+Feature1, min, data=data)
  Key Feature1 Price
1 AAA        1   100
2 BBB        1   100
3 AAA        2   110
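
The aggregate output above contains only Key, Feature1, and Price. One possible way to recover the remaining columns in base R, sketched here on the sample data from the question, is to merge the aggregated minima back onto the original data frame (the names `mins` and `result` are illustrative):

```r
# Sample data mirroring the table in the question
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# Minimum price per Key/Feature1 combination (3 columns)
mins <- aggregate(Price ~ Key + Feature1, data = data, FUN = min)

# merge() joins on the shared columns (Key, Feature1, Price), so only
# the rows that carry a group's minimum price survive, with all columns
result <- merge(mins, data)
result
```

Unlike `which.min`, this keeps every row tied for the lowest price within a group.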

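For other options.

Very elegant, but I need all columns returned in the result. I simplified the example a bit. In fact, the data contains more columns, which I need in the result.

Do you mean you want the minimum values returned in the original data frame? If so, use ave(data$Price, data$Key, data$Feature1, FUN = min).

No, please see the updated question. I only want the rows with the lowest value (for each unique combination of Key + Feature1), but with all the original values. I tried your code and it only returns 3 columns: Key, Feature1, and Price. I also need all the other original columns.

Ah, I see. jeremycg's dplyr solution looks good.

A data.table, if you do data ...

@user227710 I see your update now. You could also do the same thing with setDT(data)[, lapply(.SD, min), by = list(Key, Feature1)].

Your solution works well.

Very effective, but I need all columns returned in the result. I simplified the example a bit. Actually, the data contains more columns, which I need in the result.

What logic is used to pick the values of the other columns? For example, if Feature2 has different values for the same Key and Feature1, which value must appear in the output?

The value belonging to the lowest price. So this needs to work as a row filter: the "whatever" values for AAA-1, AAA-2, and BBB-1. The remaining rows can be discarded.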
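
To illustrate the row-filter point, here is a small sketch with made-up data in which Feature2 differs within a group (the values "z" and "a" are hypothetical). Taking the minimum of every column independently can combine values from different rows, whereas selecting the row with the lowest price keeps that row intact.

```r
library(data.table)

# Hypothetical data: one Key/Feature1 group, Feature2 differs between rows
dt <- data.table(
  Key      = "AAA",
  Feature1 = 1,
  Price    = c(100, 150),
  Feature2 = c("z", "a")
)

# Column-wise minimum: Price comes from row 1, Feature2 from row 2
col_min <- dt[, lapply(.SD, min), by = .(Key, Feature1)]

# Row filter: the complete row that has the lowest Price
row_min <- dt[, .SD[which.min(Price)], by = .(Key, Feature1)]

col_min$Feature2  # "a", not the value that belongs to the cheapest row
row_min$Feature2  # "z", the value from the row with Price == 100
```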