R: Using a for loop to apply multiple if conditions across two data frames of different lengths


r for-loop if-statement


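I'm fairly new to R and have to solve what is (for me) a rather complex problem, so I'm hoping for your help.

I have two data frames of different lengths: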

Product <- c("A1", "A2", "C1", "D1")
Posting_Date <- c("01-2016", "03-2016", "02-2016", "01-2016")

df1 <- data.frame(Product, Posting_Date)

df1
  Product Posting_Date
1      A1      01-2016
2      A2      03-2016
3      C1      02-2016
4      D1      01-2016

Product2 <- rep(c("A1", "A2", "B1", "C1", "C2", "D1"), each = 3)
Sales_Month <- rep(c("01-2016", "02-2016", "03-2016"), times = 6)
Sales <- rep(c(2300,0,2700,250,0,3700), times =3)
df2 <- data.frame(Product2, Sales_Month, Sales)

df2
   Product2 Sales_Month Sales
1        A1     01-2016  2300
2        A1     02-2016     0
3        A1     03-2016  2700
4        A2     01-2016   250
5        A2     02-2016     0
6        A2     03-2016  3700
7        B1     01-2016  2300
8        B1     02-2016     0
9        B1     03-2016  2700
10       C1     01-2016   250
11       C1     02-2016     0
12       C1     03-2016  3700
13       C2     01-2016  2300
14       C2     02-2016     0
15       C2     03-2016  2700
16       D1     01-2016   250
17       D1     02-2016     0
18       D1     03-2016  3700

Can anyone help me solve this?

Here is a safer and more efficient approach:

> tmp=aggregate(df2$Sales,list(df2$Product2,df2$Sales_Month),max)
> colnames(tmp)=c("Product","Posting_Date","match")
> tmp$match=ifelse(tmp$match>0,1,0)
> merge(df1,tmp,by=c("Product","Posting_Date"))

  Product Posting_Date match
1      A1      01-2016     1
2      A2      03-2016     1
3      C1      02-2016     0
4      D1      01-2016     1
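The aggregation step in this answer only changes the result when df2 holds more than one row for the same product and month. The sketch below illustrates that case; df1_small and df2_dup are hypothetical data invented for the illustration, not part of the question.

```r
# Hypothetical one-row lookup table (not from the question)
df1_small <- data.frame(Product = "A1", Posting_Date = "01-2016",
                        stringsAsFactors = FALSE)

# Hypothetical sales table with two rows for the same product and month
df2_dup <- data.frame(Product = c("A1", "A1"),
                      Posting_Date = c("01-2016", "01-2016"),
                      Sales = c(0, 2300),
                      stringsAsFactors = FALSE)

# Merging directly keeps both sales rows, so the single row is duplicated
direct <- merge(df1_small, df2_dup, by = c("Product", "Posting_Date"))
nrow(direct)

# Aggregating first collapses each product/month combination to its maximum
tmp <- aggregate(Sales ~ Product + Posting_Date, data = df2_dup, FUN = max)
tmp$match <- ifelse(tmp$Sales > 0, 1, 0)
collapsed <- merge(df1_small, tmp[, c("Product", "Posting_Date", "match")],
                   by = c("Product", "Posting_Date"))
nrow(collapsed)
collapsed$match
```

With the data from the question every product/month pair occurs once in df2, so both routes would return the same four rows.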

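Here is an approach using the popular dplyr library.

Basically, you want to join the two tables together and then create a new variable, match, based on whether Sales meets your criterion.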

library(dplyr)

df1 %>%
  left_join(df2, by = c("Product" = "Product2", "Posting_Date" = "Sales_Month")) %>%
  mutate(match = as.numeric(Sales > 0)) %>%
  select(-Sales)

  Product Posting_Date match
1      A1      01-2016     1
2      A2      03-2016     1
3      C1      02-2016     0
4      D1      01-2016     1

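Because of how R handles factor and character variables, this may raise a warning. Doing something like the following for each data.frame() call fixes it: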

df1 <- data.frame(Product, Posting_Date, stringsAsFactors = FALSE)
df2 <- data.frame(Product2, Sales_Month, Sales, stringsAsFactors = FALSE)
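To see where the warning comes from, it can help to inspect the column types directly. The following is a minimal sketch, assuming the Product and Product2 vectors from the question. The names f1, f2 and c1 are introduced here for illustration only. Since R 4.0.0 the default is already stringsAsFactors = FALSE, so the factor case is forced explicitly.

```r
Product  <- c("A1", "A2", "C1", "D1")
Product2 <- rep(c("A1", "A2", "B1", "C1", "C2", "D1"), each = 3)

# Force factor conversion to mimic the default behaviour before R 4.0.0
f1 <- data.frame(Product, stringsAsFactors = TRUE)
f2 <- data.frame(Product2, stringsAsFactors = TRUE)

class(f1$Product)     # "factor"
nlevels(f1$Product)   # 4 levels
nlevels(f2$Product2)  # 6 levels, so the two key columns disagree

# With stringsAsFactors = FALSE the column stays a plain character vector
c1 <- data.frame(Product, stringsAsFactors = FALSE)
class(c1$Product)     # "character"
```

The two key columns carry different level sets when stored as factors, which is what the join complains about; character columns have no levels to reconcile.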

Or use a data.table join:
library(data.table)
setDT(df1)[df2,  match := as.integer(Sales > 0), 
        on = .(Product= Product2, Posting_Date = Sales_Month)]
df1
#   Product Posting_Date match
#1:      A1      01-2016     1
#2:      A2      03-2016     1
#3:      C1      02-2016     0
#4:      D1      01-2016     1

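Hi, to help you: the multiple conditions in the if should be written inside the same parentheses, e.g. if (df1$Product == df2$Product2 & df1$Posting_Date == df2$Sales_Month & df2$Sales > 0). Also, you need to say which element you are looking at, i.e. df1$Product[i] rather than df1$Product. Finally, there should be a second loop: for each element of df1, you should loop over all elements of df2 to look for a match (once one is found, you can use break to stop the loop).

Copying the code gives me the following warning message: "Warning message: Column Product/Product2 joining factors with different levels, coercing to character vector." The output shows a value of 1 in every row of the match column.

Yeah, that is due to the way the data frames were created. It works, and you can ignore it. If you use stringsAsFactors = FALSE in the data.frame() statements, the warning goes away. Sorry, I just saw that you had already written about this anyway!

Works, thanks!!

Why aggregate the data before merging?

@Morasc To find the maximum value for each combination, i.e. to find which combinations have a value greater than 0. If you merge without aggregating, you get a cross join that produces many more useless records.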
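The nested-loop approach suggested in the comments above can be sketched as follows. This is one possible reading of that advice, not the asker's original code (which is not shown in the thread). It assumes the example data from the question, rebuilt here with character columns, and uses && because each comparison involves single elements.

```r
# Rebuild the example data from the question (character columns, not factors)
df1 <- data.frame(
  Product = c("A1", "A2", "C1", "D1"),
  Posting_Date = c("01-2016", "03-2016", "02-2016", "01-2016"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Product2 = rep(c("A1", "A2", "B1", "C1", "C2", "D1"), each = 3),
  Sales_Month = rep(c("01-2016", "02-2016", "03-2016"), times = 6),
  Sales = rep(c(2300, 0, 2700, 250, 0, 3700), times = 3),
  stringsAsFactors = FALSE
)

# Start with "no match" for every row of df1
df1$match <- 0

for (i in seq_len(nrow(df1))) {
  for (j in seq_len(nrow(df2))) {
    # All three conditions sit inside one pair of parentheses,
    # and each side is indexed so single elements are compared
    if (df1$Product[i] == df2$Product2[j] &&
        df1$Posting_Date[i] == df2$Sales_Month[j] &&
        df2$Sales[j] > 0) {
      df1$match[i] <- 1
      break  # stop scanning df2 once a match is found
    }
  }
}

df1
```

On this data the loop gives the same match column as the join-based answers (1, 1, 0, 1), but it visits up to nrow(df1) * nrow(df2) pairs, so the joins are likely the better choice for larger tables.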