
R: left join over a date range by group ID


Consider the following toy dataset:

Data = structure(list(txs = c(-50, -750, -35, -5.96, -61.5, -42.07, 
-142.4, -500, 132, -154.89, -109.51, -2000, -50, -40, -24.98, 
-15.6, -50, -147.72, -20, -6.6, -5, -20, -13.48, -7.25, -54.09, 
-200, -124.11, -30, -50, -30, 400, -10, -0.95, -4.1, -10000, 
30, -1.99, 74.03, -6.95, -2.96, -29, -403.6, -6, -6, 5250, -513.57, 
-300, -10, -500, -20, -6.45, -7.26, -40, -50, -13.14, 321.29, 
-18, 100, -5.5, -25, -59.2, -10.75, -3.2, 270, 65.8, -11.6, -104.78, 
-99.39, 0.1, -50, -80, -50, -371.44, -78, 270, -6.3, 40, -2.5, 
-29.99, -189.48, -400, -0.29, -20, -6.55, -987.37, -1400, -0.49, 
-20, -29.04, -65, -40, -27.5, -17.37, -10, -1092.84, -5.5, -69.93, 
-15.07, -400, -4.8), week = structure(c(1439157600, 1454281200, 
1471212000, 1445205600, 1448233200, 1451862000, 1449442800, 1453676400, 
1460325600, 1460930400, 1445205600, 1454281200, 1460930400, 1444600800, 
1462140000, 1471816800, 1443996000, 1448838000, 1479682800, 1453071600, 
1447023600, 1473631200, 1465768800, 1433109600, 1445205600, 1433714400, 
1466978400, 1441576800, 1459116000, 1451862000, 1436133600, 1440367200, 
1456095600, 1458514800, 1456700400, 1450047600, 1440972000, 1446418800, 
1465164000, 1441576800, 1442181600, 1453071600, 1461535200, 1460930400, 
1438552800, 1464559200, 1447628400, 1434924000, 1437343200, 1436738400, 
1443391200, 1438552800, 1440972000, 1446418800, 1446418800, 1453071600, 
1453071600, 1457305200, 1444600800, 1462140000, 1435528800, 1457305200, 
1437948000, 1440972000, 1437948000, 1433109600, 1461535200, 1453676400, 
1454886000, 1454281200, 1441576800, 1441576800, 1471212000, 1453071600, 
1451862000, 1442786400, 1443391200, 1439762400, 1436133600, 1461535200, 
1442181600, 1468188000, 1442181600, 1453676400, 1466373600, 1443391200, 
1450652400, 1454886000, 1439157600, 1441576800, 1463954400, 1442181600, 
1446418800, 1454886000, 1476050400, 1461535200, 1456700400, 1456700400, 
1435528800, 1456700400), class = c("POSIXct", "POSIXt"), tzone = ""), 
    num_c = c(1219, 1257, 1195, 33, 1105, 1223, 1257, 1317, 486, 
    1227, 477, 1039, 1238, 1008, 1137, 1294, 1070, 596, 1295, 
    1354, 1010, 1294, 1348, 1254, 19, 1185, 24, 1287, 1198, 955, 
    1324, 1293, 1343, 1162, 1272, 972, 972, 179, 1343, 1105, 
    1085, 1020, 947, 1375, 1005, 477, 596, 1198, 928, 1137, 1263, 
    1237, 1054, 1288, 1185, 1115, 1257, 1301, 1294, 1185, 1039, 
    957, 1131, 33, 477, 1258, 477, 1039, 1362, 1246, 596, 1010, 
    972, 1238, 477, 1296, 972, 1148, 1105, 24, 553, 1297, 1288, 
    1223, 789, 1298, 1082, 1353, 1030, 1287, 1203, 1008, 1294, 
    1227, 1298, 1203, 1346, 1010, 19, 1303)), .Names = c("txs", 
"week", "num_c"), row.names = c(NA, -100L), class = "data.frame")
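It has three columns:

  • one called num_c: this is a client number

  • one called week: this is the date of the Monday of the week in which num_c placed an order

  • one called txs: the weekly account balance for that client (num_c)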
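Now, for each client, I would like to use dplyr to expand (join) this dataset so that I get one row for every week in the time range during which that client placed orders.

In addition, the new empty cells (for txs) should be filled with NA. The txs values for week/client pairs that exist in the original (unexpanded) dataset should keep their original values.

I tried: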

 library(dplyr)
 stretch_Data = Data %>%
                group_by(num_c) %>%
                right_join(seq(min(week), max(week), by = 'week'), by = "week")
But I get

Error in seq(min(week), max(week), by = "week") : object 'week' not found
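which is silly, because Data does contain a week column (thank you very much).

What am I doing wrong?

Edit: Thanks to @mt1022 for his solution (in the comments below). It is clever. But one problem remains: is it possible to include all the columns from Data (to illustrate my point, I added one: txs), and then have the missing values (those corresponding to weeks with no purchase) filled with NA, as in a join? The cells corresponding to client/week pairs where a purchase was made should simply keep their original values.

Essentially, running na.omit() on the new (expanded) table should return the original table, just as it would after a join.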

Using @mt1022's solution, a simple addition gets you the txs column. I used sum(txs) to get a single value per client; depending on your needs, this could also be min(txs) or max(txs).

library(dplyr)
library(tidyr)  # provides unnest()

Data %>% group_by(num_c) %>% 
  summarise(week = list(seq(min(week), max(week), by = 'week')),
            txs = sum(txs)) %>% 
  unnest(week)
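Summing txs gives every week of a client the same total, so it cannot mark the weeks without an order. A minimal sketch of one alternative, assuming the goal is NA for those weeks: build the per-client week grid with summarise() and unnest(), then left_join() the original rows back onto it. The names toy, grid and stretched are illustrative and not from the original post. toy holds only four rows, for clients 19 and 24, with week written out as Date values on the assumption that the timestamps in Data are local-midnight Mondays in a UTC+2 zone. Working with Date rather than POSIXct is also a deliberate choice here: seq() on POSIXct with by = 'week' ignores daylight saving time, so generated timestamps might not match the stored ones exactly.

```r
library(dplyr)
library(tidyr)

# Illustrative subset (assumption: week converted to Date, not the full Data)
toy <- data.frame(
  num_c = c(19, 19, 24, 24),
  week  = as.Date(c("2015-06-29", "2015-10-19", "2016-04-25", "2016-06-27")),
  txs   = c(-400, -54.09, -189.48, -124.11)
)

# One row per client and week, restricted to that client's own date range
grid <- toy %>%
  group_by(num_c) %>%
  summarise(week = list(seq(min(week), max(week), by = "week")),
            .groups = "drop") %>%
  unnest(week)

# Attach the original values; weeks without an order get txs = NA
stretched <- grid %>%
  left_join(toy, by = c("num_c", "week"))

# Dropping the NA rows should give back the original rows
recovered <- na.omit(stretched)
```

If several rows can share the same client and week in the real data, the join keeps all of them, so the result may contain more than one row for such a week.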
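After the clarification, this is the solution I came up with. In addition to the number of orders per client per week, it yields NA values for the weeks without orders. You can also connect the weeks to the list of orders with a left join on num_c against the df query above.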

library(lubridate)
a <- data.frame(week = rep(seq(1,52,1)))
Data %>% 
  group_by(num_c) %>% 
  mutate(week_num = week(week)) %>% 
  group_by(num_c, week_num) %>% 
  summarise(txs = sum(txs),
            number_orders = n()) %>% 
  full_join(a, by = c("week_num"="week")) %>% 
  ungroup() %>% 
  arrange(week_num)
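The 1 to 52 grid is shared by all clients and is keyed on week number, while the question asks for the weeks inside each client's own date range. A hedged sketch of a per-client variant, assuming tidyr is available: complete() applied to a grouped data frame fills in the missing weeks within each group and leaves the other columns as NA. The data frame toy and the name stretched are illustrative only; toy is a four-row subset for clients 19 and 24, with week written as Date values under the assumption that the original timestamps are local-midnight Mondays in a UTC+2 zone.

```r
library(dplyr)
library(tidyr)

# Illustrative subset (assumption: week converted to Date, not the full Data)
toy <- data.frame(
  num_c = c(19, 19, 24, 24),
  week  = as.Date(c("2015-06-29", "2015-10-19", "2016-04-25", "2016-06-27")),
  txs   = c(-400, -54.09, -189.48, -124.11)
)

stretched <- toy %>%
  group_by(num_c) %>%
  # Within each client, add every Monday between the first and last order
  complete(week = seq(min(week), max(week), by = "week")) %>%
  ungroup()

# Weeks that were added have txs = NA; original weeks keep their value
original_rows <- stretched %>% filter(!is.na(txs))
```

This keeps one row per client and week without building a separate grid, at the cost of relying on complete() being evaluated per group.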

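The error is quite clear: there is no column week available inside seq(min(week), max(week), by = 'week'). How is dplyr supposed to know to take week from Data? Both data frames being joined should contain the join column. For your case, I would try something like: Data %>% group_by(num_c) %>% summarise(week = list(seq(min(week), max(week), by = 'week'))) %>% unnest(week).

@mt1022: Thanks! That is clever. But one problem remains: is it possible to include all the columns from Data (to illustrate my point, I added one), and then have the missing values (those corresponding to weeks with no purchase) filled with NA, as in a join?

Closer, but not there yet. Look at the weeks of 2015-10-19 and 2015-06-29. For num_c = 19, the txs values for those weeks should be -54.09 and -400 (those weeks are in that client's data). But your code returns -454.09 for every week. The expected behavior is NA in all weeks except 2015-10-19 and 2015-06-29, which should keep their correct original values.

Would it be easier to create a df of all the weeks in a year, fill it with the number of orders per week, and then fill it with the weekly sum of txs or NA? I have used lubridate or something similar for this and it works well.

Yes, but it should be done per client number, for example by filling the df with the number of orders per week for each client. As with the dataset in your answer, for a given client we do not need rows for weeks outside that client's time range (even if the range across all clients spans several years).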