Is there a better solution than a for loop when you have to keep a running balance?
I have a large data frame with millions of rows. It is time-series data. For example:
dates <- c(1,2,3)
purchase_price <- c(5,2,1)
income <- c(2,2,2)
df <- data.frame(dates=dates,price=purchase_price,income=income)
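Is there a faster solution?
Thanks.
As Paul pointed out, some iteration is necessary: each row depends on the row before it.
However, that dependency only matters when a purchase is made (read: you only need to recalculate the balance when ...). So you can iterate in "batches".
Try the approach below. It identifies the next row that has enough balance to make a purchase, processes all of the preceding rows in a single call, and then continues from that point.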
library(data.table)
DT <- as.data.table(df)
## Initial Balance
b.init <- 2
setattr(DT, "Starting Balance", b.init)
## Raw balance for the day, regardless of purchase
DT[, balance := b.init + cumsum(income)]
DT[, buying := FALSE]
## Set N, to not have to call nrow(DT) several times
N <- nrow(DT)
## Initialize
ind <- seq_len(N)
# Identify where the next purchase is
while (length(buys <- DT[ind, ind[which(price <= balance)]]) && min(ind) < N) {
  next.buy <- buys[[1L]] # only grab the first one
  if (next.buy > ind[[1L]]) {
    not.buys <- ind[1L]:(next.buy - 1L)
    DT[not.buys, buying := FALSE]
  }
  DT[next.buy, `:=`(buying = TRUE
                  , balance = (balance - price)
                  ) ]
  # Stop if the purchase was on the last row: (N + 1):N would count down and index past the table
  if (next.buy == N) break
  # There are still subsequent rows after 'next.buy', so recalculate their balance
  ind <- (next.buy + 1L):N
  DT[ind, balance := cumsum(income) + DT[["balance"]][[ind[[1L]] - 1L]]]
}
# Final row needs to be outside of while-loop, or else will buy that same item multiple times
# Uses >= so the affordability test matches the one inside the loop
if (DT[N, !buying && (balance >= price)])
  DT[N, `:=`(buying = TRUE, balance = (balance - price))]
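For problems that are easy to express with a loop, I'm increasingly convinced that Rcpp is the right solution. It is relatively easy to pick up, and you can express loop-y algorithms very naturally.
Here is how to solve the problem using Rcpp: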
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List purchaseWhenPossible(NumericVector date, NumericVector income,
                          NumericVector price, double init_balance = 0) {
  int n = date.length();
  NumericVector balance(n);
  LogicalVector buy(n);

  for (int i = 0; i < n; ++i) {
    // Carry the previous balance forward and add this row's income
    balance[i] = ((i == 0) ? init_balance : balance[i - 1]) + income[i];

    // Buy it if you can afford it
    if (balance[i] >= price[i]) {
      buy[i] = true;
      balance[i] -= price[i];
    } else {
      buy[i] = false;
    }
  }

  return List::create(_["buy"] = buy, _["balance"] = balance);
}

/*** R
# Copying input data from Ricardo
df <- data.frame(
  dates = 1:6,
  income = rep(2, 6),
  price = c(5, 2, 3, 5, 2, 1)
)
out <- purchaseWhenPossible(df$dates, df$income, df$price, 3)
df$balance <- out$balance
df$buy <- out$buy
*/
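To sanity-check the recurrence outside of R, the same loop can be sketched in plain C++ with std::vector. This is only a minimal sketch and not part of the answer above: `PurchaseResult` and `simulatePurchases` are hypothetical names introduced for illustration, and the code assumes `income` and `price` have the same length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: one entry per input row.
struct PurchaseResult {
    std::vector<bool> buy;        // true where a purchase was made
    std::vector<double> balance;  // balance at the end of each row
};

// Hypothetical helper mirroring the recurrence in purchaseWhenPossible,
// written against std::vector so it compiles without Rcpp.
// Assumption: income.size() == price.size().
PurchaseResult simulatePurchases(const std::vector<double>& income,
                                 const std::vector<double>& price,
                                 double init_balance = 0.0) {
    const std::size_t n = income.size();
    PurchaseResult out{std::vector<bool>(n, false),
                       std::vector<double>(n, 0.0)};

    double running = init_balance;
    for (std::size_t i = 0; i < n; ++i) {
        running += income[i];          // income always arrives first
        if (running >= price[i]) {     // buy only if it is affordable
            out.buy[i] = true;
            running -= price[i];
        }
        out.balance[i] = running;      // record the end-of-row balance
    }
    return out;
}
```

Called with an income of 2 on each of six rows, prices 5, 2, 3, 5, 2, 1 and a starting balance of 3, this gives end-of-row balances of 0, 0, 2, 4, 4, 5. Keeping the balance in a single running variable avoids reading `balance[i - 1]`, so the first row needs no special case.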
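I deleted my answer; it was only a hint for what to try next. I agree that there are too many interdependencies to simply use cumsum(). Maybe someone who has a solution will come by.
If the price is higher than the current balance, why set the balance to zero?
Try the xts package function apply.daily.
@Fernando That would still loop over every row here. It should be purchase amount = row$price * (balance >= row$price)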
## Show output
{
  print(DT)
  cat("Starting Balance was", attr(DT, "Starting Balance"), "\n")
}
## Starting with 3:
   dates price income balance buying
1:     1     5      2       0   TRUE
2:     2     2      2       0   TRUE
3:     3     3      2       2  FALSE
4:     4     5      2       4  FALSE
5:     5     2      2       4   TRUE
6:     6     1      2       5   TRUE
Starting Balance was 3
## Starting with 2:
   dates price income balance buying
1:     1     5      2       4  FALSE
2:     2     2      2       4   TRUE
3:     3     3      2       3   TRUE
4:     4     5      2       0   TRUE
5:     5     2      2       0   TRUE
6:     6     1      2       1   TRUE
Starting Balance was 2
# I modified your original data slightly, for testing
df <- rbind(df, df)
df$dates <- seq_along(df$dates)
df[["price"]][[3]] <- 3