R 对角乘积乘法
我想要每组日期的对角线乘法 主要数据集:R 对角乘积乘法,r,dataframe,data.table,dplyr,R,Dataframe,Data.table,Dplyr,我想要每组日期的对角线乘法 主要数据集: date Bucket D 1/31/2013 bkt 0 NA 1/31/2013 bkt 1(10-20) NA 1/31/2013 bkt 2(20-30) NA 1/31/2013 bkt 3(30-40) NA 1/31/2013 bkt 4(40+)
date Bucket D
1/31/2013 bkt 0 NA
1/31/2013 bkt 1(10-20) NA
1/31/2013 bkt 2(20-30) NA
1/31/2013 bkt 3(30-40) NA
1/31/2013 bkt 4(40+) NA
2/28/2013 bkt 0 NA
2/28/2013 bkt 1(10-20) 3.00
2/28/2013 bkt 2(20-30) 3.63
2/28/2013 bkt 3(30-40) 101
2/28/2013 bkt 4(40+) 102
3/30/2013 bkt 0 NA
3/30/2013 bkt 1(10-20) 0.55
3/30/2013 bkt 2(20-30) 0.40
3/30/2013 bkt 3(30-40) 103
3/30/2013 bkt 4(40+) 104
4/31/2013 bkt 0 NA
4/31/2013 bkt 1(10-20) 4.25
4/31/2013 bkt 2(20-30) 3.65
4/31/2013 bkt 3(30-40) 105
4/31/2013 bkt 4(40+) 106
5/30/2013 bkt 0 NA
5/30/2013 bkt 1(10-20) 2.34
5/30/2013 bkt 2(20-30) 4.10
5/30/2013 bkt 3(30-40) 107
5/30/2013 bkt 4(40+) 108
6/31/2013 bkt 0 NA
6/31/2013 bkt 1(10-20) 4
6/31/2013 bkt 2(20-30) 5
6/31/2013 bkt 3(30-40) 109
6/31/2013 bkt 4(40+) 110
7/30/2013 bkt 0 NA
7/30/2013 bkt 1(10-20) 8
7/30/2013 bkt 2(20-30) 7
7/30/2013 bkt 3(30-40) 111
7/30/2013 bkt 4(40+) 112
Diagonal multiplication is as follows:
1/31/2013 to 5/30/2013
2/28/2013 to 6/31/2013
3/30/2013 to 7/30/2013
each time we incrementing the next group of dates for diagonal product.
as so on ... the dates ranges from 1/31/2013 to 12/31/2016.
预期产出:
date Bucket D DP
1/31/2013 bkt 0 NA
1/31/2013 bkt 1(10-20) NA
1/31/2013 bkt 2(20-30) NA
1/31/2013 bkt 3(30-40) NA
1/31/2013 bkt 4(40+) NA
2/28/2013 bkt 0 NA
2/28/2013 bkt 1(10-20) 3.00
2/28/2013 bkt 2(20-30) 3.63
2/28/2013 bkt 3(30-40) 101
2/28/2013 bkt 4(40+) 102
3/30/2013 bkt 0 NA
3/30/2013 bkt 1(10-20) 0.55
3/30/2013 bkt 2(20-30) 0.40
3/30/2013 bkt 3(30-40) 103
3/30/2013 bkt 4(40+) 104
4/31/2013 bkt 0 NA
4/31/2013 bkt 1(10-20) 4.25
4/31/2013 bkt 2(20-30) 3.65
4/31/2013 bkt 3(30-40) 105
4/31/2013 bkt 4(40+) 106
5/30/2013 bkt 0 NA
5/30/2013 bkt 1(10-20) 2.34 13608 (108 * 105 * 0.40 * 3.00)
5/30/2013 bkt 2(20-30) 4.10 4536 (4.10 108 * 105 * 0.40)
5/30/2013 bkt 3(30-40) 107 11340 (108 * 105)
5/30/2013 bkt 4(40+) 108 108 (108)
6/31/2013 bkt 0 NA
6/31/2013 bkt 1(10-20) 4 23628.275 (110 * 107 * 3.65 * 0.55)
6/31/2013 bkt 2(20-30) 5 42960.5 (110 * 107 * 3.65)
6/31/2013 bkt 3(30-40) 109 1170 (110 * 109 )
6/31/2013 bkt 4(40+) 110 110 (100)
7/30/2013 bkt 0 NA
7/30/2013 bkt 1(10-20) 8 216627.6 (112 * 109 * 4.10 * 4.25)
7/30/2013 bkt 2(20-30) 7 50971.2 (112 * 109 * 4.10)
7/30/2013 bkt 3(30-40) 111 12432 (112 * 109)
7/30/2013 bkt 4(40+) 112 112 (112)
在输出中,我们只需要显示以下列:Date、Bucket、D和DP,因为DP是乘法的结果。
()中的任何内容仅用于解释结果。无需在列中显示
有错误的用户代码:
d <- read.csv("lossrate.csv", header=TRUE)
> d$date = as.Date(d$date, format="%m/%d/%Y")
> r <- reshape2::dcast(data=d, Bucket ~ date, value.var="D")[-1, -2]
Aggregation function missing: defaulting to length
> mat <- as.matrix(r[-1])
> myD <- col(mat) - row(mat)
> rg <- range(myD)
> out <- sapply(seq(rg[1], rg[2]), function(x)
+ `length<-`(rev(cumprod(rev(mat[myD==x]))), nrow(mat)))[,1:ncol(mat)]
> out[, colSums(is.na(out)) > 0] <- NA
> colnames(out) <- colnames(mat) # add dates as headers
> out <- reshape2::melt(cbind(r[1], out))
Using Bucket as id variables
> out <- merge(d, out, by.x=c("date", "Bucket"), by.y=c("variable", "Bucket"), all=TRUE)
> output:
date Bucket D value
1 2013-01-31 bkt 0 NA NA
2 2013-01-31 bkt 1(10-20) NA NA
3 2013-01-31 bkt 2(20-30) NA NA
4 2013-01-31 bkt 3(30-40) NA NA
5 2013-01-31 bkt 4(40+) NA NA
6 2013-02-28 bkt 0 NA NA
7 2013-02-28 bkt 1(10-20) 3.00 NA
8 2013-02-28 bkt 2(20-30) 3.63 NA
9 2013-02-28 bkt 3(30-40) 101.00 NA
10 2013-02-28 bkt 4(40+) 102.00 NA
11 2013-03-30 bkt 0 NA NA
12 2013-03-30 bkt 1(10-20) 0.55 NA
13 2013-03-30 bkt 2(20-30) 0.40 NA
14 2013-03-30 bkt 3(30-40) 103.00 NA
15 2013-03-30 bkt 4(40+) 104.00 NA
16 2013-05-30 bkt 0 NA NA
17 2013-05-30 bkt 1(10-20) 2.34 NA
18 2013-05-30 bkt 2(20-30) 4.10 NA
19 2013-05-30 bkt 3(30-40) 107.00 NA
20 2013-05-30 bkt 4(40+) 108.00 NA
21 2013-07-30 bkt 0 NA NA
22 2013-07-30 bkt 1(10-20) 8.00 1
23 2013-07-30 bkt 2(20-30) 7.00 1
24 2013-07-30 bkt 3(30-40) 111.00 1
25 2013-07-30 bkt 4(40+) 112.00 1
26 <NA> bkt 0 NA NA
27 <NA> bkt 0 NA NA
28 <NA> bkt 1(10-20) 4.25 2
29 <NA> bkt 1(10-20) 4.00 2
30 <NA> bkt 2(20-30) 5.00 2
31 <NA> bkt 2(20-30) 3.65 2
32 <NA> bkt 3(30-40) 109.00 2
33 <NA> bkt 3(30-40) 105.00 2
34 <NA> bkt 4(40+) 106.00 2
35 <NA> bkt 4(40+) 110.00 2
d$date=as.date(d$date,格式=“%m/%d/%Y”)
>r mat myD rg out 0]列名称(out)out输出:
日期桶D值
2013年1月31日bkt 0北美
2013年1月31日bkt 1(10-20)不适用
2013年1月31日bkt 2(20-30)不适用
4 2013-01-31 bkt 3(30-40)不适用
5 2013-01-31 bkt 4(40+)钠钠
6 2013年2月28日bkt 0北美
7 2013-02-28 bkt 1(10-20)3.00北美
8 2013-02-28 bkt 2(20-30)3.63北美
9 2013-02-28北冰洋3号(30-40)101.00北美
10 2013-02-28 bkt 4(40+)102.00北美
2013年3月11日30日bkt 0北美
12 2013-03-30 bkt 1(10-20)0.55北美
13 2013-03-30 bkt 2(20-30)0.40北美
14 2013-03-30 bkt 3(30-40)103.00北美
15 2013-03-30 bkt 4(40+)104.00北美
16 2013-05-30英国国民账户
172013-05-30BKT 1(10-20)2.34NA
18 2013-05-30 bkt 2(20-30)4.10北美
19 2013-05-30 bkt 3(30-40)107.00北美
20 2013-05-30 bkt 4(40+)108.00北美
21 2013-07-30英国国民账户
22 2013-07-30 bkt 1(10-20)8.00 1
23 2013-07-30 bkt 2(20-30)7.00 1
24 2013-07-30 bkt 3(30-40)111.00 1
25 2013-07-30 bkt 4(40+)112.00 1
26 bkt 0 NA
27 bkt 0 NA
28 bkt 1(10-20)4.25 2
29 bkt 1(10-20)4.00 2
30 bkt 2(20-30)5.00 2
31 bkt 2(20-30)3.65 2
32 bkt 3(30-40)109.00 2
33 bkt 3(30-40)105.00 2
34 bkt 4(40+)106.00 2
35 bkt 4(40+)110.00 2
我只是根据我的数据集更改了csv文件的名称
# Change to date format so can be ordered
d$date = as.Date(d$date, format="%m/%d/%Y")
# Form matrix so easier to find diagonals
# [-1, -2] removes rows & columns of all NA
r <- reshape2::dcast(data=d, Bucket ~ date, value.var="D")[-1, -2]
# convert to matrix to allow row and col functions & remove non-numeric grouping
mat <- as.matrix(r[-1])
# Multiply diagonals
myD <- col(mat) - row(mat)
rg <- range(myD)
out <- sapply(seq(rg[1], rg[2]), function(x)
`length<-`(rev(cumprod(rev(mat[myD==x]))), nrow(mat)))[,1:ncol(mat)]
# remove if not needed: ensures four values in product
# not sure if needed: done to match expected outcome
out[, colSums(is.na(out)) > 0] <- NA
# reshape
colnames(out) <- colnames(mat) # add dates as headers
out <- reshape2::melt(cbind(r[1], out))
# merge with original data
out <- merge(d, out, by.x=c("date", "Bucket"), by.y=c("variable", "Bucket"), all=TRUE)
数据(*修正4月和6月日期)
d四月没有31天;P(june也一样)它只是一个示例数据集:P无论如何,好的捕获:D希望您能够捕获它背后的逻辑;)我试过了,它不起作用,请你具体一点。你能给我看看你试过的代码吗?你以前问过这个问题…什么是7x5
矩阵的对角积?正如我之前所评论的,您的矩阵应该是对称的…MxM
,而不是MxN
@ChiPak;它不需要是对称的。Trymat=matrix(1:15,nc=5);col(mat)-row(mat)
第二个矩阵表示对角线我遵循您的代码:但我认为您已经更改了数据集的结构,您是否将bkt和bucktet的范围分开了??请您解释代码,它与主要问题中提到的变量一起工作。我将您问题中的数据复制并粘贴到read.table(text=“…”)
,尽管我不得不删除显示计算的额外注释。如果我的答案中的数据不同,你能以更简单的格式将你的数据添加到你的问题中吗?你可以通过添加dput(yourdataframe)
的结果来做到这一点。你能不能解释一下这句话:r@pradnya l谢谢你的更新,但是你能添加dput(df)的结果吗
回答您的问题-使我更容易将您的问题中的数据移动到我的r会话中。然后我会更新我的答案,尽管这不会有多大区别。我会再加一点解释。谢谢用户,我会试试,让你知道。希望看到你的评论。谢谢你的帮助。祝您今天过得愉快。
out[21:35,]
date Bucket D value
21 2013-05-30 bkt 0 NA NA
22 2013-05-30 bkt 1(10-20) 2.34 13608.00
23 2013-05-30 bkt 2(20-30) 4.10 4536.00
24 2013-05-30 bkt 3(30-40) 107.00 11340.00
25 2013-05-30 bkt 4(40+) 108.00 108.00
26 2013-06-30 bkt 0 NA NA
27 2013-06-30 bkt 1(10-20) 4.00 23628.28
28 2013-06-30 bkt 2(20-30) 5.00 42960.50
29 2013-06-30 bkt 3(30-40) 109.00 11770.00
30 2013-06-30 bkt 4(40+) 110.00 110.00
31 2013-07-30 bkt 0 NA NA
32 2013-07-30 bkt 1(10-20) 8.00 212724.40
33 2013-07-30 bkt 2(20-30) 7.00 50052.80
34 2013-07-30 bkt 3(30-40) 111.00 12208.00
35 2013-07-30 bkt 4(40+) 112.00 112.00
d <-
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L), .Label = c("1/31/2013",
"2/28/2013", "3/30/2013", "4/30/2013", "5/30/2013", "6/30/2013",
"7/30/2013"), class = "factor"), Bucket = structure(c(1L, 2L,
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L), .Label = c("bkt 0", "bkt 1(10-20)", "bkt 2(20-30)", "bkt 3(30-40)",
"bkt 4(40+)"), class = "factor"), D = c(NA, NA, NA, NA, NA, NA,
3, 3.63, 101, 102, NA, 0.55, 0.4, 103, 104, NA, 4.25, 3.65, 105,
106, NA, 2.34, 4.1, 107, 108, NA, 4, 5, 109, 110, NA, 8, 7, 111,
112)), .Names = c("date", "Bucket", "D"), class = "data.frame", row.names = c(NA,
-35L))