Multiple summary statistics for grouped columns in Julia

I am trying to get the code below working in Julia (1.5.3). It is only a representation of what I am actually trying to do:

using DataFrames
using DataFramesMeta
using RDatasets
using Statistics   # provides mean and median

## setup
iris = dataset("datasets", "iris")
gdf = groupby(iris, :Species)

## Applying split-apply-combine
## This code works fine
combine(gdf, nrow, (valuecols(gdf) .=> mean))
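The working call relies on broadcasting: valuecols(gdf) .=> mean expands into one column => function pair per value column, because a single function is treated as a scalar. A minimal plain-Julia sketch, assuming valuecols(gdf) returns the four iris measurement column names, with `sum` standing in for `mean` so no packages are needed:

```julia
# Assumed stand-in for valuecols(gdf): the four non-grouping iris columns.
cols = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]

# Broadcasting treats a lone function as a scalar, so it is paired
# with every column name, giving one `column => function` pair each.
ops = cols .=> sum

println(ops)
```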

However, when I try to do this for multiple summaries, it fails:

 combine(gdf, nrow, (valuecols(gdf) .=> [mean, sum]))

Error:

ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 4 and 2")
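The error can be reproduced without DataFrames. Broadcasting two vectors pairs their elements position by position, so a 4-element vector (the value columns) and a 2-element vector (the functions) have no common shape. A minimal sketch, assuming a hand-written column list in place of valuecols(gdf) and `sum`/`maximum` in place of `mean`/`sum`:

```julia
cols = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]  # 4 elements
funs = [sum, maximum]                                          # 2 elements

# Elementwise broadcasting needs equal lengths (or a length of 1);
# 4 and 2 cannot be reconciled, so Julia throws DimensionMismatch.
err = try
    cols .=> funs
    nothing
catch e
    e
end

println(err isa DimensionMismatch)  # true
```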

A little debugging shows that it runs if I change the code to:

combine(gdf, nrow, ([:SepalLength, :PetalLength] .=> [mean,sum]))
## This code works, but it is still not correct: it doesn't give me the mean and sum of both columns, only the mean of SepalLength and the sum of PetalLength, which was expected given the previous error
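This is elementwise pairing at work: with two columns and two functions the lengths match, so broadcasting zips them and each column receives exactly one function. A minimal sketch in plain Julia, with `sum` and `maximum` standing in for `mean` and `sum`:

```julia
cols = [:SepalLength, :PetalLength]
funs = [sum, maximum]

# Equal lengths: element i of cols is paired with element i of funs,
# so SepalLength only gets sum and PetalLength only gets maximum.
ops = cols .=> funs

println(ops)
```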

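Digging a little further, I realized that we can do the following. The result is correct, but it comes out in long format rather than wide format. I had expected this to answer my question, but it does not seem to work as intended: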

 combine(gdf, ([:SepalWidth, :PetalWidth] .=>  x -> ([sum(x), mean(x)])))

 ## The code above works, but the output is a 6x3 DataFrame; I was expecting a 3x6 DataFrame

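My question is:
Is there any way to use split-apply-combine so that I get a wide table like the one below (I produced it with a do ... end block and combine)? I am fine with this solution, but it requires typing out every column. Is there any way to get all the summary statistics (sum, median, mean, etc.) as columns for every column supplied to combine? I hope my question is clear; please point out if it is a duplicate or poorly worded. Thanks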

combine(gdf) do x
    return (sw_sum = sum(x.SepalWidth), sw_mean = mean(x.SepalWidth),
            sp_mean = mean(x.PetalWidth), sp_sum = sum(x.PetalWidth))
end

## My expected answer should be similar to this
# 3×5 DataFrame
#  Row │ Species     sw_sum   sw_mean  sp_mean  sp_sum
#      │ Cat…        Float64  Float64  Float64  Float64
# ─────┼────────────────────────────────────────────────
#    1 │ setosa        171.4    3.428    0.246     12.3
#    2 │ versicolor    138.5    2.77     1.326     66.3
#    3 │ virginica     148.7    2.974    2.026    101.3
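The do-block returns one wide row (a NamedTuple) per group. A minimal plain-Julia sketch of how such a row could be generated from every (column, function) combination without typing each name by hand. It uses hypothetical integer data for a single group and `sum`/`maximum` in place of `mean`/`sum`; the Column_function names mimic the default names DataFrames gives such outputs (e.g. SepalLength_mean).

```julia
# Hypothetical data for one group; integers keep the arithmetic exact.
grp  = (SepalWidth = [3, 4, 5], PetalWidth = [1, 1, 2])
funs = (sum, maximum)

# One field name and one value per (column, function) combination.
colnames = Tuple(Symbol(c, "_", nameof(f)) for c in keys(grp) for f in funs)
vals     = Tuple(f(grp[c]) for c in keys(grp) for f in funs)

# Assemble the wide row: four fields built from two columns and two functions.
row = NamedTuple{colnames}(vals)
println(row)
```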
Also, this works:

combine(gdf, [:1] .=> [mean, sum, minimum, maximum,median])
But this does not work and throws the dimension error described above, which still has me scratching my head:

combine(gdf, [:1, :2] .=> [mean, sum, minimum, maximum,median])
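Both calls follow the same broadcasting rule. Quoting an integer literal just yields the integer, so [:1] is [1] and [:1, :2] is [1, 2] (column positions). A length-1 vector broadcasts against any length, but lengths 2 and 5 do not match. A minimal sketch with three Base functions instead of the five statistics:

```julia
# :1 is simply the integer 1, i.e. a column position.
println(:1 === 1)   # true

funs = [sum, maximum, minimum]   # 3 functions (the question uses 5)

# Length 1 broadcasts against length 3: column 1 gets every function.
ok = [:1] .=> funs

# Lengths 2 and 3 cannot be matched elementwise.
bad = try
    [:1, :2] .=> funs
catch e
    e
end
println(bad isa DimensionMismatch)   # true
```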
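Do:

combine(gdf, nrow, (valuecols(gdf) .=> [mean sum])...)

or

combine(gdf, nrow, [n => f for n in valuecols(gdf) for f in [mean sum]])

(note that there is no comma between mean and sum)

The reason is that you need to add an extra dimension to the broadcasted => so that you get all combinations of the inputs: [mean sum] is a 1×2 row, and broadcasting it against the column vector of value columns produces a matrix holding every column => function pair.

EDIT:

The splat operator ... just iterates over a collection and passes its elements to the function as consecutive positional arguments, for example:

julia> f(x...) = x
f (generic function with 1 method)

julia> f(1, [2,3,4]...)
(1, 2, 3, 4)

I wrote this from memory and forgot to flatten the result.

Thanks, all the solutions work like a charm. One request, if possible: could you add some details on how the second solution works? I understood the first and the third, but I can't work out the ellipsis. Is it similar to Python's *args? In the meantime I'm reading up on the ellipsis. Thanks again

I have added an explanation. Is it clear now?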
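The answer's solutions can be unpacked in plain Julia. A column vector broadcast against a row vector fills a matrix of all combinations, and that matrix then has to be flattened: by splatting it into a varargs call (at a call site, g(m...) plays the role of Python's g(*m)), by vec, or by building a flat vector with a nested comprehension. In the sketch below, the hypothetical function `g` stands in for combine, which takes its transformations as separate positional arguments. The column list is assumed to match valuecols(gdf), and `sum`/`maximum` replace `mean`/`sum`.

```julia
cols = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]  # assumed valuecols(gdf)

# [sum maximum] (no comma) is a 1×2 row; against the 4-element column
# vector, broadcasting fills a 4×2 matrix of every (column, function) pair.
m = cols .=> [sum maximum]
println(size(m))   # (4, 2)

# Hypothetical varargs function standing in for combine: collects its arguments.
g(args...) = collect(args)

splatted = g(m...)    # 1. splat: each matrix element becomes one argument
flat     = vec(m)     # 2. vec: reshape the matrix into a vector
comp     = [c => f for c in cols for f in [sum maximum]]   # 3. comprehension

# Splatting and vec both walk the matrix in column-major order
# (every column with sum first, then every column with maximum).
println(splatted == flat)          # true

# The comprehension varies f fastest, so the order differs,
# but it contains the same eight pairs.
println(issetequal(comp, flat))    # true
```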