Wolfram Mathematica: How can a pivot table function be implemented in Mathematica?


Pivot tables (or cross tabulations) are very useful. Has anyone thought about how to implement a similar function in Mathematica?

I'm not familiar with how pivot tables are used, but taking the page linked above as an example, I suggest:

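Needs["Calendar`"] (* Calendar` is marked obsolete in Mathematica 10+; nothing in this block uses it *)
(* key: rules mapping each header name to its column index, e.g. "Units" -> 5 *)
key = # -> #2[[1]] & ~MapIndexed~
       {"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
(* the possible values of each column, in header order *)
choices = {
   {"North", "South", "East", "West"},
   {"Boy", "Girl"},
   {"Tee", "Golf", "Fancy"},
   IntegerString[#, 10, 2] <> "/2011" & /@ Range@12, (* "01/2011" ... "12/2011" *)
   Range@15,
   Range[8.00, 15.00, 0.01],
   Range[6.00, 14.00, 0.01]
   };
(* 150 random values per column, transposed into 150 rows *)
data = RandomChoice[#, 150] & /@ choices // Transpose;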

Then:

This is a rough example, but it gives the idea of how it can be done. If you have more specific requirements, I will try to address them.
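The code that followed "Then:" has not survived. As a rough illustration of the Reap/Sow idea these answers rely on (not the answer's original code), the sketch below totals one column for every combination of a row category and a column category. `crossTab` is a hypothetical name, and empty cells are filled with 0 by assumption.

```mathematica
(* Hypothetical helper: cross-tabulate rows of `data` (no header row) by
   column r (row labels) and column c (column labels), totaling column v. *)
crossTab[data_List, r_Integer, c_Integer, v_Integer] :=
 Module[{rows, cols, reaped},
  rows = Union[data[[All, r]]]; (* sorted distinct row labels *)
  cols = Union[data[[All, c]]]; (* sorted distinct column labels *)
  (* Sow each value under the single tag {rowLabel, colLabel}; the extra
     braces stop Sow from treating the pair as two separate tags. Reaping
     with an explicit list of tags returns one entry per tag, in that order. *)
  reaped = Last@Reap[
     Scan[Sow[#[[v]], {#[[{r, c}]]}] &, data],
     Flatten[Outer[List, rows, cols], 1],
     Total[#2] &];
  (* each entry is {total} for an occupied cell or {} for an empty one *)
  {rows, cols, Partition[If[# === {}, 0, First[#]] & /@ reaped, Length[cols]]}]

crossTab[{{"North", "Boy", 3}, {"South", "Girl", 5}, {"North", "Girl", 2},
  {"North", "Boy", 4}}, 1, 2, 3]
(* {{"North", "South"}, {"Boy", "Girl"}, {{7, 2}, {0, 5}}} *)
```

On the random data above, `crossTab[data, 1, 4, 5]` would total Units by Region and Ship Date.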

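Here is an update of Sjoerd's answer.

The Manipulate block is largely copied, but I believe my pivotTableData is more efficient, and I have tried to localize symbols properly, since this is meant to be usable code rather than a rough example.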

I started from the same sample data, but I embedded the field headers in it, because I feel that is more representative of normal use.

data = ImportString[#, "TSV"][[1]] & /@ Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];

data = Transpose[{
    data[[All, 1]], 
    data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"}, 
    data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
    data[[All, 4]], 
    data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"}, 
    data[[All, 6]],
    data[[All, 7]], 
    data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"}, 
    data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"}, 
    data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"}, 
    data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];

PrependTo[data,
  {"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marital status"}
  ];
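With the headers in the first row, columns can be addressed by name instead of by position. The answers here build that lookup with MapIndexed. In Mathematica 10 and later, an Association does the same job. This is a small illustrative sketch, not part of the original answer; the table `tbl` and the name `colIndex` are made up for the example.

```mathematica
(* Illustrative table with its header row embedded as row 1 *)
tbl = {{"Region", "Sex", "Wage"},
       {"South", "Male", 10.},
       {"Elsewhere", "Female", 8.5},
       {"South", "Female", 9.}};
(* header name -> column number: <|"Region" -> 1, "Sex" -> 2, "Wage" -> 3|> *)
colIndex = AssociationThread[First[tbl], Range@Length@First[tbl]];
(* all "Wage" values, skipping the header row *)
tbl[[2 ;;, colIndex["Wage"]]]
(* {10., 8.5, 9.} *)
```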

My pivotTableData is self-contained:
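(* Cross-tabulates column `dependent` by the distinct values of `field1` (rows)
   and `field2` (columns), aggregating each cell with `op`.
   data[[1]] must be the header row. *)
pivotTableData[data_, field1_, field2_, dependent_, op_] :=
  Module[{key, sow, h1, h2, ff},
    (* key["Header"] gives that header's column index *)
    (key@# = #2[[1]]) & ~MapIndexed~ data[[1]];
    (* sow {dependent value, field2 value}, tagged with the row's field1 value *)
    sow = #[[key /@ {dependent, field2}]] ~Sow~ #[[key@field1]] &;
    (* sorted distinct values of field1 and field2: the row and column headings *)
    {h1, h2} = Union@data[[2 ;;, key@#]] & /@ {field1, field2};
    (* an empty cell becomes Missing["NotAvailable"]; otherwise apply op *)
    ff = # /. {{} -> Missing@"NotAvailable", _ :> op @@ #} &;
    {
     {h1, h2},
     (* outer Reap groups rows by field1; the inner Reap regroups each group by field2 *)
     Join @@ Reap[sow ~Scan~ Rest@data, h1, ff /@ Reap[Sow @@@ #2, h2][[2]] &][[2]]
    }
  ]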
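The `ff` helper in pivotTableData relies on how Reap behaves when it is given an explicit list of tags: a tag that was never sown yields an empty list instead of being dropped, so every row of the table keeps the same length. A minimal demonstration (the tags, values and the name `ffDemo` are arbitrary; `ffDemo` mirrors `ff` with Total as `op`):

```mathematica
(* two values sown under "a", one under "c", nothing under "b" *)
res = Reap[Sow[1, "a"]; Sow[2, "c"]; Sow[3, "a"], {"a", "b", "c"}][[2]]
(* {{{1, 3}}, {}, {{2}}}: one entry per requested tag, {} for "b" *)

(* same shape as ff: empty -> Missing, otherwise apply op (here Total) *)
ffDemo = # /. {{} -> Missing["NotAvailable"], _ :> Total @@ #} &;
ffDemo /@ res
(* {4, Missing["NotAvailable"], 2} *)
```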

pivotTable depends only on pivotTableData:

pivotTable[data_?MatrixQ] :=
 DynamicModule[{raw, t, header = data[[1]], opList =
    {Mean              -> "Mean of \[Rule]",
     Total             -> "Sum of \[Rule]",
     Length            -> "Count of \[Rule]",
     StandardDeviation -> "SD of \[Rule]",
     Min               -> "Min of \[Rule]",
     Max               -> "Max of \[Rule]"}},
  Manipulate[
   raw = pivotTableData[data, f1, f2, f3, op];
   t = ConstantArray["", Length /@ raw[[1]] + 2];
   t[[1, 1]] = Control[{op, opList}];
   t[[1, 3]] = Control[{f2, header}];
   t[[2, 1]] = Control[{f1, header}];
   t[[1, 2]] = Control[{f3, header}];
   {{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
   TableView[N@t, Dividers -> All],
   Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
  ]
 ]

Usage is simple:

pivotTable[data]

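This is what I came up with. It uses the SelectEquivalents function, defined in a separate post. Function1 and Function2 let criteria1 and criteria2 use different groupings. FilterFunction defines an arbitrary filter on the data, written in terms of the header names.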
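SelectEquivalents itself is not reproduced in this thread. To show the group-then-reduce idea it provides, here is a minimal stand-in called `groupReduce` (a hypothetical name, not the toolbag definition): it buckets items by a key function f, transforms each item with g, and combines each bucket with h[key, items].

```mathematica
(* Hypothetical SelectEquivalents-style helper:
   f = key function, g = per-item transform, h = reducer taking (key, items).
   Wrapping the tag as {f[#]} keeps a list-valued key as one tag, because
   Sow[e, {t1, t2}] would otherwise sow under t1 and t2 separately. *)
groupReduce[x_List, f_ : Identity, g_ : Identity, h_ : (#2 &)] :=
 Reap[Sow[g[#], {f[#]}] & /@ x, _, h][[2]]

(* total the second column per region; groups appear in first-seen order *)
groupReduce[{{"North", 3}, {"South", 5}, {"North", 4}}, First, Last, {#1, Total[#2]} &]
(* {{"North", 7}, {"South", 5}} *)

(* with the defaults, it simply partitions by the key *)
groupReduce[{1, 2, 3, 4}, EvenQ]
(* {{1, 3}, {2, 4}} *)
```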

Using Mr.Wizard's sample data, here are some uses of this function:

criteria={"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
criteria1 = "Region";
criteria2 = "Ship Date";
consideredData = "Units";

PivotTable[data,criteria,criteria1,criteria2,consideredData]

A nicer example:

function2 = If[ToExpression@StringTake[#, 2] <= 6, "First Semester", "Second Semester"] &;
PivotTable[data,criteria,criteria1,criteria2,consideredData,FilterFunction->("Gender"=="Girl"&&"Units"*"Price"<=100&),Function2->function2]
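PivotTable's own definition is not shown here. As a rough sketch of what its FilterFunction and Function2 options do, plain Select and GroupBy (Mathematica 10+) can be combined. This rests on several assumptions: rows follow the column order of `criteria`, the filter is an ordinary predicate on a whole row, and each grouping function is applied to its criterion's value. `filteredPivot`, `semester` and the sample rows are illustrative only.

```mathematica
(* Hypothetical sketch, not the PivotTable function from the answer.
   keep: predicate on a row; f1, f2: grouping functions for the row and column
   criteria (columns r and c); column v is totaled for each {f1, f2} pair. *)
filteredPivot[data_List, {r_Integer, c_Integer, v_Integer}, keep_, f1_, f2_] :=
 GroupBy[Select[data, keep], ({f1[#[[r]]], f2[#[[c]]]} &) -> (#[[v]] &), Total]

(* same semester rule as function2 above *)
semester = If[ToExpression@StringTake[#, 2] <= 6, "First Semester", "Second Semester"] &;
(* rows: {Region, Gender, Style, Ship Date, Units, Price, Cost} *)
rows = {{"North", "Girl", "Tee", "02/2011", 4, 10., 7.},
        {"North", "Girl", "Golf", "09/2011", 6, 12., 9.},
        {"South", "Boy", "Tee", "03/2011", 5, 9., 6.},
        {"North", "Girl", "Fancy", "05/2011", 3, 14., 11.}};
(* the filter: Gender is "Girl" and Units*Price <= 100 *)
keep = (#[[2]] === "Girl" && #[[5]] #[[6]] <= 100 &);
filteredPivot[rows, {1, 4, 5}, keep, Identity, semester]
(* <|{"North", "First Semester"} -> 7, {"North", "Second Semester"} -> 6|> *)
```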

Use this product and you get the best of both worlds: it creates a seamless two-way link between Excel and mma.

A quick-and-dirty pivot table visualization:

I'll start with a more interesting, real dataset:

data = ImportString[#, "TSV"][[1]] & /@ 
          Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
       ];

A little post-processing:

data =
  {
    data[[All, 1]],
    data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
    data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
    data[[All, 4]],
    data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
    data[[All, 6]],
    data[[All, 7]],
    data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
    data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 
                      4 -> "Service", 5 -> "Professional", 6 -> "Other"},
    data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
    data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
  }\[Transpose];

header = {"Education", "South", "Sex", "Experience", "Union", "Wage", 
          "Age", "Race", "Occupation", "Sector", "Marital status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /@ Transpose[data];
levelLength = Length /@ levelNames;

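Now for the real thing. It also uses the SelectEquivalents function (defined in a separate post).

There is still some work to do. The DynamicModule should become a fully self-contained function, with the header handling streamlined, but this should be enough for a first impression.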
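The DynamicModule itself is missing from this copy. As a static stand-in for the kind of view it produces (not Sjoerd's code), the sketch below counts rows for every combination of the levels of two categorical columns, including combinations that never occur, in the spirit of levelNames above. `countTable` is a hypothetical name.

```mathematica
(* Hypothetical helper: counts for every pair of levels of columns i and j *)
countTable[data_List, i_Integer, j_Integer] :=
 Module[{li, lj, pairs},
  li = Union@data[[All, i]];   (* sorted levels of column i *)
  lj = Union@data[[All, j]];   (* sorted levels of column j *)
  pairs = data[[All, {i, j}]]; (* just the two columns of interest *)
  (* Count matches each literal {level_i, level_j} pair; absent pairs give 0 *)
  {li, lj, Outer[Count[pairs, {#1, #2}] &, li, lj]}]

countTable[{{"South", "Male"}, {"Elsewhere", "Female"}, {"South", "Female"},
  {"South", "Male"}}, 1, 2]
(* {{"Elsewhere", "South"}, {"Female", "Male"}, {{1, 0}, {1, 2}}} *)
```

With the post-processed data above, something like `{li, lj, m} = countTable[data, headerNumber["Sex"], headerNumber["Union"]]; TableForm[m, TableHeadings -> {li, lj}]` would display such a table.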

I'm a bit late to the game. Below is another self-contained solution, with a similar object-like form.

Using the random data created by @Mr.Wizard:

key = # -> #2[[1]] & ~MapIndexed~
       {"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
   {"North", "South", "East", "West"},
   {"Boy", "Girl"},
   {"Tee", "Golf", "Fancy"},
   IntegerString[#, 10, 2] <> "/2011" & /@ Range@12,
   Range@15,
   Range[8.00, 15.00, 0.01],
   Range[6.00, 14.00, 0.01]
   };
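data = RandomChoice[#, 5000] & /@ choices // Transpose;

(* aggregate column v by the distinct values of columns r (rows) and c (columns) *)
Options[createPivotTable] = {"RowColValueHeads" -> {1, 2, 3}, "Function" -> Total};
createPivotTable[data_, opts : OptionsPattern[{createPivotTable}]] :=
 Module[{r, c, v, aggDataIndex, rowRule, colRule, pivot},
  {r, c, v} = OptionValue["RowColValueHeads"];
  pivot["Row"] = Union@data[[All, r]];
  pivot["Col"] = Union@data[[All, c]];
  (* label -> position lookup tables *)
  rowRule = Dispatch[# -> #2[[1]] & ~MapIndexed~ pivot["Row"]];
  colRule = Dispatch[# -> #2[[1]] & ~MapIndexed~ pivot["Col"]];
  (* one rule {rowPos, colPos} -> aggregated value per occupied cell *)
  aggDataIndex = {#[[1, r]] /. rowRule, #[[1, c]] /. colRule} ->
       OptionValue["Function"]@#[[All, v]] & /@ GatherBy[data, #[[{r, c}]] &];
  pivot["Data"] = Normal@SparseArray@aggDataIndex;
  pivot["Properties"] = {"Data", "Row", "Col"};
  pivot["Table"] = TableForm[pivot["Data"], TableHeadings -> {pivot["Row"], pivot["Col"]}];
  Format[pivot] := "PivotObject";
  pivot
 ]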

You can use it like this:

pivot=createPivotTable[data,"RowColValueHeads"-> ({"Ship Date","Region","Units"}/.key)];
pivot["Table"]
pivot["Data"]
pivot["Row"]
pivot["Col"]
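The core of createPivotTable is a GatherBy pass that turns each group of rows into a `{rowPosition, columnPosition} -> aggregate` rule, followed by SparseArray, which fills every unoccupied cell with 0. Here is that step on a few illustrative rows. This sketch swaps createPivotTable's Dispatch tables for Associations (Mathematica 10+); the variable names are made up.

```mathematica
(* illustrative rows: {row label, column label, value} *)
rows = {{"01/2011", "North", 3}, {"01/2011", "South", 5},
        {"02/2011", "North", 2}, {"01/2011", "North", 4}};
rowLabels = Union@rows[[All, 1]]; (* {"01/2011", "02/2011"} *)
colLabels = Union@rows[[All, 2]]; (* {"North", "South"} *)
(* label -> position, used as sparse-array indices *)
rowIdx = AssociationThread[rowLabels, Range@Length@rowLabels];
colIdx = AssociationThread[colLabels, Range@Length@colLabels];
(* one rule per occupied cell: {i, j} -> total of that group's values *)
cellRules = ({rowIdx[#[[1, 1]]], colIdx[#[[1, 2]]]} -> Total[#[[All, 3]]]) & /@
   GatherBy[rows, #[[{1, 2}]] &];
(* unoccupied cells default to 0 *)
matrix = Normal@SparseArray[cellRules, {Length@rowLabels, Length@colLabels}]
(* {{7, 5}, {2, 0}} *)
```

Passing explicit dimensions to SparseArray is a small precaution. When every label comes from the data, as here, the inferred dimensions would be the same.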

This gives:

I believe it is faster than @Mr.Wizard's, but I would have to run a better test, and I don't have time right now.

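@Mr.Wizard's answer is indeed robust and enduring, because it is based on the Reap/Sow approach, which suits a number of map-reduce jobs in Mathematica. Given how MMA itself has evolved, a newer option is also worth considering.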

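GroupBy (introduced in Mathematica 10.0) generalizes map-reduce operations, so the job above on data can be implemented as follows (partly for readability):

headings = Union @ data[[All, #]] & /@ {1, 4}
(* {{"East", "North", "South", "West"}, {"01/2011", "02/2011", "03/2011", "04/2011", "05/2011", "06/2011", "07/2011", "08/2011", "09/2011", "10/2011", "11/2011", "12/2011"}} *)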
We can use Outer to set up a rectangular template for the table:

template = Outer[List, Apply[Sequence][headings]];
The main job, with GroupBy and Total as its third argument:

pattern = Append[Normal @ GroupBy[data, (#[[{1, 4}]] &) -> (#[[-1]] &), Total], _ -> Null];
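Finally, inject the pattern into the template (and apply table headings for appearance):

TableForm[Replace[template, pattern, {2}], TableHeadings -> headings]

This yields: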

Note: we totaled the last column of data. (Many other aggregations are, of course, possible.)
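Here are the same three steps (headings, Outer template, GroupBy pattern) on a tiny hand-made table, using Mean instead of Total and Missing[] instead of Null for empty cells. The values are illustrative. Column 1 plays the role of Region, column 2 of Ship Date, and the last column is the value being aggregated.

```mathematica
(* illustrative rows: {Region, Ship Date, Cost} *)
small = {{"North", "01/2011", 6.}, {"South", "02/2011", 7.},
         {"North", "01/2011", 8.}, {"North", "02/2011", 9.}};
hd = Union @ small[[All, #]] & /@ {1, 2};
(* {{"North", "South"}, {"01/2011", "02/2011"}} *)
(* each template cell holds its own {region, date} key *)
tmpl = Outer[List, Apply[Sequence][hd]];
(* mean of the last column per {region, date}; any other key becomes Missing[] *)
pat = Append[Normal @ GroupBy[small, (#[[{1, 2}]] &) -> (#[[-1]] &), Mean], _ -> Missing[]];
cells = Replace[tmpl, pat, {2}]
(* {{7., 9.}, {Missing[], 7.}} *)
TableForm[cells, TableHeadings -> hd]
```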
Isn't a pivot table just a combination of filters and projections? Or are you thinking of using pivot tables through the Excel UI?

@rcollyer, selectIn could be very useful for this.

@belisarius, actually I'm thinking of both, but first of all of getting the data right (and properly configurable!).

Agreed, it could be useful. I'm not sure whether I've done it, though I may have. Also, I just added a more general term to the post: cross tabulation.

Your answer provides an all-in-one solution, which is why I accepted it.

+1 for a complete implementation. (I haven't tested it, but I trust you.) One caveat, though: you seem to use SelectEquivalents, and in my view you should include its definition in your post, since it is nonstandard and several versions of it are around.

@Mr.Wizard I thought I could leave it out, since other posts already contain several references to this function. I have now provided a link to the toolbag version.

By using Excel I could make things easy for myself (and for my client's bill), but you have nicely shown that mma as a programming language has few limits. How do we tell the world?

Cool! Ever since some people stumbled on TableView on MathGroup more than a year ago, we have been waiting for more information about it, but it is still undocumented in version 8.0.4. Any news, updates or new hacks for this much-needed function?

@kguler No, other than its use being shown in a presentation at a recent virtual conference. I used it here to look more like Excel; I originally used Grid.

The product does not seem to have been updated for version 8 of Mathematica. I also wonder how it would handle the .xlsx format.

@Mr.Wizard I believe this is a good question to migrate to Mathematica.SE.