Python: converting a groupby to nested JSON, without calculated fields

python json d3.js pandas


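I'm building a d3.js graph. My data lives in a huge multi-tab .xls file. I have to pull data from every tab, so I decided to dump it all into pandas and export some .json.

Raw data, spread across multiple tabs: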

demography, area, state, month, rate
over 65,   region2, GA, May, 23
over 65,  region2, AL, May, 25
NaN,  random_odd_data, mistake, error
18-65, region2, GA, 77
18-65, region2, AL, 75

Now, loaded into pandas, merged and cleaned up:

     demography area     state  month rate
0    over 65    region2  GA     May   23
1    over 65    region2  AL     May   25
2    18-65      region2  GA     May   50
3    18-65      region2  AL     May   55
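
The merge-and-clean step is not shown in the question. A minimal sketch of how it might look, assuming the tabs share the same column names: `merge_sheets` is a hypothetical helper, and the in-memory `sheets` dict stands in for what `pd.read_excel(path, sheet_name=None)` returns (one DataFrame per tab).

```python
import pandas as pd

def merge_sheets(sheets):
    """Concatenate per-tab frames and drop rows that failed to parse."""
    df = pd.concat(sheets.values(), ignore_index=True)
    # Non-numeric rates (e.g. "error") become NaN and are dropped below.
    df["rate"] = pd.to_numeric(df["rate"], errors="coerce")
    df = df.dropna(subset=["demography", "rate"])
    return df.reset_index(drop=True)

# Stand-in for pd.read_excel("multitab.xls", sheet_name=None),
# which returns a dict mapping tab name -> DataFrame.
cols = ["demography", "area", "state", "month", "rate"]
sheets = {
    "tab1": pd.DataFrame(
        [["over 65", "region2", "GA", "May", 23],
         ["over 65", "region2", "AL", "May", 25],
         [None, "random_odd_data", "mistake", None, "error"]],
        columns=cols),
    "tab2": pd.DataFrame(
        [["18-65", "region2", "GA", "May", 50],
         ["18-65", "region2", "AL", "May", 55]],
        columns=cols),
}

df = merge_sheets(sheets)
print(df)
```

Coercing `rate` during the merge also leaves the column numeric, which matters for any later aggregation.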

Now, group it:

group = df.groupby(['state', 'demography'])

which yields

<pandas.core.groupby.DataFrameGroupBy object at 0x106939610>

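Aggregating the groups with count gives an almost correct result, except that I don't want to count anything; I just want the "rate" values.

Sure enough, this exports only "1" for every value, lol: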

group.reset_index().to_json("myjson2.json", orient="index")

Dang, I'm almost there. How do I export it so that each state is a parent?

[
    {
        "state": "Alabama",
        "over 65": 25,
        "18-65": 50

    },
    {
        "state": "Georgia",
        "over 65": 23,
        "18-65": 55
    }
]

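The count method counts the number of non-NaN entries in each column for each group, so they are all 1 here (each group has size 1 and there are no NaNs).
(I couldn't find a specific link, but it is mentioned in the documentation.)

I think what you actually want is a pivot_table (In [11] in the session below).

For the export, I think you are looking for orient='records' (though you need to reset_index first), as in In [13] below.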
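
To see why counting produces only 1s, here is a small sketch that rebuilds the cleaned frame from the question by hand (the literal DataFrame is an assumption standing in for the asker's merged data) and compares `count()` with simply reading the single `rate` value in each group.

```python
import pandas as pd

# Hand-built copy of the cleaned table shown in the question.
df = pd.DataFrame({
    "demography": ["over 65", "over 65", "18-65", "18-65"],
    "area": ["region2"] * 4,
    "state": ["GA", "AL", "GA", "AL"],
    "month": ["May"] * 4,
    "rate": [23, 25, 50, 55],
})

group = df.groupby(["state", "demography"])

# count() reports how many non-NaN rows each group has, per column.
counts = group.count()
print(counts)

# Selecting the column and taking the one row per group keeps the values.
rates = group["rate"].first()
print(rates)
```

Every (state, demography) pair occurs exactly once, so every count is 1, while `first()` returns the rate itself.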


Woohoo! It's like Excel, only cool. One addition: for whatever reason, it treated "rate" as type object rather than float, and gave me the error "No numeric types to aggregate". So I had to convert it to float: df.convert_objects('rate', convert_numeric=True)
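
`convert_objects` was deprecated and later removed from pandas, so on a current install the same conversion would probably go through `pd.to_numeric`. A minimal sketch, assuming the rates arrived as strings (the sample frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["GA", "AL"],
    "demography": ["over 65", "over 65"],
    "rate": ["23", "25"],  # strings, so the column dtype is object
})

# Convert to a numeric dtype; unparseable entries would become NaN.
df["rate"] = pd.to_numeric(df["rate"], errors="coerce")

# With a numeric column the pivot can aggregate (mean by default).
res = df.pivot_table(values="rate", index="state", columns="demography")
print(res)
```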


In [11]: res = df.pivot_table('rate', 'state', 'demography')

In [12]: res
Out[12]:
demography  18-65  over65
state
AL             55      25
GA             50      23

In [13]: res.reset_index().to_json(orient='records')
Out[13]: '[{"state":"AL","18-65":55,"over65":25},{"state":"GA","18-65":50,"over65":23}]'
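
The session above can be reproduced as a standalone script. This is a sketch under a few assumptions: the frame is typed in by hand from the cleaned table in the question, and `STATE_NAMES` is a hypothetical lookup added only because the desired output uses full state names.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "demography": ["over 65", "over 65", "18-65", "18-65"],
    "area": ["region2"] * 4,
    "state": ["GA", "AL", "GA", "AL"],
    "month": ["May"] * 4,
    "rate": [23, 25, 50, 55],
})

# One row per state, one column per demography, cells hold the rate.
res = df.pivot_table(values="rate", index="state", columns="demography")

# Move "state" from the index back into a regular column.
flat = res.reset_index()

# Hypothetical mapping from abbreviations to full names.
STATE_NAMES = {"AL": "Alabama", "GA": "Georgia"}
flat["state"] = flat["state"].map(STATE_NAMES)

# orient="records" emits a list with one object per row.
records = json.loads(flat.to_json(orient="records"))
print(records)
```

Passing a file path as the first argument to `to_json`, as in the question, writes the same JSON to disk instead of returning a string. Because `pivot_table` averages by default, the rates come back as floats.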