Optimizing Python DataFrame code
Actually, I want to optimize my code in Python. I am querying Elasticsearch and getting a JSON response; right now I iterate over the JSON response and store the values in lists, so I can attach them as columns to a dataframe:
unmtchd_ESdata = {...}  # response from the Elasticsearch search
for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
    list6.append(unmtchd_ESdata['avg'])
    list7.append(unmtchd_ESdata['key'])
    ....
    ....
mkt_df = pd.DataFrame()
mkt_df["market_avg_total_sales_count"] = list6
mkt_df["pos_code"] = list7
...
....
In the end, the mkt_df dataframe has all its columns assigned values in the same order the values were appended to the lists. If a list is appended values like ['01200000129', '00980030003'], it ends up in the dataframe as shown below, and likewise for the rest:
market_avg_total_sales_count pos_code
0 329.75 01200000129
1 15.00 00980030003
Now my problem is that I read too many variables and I want them as dataframe values; obviously, having N such lists makes my program inefficient, since all these operations happen in memory.
Any suggestions on how to handle a scenario like this with less space and time complexity?
Edit:
Adding my JSON structure here:
{
"took": 28,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 12170,
"max_score": 0,
"hits": []
},
"aggregations": {
"filtered": {
"doc_count": 5,
"POSCode": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "01200000129",
"doc_count": 4,
"POSCodeModifier": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "0",
"doc_count": 4,
"CSP": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "5555",
"doc_count": 4,
"per_stock": {
"buckets": [
{
"key_as_string": "2018-02-26",
"key": 1519603200000,
"doc_count": 0,
"avg_week_qty_sales": {
"value": 0
}
},
{
"key_as_string": "2018-03-05",
"key": 1520208000000,
"doc_count": 1,
"avg_week_qty_sales": {
"value": 10
}
},
{
"key_as_string": "2018-03-12",
"key": 1520812800000,
"doc_count": 1,
"avg_week_qty_sales": {
"value": 300
}
},
{
"key_as_string": "2018-03-19",
"key": 1521417600000,
"doc_count": 1,
"avg_week_qty_sales": {
"value": 1000
}
},
{
"key_as_string": "2018-03-26",
"key": 1522022400000,
"doc_count": 1,
"avg_week_qty_sales": {
"value": 9
}
}
]
},
"market_week_metrics": {
"count": 4,
"min": 9,
"max": 1000,
"avg": 329.75,
"sum": 1319,
"sum_of_squares": 1090181,
"variance": 163810.1875,
"std_deviation": 404.7347124969639,
"std_deviation_bounds": {
"upper": 1139.2194249939278,
"lower": -479.71942499392776
}
}
}
]
}
}
]
}
},
{
"key": "00980030003",
"doc_count": 1,
"POSCodeModifier": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "0",
"doc_count": 1,
"CSP": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "5555",
"doc_count": 1,
"per_stock": {
"buckets": [
{
"key_as_string": "2018-02-26",
"key": 1519603200000,
"doc_count": 0,
"avg_week_qty_sales": {
"value": 0
}
},
{
"key_as_string": "2018-03-05",
"key": 1520208000000,
"doc_count": 1,
"avg_week_qty_sales": {
"value": 15
}
},
{
"key_as_string": "2018-03-12",
"key": 1520812800000,
"doc_count": 0,
"avg_week_qty_sales": {
"value": 0
}
},
{
"key_as_string": "2018-03-19",
"key": 1521417600000,
"doc_count": 0,
"avg_week_qty_sales": {
"value": 0
}
},
{
"key_as_string": "2018-03-26",
"key": 1522022400000,
"doc_count": 0,
"avg_week_qty_sales": {
"value": 0
}
}
]
},
"market_week_metrics": {
"count": 1,
"min": 15,
"max": 15,
"avg": 15,
"sum": 15,
"sum_of_squares": 225,
"variance": 0,
"std_deviation": 0,
"std_deviation_bounds": {
"upper": 15,
"lower": 15
}
}
}
]
}
}
]
}
}
]
}
}
}
}
The values I am trying to fetch:
for i in range(len(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'])):
    list6.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['avg'])
    list7.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['key'])
    list8.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max']-unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])
    list9.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['max'])
    list10.append(unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets'][i]['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']['min'])
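Every long lookup in the loop above drills down to the same `market_week_metrics` dict, so one readability tweak (not yet a space saving) is to iterate over the buckets directly and cache that dict once per bucket. A sketch, using a minimal stand-in for `unmtchd_ESdata` trimmed to the two buckets and the fields actually read:

```python
# Minimal stand-in for the parsed Elasticsearch response, trimmed to the
# two buckets from the sample JSON above and only the fields read below.
unmtchd_ESdata = {
    "aggregations": {"filtered": {"POSCode": {"buckets": [
        {"key": "01200000129", "POSCodeModifier": {"buckets": [
            {"CSP": {"buckets": [
                {"market_week_metrics": {"avg": 329.75, "max": 1000, "min": 9}}
            ]}}
        ]}},
        {"key": "00980030003", "POSCodeModifier": {"buckets": [
            {"CSP": {"buckets": [
                {"market_week_metrics": {"avg": 15, "max": 15, "min": 15}}
            ]}}
        ]}},
    ]}}}
}

list6, list7, list8, list9, list10 = [], [], [], [], []
for bucket in unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets']:
    # Cache the deeply nested metrics dict once per bucket.
    m = bucket['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']
    list6.append(m['avg'])
    list7.append(bucket['key'])
    list8.append(m['max'] - m['min'])
    list9.append(m['max'])
    list10.append(m['min'])
```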
You could just create a single list and append an n-tuple to it, where n is the number of columns, on each iteration, e.g.:
for i in range(3):
    some_list.append((i, i+3))
Result:
[(0, 3), (1, 4), (2, 5)]
Passing it to a DataFrame gives:
pd.DataFrame(some_list, columns=['col1', 'col2'])
col1 col2
0 0 3
1 1 4
2 2 5
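Adapted to the Elasticsearch response shown in the question, that tuple approach might look like the sketch below. `unmtchd_ESdata` is a stand-in trimmed to the fields actually read, and the extra column names (`sales_range`, `max`, `min`) are illustrative, not from the original code:

```python
import pandas as pd

# Stand-in for the parsed Elasticsearch response, trimmed to the two
# buckets from the sample JSON and the fields read below.
unmtchd_ESdata = {
    "aggregations": {"filtered": {"POSCode": {"buckets": [
        {"key": "01200000129", "POSCodeModifier": {"buckets": [
            {"CSP": {"buckets": [
                {"market_week_metrics": {"avg": 329.75, "max": 1000, "min": 9}}
            ]}}
        ]}},
        {"key": "00980030003", "POSCodeModifier": {"buckets": [
            {"CSP": {"buckets": [
                {"market_week_metrics": {"avg": 15, "max": 15, "min": 15}}
            ]}}
        ]}},
    ]}}}
}

# One list of row tuples instead of N parallel column lists.
rows = []
for bucket in unmtchd_ESdata['aggregations']['filtered']['POSCode']['buckets']:
    m = bucket['POSCodeModifier']['buckets'][0]['CSP']['buckets'][0]['market_week_metrics']
    rows.append((bucket['key'], m['avg'], m['max'] - m['min'], m['max'], m['min']))

mkt_df = pd.DataFrame(rows, columns=['pos_code', 'market_avg_total_sales_count',
                                     'sales_range', 'max', 'min'])
```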
Try adapting that to your solution. My feeling is that the slow part is the sequential list appends, not building the dataframe's series.
Are you sure the bottleneck is assigning the pandas series from the lists?
Even I feel the sequential list appends are a bit slow, since I use around 20 such appends in my code.
In that case, you may need to show us some of the JSON. Not all 20 keys and all the rows; 4 keys with 4 rows would probably be enough. Then we can suggest a better way to build your dataframe.
Added it, you can have a look @jpp
Kudos! You saved me from using extra space