Python: Transforming a list of dictionaries
I have a dataframe with a single column, "positions", that contains lists of dictionaries. Here is what it looks like:
df1.head()
positions
0 []
1 [{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 280.245, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 121.666, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
2 [{'last_sale_price': 121.8, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 280.686, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 120.61200000000001, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
3 [{'last_sale_price': 122.11, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 281.43, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 120.953, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
4 [{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 282.793, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 121.11, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
I want to extract the ticker symbols and their amounts. The final output dataframe should look like this:
GLD SPY TLT
0 271 129 248
1 271 129 248
2 271 129 248
This is what I have so far, but the output is not in the right format yet. I also suspect there is a better way.
import pandas as pd
import numpy as np
from itertools import chain
# Wrap each Equity(...) token in quotes so the text can be evaluated as a Python literal
df1.positions = df1.positions.str.replace(r'(Equity)(\(\d+\s\[[a-zA-Z]+\]\))', r"'\1\2'", regex=True)
s = df1.positions.apply(eval)
s1 = s.tolist()
consolidate = []
for l in list(chain(*s1)):
    temp = {}
    for k, (key, value) in enumerate(l.items()):
        temp.update({f"col{k+1}": key,
                     f"col{k+1}_val": value})
    consolidate.append(temp)
df2 = pd.DataFrame.from_dict(consolidate)
df2 = df2[['col3_val', 'col4_val']].rename(columns = {'col3_val': 'ticker', 'col4_val':'amount'})
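# regex=True is required in recent pandas versions, where str.replace defaults to literal matching
df2.ticker = df2.ticker.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', r'\2', regex=True)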
df3 = df2.pivot( columns='ticker', values='amount')
df3.head()
ticker GLD SPY TLT
0 271.0 NaN NaN
1 NaN 129.0 NaN
2 NaN NaN 248.0
3 271.0 NaN NaN
4 NaN 129.0 NaN
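The staggered NaN pattern above appears because pivot is called without an index, so every flattened dictionary keeps its own row label. One possible fix, sketched here on a small hypothetical sample (the names rows, records, flat and wide are illustrative, and 'sid' is assumed to be already reduced to a plain ticker string), is to carry the original row number along and pivot on it:

```python
import pandas as pd

# Hypothetical sample: each element stands for one row of the "positions" column.
rows = [
    [],
    [{'sid': 'GLD', 'amount': 271}, {'sid': 'SPY', 'amount': 129}, {'sid': 'TLT', 'amount': 248}],
    [{'sid': 'GLD', 'amount': 271}, {'sid': 'SPY', 'amount': 129}, {'sid': 'TLT', 'amount': 248}],
]

# Flatten, but remember which original row each dictionary came from.
records = [
    {'row': i, 'ticker': d['sid'], 'amount': d['amount']}
    for i, positions in enumerate(rows)
    for d in positions
]
flat = pd.DataFrame(records)

# Pivot on the original row number so all tickers of one row share one line.
wide = flat.pivot(index='row', columns='ticker', values='amount')
print(wide)
```

Note that the empty first row contributes no records, so it does not appear in the pivoted result.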
To make this easier to work with, I replaced Equity(0 [GLD]) with 'GLD' and changed a few values. A simple apply with a dict comprehension does the job.
import pandas as pd
df1 = pd.DataFrame(
{ 'positions':[ [],
[{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 271}, {'last_sale_price': 280.245, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 121.666, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 121.8, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 281}, {'last_sale_price': 280.686, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 120.61200000000001, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 122.11, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 291}, {'last_sale_price': 281.43, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 120.953, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 261}, {'last_sale_price': 282.793, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 121.11, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}]]} )
df1['positions'].apply(lambda row: pd.Series({x['sid']:x['amount'] for x in row}))
Out[28]:
GLD SPY TLT
0 NaN NaN NaN
1 271.0 129.0 248.0
2 281.0 129.0 248.0
3 291.0 129.0 248.0
4 261.0 129.0 248.0
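The amounts in the output above are floats because the empty first row introduces NaN. As an alternative sketch, not the answerer's method, Series.explode can flatten the lists first, and the nullable Int64 dtype keeps the amounts as integers. The sample data and the names exploded, flat, wide and wide_all are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'positions': [
    [],
    [{'sid': 'GLD', 'amount': 271}, {'sid': 'SPY', 'amount': 129}],
    [{'sid': 'GLD', 'amount': 281}, {'sid': 'SPY', 'amount': 129}],
]})

# One dictionary per line; the empty list becomes NaN and is dropped.
exploded = df['positions'].explode().dropna()

# Expand the dictionaries into columns, keeping the original row label as 'row'.
flat = pd.DataFrame(exploded.tolist(), index=exploded.index)
flat = flat.rename_axis('row').reset_index()

# One line per original row, one column per ticker.
wide = flat.pivot(index='row', columns='sid', values='amount')

# Optional: restore the empty row and keep integers via the nullable Int64 dtype.
wide_all = wide.reindex(df.index).astype('Int64')
print(wide_all)
```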
This seems to work for me (with help from @Siva):
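Thanks for your reply. I tried your solution after first stripping the Equity part, but I get an error when running the second line of your solution: df1.positions = df1.positions.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', "'" + r"\2" + "'", regex=True) followed by df1['positions'].apply(lambda row: pd.Series({x['sid']: x['amount'] for x in row})). The error is
TypeError: string indices must be integers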
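Make sure you understand the gist of converting from a list to a dataframe. Check df1 after the replace command; you should be able to fix it from there.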
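Python raises this TypeError when a string is indexed with a string key. A likely cause, assuming the column still holds the text form of each list after str.replace, is that iterating over a row yields single characters rather than dictionaries. A minimal reproduction, with hypothetical variable names:

```python
import ast

# The row is still text, not a list of dictionaries.
row_as_text = "[{'sid': 'GLD', 'amount': 271}]"
print(type(row_as_text))

try:
    # Iterating over a string yields characters, and '['['sid'] is invalid.
    {x['sid']: x['amount'] for x in row_as_text}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Once the text is parsed into a real list, the comprehension works.
row_as_list = ast.literal_eval(row_as_text)
result = {x['sid']: x['amount'] for x in row_as_list}
print(result)  # {'GLD': 271}
```

Checking the type of a single cell after the replace step shows whether a parsing step is still needed.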
df1.positions = df1.positions.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', " '" + r"\2" + "'", regex = True)
#convert positions to list
s = df1.positions.apply(eval).tolist()
#creating temp dataframe with the list
temp = pd.DataFrame({'positions': s })
#finally creating final output format
df2 = temp['positions'].apply(lambda row: pd.Series({x['sid']:x['amount'] for x in row}))
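The code above parses the column with eval, which executes whatever text it is given. As a hedged alternative, ast.literal_eval only accepts Python literals and should be sufficient once every Equity(...) token has been replaced by a quoted ticker. The sketch below assumes the column holds strings in the format shown in the question; the names raw, quoted and parsed are illustrative:

```python
import ast
import pandas as pd

# Hypothetical sample in which each cell is the text form of a list.
raw = pd.DataFrame({'positions': [
    "[]",
    "[{'sid': Equity(0 [GLD]), 'amount': 271}, {'sid': Equity(1 [SPY]), 'amount': 129}]",
]})

# Replace Equity(0 [GLD]) with 'GLD' so each cell becomes a valid Python literal.
quoted = raw['positions'].str.replace(
    r"Equity\(\d+\s\[([A-Za-z]+)\]\)", r"'\1'", regex=True
)

# Parse the text into real lists of dictionaries without executing arbitrary code.
parsed = quoted.apply(ast.literal_eval)
print(parsed.iloc[1])
```

The resulting Series of lists can then be passed to the same apply and dict comprehension step used above.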