Python: converting a list of dictionaries

I have a DataFrame with a single column, "positions", that contains lists of dictionaries. Here is what it looks like:

df1.head()

    positions
0   []
1   [{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 280.245, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 121.666, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
2   [{'last_sale_price': 121.8, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 280.686, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 120.61200000000001, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
3   [{'last_sale_price': 122.11, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 281.43, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 120.953, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
4   [{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': Equity(0 [GLD]), 'amount': 271}, {'last_sale_price': 282.793, 'cost_basis': 280.38612250000006, 'sid': Equity(1 [SPY]), 'amount': 129}, {'last_sale_price': 121.11, 'cost_basis': 121.72783299999999, 'sid': Equity(2 [TLT]), 'amount': 248}]
I want to extract the ticker symbols and their amounts. The final output DataFrame should look like this:

   GLD  SPY  TLT
0  271  129  248
1  271  129  248
2  271  129  248
This is what I have so far, but the output is not in the right format yet. I also suspect there is a better way to do it.

import pandas as pd
import numpy as np
from itertools import chain

df1.positions = df1.positions.str.replace('(Equity)(\(\d+\s\[[a-zA-Z]+\]\))', "'" + r"\1\2" + "'", regex = True)
s = df1.positions.apply(eval)
s1 = s.tolist()
consolidate = []
for l in list(chain(*s1)):
    temp = {}
    for k,(key, value) in enumerate(l.items()) :
        temp.update({f"col{k+1}":key,
                     f"col{k+1}_val":value})
    consolidate.append(temp)
df2 = pd.DataFrame.from_dict(consolidate)

df2 = df2[['col3_val', 'col4_val']].rename(columns = {'col3_val': 'ticker', 'col4_val':'amount'})
df2.ticker = df2.ticker.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', r'\2')
df3 = df2.pivot( columns='ticker', values='amount')
df3.head()

ticker  GLD SPY TLT
0   271.0   NaN NaN
1   NaN 129.0   NaN
2   NaN NaN 248.0
3   271.0   NaN NaN
4   NaN 129.0   NaN
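
The diagonal NaN pattern above appears because every dictionary ends up on its own row, so the pivot has nothing telling it which dictionaries came from the same original row. The following is only a sketch of one way to keep that link, assuming the column already holds real lists of dicts and that 'sid' has been reduced to a plain ticker string (the sample frame and the names exploded, long and wide are illustrative, not from the question):

```python
import pandas as pd

# Simplified stand-in for the question's data: 'sid' is a plain string here
df1 = pd.DataFrame({"positions": [
    [],
    [{"sid": "GLD", "amount": 271}, {"sid": "SPY", "amount": 129}, {"sid": "TLT", "amount": 248}],
    [{"sid": "GLD", "amount": 271}, {"sid": "SPY", "amount": 129}, {"sid": "TLT", "amount": 248}],
]})

# explode() yields one row per dict and repeats the original row label;
# an empty list becomes NaN, which dropna() removes
exploded = df1["positions"].explode().dropna()

# Long format: original row label, ticker, amount
long = pd.DataFrame({
    "row": exploded.index,
    "ticker": [d["sid"] for d in exploded],
    "amount": [d["amount"] for d in exploded],
})

# Pivot on the original row label so each source row stays on one line
wide = long.pivot(index="row", columns="ticker", values="amount")
print(wide)
```

Passing index="row" to pivot is what collapses the three dictionaries of one source row into a single output row.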

To keep things simple, I replaced Equity(0 [GLD]) with 'GLD' and changed a few of the values. A simple apply with a dict comprehension does the job:

import pandas as pd


df1 = pd.DataFrame(
{ 'positions':[ [],
[{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 271}, {'last_sale_price': 280.245, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 121.666, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 121.8, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 281}, {'last_sale_price': 280.686, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 120.61200000000001, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 122.11, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 291}, {'last_sale_price': 281.43, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 120.953, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}],
[{'last_sale_price': 121.98, 'cost_basis': 122.04199000000001, 'sid': 'GLD', 'amount': 261}, {'last_sale_price': 282.793, 'cost_basis': 280.38612250000006, 'sid': 'SPY', 'amount': 129}, {'last_sale_price': 121.11, 'cost_basis': 121.72783299999999, 'sid': 'TLT', 'amount': 248}]]} )

df1['positions'].apply(lambda row: pd.Series({x['sid']:x['amount'] for x in row}))

Out[28]: 
     GLD    SPY    TLT
0    NaN    NaN    NaN
1  271.0  129.0  248.0
2  281.0  129.0  248.0
3  291.0  129.0  248.0
4  261.0  129.0  248.0
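
In the output above, row 0 is all NaN because its list is empty, and that NaN forces the amounts to float. If the integer layout from the question is wanted, one possible variation (a sketch, not part of the original answer; records and out are illustrative names) is to build the frame from a list of dicts, drop the empty rows, and cast back to int:

```python
import pandas as pd

# Simplified stand-in for the answer's data, keeping only the keys that are used
df1 = pd.DataFrame({"positions": [
    [],
    [{"sid": "GLD", "amount": 271}, {"sid": "SPY", "amount": 129}, {"sid": "TLT", "amount": 248}],
    [{"sid": "GLD", "amount": 281}, {"sid": "SPY", "amount": 129}, {"sid": "TLT", "amount": 248}],
]})

# One {ticker: amount} dict per source row; an empty list gives an empty dict
records = [{p["sid"]: p["amount"] for p in row} for row in df1["positions"]]

# The DataFrame constructor aligns the dict keys into columns
out = pd.DataFrame(records, index=df1.index)

# Drop rows that had no positions, then restore integer amounts
out = out.dropna(how="all").astype(int)
print(out)
```

The cast only works if no remaining row is missing a ticker; with partially filled rows the NaN values would have to be filled or left as float.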
This seems to work for me (with help from @Siva):


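Thanks for the reply. I tried your solution after first stripping out the Equity part, but I get an error when running the second line of your solution:
df1.positions = df1.positions.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', "'" + r"\2" + "'", regex=True)
df1['positions'].apply(lambda row: pd.Series({x['sid']: x['amount'] for x in row}))
The error is:
TypeError: string indices must be integers
Focus on the gist of the approach, which is converting the lists into a DataFrame. Check df1 after the replace command and you should be able to fix it.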
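
A likely reading of that TypeError, assuming the column still holds the text form of each list after str.replace (which the use of .str suggests): iterating over a string yields single characters, and indexing a character with 'sid' fails. A minimal reproduction with a made-up one-row Series:

```python
import pandas as pd

# Hypothetical one-row example: the cell is a string, not a list of dicts
raw = pd.Series(["[{'sid': 'GLD', 'amount': 271}]"])
row = raw.iloc[0]
print(type(row))

raised = False
try:
    # Each x is a single character such as '[', so x['sid'] is invalid
    {x["sid"]: x["amount"] for x in row}
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The strings therefore have to be parsed into real lists before the dict comprehension is applied.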
df1.positions = df1.positions.str.replace(r'(Equity\(\d+\s\[)([a-zA-Z]+)(\]\))', " '" + r"\2" + "'", regex = True)
#convert positions to list
s = df1.positions.apply(eval).tolist()
#creating temp dataframe with the list
temp = pd.DataFrame({'positions': s })
#finally creating final output format
df2 = temp['positions'].apply(lambda row: pd.Series({x['sid']:x['amount'] for x in row}))
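
Since eval executes whatever text it is given, a more cautious variation of the same steps is to parse with ast.literal_eval, which only accepts Python literals. This is a sketch under the assumption that each cell is a string like the ones in the question; the shortened sample data and the names pattern and parse_positions are made up for illustration:

```python
import ast
import re

import pandas as pd

# Shortened, hypothetical sample: each cell is the text form of a list
df1 = pd.DataFrame({"positions": [
    "[]",
    "[{'last_sale_price': 121.98, 'sid': Equity(0 [GLD]), 'amount': 271}, "
    "{'last_sale_price': 280.245, 'sid': Equity(1 [SPY]), 'amount': 129}]",
]})

# Matches e.g. Equity(0 [GLD]) and captures the ticker
pattern = re.compile(r"Equity\(\d+\s\[([A-Za-z]+)\]\)")

def parse_positions(text):
    """Quote the ticker, then parse the text into a list of dicts."""
    quoted = pattern.sub(r"'\1'", text)
    return ast.literal_eval(quoted)

parsed = df1["positions"].apply(parse_positions)

# One {ticker: amount} dict per row, aligned into columns
result = pd.DataFrame(
    [{p["sid"]: p["amount"] for p in row} for row in parsed],
    index=df1.index,
)
print(result)
```

literal_eval raises an error on anything that is not a plain literal, so a cell with an unexpected format fails loudly instead of running as code.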