
Python/pandas: DataFrame from a Series of dicts: optimization


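I have a Series of dicts, and I want to convert it into a DataFrame with the same index.

The only way I have found is to go through the Series' to_dict method, which is not very efficient because it falls back to pure Python instead of staying in numpy/pandas/cython.

Do you have any suggestions for a better approach?

Thanks a lot.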

>>> import pandas as pd
>>> flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
>>> flagInfoSeries
0      {'a': 1, 'b': 2}
1    {'a': 10, 'b': 20}
dtype: object
>>> pd.DataFrame(flagInfoSeries.to_dict()).T
    a   b
0   1   2
1  10  20
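
For readers wondering why the transpose is needed, here is a minimal sketch (not from the original post). It assumes that to_dict on this Series returns a mapping of index label to dict, so the DataFrame constructor treats the index labels as columns. The variable names are illustrative only; DataFrame.from_dict with orient='index' is shown as one way to avoid the explicit .T, not as a faster method.

```python
import pandas as pd

flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))

# to_dict maps each index label to the dict stored at that label.
as_mapping = flagInfoSeries.to_dict()
print(as_mapping)

# Outer keys become columns by default, hence the transpose in the question.
transposed = pd.DataFrame(as_mapping).T

# orient='index' asks pandas to treat the outer keys as row labels instead.
oriented = pd.DataFrame.from_dict(as_mapping, orient='index')
print(oriented)
```

Both routes still build an intermediate Python dict, so this only changes how the result is oriented.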

I think you can use a list comprehension:

import pandas as pd

flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
print(flagInfoSeries)
0      {'a': 1, 'b': 2}
1    {'a': 10, 'b': 20}
dtype: object

# Original approach: go through a dict keyed by index label, then transpose
print(pd.DataFrame(flagInfoSeries.to_dict()).T)
    a   b
0   1   2
1  10  20

# Alternative: build the frame from a list of the dicts
print(pd.DataFrame([x for x in flagInfoSeries]))
    a   b
0   1   2
1  10  20
Timings:

In [203]: %timeit pd.DataFrame(flagInfoSeries.to_dict()).T
The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 554 µs per loop

In [204]: %timeit pd.DataFrame([x for x in flagInfoSeries])
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 361 µs per loop

In [209]: %timeit flagInfoSeries.apply(lambda dict: pd.Series(dict))
The slowest run took 4.76 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 751 µs per loop
Edit:

If you need to preserve the index, try adding index=flagInfoSeries.index to the DataFrame constructor:

print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
Timings:

In [257]: %timeit pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index)
1000 loops, best of 3: 350 µs per loop
Sample:

import pandas as pd

flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
flagInfoSeries.index = [2, 8]
print(flagInfoSeries)
2      {'a': 1, 'b': 2}
8    {'a': 10, 'b': 20}

print(pd.DataFrame(flagInfoSeries.to_dict()).T)
    a   b
2   1   2
8  10  20

# Passing the Series index keeps the original row labels
print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
    a   b
2   1   2
8  10  20
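
As a minimal sketch (not part of the original answer), the snippet below checks that the list-based construction with an explicit index holds the same data as the to_dict route. It uses Series.tolist(), which should produce the same list as the comprehension above; the names s, via_to_dict and via_list are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([{'a': 1, 'b': 2}, {'a': 10, 'b': 20}], index=[2, 8])

# Route from the question: dict keyed by index label, then transpose.
via_to_dict = pd.DataFrame(s.to_dict()).T

# Route from the answer, with tolist() in place of the comprehension.
via_list = pd.DataFrame(s.tolist(), index=s.index)

print(via_list)
print(via_to_dict.to_dict('list') == via_list.to_dict('list'))
```

Passing index=s.index matters because a plain list carries no labels, so without it the frame would get a default 0..n-1 index.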

This avoids to_dict, but apply may also be slow:

flagInfoSeries.apply(lambda dict: pd.Series(dict))

Edit: I see that a timing comparison has been added. Here is mine:

%timeit flagInfoSeries.apply(lambda dict: pd.Series(dict))
1000 loops, best of 3: 935 µs per loop

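Yes, so your computer is faster, but your code still wins :)

Yes, you are right. I wanted to add the comparison from my computer :)

Thanks for your suggestion. There is indeed an improvement in performance... but the index is not preserved: the list comprehension gives a list without an index, [{mydict},…], whereas to_dict gives a {index:{mydict},…}. I think I will keep it the way it is for now.

The solution has been modified, please check it. It is even faster with the index!

Thanks. I had already tried it, but indeed, apply is slow.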
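
The timings quoted in this thread were taken on a two-element Series, where fixed overhead probably dominates. A rough sketch for repeating the comparison on a larger input is below; the size n, the generated data and the helper names are assumptions made for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

n = 1000  # hypothetical size, chosen only to make differences visible
s = pd.Series([{'a': i, 'b': 2 * i} for i in range(n)])


def via_to_dict(series):
    # Route from the question: dict keyed by index label, then transpose.
    return pd.DataFrame(series.to_dict()).T


def via_list(series):
    # Route from the answer: list of dicts plus the original index.
    return pd.DataFrame(series.tolist(), index=series.index)


repeats = 3
for fn in (via_to_dict, via_list):
    seconds = timeit.timeit(lambda: fn(s), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```

Outside IPython, timeit.timeit plays the role of %timeit here; inside IPython the magic can be used on the two helper calls directly.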