Python DataFrame: expand rows containing lists into multiple rows, with the required index, across all columns


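I have time-series data in a pandas DataFrame. The index is the time at which each measurement starts, and each column holds a list of values recorded at a fixed sampling rate (the sampling interval is the difference between consecutive index values divided by the number of elements in the list).

Here is what it looks like: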

Time         A                   B                   .......  Z
0    [1, 2, 3, 4]      [1, 2, 3, 4]
2    [5, 6, 7, 8]      [5, 6, 7, 8]
4    [9, 10, 11, 12]   [9, 10, 11, 12]
6    [13, 14, 15, 16]  [13, 14, 15, 16]
...

I want to expand each row, across all columns, into multiple rows so that I get:

Time       A           B  .... Z
0          1           1
0.5        2           2
1          3           3
1.5        4           4
2          5           5 
2.5        6           6
.......
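With the sample data, consecutive Time values are 2 apart and each list holds 4 samples, so the sampling interval and the time of sample j in row k follow from the definition above (a worked instance, not part of the original question):

```latex
\Delta t = \frac{t_{k+1} - t_k}{n} = \frac{2 - 0}{4} = 0.5,
\qquad
t_{k,j} = t_k + j\,\Delta t, \quad j = 0, \dots, n-1
```

For example, the second sample of the row starting at t = 2 lands at 2 + 1 × 0.5 = 2.5, which is the "2.5  6  6" row in the desired output.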

So far my idea has been something like this (the code doesn't work):

I also tried combining split(",") and stack(), but couldn't get the index right.

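This is probably not ideal, but you can use groupby and apply a function that returns an expanded DataFrame for each row (the time difference is assumed here to be fixed at 2.0):

Output: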

       A   B
Time        
0.0    1   1
0.5    2   2
1.0    3   3
1.5    4   4
2.0    5   5
2.5    6   6
3.0    7   7
3.5    8   8
4.0    9   9
4.5   10  10
5.0   11  11
5.5   12  12
6.0   13  13
6.5   14  14
7.0   15  15
7.5   16  16
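The groupby/apply idea can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the answer's own code: the helper name `expand_row` and the constant `STEP` are hypothetical, the gap between Time values is assumed fixed at 2.0, and every list in a row is assumed to have the same length.

```python
import pandas as pd

# Sample frame shaped like the question's: each cell holds 4 samples.
lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({"A": lists, "B": lists}, index=pd.Index([0, 2, 4, 6], name="Time"))

STEP = 2.0  # assumed fixed gap between consecutive Time values


def expand_row(group):
    """Expand a one-row group into one row per list element (hypothetical helper)."""
    start = group.index[0]
    row = group.iloc[0]
    n = len(row.iloc[0])  # samples per list, assumed equal across columns
    times = [start + STEP * i / n for i in range(n)]
    return pd.DataFrame({col: row[col] for col in group.columns},
                        index=pd.Index(times, name="Time"))


# group_keys=False keeps only the new Time index instead of prepending the group key
result = df.groupby(level=0, group_keys=False).apply(expand_row)
print(result)
```

Calling a Python function once per original row is what makes this approach slow on large frames, which is presumably why the answer calls it "not ideal".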


Explanation:

One way to loop over the contents of the lists is with a list comprehension:

In [172]: df = pd.DataFrame({key: list(zip(*[iter(range(1, 17))]*4)) for key in list('ABC')}, index=range(0, 8, 2))

In [173]: [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
Out[173]: 
[(0, (1, 1, 1)),
 (0, (2, 2, 2)),
 ...
 (6, (15, 15, 15)),
 (6, (16, 16, 16))]
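Two plain-Python idioms do the work in In [172] and In [173]. `zip(*[it]*4)` passes the same iterator four times, so each tuple pulls four consecutive values (the "grouper" idiom); `zip(*row)` transposes the A, B and C cells of one row so that the i-th elements end up together. In this sketch `row` is a hypothetical list of tuples standing in for the Series that `iterrows()` yields:

```python
# One iterator object repeated 4 times: each tuple zip builds pulls
# 4 consecutive values from the same stream.
it = iter(range(1, 17))
chunks = list(zip(*[it] * 4))
print(chunks)  # [(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16)]

# zip(*row) transposes: the i-th elements of A, B and C end up in one tuple.
row = [(1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 4)]  # stand-in for the A, B, C cells of one row
print(list(zip(*row)))  # [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
```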

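Once you have the values in that form, you can build the DataFrame directly (pd.DataFrame.from_items, which older pandas offered for this, was deprecated in 0.23 and removed in 1.0):

rows = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
result = pd.DataFrame([values for _, values in rows], index=[index for index, _ in rows], columns=df.columns)
result.index.name = 'Time'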

This yields:

In [175]: result
Out[175]: 
       A   B   C
Time            
0      1   1   1
0      2   2   2
...
6     15  15  15
6     16  16  16

and, once the fractional increments described below have been added to the index:

In [183]: result
Out[183]: 
       A   B   C
Time            
0.00   1   1   1
0.25   2   2   2
0.50   3   3   3
0.75   4   4   4
2.00   5   5   5
2.25   6   6   6
2.50   7   7   7
2.75   8   8   8
4.00   9   9   9
4.25  10  10  10
4.50  11  11  11
4.75  12  12  12
6.00  13  13  13
6.25  14  14  14
6.50  15  15  15
6.75  16  16  16

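To compute the increment to add to each index value, group by the index and take the ratio of each row's cumcount to its group's size. On its own this ratio spaces the samples 1/size apart (0.25 here); multiplying it by the spacing between consecutive Time values (2 here) gives the 0.5 interval defined in the question: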

In [176]: grouped = result.groupby(level=0)
In [177]: increment = grouped.cumcount() / grouped['A'].transform('size')
In [179]: result.index = result.index + increment.values
In [199]: result.index
Out[199]: 
Index([ 0.0, 0.25,  0.5, 0.75,  2.0, 2.25,  2.5, 2.75,  4.0, 4.25,  4.5,
        4.75,  6.0, 6.25,  6.5, 6.75],
      dtype='float64', name='Time')
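A self-contained sketch of these three steps for Python 3 and current pandas. It uses the plain DataFrame constructor instead of from_items and, as an adjustment not in the transcript, scales the fraction by the index spacing so the samples land 0.5 apart as the question asks. The sample data and the name `step` are assumptions, and the spacing is assumed constant:

```python
import pandas as pd

# Sample frame like In [172], built with lists so it also works on Python 3.
df = pd.DataFrame(
    {key: [list(range(i, i + 4)) for i in range(1, 17, 4)] for key in "ABC"},
    index=pd.Index(range(0, 8, 2), name="Time"),
)

# Step 1: one (index, values) pair per sample, then a DataFrame with a repeated index.
rows = [(index, zipped) for index, row in df.iterrows() for zipped in zip(*row)]
result = pd.DataFrame([values for _, values in rows],
                      index=pd.Index([index for index, _ in rows], name="Time"),
                      columns=df.columns)

# Step 2: position of each sample inside its group, as a fraction of the group size.
grouped = result.groupby(level=0)
fraction = grouped.cumcount() / grouped["A"].transform("size")

# Step 3: scale by the spacing of the original index (assumed constant) and add it.
step = df.index[1] - df.index[0]
result.index = pd.Index(result.index.to_numpy() + (fraction * step).to_numpy(), name="Time")
print(result)
```

Using `transform("size")` rather than `size()` keeps the divisor aligned row for row with `cumcount()`, so the division does not depend on how pandas aligns a repeated index against a unique one.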

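Say the DataFrame to expand is named df_to_expand and its cells hold the lists as strings such as "[1, 2, 3, 4]" (as happens after reading a CSV file). You can parse each cell with ast.literal_eval, which, unlike eval, does not execute arbitrary code, and spread the elements over new columns named A_0, A_1, and so on. Note that this expands into columns, not into the extra rows the question asks for:

import ast
import pandas as pd

df_expanded_list = []
for coln in df_to_expand.columns:
    # Parse each string cell into a real list (safe alternative to eval)
    parsed = df_to_expand[coln].apply(ast.literal_eval)
    # One new column per list element: A_0, A_1, ...
    _df = pd.DataFrame(parsed.tolist(), index=df_to_expand.index).add_prefix(coln + '_')
    df_expanded_list.append(_df)

df_expanded = pd.concat(df_expanded_list, axis=1)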
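On pandas 1.3 or newer, DataFrame.explode accepts a list of columns and unpacks them in lockstep, which replaces both the iterrows loop and the per-group apply. This is a different technique from the answers above, sketched under two assumptions: every column in a row holds lists of the same length, and the last row's interval equals the previous row's (it has no successor to measure against). The interval follows the question's definition, the gap to the next Time value divided by the list length:

```python
import numpy as np
import pandas as pd

# Sample frame shaped like the question's data.
lists = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
df = pd.DataFrame({"A": lists, "B": lists}, index=pd.Index([0, 2, 4, 6], name="Time"))

starts = df.index.to_numpy(dtype=float)
gaps = np.diff(starts)
gaps = np.append(gaps, gaps[-1])              # assumption: last row spans the previous gap
lengths = df.iloc[:, 0].map(len).to_numpy()   # samples per row, assumed equal across columns
dt = gaps / lengths                           # per-row sampling interval

long = df.explode(list(df.columns))           # multi-column explode needs pandas >= 1.3
position = long.groupby(level=0).cumcount().to_numpy()  # 0, 1, 2, ... within each original row
long.index = pd.Index(np.repeat(starts, lengths) + np.repeat(dt, lengths) * position, name="Time")
long = long.infer_objects()                   # explode leaves object dtype; restore int64
print(long)
```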
