Python: How to convert a DataFrame column (or Series) of variable-length lists into a fixed-width DataFrame

python pandas numpy dataframe

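I want to convert a DataFrame column (or Series) that holds lists of varying length into a DataFrame with a fixed number of columns.

The resulting DataFrame should have as many columns as the longest list; the cells belonging to shorter lists can be NaN or any other filler value.

When the data arrives as strings, the str accessor offers the expand option of str.split, but I have not found an equivalent for variable-length lists.

In my example the list elements are int, but the idea is that it should work with any type. That rules out simply converting the Series to str and applying the expand option mentioned above.

Below I show a runnable example that uses str.split on lists stored as strings, followed by a minimal example of the Series I want to convert.

I found a solution using apply, shown in the example, but it is so slow that it is not usable.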

import numpy as np
import pandas as pd

# Example with a list as a string
A = pd.DataFrame({'lists': [
                    '[]',
                    '[360,460,160]',
                    '[360,1,2,3,4,5,6]',
                    '[10,20,30]',
                    '[100,100,100,100]',
                    ],
                  'other': [1,2,3,4,5]
                 })
print(A['lists'].astype(str).str.strip('[]').str.split(',', expand=True))

# Example with actual lists
B = pd.DataFrame({'lists': [
                    [],
                    [360,460,160],
                    [360,1,2,3,4,5,6],
                    [10,20,30],
                    [100,100,100,100],
                ],
                  'other': [1,2,3,4,5]
                 })

# Create and pre-fill expected columns
max_len = max(B['lists'].str.len())
for idx in range(max_len):
    B[f'lists_{idx}'] = np.nan

# Use .apply to fill the columns
def expand_int_list(row, col, df):
    for idx, item in enumerate(row[col]):
        df.loc[row.name, f'{col}_{idx}'] = item
        
B.apply(lambda row: expand_int_list(row, 'lists', B), axis=1)
print(B)

Output:

     0     1     2     3     4     5     6
0       None  None  None  None  None  None
1  360   460   160  None  None  None  None
2  360     1     2     3     4     5     6
3   10    20    30  None  None  None  None
4  100   100   100   100  None  None  None
                     lists  other  lists_0  lists_1  lists_2  lists_3  \
0                       []      1      NaN      NaN      NaN      NaN   
1          [360, 460, 160]      2    360.0    460.0    160.0      NaN   
2  [360, 1, 2, 3, 4, 5, 6]      3    360.0      1.0      2.0      3.0   
3             [10, 20, 30]      4     10.0     20.0     30.0      NaN   
4     [100, 100, 100, 100]      5    100.0    100.0    100.0    100.0   

   lists_4  lists_5  lists_6  
0      NaN      NaN      NaN  
1      NaN      NaN      NaN  
2      4.0      5.0      6.0  
3      NaN      NaN      NaN  
4      NaN      NaN      NaN

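Edit and final solution: an important detail that made the approaches from other questions fail is that, in my data, some cells hold None instead of a list. In that case, using tolist() yields a Series of lists again, and pandas does not allow turning those cells into empty lists with B.loc[B[col].isna(), col] = [].

The solution I found is to call tolist() only on the rows that are not None, and then use concat with the original index: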

# Example with actual lists
B = pd.DataFrame({'lists': [
                    [],
                    [360,460,160],
                    None,
                    [10,20,30],
                    [100,100,100,100],
                ],
                  'other': [1,2,3,4,5]
                 })

col = 'lists'
# I need to keep the index for the concat afterwards.
extended = pd.DataFrame(B.loc[~B[col].isna(), col].tolist(),
                        index=B.loc[~B[col].isna()].index)
extended = extended.add_prefix(f'{col}_')
B = pd.concat([B, extended], axis=1)

print(B)

Output:

                  lists  other  lists_0  lists_1  lists_2  lists_3
0                    []      1      NaN      NaN      NaN      NaN
1       [360, 460, 160]      2    360.0    460.0    160.0      NaN
2                  None      3      NaN      NaN      NaN      NaN
3          [10, 20, 30]      4     10.0     20.0     30.0      NaN
4  [100, 100, 100, 100]      5    100.0    100.0    100.0    100.0
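
The edit above can also be wrapped in a small reusable function. The following is only a sketch under stated assumptions: the helper name expand_list_column and its prefix parameter are hypothetical, and instead of filtering the non-None rows it replaces every non-list cell with an empty list in a plain Python list comprehension, which sidesteps the .loc assignment that pandas rejects.

```python
import pandas as pd


def expand_list_column(df, col, prefix=None):
    """Return df plus one column per list position of df[col].

    Hypothetical helper, not part of pandas.
    """
    prefix = f'{col}_' if prefix is None else prefix
    # Cells that are not a list/tuple (None, NaN, ...) become empty lists.
    # This happens in plain Python, so pandas never assigns [] into a cell.
    rows = [list(v) if isinstance(v, (list, tuple)) else [] for v in df[col]]
    # The constructor pads every row up to the length of the longest list.
    expanded = pd.DataFrame(rows, index=df.index).add_prefix(prefix)
    return df.join(expanded)


B = pd.DataFrame({'lists': [[], [360, 460, 160], None, [10, 20, 30],
                            [100, 100, 100, 100]],
                  'other': [1, 2, 3, 4, 5]})
result = expand_list_column(B, 'lists')
print(result)
```

Passing index=df.index keeps the expanded columns aligned with the original rows even when the DataFrame does not use the default RangeIndex.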

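If you convert the column of nested lists to a plain list and pass it to the DataFrame constructor, missing values are added so that every row matches the longest list; the result can then be appended to the original DataFrame.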
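
The snippet that accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea it describes, using the question's first example (no None cells). Attaching the new columns with join, and the lists_ prefix, are assumptions; concat along axis=1 would work as well.

```python
import pandas as pd

B = pd.DataFrame({'lists': [[], [360, 460, 160], [360, 1, 2, 3, 4, 5, 6],
                            [10, 20, 30], [100, 100, 100, 100]],
                  'other': [1, 2, 3, 4, 5]})

# The constructor pads shorter rows up to the length of the longest list.
expanded = pd.DataFrame(B['lists'].tolist(), index=B.index).add_prefix('lists_')

# Attach the fixed-width columns to the original frame (assumed: join).
out = B.join(expanded)
print(out)
```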

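Does this answer your question?

Thanks @MayankPorwal for the suggestion. I thought my case did not apply because of the variable lengths, but the real reason is that sometimes I have None instead of a list. I will update the question.

Thanks @jezrael for the reply. I had tried .tolist() before and thought it did not work because of the variable-length lists, but the cause is that sometimes I have None instead of a list. I will update the question and add the solution I found, starting from your answer.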