Python 3.x: Iterating over lists in a DataFrame column


I have a DataFrame df that looks like this:

                                                  my_list
Index                                                                
0                                               [81310, 81800]
1                                                      [82160]
2            [75001, 75002, 75003, 75004, 75005, 75006, 750...
3                                                      [95190]
4                                               [38170, 38180]
5                                                      [95240]
6                                                      [71150]
7                                                      [62520]

I also have a list named code, which contains at least one element:

code = ['75008', '75015']

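I want to create another column in my DataFrame, named my_min, that holds, for each row, the minimum absolute difference between the elements of the list code and the elements of that row's list in df.my_list.

Here are the commands I tried: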

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list'].str[:]])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

#or

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list']])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

#or

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in df.loc[:, 'my_list'].tolist()])
>>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

#or

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list'].str[:]])
>>> UnboundLocalError: local variable 'z' referenced before assignment

#or

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list']])
>>> UnboundLocalError: local variable 'z' referenced before assignment

#or

df.loc[:, 'my_list'] = min([abs(int(x)-int(y)) for x in code for y in z for z in df.loc[:, 'my_list'].tolist()])
>>> UnboundLocalError: local variable 'z' referenced before assignment

Write a helper:

def find_min(lst):

You clearly know how to fill in the body. The helper will consult the global variable named code.

Then apply it:

df['my_min'] = df.my_list.apply(find_min)

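A benefit of breaking out a helper is that you can write separate unit tests for it.

If you prefer to avoid globals, you will find partial (from functools) very useful.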
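As a rough sketch of that idea: the body of find_min below and its second parameter, codes, are assumptions added for illustration, since the answer leaves the body to the reader. partial binds the list of codes ahead of time so that apply only has to pass in each row's list.

```python
from functools import partial

import pandas as pd


def find_min(lst, codes):
    """Smallest absolute difference between any code and any value in lst."""
    # `codes` is a hypothetical extra parameter; it replaces the global lookup.
    return min(abs(int(c) - int(v)) for c in codes for v in lst)


df = pd.DataFrame({'my_list': [[81310, 81800], [82160]]})
code = ['75008', '75015']

# Bind `codes` up front; apply() then supplies one row's list per call.
df['my_min'] = df.my_list.apply(partial(find_min, codes=code))
print(df)
```

Because find_min no longer depends on a global, it can be unit tested with plain lists and no DataFrame at all.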

You can do this with a list comprehension:

import pandas as pd
import numpy as np
df = pd.DataFrame({'my_list':[[81310, 81800],[82160]]})

code = ['75008', '75015']

pd.DataFrame({'my_min':[min([abs(int(i) - j) for i in code for j in x]) 
              for x in df.my_list]})

   my_min
0    6295
1    7145

You can also use pd.Series.apply instead of the outer list comprehension, for example:

df.my_list.apply(lambda x: min([abs(int(i) - j) for i in code for j in x]))

If you have pandas 0.25+, you can use explode combined with np.min:

# sample data
df = pd.DataFrame({'my_list':
                  [[81310, 81800], [82160], [75001,75002]]})
code = ['75008', '75015']

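# flatten the lists into one series; repeated index labels mark the source row
s = df.my_list.explode().astype(int)

# convert `code` into np.array
code = np.array(code, dtype=int)

# this is the output series: minimum over `code` per element, then minimum per row
# (groupby(level=0).min() replaces .min(level=0), which was removed in pandas 2.0)
pd.Series(np.min(np.abs(s.values[:,None] - code),axis=1), 
          index=s.index).groupby(level=0).min()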

Output:

0    6295
1    7145
2       6
dtype: int64
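To end up with the my_min column the question asks for, the per-row result can be assigned straight back to the DataFrame, because it carries the same index labels as df. This is a minimal sketch that assumes pandas 0.25+ and that no list in my_list is empty; the intermediate name diffs is introduced here for readability only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'my_list': [[81310, 81800], [82160], [75001, 75002]]})
code = np.array(['75008', '75015'], dtype=int)

# One row per list element; the index repeats the label of the source row.
s = df.my_list.explode().astype(int)

# Broadcasting gives an (n_elements, n_codes) matrix of absolute differences.
diffs = np.abs(s.to_numpy()[:, None] - code)

# Minimum over the codes for each element, then minimum over each original row.
# Assignment aligns on the index, so each row receives its own minimum.
df['my_min'] = pd.Series(diffs.min(axis=1), index=s.index).groupby(level=0).min()
print(df)
```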

What is your expected output?