Python: How do I remove values from a list in a DataFrame?

python pandas dataframe

I created a DataFrame:

[in] testing_df =pd.DataFrame(test_array,columns=['transaction_id','product_id'])

# Split the product_id's for the testing data
testing_df.set_index(['transaction_id'],inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

[out]                 product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
005             [P01, P03, P05]
006             [P01, P03, P07]
007             [P01, P03, P08]
008                  [P01, P04]
009             [P01, P04, P05]
010             [P01, P04, P08]

How can I now remove "P04" and "P08" from the result?

I tried:

# Remove P04 and P08 from consideration
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04'))

testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'')

However, neither of these options seems to work.
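One likely reason the first attempt does not do what was intended, sketched in plain Python (the sample values are made up for illustration): `str.strip` takes a set of characters to trim from both ends rather than a substring to delete, and after the split step each cell is a list, which has no `strip` method at all.

```python
# Hypothetical sample value, not taken from the question's data.
raw = "P04,P01,P40"

# strip("P04") trims any of the characters 'P', '0', '4' from both ends.
# It does not search for the substring "P04".
stripped = raw.strip("P04")
print(stripped)  # ,P01,

# After the split step each cell is a list, and lists have no strip method,
# so calling x.strip(...) on such a cell raises AttributeError.
cell = ["P01", "P04"]
print(hasattr(cell, "strip"))  # False
```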

The data types are:

[in] print(testing_df.dtypes)
[out] product_id    object
dtype: object

[in] print(testing_df['product_id'].dtypes)
[out] object

I would do it before splitting:
Data:
Solution:

In [271]: df['product_id'] = df['product_id'].str.replace(r'\,*?(?:P04|P08)\,*?', '', regex=True) \
     ...:                                    .str.split(',')

In [272]: df
Out[272]:
                     product_id
transaction_id
1                         [P01]
2                    [P01, P02]
3               [P01, P02, P09]
4                    [P01, P03]
5               [P01, P03, P05]
6               [P01, P03, P07]
7                    [P01, P03]
8                         [P01]
9                    [P01, P05]
10                        [P01]
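A self-contained sketch of the same remove-before-splitting idea, on made-up rows. Two things here are assumptions rather than part of the answer: `regex=True` is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default, and the pattern is anchored on commas so that a removed ID at the start of the string leaves no empty element and a longer ID such as `P040` is left alone.

```python
import pandas as pd

# Hypothetical sample data: comma-separated product IDs, not yet split.
df = pd.DataFrame(
    {
        "transaction_id": ["001", "002", "003", "004"],
        "product_id": ["P01,P04", "P01,P04,P05", "P04,P08,P02", "P01,P040"],
    }
).set_index("transaction_id")

# Match a whole ID (P04 or P08) together with the comma in front of it, if any.
pattern = r"(?:^|,)(?:P04|P08)(?=,|$)"

cleaned = (
    df["product_id"]
    .str.replace(pattern, "", regex=True)  # drop the unwanted IDs
    .str.lstrip(",")                       # tidy up if the first ID was removed
    .str.split(",")                        # only now turn each string into a list
)
print(cleaned)
```

With this variant, a row whose products are all removed would end up as a list holding one empty string, so filtering after the split may be the safer choice when that can happen.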
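Alternatively, you can change:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
to:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(',')) - set(['P04', 'P08'])))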

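Store all the elements you want to remove in a list:

remove_results = ['P04', 'P08']
for k in range(len(testing_df['product_id'])):
    for r in remove_results:
        # .iloc is positional, which is needed because the index holds transaction ids
        if r in testing_df['product_id'].iloc[k]:
            # list.remove mutates the list stored in the cell in place
            testing_df['product_id'].iloc[k].remove(r)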

A list comprehension is probably the most efficient:

exc = {'P04', 'P08'}
df['product_id'] = [[i for i in L if i not in exc] for L in df['product_id']]
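As a runnable illustration of the list-comprehension approach, here is a minimal sketch on three made-up rows in the question's format. The frame contents are assumptions; the filtering line is the one from the answer with longer variable names. Unlike a set difference, it keeps the original order of the remaining IDs.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout.
testing_df = pd.DataFrame(
    {
        "transaction_id": ["008", "009", "010"],
        "product_id": ["P01,P04", "P01,P04,P05", "P01,P04,P08"],
    }
).set_index("transaction_id")

# Same split step as in the question: each cell becomes a list of strings.
testing_df["product_id"] = testing_df["product_id"].str.split(",")

# A set gives constant-time membership checks for the excluded IDs.
exc = {"P04", "P08"}
testing_df["product_id"] = [
    [item for item in products if item not in exc]
    for products in testing_df["product_id"]
]
print(testing_df)
```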

Note that an inefficient Python-level loop is unavoidable here: apply + lambda, map + lambda, and the in-place solution all involve Python-level loops.
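Please help me understand whether product_id is a column of lists or a column of strings.

product_id is a column of lists of strings, i.e. ['P01', 'P02', 'P03']

For clarity, you should print out the type of the column.

@cᴏʟᴅsᴘᴇᴇᴅ Also, please see my other question.

Wait, product_id is a list. Use astype(str) first, and then apply(ast.literal_eval) after.

@cᴏʟᴅsᴘᴇᴇᴅ, it became a list after the following step: testing_df['product_id'].apply(lambda row: row.split(','))

@zsad512, do you mean it was a list before executing testing_df['product_id'].apply(lambda row: row.split(','))?

@MaxU I think it is a list of strings before and after... First it is one string, "P01,P02,P03", and then it becomes 3 strings: "P01", "P02", "P03".

@zsad512, OK, thanks for the clarification. My assumption was correct then...

Awesome! Thank you very much.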
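Since `dtypes` reports `object` for both a column of strings and a column of lists, one way to settle the question raised in these comments is to look at the Python type of the individual cells. The sketch below uses a made-up one-row Series; the `astype(str)` followed by `ast.literal_eval` round trip is the one mentioned in the comments.

```python
import ast

import pandas as pd

# Hypothetical single-row column in the question's format.
col = pd.Series(["P01,P02,P03"], name="product_id")
type_before = col.map(type).iloc[0]        # element type before the split

split_col = col.apply(lambda row: row.split(","))
type_after = split_col.map(type).iloc[0]   # element type after the split
print(type_before, type_after)

# Round trip: lists rendered as text, then parsed back into real lists.
as_text = split_col.astype(str)
restored = as_text.apply(ast.literal_eval)
print(restored.iloc[0])
```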
Demo of the set-difference variant:

In [280]: df.product_id.apply(lambda row: list(set(row.split(',')) - set(['P04', 'P08'])))
Out[280]:
transaction_id
1               [P01]
2          [P01, P02]
3     [P09, P01, P02]
4          [P01, P03]
5     [P01, P03, P05]
6     [P07, P01, P03]
7          [P01, P03]
8               [P01]
9          [P01, P05]
10              [P01]
Name: product_id, dtype: object
