Python: How do I remove values from lists in a DataFrame?


I created a DataFrame:

[in] testing_df =pd.DataFrame(test_array,columns=['transaction_id','product_id'])

# Split the product_id's for the testing data
testing_df.set_index(['transaction_id'],inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))

[out]                 product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
005             [P01, P03, P05]
006             [P01, P03, P07]
007             [P01, P03, P08]
008                  [P01, P04]
009             [P01, P04, P05]
010             [P01, P04, P08]
How can I now remove 'P04' and 'P08' from the results?

I tried:

# Remove P04 and P08 from consideration
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04'))

testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'')
However, neither option seems to work.

The dtypes are:
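
One likely reason the first attempt fails, assuming product_id already holds lists after the split: strip is a string method, so calling it on a list raises AttributeError. Even on a plain string it trims a set of characters from both ends rather than removing a substring. The sample values below are made up for illustration.

```python
# After the split, each cell holds a list, which has no .strip method.
cell = ["P01", "P04"]
try:
    cell.strip("P04")
    strip_error = None
except AttributeError as err:
    strip_error = type(err).__name__

# Even on a plain string, strip() treats its argument as a set of characters
# to trim from both ends; it does not remove the substring "P04".
stripped = "P01,P04".strip("P04")

print(strip_error)  # AttributeError
print(stripped)     # 1,
```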

[in] print(testing_df.dtypes)
[out] product_id    object
dtype: object

[in] print(testing_df['product_id'].dtypes)
[out] object
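
The object dtype only says that the column holds arbitrary Python objects, so it cannot tell a column of lists from other contents. Checking the type of the elements is more informative. A minimal sketch with made-up sample data:

```python
import pandas as pd

raw = pd.Series(["P01", "P01,P04,P08"], name="product_id")
split = raw.apply(lambda row: row.split(","))

# The split column reports dtype object, which hides what the cells contain.
print(split.dtype)

# The Python type of each element shows the difference.
raw_types = raw.map(type).unique().tolist()
split_types = split.map(type).unique().tolist()
print(raw_types)    # [<class 'str'>]
print(split_types)  # [<class 'list'>]
```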

I would do it before splitting.

Data:

Solution:

In [271]: df['product_id'] = df['product_id'].str.replace(r'\,*?(?:P04|P08)\,*?', '', regex=True) \
                                             .str.split(',')

In [272]: df
Out[272]:
                     product_id
transaction_id
1                         [P01]
2                    [P01, P02]
3               [P01, P02, P09]
4                    [P01, P03]
5               [P01, P03, P05]
6               [P01, P03, P07]
7                    [P01, P03]
8                         [P01]
9                    [P01, P05]
10                        [P01]
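
A self-contained sketch of this replace-before-split approach, using a few sample rows rebuilt from the question's output (the DataFrame construction is an assumption, since the original data is not shown). regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample rows modelled on the question; product_id is still one string per row.
df = pd.DataFrame(
    {"product_id": ["P01", "P01,P03,P08", "P01,P04", "P01,P04,P05"]},
    index=pd.Index(["001", "007", "008", "009"], name="transaction_id"),
)

# Remove each unwanted ID (with the comma in front of it), then split.
cleaned = (
    df["product_id"]
    .str.replace(r",*?(?:P04|P08),*?", "", regex=True)
    .str.split(",")
)

print(cleaned.tolist())
# [['P01'], ['P01', 'P03'], ['P01'], ['P01', 'P05']]
```

With this pattern, an unwanted ID in first position would leave a leading comma behind, so the split would produce an empty string as the first element. The sample rows above always start with P01, so the issue does not show up here.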
Alternatively, you can replace:

testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
with:

Demo:


Store all the elements you want to remove in a list:

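remove_results = ['P04','P08']
for k in range(len(testing_df['product_id'])):
    for r in remove_results:
        # Use .iloc for positional access: the index holds transaction ids, not 0..n-1
        if r in testing_df['product_id'].iloc[k]:
            # list.remove mutates the list stored in the cell in place
            testing_df['product_id'].iloc[k].remove(r)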

A list comprehension is probably the most efficient:

exc = {'P04', 'P08'}
df['product_id'] = [[i for i in L if i not in exc] for L in df['product_id']]
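
A runnable version of the list comprehension on a few sample rows (the data is assumed, modelled on the question's output). Using a set for exc keeps each membership test cheap, and the comprehension builds new lists instead of mutating the existing ones.

```python
import pandas as pd

# Sample rows modelled on the question; each cell already holds a list.
df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P03", "P08"], ["P01", "P04", "P08"]]},
    index=pd.Index(["001", "007", "010"], name="transaction_id"),
)

exc = {"P04", "P08"}
# Keep every element that is not in the exclusion set, preserving order.
df["product_id"] = [[i for i in L if i not in exc] for L in df["product_id"]]

print(df["product_id"].tolist())
# [['P01'], ['P01', 'P03'], ['P01']]
```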

Note that an inefficient Python-level loop is unavoidable here: apply + lambda, map + lambda, and the in-place solution all involve a Python-level loop.
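
To illustrate the point, the three spellings below do the same per-row work in Python and give the same result. The sample Series and the helper name drop_excluded are made up for this sketch.

```python
import pandas as pd

s = pd.Series([["P01", "P04"], ["P01", "P03", "P08"]], name="product_id")
exc = {"P04", "P08"}


def drop_excluded(items):
    """Return a new list without the excluded product IDs."""
    return [i for i in items if i not in exc]


# Each variant calls Python code once per row, so none of them is vectorised.
via_apply = s.apply(lambda L: drop_excluded(L))
via_map = s.map(lambda L: drop_excluded(L))
via_comprehension = pd.Series(
    [drop_excluded(L) for L in s], index=s.index, name=s.name
)

print(via_apply.tolist() == via_map.tolist() == via_comprehension.tolist())
# True
```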

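Please help me understand whether product_id is a column of lists or a column of strings.
product_id is a column of lists of strings, i.e. ['P01', 'P02', 'P03']
For clarity, you should print out the type of the column.
@cᴏʟᴅsᴘᴇᴇᴅ Please also see my other question.
Wait, product_id is a list. Use astype(str) first, then apply(ast.literal_eval) after.
@cᴏʟᴅsᴘᴇᴇᴅ, it became a list after the following step: testing_df['product_id'].apply(lambda row: row.split(','))
@zsad512, do you mean it was a list before executing testing_df['product_id'].apply(lambda row: row.split(','))?
@MaxU I think it is a list of strings before and after... first it is one string "P01,P02,P03", then it becomes 3 strings "P01", "P02", "P03".
@zsad512, okay, thanks for clarifying. So my assumption was correct...
Awesome! Thanks a lot.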
In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))
Out[280]:
transaction_id
1               [P01]
2          [P01, P02]
3     [P09, P01, P02]
4          [P01, P03]
5     [P01, P03, P05]
6     [P07, P01, P03]
7          [P01, P03]
8               [P01]
9          [P01, P05]
10              [P01]
Name: product_id, dtype: object
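
Note that the set difference in this variant does not keep the original element order (see rows 3 and 6 above) and would also collapse duplicate IDs. If the order matters, one option is to sort the result, another is to filter with a comprehension instead. A small sketch on an assumed sample string:

```python
row = "P01,P02,P09,P04"
exc = {"P04", "P08"}

# Set difference: the order of the result is not guaranteed, so sort it.
via_set = sorted(set(row.split(",")) - exc)

# Comprehension: keeps the original order and any duplicates.
via_filter = [p for p in row.split(",") if p not in exc]

print(via_set)     # ['P01', 'P02', 'P09']
print(via_filter)  # ['P01', 'P02', 'P09']
```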