Python: how do I remove values from lists in a DataFrame?
I created a DataFrame:
[in] testing_df = pd.DataFrame(test_array, columns=['transaction_id', 'product_id'])
# Split the product_id's for the testing data
testing_df.set_index(['transaction_id'], inplace=True)
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
[out] product_id
transaction_id
001 [P01]
002 [P01, P02]
003 [P01, P02, P09]
004 [P01, P03]
005 [P01, P03, P05]
006 [P01, P03, P07]
007 [P01, P03, P08]
008 [P01, P04]
009 [P01, P04, P05]
010 [P01, P04, P08]
How can I now remove 'P04' and 'P08' from the results?
I tried:
# Remove P04 and P08 from consideration
testing_df['product_id'] = testing_df['product_id'].map(lambda x: x.strip('P04'))
testing_df['product_id'].replace(regex=True,inplace=True,to_replace=r'P04,',value=r'')
However, neither option seems to work.
The dtypes are:
[in] print(testing_df.dtypes)
[out] product_id object
dtype: object
[in] print(testing_df['product_id'].dtypes)
[out] object
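A note on why both attempts fail: dtype object only says the column holds arbitrary Python objects, so it cannot tell string cells from list cells. Below is a minimal sketch that checks the element type directly. The small frame is made up for illustration and stands in for the data built from test_array.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (already split into lists)
testing_df = pd.DataFrame(
    {"product_id": [["P01"], ["P01", "P04"], ["P01", "P04", "P08"]]},
    index=pd.Index(["001", "008", "010"], name="transaction_id"),
)

# dtype is "object" whether the cells are strings or lists
print(testing_df["product_id"].dtype)

# Inspect the actual Python type of each cell instead
cell_types = testing_df["product_id"].map(type).unique()
print(cell_types)

# Lists have no .strip() method, so the first attempt cannot work on these cells
try:
    testing_df["product_id"].map(lambda x: x.strip("P04"))
    strip_failed = False
except AttributeError:
    strip_failed = True
print(strip_failed)
```

Even on plain strings, str.strip('P04') would not remove the substring 'P04': it strips any of the characters P, 0 and 4 from both ends of the string.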
I would do the removal before splitting.
Solution:
In [271]: df['product_id'] = df['product_id'].str.replace(r',*?(?:P04|P08),*?', '', regex=True) \
                                             .str.split(',')
In [272]: df
Out[272]:
product_id
transaction_id
1 [P01]
2 [P01, P02]
3 [P01, P02, P09]
4 [P01, P03]
5 [P01, P03, P05]
6 [P01, P03, P07]
7 [P01, P03]
8 [P01]
9 [P01, P05]
10 [P01]
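For anyone who wants to try the replace-before-split idea in isolation, here is a self-contained sketch. The sample strings and index values are made up, and the pattern is a slightly stricter variant of the one in In [271] (word boundaries, plus a strip for stray commas), not the answerer's exact code. Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed.

```python
import pandas as pd

# Hypothetical unsplit data: one comma-separated string per transaction
df = pd.DataFrame(
    {"product_id": ["P01", "P01,P04", "P01,P04,P05", "P04,P05"]},
    index=pd.Index([1, 8, 9, 11], name="transaction_id"),
)

cleaned = (
    df["product_id"]
    # Drop each unwanted id together with the comma in front of it, if any
    .str.replace(r",?\b(?:P04|P08)\b", "", regex=True)
    # An unwanted id at the start of the string leaves a leading comma behind
    .str.strip(",")
    .str.split(",")
)
print(cleaned.tolist())
```

The word boundaries keep the pattern from matching inside a longer id such as P040, which is an assumption about what the data might contain rather than something shown in the question.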
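Alternatively, you can replace:
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: row.split(','))
with a set difference (note that sets do not preserve the original element order):
testing_df['product_id'] = testing_df['product_id'].apply(lambda row: list(set(row.split(',')) - set(['P04', 'P08'])))
Demo: see the In [280] session at the end of this page.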
Store all the elements to remove in a list:
remove_results = ['P04', 'P08']
# Each cell holds a list object, so it can be modified in place
for products in testing_df['product_id']:
    for r in remove_results:
        if r in products:
            products.remove(r)  # removes the first occurrence only
A list comprehension is probably the most efficient:
exc = {'P04', 'P08'}
df['product_id'] = [[i for i in L if i not in exc] for L in df['product_id']]
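To make the list comprehension easy to run end to end, here is a short sketch that starts from raw rows. The contents of test_array are an assumption, since the question never shows them.

```python
import pandas as pd

# Hypothetical contents for test_array: [transaction_id, comma-separated ids]
test_array = [
    ["001", "P01"],
    ["008", "P01,P04"],
    ["009", "P01,P04,P05"],
    ["010", "P01,P04,P08"],
]
df = pd.DataFrame(test_array, columns=["transaction_id", "product_id"])
df = df.set_index("transaction_id")

# Split each string into a list of product ids
df["product_id"] = df["product_id"].str.split(",")

# Rebuild every list without the excluded ids; a set gives fast membership tests
exc = {"P04", "P08"}
df["product_id"] = [[i for i in L if i not in exc] for L in df["product_id"]]
print(df)
```

Unlike the in-place remove() loop, this builds new lists, keeps the original order, and drops every occurrence of an excluded id.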
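Note that an inefficient Python-level loop is unavoidable here: apply + lambda, map + lambda, and the in-place solution all involve Python-level loops.
Please help me understand whether product_id is a column of lists or a column of strings.
product_id is a column of lists of strings, i.e. ['P01', 'P02', 'P03']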
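For clarity, you should print out the type of the column.
@cᴏʟᴅsᴘᴇᴇᴅ Also, please see my other question.
product_id is a list. First use astype(str), then apply(ast.literal_eval) afterwards.
@cᴏʟᴅsᴘᴇᴇᴅ, it becomes a list after this step: testing_df['product_id'].apply(lambda row: row.split(','))
@zsad512, do you mean it was a list before executing testing_df['product_id'].apply(lambda row: row.split(','))?
@MaxU I think it holds strings both before and after: first it is a single string "P01,P02,P03", then it becomes three strings "P01", "P02", "P03".
@zsad512, OK, thanks for clarifying. My assumption was correct then.
Great! Thank you very much.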
In [280]: df.product_id.apply(lambda row: list(set(row.split(','))- set(['P04','P08'])))
Out[280]:
transaction_id
1 [P01]
2 [P01, P02]
3 [P09, P01, P02]
4 [P01, P03]
5 [P01, P03, P05]
6 [P07, P01, P03]
7 [P01, P03]
8 [P01]
9 [P01, P05]
10 [P01]
Name: product_id, dtype: object
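The In [280] output above shows the side effect of going through a set: rows 3 and 6 come back in a different order than they went in. If order matters but the original order is not needed, one option is to sort the result, as in this sketch. The sample rows are made up and sorted() is my addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical unsplit data, mirroring rows 3, 6 and 10 of the demo
df = pd.DataFrame(
    {"product_id": ["P01,P02,P09", "P01,P03,P07", "P01,P04,P08"]},
    index=pd.Index([3, 6, 10], name="transaction_id"),
)

exc = {"P04", "P08"}

# Set difference removes the excluded ids; sorted() makes the order deterministic
result = df["product_id"].apply(lambda row: sorted(set(row.split(",")) - exc))
print(result.tolist())
```

Keep in mind that a set also collapses duplicate ids within a row, so use the list comprehension instead if duplicates must survive.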