Python pandas - apply a replace function with a row-wise condition


Starting from this DataFrame df:

     0     1     2
02  en    it  None
03  en  None  None
01  nl    en   fil

Some values are missing. I am trying to apply a replace function row by row, for example in pseudocode:

def replace(row):
    if 'fil' in row and 'nl' in row:
        x = ''   # x being the cell that holds 'nl'

I know I can do something like:

df.apply(f, axis=1)

with the function f defined as follows:

def f(x):
    if x[0] == 'nl' and x[2] == 'fil':
        x[0] = ''
    return x

which gives:

     0     1     2
02  en    it  None
03  en  None  None
01        en   fil

But I do not know in advance which columns the strings are in, so I need to search with something like the isin method, applied row by row.

Edit: each string can appear anywhere across the columns.

You can do the following:

In [111]:
def func(x):
    return x.isin(['fil']).any() &  x.isin(['nl']).any()
df.loc[df.apply(func, axis=1)] = df.replace('nl','')
df

Out[111]:
    0     1     2
2  en    it  None
3  en  None  None
1        en   fil

So the function returns True if both values are present anywhere in the row:

In [107]:
df.apply(func, axis=1)

Out[107]:
2    False
3    False
1     True
dtype: bool

You can use boolean indexing with text comparison in pandas, creating a boolean index like this:

df['0'].str.contains('nl') & df['2'].str.contains('fil')

Or, since you updated the question to say the columns may vary:

df.isin(['fil']).any(axis=1) & df.isin(['nl']).any(axis=1)

Here is the test case:

import pandas as pd
from io import StringIO

text_file = '''
     0     1     2
02  en    it  None
03  en  None  None
01  nl    en   fil
'''

# Read in whitespace-delimited tabular data
df = pd.read_table(StringIO(text_file), sep=r'\s+')
print('Original Data:')
print(df)
print()

# Create boolean index based on text comparison
boolIndx = df.isin(['nl']).any(axis=1) & df.isin(['fil']).any(axis=1)
print('Example Boolean index:')
print(boolIndx)
print()

# Replace string based on boolean assignment
df.loc[boolIndx] = df.loc[boolIndx].replace('nl', '')
print('Filtered Data:')
print(df)
print()
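The row-wise mask used above extends naturally to more than two search strings. The following is a minimal sketch, not part of the original answers: the helper name rows_containing_all is hypothetical, the DataFrame is built by hand to mirror the question's data, and the helper assumes at least one search value is given.

```python
from functools import reduce
import operator

import pandas as pd


def rows_containing_all(frame, values):
    """Return a boolean Series that is True for rows containing every value.

    Hypothetical helper for illustration; assumes `values` is non-empty.
    """
    # Build one row-wise mask per search value, then AND them together.
    masks = [frame.isin([value]).any(axis=1) for value in values]
    return reduce(operator.and_, masks)


# Hand-built copy of the question's data (integer column labels 0, 1, 2).
df = pd.DataFrame(
    [["en", "it", None], ["en", None, None], ["nl", "en", "fil"]],
    index=["02", "03", "01"],
)

mask = rows_containing_all(df, ["nl", "fil"])

# Replace 'nl' only inside the rows selected by the mask.
df.loc[mask] = df.loc[mask].replace("nl", "")
```

Restricting both sides of the assignment to df.loc[mask] keeps the replacement limited to the matching rows without relying on index alignment.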

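Thanks, but actually I don't want to overwrite all the values in the first column, only the values in the rows and columns that meet the condition (for example, the en values in the first two rows should stay there...). Thanks!

Sorry, there were some typos to fix. The assignment should now read: df.loc[df.apply(func, axis=1)] = df.replace('nl', '').

So 'nl' can appear anywhere, not just in the first column?

Yes, anywhere. I have to look up its position and then replace it with ''.

Then df.loc[df.apply(func, axis=1)] = df.replace('nl', '') should work fine.

@FabioLamanna I suggest you don't use the apply function if you don't need it. Just use boolean indexing directly, see my explanation.

Thanks. I also tried to provide links for you and future viewers in case you need to dig into the details.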

Output of the test case:

Original Data:
    0     1     2
2  en    it  None
3  en  None  None
1  nl    en   fil

Example Boolean index:
2    False
3    False
1     True
dtype: bool

Filtered Data:
    0     1     2
2  en    it  None
3  en  None  None
1        en   fil
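Following up on the comments about changing only the matching cells and avoiding apply, here is one possible sketch that builds a cell-level mask and passes it to DataFrame.mask. It is not taken from the answers above. The data is a deliberately modified variant of the question's table (row 03 also holds 'nl' but no 'fil') so that the effect of the row condition is visible, and the names row_mask, cell_mask and result are illustrative.

```python
import pandas as pd

# Variant of the question's data: row "03" contains 'nl' but not 'fil',
# so it must be left untouched.
df = pd.DataFrame(
    [["en", "it", None], ["nl", None, None], ["nl", "en", "fil"]],
    index=["02", "03", "01"],
)

# Rows that contain both 'nl' and 'fil' somewhere.
row_mask = df.isin(["nl"]).any(axis=1) & df.isin(["fil"]).any(axis=1)

# Cells equal to 'nl', restricted to the qualifying rows.
# The row mask is reshaped to a column so it broadcasts across all columns.
cell_mask = df.eq("nl").to_numpy() & row_mask.to_numpy()[:, None]

# mask() returns a new DataFrame with '' wherever cell_mask is True.
result = df.mask(cell_mask, "")
```

Because mask returns a new object, the original df is left unchanged; assign the result back to df if in-place behaviour is wanted.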