Python: How to check for a sequence of string values in a pandas DataFrame and output the value that follows

python pandas

I am trying to check for the sequence B-B-B in a DataFrame:

d = {'A': ['A','B','C','D','B','B','B','A','A','E','F','B','B','B','F','A','A']}
testdf = pd.DataFrame(data=d)

array = []
seq = pd.Series(['B', 'B', 'B'])

for i in testdf.index:
    
    if testdf.A[i:len(seq)] == seq:
        
        array.append(testdf.A[i:len(seq)+1])

I get an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I make this work? I don't understand what is "ambiguous" about this code.

The output I expect here is:

A, F

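The ambiguity comes from the fact that when you test two Series for equality (they should be the same size), the comparison is done element by element, and you get back a Series of booleans rather than a single value. You then have to decide whether you want them all to be true, all false, at least one true, and so on, using .any(), .all(), ...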

s1 = pd.Series(['B', 'B', 'B'])
s2 = pd.Series(['A', 'B', 'B'])

print(s1 == s2)
0    False
1     True
2     True
dtype: bool

print((s1 == s2).all())
False
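
As a minimal sketch of why the if statement in the question fails, reusing s1 and s2 from above: Python calls bool() on the result of the comparison, and pandas refuses to collapse several booleans into one. The variable names error_message, all_equal and any_equal are only illustrative.

```python
import pandas as pd

s1 = pd.Series(['B', 'B', 'B'])
s2 = pd.Series(['A', 'B', 'B'])

error_message = None
try:
    if s1 == s2:  # bool() on a multi-element Series raises ValueError
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Reduce the element-wise result to a single boolean explicitly.
all_equal = (s1 == s2).all()  # does every position match?
any_equal = (s1 == s2).any()  # does at least one position match?
print(all_equal, any_equal)
```

Which reduction to use depends on the question being asked; for an exact sequence match, .all() is the one that applies.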

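To access the subsequence, it is better to use .iloc.

You need to use [i:i+len(seq)] rather than [i:len(seq)], because a slice is written in [from:to] notation.

You need to use Series.reset_index(drop=True), because two Series must have the same index to be compared. seq is always indexed 0,1,2, so the subsequence you compute needs the same index (for example, testdf.A.iloc[1:4] is indexed 1,2,3).

Verify the length before comparing, to avoid an exception near the end of the column, where the subsequence is shorter than seq.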
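
The index-alignment point can be checked in isolation. The following is a small sketch, assuming the testdf from the question; the names window, aligned, mismatch_error and matches are hypothetical and only used for illustration.

```python
import pandas as pd

testdf = pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'B', 'B', 'B', 'A', 'A',
                             'E', 'F', 'B', 'B', 'B', 'F', 'A', 'A']})
seq = pd.Series(['B', 'B', 'B'])

# A positional slice keeps the original row labels.
window = testdf.A.iloc[4:4 + len(seq)]
print(list(window.index))

# Comparing Series whose labels differ raises instead of comparing by position.
mismatch_error = None
try:
    window == seq
except ValueError as exc:
    mismatch_error = str(exc)

# After resetting the index, both Series are labelled 0, 1, 2.
aligned = window.reset_index(drop=True)
matches = (aligned == seq).all()
print(matches)
```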

You end up with:

import pandas as pd

values = {'A': ['A', 'B', 'C', 'D', 'B', 'B', 'B', 'A', 'A', 'E', 'F', 'B', 'B', 'B', 'F', 'A', 'A']}
testdf = pd.DataFrame(values)
array = []
seq = pd.Series(['B', 'B', 'B'])
for i in testdf.index:
    # Positional window of len(seq) rows, re-indexed 0..n-1 so it aligns with seq
    test_seq = testdf.A.iloc[i:i + len(seq)].reset_index(drop=True)
    # Require a full-length window with a row after it, then compare element-wise
    if i + len(seq) < len(testdf) and (test_seq == seq).all():
        array.append(testdf['A'].iloc[i + len(seq)])
print(array)  # ['A', 'F']

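I'm confused, how do you get A, F?

@Manakin I'm looking for the letter that comes after three B's occur: B-B-B-A, B-B-B-F.

Do you really need to use a DataFrame and Series (if this is a sample of a more complex case), or can we use other methods?

@azro It has to be a DataFrame, yes. It's a simplified sample from a larger project.

This looks more like substring pattern matching... have a look at the KMP algorithm. For the error part, use (testdf.A[i:i+len(seq)] == seq).all(), because testdf.A[i:i+len(seq)] == seq yields an element-wise boolean result rather than a single value.

Instead of iterating over every row of the DataFrame, we can iterate over the much smaller sequence (when len(seq) is much smaller than the DataFrame):
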
import numpy as np

def find_next_row(seq, df, col):
    seq = seq[::-1]  # to get last index
    m = np.logical_and.reduce([df[col].shift(i).eq(seq[i]) for i in range(len(seq))])

    m = np.roll(m, 1)
    m[0] = False  # Don't wrap around
    
    return df.loc[m]
    # return df.loc[m, col].tolist()

find_next_row(['B', 'B', 'B'], testdf, col='A')
#    A
#7   A
#14  F

# With the alternative return statement (df.loc[m, col].tolist()):
find_next_row(['B', 'B', 'B'], testdf, col='A')
#['A', 'F']
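
To see what the shift-based mask is doing, it may help to lay out the intermediate comparisons side by side. This is only an illustrative sketch, assuming the testdf from the question; the column names last, prev1, prev2 and ends_BBB are made up for the example and are not part of the answer above.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'B', 'B', 'B', 'A', 'A',
                             'E', 'F', 'B', 'B', 'B', 'F', 'A', 'A']})
col = testdf['A']

# shift(k) moves values down by k rows, so each row can "see" the k-th previous value.
steps = pd.DataFrame({
    'A': col,
    'last': col.eq('B'),            # current row is B
    'prev1': col.shift(1).eq('B'),  # previous row is B
    'prev2': col.shift(2).eq('B'),  # row two steps back is B
})
# A row ends a B-B-B run only when all three comparisons hold.
steps['ends_BBB'] = steps[['last', 'prev1', 'prev2']].all(axis=1)
print(steps)

# The value of interest sits one row after each run, if such a row exists.
end_positions = np.flatnonzero(steps['ends_BBB'].to_numpy())
next_positions = end_positions + 1
next_positions = next_positions[next_positions < len(testdf)]
following = testdf['A'].iloc[next_positions].tolist()
print(following)
```

Dropping positions past the end of the column plays the same role as setting m[0] = False after np.roll in the function above: a match in the final rows has no following value.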