Python Pandas: return rows that contain a minimum number of case-sensitive words, each of which follows a newline ('\n')


python pandas


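This is a follow-up to an earlier question.

That question gave a solution for returning rows that contain one of several case-sensitive words, where each word directly follows a newline ('\n').

Now I want to return the rows that contain at least a minimum number of those case-sensitive words, each following a newline.

In the minimal example below, I try to get the rows that contain at least three strings from a specific set.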

import pandas as pd

# Ten sample rows; only some of the section words directly follow a newline
testdf = pd.DataFrame([
    [' generates the final summary. \nRESULTS \nMethods We evaluate the performance of '],
    ['the cat and bat \n\n\nRESULTS\n BACKGROUND teamed up to find some food'],
    ['anthropology with RESULTS \n\n\nMETHODS\n pharmacology and biology'],
    [' generates the final summary. \nMethods \nBACKGROUND We evaluate the performance of '],
    ['the cat and bat \n\n\nMETHODS\n teamed up to find some food'],
    ['anthropology with METHODS pharmacology and biology'],
    [' generates the final summary. \nBACKGROUND We evaluate the performance of '],
    ['the cat and bat \n\n\nBackground\n teamed up to find some food'],
    ['anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology'],
    [' generates the final summary. \nBACKGROUND We \nRESULTS  evaluate \nCONCLUSIONS the performance of '],
])
testdf.columns = ['A']
testdf.head(10)

A
0   generates the final summary. \nRESULTS \nMethods We evaluate the performance of
1   the cat and bat \n\n\nRESULTS\n BACKGROUND teamed up to find some food
2   anthropology with RESULTS \n\n\nMETHODS\n pharmacology and biology
3   generates the final summary. \nMethods \nBACKGROUND We evaluate the performance of
4   the cat and bat \n\n\nMETHODS\n teamed up to find some food
5   anthropology with METHODS pharmacology and biology
6   generates the final summary. \nBACKGROUND We evaluate the performance of
7   the cat and bat \n\n\nBackground\n teamed up to find some food
8   anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology
9   generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of

Then:

listStrings = { '\nRESULTS',  '\nMETHODS' ,  '\nBACKGROUND' , '\nCONCLUSIONS', '\nEXPERIMENT'}
testdf.loc[testdf.A.apply(lambda x: len(listStrings.intersection(x.split())) >= 3)]

This returns nothing.

The desired result would return only the last row:

9   generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of

because it is the only row that contains at least 3 of the specified case-sensitive words, each directly following a newline.

Check str.findall:

testdf[testdf.A.str.findall('|'.join(listStrings)).str.len()>=3]
                                                   A
9   generates the final summary. \nBACKGROUND We ...
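A likely reason the original attempt returns nothing is that `str.split()` with no arguments splits on any whitespace, newlines included, so no token keeps its leading `'\n'` and the set intersection is always empty. The sketch below shows both behaviours on a single string and then on a two-row frame. The names `words`, `text`, `pattern` and `df` are made up for this illustration, and `re.escape` is an extra precaution that the answer above does not use.

```python
import re
import pandas as pd

# Same search strings as in the question, kept in a list for a stable order
words = ['\nRESULTS', '\nMETHODS', '\nBACKGROUND', '\nCONCLUSIONS', '\nEXPERIMENT']

text = ' generates the final summary. \nBACKGROUND We \nRESULTS  evaluate \nCONCLUSIONS the performance of '

# split() without arguments splits on all whitespace, so the newline is discarded
tokens = text.split()
lost = set(words).intersection(tokens)  # empty: no token starts with '\n'

# findall keeps the newline because it is part of the pattern itself
pattern = '|'.join(re.escape(w) for w in words)
found = re.findall(pattern, text)

# The same pattern applied column-wise through the pandas string accessor
df = pd.DataFrame({'A': [text, 'the cat and bat \n\n\nMETHODS\n teamed up to find some food']})
mask = df['A'].str.findall(pattern).str.len() >= 3
result = df[mask]
```

Here `lost` is empty while `found` holds the three matched strings, so only the first row of `df` passes the `>= 3` filter.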


Use str.findall:

>>> testdf[testdf['A'].str.findall('|'.join(listStrings)).map(len)>=3]
                                                   A
9   generates the final summary. \nBACKGROUND We ...
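One caveat with counting `findall` results: a word that occurs three times after a newline also reaches the threshold, and `'\nRESULTS'` also matches the start of a longer word such as `'\nRESULTSX'`. The question does not say whether either case matters. If the intent is three different whole words, as the set intersection in the question suggests, a variant along these lines might work. It is only a sketch: `keywords`, `count_distinct` and the sample rows are hypothetical and not part of the original post.

```python
import re
import pandas as pd

keywords = ['RESULTS', 'METHODS', 'BACKGROUND', 'CONCLUSIONS', 'EXPERIMENT']

# A newline, then one keyword as a whole word; the group captures only the keyword
pattern = r'\n(' + '|'.join(map(re.escape, keywords)) + r')\b'

def count_distinct(text):
    """Count the different keywords that directly follow a newline in text."""
    return len(set(re.findall(pattern, text)))

df = pd.DataFrame({'A': [
    'a \nRESULTS b \nRESULTS c \nRESULTS d',          # one keyword repeated
    'a \nBACKGROUND b \nRESULTS c \nCONCLUSIONS d',   # three different keywords
    'a \nBACKGROUND b \nRESULTSX c \nCONCLUSIONS d',  # RESULTSX is not a keyword
]})

distinct_counts = df['A'].map(count_distinct)
selected = df[distinct_counts >= 3]
```

Only the middle row is selected: the first row has a single distinct keyword and the third has two, because `\b` rejects `RESULTSX`.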


Three questions in a row :P I answered three out of three.

Thanks!!!!! Learned a lot today. I think I have to give the last check mark to @wenyoben, who was technically ahead this time (by 18 seconds) and also provided an answer to the other question.

That's fine, but I could have been a minute faster; my network had some problems. Anyway, you can accept his.

Wow xD Stack Overflow masters!!! Haha.

:-)

:P