Python Pandas: Return rows containing a minimum number of case-sensitive words, each of which follows a newline ('\n')
This is a follow-up to this question, which provides a solution for returning rows that contain one of several case-sensitive words following a newline ('\n'). Now I want to return rows that contain a minimum number of such case-sensitive words, each following a newline. In the minimal example below, I try to get the rows that contain at least three strings from a specific set.
import pandas as pd

testdf = pd.DataFrame([
[ ' generates the final summary. \nRESULTS \nMethods We evaluate the performance of ', ],
[ 'the cat and bat \n\n\nRESULTS\n BACKGROUND teamed up to find some food'],
['anthropology with RESULTS \n\n\nMETHODS\n pharmacology and biology'],
[ ' generates the final summary. \nMethods \nBACKGROUND We evaluate the performance of ', ],
[ 'the cat and bat \n\n\nMETHODS\n teamed up to find some food'],
['anthropology with METHODS pharmacology and biology'],
[ ' generates the final summary. \nBACKGROUND We evaluate the performance of ', ],
[ 'the cat and bat \n\n\nBackground\n teamed up to find some food'],
['anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology'],
[ ' generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of ', ]
])
testdf.columns = ['A']
testdf.head(10)
which returns:
A
0 generates the final summary. \nRESULTS \nMethods We evaluate the performance of
1 the cat and bat \n\n\nRESULTS\n BACKGROUND teamed up to find some food
2 anthropology with RESULTS \n\n\nMETHODS\n pharmacology and biology
3 generates the final summary. \nMethods \nBACKGROUND We evaluate the performance of
4 the cat and bat \n\n\nMETHODS\n teamed up to find some food
5 anthropology with METHODS pharmacology and biology
6 generates the final summary. \nBACKGROUND We evaluate the performance of
7 the cat and bat \n\n\nBackground\n teamed up to find some food
8 anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology
9 generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of
Then:
listStrings = { '\nRESULTS', '\nMETHODS' , '\nBACKGROUND' , '\nCONCLUSIONS', '\nEXPERIMENT'}
testdf.loc[testdf.A.apply(lambda x: len(listStrings.intersection(x.split())) >= 3)]
returns nothing (an empty DataFrame).
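The attempt above comes back empty because str.split() with no argument splits on any run of whitespace, newlines included, so the tokens never carry the leading '\n' that every element of listStrings has. A minimal sketch that keeps the apply/lambda approach but swaps the token intersection for a plain substring test, assuming that is acceptable for your data (each listed word counts at most once per row, and '\nRESULTS' would also match inside a longer, hypothetical word such as '\nRESULTSX'):

```python
import pandas as pd

testdf = pd.DataFrame([
    [' generates the final summary. \nRESULTS \nMethods We evaluate the performance of '],
    ['the cat and bat \n\n\nRESULTS\n BACKGROUND teamed up to find some food'],
    ['anthropology with RESULTS \n\n\nMETHODS\n pharmacology and biology'],
    [' generates the final summary. \nMethods \nBACKGROUND We evaluate the performance of '],
    ['the cat and bat \n\n\nMETHODS\n teamed up to find some food'],
    ['anthropology with METHODS pharmacology and biology'],
    [' generates the final summary. \nBACKGROUND We evaluate the performance of '],
    ['the cat and bat \n\n\nBackground\n teamed up to find some food'],
    ['anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology'],
    [' generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of '],
], columns=['A'])

listStrings = {'\nRESULTS', '\nMETHODS', '\nBACKGROUND', '\nCONCLUSIONS', '\nEXPERIMENT'}

# str.split() with no separator treats '\n' as whitespace and drops it,
# so no token can ever equal an element such as '\nBACKGROUND'.
tokens = testdf.A[9].split()
print(listStrings.intersection(tokens))  # set()

# Substring test instead: count how many listed words occur in the row (case-sensitive).
mask = testdf.A.apply(lambda x: sum(word in x for word in listStrings) >= 3)
print(testdf.loc[mask])
```

With the question's data only row 9 passes, matching the desired result.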
The desired result would be only the last row:
9 generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of
because it is the only row that contains at least 3 of the specified case-sensitive words, each following a newline.
Check str.findall:
testdf[testdf.A.str.findall('|'.join(listStrings)).str.len()>=3]
A
9 generates the final summary. \nBACKGROUND We ...
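This works by joining the words with '|' into one regular expression, so str.findall returns, for each row, the list of every match, and .str.len() counts those matches. Note that it counts repeated occurrences: a row containing '\nRESULTS' three times would also pass. A minimal sketch, assuming you instead want at least three different words, counts the unique matches; re.escape is added only as a precaution in case a listed word ever contains regex metacharacters. The first row below is a made-up example, not from the question.

```python
import re
import pandas as pd

listStrings = {'\nRESULTS', '\nMETHODS', '\nBACKGROUND', '\nCONCLUSIONS', '\nEXPERIMENT'}

testdf = pd.DataFrame({'A': [
    'x \nRESULTS y \nRESULTS z \nRESULTS',  # hypothetical: one word, three times
    'anthropology with \nBACKGROUND with \nRESULTS pharmacology and biology',
    ' generates the final summary. \nBACKGROUND We \nRESULTS evaluate \nCONCLUSIONS the performance of ',
]})

# Alternation of the escaped words; sorted() just makes the pattern deterministic.
pattern = '|'.join(map(re.escape, sorted(listStrings)))

matches = testdf.A.str.findall(pattern)         # list of matches per row
total = matches.str.len()                        # counts repeated matches
distinct = matches.apply(lambda m: len(set(m)))  # counts each word once

print(testdf[total >= 3])     # rows 0 and 2
print(testdf[distinct >= 3])  # row 2 only
```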
Use str.findall:
>>> testdf[testdf['A'].str.findall('|'.join(listStrings)).map(len)>=3]
A
9 generates the final summary. \nBACKGROUND We ...
>>>
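This answer differs from the previous one only in counting the matches with .map(len) instead of .str.len(). The two agree on the example data, but they behave differently if column A ever contains missing values: str.findall leaves a missing entry as NaN, .str.len() passes that NaN through (and NaN >= 3 is False), whereas .map(len) calls len() on the NaN and raises a TypeError. A minimal sketch, assuming an object-dtype column with one hypothetical missing value:

```python
import numpy as np
import pandas as pd

listStrings = {'\nRESULTS', '\nMETHODS', '\nBACKGROUND', '\nCONCLUSIONS', '\nEXPERIMENT'}
pattern = '|'.join(listStrings)

# Hypothetical column: the question's last row plus one missing value.
s = pd.Series([' generates the final summary. \nBACKGROUND We \nRESULTS evaluate '
               '\nCONCLUSIONS the performance of ', np.nan], dtype=object)

matches = s.str.findall(pattern)  # the missing entry stays missing

# .str.len() propagates the missing value, and NaN >= 3 is False,
# so the row without text is simply filtered out.
counts = matches.str.len()
print(counts >= 3)

# .map(len) calls len() on the missing value itself, which raises.
map_failed = False
try:
    matches.map(len)
except TypeError:
    map_failed = True
print('map(len) raised TypeError:', map_failed)
```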
Three questions in a row :P I answered three out of three of your questions. Thanks!!!!! I learned a lot today. I think I have to give the last check mark to @wenyoben, who technically got there first this time (by 18 seconds) and also provided answers to the other questions. That's fine, though I could have been a minute faster; my network had some issues. Anyway, you can accept his. WoW xD stackoverflow master!!! haha, haha :-) :P