Occurrences of a string in a text in Python
There are many posts about occurrences of a substring in Python, but I could not find anything about occurrences of a string in a text.
testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"
#Suppose my search term is a, then I would expect the output of my program to be:
print(testSTR.myfunc("a"))
>>1
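Because the whole input contains only one concrete reference to the string a. count does not work because it also counts substrings, so the output I get is:
print(testSTR.count("a"))
>>5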
Is it possible to do this?
You can do this with collections after splitting the string:
from collections import Counter
print(Counter(testSTR.split()))
The output looks like this:
Counter({'you': 2, 'a': 1, 'and': 1, 'words': 1, 'text': 1, 'some': 1, 'the': 1, 'large': 1, 'to': 1, 'Suppose': 1, 'are': 1, 'have': 1, 'of': 1, 'specific': 1, 'trying': 1, 'find': 1, 'occurences': 1})
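To get the count for a specific word, look it up in the Counter, for example Counter(testSTR.split())['a'].
If the count needs to be case-insensitive, convert the words with upper or lower before counting:
res = Counter(i.lower() for i in testSTR.split())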
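Putting the two steps above together, a minimal sketch (written for Python 3, and assuming words are separated by whitespace only) might look like this. The name res follows the snippet above; the looked-up words are just examples.

```python
from collections import Counter

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

# Count whitespace-separated words, lowercased so "You" and "you" are the same key.
res = Counter(word.lower() for word in testSTR.split())

print(res["a"])        # count of the standalone word "a"
print(res["you"])      # count of the word "you"
print(res["missing"])  # a Counter returns 0 for keys it has never seen
```

Because a Counter returns 0 for unknown keys, no extra check is needed for words that do not occur in the text.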
If you care about punctuation, you should try this:
words = [s.strip(".!?:;,\"'") for s in testSTR.split()]
print("a" in words)
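The snippet above only tells you whether the word is present. If you want the number of occurrences, one possible sketch (Python 3) is below. count_word is a hypothetical helper name, and using string.punctuation as the set of characters to strip is an assumption; adjust it to your text.

```python
import string


def count_word(text, word):
    """Count tokens equal to `word` after stripping leading/trailing punctuation.

    Hypothetical helper: splits on whitespace, so punctuation inside a token
    (for example "a-b") is left untouched.
    """
    words = [token.strip(string.punctuation) for token in text.split()]
    return words.count(word)


print(count_word("a large text a, a. a", "a"))
```

This is case-sensitive; call .lower() on both the tokens and the search word if that matters.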
I think the most straightforward approach is to use a regular expression:
import re
testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"
print(len(re.findall(r"\ba\b", testSTR)))
# 1
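\ba\b requires a word boundary before and after a, where a word boundary is the position next to punctuation, whitespace, or the start or end of the whole string. This is more useful than splitting on whitespace, unless of course that is exactly what you want: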
import re
str2 = "a large text a, a. a"
print(len(re.findall(r"\ba\b", str2)))
# 4
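If the search term is not hard-coded, the same idea can be wrapped in a small function. This is only a sketch (Python 3): count_word_regex and its ignore_case parameter are hypothetical names, and it assumes the search term starts and ends with a word character, since \b behaves differently otherwise.

```python
import re


def count_word_regex(text, word, ignore_case=False):
    """Count whole-word occurrences of `word` in `text` using \\b boundaries."""
    # re.escape keeps characters such as "." or "+" in the search term literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, text, flags))


print(count_word_regex("a large text a, a. a", "a"))
print(count_word_regex("A cat and a dog", "a", ignore_case=True))
```

Building the pattern with re.escape avoids surprises when the term contains regex metacharacters.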
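Can you show your myfunc?
What do you mean by concrete? There are plenty of references to the string a in your input. Do you perhaps want to search for the word a?
I don't care about punctuation; I just want to find the number of a's in the whole text.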