Searching for the number of capitalized words in a text file in Python


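I need help sorting through a text file.

I have tried multiple variations of the for loop. I also tried stripping all the whitespace and counting the letters in the file one by one. I have also tried multiple variations of the strip function and different if statements.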

for character in file:
    if character.isupper():
        capital += 1
        file.readline().rstrip()
        break

print(capital)

I want the program to read every word or letter in the document and return the total number of capitalized words it contains.

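Two things:

Make sure you are iterating over characters, not over words or sentences. Put in some print statements to check.

Remove the break statement from the if block. It exits the for loop immediately, so you will never count more than 1.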
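The two fixes above can be sketched as follows. This is a minimal sketch, assuming the goal at this stage is simply to count uppercase characters. The helper name count_uppercase_chars and the sample contents are hypothetical, and io.StringIO stands in for the opened file.

```python
import io


def count_uppercase_chars(lines):
    """Count uppercase characters across an iterable of lines."""
    capital = 0
    for line in lines:           # iterating a file object yields lines
        for character in line:   # a second loop reaches the characters
            if character.isupper():
                capital += 1     # no break, so the loop keeps counting
    return capital


# io.StringIO stands in for an opened text file (hypothetical contents).
sample = io.StringIO("Hello World\nfrom SQL\n")
print(count_uppercase_chars(sample))  # 5
```

Note that this counts letters, not words, so it answers a slightly different question from the one in the title.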

If the goal is to count words that start with a capital letter, I would use the fact that booleans are a subtype of integers:

with open('my_textfile.txt', 'r') as text:
    # split() turns each row into words; iterating the row itself would yield characters
    print(sum(word.istitle() for row in text for word in row.split()))
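To see why summing the results of istitle() gives a count, here is a small sketch. The word list is a made-up example, not data from the question.

```python
# bool is a subclass of int, so True counts as 1 and False as 0 in sum().
words = ["Hello", "world", "SQL", "Python"]  # hypothetical sample words

flags = [word.istitle() for word in words]
print(flags)                  # [True, False, False, True]
print(issubclass(bool, int))  # True
print(sum(flags))             # 2
```

"SQL" is not counted here because istitle() requires the letters after the first one to be lowercase.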

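Suppose we have a sample file doc.txt with the following contents:

This is a test file for identifying Capitalized words. I created this example because the requirements of the question may vary. For example, should an acronym like SQL count as a capitalized word? If no: This will result in eight capitalized words. If yes: This will result in nine.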

If you want to count capitalized (a.k.a. title case) words but exclude all-uppercase words (such as acronyms), you can do this:

def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in line.split():
                if word.istitle():
                    print(word)
                    count += 1
    return count


print(count_capital_words('doc.txt'))  # 8

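If all-uppercase words should be counted as well, the function can be modified to check only the first letter of each word. Note that filter(None, ...) ensures that word is never an empty string, avoiding the IndexError that would be raised in that case. Strictly speaking, line.split() with no arguments never yields empty strings, so the filter is only a safeguard in case the splitting logic changes: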

def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in filter(None, line.split()):
                if word[0].isupper():
                    count += 1
    return count


print(count_capital_words('doc.txt'))  # 9
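The difference between the two rules can be checked side by side. The following is a rough sketch, not part of the original answer: the helper count_words, its predicate parameter, and the sample text are all hypothetical, and a temporary file is used so the example does not depend on doc.txt.

```python
import os
import tempfile


def count_words(filename, predicate):
    """Count whitespace-separated words in a file that satisfy predicate."""
    with open(filename, 'r') as fp:
        return sum(1 for line in fp for word in line.split() if predicate(word))


# Hypothetical sample text: three title-case words plus one acronym.
text = "Alice met Bob at the SQL meetup.\nThey talked.\n"

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(text)

try:
    title_only = count_words(tmp.name, str.istitle)
    first_upper = count_words(tmp.name, lambda w: w[0].isupper())
finally:
    os.remove(tmp.name)  # clean up the temporary file

print(title_only)   # 3
print(first_upper)  # 4
```

Passing the rule in as a function keeps the file handling in one place, so switching between the two interpretations of "capitalized" is a one-argument change.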

If you have more complex requirements, you can get an iterable of all the words like this:

from itertools import chain


def get_words(filename):
    with open(filename, 'r') as fp:
        words = chain.from_iterable(line.split() for line in fp)
        yield from words
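One way such an iterable might be consumed is sketched below. get_words is repeated so the block runs on its own; the classify helper, the sample file contents, and the grouping into a Counter are illustrative assumptions rather than part of the original answer.

```python
import os
import tempfile
from collections import Counter
from itertools import chain


def get_words(filename):
    """Lazily yield every whitespace-separated word in the file."""
    with open(filename, 'r') as fp:
        words = chain.from_iterable(line.split() for line in fp)
        yield from words


def classify(word):
    """Label a word by its capitalization style (hypothetical helper)."""
    if word.istitle():
        return 'title'
    if word.isupper():
        return 'upper'
    return 'other'


with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Use SQL or Python\nfor the Report\n")

try:
    # The generator is fully consumed here, before the file is removed.
    styles = Counter(classify(word) for word in get_words(tmp.name))
finally:
    os.remove(tmp.name)

print(styles['title'], styles['upper'], styles['other'])  # 3 1 3
```

Because get_words is a generator, the file is read line by line and the words never need to be held in memory all at once.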

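When you do for character in file: you are actually iterating over lines, not characters.

How do I iterate over the characters in a line?

You can use another loop to iterate over the characters.

Shouldn't the iteration be over words? The OP defined the goal as "find the number of capitalized words". For example, O'Reilly is one capitalized word with two capital letters. It seems str.istitle() is better suited to the goal.

A subtle but very important point. I agree. Iterating over words and using .istitle() would be the most appropriate approach.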
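The O'Reilly point can be checked directly. This is a small sketch with a hypothetical helper, count_uppercase_letters, comparing a per-letter count with str.istitle() on a few sample words.

```python
def count_uppercase_letters(word):
    """Number of uppercase characters in a single word."""
    return sum(character.isupper() for character in word)


for word in ["O'Reilly", "SQL", "python"]:
    print(word, count_uppercase_letters(word), word.istitle())
# O'Reilly 2 True
# SQL 3 False
# python 0 False
```

Counting letters would report O'Reilly twice and SQL three times, whereas testing each word once gives one result per word.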