Printing the number of characters, words, and lines with Python

This is what I have so far:

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))

When I run it, the line count is returned correctly, but the word and character counts come back as 0. Not sure why...

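You need to use infile.seek(0) to return to the start of the file. Once the read position is at the end, seek(0) resets it to the beginning so that you can read again: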

infile = open('data')
lines = infile.readlines()
infile.seek(0)  # rewind: readlines() left the position at the end
words = infile.read()
infile.seek(0)  # rewind again before the second read()
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:

line count: 2
word count: 19
character counter: 113
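
To see why the second read came back empty, here is a minimal sketch. The temporary file and its contents are made up purely for illustration; only readlines, read, and seek come from the code above.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two\nthree\n")
    path = tmp.name

with open(path) as infile:
    lines = infile.readlines()  # consumes the whole file
    leftover = infile.read()    # position is already at the end, so this is ''
    infile.seek(0)              # rewind to the beginning
    text = infile.read()        # now the full contents come back

os.remove(path)

print(len(lines))                    # 2
print(repr(leftover))                # ''
print(len(text.split()), len(text))  # 3 14
```

The empty string from the second call is what produced the 0 counts in the question.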
Other approaches:


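You can iterate over the file once and count the lines, words, and characters in a single pass. That avoids seeking back to the start several times, which your approach requires because the iterator is exhausted once the lines have been counted: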

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
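Or rely on the side effect that the file position is at the end of the file once the loop has finished. Note that tell() reports what is in practice a byte offset, so it matches the character count only for single-byte encodings, and that lines is never assigned if the file is empty: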

def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    words = []
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
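
As a rough check of how tell() relates to the character count, the sketch below writes a made-up sample containing one non-ASCII character and compares the two numbers. The sample text and the explicit encoding and newline arguments are assumptions chosen to make the result predictable; they are not part of the answer above.

```python
import os
import tempfile

# Hypothetical sample text with one non-ASCII character ("é" is 2 bytes in UTF-8).
sample = "café\n"

with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

chars = 0
with open(path, encoding="utf-8", newline="") as infile:
    for line in infile:
        chars += len(line)    # counts decoded characters
    position = infile.tell()  # position at end of file, in bytes here

os.remove(path)

print(chars, position)  # 5 6
```

For plain ASCII files the two values agree, which is why the shortcut often appears to work.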

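After calling readlines the iterator is exhausted. You can seek back to the start, but there is really no need to read the whole file into memory at all: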

def stats(filename):
    lines = chars = words = 0  # lines stays 0 for an empty file
    dupes = False
    seen = set()
    with open(filename) as f:
        for lines, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if not dupes:
                # Check word by word so repeats within one line are caught too
                for word in spl:
                    if word in seen:
                        dupes = True
                        break
                    seen.add(word)
    return lines, chars, words, dupes
Then assign the values by unpacking:

no_lines, no_chars, no_words, has_dupes = stats("your_file")

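If you don't want to include the line endings, you may want to use chars += len(line.rstrip()); note that rstrip() with no argument also strips any other trailing whitespace, whereas rstrip("\n") removes only the newline. The code stores only the data it actually needs. Using readlines, read, dicts, and so on means your code will not be very practical for large files.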
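
A small illustration of how the choice of rstrip call changes the per-line character count. The sample line is hypothetical.

```python
# Hypothetical line, as it would come from iterating over a text file.
line = "hello world  \n"

with_newline = len(line)                  # counts the "\n" too
without_newline = len(line.rstrip("\n"))  # drops only the line ending
without_trailing_ws = len(line.rstrip())  # also drops the two trailing spaces

print(with_newline, without_newline, without_trailing_ws)  # 14 13 11
```

Which of the three is the "right" character count depends on what you want to measure.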

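Then I also have to check whether the file contains any duplicate words and return True or False accordingly. Do you know how to do that?

Why create four lists for no reason? The OP doesn't need the data, they need the count.

@PadraicCunningham The OP wants to know more, not less.

@LetzerWille, know more of what exactly, how to write the least memory-efficient code possible? Have you heard of generators or the sum function?

I originally took that approach, but the OP added a comment on another answer saying they need to return whether there are duplicate words, so I changed words to a list. In fact, a set and a flag can be used to avoid storing any more data than needed.
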
File_Name = 'file.txt'

line_count = 0
word_count = 0
char_count = 0

with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the  list. 
    data = fh.readlines()

    # Count of  total number for list elements == total number of lines. 
    line_count = len(data)

    for line in data:
        word_count = word_count + len(line.split())
        char_count = char_count + len(line)

print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)