Python character and word count

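I'm a Python beginner. I'd like to know how to count the characters in two txt files and how to find the 10 most common characters. I'd also like to know how to convert all characters in the files to lowercase and drop everything except a-z.

Here is what I tried, which didn't work: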

from string import ascii_lowercase
from collections import Counter

with open ('document1.txt' , 'document2.txt') as f:
    print Counter(letter for line in f
                    for letter in line.lower()
                    if letter in ascii_lowercase)

Here is a simple example. You can adapt this code to suit your needs.

from string import ascii_lowercase
from collections import Counter

with open('file1.txt', 'r') as file1data:  # open and read file one
    file1 = file1data.read().lower()  # convert the entire file contents to lowercase

with open('file2.txt', 'r') as file2data:  # open and read file two
    file2 = file2data.read().lower()

# The contents of files 1 and 2 are now stored in the file1 and file2 variables.
# The example below works with one file; repeat it for the second file.
file1_list = []
for ch in file1:
    if ch in ascii_lowercase:  # only lowercase letters are appended; all non-alphabet characters are dropped
        file1_list.append(ch)
    elif ch in [" ", ".", ",", "'"]:  # remove this elif block if you just want the letters
        file1_list.append(ch)  # keeps basic punctuation

print("".join(file1_list))  # not needed; just shows what the text looks like now
print(Counter(file1_list).most_common(10))  # prints the ten most common characters
print(Counter(file1_list))  # prints every character and how many times it occurs

Now that you've looked through the mess above and have an idea of what each line does, here is a cleaner version that does what you asked.

from string import ascii_lowercase
from collections import Counter

with open('file1.txt', 'r') as file1data:
    file1 = file1data.read().lower()

with open('file2.txt', 'r') as file2data:
    file2 = file2data.read().lower()

file1_list = []
for ch in file1:
    if ch in ascii_lowercase:
        file1_list.append(ch)

file2_list = []
for ch in file2:
    if ch in ascii_lowercase:
        file2_list.append(ch)

all_counter = Counter(file1_list + file2_list)  # counts for both files combined
top_ten_counter = all_counter.most_common(10)  # the ten most common letters

print(sorted(all_counter.items()))  # every letter with its count, in alphabetical order
print(sorted(top_ten_counter))  # the top ten, also in alphabetical order
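
The same idea can be written more compactly by feeding a generator straight into a Counter, which is close to what the question's original attempt was aiming for. The following is only a sketch: the helper name count_letters is made up for illustration, and it assumes the files are plain text that can be read with the default encoding.

```python
from collections import Counter
from string import ascii_lowercase


def count_letters(*paths):
    """Count the letters a-z (case-insensitively) across all given text files."""
    counts = Counter()
    for path in paths:
        # One open() call per file; the file is closed when the with block ends.
        with open(path) as f:
            for line in f:
                # Lowercase the line and keep only the characters a-z.
                counts.update(ch for ch in line.lower() if ch in ascii_lowercase)
    return counts
```

Calling count_letters('document1.txt', 'document2.txt').most_common(10) would then give the ten most common letters across both files. Reading line by line avoids holding a whole file in memory at once.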

Try this (note that it counts the most common words, not characters):

>>> from collections import Counter
>>> import re
>>> words = re.findall(r'\w+', "{} {}".format(open('your_file1').read().lower(), open('your_file2').read().lower()))
>>> Counter(words).most_common(10)

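Unfortunately, there is no way to insert into the middle of a file without rewriting it. As previous posters have indicated, you can use seek to append to a file or overwrite part of it, but if you want to add content at the beginning or in the middle, you have to rewrite it.

This is an operating system thing, not a Python thing. It is the same in all languages.

What I usually do is read from the file, make the modifications, and write the result to a new file called myfile.txt.tmp or something similar. This is better than reading the whole file into memory, because the file may be too large for that. Once the temporary file is complete, I rename it to the original file's name.

This is a good, safe way to do it, because if the write crashes or aborts for any reason, you still have the untouched original file.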
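
The temp-file approach described above could look roughly like this. It is a sketch under stated assumptions rather than a definitive implementation: the function name insert_line_at and the ".tmp" suffix are illustrative choices, and os.replace requires Python 3.3 or later.

```python
import os


def insert_line_at(path, line_number, new_line):
    """Insert new_line before the 0-based line_number by rewriting the file.

    If line_number is past the end of the file, nothing is inserted.
    """
    tmp_path = path + ".tmp"  # hypothetical temp name, e.g. myfile.txt.tmp
    with open(path) as src, open(tmp_path, "w") as dst:
        # Copy line by line so the whole file never has to fit in memory.
        for i, line in enumerate(src):
            if i == line_number:
                dst.write(new_line + "\n")
            dst.write(line)
    # Swap the finished temp file into place; until this point the
    # original file has not been modified.
    os.replace(tmp_path, path)
```

If the copy loop fails partway through, the original file is still intact and only the temporary file is left behind.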

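To find the most common words across multiple files:

from collections import Counter
import re

# Open both files in one with statement, using a separate open() call for each.
with open('document1.txt') as f1, open('document2.txt') as f2:
    words = re.findall(r'\w+', f1.read().lower()) + re.findall(r'\w+', f2.read().lower())

# This gives you the 10 most common words.
print(Counter(words).most_common(10))

If you want the 10 most common characters instead, count individual characters rather than \w+ matches.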
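
The original answer's code for the character variant is not preserved here, so the following is only a sketch of one way it could be done. The function name top_characters is made up for illustration, and it assumes that only the letters a-z should be counted.

```python
import re
from collections import Counter


def top_characters(text, n=10):
    """Return the n most common letters a-z in text, ignoring case."""
    # [a-z] matches one letter at a time, unlike \w+, which matches whole words.
    letters = re.findall(r'[a-z]', text.lower())
    return Counter(letters).most_common(n)
```

With the two files opened as f1 and f2, top_characters(f1.read() + f2.read()) would return the ten most common letters across both.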

What error are you getting? Also, your with statement is malformed. Use with open('file.txt', 'r') as data: A single open() call can't open two files; you need one open() call (for example, one with statement) per file.

Thanks. That does work, but it is pulling in the whole file. How can I get it to show only the counter? Also, how can I get it to display like: a 20 b 10 c 14, etc.?

I've edited the code above; that should do it for you.
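
For the display question in the comments, one possible approach is sketched below. The exact layout the commenter wanted is not recoverable from the thread, so this assumes one "letter count" pair per line in alphabetical order; the function name format_counts is illustrative.

```python
from collections import Counter


def format_counts(counter):
    """Render a Counter as lines such as 'a 20', sorted by character."""
    return "\n".join("{} {}".format(ch, n) for ch, n in sorted(counter.items()))
```

For example, print(format_counts(all_counter)) would print each letter followed by its count. Joining with " " instead of "\n" would put everything on a single line.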