Python character and word counts
I'm a Python beginner. I'd like to know how to count the characters in two .txt files and how to find the 10 most common characters. I'd also like to convert all characters in the files to lowercase and remove every character other than a-z. Here is what I tried, which didn't work:
from string import ascii_lowercase
from collections import Counter
with open ('document1.txt' , 'document2.txt') as f:
    print Counter(letter for line in f
                  for letter in line.lower()
                  if letter in ascii_lowercase)
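The attempt above fails because open() takes a single filename: 'document2.txt' is passed as the mode argument, so open() raises ValueError. A minimal corrected sketch, assuming Python 3 (or 2.7, which also accepts two open() calls in one with statement) and that both files are in the working directory, keeps the original generator and feeds it the lines of both files. The helper name count_letters is hypothetical.

```python
from collections import Counter
from itertools import chain
from string import ascii_lowercase


def count_letters(lines):
    """Count a-z letters in an iterable of lines, ignoring case (hypothetical helper)."""
    return Counter(letter
                   for line in lines
                   for letter in line.lower()
                   if letter in ascii_lowercase)


if __name__ == "__main__":
    # open() takes ONE filename; its second positional argument is the mode,
    # so each file needs its own open() call
    with open("document1.txt") as f1, open("document2.txt") as f2:
        # chain() yields the lines of f1, then the lines of f2
        counts = count_letters(chain(f1, f2))
    print(counts.most_common(10))
```

Because a file object iterates line by line, the whole file never has to be read into memory at once.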
Here is a simple example. You can adapt this code to suit your needs:
from string import ascii_lowercase
from collections import Counter

with open('file1.txt', 'r') as file1data:  # open and read file one
    file1 = file1data.read().lower()  # convert the entire file contents to lowercase
with open('file2.txt', 'r') as file2data:  # open and read file two
    file2 = file2data.read().lower()
# The contents of files 1 and 2 are now stored in the file1 and file2 variables.
# The steps below handle one file; repeat them for the second file.
file1_list = []
for ch in file1:
    if ch in ascii_lowercase:  # keep lowercase letters
        file1_list.append(ch)
    elif ch in [" ", ".", ",", "'"]:  # remove this elif block if you just want the letters
        file1_list.append(ch)  # keep basic punctuation
print("".join(file1_list))  # not needed; just shows what the text looks like now
print(Counter(file1_list).most_common(10))  # prints the top ten
print(Counter(file1_list))  # prints each character and how many times it occurs
Now that you've seen the messy version above and understand what each line does, here is a cleaner version that does what you asked:
from string import ascii_lowercase
from collections import Counter

with open('file1.txt', 'r') as file1data:
    file1 = file1data.read().lower()
with open('file2.txt', 'r') as file2data:
    file2 = file2data.read().lower()
file1_list = []
for ch in file1:
    if ch in ascii_lowercase:
        file1_list.append(ch)
file2_list = []
for ch in file2:
    if ch in ascii_lowercase:
        file2_list.append(ch)
all_counter = Counter(file1_list + file2_list)
top_ten_counter = all_counter.most_common(10)  # reuse the counter instead of rebuilding it
print(sorted(all_counter.items()))  # every letter with its count, in alphabetical order
print(sorted(top_ten_counter))  # the ten most common letters, listed alphabetically
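The two lists and two loops above can be collapsed into a single Counter that is updated once per file. This is a sketch of that variation, not the answerer's code; letter_counts is a hypothetical helper, and file1.txt/file2.txt are the names used above.

```python
from collections import Counter
from string import ascii_lowercase


def letter_counts(texts):
    """Return one Counter of a-z letters across several strings (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # update() adds to the existing tallies instead of replacing them
        counts.update(ch for ch in text.lower() if ch in ascii_lowercase)
    return counts


if __name__ == "__main__":
    texts = []
    for name in ("file1.txt", "file2.txt"):
        with open(name) as f:
            texts.append(f.read())
    counts = letter_counts(texts)
    print(sorted(counts.items()))  # every letter, alphabetical
    print(counts.most_common(10))  # ten most frequent, highest count first
```

Adding a third file only means adding its name to the tuple; nothing else changes.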
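Try this (note that \w+ matches whole words, so this finds the 10 most common words rather than characters):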
>>> from collections import Counter
>>> import re
>>> words = re.findall(r'\w+', "{} {}".format(open('your_file1').read().lower(), open('your_file2').read().lower()))
>>> Counter(words).most_common(10)
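To see what \w+ actually matches, here is a small self-contained sketch on an in-memory string rather than files; top_words is a hypothetical name. \w matches letters, digits and the underscore (and, on Python 3 str patterns, non-ASCII letters too), so punctuation is dropped and something like file_2 counts as one word.

```python
import re
from collections import Counter


def top_words(text, n=10):
    r"""Return the n most common words in text, ignoring case.

    \w+ matches runs of letters, digits and underscores, so punctuation
    is dropped and "file_2" counts as a single word.
    """
    return Counter(re.findall(r"\w+", text.lower())).most_common(n)


print(top_words("The cat and THE hat. The end!", 1))  # [('the', 3)]
```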
To find the most common words across multiple files:
from collections import Counter
import re

with open('document1.txt') as f1, open('document2.txt') as f2:
    words = re.findall(r'\w+', f1.read().lower()) + re.findall(r'\w+', f2.read().lower())
print(Counter(words).most_common(10))  # gives you the 10 most common words
If you want the 10 most common characters:
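The character version is not shown in the answer. A minimal sketch under the same assumptions (document1.txt and document2.txt in the working directory) swaps \w+ for the single-character class [a-z], which also satisfies the question's "only a-z" requirement; top_chars is a hypothetical helper name.

```python
import re
from collections import Counter


def top_chars(texts, n=10):
    """Return the n most common a-z characters across several strings (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # after lower(), [a-z] keeps only ASCII lowercase letters, one per match
        counts.update(re.findall(r"[a-z]", text.lower()))
    return counts.most_common(n)


if __name__ == "__main__":
    with open("document1.txt") as f1, open("document2.txt") as f2:
        print(top_chars([f1.read(), f2.read()]))
```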
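What error are you getting? Also, the with statement isn't written correctly; it should look like with open('file.txt', 'r') as data:. You can't pass two filenames to a single open() call. Use two with statements (Python 2.7 and later also accept two open() calls in one with statement, separated by a comma).
Thanks. This does work, but it prints out the entire file. How can I make it show only the counter? Also, how can I get it to display something like a 20, b 10, c 14, etc.?
I've edited the code above; it should do that for you.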
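For the "a 20, b 10, c 14" layout asked about in the comments, one option is to print one "letter count" pair per line instead of the raw list of tuples. This is a sketch with a hypothetical helper, format_counts; passing n=None lists every letter, because most_common(None) returns all items.

```python
from collections import Counter


def format_counts(counts, n=None):
    """Render counts as 'letter count' lines, most common first (hypothetical helper)."""
    return "\n".join("{} {}".format(letter, count)
                     for letter, count in counts.most_common(n))


print(format_counts(Counter("abacab"), 2))  # prints "a 3" then "b 2" on separate lines
```

For alphabetical order instead, iterate over sorted(counts.items()) in place of counts.most_common(n).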