Python 来自txt文件程序的字数
我正在用以下代码计算txt文件的字数:Python 来自txt文件程序的字数,python,Python,我正在用以下代码计算txt文件的字数: #!/usr/bin/python file=open("D:\\zzzz\\names2.txt","r+") wordcount={} for word in file.read().split(): if word not in wordcount: wordcount[word] = 1 else: wordcount[word] += 1 print (word,wordcount) file.cl
#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
if word not in wordcount:
wordcount[word] = 1
else:
wordcount[word] += 1
print (word,wordcount)
file.close();
这给了我这样的输出:
>>>
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '': 1, 'tiger': 1, 'cat': 2, 'dog': 1}
但我希望以以下方式输出:
word wordcount
goat 2
cow 1
dog 1.....
我还在输出中得到一个额外的符号(
ï»
)。如何删除此项?您遇到的有趣符号是UTF-8。要消除它们,请使用正确的编码打开文件(我假设您使用的是Python 3):
此外,对于计数,您可以使用:
将它们显示为:
>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake 1
lion 2
goat 2
horse 3
如果您使用的是graphLab,则可以使用此函数。它真的很强大
products['word_count'] = graphlab.text_analytics.count_words(your_text)
#
您可以这样做:
file= open(r'D:\\zzzz\\names2.txt')
file_split=set(file.read().split())
print(len(file_split))
下面的代码是为我工作的
import re
frequency = {}
#Open the sample text file in read mode.
document_text = open('sample.txt', 'r')
#convert the string of the document in lowercase and assign it to text_string variable.
text = document_text.read().lower()
pattern = re.findall(r'\b[a-z]{2,15}\b', text)
for word in pattern:
count = frequency.get(word,0)
frequency[word] = count + 1
frequency_list = frequency.keys()
for words in frequency_list:
print(words, frequency[words])
输出:
看看。你的python版本是什么?你可以在@user3068762检查示例有效性:关于AttributeError:“dict”对象没有属性“key”:wordcount.keys()中的key的行应该是错误的:——注意
workcount.keys
中的key后面的“s”字符返回此文件,而不是第一个参数。我看不出这是你想要的。修改您的答案。请添加一些解释。但是如何对它们进行排序?如果您解释了您提供的代码如何回答问题,这将是一个更好的答案。请解释您的解决方案如何以及您的解决方案如何/为什么比现有的解决方案更好/不同。
#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
if word not in wordcount:
wordcount[word] = 1
else:
wordcount[word] += 1
for k,v in wordcount.items():
print k, v
products['word_count'] = graphlab.text_analytics.count_words(your_text)
FILE_NAME = 'file.txt'
wordCounter = {}
with open(FILE_NAME,'r') as fh:
for line in fh:
# Replacing punctuation characters. Making the string to lower.
# The split will spit the line into a list.
word_list = line.replace(',','').replace('\'','').replace('.','').lower().split()
for word in word_list:
# Adding the word into the wordCounter dictionary.
if word not in wordCounter:
wordCounter[word] = 1
else:
# if the word is already in the dictionary update its count.
wordCounter[word] = wordCounter[word] + 1
print('{:15}{:3}'.format('Word','Count'))
print('-' * 18)
# printing the words and its occurrence.
for (word,occurance) in wordCounter.items():
print('{:15}{:3}'.format(word,occurance))
Word Count
------------------
of 6
examples 2
used 2
development 2
modified 2
open-source 2
#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
if word not in wordcount:
wordcount[word] = 1
else:
wordcount[word] += 1
for k,v in wordcount.items():
print k,v
file.close();
file= open(r'D:\\zzzz\\names2.txt')
file_split=set(file.read().split())
print(len(file_split))
import re
frequency = {}
#Open the sample text file in read mode.
document_text = open('sample.txt', 'r')
#convert the string of the document in lowercase and assign it to text_string variable.
text = document_text.read().lower()
pattern = re.findall(r'\b[a-z]{2,15}\b', text)
for word in pattern:
count = frequency.get(word,0)
frequency[word] = count + 1
frequency_list = frequency.keys()
for words in frequency_list:
print(words, frequency[words])
print("sorted counting values:-")
from collections import Counter
fname = open(filename)
fname = fname.read()
fsplit = fname.split()
user = Counter(fsplit)
for i,v in sorted(user.items()):
print((v,i))