Python: how do I count the words in a text and add them to a dictionary?
Tags: python, python-3.x, dictionary, io
I'm trying to build a dictionary of word frequencies for a text, but for some reason extra characters get printed (I'm not sure whether the problem is my text or my code), and it fails to print the lines or words that contain invalid symbols! Here is my code:
def parse_documentation(filename):
    filename=open(filename, "r")
    lines = filename.read();
    invalidsymbols=["`","~","!", "@","#","$"]
    for line in lines:
        for x in invalidsymbols:
            if x in line:
                print(line)
                print(x)
                print(line.replace(x, ""))
        freq={}
        for word in line:
            count=counter(word)
            freq[word]=count
    return freq
You probably need line.split(' '); otherwise the for loop iterates over the individual letters of the line rather than over its words.
....
for word in line.split(' '):
    count=counter(word)
...
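To make the point above concrete, here is a minimal sketch of the difference between looping over a string and looping over the result of split(). The sample strings and variable names are made up for illustration; they are not from the question.

```python
line = "this is a text"

# Iterating over a string yields one character at a time.
chars = [ch for ch in line]

# split() with no argument splits on any run of whitespace
# and never produces empty strings.
words = line.split()

# split(' ') splits on every single space, so two consecutive
# spaces produce an empty string in the result.
words_single_space = "a  b".split(' ')

print(chars[:4])
print(words)
print(words_single_space)
```

If the text may contain tabs or repeated spaces, split() without an argument is usually the safer choice.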
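Your code has several flaws. I won't fix all of them, but I'll point you in the right direction.
First, read() reads the entire file as a single string. I don't think that is what you intended here. Use readlines() to get all the lines of the file as a list.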
def parse_documentation(filename):
    with open(filename, "r") as f:
        lines = f.readlines()  # returns a list of all lines in the file
    invalidsymbols = ["`", "~", "!", "@", "#", "$"]
    freq = {}  # declare this OUTSIDE of your loop.
    for line in lines:
        for letter in line:
            if letter in invalidsymbols:
                print(letter)
                # replace() returns a new string, so assign it back
                line = line.replace(letter, "")
        print(line)  # this should print the line without invalid symbols.
        words = line.split()  # now get the words.
        for word in words:
            # ... Do your counter stuff here ...
            pass
    return freq
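To see why read() makes the original loop walk over characters while readlines() gives it lines, here is a small sketch. It uses io.StringIO as a stand-in for an open file so that no file on disk is needed; the sample text is invented for illustration.

```python
import io

sample = "first line\nsecond line\n"

# read() returns the whole content as ONE string.
whole = io.StringIO(sample).read()

# readlines() returns a list with one string per line (newlines kept).
as_list = io.StringIO(sample).readlines()

# Looping over the string from read() visits single characters...
first_items_from_read = [item for item in whole][:5]

# ...while looping over the list from readlines() visits whole lines.
first_items_from_readlines = [item for item in as_list]

print(first_items_from_read)
print(first_items_from_readlines)
```

Iterating directly over the file object (for line in f) also yields lines and avoids holding the whole list in memory.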
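Second, I'm very doubtful about how your counter works. If you intend to count words, you can use the following strategy:
Check whether word is in freq.
If it is not in freq, add it and map it to 1. Otherwise, increment the number the word was previously mapped to.
This should put you on the right track.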
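The strategy above can be sketched roughly as follows. The function name count_words and the sample sentence are hypothetical, chosen only for illustration; this is one way to write it, not the only one.

```python
def count_words(words):
    """Count occurrences of each word using a plain dict (hypothetical helper)."""
    freq = {}
    for word in words:
        if word not in freq:
            # First time we see this word: map it to 1.
            freq[word] = 1
        else:
            # Seen before: increment the stored count.
            freq[word] += 1
    return freq


counts = count_words("this is a text text is that".split())
print(counts)
```

The if/else can be shortened to freq[word] = freq.get(word, 0) + 1, which does the same thing in one line.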
Check this, it may be what you want. By the way, the code you posted is not valid Python; there are a lot of problems in it.
from collections import Counter

def parse_documentation(filename):
    with open(filename, "r") as fin:
        lines = fin.read()
    # for sym in ["`","~","!","@","#","$"]: lines = lines.replace(sym, '')
    # Python 3 form; in Python 2 this was lines.translate(None, "`~!@#$")
    lines = lines.translate(str.maketrans("", "", "`~!@#$"))  # thanks to @gnibbler's comment
    freq = Counter(lines.split())
    return freq
Text file:
this is a text. text is that. @this #that
$this #!that is those
Result (the order of the keys may differ between Python versions):
Counter({'this': 3, 'is': 3, 'that': 2, 'a': 1, 'that.': 1, 'text': 1, 'text.': 1, 'those': 1})
Comments:
Could you double-check that the indentation in the code you posted is correct?
The code is not correct; for example, counter(word) is not defined.
In addition to the above, the code line.replace(letter, "") only returns a new copy without [symbol = the current value of letter]. To print the line without invalid symbols, it should be line = line.replace(letter, "").
Looks good now :)
@adil or better: line = string.translate(line, None, deletions=invalidsymbols), but invalidsymbols would have to be "`~!@..."
'filename' may confuse the file name with the file object.
@HennyH, normally you would say line = line.translate(None, deletions=invalidsymbols). I even just used line = line.translate(None, invalidsymbols).