Python: how do I count the words in a text and add them to a dictionary?

python python-3.x dictionary io

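I'm trying to build a dictionary of word frequencies from a text, but for some reason extra characters are printed (I'm not sure whether the problem is my text or my code), and it does not successfully print the lines or words that contain invalid symbols! Here is my code: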

 def parse_documentation(filename):
    filename=open(filename, "r") 
    lines = filename.read(); 
    invalidsymbols=["`","~","!", "@","#","$"]
    for line in lines: 
        for x in invalidsymbols:
            if x in line: 
                print(line) 
                print(x) 
                print(line.replace(x, "")) 
                freq={}
            for word in line:
                count=counter(word)
        freq[word]=count
    return freq

You probably need line.split(' '). Otherwise the for loop iterates over the individual letters of the line rather than over its words.

....
for word in line.split(' '):
    count=counter(word)
...
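
To illustrate the point above, here is a minimal sketch (the sample strings are made up for illustration) showing that looping over a string visits characters, while splitting it first yields words. It also shows one difference that may matter here: split(' ') splits on single spaces only, whereas split() with no argument splits on any run of whitespace and so also drops the trailing newline.

```python
line = "this is a text"

# Iterating over a string yields one character at a time.
chars = [ch for ch in line]

# Splitting first yields whole words.
words = line.split()

print(chars[:4])   # ['t', 'h', 'i', 's']
print(words)       # ['this', 'is', 'a', 'text']

# split(' ') keeps empty strings between repeated spaces and leaves the
# newline attached to the last word; split() does neither.
raw = "a  b\n"
by_space = raw.split(' ')   # ['a', '', 'b\n']
by_any = raw.split()        # ['a', 'b']
print(by_space, by_any)
```

For lines read from a file, split() without an argument is usually the more convenient choice, since every line except possibly the last ends in a newline.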

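Your code has several flaws. I won't address all of them, but I will point you in the right direction.

First, read() reads the entire file as a single string, so "for line in lines" actually loops over individual characters. I don't think that is what you intended here. Use readlines() to get all the lines in the file as a list.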

def parse_documentation(filename):
    invalidsymbols = ["`", "~", "!", "@", "#", "$"]
    freq = {}  # declare this OUTSIDE of your loop.
    with open(filename, "r") as f:
        lines = f.readlines()  # returns a list of all lines in file
    for line in lines:
        for letter in line:
            if letter in invalidsymbols:
                print(letter)
                # replace() returns a new string, so assign the result back
                line = line.replace(letter, "")
        print(line)  # this should print the line without invalid symbols.

        words = line.split()  # Now get the words.

        for word in words:  # loop over the words, not over the characters of line
            # ... Do your counter stuff here ...
            pass

    return freq
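
As a small illustration of the difference between read() and readlines(), the sketch below writes a two-line temporary file (the file content is made up for this example) and reads it back both ways. It assumes nothing beyond the standard library.

```python
import os
import tempfile

# Create a throwaway two-line file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

with open(path, "r") as f:
    whole = f.read()         # one string holding the entire file

with open(path, "r") as f:
    lines = f.readlines()    # list of lines, newline characters kept

with open(path, "r") as f:
    streamed = [line for line in f]  # iterating the file object also yields lines

os.remove(path)

print(repr(whole))   # 'first line\nsecond line\n'
print(lines)         # ['first line\n', 'second line\n']
```

Looping over whole would visit 'f', 'i', 'r', ... one character at a time, which is consistent with the extra characters described in the question. Iterating over the file object directly, as in the third block, avoids loading every line into a list first.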

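Second, I'm very skeptical about how your counter works. If you intend to count words, you could use the following strategy:
Check whether word is in freq.
If it is not in freq, add it and map it to 1. Otherwise, increment the number that the word was previously mapped to.
This should get you on the right track.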
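
The strategy described above could be sketched roughly as follows. The helper name count_words and the sample sentence are hypothetical and only serve the illustration; they are not part of the original code.

```python
def count_words(words):
    """Count how often each word occurs, using a plain dict."""
    freq = {}
    for word in words:
        if word not in freq:
            freq[word] = 1      # first time we see this word
        else:
            freq[word] += 1     # seen before: increment its count
    return freq


sample = "this is a text text is that".split()
freq = count_words(sample)
print(freq)  # {'this': 1, 'is': 2, 'a': 1, 'text': 2, 'that': 1}
```

The if/else can also be collapsed into a single line, freq[word] = freq.get(word, 0) + 1, which behaves the same way.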
Check this out; it may be what you want. By the way, the code you posted is not valid Python. There are quite a few problems in it.
from collections import Counter

def parse_documentation(filename):
    with open(filename, "r") as fin:
        lines = fin.read()
    # for sym in ["`", "~", "!", "@", "#", "$"]: lines = lines.replace(sym, '')
    # Python 3 form; in Python 2 this was lines.translate(None, "`~!@#$")
    lines = lines.translate(str.maketrans("", "", "`~!@#$"))    # thanks to @gnibbler's comment
    freq = Counter(lines.split())
    return freq

Text file:
this is a text. text is that. @this #that
$this #!that is those

Result:
Counter({'this': 3, 'is': 3, 'that': 2, 'a': 1, 'that.': 1, 'text': 1, 'text.': 1, 'those': 1})

Could you double-check that the indentation in the code you posted is correct?
The code is not correct as posted; for example, counter(word) is not defined.
In addition to the above: line.replace(letter, "") only returns a new copy without the symbol (the current value of letter); it does not modify line. To print the line without the invalid symbols, it should be line = line.replace(letter, "").
Looks fine now :)
@adil or better: line = string.translate(line, None, deletions=invalidsymbols), but invalidsymbols has to be a string such as "~!@…"
Reusing the name filename for the file object may cause confusion.
@HennyH, normally you would write line = line.translate(None, deletions=invalidsymbols). I would even just use line = line.translate(None, invalidsymbols).
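
The translate calls in the comments above use the Python 2 signature. Since the question is tagged python-3.x, here is a rough Python 3 sketch of the same two points: replace() returns a new string rather than changing the original, and a deletion table built with str.maketrans removes all the invalid symbols in one pass. The sample strings are made up for illustration.

```python
invalidsymbols = "`~!@#$"   # must be a single string of characters to delete

# Strings are immutable: replace() returns a new string and leaves the
# original untouched unless the result is assigned back.
original = "#!that"
original.replace("#", "")            # result discarded, original unchanged
stripped = original.replace("#", "")  # result kept

# Python 3: the third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans("", "", invalidsymbols)
line = "@this #that $this #!that is those"
cleaned = line.translate(delete_table)

print(original)  # #!that
print(stripped)  # !that
print(cleaned)   # this that this that is those
```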