Python 3.x: converting a file to a dict
my_file = "The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again. "
Expected output:
{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'], 'itsy': ['bitsy', 'bitsy'], 'bitsy': ['spider', 'spider'], 'spider': ['went', 'out', 'went'], 'went': ['up', 'up'], 'up': ['the', 'all', 'the'], 'water': ['spout'], 'spout': ['down', 'again'], 'down': ['came'], 'came': ['the', 'the'], 'rain': ['washed', 'and'], 'washed': ['the'], 'out': ['out', 'came'], 'sun': ['dried'], 'dried': ['up'], 'all': ['the'], 'and': ['the'], 'again': []}
My code:
import string
words_set = {}
for line in my_file:
    lower_text = line.lower()
    for word in lower_text.split():
        word = word.strip(string.punctuation + string.digits)
        if word:
            if word in words_set:
                words_set[word] = words_set[word] + 1
            else:
                words_set[word] = 1
You can reproduce the expected result with a few concepts:
Given
import string
import itertools as it
import collections as ct
data = """\
The Itsy Bitsy Spider went up the water spout.
Down came the rain & washed the spider out.
Out came the sun & dried up all the rain,
And the Itsy Bitsy Spider went up the spout again.
"""
Code
def clean_string(s: str) -> str:
    """Return a lowercased string without punctuation."""
    table = str.maketrans("", "", string.punctuation)
    # Strip punctuation, then collapse the double spaces left behind
    # (e.g. where "&" was removed) and turn newlines into spaces.
    return s.lower().translate(table).replace("  ", " ").replace("\n", " ")

def get_neighbors(words: list) -> dict:
    """Return a dict of right-hand, neighboring words."""
    dd = ct.defaultdict(list)
    # Pair each word with its successor; the last word is paired with "".
    for word, nxt in it.zip_longest(words, words[1:], fillvalue=""):
        dd[word].append(nxt)
    return dict(dd)
Demo
words = clean_string(data).split()
get_neighbors(words)
Results
{'the': ['itsy', 'water', 'rain', 'spider', 'sun', 'rain', 'itsy', 'spout'],
 'itsy': ['bitsy', 'bitsy'],
 'bitsy': ['spider', 'spider'],
 'spider': ['went', 'out', 'went'],
 'went': ['up', 'up'],
 'up': ['the', 'all', 'the'],
 'water': ['spout'],
 'spout': ['down', 'again'],
 'down': ['came'],
 'came': ['the', 'the'],
 'rain': ['washed', 'and'],
 'washed': ['the'],
 'out': ['out', 'came'],
 'sun': ['dried'],
 'dried': ['up'],
 'all': ['the'],
 'and': ['the'],
 'again': ['']}
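Note that the result in this answer ends with 'again': [''], while the expected output in the question ends with 'again': []. If an exact match matters, one possible variant is sketched below. The name get_neighbors_exact and the short sample text are assumptions made for illustration, not part of the original answer.

```python
import string


def get_neighbors_exact(words):
    """Map each word to the list of words that directly follow it."""
    neighbors = {}
    # zip stops at the shorter sequence, so no filler value is produced.
    for word, nxt in zip(words, words[1:]):
        neighbors.setdefault(word, []).append(nxt)
    # The final word has no successor; make sure it still gets a key.
    if words:
        neighbors.setdefault(words[-1], [])
    return neighbors


# Hypothetical, shortened sample text.
text = "The spider went up. The spout again."
table = str.maketrans("", "", string.punctuation)
words = text.lower().translate(table).split()
print(get_neighbors_exact(words))
```

Using a plain dict with setdefault instead of defaultdict also means the result needs no final dict(...) conversion.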
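Details
clean_string
- You can remove punctuation in any number of ways. Here a translation table strips most punctuation; other characters (here, stray whitespace such as newlines) are handled directly with str.replace().
get_neighbors
- A defaultdict makes a dict of lists: if a key is missing, a new list value is created for it.
- We build the dict by iterating over two juxtaposed word lists, one a step ahead of the other.
- The lists are zipped with zip_longest, which pads the shorter list with an empty string.
- dict(dd) ensures a plain dict is returned.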
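To make the pairing step concrete, here is a small sketch of what zip_longest produces for a three-word list and how defaultdict collects the pairs. The sample list is an assumption chosen for brevity.

```python
import collections as ct
import itertools as it

# Hypothetical three-word sample.
words = ["itsy", "bitsy", "spider"]

# words[1:] is one element shorter, so the last pair is padded with "".
pairs = list(it.zip_longest(words, words[1:], fillvalue=""))
print(pairs)  # [('itsy', 'bitsy'), ('bitsy', 'spider'), ('spider', '')]

dd = ct.defaultdict(list)
for word, nxt in pairs:
    # A missing key gets a fresh empty list before the append.
    dd[word].append(nxt)

print(dict(dd))  # {'itsy': ['bitsy'], 'bitsy': ['spider'], 'spider': ['']}
```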
If you only want to count words:
Demo
ct.Counter(words)
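Since the question is about a file rather than a string, the counting step could be wrapped roughly as follows. This is a sketch under assumptions: count_words is a hypothetical helper, and the temporary file spider.txt only stands in for the real input file.

```python
import collections as ct
import string
import tempfile
from pathlib import Path


def count_words(path):
    """Count lowercased, punctuation-free words in a text file."""
    table = str.maketrans("", "", string.punctuation)
    text = Path(path).read_text(encoding="utf-8")
    # split() with no arguments handles newlines and repeated spaces.
    return ct.Counter(text.lower().translate(table).split())


# Hypothetical input: a throwaway file standing in for the real one.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "spider.txt"
    sample.write_text("Down came the rain & washed the spider out.\n", encoding="utf-8")
    counts = count_words(sample)

print(counts.most_common(1))  # [('the', 2)]
```

Counter is a dict subclass, so the result can be used wherever a plain word-to-count dict is expected.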
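I can count the repeated words with the code above. What is your question? Your code isn't formatted. Are there any logical requirements for your output?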