Python: How do I convert this text file into a dictionary?


python dictionary


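I have a file f that looks like this:

#labelA
there
is
something
here
#label_Bbb
here
aswell
...

There can be a number of labels and any number of elements (str only) on a line, and each label can span several lines. I would like to store this data in a dictionary such as: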

d = {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell', ...}

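I have several sub-questions:

How can I use the # character to know when a new entry begins?

How do I remove it and keep whatever follows up to the end of the line?

How can I append every string on the following lines until another # shows up?

How do I stop when the file is finished?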

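First, mydict holds the labels from the lines that start with # as keys, and each value is a list (a list keeps the lines in the order they were appended). We append lines to this list until we reach the next line that starts with #. After that, we only need to join each list of lines into a single string.

I am using Python 3; if you use Python 2, replace mydict.items() with mydict.iteritems() to iterate over the key-value pairs.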

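mydict = dict()
key = None  # no label seen yet
with open("sample.csv") as inputs:
    for line in inputs:
        if line.startswith("#"):
            # Label line: strip whitespace, then drop the leading '#'.
            key = line.strip()[1:]
            mydict.setdefault(key, list())
        elif key is not None:
            # Content line: append it to the list of the current label.
            mydict[key].append(line.strip())

# Join each list of lines into a single string.
result = dict()
for key, vlist in mydict.items():
    result[key] = "".join(vlist)

print(result)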

Output:

{'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
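
To try this approach without creating a file on disk, here is a minimal sketch that feeds the same logic from an in-memory buffer. The sample text and the helper name parse_labels are assumptions made for illustration; they are not part of the original answer.

```python
import io

# Hypothetical sample data, reconstructed from the expected output in the question.
SAMPLE = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"


def parse_labels(lines):
    """Collect the lines under each '#label' header and join them into one string."""
    groups = {}
    key = None  # no label seen yet
    for line in lines:
        if line.startswith("#"):
            # Label line: drop the '#' and surrounding whitespace.
            key = line.strip()[1:]
            groups.setdefault(key, [])
        elif key is not None:
            # Content line: keep it under the most recent label.
            groups[key].append(line.strip())
    return {k: "".join(v) for k, v in groups.items()}


# io.StringIO yields lines the same way an open text file does.
result = parse_labels(io.StringIO(SAMPLE))
print(result)  # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Because the function only needs an iterable of lines, the same call should work with an open file object, for example parse_labels(open_file) inside a with block.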

Using collections.defaultdict:

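from collections import defaultdict

d = defaultdict(list)
key = None  # no label seen yet

with open('f.txt') as file:
    for line in file:
        if line.startswith('#'):
            # Label line: drop the leading '#' and the trailing newline.
            key = line.lstrip('#').rstrip('\n')
        elif key is not None:
            # Content line: defaultdict creates the list on first access.
            d[key].append(line.rstrip('\n'))
# Join each list of lines into a single string.
for key in d:
    d[key] = ''.join(d[key])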

The shortest solution, using the re.findall function:

import re 

with open("lines.txt", 'r') as fh:
    d = {k:v.replace('\n', '') for k,v in re.findall(r'^#(\w+)\s([^#]+)', fh.read(), re.M)}

print(d)

Output:

{'label_Bbb': 'hereaswell', 'labelA': 'thereissomethinghere'}
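
To make the intermediate step visible, the sketch below runs the same pattern on an in-memory string and prints what re.findall returns before the dict comprehension removes the newlines. The sample text is an assumption reconstructed from the question.

```python
import re

# Hypothetical sample data, reconstructed from the question.
text = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"

# With re.M, '^' matches at the start of every line, so each '#label' line
# starts a new match. Group 1 is the label, group 2 is everything up to the
# next '#' (newlines included).
pairs = re.findall(r'^#(\w+)\s([^#]+)', text, re.M)
print(pairs)
# [('labelA', 'there\nis\nsomething\nhere\n'), ('label_Bbb', 'here\naswell\n')]

# Remove the newlines from each value to get the final dictionary.
d = {k: v.replace('\n', '') for k, v in pairs}
print(d)  # {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

One caveat of this pattern: because the second group is [^#]+, a content line that itself contains a # would end the match early, so the approach fits data where # only appears at the start of label lines.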

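re.findall returns a list of tuples, each containing two items that correspond to the two consecutive capture groups.

As a single pass, without building a temporary dictionary: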

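d = {}
last_key = None
last_element = ''
with open('untitled.txt', 'r') as f:
    for line in f:
        if line.startswith('#'):
            # A new label: store the previous entry before starting the next one.
            if last_key is not None:
                d[last_key] = last_element
            # Drop the leading '#' and the trailing newline from the label.
            last_key = line[1:].rstrip('\n')
            last_element = ''
        else:
            last_element += line.rstrip('\n')

# Store the final entry, which has no following label to trigger the save.
if last_key is not None:
    d[last_key] = last_element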

res = {}
with open("sample") as lines:
    line, entry = None, ""
    try:
        line = next(lines)  # assumes the first line is a '#label' line
        while True:
            entry = ""
            nxt = next(lines)  # named nxt to avoid shadowing the built-in next()
            while not nxt.startswith("#"):
                entry += nxt.strip()
                nxt = next(lines)
            res[line.strip()[1:]] = entry
            line = nxt
    except StopIteration:
        if line is not None:
            res[line.strip()[1:]] = entry  # Catch the last entry

I would do it like this (this is pseudocode, so it will not compile!)

My approach would be:

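def eachChunk(stream):
  key = None
  value = ''
  for line in stream:
    line = line.rstrip('\n')
    if line.startswith('#'):
      # A new label begins: emit the entry collected so far, if any.
      if key is not None:
        yield key, value
      key = line[1:]
      value = ''
    else:
      value += line
  # Emit the last entry, unless the stream contained no label at all.
  if key is not None:
    yield key, value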

You can then quickly build the desired dictionary like this:

with open('f') as data:
  d = dict(eachChunk(data))

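Loop over each line in the file and check whether it starts with the # character. Use str[1:] to get the rest of the string and add it to the dictionary as a key. Then append every following line to that key until you find another #.

Can a label appear more than once? I mean, interrupted by other labels? What should the result be in that case?

Wow! That was fast, and right on point. Let me first understand what you did here and then I will respond :D I did not know about the .startswith function. Thank you very much.

key = line.strip()[1:] gets the string without the #.

@sparkandshine Thanks for the reminder, edited it!

Thank you all! I discovered some new functions that may help me in the future. So far the most intuitive answers to me, though perhaps not the fastest, are the ones that use only plain string operations rather than the ones that use specialized functions.

Great answer! Really clean and tidy. +1

I like regular expressions, I really do. I have done dirty things with them that would make you blush. But I would never call r'^#(\w+)\s([^#]+)' neat and clean.

Better to use with again so the file is closed automatically. Your solution leaves an open file.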
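
On the question raised in the comments about a label that appears more than once: the answers above differ on this point. The setdefault and defaultdict versions keep appending to the same list, while the versions that assign the finished string to the key (the single-pass loops, dict(eachChunk(...)) and the regex comprehension) keep only the last occurrence. The sketch below illustrates both behaviours; the input lines and the function names parse_appending and parse_overwriting are hypothetical and chosen only for this comparison.

```python
from collections import defaultdict

# Hypothetical input in which labelA appears twice, interrupted by label_Bbb.
lines = ["#labelA\n", "there\n", "#label_Bbb\n", "here\n", "#labelA\n", "again\n"]


def parse_appending(lines):
    """Repeated labels accumulate: later content is appended to earlier content."""
    groups = defaultdict(list)
    key = None
    for line in lines:
        if line.startswith("#"):
            key = line[1:].strip()
        elif key is not None:
            groups[key].append(line.strip())
    return {k: "".join(v) for k, v in groups.items()}


def parse_overwriting(lines):
    """Repeated labels restart: only the last occurrence of a label survives."""
    result = {}
    key = None
    for line in lines:
        if line.startswith("#"):
            key = line[1:].strip()
            result[key] = ""  # reset whatever was stored for this label
        elif key is not None:
            result[key] += line.strip()
    return result


appended = parse_appending(lines)
overwritten = parse_overwriting(lines)
print(appended)     # {'labelA': 'thereagain', 'label_Bbb': 'here'}
print(overwritten)  # {'labelA': 'again', 'label_Bbb': 'here'}
```

Which behaviour is right depends on what a repeated label is supposed to mean in the data, which the question does not specify.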