How do I create a Python dictionary from unstructured text?

python python-2.7 dictionary

I have a set of broken-link checker results stored in text files:

Getting links from: https://www.foo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
├───OK─── http://www.set.com/
├───OK─── http://www.one.com/
5 links found. 0 excluded. 1 broken.

Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.

Getting links from: https://www.boo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
2 links found. 0 excluded. 0 broken.

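I am trying to write a script that reads the file and builds a dictionary, with each root link as a key and a list of its child lines (including the summary line) as the value.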

The output I am trying to achieve looks like this:

{"Getting links from: https://www.foo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "├───OK─── http://www.set.com/", "├───OK─── http://www.one.com/", "5 links found. 0 excluded. 1 broken."], 
"Getting links from: https://www.bar.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "├─BROKEN─ http://www.broken.com/", "3 links found. 0 excluded. 1 broken."],
"Getting links from: https://www.boo.com/": ["├───OK─── http://www.this.com/", "├───OK─── http://www.is.com/", "2 links found. 0 excluded. 0 broken."] }

Here is what I have so far:

result_list = []

with open('link_checker_result.txt', 'r') as f:
    temp_list = f.readlines()
    for line in temp_list:
        result_list.append(line)

This gives me the output:

['Getting links from: https://www.foo.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '├─BROKEN─ http://www.broken.com/', '├───OK─── http://www.set.com/', '├───OK─── http://www.one.com/', '5 links found. 0 excluded. 1 broken.', 'Getting links from: https://www.bar.com/', '├───OK─── http://www.this.com/', '├───OK─── http://www.is.com/', '...'  ]

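I recognize that these sets share some common features: for example, there is a blank line between them, and each one starts with "Getting ...". Is that what I should try to split on before building the dictionary?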

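I am new to Python, so I admit I am not even sure I am heading in the right direction. I would really appreciate some expert eyes on this. Thanks in advance!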

This should produce the result you want:

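result = {}

with open('link_checker_result.txt', 'r') as f:
    temp_list = f.readlines()
    key = ''
    value = []
    for line in temp_list:
        # readlines() keeps the trailing newline, so strip it before testing for a blank line.
        line = line.strip()
        if not line:
            # A blank line closes the current block.
            if key:
                result[key] = value
            key = ''
            value = []
        elif not key:
            # The first non-blank line of a block is the key.
            key = line
        else:
            value.append(line)

    # Store the last block if the file does not end with a blank line.
    if key:
        result[key] = value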
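The question also mentions that every block starts with "Getting ...". As an alternative to relying on blank lines, here is a minimal sketch that groups lines under the most recent header line instead. The function name `group_by_header` and the shortened sample text are assumptions made for illustration only; they are not part of the original answer.

```python
# -*- coding: utf-8 -*-

def group_by_header(lines, prefix='Getting links from:'):
    """Group each line under the most recent line that starts with `prefix`."""
    result = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator lines carry no information here
        if line.startswith(prefix):
            # Start a new block keyed by the full header line.
            current = result.setdefault(line, [])
        elif current is not None:
            current.append(line)
    return result


# Shortened sample in the same format as the question (illustrative only).
sample = u"""Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.

Getting links from: https://www.boo.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
2 links found. 0 excluded. 0 broken.
"""

grouped = group_by_header(sample.splitlines())
print(len(grouped))
```

Because the function accepts any iterable of lines, an open file object could be passed to it directly in place of `sample.splitlines()`. This variant does not depend on the blocks being separated by exactly one blank line.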

This can actually be done quite concisely, in about 4 lines of code:

finalDict = {}
with open('link_checker_result.txt', 'r') as f:
    lines = list(map(lambda line: line.split('\n'), f.read().strip().split('\n\n')))
    finalDict = dict((elem[0],elem[1:]) for elem in lines)
print(finalDict)

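Output (as posted; the box-drawing characters show up as ASCII here, and the summary lines are missing, although the code above keeps them):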

{'Getting links from: https://www.foo.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/', '+-BROKEN- http://www.broken.com/', '+---OK--- http://www.set.com/', '+---OK--- http://www.one.com/'], 'Getting links from: https://www.bar.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/', '+-BROKEN- http://www.broken.com/'], 'Getting links from: https://www.boo.com/': ['+---OK--- http://www.this.com/', '+---OK--- http://www.is.com/']}

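What the code above does is read the input file and split it on two consecutive newlines, which yields one block per URL together with its links.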

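Finally, it builds a tuple from the first element and the remaining elements of each list, and converts those tuples into the key-value pairs of the finalDict dictionary.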
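To make the steps described above concrete, here is a small trace on a toy string rather than the real file. The input text and the variable names are assumptions chosen for illustration.

```python
# Toy input (hypothetical): two blocks separated by a blank line.
text = "root A\nchild 1\nchild 2\n\nroot B\nchild 3\n"

# Step 1: split on two consecutive newlines to get one string per block.
# strip() removes the trailing newline so the last block has no empty line.
blocks = text.strip().split('\n\n')

# Step 2: split each block into its individual lines.
rows = [block.split('\n') for block in blocks]

# Step 3: build (first line, remaining lines) tuples and turn them into a dict.
pairs = [(row[0], row[1:]) for row in rows]
final_dict = dict(pairs)

print(blocks)
print(pairs)
```

Each tuple in `pairs` holds the header line and the list of lines that followed it, and `dict(pairs)` turns those tuples into key-value entries.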

Here is an easier-to-understand version of the same approach:

finalDict = {}
with open('link_checker_result.txt', 'r') as f:
    # Read the whole file and split on blank lines so each URL and its links form one block.
    data = f.read().strip().split('\n\n')
    # Split each block into its individual lines (header, link lines, summary).
    links = [block.split('\n') for block in data]
    # The first line of each block becomes the key; the remaining lines become the value.
    finalDict = dict((elem[0], elem[1:]) for elem in links)
print(finalDict)

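I don't see how this is unstructured. There is a header, there are lines that start with ├─, and there is a summary that does not start with ├─. How much more structured could it be?

Are you able to modify the code that generates the text file? Loading the data into a dictionary in that code, rather than from its output artifact, would be much simpler!

Thank you, comrade!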
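The regular structure pointed out in the comments (a header, lines starting with ├─, and a summary) could also be parsed into typed fields instead of raw strings. The following is only a sketch under assumptions: the name `parse_report`, the output layout, and the summary regular expression are invented for illustration and are based solely on the sample lines in the question. It is written with Python 3 string semantics in mind; under Python 2.7 the file contents would need to be decoded to unicode first.

```python
# -*- coding: utf-8 -*-
import re

HEADER_PREFIX = 'Getting links from: '
# Assumed summary format, taken from the sample lines in the question.
SUMMARY_RE = re.compile(r'^(\d+) links? found\. (\d+) excluded\. (\d+) broken\.$')


def parse_report(text):
    """Return one dict per checked page: its URL, its links, and its summary counts."""
    pages = []
    page = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            page = {'url': line[len(HEADER_PREFIX):], 'links': [], 'summary': None}
            pages.append(page)
            continue
        if page is None:
            continue  # ignore anything that appears before the first header
        summary = SUMMARY_RE.match(line)
        if summary:
            found, excluded, broken = (int(n) for n in summary.groups())
            page['summary'] = {'found': found, 'excluded': excluded, 'broken': broken}
        elif line.startswith(u'├'):
            # "├─BROKEN─ http://..." splits into the marker and the URL.
            marker, _, url = line.partition(' ')
            page['links'].append({'status': marker.strip(u'├─'), 'url': url})
    return pages


sample = u"""Getting links from: https://www.bar.com/
├───OK─── http://www.this.com/
├───OK─── http://www.is.com/
├─BROKEN─ http://www.broken.com/
3 links found. 0 excluded. 1 broken.
"""

pages = parse_report(sample)
broken = [link['url'] for p in pages for link in p['links'] if link['status'] == 'BROKEN']
print(broken)
```

With the data in this shape, questions such as "which links are broken on which page" become simple list comprehensions rather than string matching on the raw lines.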