Python: How can I convert a list of dictionaries into another format more elegantly?

python list dictionary for-loop

I have a JSON file that contains some information about words. The structure is a list of dicts, like this:

file = [{"index": "1", "text": "uhm", "eos": false}, {"index": "2", "text": "moeten", "eos": false}, {"index": "3", "text": "langs", "eos": false}, {"index": "4", "text": "uhm", "eos": true}, {"index": "1", "text": "uh", "eos": false}, {"index": "2", "text": "om", "eos": false}, {"index": "3", "text": "die", "eos": false}, {"index": "4", "text": "afsluiters", "eos": true}]

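For further analysis I need to preprocess the data, so I wrote the following function. It works, but it does not look very elegant. How can I improve it to make it more readable, less redundant, and nicer to look at? =)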

Calling prepare(file) returns a list like this:

[{'sentence' : 'uhm moeten langs uhm', 'chunk' : [{'1' : 'uhm'}, {'2' : 'uhm moeten'}, {'3' : 'uhm moeten langs'}, {'4' : 'uhm moeten langs uhm'}]}]

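I am assuming that every sentence has 4 chunks. If that is not the case, I am sure you can easily adapt my code; for now it is hard-coded for 4 items, but that can be changed. I decided to output the information as a list. To me, a flat list is easier to work with than a dictionary, so this is how I built it. It will be a list filled with items like the following: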

sentence,uhm moeten langs uhm : sentence is made up of the following chunks : 1,uhm : 2,uhm moeten : 3,uhm moeten langs : 4,uhm moeten langs uhm

The next item in the list is:

sentence,uh om die afsluiters : sentence is made up of the following chunks : 1,uh : 2,uh om : 3,uh om die : 4,uh om die afsluiters

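I did it this way because the string is easy to split, so you can easily get at each item you want. For example, you can split on " : " and then loop over the pieces and split each one on "," to get the individual values.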
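As a rough illustration of that splitting step, here is a minimal sketch. The helper name parse_item is hypothetical and not part of the answer's code; it assumes the item format shown above and that no word contains the separator " : ".

```python
# One item in the flat string format shown above.
item = ("sentence,uhm moeten langs uhm : sentence is made up of the following chunks"
        " : 1,uhm : 2,uhm moeten : 3,uhm moeten langs : 4,uhm moeten langs uhm")


def parse_item(item):
    """Split one formatted string back into (sentence, {index: chunk}).

    Hypothetical helper; assumes the " : " and "," separators used above.
    """
    parts = item.split(" : ")
    # parts[0] is "sentence,<text>", parts[1] is the fixed explanatory label,
    # and everything after that is an "<index>,<chunk>" pair.
    sentence = parts[0].split(",", 1)[1]
    chunks = dict(part.split(",", 1) for part in parts[2:])
    return sentence, chunks


sentence, chunks = parse_item(item)
print(sentence)      # uhm moeten langs uhm
print(chunks["3"])   # uhm moeten langs
```

Splitting with a maximum of one split on "," keeps any commas inside the chunk text intact.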

With that, your code ends up looking like this for me:

def prepare(file):

    # set up variables
    text = []
    sent_dict = {}
    sentence = ""
    chunks = []
    ngram = ""
    maxn = 5

    for word in file:

        if word["eos"] == False:
            # concatenate words
            sentence += word["text"] + " "


            chunk = " ".join(sentence.split(" ")[:-1][-maxn:])
            index = word["index"]
            chunks.append({index: {"ngram" : chunk}})

        else:

            sentence += word["text"]

            chunk = " ".join(sentence.split(" ")[-maxn:])
            index = word["index"]
            chunks.append({index: {"ngram" : chunk}})

            sent_dict["sentence"] = sentence
            sent_dict["chunks"] = chunks
            text.append(sent_dict)

            sent_dict = {}
            sentence = ""
            chunks = []

    return(text)



file = [{"index": "1", "text": "uhm", "eos": False}, {"index": "2", "text": "moeten", "eos": False}, {"index": "3", "text": "langs", "eos": False}, {"index": "4", "text": "uhm", "eos": True}, {"index": "1", "text": "uh", "eos": False}, {"index": "2", "text": "om", "eos": False}, {"index": "3", "text": "die", "eos": False}, {"index": "4", "text": "afsluiters", "eos": True}]



final_list = []
x = (prepare(file))
for i in x:
    new_string = "sentence,{} : sentence is made up of the following chunks : 1,{} : 2,{} : 3,{} : 4,{}".format(i["sentence"], i["chunks"][0]["1"]["ngram"], i["chunks"][1]["2"]["ngram"], i["chunks"][2]["3"]["ngram"], i["chunks"][3]["4"]["ngram"])
    final_list.append(new_string)

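Remember that the list of items formatted my way is called final_list. If you loop over it and print each item, you will see what I showed you above. I hope this is easier to work with.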
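The hard-coded 4 in the formatting loop could be avoided by building the string from however many chunks an entry has. The following is only a sketch under that assumption: format_sentence is a hypothetical helper name, and the sample entry is hand-written in the shape that prepare returns.

```python
def format_sentence(entry):
    """Build the flat string for one entry, for any number of chunks.

    Hypothetical helper; expects the shape returned by prepare():
    {"sentence": str, "chunks": [{index: {"ngram": str}}, ...]}
    """
    parts = [
        "sentence," + entry["sentence"],
        "sentence is made up of the following chunks",
    ]
    for chunk in entry["chunks"]:
        # Each chunk is a one-key dict such as {"1": {"ngram": "uh"}}.
        for index, value in chunk.items():
            parts.append("{},{}".format(index, value["ngram"]))
    return " : ".join(parts)


# Hand-written entry with only three chunks instead of four.
entry = {
    "sentence": "uh om die",
    "chunks": [
        {"1": {"ngram": "uh"}},
        {"2": {"ngram": "uh om"}},
        {"3": {"ngram": "uh om die"}},
    ],
}
print(format_sentence(entry))
```

With this, final_list could be built as [format_sentence(i) for i in prepare(file)] without assuming a fixed sentence length.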

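Please show us an example of the output of the function you wrote. Also show us an example of the output you want.

If you plug the list at the top of the question into the function, it returns exactly the output I want. It is a working example.

Yes, but please post an example anyway. Many people can come up with a solution without looking at the code.

Remember that the "sentence is made up of the following chunks" part of the output can also be removed easily after splitting the original string, because it always sits at the same position in the split list (right after the sentence itself), so you can pop it whenever you need to. I hope the code helps.

Thanks for your answer. The idea is nice and seems useful for some tasks. However, I need the specified output: I am not looking to improve the output format, but the function that returns that output. Also, a sentence can consist of more than 4 chunks, and the number of chunks varies from sentence to sentence.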
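For the original request (same output, less redundancy, any number of chunks per sentence), one possible sketch keeps the words of the current sentence in a list so that both branches of the if/else collapse into one. This is an assumption-laden rewrite, not the asker's or the answerer's code: the parameter name words and the maxn default are illustrative, and the output shape is taken from the prepare function shown in the answer.

```python
def prepare(words, maxn=5):
    """Group words into sentences and record the running n-gram per word.

    Sketch only: returns a list of
    {"sentence": str, "chunks": [{index: {"ngram": str}}, ...]}.
    Words after the last end-of-sentence marker are dropped.
    """
    text = []
    tokens = []  # words of the sentence currently being built
    chunks = []
    for word in words:
        tokens.append(word["text"])
        # Running n-gram: at most the last maxn words seen so far.
        chunks.append({word["index"]: {"ngram": " ".join(tokens[-maxn:])}})
        if word["eos"]:
            text.append({"sentence": " ".join(tokens), "chunks": chunks})
            tokens, chunks = [], []
    return text


# Sample data: one four-word sentence followed by a two-word sentence.
words = [
    {"index": "1", "text": "uhm", "eos": False},
    {"index": "2", "text": "moeten", "eos": False},
    {"index": "3", "text": "langs", "eos": False},
    {"index": "4", "text": "uhm", "eos": True},
    {"index": "1", "text": "uh", "eos": False},
    {"index": "2", "text": "om", "eos": True},
]
result = prepare(words)
for entry in result:
    print(entry["sentence"], len(entry["chunks"]))
```

Because the chunk is built from a list slice rather than by splitting the growing string, no trailing-space handling is needed and sentences of different lengths are treated the same way.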