Python KeyError: "None of ['index'] are in the columns"


Here is a JSON file:

{
    "id": "68af48116a252820a1e103727003d1087cb21a32",
    "article": [
        "by mark duell .",
        "published : .",
        "05:58 est , 10 september 2012 .",
        "| .",
        "updated : .",
        "07:38 est , 10 september 2012 .",
        "a pet owner starved her two dogs so badly that one was forced to eat part of his mother 's dead body in a desperate attempt to survive .",
        "the mother died a ` horrendous ' death and both were in a terrible state when found after two weeks of starvation earlier this year at the home of katrina plumridge , 31 , in grimsby , lincolnshire .",
        "the barely-alive dog was ` shockingly thin ' and the house had a ` nauseating and overpowering ' stench , grimsby magistrates court heard .",
        "warning : graphic content .",
        "horrendous : the male dog , scrappy -lrb- right -rrb- , was so badly emaciated that he ate the body of his mother ronnie -lrb- centre -rrb- to try to survive at the home of katrina plumridge in grimsby , lincolnshire .",
        "the suffering was so serious that the female staffordshire bull terrier , named ronnie , died of starvation , nigel burn , prosecuting , told the court last friday .",
        "suspended jail term : the dogs were in a terrible state when found after two weeks of starvation at the home of katrina plumridge , 31 -lrb- pictured -rrb- .",
        "the male dog , her son scrappy , was so badly emaciated that he ate her body to try to survive .",
    ],
    "abstract": [
        "neglect by katrina plumridge saw staffordshire bull terrier ronnie die .",
        "dog 's son scrappy was forced to eat her to survive at grimsby house .",
        "alarm raised by letting agent shocked by ` thinnest dog he 'd ever seen '",
    ]
}
I ran
df = pd.read_json('100252.json')
but I got an error:
ValueError: arrays must all be same length

Then I tried:

import json
import pandas as pd

with open('100252.json') as json_data:
    data = json.load(json_data)

pd.DataFrame.from_dict(data, orient='index').T.set_index('index')
But I got an error: KeyError: "None of ['index'] are in the columns"

How can I fix this? I don't know where my error comes from, which is why I need your help.

Edit

Source:

From this website, I want to do something similar to this:

>>> from datasets import Dataset
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, 3]})
>>> dataset = Dataset.from_pandas(df)

I have to load the JSON file into a DataFrame, and then use the datasets library to build a dataset from pandas.

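The input to Dataset must be a dict whose values are lists of equal length. So you can either:

  • Join the sentences into a single string and build a one-element list; your dataset will then contain a single row.

  • Align the lists, for example by padding the shorter ones with empty strings.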
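
To see why the record is rejected as it stands, it can help to look at the length of each value. The sketch below is an illustration only: `column_lengths` is a hypothetical helper, and the inline JSON is a shortened stand-in for 100252.json.

```python
import json

# Shortened, hypothetical stand-in for the contents of 100252.json
raw = """
{
    "id": "68af48116a252820a1e103727003d1087cb21a32",
    "article": ["by mark duell .", "published : .", "warning : graphic content ."],
    "abstract": ["neglect by katrina plumridge saw staffordshire bull terrier ronnie die ."]
}
"""


def column_lengths(record):
    """Map each key to the length of its list value, or None for a scalar."""
    return {
        key: len(value) if isinstance(value, list) else None
        for key, value in record.items()
    }


record = json.loads(raw)
lengths = column_lengths(record)
print(lengths)  # {'id': None, 'article': 3, 'abstract': 1}
```

Here `id` is a bare string and the two lists differ in length, which is the mismatch that both options listed above are meant to remove.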

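Comments:

  • What are you trying to achieve?
  • @AlexanderVolkovsky Let me edit my code to explain what I want.
  • @AlexanderVolkovsky Do you understand it better now?
  • I don't understand the desired output. Are you trying to create a DataFrame with the columns ["id", "article", "abstract"]? If so, you only need to replace the arrays with joined strings. Please post a snippet of the desired output DataFrame.
  • The article and the abstract look like documents split into sentences. Do you want to load each sentence into its own row, or should all sentences be merged into one cell? It is not clear what the output should look like.
  • Thanks for your answer! However, I must not join the sentences. It has to be a list of sentences. Can you modify it?
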
    import json
    from datasets import Dataset

    # Option 1: join the sentences so that every column is a one-element list
    with open('100252.json') as json_data:
        data = json.load(json_data)

    data['id'] = [data['id']]
    data['article'] = ["\n".join(data['article'])]
    data['abstract'] = ["\n".join(data['abstract'])]

    Dataset.from_dict(data)  # one row

    # Option 2: reload the file, then pad the shorter list with empty strings
    with open('100252.json') as json_data:
        data = json.load(json_data)

    max_len = max(len(data[col]) for col in ['article', 'abstract'])

    data['id'] = [data['id']] * max_len
    data['article'] = data['article'] + [""] * (max_len - len(data['article']))
    data['abstract'] = data['abstract'] + [""] * (max_len - len(data['abstract']))

    Dataset.from_dict(data)  # max_len rows
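
For the follow-up request in the comments (keeping each cell as a list of sentences instead of a joined string), one possible approach is to wrap every value in a one-element list, so that all columns have length 1 while `article` and `abstract` stay lists. This is a minimal sketch and not part of the original answer; the helper name `to_single_row` and the shortened record are hypothetical.

```python
def to_single_row(record):
    """Wrap every value in a one-element list so all columns have length 1."""
    return {key: [value] for key, value in record.items()}


# Shortened, hypothetical stand-in for the parsed contents of 100252.json
record = {
    "id": "68af48116a252820a1e103727003d1087cb21a32",
    "article": ["by mark duell .", "published : ."],
    "abstract": ["neglect by katrina plumridge saw staffordshire bull terrier ronnie die ."],
}

columns = to_single_row(record)
print({key: len(value) for key, value in columns.items()})
# {'id': 1, 'article': 1, 'abstract': 1}
```

Passing `columns` to `Dataset.from_dict(columns)` should then give a one-row dataset whose `article` and `abstract` cells are lists of strings. If the route through pandas is required, the same dict can be given to `pd.DataFrame(columns)` first and the result to `Dataset.from_pandas`.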