Python: converting a JSON column in a DataFrame to a simple array of values


python json pandas jupyter-notebook


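I am trying to convert the JSON in the bbox (bounding box) column into a simple array of values for a deep learning project, using Python in a Jupyter notebook.

The possible labels are the following categories: [glass, cardboard, trash, metal, paper]

[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]

TO

([191 70 183 311], 0)

I am looking for help converting the bbox column from JSON objects into a single CSV that contains all image names and their associated bboxes.

Update

The column is currently a Series, so whenever I try to apply a JSON operation to it I get "TypeError: the JSON object must be str, bytes or bytearray, not 'Series'". So far I have tried converting the column to a JSON object and then pulling the values out by key.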

BB_CSV

You need to use a JSON decoder:
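A minimal sketch of what that can look like, assuming each cell of the bbox column holds a JSON string. `json.loads` accepts one string at a time, which is why passing the whole Series raises the TypeError from the question. The sample strings and the names `raw_cells` and `decoded` are made up for illustration.

```python
import json

# Hypothetical sample: one JSON string per image, as the bbox cells might look
raw_cells = [
    '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]',
    '[{"left":200,"top":60,"width":132,"height":318,"label":"glass"}]',
]

# json.loads takes a single str/bytes value, so decode cell by cell
decoded = [json.loads(cell) for cell in raw_cells]

# Each decoded cell is a list of dictionaries
li = decoded[0]
print(li[0]["left"], li[0]["label"])

# With pandas, the same per-cell decoding would be df['bbox'].apply(json.loads)
```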

Edit:

If you want to map the operation of extracting values from a dictionary over all elements of a list, you can do the following:

# `li` is the list of dictionaries returned by the JSON decoder
extracted = []
for element in li:
    # Collect the four box values in a fixed order; 0 is the label number
    result = ([element[key] for key in "left top width height".split()], 0)
    extracted.append(result)

# `[:10]` limits the number of items displayed to 10
print(extracted[:10])


Also, following up on my comment, if you do not want commas between the extracted numbers in the list, you can use:

without_comma = []
for element, zero in extracted:
    result_string = "([{}], 0)".format(" ".join([str(value) for value in element]))
    without_comma.append(result_string)

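It looks like each row of your bbox column contains a dictionary inside a list. I tried to reproduce your problem as follows.

Edit (clarification): the solution below assumes that what you call the "JSON object" is represented as a list containing one dictionary, which is what it appears to be based on your example and screenshot.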
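import pandas as pd

# Create an empty sample DataFrame with one row
df = pd.DataFrame([None], columns=['bbox'])

# Assign your sample item to the first row.
# (On recent pandas versions, df.at[0, 'bbox'] = ... is the safer way to set a single cell.)
df['bbox'][0] = [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]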

Now, to simply unpack that row, you can do the following:

# Wrap the dictionary values in tuple() so each cell holds an actual tuple
df['bbox_unpacked'] = df['bbox'].map(lambda x: tuple(x[0].values()))

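This gives you a new column containing a tuple of 5 items.

If you want to go further and apply the labels, you will probably want a dictionary that holds the label logic. Based on the example given in the comments, I have used: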
labels = {
    'cardboard': 1,
    'trash': 2,
    'glass': 3
}
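The question lists the categories as [glass, cardboard, trash, metal, paper] and maps glass to 0, so another way to build such a dictionary is to number that list in order. This is only a sketch under that assumption; the names `categories` and `label_to_index` are mine, and the numbering differs from the dictionary above.

```python
# Category order taken from the question; the position becomes the label number
categories = ["glass", "cardboard", "trash", "metal", "paper"]

label_to_index = {name: index for index, name in enumerate(categories)}

print(label_to_index)
```

Either dictionary can be passed wherever a label lookup is needed; which numbering is right depends on what the downstream model expects.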

If you want a one-line solution without writing your own function, this should give you the layout you want:
df['bbox_unpacked'] = df['bbox'].map(lambda x: (list(x[0].values())[:4],labels.get(list(x[0].values())[-1])))

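A more readable solution is to define your own function and use the .apply() method.

Edit: since your JSON objects are stored as str in the DataFrame rows, I added json.loads(row) to parse the string first, before retrieving the keys. You need to import json for this to run.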
import json

def unpack_bbox(row, labels):
    # Parse the string into a JSON object (here a list of length one
    # containing the dictionary), take its first item with [0], and use
    # the .values() dictionary method to get the values only
    values = list(json.loads(row)[0].values())

    bbox_values = values[:4]   # left, top, width, height
    bbox_label = values[-1]    # label string, e.g. "glass"

    # Map the label string to its number (None if the label is unknown)
    label_value = labels.get(bbox_label)

    return bbox_values, label_value

df['bbox_unpacked'] = df['bbox'].apply(unpack_bbox, args=(labels,))
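When an image has several annotations, as raised in the comments, indexing with [0] keeps only the first box. The sketch below loops over every dictionary in the cell and reads the values by key rather than by position. It assumes each cell is either a JSON string or an already decoded list; `unpack_all_bboxes`, `label_to_index`, the `image_name` column and the file name are hypothetical, not part of the original answer.

```python
import csv
import io
import json

BBOX_KEYS = ("left", "top", "width", "height")

def unpack_all_bboxes(cell, label_to_index):
    """Return one ([left, top, width, height], label_number) tuple per annotation."""
    # Accept either a JSON string or an already decoded list of dictionaries
    annotations = json.loads(cell) if isinstance(cell, str) else cell
    return [
        ([ann[key] for key in BBOX_KEYS], label_to_index.get(ann["label"]))
        for ann in annotations
    ]

# Hypothetical label numbering and sample data with two annotations
label_to_index = {"glass": 0, "cardboard": 1, "trash": 2, "metal": 3, "paper": 4}
cell = ('[{"left":191,"top":70,"width":183,"height":311,"label":"glass"},'
        '{"left":200,"top":60,"width":132,"height":318,"label":"glass"}]')

boxes = unpack_all_bboxes(cell, label_to_index)
print(boxes)

# Write one CSV row per box; "img_001.jpg" is a made-up image name
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["image_name", *BBOX_KEYS, "label"])
for bbox, label in boxes:
    writer.writerow(["img_001.jpg", *bbox, label])
print(buffer.getvalue())
```

With pandas, the function could be applied per row in the same way as above, for example df['bbox'].apply(unpack_all_bboxes, args=(label_to_index,)).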

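What does the 0 stand for?

@W-B 0 is the number that represents the label. 1 is cardboard, 2 is trash, and so forth.

@cleme001, have you tried anything so far to achieve this? That would also give a clue about what is really needed.

You can add assert dictionary["label"] in "glass cardboard trash metal paper".split() if you want to check that the label value is valid. If you do not want commas between the values in the bbox, use print("([{}], 0)".format(" ".join(str(d[key]) for key in "left top width height".split())))

So I managed to get it working for my use case with slightly tweaked code, but the .split only works on the first instance of the JSON object. Pictures with multiple annotations are not converted correctly, e.g. [{"left":191,"top":70,"width":183,"height":311,"label":"glass"},{"left":200,"top":60,"width":132,"height":318,"label":"glass"}]

For some reason I get multiple rows when I do BB_CSV['bbox'][0]; I have added an image to the post.

@cleme001 the df['bbox'][0] snippet was only there to show how I assigned your sample list to a row of a sample DataFrame in order to reproduce your problem. Have you tried BB_CSV['bbox'].map(lambda x: x[0].values()) to unpack your rows?

The problem seems to be that the value is not a string but a numpy ndarray, so I run into this error:
BB_CSV['bbox'][0].values()
TypeError: 'numpy.ndarray' object is not callable
Also, if you look at the image I attached to the original post, you will see that even when I try to view a single row I get back 5 different things rather than just the value of that row.

@cleme001 Does your DataFrame have duplicate values in its index? Could you try df.reset_index(drop=True, inplace=False) and try again?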
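To illustrate why duplicate index values could produce both symptoms described above, here is a small sketch with made-up data. With a repeated label, df['bbox'][0] returns a Series of several rows, and `.values` on a Series is an attribute holding a NumPy array, so calling `.values()` raises the "'numpy.ndarray' object is not callable" error. Whether this is what happened in the asker's DataFrame is an assumption; the names `dup` and `clean` are hypothetical.

```python
import pandas as pd

# Made-up data: the index label 0 appears twice
dup = pd.DataFrame(
    {"bbox": [[{"left": 191, "label": "glass"}],
              [{"left": 200, "label": "glass"}],
              [{"left": 50, "label": "paper"}]]},
    index=[0, 0, 1],
)

# Selecting label 0 matches two rows, so the result is a Series, not one cell
selected = dup["bbox"][0]
print(type(selected).__name__, len(selected))

# .values is an attribute holding a NumPy array, so it cannot be called
print(type(selected.values).__name__)

# After resetting the index, label 0 is unique and [0] returns the cell itself
clean = dup.reset_index(drop=True)
first_cell = clean["bbox"][0]
print(list(first_cell[0].values()))
```

Note that reset_index with inplace=False returns a new DataFrame, so the result has to be assigned to a name, as done with `clean` here.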