Python: first document missing when loading a multi-document YAML file into a DataFrame


python pandas


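I am trying to load a multi-document YAML file (that is, a YAML file made up of several YAML documents separated by "---") into a pandas DataFrame. For some reason, the first document does not end up in the DataFrame. If the output of yaml.safe_load_all is first materialized as a list (instead of passing the iterator straight to pd.io.json.json_normalize), all documents end up in the DataFrame. I can reproduce this with the sample code below (on a completely different YAML file):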

import os
import yaml
import pandas as pd
import urllib.request

# public example of a multi-document YAML file
inputfilepath = os.path.expanduser("~/my_example.yaml")
url = "https://raw.githubusercontent.com/kubernetes/examples/master/guestbook/all-in-one/guestbook-all-in-one.yaml"
urllib.request.urlretrieve(url, inputfilepath)

# pass the iterator returned by safe_load_all directly
with open(inputfilepath, 'r') as stream:
    df1 = pd.io.json.json_normalize(yaml.safe_load_all(stream))

# materialize the iterator as a list first
with open(inputfilepath, 'r') as stream:
    df2 = pd.io.json.json_normalize([x for x in yaml.safe_load_all(stream)])

print(f'Output table shape with iterator: {df1.shape}')
print(f'Output table shape with iterator materialized as list: {df2.shape}')

I expected both results to be identical, but I get:

Output table shape with iterator: (5, 18)
Output table shape with iterator materialized as list: (6, 18)

Any idea why these results differ?


df1 is missing the first row of data because you are passing an iterator rather than a re-iterable container such as a list.
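The difference between the two kinds of object can be sketched in plain Python. This is a minimal illustration only: `documents` is a hypothetical stand-in for a lazy loader such as yaml.safe_load_all, and the three records are made up.

```python
def documents():
    """Hypothetical stand-in for a lazy loader such as yaml.safe_load_all."""
    yield {"kind": "Service"}
    yield {"kind": "Deployment"}
    yield {"kind": "ConfigMap"}

gen = documents()
as_list = list(documents())

# An iterator returns itself from iter(), so every consumer shares one cursor.
gen_is_own_iterator = iter(gen) is gen
# A list hands out a fresh iterator on each call, so it can be read repeatedly.
list_is_own_iterator = iter(as_list) is as_list

first_pass = [d["kind"] for d in gen]
second_pass = [d["kind"] for d in gen]      # the generator is already exhausted

list_first = [d["kind"] for d in as_list]
list_second = [d["kind"] for d in as_list]  # the list can be traversed again
```

Anything that reads even one item from `gen` before the main loop runs therefore removes that item for every later reader, which is not the case for `as_list`.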

Inside the nested_to_record function:

new_d = copy.deepcopy(d)
for k, v in d.items():
    # each key gets renamed with prefix
    if not isinstance(k, compat.string_types):
        k = str(k)
    if level == 0:
        newkey = k
    else:
        newkey = prefix + sep + k

    # only dicts gets recurse-flattend
    # only at level>1 do we rename the rest of the keys
    if not isinstance(v, dict):
        if level != 0:  # so we skip copying for top level, common case
            v = new_d.pop(k)
            new_d[newkey] = v
        continue
    else:
        v = new_d.pop(k)
        new_d.update(nested_to_record(v, newkey, sep, level + 1))
new_ds.append(new_d)

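The line containing d.items() is where the generator gets evaluated, and inside the loop you can see where the first "level" is skipped, which in your case is the first record.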
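A workaround consistent with what the question already observed is to materialize the iterator before handing it to pandas. The helper below is a hypothetical sketch, not part of pandas or PyYAML; the name `ensure_reiterable` is made up for illustration.

```python
from collections.abc import Iterator

def ensure_reiterable(data):
    """Hypothetical helper: copy a one-shot iterator into a list,
    and return any other object (list, tuple, ...) unchanged."""
    if isinstance(data, Iterator):
        return list(data)
    return data

# A generator expression stands in for the lazy YAML loader here.
docs = ensure_reiterable(d for d in ({"id": 1}, {"id": 2}))
already_a_list = [{"id": 3}]
same_object = ensure_reiterable(already_a_list)
```

In the question's code this would amount to passing ensure_reiterable(yaml.safe_load_all(stream)) to json_normalize, which has the same effect as the list comprehension that produced the (6, 18) result.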

Thanks for the explanation and the research! I think this should be caught by pandas:

if any([isinstance(x, dict)
        for x in compat.itervalues(y)] for y in data):
    # naive normalization, this is idempotent for flat records
    # and potentially will inflate the data considerably for
    # deeply nested structures:
    #  {VeryLong: { b: 1,c:2}} -> {VeryLong.b:1 ,VeryLong.c:@}
    #
    # TODO: handle record value which are lists, at least error
    #       reasonably
    data = nested_to_record(data, sep=sep)
return DataFrame(data)
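One way to read the excerpt above: the any(...) check itself iterates over data, and because each element of its generator expression is a non-empty list (which is truthy), any() can stop after pulling a single record. With a one-shot iterator that record is then no longer available to nested_to_record. The sketch below reproduces only the shape of that check in plain Python; `has_nested_values` and `records` are hypothetical names, the sample records are made up, and `y.values()` replaces compat.itervalues(y).

```python
def has_nested_values(data):
    """Same shape as the check in the excerpt: any() over a generator
    expression whose elements are lists (truthy whenever non-empty)."""
    return any([isinstance(x, dict) for x in y.values()] for y in data)

def records():
    # Hypothetical records standing in for the parsed YAML documents.
    yield {"name": "first", "meta": {"a": 1}}
    yield {"name": "second", "meta": {"a": 2}}
    yield {"name": "third", "meta": {"a": 3}}

lazy = records()
has_nested_values(lazy)                      # pulls one record, then short-circuits
left_in_iterator = [r["name"] for r in lazy]

eager = list(records())
has_nested_values(eager)                     # reads through a fresh iterator over the list
left_in_list = [r["name"] for r in eager]
```

Under these assumptions the iterator loses exactly one record to the check while the list keeps all of them, which matches the one-row difference between (5, 18) and (6, 18) reported in the question.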
