Python: reading multiple YAML files into a DataFrame

Tags: python, pandas, dataframe, yaml

I realize similar questions have already been asked here. However, I hope this one is different.

I know how to load a single YAML file into a pandas DataFrame:

import yaml
import pandas as pd

with open(r'1000851.yaml') as file:
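    # pd.json_normalize replaces the deprecated pd.io.json.json_normalize;
    # yaml.safe_load avoids the Loader argument that yaml.load requires in PyYAML 6+
    df = pd.json_normalize(yaml.safe_load(file))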

df.head()
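I want to read several YAML files from a directory into pandas DataFrames and concatenate them into one large DataFrame, but I haven't figured out how. This is my attempt: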

import pandas as pd
import glob

path = r'../input/cricsheet-a-retrosheet-for-cricket/all' # use your path
all_files = glob.glob(path + "/*.yaml")

li = []

for filename in all_files:
    df = pd.json_normalize(yaml.load(filename, Loader=yaml.FullLoader))
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

/opt/conda/lib/python3.7/site-packages/pandas/io/json/_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    268 
    269     if record_path is None:
--> 270         if any([isinstance(x, dict) for x in y.values()] for y in data):
    271             # naive normalization, this is idempotent for flat records
    272             # and potentially will inflate the data considerably for

/opt/conda/lib/python3.7/site-packages/pandas/io/json/_normalize.py in <genexpr>(.0)
    268 
    269     if record_path is None:
--> 270         if any([isinstance(x, dict) for x in y.values()] for y in data):
    271             # naive normalization, this is idempotent for flat records
    272             # and potentially will inflate the data considerably for

AttributeError: 'str' object has no attribute 'values'

Is there a way to do this and read the files efficiently?

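The first and second code snippets you posted are not doing the same thing.

The first one reads the YAML file correctly, but the second one is broken: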

for filename in all_files:
    # `filename` here is just a string containing the name of the file. 
    df = pd.json_normalize(yaml.load(filename, Loader=yaml.FullLoader))
    li.append(df)
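The problem is that you need to open and read the files. Right now you are passing only the file name, not the file contents, so yaml parses the name itself and returns a plain string. That is why json_normalize fails with "'str' object has no attribute 'values'". Do this instead: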

li=[]
# Only loading 3 files:
for filename in all_files[:3]:
    with open(filename,'r') as fh:
        df = pd.json_normalize(yaml.safe_load(fh.read()))
    li.append(df)

len(li)
3

pd.concat(li)

output:
  
                                             innings  meta.data_version meta.created  meta.revision info.city info.competition  ... info.player_of_match                         info.teams info.toss.decision info.toss.winner              info.umpires                           info.venue
0  [{'1st innings': {'team': 'Glamorgan', 'delive...                0.9   2020-09-01              1   Bristol   Vitality Blast  ...          [AG Salter]       [Glamorgan, Gloucestershire]              field  Gloucestershire  [JH Evans, ID Blackwell]                        County Ground
0  [{'1st innings': {'team': 'Pune Warriors', 'de...                0.9   2013-05-19              1      Pune              IPL  ...          [LJ Wright]  [Pune Warriors, Delhi Daredevils]                bat    Pune Warriors    [NJ Llong, SJA Taufel]           Subrata Roy Sahara Stadium
0  [{'1st innings': {'team': 'Botswana', 'deliver...                0.9   2020-08-29              1  Gaborone              NaN  ...       [A Rangaswamy]              [Botswana, St Helena]                bat         Botswana   [R D'Mello, C Thorburn]  Botswana Cricket Association Oval 1

[3 rows x 18 columns]
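
To apply the same fix to the whole directory, the loop can be wrapped in a small helper. This is only a sketch: `load_yaml_dir` and the `source_file` column are hypothetical names introduced here for illustration, and it assumes every file parses to a single top-level mapping, as the Cricsheet files above appear to. Passing `ignore_index=True` to `pd.concat` gives the rows a running index instead of the repeated `0` seen in the output above.

```python
from pathlib import Path

import pandas as pd
import yaml


def load_yaml_dir(directory, pattern="*.yaml"):
    """Read every YAML file matching `pattern` in `directory` into one DataFrame.

    `load_yaml_dir` is a hypothetical helper, not part of pandas or PyYAML.
    """
    frames = []
    for path in sorted(Path(directory).glob(pattern)):
        # Open the file and parse its contents; handing the path string
        # itself to yaml would only give that string back.
        with open(path, "r", encoding="utf-8") as fh:
            data = yaml.safe_load(fh)
        df = pd.json_normalize(data)
        # Hypothetical extra column: remember which file each row came from.
        df["source_file"] = path.name
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, axis=0, ignore_index=True)
```

With the path from the question, usage would look like `frame = load_yaml_dir('../input/cricsheet-a-retrosheet-for-cricket/all')`. Building the list first and calling `pd.concat` once is generally cheaper than concatenating inside the loop.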

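Comments:

The simplest approach is to define the DataFrame first and then concatenate each new YAML file onto it. To do that, loop over the files, read them, convert each one to a DataFrame, and concat. This assumes that all files share the same structure.

Can you tell me what error you got? What is the problem?

@DanailPetrov I have shared and updated the code I used.

It looks like some of the YAML files don't have all the values. You have some options; I'll post them as an answer for better readability.

That would be very helpful.

I think there is another problem as well. I just posted it as an answer. Check it and let me know.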
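
On the point that some YAML files lack certain values: `pd.concat` aligns frames on the union of their columns, so a key missing from one file shows up as NaN rather than raising an error. A minimal sketch of that behavior, using two made-up records modeled on the output above (the second has no `competition` key):

```python
import pandas as pd

# Two hypothetical match records; the second one has no "competition" key.
records = [
    {"info": {"city": "Bristol", "competition": "Vitality Blast"}},
    {"info": {"city": "Gaborone"}},
]

# One single-row frame per record, with nested keys flattened to "info.city" etc.
frames = [pd.json_normalize(r) for r in records]
combined = pd.concat(frames, ignore_index=True, sort=False)

# Columns are the union of all keys; the missing value is filled with NaN.
missing = combined["info.competition"].isna()
print(combined)
print(combined.loc[missing, "info.city"].tolist())
```

This matches the NaN in the `info.competition` column for the Gaborone match in the output above. Whether to fill, drop, or keep such gaps depends on the analysis.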