Converting multiple tables in a CSV/Excel file into a dictionary or DataFrame in Python

python excel csv dictionary dataframe


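I need some help.

I have an Excel file containing data that I am trying to get into a DataFrame, but the data is laid out as tables that are not easy to work with. For example:

Ultimately I would like to get it into a DataFrame of this form: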

Meal               Food                              Calories
Breakfast          English Muffins                   120
Breakfast          Peanut Butter Spread              190
Morning Snack      Banana                            90
Morning Snack      Nectarine                         59
...                ...                               ...

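And a separate DataFrame for the daily totals in this form (ignore the "Date" column for now):

I am struggling to get this into a DataFrame. Looking at the screenshot of the dataset, it made sense to store the data in a dictionary first, but that leaves me with a lot of NaN values because of all the blank cells.

My idea for getting the "Meal" column the way I want it is to do a forward fill, but that means I would have to work with a Series or DataFrame, and I have not gotten that far yet.

This is what I have so far: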

import pandas as pd

# Open the workbook once so that individual sheets can be parsed from it
xls = pd.ExcelFile('filename.xls')
df = xls.parse('Foods')

# create a list to store the names of the sheets to read
food_logs = []

# this is code to reformat the string values in a certain column 
# to get the name of the sheets I need to use in the Excel. This can be ignored
for day in df.values:
    if day[1] != '0':
        foodLogSheetName = 'Food Log ' + day[0].replace('-', '')
        food_logs.append(foodLogSheetName)

# 'foods' is now a list of nested dictionaries (think of everything in the 
# first screenshot as the outer dictionary, and each of the columns as the 
# inner dictionary)
foods = [xls.parse(food_log).to_dict() for food_log in food_logs]

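This is what "foods" currently looks like if I print it with a blank line between each outer dictionary:

I also have the option of using a CSV file, but then, if that makes sense, I would have multiple "tables" stacked vertically instead of multiple sheets.

I would really appreciate any tips anyone can offer.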

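I think you are on the right track with reading in the data. It sounds like you may just be having trouble handling the missing values. From the example you posted, it looks like you could read the whole thing into a DataFrame, drop all completely empty rows, forward-fill (ffill) the "Meal" column, and then drop any rows that are still partially empty (or partially empty on a subset of columns).

With your suggestion, I figured out how to answer my own question! Thank you so much!!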

import pandas as pd

df = pd.read_excel(file_path_or_buffer, sheet_name=my_sheet_name, **other_kwargs)
# You should have a dataframe that looks like
# Meal               Food                              Calories
# Breakfast          
#                    English Muffins                   120
#                    Peanut Butter Spread              190
# ...
# Next drop totally NaN/empty rows
df.dropna(how='all', inplace=True)
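# Forward-fill each meal name down its group
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is equivalent)
df['Meal'] = df['Meal'].ffill()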
# Now you should have something that looks like
# Meal               Food                              Calories
# Breakfast          
# Breakfast          English Muffins                   120
# Breakfast          Peanut Butter Spread              190
# ...
# Drop the rows that still have a missing value (the bare meal headings).
# If you need to allow for some sparse data, use the subset argument.
df.dropna(how='any', inplace=True)
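
For anyone who wants to try the steps above without the original workbook, here is a minimal sketch on in-memory data. The sheet name 'Food Log 20150101', the helper name tidy_food_log, and the exact layout of the raw sheet are assumptions made for illustration; the dictionary of sheets only stands in for what pd.read_excel returns when it is asked for several sheets at once, which is a dict mapping sheet names to DataFrames.

```python
import pandas as pd


def tidy_food_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Flatten one food-log sheet into plain Meal / Food / Calories rows."""
    df = raw.dropna(how="all").copy()        # drop completely blank rows
    df["Meal"] = df["Meal"].ffill()          # carry each meal name downwards
    # Rows that only held a meal heading have no Food/Calories; drop them.
    df = df.dropna(subset=["Food", "Calories"])
    return df.reset_index(drop=True)


# Hypothetical stand-in for a dict of raw sheets keyed by sheet name.
sheets = {
    "Food Log 20150101": pd.DataFrame({
        "Meal": ["Breakfast", None, None, None, "Morning Snack", None, None],
        "Food": [None, "English Muffins", "Peanut Butter Spread", None,
                 None, "Banana", "Nectarine"],
        "Calories": [None, 120, 190, None, None, 90, 59],
    }),
}

# One tidy DataFrame per sheet, kept in a dictionary ...
tidy = {name: tidy_food_log(raw) for name, raw in sheets.items()}

# ... or stacked into a single DataFrame with the sheet name as a column.
combined = pd.concat(tidy, names=["Sheet"]).reset_index(level="Sheet")
print(combined)
```

Keeping the cleanup in one function means the same steps run on every sheet, and the choice between a dictionary of DataFrames and one combined DataFrame is left to the last two statements.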