如何使用python处理excel中的组合列';熊猫是什么?
因此,我有一个excel工作表,我需要提取一些数据并用它创建一个json,这里的问题是这是一个非常大且复杂的文件,我遇到的第一个问题是工作表中有一些合并的行,当我使用如何使用python处理excel中的组合列';熊猫是什么?,python,json,pandas,Python,Json,Pandas,因此,我有一个excel工作表,我需要提取一些数据并用它创建一个json,这里的问题是这是一个非常大且复杂的文件,我遇到的第一个问题是工作表中有一些合并的行,当我使用to_json()时,它返回了合并后的数据,就像是一行一样。下面是一个excel文件的示例 --------------------------------------------------------- Name | property 1 | property 2 | property 3 | -----
to_json()
时,它返回了合并后的数据,就像是一行一样。下面是一个excel文件的示例
---------------------------------------------------------
Name | property 1 | property 2 | property 3 |
---------------------------------------------------------
variableName1 | X1 | Y1 | Z1 |
---------------------------------------------------------
variableName2 | X2 | Y2 | Z2 |
---------------------------------------------------------
variableName3 | X3 | Y3 | Z31 |
---------------
| | | Z32 |
---------------------------------------------------------
variableName4 | X4 | Y4 | Z4 |
---------------------------------------------------------
我通过以下方式获得excel:
import pandas as pd
excel = pd.read_excel('testExcel.xlsx', 'Hoja1', na_values=['NA'], skiprows=range(0, 1))
如您所见,第4行有一些组合单元格(4,5)
为了得到行,我这样做
for i in excel.index:
print(excel.loc[i].to_json())
并返回以下结果:
{"Name":"VariableName1","Property1":"X1","Property2":"Y1","Property3":"Z1"}
{"Name":"VariableName2","Property1":"X2","Property2":"Y2","Property3":"Z2"}
{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":"Z31"}
{"Name":null,"Property1":null,"Property2":null,"Property3":"Z32"}
{"Name":"VariableName4","Property1":"X4","Property2":"Y4","Property3":"Z4"}
我遇到的问题不是:
{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":"Z31"}
{"Name":null,"Property1":null,"Property2":null,"Property3":"Z32"}
我想要这样的东西:
{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":["Z31", "Z32"]}
最好的方法是什么?您可以执行以下操作:
# Get the data
df = pd.read_excel('testExcel.xlsx',
sheet_name='Hoja1',
na_values='NA',
skiprows=2)
# Remove empty rows
df = df.dropna(axis='columns', how='all')
# Fill down the 'Name' values
df['Name'] = df['Name'].fillna(method='ffill')
# Define an aggregate function
def join_values(series):
return ', '.join(pd.Series.dropna(series))
# Group and aggregate the data using the defined function
df = df.groupby(by='Name').aggregate(join_values)
# Reset multi index
df = df.reset_index()
# Serialize
json_output = df.to_json(orient='records')
请注意,此解决方案将具有重复“Name”值的行聚合到一行中。您可以执行以下操作:
# Get the data
df = pd.read_excel('testExcel.xlsx',
sheet_name='Hoja1',
na_values='NA',
skiprows=2)
# Remove empty rows
df = df.dropna(axis='columns', how='all')
# Fill down the 'Name' values
df['Name'] = df['Name'].fillna(method='ffill')
# Define an aggregate function
def join_values(series):
return ', '.join(pd.Series.dropna(series))
# Group and aggregate the data using the defined function
df = df.groupby(by='Name').aggregate(join_values)
# Reset multi index
df = df.reset_index()
# Serialize
json_output = df.to_json(orient='records')
请注意,此解决方案将具有重复“Name”值的行聚合到一行中