如何使用python处理excel中的组合列'；熊猫是什么？_Python_Json_Pandas

如何使用python处理excel中的组合列'；熊猫是什么？

python json pandas

如何使用python处理excel中的组合列'；熊猫是什么？,python,json,pandas,Python,Json,Pandas,因此，我有一个excel工作表，我需要提取一些数据并用它创建一个json，这里的问题是这是一个非常大且复杂的文件，我遇到的第一个问题是工作表中有一些合并的行，当我使用to_json（）时，它返回了合并后的数据，就像是一行一样。下面是一个excel文件的示例 --------------------------------------------------------- Name | property 1 | property 2 | property 3 | -----

因此，我有一个excel工作表，我需要提取一些数据并用它创建一个json，这里的问题是这是一个非常大且复杂的文件，我遇到的第一个问题是工作表中有一些合并的行，当我使用

to_json（）

时，它返回了合并后的数据，就像是一行一样。下面是一个excel文件的示例

---------------------------------------------------------
      Name    | property 1  | property 2  | property 3  |
---------------------------------------------------------
variableName1 |      X1     |      Y1     |     Z1      | 
---------------------------------------------------------
variableName2 |      X2     |      Y2     |     Z2      |   
--------------------------------------------------------- 
variableName3 |      X3     |      Y3     |     Z31     |
                                          ---------------
              |             |             |     Z32     |
---------------------------------------------------------
variableName4 |      X4     |      Y4     |     Z4      |
---------------------------------------------------------

我通过以下方式获得excel：

import pandas as pd
excel = pd.read_excel('testExcel.xlsx', 'Hoja1',  na_values=['NA'], skiprows=range(0, 1))

如您所见，第4行有一些组合单元格（4,5）

为了得到行，我这样做

for i in excel.index:
    print(excel.loc[i].to_json())

并返回以下结果：

{"Name":"VariableName1","Property1":"X1","Property2":"Y1","Property3":"Z1"}
{"Name":"VariableName2","Property1":"X2","Property2":"Y2","Property3":"Z2"}
{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":"Z31"}
{"Name":null,"Property1":null,"Property2":null,"Property3":"Z32"}
{"Name":"VariableName4","Property1":"X4","Property2":"Y4","Property3":"Z4"}

我遇到的问题不是：

{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":"Z31"}
{"Name":null,"Property1":null,"Property2":null,"Property3":"Z32"}

我想要这样的东西：

{"Name":"VariableName3","Property1":"X3","Property2":"Y3","Property3":["Z31", "Z32"]}

最好的方法是什么？

您可以执行以下操作：

# Get the data
df = pd.read_excel('testExcel.xlsx',
                   sheet_name='Hoja1',
                   na_values='NA',
                   skiprows=2)

# Remove empty rows
df = df.dropna(axis='columns', how='all')

# Fill down the 'Name' values
df['Name'] = df['Name'].fillna(method='ffill')

# Define an aggregate function
def join_values(series):
    return ', '.join(pd.Series.dropna(series))

# Group and aggregate the data using the defined function
df = df.groupby(by='Name').aggregate(join_values)

# Reset multi index
df = df.reset_index()

# Serialize
json_output = df.to_json(orient='records')

请注意，此解决方案将具有重复“Name”值的行聚合到一行中。

您可以执行以下操作：

# Get the data
df = pd.read_excel('testExcel.xlsx',
                   sheet_name='Hoja1',
                   na_values='NA',
                   skiprows=2)

# Remove empty rows
df = df.dropna(axis='columns', how='all')

# Fill down the 'Name' values
df['Name'] = df['Name'].fillna(method='ffill')

# Define an aggregate function
def join_values(series):
    return ', '.join(pd.Series.dropna(series))

# Group and aggregate the data using the defined function
df = df.groupby(by='Name').aggregate(join_values)

# Reset multi index
df = df.reset_index()

# Serialize
json_output = df.to_json(orient='records')

请注意，此解决方案将具有重复“Name”值的行聚合到一行中