Python: Inserting data into a grouped DataFrame (pandas)


python pandas dataframe


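I have a DataFrame that is grouped by specific columns. Now I want to insert the mean of the numeric values of four adjacent columns into a new column. This is what I did: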

import re
import pandas as pd

df = pd.read_csv(filename)
# in this line I extract a unique three-character ID from the filename
id = re.search(r'(\w\w\w)', filename).group(1)

The file looks like this:

col1   | col2  | col3
-----------------------
str1a  | str1b | float1

My current idea is:

# get the numeric values: mean of col3 per (col1, col2) group, as a single row
df2 = pd.DataFrame(df.groupby(['col1', 'col2'])['col3'].mean()).T
# insert the id into a new column
df2.insert(0, 'ID', id)

Now I loop over all of them:

for j in range(len(df2.values)):
    for k in df['col1'].unique():
        df2.insert(j+5, (k, 'mean'), df2.values[j])

df2.to_excel('text.xlsx')

But I get the following error, pointing to the line with df2.insert:

TypeError: not all arguments converted during string formatting

I am not sure what string formatting refers to here, since I am only passing numeric values.
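
One plausible source of this message, offered as an assumption rather than a confirmed diagnosis: the column label passed to insert is a tuple, and when a tuple is formatted into a message with the % operator, Python treats it as the full argument list. The sketch below shows only that plain-Python behaviour. The message template is hypothetical and is not quoted from pandas.

```python
# A tuple label like the one passed to df2.insert in the question.
key = ("a", "mean")

# Hypothetical message template with a single %s placeholder.
template = "cannot insert %s, already exists"

try:
    # The tuple is unpacked as two arguments for one placeholder.
    message = template % key
    error_text = None
except TypeError as exc:
    error_text = str(exc)

# Wrapping the tuple in a one-element tuple formats it as a single value.
safe_message = template % (key,)

print(error_text)
print(safe_message)
```

If this is what happens, the numeric values are not the problem; the tuple label is.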

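The final output should contain all values of col3 in a single row (indexed by the ID), and every fifth column should be the inserted mean of the four preceding values.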
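
A minimal sketch of that layout, assuming the values already sit in a one-row DataFrame. The helper name insert_block_means, the column names v0..v7, the mean_N labels, and the sample numbers are all made up for illustration. Instead of calling insert repeatedly, it slices the columns into blocks of four and concatenates each block with its mean.

```python
import pandas as pd


def insert_block_means(row: pd.DataFrame, block: int = 4) -> pd.DataFrame:
    """Return a copy of `row` with a mean column after every `block` columns."""
    pieces = []
    for start in range(0, row.shape[1], block):
        chunk = row.iloc[:, start:start + block]
        # Row-wise mean of this block, named mean_0, mean_1, ...
        mean_col = chunk.mean(axis=1).rename(f"mean_{start // block}")
        pieces.extend([chunk, mean_col])
    return pd.concat(pieces, axis=1)


# Hypothetical single row of eight values, indexed by a made-up file ID.
values = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]],
    index=["abc"],
    columns=[f"v{i}" for i in range(8)],
)

result = insert_block_means(values)
print(result)
```

Building the result with one concat avoids shifting column positions while looping, which is what makes the j+5 index arithmetic in the original loop fragile.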

If I had to deal with files like yours, I would write a function to convert them to CSV first, something like this:

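import pandas as pd

data = []
with open(filename) as file:
    for line_in_file in file.read().splitlines():
        cells = [cell.strip() for cell in line_in_file.split('|')]
        if len(cells) > 1:  # keep only table rows, skip the '-------' separator
            data.append(cells)
# the first parsed row is assumed to be the header line (col1 | col2 | col3)
df = pd.DataFrame(data[1:], columns=data[0])
df['col3'] = df['col3'].astype(float)  # parsed cells are strings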

Hope this helps.

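Can you add a data sample and the desired output?

I just added them. I hope it is a bit clearer now.

Sorry, it isn't. Can you add 5-6 rows of data and the desired output? Ideally the sample should also reproduce the error.

Is your question about writing to the .xlsx file or about doing the transformation?

@FabianMoss - Thank you.

Maybe that helps, but I think it is actually the other way round. I have many files, from each of which I want to extract one column, and I want to merge these columns into one DataFrame. Then I want to insert mean values at specific points.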
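
The workflow described in the last comment could be sketched roughly as follows. This assumes each file has already been loaded into a DataFrame with the columns col1, col2 and col3 from the question. The filenames and values are invented, and the in-memory dictionary stands in for reading the real files.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the parsed files, keyed by made-up filenames.
frames = {
    "abc_run.csv": pd.DataFrame(
        {"col1": ["x", "x"], "col2": ["p", "q"], "col3": [1.0, 3.0]}
    ),
    "def_run.csv": pd.DataFrame(
        {"col1": ["x", "x"], "col2": ["p", "q"], "col3": [2.0, 6.0]}
    ),
}

rows = {}
for filename, df in frames.items():
    # Same three-character ID extraction as in the question.
    file_id = re.search(r"(\w\w\w)", filename).group(1)
    # One Series per file: mean of col3 for each (col1, col2) group.
    rows[file_id] = df.groupby(["col1", "col2"])["col3"].mean()

# Each Series becomes a column; transposing gives one row per file ID.
combined = pd.DataFrame(rows).T
print(combined)
```

With one row per file in place, the means can be added afterwards in a separate step rather than during the merge.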
