Python: adding multiple columns to a dataset based on row values

python pandas dataframe

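I want to create several new columns in an existing DataFrame using a single function that returns multiple outputs.

For example, suppose I have a test function that returns two things: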

def testfunc (TranspoId, LogId):
    thing1 = TranspoId + LogId
    thing2 = LogId - TranspoId
    return thing1, thing2

I can assign the returned outputs to two separate variables like this:

Thing1,Thing2 = testfunc(4,28)
print(Thing1)
print(Thing2)

I tried to do the same thing with a DataFrame as follows:

import pandas as pd

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23]}

df = pd.DataFrame(data, columns = ['Name','TranspoId','LogId'])
print(df)

df['thing1','thing2'] = df.apply(lambda row: testfunc(row.TranspoId, row.LogId), axis=1)
print(df)

What I want is something like this:

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23], 'Thing1':[13,16,26], 'Thing2':[11,12,20]}
df = pd.DataFrame(data, columns=['Name','TranspoId','LogId','Thing1','Thing2'])
print(df)

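In the real use case the function does a lot of heavy lifting, so I cannot afford to run it twice, once for each new column added to df.

I have been banging my head against this for hours. Any insight would be greatly appreciated.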

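I think the best approach is to turn the problem around and write a function that works on whole Series instead of on individual rows.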

import pandas as pd

# Create a function that operates on whole Series
def testfunc (Series1, Series2):
    Thing1 = Series1 + Series2
    # Second minus first, so that Thing2 = LogId - TranspoId as in the question
    Thing2 = Series2 - Series1
    return Thing1, Thing2

# Create df
data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23]}    
df = pd.DataFrame(data, columns = ['Name','TranspoId','LogId'])

# Apply function
Thing1,Thing2 = testfunc(df['TranspoId'],df['LogId'])
print(Thing1)
print(Thing2)

# Assign both new columns in one call
df = df.assign(Thing1 = Thing1, Thing2 = Thing2)

# print df
print(df)

The function should return a Series that computes all the new columns in a single pass. You can then use DataFrame.apply() to add the new fields.

import pandas as pd
df = pd.DataFrame( {'TranspoId':[1,2,3], 'LogId':[4,5,6]})

def testfunc(row):
    new_cols = pd.Series([
       row['TranspoId'] + row['LogId'],
       row['LogId'] - row['TranspoId']]) 
    return new_cols

df[['thing1','thing2']] = df.apply(testfunc, axis = 1)

print(df)

Output:

   TranspoId  LogId  thing1  thing2
0          1      4       5       3
1          2      5       7       3
2          3      6       9       3

Why not simply define the columns directly, without apply, lambda, or a custom function?
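
For the toy arithmetic in the question, defining the columns directly might look like the sketch below. It assumes the real computation can be expressed as column-wise operations, which may not hold for the heavier function the question mentions.

```python
import pandas as pd

data = {'Name': ['Picard', 'Data', 'Guinan'], 'TranspoId': [1, 2, 3], 'LogId': [12, 14, 23]}
df = pd.DataFrame(data, columns=['Name', 'TranspoId', 'LogId'])

# Column-wise arithmetic: each expression is evaluated once over the whole column
df['Thing1'] = df['TranspoId'] + df['LogId']
df['Thing2'] = df['LogId'] - df['TranspoId']
print(df)
```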