Python apply function to create a string using multiple columns as arguments

I have a dataframe like this:

     name  size     type  av_size_type
0    John    23   Qapra'            22
1     Dan    21  nuk'neH            12
2  Monica    12  kahless            15
I want to create a new column containing a sentence, like this:

     name  size     type  av_size_type                                                     sentence
0    John    23   Qapra'            22     "John has size 23, above the average of Qapra' type (22)"
1     Dan    21  nuk'neH            12     "Dan has size 21, above the average of nuk'neH type (12)"
2  Monica    12  kahless            15  "Monica has size 12, above the average of kahless type (15)"
I imagined it would be something like this:

def func(x):
    string="{0} has size {1}, above the average of {2} type ({3})".format(x[0],x[1],x[2],x[3])
    return string

df['sentence']=df[['name','size','type','av_size_type']].apply(func)
However, this syntax clearly doesn't work.

Does anyone have a suggestion?

Unpack each row with the splat operator (*):

string = lambda x: "{} has size {}, above the average of {} type ({})".format(*x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...
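A self-contained version of the splat approach, assuming the sample data from the question (make_sentence is a hypothetical name standing in for the lambda). It also makes explicit that apply with axis=1 passes one row at a time, and that assign returns a new DataFrame, so you need to reassign (df = df.assign(...)) if you want to keep the column.

```python
import pandas as pd

# Sample frame rebuilt from the question's data
df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

def make_sentence(row):
    # *row feeds the row's values to the {} placeholders in column order
    return "{} has size {}, above the average of {} type ({})".format(*row)

# assign returns a new DataFrame; df itself is left unchanged
out = df.assign(sentence=df.apply(make_sentence, axis=1))

print("sentence" in df.columns)  # False
print(out.loc[2, "sentence"])    # Monica has size 12, above the average of kahless type (15)
```

Because the placeholders are positional, this relies on the columns being in the same order as the template; the dictionary version below does not.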

If you prefer, you can use dictionary unpacking (**), which matches the named placeholders to column labels:

string = lambda x: "{name} has size {size}, above the average of {type} type ({av_size_type})".format(**x)

df.assign(sentence=df.apply(string, 1))

     name  size     type  av_size_type                                           sentence
0    John    23   Qapra'            22  John has size 23, above the average of Qapra' ...
1     Dan    21  nuk'neH            12  Dan has size 21, above the average of nuk'neH ...
2  Monica    12  kahless            15  Monica has size 12, above the average of kahle...
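This also answers the "dozens of columns" follow-up: str.format ignores keyword arguments that the template never references, so the row can carry extra columns. A minimal sketch, assuming a hypothetical extra column named team in place of the real ones:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan"],
    "size": [23, 21],
    "type": ["Qapra'", "nuk'neH"],
    "av_size_type": [22, 12],
    "team": ["red", "blue"],  # hypothetical extra column the template never uses
})

template = "{name} has size {size}, above the average of {type} type ({av_size_type})"

# **row passes every column as a keyword argument; str.format silently
# ignores the ones the template does not reference (here: team)
sentences = df.apply(lambda row: template.format(**row), axis=1)
print(sentences.tolist())
```

Column labels must be valid keyword names (strings) for ** to work.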

As a fast alternative, use a list comprehension, since you have to iterate anyway:

string = "{0} has size {1}, above the average of {2} type ({3})"
df['sentence'] = [string.format(*r) for r in df.values.tolist()]
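A variant of the list comprehension that picks the columns explicitly, assuming a hypothetical cols list in placeholder order. itertuples(index=False, name=None) yields one plain tuple per row and reads the frame column by column, whereas values.tolist() first builds a single NumPy array, which upcasts ints to floats if the selected columns are all numeric with mixed int and float types.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

template = "{0} has size {1}, above the average of {2} type ({3})"
# Columns to feed the template, listed in placeholder order (hypothetical name)
cols = ["name", "size", "type", "av_size_type"]

# One plain tuple per row, built from the selected columns only
df["sentence"] = [
    template.format(*r) for r in df[cols].itertuples(index=False, name=None)
]
print(df["sentence"].iloc[1])  # Dan has size 21, above the average of nuk'neH type (12)
```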


You can use apply to build the sentence directly:

df['sentence'] = (
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(*x), axis=1)
)
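For comparison, a sketch of a different technique that skips apply altogether: element-wise string concatenation on the columns. This is not what the answers above use; it assumes the question's column names, and numeric columns have to be converted with astype(str) before they can be added to strings.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

# Each + operates on whole columns; numbers must be cast to str first
df["sentence"] = (
    df["name"] + " has size " + df["size"].astype(str)
    + ", above the average of " + df["type"]
    + " type (" + df["av_size_type"].astype(str) + ")"
)
print(df["sentence"].iloc[2])  # Monica has size 12, above the average of kahless type (15)
```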
If you want to reference the columns explicitly, you can do the following:

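# Use x['col'] rather than x.col: on a row Series, x.name is the row's
# index label and x.size is the number of elements, not the column values
df['sentence'] = (
    df.apply(lambda x: "{} has size {}, above the average of {} type ({})"
                       .format(x['name'], x['size'], x['type'], x['av_size_type']), axis=1)
)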
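Why bracket access matters here: a row Series has built-in attributes that shadow columns with the same names. A small check, assuming a one-row frame taken from the question; df.iloc[0] gives the same kind of row Series that apply(axis=1) passes to the lambda.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John"],
    "size": [23],
    "type": ["Qapra'"],
    "av_size_type": [22],
})

row = df.iloc[0]  # a row Series, like the x that apply(axis=1) receives

# Attribute access collides with built-in Series attributes
print(row.name)  # 0 -> the row's index label, not "John"
print(row.size)  # 4 -> number of elements in the row, not 23

# Bracket access always looks up the column label
print(row["name"], row["size"])  # John 23
```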

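You forgot to return the string in your function... dunno, try adding
return string
in the fn (and maybe get some sleep :)

@umutto - oops, you're right. Fixed it. Neither approach works, though.

Yeah, I just realized you also need to apply row-wise, so
.apply(func, axis=1)
should work.

Thanks @Allen. That looks like a good approach. Is there a way to choose which columns go into the format? My df actually has dozens of columns; I simplified it here.

Thanks @piRSquared! Is there a way to choose which columns go into the format? My df actually has dozens of columns; I simplified it here.

You can use the row Series as a dictionary and unpack it with the double splat (**).

@OP, with this method it is easiest to choose which columns to output. Just index them like this:
df[[col1, col2, …]].values.tolist()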
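Putting the comments together, a sketch of the question's own func made to work, assuming the sample data from the question: axis=1 so each call receives one row, and label lookups instead of x[0]..x[3]. With the default axis=0, apply calls func once per column, which is why the original call could not produce one sentence per row; positional lookups like x[0] on a label-indexed Series also raise a FutureWarning in recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Dan", "Monica"],
    "size": [23, 21, 12],
    "type": ["Qapra'", "nuk'neH", "kahless"],
    "av_size_type": [22, 12, 15],
})

def func(x):
    # x is one row (a Series indexed by column label)
    return "{0} has size {1}, above the average of {2} type ({3})".format(
        x["name"], x["size"], x["type"], x["av_size_type"]
    )

# axis=1: call func once per row instead of once per column
df["sentence"] = df.apply(func, axis=1)
print(df["sentence"].iloc[0])  # John has size 23, above the average of Qapra' type (22)
```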