Python: using DataFrame columns as named arguments to str.format()
I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'author':["Melville","Hemingway","Faulkner"],
'title':["Moby Dick","The Sun Also Rises","The Sound and the Fury"],
'subject':["whaling","bullfighting","a messed-up family"]
})
I know I can do this:
# produces desired output
("Some guy " + df['author'] + " wrote a book called " +
df['title'] + " that uses " + df['subject'] +
" as a metaphor for the human condition.")
However, is it possible to write this more clearly using str.format(), roughly along these lines:
# returns KeyError:'author'
["Some guy {author} wrote a book called {title} that uses "
"{subject} as a metaphor for the human condition.".format(x)
for x in df.itertuples(index=False)]
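The KeyError comes from `.format(x)`: the row is passed as a single positional argument, so `{author}` has no keyword to look up. Below is a minimal sketch of one fix, assuming the rows from `itertuples` are namedtuples whose fields match the column names (which holds here because the column names are valid Python identifiers). The names `template` and `sentences` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("Some guy {author} wrote a book called {title} that uses "
            "{subject} as a metaphor for the human condition.")

# ** unpacks the row's field/value mapping into keyword arguments,
# so {author}, {title} and {subject} are each matched by name.
sentences = [template.format(**row._asdict())
             for row in df.itertuples(index=False)]

print(sentences[0])
```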
Note that _asdict() is not part of the public API, so relying on it may break in a future pandas update.
You can do this instead:
["Some guy {} wrote a book called {} that uses "
"{} as a metaphor for the human condition.".format(*x)
for x in df.values]
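Here `*x` unpacks each row of `df.values` into positional arguments, so the `{}` fields are filled in column order. A small sketch of the same mechanism on a plain tuple, with numbered fields to show that the order can still be changed. The `row` tuple is made up for illustration.

```python
row = ("Melville", "Moby Dick", "whaling")

# format(*row) is equivalent to format(row[0], row[1], row[2])
in_order = "{} wrote {} about {}".format(*row)

# Numbered fields pick positional arguments by index, so the order can change
reordered = "{1} by {0} is about {2}".format(*row)

print(in_order)
print(reordered)
```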
You could also use DataFrame.iterrows() like this:
["The book {title} by {author} uses "
"{subject} as a metaphor for the human condition.".format(**x)
for i, x in df.iterrows()]
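This is nice if you want to:
- use named arguments, so the order of use does not have to match the order of the columns (as above)
- avoid an internal function like _asdict()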
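Another option that meets both points, offered as a sketch rather than something from the answers here: `DataFrame.to_dict(orient='records')` returns one plain dict per row, which can be unpacked with `**`. The names `template`, `records` and `sentences` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("The book {title} by {author} uses "
            "{subject} as a metaphor for the human condition.")

# One dict per row, keyed by column name
records = df.to_dict(orient='records')

# Each dict is unpacked into keyword arguments for the named fields
sentences = [template.format(**rec) for rec in records]

print(sentences[1])
```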
# example
%%timeit
("Some guy " + df['author'] + " wrote a book called " +
df['title'] + " that uses " + df['subject'] +
" as a metaphor for the human condition.")
# 1000 loops, best of 3: 883 µs per loop
%%timeit
["Some guy {author} wrote a book called {title} that uses "
"{subject} as a metaphor for the human condition.".format(**x._asdict())
for x in df.itertuples(index=False)]
#1000 loops, best of 3: 962 µs per loop
%%timeit
["Some guy {} wrote a book called {} that uses "
"{} as a metaphor for the human condition.".format(*x)
for x in df.values]
#The slowest run took 5.90 times longer than the fastest. This could mean that an intermediate result is being cached.
#10000 loops, best of 3: 18.9 µs per loop
%%timeit
from collections import OrderedDict
["The book {title} by {author} uses "
"{subject} as a metaphor for the human condition.".format(**x)
for x in [OrderedDict(row) for i, row in df.iterrows()]]
#1000 loops, best of 3: 308 µs per loop
%%timeit
["The book {title} by {author} uses "
"{subject} as a metaphor for the human condition.".format(**x)
for i, x in df.iterrows()]
#1000 loops, best of 3: 413 µs per loop
Why the second-to-last one is faster than the last one is beyond me.
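The `%%timeit` cells need IPython. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The function names and the repeat count of 200 are arbitrary choices for this example, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

positional = ("Some guy {} wrote a book called {} that uses "
              "{} as a metaphor for the human condition.")
named = ("Some guy {author} wrote a book called {title} that uses "
         "{subject} as a metaphor for the human condition.")


def with_values():
    # Rows of the underlying NumPy array, unpacked positionally
    return [positional.format(*row) for row in df.values]


def with_iterrows():
    # Each row is a Series, unpacked by column name
    return [named.format(**row) for _, row in df.iterrows()]


runs = 200
for func in (with_values, with_iterrows):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```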
Got it, so * does the tuple part for me. Awesome, thanks. Not sure why someone downvoted us.