Python: using DataFrame columns as named arguments to str.format()


I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'author':["Melville","Hemingway","Faulkner"],
                   'title':["Moby Dick","The Sun Also Rises","The Sound and the Fury"],
                   'subject':["whaling","bullfighting","a messed-up family"]
                   })

I know I can do this:

# produces desired output                   
("Some guy " + df['author'] + " wrote a book called " + 
   df['title'] + " that uses " + df['subject'] + 
   " as a metaphor for the human condition.")

But is it possible to use str.format() to write this more clearly, roughly like this:

# raises KeyError: 'author'
["Some guy {author} wrote a book called {title} that uses "
   "{subject} as a metaphor for the human condition.".format(x) 
      for x in df.itertuples(index=False)]
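Why the KeyError: .format(x) passes the whole row as a single positional argument, which could only fill a {0} placeholder, while {author} is looked up among the keyword arguments, and there are none. A minimal sketch of one fix, unpacking each row namedtuple as keyword arguments via its _asdict() method. It assumes Python 3.7+ and a pandas version whose itertuples() yields namedtuples; template and sentences are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("Some guy {author} wrote a book called {title} that uses "
            "{subject} as a metaphor for the human condition.")

# Passing the namedtuple positionally fills only {0}; named fields find no keywords.
try:
    template.format(next(iter(df.itertuples(index=False))))
except KeyError as err:
    print("KeyError:", err)

# Unpacking the namedtuple's {field: value} mapping as keyword arguments works.
sentences = [template.format(**row._asdict())
             for row in df.itertuples(index=False)]
print(sentences[0])
```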

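Note that _asdict() is not part of pandas' public API (it is a collections.namedtuple method that is only reachable because itertuples() happens to yield namedtuples), so relying on it may break in a future pandas update.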

You can do this instead:

["Some guy {} wrote a book called {} that uses "
   "{} as a metaphor for the human condition.".format(*x)
      for x in df.values]
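The positional version above depends on the column order matching the placeholder order. If you want named placeholders without touching _asdict(), one option that relies only on documented behaviour is to ask itertuples() for plain tuples (name=None) and pair each value with its column label yourself. A minimal sketch, assuming every column label is a string (required for ** unpacking); template and sentences are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("Some guy {author} wrote a book called {title} that uses "
            "{subject} as a metaphor for the human condition.")

# name=None makes itertuples() yield plain tuples; zip each one with the
# column labels to rebuild a {column: value} dict for keyword unpacking.
columns = list(df.columns)
sentences = [template.format(**dict(zip(columns, row)))
             for row in df.itertuples(index=False, name=None)]
print(sentences[1])
```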


You can also use DataFrame.iterrows() like this:

["The book {title} by {author} uses "
   "{subject} as a metaphor for the human condition.".format(**x) 
     for i, x in df.iterrows()]

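This works if you want to:

- use named arguments, so the order in which they appear does not have to match the column order (as above)
- avoid internal functions such as _asdict()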
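The **x unpacking works because each row from iterrows() is a Series, which exposes keys() and label lookup like a mapping. A minimal sketch of two related options, offered as alternatives rather than as part of the answer: str.format_map(), which only needs row[key] lookups, and DataFrame.to_dict('records'), which gives one plain {column: value} dict per row. template and sentences are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

template = ("The book {title} by {author} uses "
            "{subject} as a metaphor for the human condition.")

# to_dict('records') returns a list with one {column: value} dict per row
sentences = [template.format_map(rec) for rec in df.to_dict('records')]

# format_map only needs row[key] lookups, so an iterrows() Series works as well
_, first_row = next(df.iterrows())
print(template.format_map(first_row) == sentences[0])  # True
```

Placeholder order is independent of column order in both variants, since every field is looked up by name.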

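Timings: the fastest appears to be M.Klugerford's second solution (df.values), even if we heed the caching warning and take the slowest run (5.9 × 18.9 µs ≈ 112 µs, still well ahead of the rest).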

%%timeit
("Some guy " + df['author'] + " wrote a book called " +
 df['title'] + " that uses " + df['subject'] +
 " as a metaphor for the human condition.")
# 1000 loops, best of 3: 883 µs per loop

%%timeit
["Some guy {author} wrote a book called {title} that uses "
 "{subject} as a metaphor for the human condition.".format(**x._asdict())
    for x in df.itertuples(index=False)]
# 1000 loops, best of 3: 962 µs per loop

%%timeit
["Some guy {} wrote a book called {} that uses "
 "{} as a metaphor for the human condition.".format(*x)
    for x in df.values]
# The slowest run took 5.90 times longer than the fastest. This could mean that an intermediate result is being cached.
# 10000 loops, best of 3: 18.9 µs per loop

%%timeit
from collections import OrderedDict
["The book {title} by {author} uses "
 "{subject} as a metaphor for the human condition.".format(**x)
    for x in [OrderedDict(row) for i, row in df.iterrows()]]
# 1000 loops, best of 3: 308 µs per loop

%%timeit
["The book {title} by {author} uses "
 "{subject} as a metaphor for the human condition.".format(**x)
    for i, x in df.iterrows()]
# 1000 loops, best of 3: 413 µs per loop
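The %%timeit cells only run in IPython/Jupyter. A plain-Python sketch of the same five-way comparison using the standard timeit module, for anyone who wants to rerun it as a script. The function names are hypothetical labels for the variants; the printed numbers will depend on the pandas and NumPy versions and on the hardware, so no expected figures are given. Note that the concatenation variant returns a Series while the others return lists.

```python
import timeit
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame({'author': ["Melville", "Hemingway", "Faulkner"],
                   'title': ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
                   'subject': ["whaling", "bullfighting", "a messed-up family"]})

GUY = ("Some guy {author} wrote a book called {title} that uses "
       "{subject} as a metaphor for the human condition.")
GUY_POSITIONAL = ("Some guy {} wrote a book called {} that uses "
                  "{} as a metaphor for the human condition.")
BOOK = ("The book {title} by {author} uses "
        "{subject} as a metaphor for the human condition.")


def concat():
    # Vectorised Series concatenation from the question (returns a Series)
    return ("Some guy " + df['author'] + " wrote a book called " +
            df['title'] + " that uses " + df['subject'] +
            " as a metaphor for the human condition.")


def itertuples_asdict():
    return [GUY.format(**x._asdict()) for x in df.itertuples(index=False)]


def values_positional():
    # Relies on the column order being author, title, subject
    return [GUY_POSITIONAL.format(*x) for x in df.values]


def iterrows_ordereddict():
    return [BOOK.format(**x) for x in [OrderedDict(row) for _, row in df.iterrows()]]


def iterrows_series():
    return [BOOK.format(**x) for _, x in df.iterrows()]


CANDIDATES = [concat, itertuples_asdict, values_positional,
              iterrows_ordereddict, iterrows_series]

for fn in CANDIDATES:
    # Best of 5 repeats of 200 calls each, reported per call
    best = min(timeit.repeat(fn, number=200, repeat=5)) / 200
    print(f"{fn.__name__:<22} {best * 1e6:8.1f} µs per call")
```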

I can't work out why the second-to-last version (converting each row to an OrderedDict first) is faster than the last one.


Got it, so

does the tuple part for me. Awesome, thanks. Not sure why someone downvoted us.