Python: rounding up a column in a DataFrame


python python-3.x pandas


I have a pandas DataFrame df that looks like this:

   no_obs  price_cleaning  house_size
0       1             585          30
1       1             585          40
2       1             585          43
3       1             650          43
4       1             633          44
5       1             650          45
6       2             585          50
7       1             633          50
8       1             650          50
9       2             750          50

I want to round up the values in the price_cleaning column using this function:

import math

def roundup(x):
    # Round x up to the next multiple of 10
    return int(math.ceil(x / 10.0)) * 10
I have tried the solution from this answer:
cols = [col for col in df.columns if col != 'price_cleaning']
df[cols] = df[cols].apply(roundup)

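I get the following error:
TypeError: ("cannot convert the series to ", 'occurred at index no_obs')

Can anyone help me understand why this doesn't work? How can I apply the roundup function to the column? Any help is much appreciated.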
This might help:
>>> df['price_cleaning_ceiling'] = df.price_cleaning.apply(lambda x: int(math.ceil(x / 10.0)) * 10)

I think you can use apply with a lambda, like this:
In [6]: df['p'] = df['price_cleaning'].apply(lambda x: int(math.ceil(x / 10.0)) * 10)

In [7]: df
Out[7]: 
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750

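Your column filter is inverted: it selects every column except price_cleaning, so roundup is applied to the wrong columns. Do this instead:
cols = [col for col in df.columns if col == 'price_cleaning']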

That said, if you only need to clean one column, there is no need to create cols. Just do:
df['price_cleaning'] = df['price_cleaning'].apply(roundup)
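
To illustrate why the original attempt fails, here is a minimal sketch. It assumes a three-row stand-in for df (not the asker's full data). DataFrame.apply hands each whole column to roundup as a Series, which math.ceil cannot convert to a single number, whereas Series.apply hands over one value at a time.

```python
import math

import pandas as pd


def roundup(x):
    # Round x up to the next multiple of 10
    return int(math.ceil(x / 10.0)) * 10


# Small stand-in for the question's df (illustrative data only)
df = pd.DataFrame(
    {
        "no_obs": [1, 1, 2],
        "price_cleaning": [585, 633, 750],
        "house_size": [30, 44, 50],
    }
)

# DataFrame.apply passes each column as a Series, so math.ceil fails
try:
    df[["no_obs", "house_size"]].apply(roundup)
    frame_apply_error = None
except TypeError as exc:
    frame_apply_error = exc

# Series.apply passes scalars, so roundup works as intended
rounded = df["price_cleaning"].apply(roundup)
print(type(frame_apply_error).__name__)
print(rounded.tolist())
```

The exact wording of the TypeError message may differ between pandas versions.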

I would go with a vectorized approach:
In [298]: df['p'] = (np.ceil(df.price_cleaning / 10) * 10).astype(int)

In [299]: df
Out[299]:
   no_obs  price_cleaning  house_size    p
0       1             585          30  590
1       1             585          40  590
2       1             585          43  590
3       1             650          43  650
4       1             633          44  640
5       1             650          45  650
6       2             585          50  590
7       1             633          50  640
8       1             650          50  650
9       2             750          50  750
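
As a quick sanity check, the following sketch rebuilds only the price_cleaning column from the question and compares the vectorized expression with the element-wise roundup. The variable names vectorized and elementwise are just for illustration.

```python
import math

import numpy as np
import pandas as pd


def roundup(x):
    # Round x up to the next multiple of 10
    return int(math.ceil(x / 10.0)) * 10


# price_cleaning values copied from the question's table
df = pd.DataFrame(
    {"price_cleaning": [585, 585, 585, 650, 633, 650, 585, 633, 650, 750]}
)

# Vectorized: np.ceil operates on the whole column at once
vectorized = (np.ceil(df["price_cleaning"] / 10) * 10).astype(int)

# Element-wise: roundup is called once per value
elementwise = df["price_cleaning"].apply(roundup)

print(vectorized.tolist())
print(vectorized.tolist() == elementwise.tolist())
```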

Timings: for 10K rows, the vectorized approach is about 18 times faster than apply (436 µs versus 7.86 ms per loop).
In [331]: %timeit (np.ceil(dff.price_cleaning / 10) * 10).astype(int)
1000 loops, best of 3: 436 µs per loop

In [332]: %timeit dff['price_cleaning'].apply(roundup)
100 loops, best of 3: 7.86 ms per loop

In [333]: dff.shape
Out[333]: (10000, 4)

At least in this case, the more rows there are, the larger the performance gap becomes.
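
Outside IPython, a comparable measurement can be sketched with the standard timeit module. The frame dff here is randomly generated stand-in data (an assumption, since the original dff is not shown), and the helpers with_apply and with_numpy are hypothetical names. Absolute timings will vary by machine and library version.

```python
import math
import timeit

import numpy as np
import pandas as pd


def roundup(x):
    # Round x up to the next multiple of 10
    return int(math.ceil(x / 10.0)) * 10


# Hypothetical 10,000-row frame; the original dff is not available
rng = np.random.default_rng(0)
dff = pd.DataFrame({"price_cleaning": rng.integers(500, 800, size=10_000)})


def with_apply():
    # One Python-level function call per row
    return dff["price_cleaning"].apply(roundup)


def with_numpy():
    # Whole-column arithmetic in NumPy
    return (np.ceil(dff["price_cleaning"] / 10) * 10).astype(int)


# Best of 3 repeats, 10 calls each, reported per call
apply_seconds = min(timeit.repeat(with_apply, number=10, repeat=3)) / 10
numpy_seconds = min(timeit.repeat(with_numpy, number=10, repeat=3)) / 10

print(f"apply: {apply_seconds * 1e3:.3f} ms per call")
print(f"numpy: {numpy_seconds * 1e3:.3f} ms per call")
```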
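The apply method is fine. However, depending on your use case and data size, you may want to benchmark it against a vectorized approach.

@JohnGalt I think the OP is a long way from needing that at this point, accurate as your comment is.

True, this is just for posterity :)

Nice answer using np.ceil, very useful, thanks.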