Python: adding a column in pandas while editing its values
Add a new column named ADJ_HDI to the homework2 DataFrame. The column should equal the HDI value when HDI is greater than 0.5, and zero otherwise.
We have been trying for hours to get this syntax right without any luck. Can anyone help?
Try this, assuming your HDI values are in a column named "HDI" and you want a new column that equals HDI, or 0 when HDI is not greater than 0.5:
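def adj_hdi(row):
    # Keep the HDI value only when it exceeds 0.5; otherwise use 0.
    hdi = row['HDI']
    if hdi > .5:
        return hdi
    else:
        return 0
# Apply the function row by row (axis=1) to build the new column.
homework2['ADJ_HDI'] = homework2.apply(adj_hdi, axis=1)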
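To see how the row-wise approach treats edge cases, here is a minimal sketch on a small made-up frame. Only the column names "HDI" and "ADJ_HDI" come from the question; the sample values are assumptions chosen to include a missing value and the boundary value 0.5.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "HDI" comes from the question.
homework2 = pd.DataFrame({"HDI": [0.9, 0.3, np.nan, 0.5]})

def adj_hdi(row):
    hdi = row["HDI"]
    # NaN > 0.5 evaluates to False, so missing values fall through to 0.
    # The comparison is strict, so exactly 0.5 also becomes 0.
    return hdi if hdi > 0.5 else 0

homework2["ADJ_HDI"] = homework2.apply(adj_hdi, axis=1)
print(homework2)
```

Note that a missing HDI ends up as 0 rather than NaN, which may or may not be what the assignment intends.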
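Alternative solution:
# Start from a float default so the column can hold HDI values.
homework2['ADJ_HDI'] = 0.0
# Overwrite only the rows where HDI exceeds 0.5.
homework2.loc[homework2['HDI'] > 0.5, 'ADJ_HDI'] = homework2['HDI']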
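The same default-then-overwrite idea can probably be written in one step with `Series.where`, which keeps values where a condition holds and substitutes a replacement elsewhere. This is a sketch of an equivalent formulation, not something proposed in the thread, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "HDI" comes from the question.
homework2 = pd.DataFrame({"HDI": [25, np.nan, 2.3, 0.3]})

# Keep HDI where the condition is True; use 0 everywhere else,
# including rows where HDI is NaN (the comparison is False there).
homework2["ADJ_HDI"] = homework2["HDI"].where(homework2["HDI"] > 0.5, 0)
print(homework2)
```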
I think you can use a very fast solution. Timings:
import pandas as pd
import numpy as np
homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})
# To test with 7k rows, uncomment the row below
#homework2 = pd.concat([homework2]*1000).reset_index(drop=True)
print(homework2)
# Independent copies so each timed function works on its own frame
h = homework2.copy()
h1 = homework2.copy()
With len(homework2) = 7:
In [2]: %timeit a(homework2)
1000 loops, best of 3: 376 µs per loop
In [3]: %timeit b(h)
The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.49 ms per loop
In [4]: %timeit c(h1)
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 283 µs per loop
With len(homework2) = 7k:
In [7]: %timeit a(homework2)
10 loops, best of 3: 106 ms per loop
In [8]: %timeit b(h)
100 loops, best of 3: 2.63 ms per loop
In [9]: %timeit c(h1)
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 324 µs per loop
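The definitions of `a`, `b`, and `c` used in the timings are not shown here. The sketch below is an assumption about what they might look like: `a` as the row-wise `apply`, `b` as the `.loc` assignment, and `c` as a vectorized `numpy.where` call. That mapping is a guess based on how the timings scale from 7 to 7k rows, not something the thread confirms.

```python
import numpy as np
import pandas as pd

def adj_hdi(row):
    hdi = row["HDI"]
    return hdi if hdi > 0.5 else 0

def a(df):
    # Assumed: row-wise apply, one Python function call per row.
    df["ADJ_HDI"] = df.apply(adj_hdi, axis=1)
    return df

def b(df):
    # Assumed: default column plus boolean-mask assignment with .loc.
    df["ADJ_HDI"] = 0.0
    df.loc[df["HDI"] > 0.5, "ADJ_HDI"] = df["HDI"]
    return df

def c(df):
    # Assumed: vectorized selection with numpy.where.
    df["ADJ_HDI"] = np.where(df["HDI"] > 0.5, df["HDI"], 0)
    return df

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

# Each function gets its own copy, mirroring h and h1 in the setup.
results = [f(homework2.copy())["ADJ_HDI"].tolist() for f in (a, b, c)]
print(results[0])
```

All three variants should give the same column on this data; they differ only in how much work is done in Python versus in vectorized NumPy code.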
It generates a warning, but when I display the DataFrame it is working. Thanks!