Python: adding a column in pandas while editing values


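Add a new column named ADJ_HDI to the homework2 DataFrame: if the HDI value is greater than 0.5, the column holds the HDI value; otherwise it equals zero.

We have been trying for hours to come up with the syntax, without luck. Can anyone help?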

Try this, assuming your HDI values are in a column named 'HDI' and you want a new column that equals HDI, or 0 when HDI is not greater than 0.5:

def adj_hdi(row):
    hdi = row['HDI']
    if hdi>.5:
        return hdi
    else:
        return 0
mydataframe['ADJ_HDI'] = mydataframe.apply(adj_hdi, axis=1)
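
To see how the row-wise function behaves on concrete data, here is a minimal sketch. The sample values are made up for illustration and are not from the question. Note that a missing HDI (NaN) compares as False against 0.5, so it ends up as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
mydataframe = pd.DataFrame({"HDI": [0.9, 0.3, np.nan, 0.5, 0.51]})

def adj_hdi(row):
    hdi = row["HDI"]
    # NaN > 0.5 evaluates to False, so missing values fall through to 0.
    return hdi if hdi > 0.5 else 0

# axis=1 passes each row to adj_hdi as a Series.
mydataframe["ADJ_HDI"] = mydataframe.apply(adj_hdi, axis=1)
print(mydataframe)
```

The boundary value 0.5 itself maps to 0, because the comparison is strictly greater than.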

Alternative solution:

# start from a float column so the HDI values fit without a dtype change
homework2['ADJ_HDI'] = 0.0
homework2.loc[homework2['HDI'] > 0.5, 'ADJ_HDI'] = homework2['HDI']
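
The same boolean mask can probably be applied in a single step with `Series.where`, which keeps values where the condition holds and substitutes a replacement elsewhere. This variant is not part of the answer above; it is a sketch on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
homework2 = pd.DataFrame({"HDI": [0.8, 0.2, 0.65, np.nan]})

# Keep HDI where it exceeds 0.5; use 0 everywhere else (including NaN rows).
homework2["ADJ_HDI"] = homework2["HDI"].where(homework2["HDI"] > 0.5, 0)
print(homework2)
```

Because no column is pre-filled and then overwritten, there is no intermediate all-zero column to manage.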

I think you can use a very fast solution:
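
The code for this fast solution is not shown here. One vectorized approach that fits the description is `numpy.where`; the sketch below assumes that approach rather than reproducing the answer's actual code, and uses the sample frame from the timing setup.

```python
import numpy as np
import pandas as pd

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

# Assumed approach: pick HDI where the condition is True, 0 otherwise.
# NaN > 0.5 is False, so the missing value becomes 0.
homework2["ADJ_HDI"] = np.where(homework2["HDI"] > 0.5, homework2["HDI"], 0)
print(homework2)
```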

Timings:

import pandas as pd
import numpy as np

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                           "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

# to test with 7k rows, uncomment the line below
#homework2 = pd.concat([homework2]*1000).reset_index(drop=True)
print(homework2)
h = homework2.copy()
h1 = homework2.copy()
len(homework2) = 7

In [2]: %timeit a(homework2)
1000 loops, best of 3: 376 µs per loop

In [3]: %timeit b(h)
The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.49 ms per loop

In [4]: %timeit c(h1)
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 283 µs per loop
len(homework2) = 7k

In [7]: %timeit a(homework2)
10 loops, best of 3: 106 ms per loop

In [8]: %timeit b(h)
100 loops, best of 3: 2.63 ms per loop

In [9]: %timeit c(h1)
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 324 µs per loop
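
The definitions of `a`, `b`, and `c` are not shown in the timings above. One plausible reading, and it is only an assumption, is that they wrap the three approaches discussed on this page: row-wise `apply`, `.loc` assignment, and a vectorized call such as `numpy.where`. Under that assumption, a comparison could be reproduced outside IPython with the standard `timeit` module; absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2],
                          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})

def a(df):
    # Hypothetical: row-wise apply.
    df["ADJ_HDI"] = df.apply(
        lambda row: row["HDI"] if row["HDI"] > 0.5 else 0, axis=1)
    return df

def b(df):
    # Hypothetical: pre-fill with zeros, then overwrite via a boolean mask.
    df["ADJ_HDI"] = 0.0
    mask = df["HDI"] > 0.5
    df.loc[mask, "ADJ_HDI"] = df.loc[mask, "HDI"]
    return df

def c(df):
    # Hypothetical: vectorized selection with numpy.where.
    df["ADJ_HDI"] = np.where(df["HDI"] > 0.5, df["HDI"], 0)
    return df

results = {}
runs = 20
for name, func in [("a", a), ("b", b), ("c", c)]:
    frame = homework2.copy()
    seconds = timeit.timeit(lambda: func(frame), number=runs)
    # Keep one result per approach so the outputs can be compared.
    results[name] = func(homework2.copy())["ADJ_HDI"]
    print(f"{name}: {seconds / runs * 1e6:.0f} µs per call")
```

Checking that all three produce the same column before comparing speed guards against benchmarking functions that do different work.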

It generates a warning, but when I display the DataFrame it works, thanks!