Python Pandas: fast way to access an attribute of objects in a column

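Suppose I have a custom Python class with an attribute val. If I have a DataFrame with a column containing these objects, how can I access this attribute and create a new column from its values?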

Sample data:

df
Out[46]: 
row   custom_object
1     foo1
2     foo2
3     foo3
4     foo4
Name: book, dtype: object
where the custom objects are instances of class Foo:

class Foo:
    def __init__(self, val):
        self.val = val
As far as I know, the only way to create a new column from the instance attribute is the apply + lambda combination, which is slow on large datasets:

df['custom_val'] = df['custom_object'].apply(lambda x: x.val)
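A minimal, runnable sketch of the setup described above, assuming a four-row toy DataFrame. The float values are made up for illustration and are not from the question.

```python
import pandas as pd


class Foo:
    def __init__(self, val):
        self.val = val


# Toy data (illustrative values): one object-dtype column holding Foo instances.
df = pd.DataFrame({'custom_object': [Foo(v) for v in [1.5, 2.5, 3.5, 4.5]]})

# The apply + lambda approach from the question: one Python-level call per row.
df['custom_val'] = df['custom_object'].apply(lambda x: x.val)
print(df['custom_val'].tolist())  # [1.5, 2.5, 3.5, 4.5]
```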


Is there a more efficient way?

You can use a list comprehension:

df['custom_val'] = [foo.val for foo in df['custom_object']]
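A small sketch, assuming toy integer values, that checks the list comprehension produces the same column as the apply version. Iterating over a Series yields its values, so foo is each Foo instance; the name via_apply is only illustrative.

```python
import pandas as pd


class Foo:
    def __init__(self, val):
        self.val = val


df = pd.DataFrame({'custom_object': [Foo(v) for v in range(5)]})

# Iterating a Series yields its values (the Foo objects), not index labels.
df['custom_val'] = [foo.val for foo in df['custom_object']]
print(df['custom_val'].tolist())  # [0, 1, 2, 3, 4]

# Same values as the slower apply + lambda version.
via_apply = df['custom_object'].apply(lambda x: x.val)
assert (df['custom_val'] == via_apply).all()
```

Note that a list is assigned to a column by position, not by index label, so it must contain exactly len(df) elements; otherwise pandas raises a ValueError.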
Timings

import numpy as np
import pandas as pd

# Set-up 100k Foo objects (Foo as defined in the question).
vals = [np.random.randn() for _ in range(100000)]
foos = [Foo(val) for val in vals]
df = pd.DataFrame(foos, columns=['custom_object'])

# 1) OP's apply method.
%timeit df['custom_object'].apply(lambda x: x.val)
# 10 loops, best of 3: 26.7 ms per loop

# 2) Using a list comprehension instead.
%timeit [foo.val for foo in df['custom_object']]
# 100 loops, best of 3: 11.7 ms per loop

# 3) For reference, the same comprehension over the original list of objects (slightly faster than 2).
%timeit [foo.val for foo in foos]
# 100 loops, best of 3: 9.79 ms per loop

# 4) And just on the original list of raw values themselves.
%timeit [val for val in vals]
# 100 loops, best of 3: 4.91 ms per loop
If you still have the original list of values, you can assign it directly:

# 5) Direct assignment to list of values.
%timeit df['v'] = vals
# 100 loops, best of 3: 5.88 ms per loop
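%timeit is an IPython magic and will not run in a plain .py script. A rough, portable equivalent using the standard-library timeit module is sketched below. It assumes 10k rows instead of 100k so it finishes quickly, and the dictionary keys are just labels; absolute numbers will differ from the ones above and between machines.

```python
import timeit

import numpy as np
import pandas as pd


class Foo:
    def __init__(self, val):
        self.val = val


# Assumption: 10k rows (not 100k) to keep the run short; raise N to match the answer.
N = 10_000
vals = np.random.randn(N).tolist()
foos = [Foo(v) for v in vals]
df = pd.DataFrame({'custom_object': foos})
col = df['custom_object']

candidates = {
    'apply + lambda': lambda: col.apply(lambda x: x.val),
    'comprehension over column': lambda: [foo.val for foo in col],
    'comprehension over list': lambda: [foo.val for foo in foos],
}

results = {}
for name, fn in candidates.items():
    # timeit.repeat returns the total time of `number` calls for each repeat;
    # keep the best repeat and divide to get seconds per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[name] = best
    print(f'{name:>28}: {best * 1e3:.2f} ms per call')
```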
Setup code:

import operator
import random
from dataclasses import dataclass

import numpy as np
import pandas as pd

@dataclass
class SomeObj:
    val: int

df = pd.DataFrame(data={"col_1": [SomeObj(random.randint(0, 10000)) for _ in range(10000000)]})

Solution 1
df['col_1'].map(lambda elem: elem.val)
Time: ~3.2 s

Solution 2
df['col_1'].map(operator.attrgetter('val'))
Time: ~2.7 s
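A small sketch of what operator.attrgetter does here, assuming a three-row DataFrame with made-up values. attrgetter('val') returns a callable equivalent to lambda obj: obj.val; in CPython it is implemented in C, which likely accounts for part of its edge over the lambda. The name get_val is only illustrative.

```python
import operator
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeObj:
    val: int


df = pd.DataFrame(data={'col_1': [SomeObj(v) for v in (3, 1, 4)]})

# get_val(obj) is equivalent to obj.val.
get_val = operator.attrgetter('val')
print(get_val(SomeObj(7)))  # 7

# Both map calls produce the same Series; only the per-element callable differs.
via_attrgetter = df['col_1'].map(get_val)
via_lambda = df['col_1'].map(lambda elem: elem.val)
assert via_attrgetter.equals(via_lambda)
```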

Solution 3
[elem.val for elem in df['col_1']]
Time: ~1.4 s

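Note: keep in mind that this solution produces a different result type (a plain Python list rather than a pandas Series), which can be a problem in some cases.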
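If the list result is a problem, one option (a sketch, not part of the original answer) is to wrap the list in pd.Series using the DataFrame's index, which gives back a Series aligned to the original rows. The three-row DataFrame, its string index, and the names as_list and as_series are illustrative assumptions.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeObj:
    val: int


# Non-default index, to show that it carries over to the result.
df = pd.DataFrame({'col_1': [SomeObj(v) for v in (10, 20, 30)]}, index=['a', 'b', 'c'])

as_list = [elem.val for elem in df['col_1']]  # plain list, no index
as_series = pd.Series(as_list, index=df.index, name='val')  # Series aligned to df's rows

print(type(as_list).__name__, type(as_series).__name__)  # list Series
print(as_series['b'])  # 20
```

Wrapping is a single pass over already-computed Python values, so it should add little on top of the comprehension itself.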