Python: How do I multiply 2 columns?


I have a simple DataFrame and I want to add a 'Pow_calkowita' column. If 'liczba_kon' is 0, 'Pow_calkowita' should be 'Powierzchn'; if 'liczba_kon' is not 0, 'Pow_calkowita' should be 'liczba_kon' * 'Powierzchn'. Why can't I do it like this?

for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        row['Pow_calkowita'] = row['Powierzchn']
    elif row['liczba_kon'] != 0:
        row['Pow_calkowita'] = row['Powierzchn'] * row['liczba_kon']
My code doesn't return any values. My data:

    liczba_kon  Powierzchn
0            3    69.60495
1            1    39.27270
2            1   130.41225
3            1   129.29570
4            1   294.94400
5            1    64.79345
6            1   108.75560
7            1    35.12290
8            1   178.23905
9            1   263.00930
10           1    32.02235
11           1   125.41480
12           1    47.05420
13           1    45.97135
14           1   154.87120
15           1    37.17370
16           1    37.80705
17           1    38.78760
18           1    35.50065
19           1    74.68940
I found a workaround:

result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    elif row['liczba_kon'] != 0:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result

Is this a good approach?

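DataFrames are designed to be operated on with vectorized operations. You can think of one as a database table, so you should use its built-in functionality for as long as possible: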

tdf = df.copy()                                              # temporary copy, so df keeps its original zeros
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back
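The temporary DataFrame is not strictly needed: Series.replace returns a new Series, so the replacement and the multiplication can be done in one statement without touching the original 'liczba_kon' column. A minimal sketch, assuming the first two rows of the question's data plus one hypothetical row with liczba_kon = 0:

```python
import pandas as pd

# First two rows come from the question's data; the third (liczba_kon = 0)
# is a hypothetical test row.
df = pd.DataFrame({
    'liczba_kon': [3, 1, 0],
    'Powierzchn': [69.60495, 39.27270, 130.41225],
})

# replace() returns a new Series, so df['liczba_kon'] itself still contains the 0
df['Pow_calkowita'] = df['liczba_kon'].replace(0, 1) * df['Powierzchn']
print(df)
```

Because 0 is swapped for 1 only in the intermediate Series, rows with no 'liczba_kon' simply get their 'Powierzchn' value.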
This simplifies the code and improves performance. We can test the performance of both approaches:

import time

import numpy as np
import pandas as pd

sampleSize = 100000
df = pd.DataFrame({
    'liczba_kon': np.random.randint(3, size=sampleSize),
    'Powierzchn': np.random.randint(1000, size=sampleSize),
    })

# vectorization
s = time.time()
tdf = df.copy()                                              # temporary copy, so df keeps its zeros
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)          # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn'] # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']                   # copy the column back
print(time.time() - s)

# iteration
s = time.time()
result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    else:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result
print(time.time() - s)
We can see that the vectorized version runs much faster (elapsed seconds for vectorization, then iteration):

0.0034716129302978516
6.193516492843628

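To write idiomatic pandas code and take advantage of pandas' efficient array processing, you should avoid writing code that loops over arrays yourself. Pandas lets you write concise code that uses vectorization on efficient numpy ndarray data structures. Under the hood, it uses optimized compiled C code for fast array processing. Pandas already performs the necessary looping behind the scenes, and that is the advantage of using it: a single statement, with no explicit loop over every element. Used this way, pandas gives you fast, efficient, and concise vectorized processing.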
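To make the difference concrete, here is a small sketch (assuming a few rows from the question's data, all with non-zero 'liczba_kon') that computes the same products once with an explicit Python loop and once with a single vectorized statement:

```python
import pandas as pd

# Rows taken from the question's data (all liczba_kon values are non-zero here)
df = pd.DataFrame({
    'liczba_kon': [3, 1, 1],
    'Powierzchn': [69.60495, 39.27270, 130.41225],
})

# Explicit Python-level loop: one multiplication per iteration
looped = [k * p for k, p in zip(df['liczba_kon'], df['Powierzchn'])]

# Single vectorized statement: the loop runs inside pandas/numpy compiled code
vectorized = df['liczba_kon'] * df['Powierzchn']

print(vectorized)
```

Both give the same numbers; the vectorized form is shorter and avoids Python-level overhead per element.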

Since the formula depends on a condition, you can't use plain multiplication. Instead, you can use numpy.where as follows:

import numpy as np

df['Pow_calkowita'] = np.where(df['liczba_kon'] == 0,  df['Powierzchn'], df['Powierzchn'] * df['liczba_kon'])
When the condition in the first argument is True, the value is taken from the second argument; otherwise it is taken from the third.

Test run output (with two more test cases added at the end, one of which has a value of 0 for liczba_kon):
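A comparable check can be sketched as follows. The first two rows come from the question; the last two are hypothetical test rows, one with liczba_kon = 0. The pandas-only line using Series.where is an alternative added here for comparison, not part of the original answer:

```python
import numpy as np
import pandas as pd

# First two rows are from the question; the last two are hypothetical test rows,
# one of them with liczba_kon == 0
df = pd.DataFrame({
    'liczba_kon': [3, 1, 0, 2],
    'Powierzchn': [69.60495, 39.27270, 50.0, 10.0],
})

# np.where(condition, value_if_true, value_if_false), evaluated element by element
df['Pow_calkowita'] = np.where(df['liczba_kon'] == 0,
                               df['Powierzchn'],
                               df['Powierzchn'] * df['liczba_kon'])

# Pandas-only alternative: keep liczba_kon where it is non-zero, otherwise use 1
alt = df['Powierzchn'] * df['liczba_kon'].where(df['liczba_kon'] != 0, 1)

print(df)
```

The row with liczba_kon = 0 keeps its Powierzchn value (50.0), while the others are multiplied.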


To answer the first question, "Why can't I do this?":

The documentation states:

Because iterrows returns a Series for each row

You should never modify something you are iterating over. [...] the iterator returns a copy and not a view, and writing to it will have no effect.

This basically means it returns a new Series containing the values of that row.

So what you get is not the actual row, and certainly not the DataFrame.

But what you are doing does work, just not in the way you want:

>>> from pandas import DataFrame as DF
>>> df = DF(dict(a=[1, 2, 3], b=list("abc")))
>>> df                 # To demonstrate what you are doing
   a  b
0  1  a
1  2  b
2  3  c

>>> for index, row in df.iterrows():
...     print("\n------------------\n>>> Next Row:\n")
...     print(row)
...     row["c"] = "ADDED"           ####### HERE I am adding to 'the row'
...     print("\n -- >> added:")
...     print(row)
...     print("----------------------")
...
------------------
 Next Row:     # as you can see, this Series has the same values
a    1         # as the row that it represents
b    a
Name: 0, dtype: object  

 -- >> added:
a        1
b        a
c    ADDED     # and adding to it works... but you aren't doing anything 
Name: 0, dtype: object   # with it, unless you append it to a list
----------------------


------------------
 Next Row:
a    2
b    b
Name: 1, dtype: object
                       ### same here
 -- >> added:
a        2
b        b
c    ADDED
Name: 1, dtype: object
----------------------


------------------
 Next Row:
a    3
b    c
Name: 2, dtype: object
                          ### and here
 -- >> added:
a        3
b        c
c    ADDED
Name: 2, dtype: object
----------------------
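To confirm that the DataFrame itself never changes, and to show the "append it to a list" route mentioned in the comments above, here is a minimal sketch using the same toy frame:

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3], b=list("abc")))

# Writing to `row` only changes the temporary Series that iterrows() yields
for index, row in df.iterrows():
    row["c"] = "ADDED"
print(df.columns.tolist())  # df still has only its original columns

# Collecting the modified rows is what makes the change usable
rows = []
for index, row in df.iterrows():
    row["c"] = "ADDED"
    rows.append(row)
new_df = pd.DataFrame(rows)  # each Series' name (0, 1, 2) becomes the row index
print(new_df)
```

This is essentially what the question's workaround does with its result list, which is why that version works while the first one does not.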
To answer the second question, "Is this a good approach?":

No.

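Because the multiplication SeaBean showed uses numpy and pandas vectorized operations; numpy arrays are essentially the building blocks of pandas DataFrames and Series.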

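You seem to be trying to modify the data in place while iterating over it. The documentation for iterrows() says: "You should never modify something you are iterating over." iterrows() appears to return a copy of each row rather than the actual row itself.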
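If you really do need an explicit loop, one option (a sketch, not taken from the thread) is to read and write cells through the DataFrame itself with DataFrame.at, looping over the index rather than over iterrows() copies. It assumes the question's first two rows plus one hypothetical row with liczba_kon = 0, and it is still much slower than the vectorized versions:

```python
import pandas as pd

# First two rows are from the question; the third is a hypothetical row with liczba_kon == 0
df = pd.DataFrame({
    'liczba_kon': [3, 1, 0],
    'Powierzchn': [69.60495, 39.27270, 130.41225],
})

# Create the column first, so the loop only updates existing cells
df['Pow_calkowita'] = 0.0

for index in df.index:
    k = df.at[index, 'liczba_kon']
    p = df.at[index, 'Powierzchn']
    # df.at writes into the DataFrame itself, unlike assigning to an iterrows() row
    df.at[index, 'Pow_calkowita'] = p * (k if k != 0 else 1)

print(df)
```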