Python: How do I multiply two columns?
I have a simple DataFrame and I want to add a 'Pow_calkowita' column. If 'liczba_kon' is 0, 'Pow_calkowita' should equal 'Powierzchn'; if 'liczba_kon' is not 0, 'Pow_calkowita' should equal 'liczba_kon' * 'Powierzchn'. Why can't I do it like this?
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        row['Pow_calkowita'] = row['Powierzchn']
    elif row['liczba_kon'] != 0:
        row['Pow_calkowita'] = row['Powierzchn'] * row['liczba_kon']
My code doesn't return any values:
liczba_kon Powierzchn
0 3 69.60495
1 1 39.27270
2 1 130.41225
3 1 129.29570
4 1 294.94400
5 1 64.79345
6 1 108.75560
7 1 35.12290
8 1 178.23905
9 1 263.00930
10 1 32.02235
11 1 125.41480
12 1 47.05420
13 1 45.97135
14 1 154.87120
15 1 37.17370
16 1 37.80705
17 1 38.78760
18 1 35.50065
19 1 74.68940
I found a workaround:
result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    elif row['liczba_kon'] != 0:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result
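Is this a good approach?
A DataFrame is designed to be used with vectorized operations. You can think of it as a database table, so you should rely on its built-in functions as much as possible: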
tdf = df.copy()  # temporary copy, so the original liczba_kon column is left untouched
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)  # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn']  # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']  # copy the column back
This simplifies the code and improves performance. We can compare the performance of the two approaches:
import time

import numpy as np
import pandas as pd

sampleSize = 100000
df = pd.DataFrame({
    'liczba_kon': np.random.randint(3, size=(sampleSize)),
    'Powierzchn': np.random.randint(1000, size=(sampleSize)),
})
# vectorization
s = time.time()
tdf = df.copy()  # temporary copy, so the zeros are still present for the loop below
tdf['liczba_kon'] = tdf['liczba_kon'].replace(0, 1)  # replace 0 with 1
tdf['Pow_calkowita'] = tdf['liczba_kon'] * tdf['Powierzchn']  # multiply
df['Pow_calkowita'] = tdf['Pow_calkowita']  # copy the column back
print(time.time() - s)
# iteration
s = time.time()
result = []
for index, row in df.iterrows():
    if row['liczba_kon'] == 0:
        result.append(row['Powierzchn'])
    elif row['liczba_kon'] != 0:
        result.append(row['Powierzchn'] * row['liczba_kon'])
df['Pow_calkowita'] = result
print(time.time() - s)
We can see that the vectorized version runs much faster:
0.0034716129302978516
6.193516492843628
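Alongside the timings, it can be worth confirming that both approaches produce the same column. The following is a minimal sketch under stated assumptions: the four rows are made up for illustration (they are not the question's data), and `total_area_loop` is a hypothetical helper name that simply wraps the iterrows loop.

```python
import pandas as pd

# Illustrative data only; it includes a 0 so the conditional branch is exercised.
df = pd.DataFrame({
    'liczba_kon': [3, 1, 0, 2],
    'Powierzchn': [10.0, 20.0, 30.0, 40.0],
})


def total_area_loop(frame):
    """Row-by-row version (hypothetical helper mirroring the iterrows loop)."""
    result = []
    for _, row in frame.iterrows():
        if row['liczba_kon'] == 0:
            result.append(row['Powierzchn'])
        else:
            result.append(row['Powierzchn'] * row['liczba_kon'])
    return pd.Series(result, index=frame.index)


# Vectorized version: treat a count of 0 as 1, then multiply whole columns.
vectorized = df['liczba_kon'].replace(0, 1) * df['Powierzchn']
looped = total_area_loop(df)

print((vectorized == looped).all())
```

Note that `replace` returns a new Series here, so `df['liczba_kon']` keeps its original zeros.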
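To write idiomatic Pandas code and take advantage of its efficient array processing, you should avoid writing your own loops over the data. Pandas lets you write concise code using vectorized operations on efficient numpy ndarray data structures. Under the hood it uses optimized, compiled C code for fast array processing, and it already takes care of the necessary looping for you. That is the advantage of using a single Pandas statement instead of explicitly writing a loop over all the elements: the code is both concise and fast.
Since your formula depends on a condition, you cannot use a plain multiplication. Instead, you can use np.where as follows: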
import numpy as np
df['Pow_calkowita'] = np.where(df['liczba_kon'] == 0, df['Powierzchn'], df['Powierzchn'] * df['liczba_kon'])
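When the test condition in the first argument is true, the value from the second argument is taken; otherwise the value from the third argument is taken.
Test run output (with two more test cases added at the end; one of them has a liczba_kon value of 0):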
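A minimal sketch of such a check, assuming the first two rows of the question's data plus one made-up row whose `liczba_kon` is 0 (that third row is an illustration, not part of the original test run):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'liczba_kon': [3, 1, 0],                   # last value is a made-up zero case
    'Powierzchn': [69.60495, 39.27270, 50.0],  # last value is made up as well
})

# Where liczba_kon is 0 keep Powierzchn, otherwise multiply the two columns.
df['Pow_calkowita'] = np.where(
    df['liczba_kon'] == 0,
    df['Powierzchn'],
    df['Powierzchn'] * df['liczba_kon'],
)

print(df)
```

The row with `liczba_kon` equal to 0 keeps its `Powierzchn` value unchanged, while the other rows are multiplied.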
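To answer your first question: "Why can't I do it like this?"
The documentation states (in its notes):
"Because iterrows returns a Series for each row"
and
"You should never modify something you are iterating over. […] the iterator returns a copy and not a view, and writing to it will have no effect."
This basically means that it returns a new Series holding the values of that row. So what you get is not the actual row, and certainly not the DataFrame.
What you are doing does work, though, just not in the way you want: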
>>> from pandas import DataFrame as DF  # DF is assumed to be this alias
>>> df = DF(dict(a=[1, 2, 3], b=list("abc")))
>>> df  # To demonstrate what you are doing
   a  b
0  1  a
1  2  b
2  3  c
>>> for index, row in df.iterrows():
...     print("\n------------------\n>>> Next Row:\n")
...     print(row)
...     row["c"] = "ADDED"  ####### HERE I am adding to 'the row'
...     print("\n -- >> added:")
...     print(row)
...     print("----------------------")
...
------------------
>>> Next Row:          # as you can see, this Series has the same values
a    1                 # as the row that it represents
b    a
Name: 0, dtype: object
 -- >> added:
a        1
b        a
c    ADDED             # and adding to it works... but you aren't doing anything
Name: 0, dtype: object # with it, unless you append it to a list
----------------------
------------------
>>> Next Row:
a    2
b    b
Name: 1, dtype: object
                       ### same here
 -- >> added:
a        2
b        b
c    ADDED
Name: 1, dtype: object
----------------------
------------------
>>> Next Row:
a    3
b    c
Name: 2, dtype: object
                       ### and here
 -- >> added:
a        3
b        c
c    ADDED
Name: 2, dtype: object
----------------------
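To answer your second question: "Is this a good approach?"
No.
The multiplication shown by SeaBean actually uses the vectorized operations of numpy and pandas, which are basically the building blocks of pandas DataFrames and Series.
It looks like you are trying to modify the data in place while iterating over it. See the documentation, which says "You should never modify something you are iterating over". iterrows() appears to return a copy of each row rather than the actual row itself.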
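If a row-by-row loop is really needed, the write has to go to the DataFrame itself rather than to the `row` copy. The following is a minimal sketch under stated assumptions: the three rows are made up for illustration, the target column is created before the loop, and `df.at[index, column]` is used for the assignment. The vectorized forms remain the better choice for anything large.

```python
import pandas as pd

# Illustrative data only; the middle row has a count of 0.
df = pd.DataFrame({
    'liczba_kon': [3, 0, 2],
    'Powierzchn': [10.0, 20.0, 30.0],
})

# Create the target column up front so the loop only fills in values.
df['Pow_calkowita'] = 0.0

for index, row in df.iterrows():
    factor = row['liczba_kon'] if row['liczba_kon'] != 0 else 1
    # Assign through the DataFrame; assigning to `row` would only change the copy.
    df.at[index, 'Pow_calkowita'] = row['Powierzchn'] * factor

print(df['Pow_calkowita'].tolist())
```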