Python: getting unwanted values in a new column when using append in a for loop

python pandas

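I am trying to write a script that loops over the rows of a dataframe and builds a new column by appending a value from column A or column B, depending on a condition in column C. However, something seems to go wrong when appending, because each row of my new column contains multiple values.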

import pandas as pd
import numpy as np

#Loading in the csv file
filename = '35180_TRA_data.csv'
df1 = pd.read_csv(filename, sep=',', nrows=1300, skiprows=25, index_col=False, header=0)

#Calculating the B concentration using column A and a factor
B_calc = df1['A']*137.818

#The measured B concentration
B_measured = df1['B']

#Looping through the dataset, and append the B_calc values where the C column is 2, while appending the B_measured values where the C column is 1.
calculations = []

for row in df1['C']:
    if row == 2:
        calculations.append(B_calc)
    if row ==1:
        calculations.append(B_measured)

df1['B_new'] = calculations

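The values in my new column (B_new) are all wrong. For example, the first row should contain only 0.00, but it contains many values. So something is going wrong in the append. Can anyone spot the problem?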

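B_calc and B_measured are both arrays (pandas Series). You therefore have to specify which element you want to append; otherwise the whole array is appended on every iteration. Here is how you can do that: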

import pandas as pd

df1 = pd.DataFrame({"A": [1, 3, 5, 7, 9], "B": [9, 7, 5, 3, 1], "C": [1, 2, 1, 2, 1]})

# Calculating the B concentration using column A and a factor
B_calc = df1['A'] * 137.818

# The measured B concentration
B_measured = df1['B']

# Loop through the dataset: append the single B_calc value for this row where C is 2,
# and the single B_measured value for this row where C is 1.
calculations = []

for index, row in df1.iterrows():
    if row['C'] == 2:
        calculations.append(B_calc[index])
    if row['C'] == 1:
        calculations.append(B_measured[index])

df1['B_new'] = calculations

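But iterating over rows is bad practice, because it is slow on large dataframes. A better approach is to use pandas boolean masks. Here is how that works: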

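mask_1 = df1['C'] == 1
mask_2 = df1['C'] == 2

# Write to the new column B_new rather than overwriting C:
# measured B where C is 1, calculated B (A times the factor) where C is 2.
df1.loc[mask_1, 'B_new'] = df1.loc[mask_1, 'B']
df1.loc[mask_2, 'B_new'] = df1.loc[mask_2, 'A'] * 137.818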
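
To make the cause of the problem concrete, here is a small sketch that reproduces the original loop on a toy dataframe. The toy data and the names `whole_series` and `scalars` are made up for illustration and are not from the question. It shows that appending `B_calc` or `B_measured` stores an entire Series per row, while indexing by position stores a single value.

```python
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({"A": [1, 3, 5], "B": [9, 7, 5], "C": [1, 2, 1]})
B_calc = df["A"] * 137.818
B_measured = df["B"]

# The original pattern: every append stores a whole Series
whole_series = []
for value in df["C"]:
    if value == 2:
        whole_series.append(B_calc)
    if value == 1:
        whole_series.append(B_measured)

first_item = whole_series[0]
print(type(first_item).__name__, len(first_item))

# Picking the element at the current position stores one scalar per row
scalars = []
for pos, value in enumerate(df["C"]):
    if value == 2:
        scalars.append(B_calc.iloc[pos])
    elif value == 1:
        scalars.append(B_measured.iloc[pos])

print(scalars)
```

Each entry of `whole_series` is a full column, which is why every cell of the new column ended up holding many values.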

It looks like your calculations is a list of Series, because you append a whole Series to it on each iteration. You should avoid looping over rows where possible. Instead, use boolean masking or np.where.

@QuangHoang: Thanks! I used np.where and that did the trick.

Thanks for the suggestions! I used np.where, as suggested in the comment above, and it solved my problem.
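
The thread does not show the exact np.where call the asker ended up with, so the following is only a sketch of what it might look like, using the sample dataframe from the answer. It assumes column C only ever contains 1 or 2; any other value would fall through to the measured B value here.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above
df1 = pd.DataFrame({"A": [1, 3, 5, 7, 9], "B": [9, 7, 5, 3, 1], "C": [1, 2, 1, 2, 1]})

# np.where(condition, x, y) takes x where the condition is True and y elsewhere.
# Assumption: C is always 1 or 2, so "not 2" means "measured".
df1["B_new"] = np.where(df1["C"] == 2, df1["A"] * 137.818, df1["B"])

print(df1)
```

If C can take more than two values, np.select with an explicit list of conditions and a default may be a safer choice than a single np.where.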