Pandas: looping through a DataFrame and creating new column values

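I am trying to loop through a CSV file that I have converted into a pandas DataFrame.

I need to go through each row, check the latitude and longitude data I have (2 separate columns), and append a code (0, 1, or 2) to the same row depending on whether the latitude and longitude fall within a certain range.

I am somewhat new to Python and would appreciate any help.

I am getting quite a few errors.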

import pandas as pd

book = 'yellow_tripdata_2014-04.csv'
write_book = 'yellow_04.csv'
yank_max_long = -73.921630300
yank_min_long = -73.931169700 
yank_max_lat = 40.832823000
yank_min_lat = 40.825582000
mets_max_long = 40.760523000
mets_min_long = 40.753277000
mets_max_lat = -73.841035400   
mets_min_lat = -73.850564600   

df = pd.read_csv(book)


##To check for Yankee Stadium Lat's and Long's, if within gps units then Stadium_Code = 1 , if mets then Stadium_Code=2

df['Stadium_Code'] = 0

for i, row in df.iterrows(): 
    if yank_min_lat <= float(row['dropoff_latitude']) <= yank_max_lat and yank_min_long <=float(row('dropoff_longitude')) <=yank_max_long:
        row['Stadium_Code'] == 1
    elif mets_min_lat <= float(row['dropoff_latitude']) <= mets_max_lat and mets_min_long <=float(row('dropoff_longitude')) <=mets_max_long:
        row['Stadium_Code'] == 2

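First, iterating row by row is very wasteful when a vectorized solution is available that operates on the whole df at once.

I would create a boolean mask for each of your 2 conditions and pass them to .loc to select the rows that meet the condition and set them to the value.

Here the masks use the bitwise operator & to AND the conditions together, and parentheses are needed around each condition because of operator precedence.

So the following should work: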

yank_mask = (df['dropoff_latitude'] > yank_min_lat) & (df['dropoff_latitude'] <= yank_max_lat) & (df['dropoff_longitude'] > yank_min_long) & (df['dropoff_longitude'] <= yank_max_long)

mets_mask = (df['dropoff_latitude'] > mets_min_lat) & (df['dropoff_latitude'] <= mets_max_lat) & (df['dropoff_longitude'] > mets_min_long) & (df['dropoff_longitude'] <= mets_max_long)

df.loc[yank_mask, 'Stadium_Code'] = 1
df.loc[mets_mask, 'Stadium_Code'] = 2
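
To try the mask approach without the full taxi file, here is a minimal sketch on a tiny made-up DataFrame. The three sample rows are invented for illustration. It also assumes that the mets_* names in the question are swapped (values near 40.75 look like latitudes and values near -73.85 look like longitudes), so the sketch assigns them the other way round. For comparison it includes a row-wise loop that indexes the row with square brackets and writes the result back with df.at, since row['Stadium_Code'] == 1 only compares and row('...') tries to call the row.

```python
import pandas as pd

# Bounding boxes from the question. ASSUMPTION: the mets_* latitude and
# longitude names in the question are swapped, so they are reassigned here.
yank_min_lat, yank_max_lat = 40.825582000, 40.832823000
yank_min_long, yank_max_long = -73.931169700, -73.921630300
mets_min_lat, mets_max_lat = 40.753277000, 40.760523000
mets_min_long, mets_max_long = -73.850564600, -73.841035400

# Made-up sample rows: one inside each box and one outside both.
df = pd.DataFrame({
    'dropoff_latitude': [40.829, 40.757, 40.700],
    'dropoff_longitude': [-73.926, -73.846, -74.000],
})
df['Stadium_Code'] = 0

# Vectorized version: one boolean mask per stadium, then assign through .loc.
yank_mask = ((df['dropoff_latitude'] > yank_min_lat)
             & (df['dropoff_latitude'] <= yank_max_lat)
             & (df['dropoff_longitude'] > yank_min_long)
             & (df['dropoff_longitude'] <= yank_max_long))
mets_mask = ((df['dropoff_latitude'] > mets_min_lat)
             & (df['dropoff_latitude'] <= mets_max_lat)
             & (df['dropoff_longitude'] > mets_min_long)
             & (df['dropoff_longitude'] <= mets_max_long))
df.loc[yank_mask, 'Stadium_Code'] = 1
df.loc[mets_mask, 'Stadium_Code'] = 2

# Row-wise version, for comparison only. The rows yielded by iterrows() are
# copies, so the code is written back to the frame with .at.
looped = df[['dropoff_latitude', 'dropoff_longitude']].copy()
looped['Stadium_Code'] = 0
for i, row in looped.iterrows():
    lat = row['dropoff_latitude']
    lon = row['dropoff_longitude']
    if yank_min_lat < lat <= yank_max_lat and yank_min_long < lon <= yank_max_long:
        looped.at[i, 'Stadium_Code'] = 1
    elif mets_min_lat < lat <= mets_max_lat and mets_min_long < lon <= mets_max_long:
        looped.at[i, 'Stadium_Code'] = 2

print(df)
```

Both versions give the same codes on this sample. The mask version avoids a Python-level loop over every row.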

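Generally, when you report that you received errors, it is useful to post the error trace and the line it occurred on. Your error means that your column names are wrong. Can you post the output of df.columns.tolist()?

['vendor_id', 'pickup_datetime', 'dropoff_datetime', 'passenger_count', 'trip_distance', 'pickup_longitude', 'pickup_latitude', 'rate_code', 'store_and_fwd_flag', 'dropoff_longitude', 'dropoff_latitude', 'payment_type', 'fare_amount', 'surcharge', 'mta_tax', 'tip_amount', 'tolls_amount', 'total_amount', 'Stadium_Code']

Ah... I think there is a space in the column names, so you need to either fix the names or pass the actual name of the column. When you see a KeyError, it usually means that the column name you passed does not match.

I had tried this approach before, but after running into errors I tried a way I was more familiar with, shown above (I recently edited the post).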
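
One way to handle the KeyError discussed in the comments, assuming the cause really is stray whitespace in the CSV header, is to inspect the exact column names and strip them before building the masks. The small frame below is made up for illustration and its leading spaces are an assumption about what the real file contains.

```python
import pandas as pd

# Made-up frame whose headers carry stray leading spaces (an assumption
# about the real file, used here only to reproduce the symptom).
df = pd.DataFrame({
    ' dropoff_latitude': [40.829],
    ' dropoff_longitude': [-73.926],
})

# repr() makes leading or trailing whitespace in each name visible.
print([repr(name) for name in df.columns])

# df['dropoff_latitude'] would raise KeyError at this point,
# because the actual name starts with a space.
had_key_before = 'dropoff_latitude' in df.columns

# Strip whitespace from every column name, then look the column up normally.
df.columns = df.columns.str.strip()
latitudes = df['dropoff_latitude']
```

After stripping, the column names match the ones used in the masks.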