
Python: How to use .fillna() with a dictionary based on a condition


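I'm doing some real-estate data cleaning and ran into this beginner problem which, surprisingly, I can't seem to solve on my own.

I have a DataFrame with NaN values in the lat and lon columns. I can get an almost-correct value to fill in by taking the mean lat and lon of the given neighborhood.

This is a sample; the actual DF has more than 20k rows.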

    lat   lon    neighborhood
   -34.62 -58.50 Monte Castro
   -34.63 -58.36 Boca
    nan   nan    San Telmo
I made two dictionaries with the mean lat and lon for each neighborhood, using this code:

neighborhood_lat = []
neighborhood_lon = []
for neighborhood in df['l3'].unique():
    # Select the column before averaging (mean() skips NaN by default);
    # calling .mean() on the whole frame fails on the text column in pandas 2.x
    lat = df.loc[df['l3'] == neighborhood, 'lat'].mean()
    lon = df.loc[df['l3'] == neighborhood, 'lon'].mean()
    neighborhood_lat.append({neighborhood: lat})
    neighborhood_lon.append({neighborhood: lon})
Here is part of one of the dicts:

 neighborhood_lat 
 [{'Mataderos': -34.65278757721805},
 {'Saavedra': -34.551813882357166},
 {nan: nan},
 {'Boca': -34.63204552441155},
 {'Boedo': -34.62695442446412},
 {'Abasto': -34.603728937455315},
 {'Flores': -34.62757516061659},
 {'Nuñez': -34.54843158034983},
 {'Retiro': -34.595564030955934},
 {'Almagro': -34.60692879236826},
 {'Palermo': -34.58274909271148},
 {'Belgrano': -34.56304387233704},
 {'Recoleta': -34.592081482406854},
 {'Balvanera': -34.608665174550694},
 {'Caballito': -34.61749059613885}
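Note that the loop produces a *list* of one-entry dicts rather than one dict, and `Series.map` expects a single mapping. A minimal sketch of collapsing such a list into one `{neighborhood: mean}` dict, using a few rounded, illustrative values and dropping the `{nan: nan}` entry that comes from rows with no neighborhood:

```python
import math

# A list of one-entry dicts shaped like the output above (illustrative values).
neighborhood_lat = [
    {"Boca": -34.632},
    {"Boedo": -34.627},
    {float("nan"): float("nan")},
]

# Merge into one dict, skipping the NaN key produced by missing neighborhoods.
merged = {
    name: value
    for entry in neighborhood_lat
    for name, value in entry.items()
    if not (isinstance(name, float) and math.isnan(name))
}

print(merged)  # {'Boca': -34.632, 'Boedo': -34.627}
```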
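Then I tried to use those dictionaries to fill lat and lon, but I can't figure out what to pass to fillna so that it fills each missing lat and lon with the mean lat and lon of that row's neighborhood.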

Expected result:

    lat                         lon                       neighborhood
   -34.62                      -58.50                     Monte Castro
   -34.63                      -58.36                     Boca
    (mean lat of neighborhood) (mean lon of neighborhood) San Telmo
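Part of the confusion is how `Series.fillna` reads a dict: for a Series, the keys are matched against the *index labels*, not against the values of another column. A small sketch with a made-up frame (the `l3` column name follows the question's code; the means are hypothetical) shows the difference between passing the dict directly and first mapping the neighborhood column to a per-row Series:

```python
import numpy as np
import pandas as pd

# Made-up sample; "l3" holds the neighborhood, as in the question's code.
df = pd.DataFrame({
    "lat": [-34.62, -34.63, np.nan, np.nan],
    "l3": ["Monte Castro", "Boca", "Boca", "San Telmo"],
})
means = {"Boca": -34.63, "San Telmo": -34.62}  # hypothetical means

# Keys are treated as index labels (0, 1, 2, 3); none match, so nothing is filled.
by_label = df["lat"].fillna(means)

# map() gives one fill value per row, aligned on df's index; fillna then
# replaces each NaN with the value from its own row.
per_row = df["l3"].map(means)
filled = df["lat"].fillna(per_row)

print(by_label.isna().sum())  # 2
print(filled.tolist())        # [-34.62, -34.63, -34.63, -34.62]
```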

Thanks for your help.

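Answering my own question again

With the help of the answers below, I found the code that solves the problem:

Code:

Creating the dictionaries: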

neighborhood_lat = {}
neighborhood_lon = {}

for neighborhood in df['l3'].unique():
    # Mean of the known (non-NaN) coordinates for this neighborhood
    neighborhood_lat[neighborhood] = df.loc[df['l3'] == neighborhood, 'lat'].mean()
    neighborhood_lon[neighborhood] = df.loc[df['l3'] == neighborhood, 'lon'].mean()

Using the dictionaries to fill the NaN values:

# map() turns each row's neighborhood into its mean; fillna() replaces only
# the missing entries and keeps the coordinates that are already known
df['lat'] = df['lat'].fillna(df['l3'].map(neighborhood_lat))
df['lon'] = df['lon'].fillna(df['l3'].map(neighborhood_lon))
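A commonly used alternative that skips the dictionaries entirely is `groupby(...).transform("mean")`, which returns a Series aligned with the original frame. This is a swapped-in technique, not the approach from the answer above; the sketch uses made-up coordinates:

```python
import numpy as np
import pandas as pd

# Made-up sample; "l3" is the neighborhood column, as in the question.
df = pd.DataFrame({
    "lat": [-34.60, -34.64, np.nan, -34.55],
    "lon": [-58.40, -58.36, np.nan, -58.46],
    "l3": ["Boca", "Boca", "Boca", "Nuñez"],
})

for col in ["lat", "lon"]:
    # Each row gets the mean of its own neighborhood (NaNs skipped when averaging).
    group_mean = df.groupby("l3")[col].transform("mean")
    # Only the missing entries are replaced.
    df[col] = df[col].fillna(group_mean)

print(df)
```

Row 2 ends up with the Boca means (about -34.62 and -58.38), while the known coordinates are left unchanged.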

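You want to fill the NaNs with the mean of each neighborhood, right? If so, please expand the sample so that each neighborhood appears in the data more than once.

The actual dataset has more than 20k rows; this is just a sample.

Possible duplicate of …

In that question they map the entire column, not just the NaN values.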
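The difference raised in the last comment, mapping the whole column versus filling only the NaNs, can be seen in a small sketch with made-up data. It also illustrates the first comment's point: a neighborhood with no known coordinates (here San Telmo) has a NaN mean, so its row stays NaN either way. Building the dict with `groupby(...).mean().to_dict()` is an assumption for brevity, not the loop from the answer:

```python
import numpy as np
import pandas as pd

# Made-up sample; "l3" is the neighborhood column.
df = pd.DataFrame({
    "lat": [-34.60, -34.64, np.nan, np.nan],
    "l3": ["Boca", "Boca", "Boca", "San Telmo"],
})

# Per-neighborhood means; San Telmo has no known lat, so its mean is NaN.
means = df.groupby("l3")["lat"].mean().to_dict()

# Plain map(): every row gets its neighborhood mean, overwriting known values.
overwritten = df["l3"].map(means)

# fillna(map()): known values are kept; only NaNs are replaced.
filled = df["lat"].fillna(overwritten)

print(overwritten.tolist())
print(filled.tolist())
```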