Python: how to select rows that do not contain only NaN values and 0
This is my DataFrame:
import numpy as np
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)
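I want to select the rows in which Orange, Apple and Plump do not contain only None, only 0, or a mix of the two. So the resulting output should be: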
Country Year Orange Apple Plump
0 US 2008 17.0 29.0 19.0
1 US 2009 11.0 12.0 16.0
2 US 2010 14.0 16.0 38.0
3 Spain 2008 11.0 NaN 33.0
4 Spain 2009 12.0 19.0 17.0
5 France 2008 17.0 19.0 21.0
6 France 2009 19.0 22.0 13.0
7 France 2010 12.0 11.0 0.0
10 Italy 2010 15.0 16.0 17.0
12 Italy 2011 42.0 NaN NaN
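Second, I want to drop the countries for which I do not have observations for all three years. The resulting output should therefore include only the US and France. How can I get both results?
I tried something like this: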
df = df[(df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull()) | (df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0)]
I also tried:
df = df[((df['Orange'].notnull())| \
(df['Apple'].notnull()) | (df['Plump'].notnull())) & ((df['Orange'] != 0 )| (df['Apple']!= 0) | (df['Plump']!= 0))]
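A short check of why neither attempt gives the desired rows may help here. The sketch below rebuilds three problem cases from the data above (the small frame `sample`, its index labels, and the variable names are chosen only for illustration) and evaluates both masks. It relies on the fact that `NaN != 0` evaluates to `True` in pandas.

```python
import numpy as np
import pandas as pd

fruit = ['Orange', 'Apple', 'Plump']
# Illustrative rows: all zeros, all missing, and a mix of 0 and missing
sample = pd.DataFrame(
    [[0, 0, 0], [np.nan, np.nan, np.nan], [0, np.nan, np.nan]],
    columns=fruit,
    index=['all_zero', 'all_nan', 'mixed'],
)

not_null = sample[fruit].notnull().any(axis=1)  # at least one non-missing value
non_zero = sample[fruit].ne(0).any(axis=1)      # at least one value != 0 (NaN counts)

first_attempt = not_null | non_zero    # every condition joined with |
second_attempt = not_null & non_zero   # the two groups joined with &

print(first_attempt.tolist())   # [True, True, True]
print(second_attempt.tolist())  # [False, False, True]
```

So the first mask keeps every row, while the second drops the all-zero and all-missing rows but still keeps the row that mixes 0 with missing values.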
None values are read as NaN, so you can replace the 0s and turn them into NaN as well. After that, you can do what MaxU suggested. Roughly:
In: df = df.replace(0, np.nan)
    df = df[df[['Orange', 'Apple', 'Plump']].notnull().any(axis=1)]
Out:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
3 Spain 2008 11 NaN 33
4 Spain 2009 12 19 17
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
10 Italy 2010 15 16 17
12 Italy 2011 42 NaN NaN
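As for your second question, I understand that in this case you want to drop the countries for which you do not have observations for 2008, 2009 and 2010.
To do that, you can do the following: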
required_years = {2008, 2009, 2010}
countries = []
for country, group in df.groupby('Country'):
    # Keep the country only if its observed years are exactly the required ones
    if set(group['Year']) == required_years:
        countries.append(country)
df = df[df.Country.isin(countries)]
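Run on the DataFrame after the 0-to-NaN replacement but before the row filter, this produces the following result: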
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
8 France 2010 NaN NaN NaN
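The loop above can also be expressed with `groupby(...).filter(...)`, which keeps whole groups for which a function returns `True`. This is only a sketch, not part of the original answer: the frame `years_df` holds just the Country and Year columns of the question's data, and `required_years` and `complete` are names chosen here. It assumes the goal is that each country's set of years equals 2008, 2009 and 2010.

```python
import pandas as pd

# Country and Year columns copied from the question's data
years_df = pd.DataFrame({
    'Country': ['US'] * 3 + ['Spain'] * 2 + ['France'] * 4 + ['Italy'] * 4,
    'Year': [2008, 2009, 2010,
             2008, 2009,
             2008, 2009, 2010, 2010,
             2009, 2010, 2010, 2011],
})

required_years = {2008, 2009, 2010}

# filter() keeps every row of a group when the function returns True for it
complete = years_df.groupby('Country').filter(
    lambda group: set(group['Year']) == required_years
)

print(complete['Country'].unique().tolist())  # ['US', 'France']
```

Comparing sets avoids the element-wise comparison of arrays of different lengths, which is what happens for Spain when only two years are present.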
Finally, you can apply both solutions at the same time:
df[df[['Orange', 'Apple', 'Plump']].notnull().any(axis=1) & df.Country.isin(countries)]
To get:
Country Year Orange Apple Plump
0 US 2008 17 29 19
1 US 2009 11 12 16
2 US 2010 14 16 38
5 France 2008 17 19 21
6 France 2009 19 22 13
7 France 2010 12 11 NaN
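I'll update the question with what I tried; thank you very much for your answer. But I also want to get rid of row 8, which contains all 0s.
@MaxU perfect answer, as usual.
Okay, that's clever. I was going to do null or == 0, but fillna(0).eq(0) is very flexible.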
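The last comment mentions `fillna(0).eq(0)`. The answer it refers to is not reproduced in this thread, so the following is only a guess at how that idea might look: treat missing values as 0, flag the rows where all three fruit columns are then 0, and keep the rest. The names `fruit`, `only_nan_or_zero` and `result` are chosen here for illustration. Unlike replacing 0 with NaN, this leaves genuine zeros (such as Plump in row 7) untouched.

```python
import pandas as pd

cols = ['Country', 'Year', 'Orange', 'Apple', 'Plump']
data = [['US', 2008, 17, 29, 19],
        ['US', 2009, 11, 12, 16],
        ['US', 2010, 14, 16, 38],
        ['Spain', 2008, 11, None, 33],
        ['Spain', 2009, 12, 19, 17],
        ['France', 2008, 17, 19, 21],
        ['France', 2009, 19, 22, 13],
        ['France', 2010, 12, 11, 0],
        ['France', 2010, 0, 0, 0],
        ['Italy', 2009, None, None, None],
        ['Italy', 2010, 15, 16, 17],
        ['Italy', 2010, 0, None, None],
        ['Italy', 2011, 42, None, None]]
df = pd.DataFrame(data, columns=cols)

fruit = ['Orange', 'Apple', 'Plump']
# True for rows whose fruit values are all NaN, all 0, or a mix of both
only_nan_or_zero = df[fruit].fillna(0).eq(0).all(axis=1)
result = df[~only_nan_or_zero]

print(result.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 10, 12]
```

The remaining index labels match the expected output listed in the question.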