Python: deleting rows of a data frame that correspond to particular levels of a column
My data is merged from two large datasets, after which rows containing NA values are dropped. It now has shape (2707, 18). This is what I did:
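import pandas as pd

parks = pd.read_csv('parks.csv')
species = pd.read_csv('species.csv')
# Inner join of the two tables on the shared 'Park Name' column
data = pd.merge(parks, species, on='Park Name')
# Keep every column except the last one ('Unnamed: 13')
variables = list(data.columns)[:-1]
print(parks.columns)
print('')
print(species.columns)
print('')
print(variables)
data = data.loc[:, variables]
# Drop every row that contains at least one NA
data = data.dropna()
print(data.shape)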
# The output:
Index(['Park Code', 'Park Name', 'State', 'Acres', 'Latitude', 'Longitude'], dtype='object')
Index(['Species ID', 'Park Name', 'Category', 'Order', 'Family',
'Scientific Name', 'Common Names', 'Record Status', 'Occurrence',
'Nativeness', 'Abundance', 'Seasonality', 'Conservation Status',
'Unnamed: 13'],
dtype='object')
['Park Code', 'Park Name', 'State', 'Acres', 'Latitude', 'Longitude', 'Species ID', 'Category', 'Order', 'Family', 'Scientific Name', 'Common Names', 'Record Status', 'Occurrence', 'Nativeness', 'Abundance', 'Seasonality', 'Conservation Status']
(2707, 18)
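I looked at the levels of one categorical variable in the data and found some irrelevant ones. These levels (which seem to belong to other variables) come from the 'species' data.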
print(data.groupby('Record Status').size())
# Output:
Record Status
American Crow 1
Bushtit 1
Cabezon 1
Catbird 1
Cocodrilo De Tumbes 1
Common Poorwill 1
Northern Goshawk 1
Northern Pintail 1
Pigeon Hawk 1
Robin 1
Short-Tailed Weasel 1
Speckled Trout 1
Wapiti 1
White-Footed Mouse 1
Approved 2668
In Review 25
I tried to keep only the rows where the column 'Record Status' takes a value in ['Approved', 'In Review'], like this:
data= data[(data.loc[:,'Record Status']=='Approved') & (data.loc[:,'Record Status']=='In Review')]
But this removes every row from the data, i.e. data.shape is now (0, 18).
Alternatively, I tried isin:
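A note on why that filter comes back empty: the two comparisons are joined with `&`, and no single value can equal both 'Approved' and 'In Review' at once, so the mask is False for every row. Joining them with `|` keeps rows that match either value. The sketch below uses a small made-up frame (`df` and its three values are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical toy data standing in for the merged frame
df = pd.DataFrame({"Record Status": ["Approved", "In Review", "Robin"]})
status = df["Record Status"]

# AND: a value would have to equal both strings, which is impossible
both = df[(status == "Approved") & (status == "In Review")]
# OR: a value only has to equal one of the two strings
either = df[(status == "Approved") | (status == "In Review")]

print(both.shape)
print(either.shape)
```

The `|` form is equivalent to the isin call tried next, so it is subject to the same exact-match behaviour.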
data= data[data.loc[:,'Record Status'].isin(['Approved', 'In Review'])]
print(data.groupby('Record Status').size())
# Output:
Record Status
Approved 2668
dtype: int64
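This time only one level is kept, and the observations with the 'In Review' level are dropped as well. How can I remove all of these "wrong" levels from the data frame? Thanks in advance.
Try removing the leading and trailing whitespace in the Record Status column with str.strip, then filter the column with isin: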
data['Record Status']=data['Record Status'].str.strip()
data = data[data['Record Status'].isin(['Approved', 'In Review'])]
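To check whether stray whitespace really is the cause, it may help to print the raw values as a Python list, where the quotes make padding visible, and to compare the filter before and after stripping. This is a minimal sketch on made-up data: the frame `df` and the padded 'In Review ' values are assumptions, since the question does not show the raw strings.

```python
import pandas as pd

# Hypothetical data: two 'In Review' entries carry stray whitespace
df = pd.DataFrame(
    {"Record Status": ["Approved", "In Review ", " In Review", "Robin"]}
)

# Printing the values as a list shows the quotes, so padding becomes visible
print(df["Record Status"].unique().tolist())

wanted = ["Approved", "In Review"]

# isin matches exact strings, so the padded values are missed here
before = df[df["Record Status"].isin(wanted)]

# Strip leading/trailing whitespace, then apply the same filter
cleaned = df.assign(**{"Record Status": df["Record Status"].str.strip()})
after = cleaned[cleaned["Record Status"].isin(wanted)]

print(len(before), len(after))
```

If the real column shows no padding in the list output, the mismatch has a different cause (for example different capitalisation), and stripping alone will not fix it.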