Python: how to clean up a DataFrame
Tags: python, pandas, dataframe, nan

I have a DataFrame that I want to clean up by dropping some of the NaN values and replacing the others with "Not listed". Can anyone help me do this with dropna plus slicing?
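You can drop the unwanted columns with df.drop(column_names, axis=1) and replace NaN with 'Not Listed', then set the column headers from the first row and drop the extra row that was used as the header.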
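import numpy as np
df = df.drop([0, 2, 4], axis=1).replace(np.nan, 'Not Listed')  # Drop the unwanted columns, then fill NaN.
df.columns = df.iloc[0]  # Use the first row as the column headers.
df.drop(0, inplace=True)  # Remove the row that was used as the header.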
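The steps in this answer can be tried end to end on a small frame. This is only a sketch: the original DataFrame is not shown, so the `raw` frame below is a hypothetical stand-in that assumes integer column labels, header text in row 0, and empty columns at positions 0, 2 and 4. It also uses `fillna` in place of `replace(np.nan, ...)`, which should be equivalent for real NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (assumed layout, not the asker's real frame):
# columns 0, 2 and 4 are empty, and row 0 holds the header text.
raw = pd.DataFrame([
    [np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"],
    [np.nan, "A", np.nan, "28223", np.nan, "8.70%"],
    [np.nan, "B", np.nan, np.nan, np.nan, np.nan],
    [np.nan, "D", np.nan, "21871", np.nan, "6.80%"],
])

# Drop the empty columns by label, then fill the remaining gaps.
cleaned = raw.drop([0, 2, 4], axis=1).fillna("Not Listed")

# Promote the first row to column headers and remove it from the data.
cleaned.columns = cleaned.iloc[0]
cleaned = cleaned.drop(index=0)

print(cleaned)
```

Assigning the result of `drop` back to a variable, rather than using `inplace=True`, makes it easier to rerun the snippet without accidentally dropping the same row twice.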
Given your specific data structure:
df.columns = df.iloc[0, :] # Rename the columns based on the first row of data.
df.columns.name = None # Set the columns name to None.
df = df.iloc[1:, :].reset_index(drop=True) # Drop the column names from the data in the dataframe.
>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed')
Name Amount Percentage
0 A 28223 8.70%
1 B Not Listed Not Listed
2 C Not Listed Not Listed
3 D 21871 6.80%
4 E Not Listed Not Listed
5 F Not Listed Not Listed
6 G 21380 6.64%
7 H Not Listed Not Listed
8 I Not Listed Not Listed
9 J 20784 6.46%
10 K Not Listed Not Listed
11 L Not Listed Not Listed
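To see why `dropna(how='all', axis=1)` removes only the fully empty columns while partially empty ones survive and get filled, here is a minimal sketch. The column name `Empty` and the sample values are assumptions made for illustration, and the frame uses real NaN values; if the data holds the literal string 'NAN' instead, it would first need the `replace('NAN', np.nan)` step from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Empty" has no data at all, "Amount" is only partly missing.
df = pd.DataFrame({
    "Name": ["A", "B", "D"],
    "Empty": [np.nan, np.nan, np.nan],
    "Amount": ["28223", np.nan, "21871"],
})

# how="all" with axis=1 drops a column only when every value in it is missing.
result = df.dropna(how="all", axis=1).fillna("Not Listed")

print(result)
```

With the default `how='any'`, the `Amount` column would be dropped as well, which is not what the expected output asks for.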
If you prefer, you can set the index to Name by changing the last line of code:
>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed').set_index('Name')
Amount Percentage
Name
A 28223 8.70%
B Not Listed Not Listed
C Not Listed Not Listed
D 21871 6.80%
E Not Listed Not Listed
F Not Listed Not Listed
G 21380 6.64%
H Not Listed Not Listed
I Not Listed Not Listed
J 20784 6.46%
K Not Listed Not Listed
L Not Listed Not Listed
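I tried your approach, but even this code does not replace NAN with Not Listed. I don't understand what I'm missing here.

Out of curiosity: if I wanted to delete these rows, what should I do? I tried df = df.drop([0, 1, 2], axis=0), but it doesn't work.

Do you get an error? If you try to drop row 0 after running my code, you will get an error, because row 0 has already been dropped.

That part I figured out. I used df1 = df1.drop(df1.index[[0, 1, 2]]), but this line raises an error: df1.drop(0, inplace=True) gives KeyError: '[0] not found in axis'. So what should I do if inplace=True is used for both columns and index?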
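The KeyError discussed in these comments can be reproduced in a few lines. The sketch below uses a hypothetical one-column frame: once the row labelled 0 has been dropped, dropping label 0 again fails, while dropping by position (via `df.index[...]`) or passing `errors="ignore"` avoids the problem.

```python
import pandas as pd

# Hypothetical frame whose first row stands in for the header row.
df = pd.DataFrame({"Name": ["header", "A", "B", "C"]})

df.drop(0, inplace=True)  # removes the row labelled 0; the index is now 1, 2, 3

try:
    df.drop(0, inplace=True)  # label 0 no longer exists
    raised = False
except KeyError:
    raised = True

# Drop by position: look up whichever labels sit in the first two slots.
by_position = df.drop(df.index[[0, 1]])

# Or ask drop to skip labels that are not present.
unchanged = df.drop(0, errors="ignore")

print(raised, by_position["Name"].tolist(), len(unchanged))
```

Calling `reset_index(drop=True)` after removing the header row is another option, since it renumbers the rows from 0 again.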
Expected output from the question:
Name Amount Percentage
0 A 28223 8.70%
1 B Not listed Not listed
2 C Not listed Not listed
3 D 21871 6.80%
4 E Not listed Not listed
5 F Not listed Not listed
6 G 21380 6.64%
7 H Not listed Not listed
8 I Not listed Not listed
9 J 20784 6.46%
10 K Not listed Not listed
11 L Not listed Not listed