How to clean up a DataFrame in Python

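My current DataFrame looks like this:

I want to clean up this DataFrame by dropping some of the NaN values and replacing the others with "Not Listed".

Expected output:

Can someone help me do this with dropna + slicing?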

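You can drop the unwanted columns with df.drop(column_names, axis=1), replace the NaN values with "Not Listed", then set the column headers and drop the extra row that was used for the headers: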

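import numpy as np
df = df.drop([0, 2, 4], axis=1).replace(np.nan, 'Not Listed')  # drop the unwanted columns, fill NaN
df.columns = df.iloc[0]   # use row 0 as the column headers
df.drop(0, inplace=True)  # drop the row that was used for the headers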
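Since the question asks for dropna + slicing specifically, here is a minimal sketch of that route. The frame is hypothetical (the original data was only shown as an image); it assumes, as this answer does, that the header labels sit in row 0 and that columns 0, 2 and 4 are entirely NaN. Instead of hard-coding drop([0, 2, 4], axis=1) it uses dropna(axis=1, how="all"), and it uses fillna("Not Listed") in place of replace(np.nan, "Not Listed").

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: header labels in row 0,
# columns 0, 2 and 4 entirely NaN.
df = pd.DataFrame([
    [np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"],
    [np.nan, "A", np.nan, 28223, np.nan, "8.70%"],
    [np.nan, "B", np.nan, np.nan, np.nan, np.nan],
    [np.nan, "D", np.nan, 21871, np.nan, "6.80%"],
])

# dropna: remove every column that is entirely NaN instead of listing positions.
df = df.dropna(axis=1, how="all")

# Slicing: use row 0 as the header, then cut it off and renumber rows from 0.
df.columns = df.iloc[0]
df.columns.name = None
df = df.iloc[1:].reset_index(drop=True)

# Fill whatever gaps remain.
df = df.fillna("Not Listed")
print(df)
```

One caveat of this design: dropna(axis=1, how="all") would also remove a genuine data column that happens to be completely empty, whereas the explicit column list only removes what you name. The reset_index call means row labels start at 0 again after the header row is sliced off.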

Given your specific data structure:

import numpy as np

df.columns = df.iloc[0, :]  # Rename the columns based on the first row of data.
df.columns.name = None  # Set the columns name to None.
df = df.iloc[1:, :].reset_index(drop=True)  # Drop the header row from the data and renumber the rows.
>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed')
   Name      Amount  Percentage
0     A       28223       8.70%
1     B  Not Listed  Not Listed
2     C  Not Listed  Not Listed
3     D       21871       6.80%
4     E  Not Listed  Not Listed
5     F  Not Listed  Not Listed
6     G       21380       6.64%
7     H  Not Listed  Not Listed
8     I  Not Listed  Not Listed
9     J       20784       6.46%
10    K  Not Listed  Not Listed
11    L  Not Listed  Not Listed
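A self-contained sketch of this answer's steps, on hypothetical data in which the missing cells are the literal string "NAN" (which is what the replace('NAN', np.nan) call implies). The data values are made up for illustration, and the final step uses fillna("Not Listed") in place of replace(np.nan, 'Not Listed'); both fill the NaN values that remain after the empty columns are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical data: missing cells are the string "NAN", the header labels
# are in row 0, and one column is completely empty.
df = pd.DataFrame([
    ["Name", "NAN", "Amount", "Percentage"],
    ["A", "NAN", 28223, "8.70%"],
    ["B", "NAN", "NAN", "NAN"],
    ["D", "NAN", 21871, "6.80%"],
])

# Promote row 0 to the header, then remove it from the data.
df.columns = df.iloc[0, :]
df.columns.name = None
df = df.iloc[1:, :].reset_index(drop=True)

# "NAN" strings -> real NaN, drop all-empty columns, fill the remaining gaps.
clean = df.replace("NAN", np.nan).dropna(how="all", axis=1).fillna("Not Listed")
print(clean)
```

The string-to-NaN step matters: dropna and fillna only recognize real missing values, so a column full of "NAN" strings would otherwise survive untouched.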

If you like, you can make Name the index by changing the last line of code:

>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed').set_index('Name')
          Amount  Percentage
Name                        
A          28223       8.70%
B     Not Listed  Not Listed
C     Not Listed  Not Listed
D          21871       6.80%
E     Not Listed  Not Listed
F     Not Listed  Not Listed
G          21380       6.64%
H     Not Listed  Not Listed
I     Not Listed  Not Listed
J          20784       6.46%
K     Not Listed  Not Listed
L     Not Listed  Not Listed
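A short sketch of what set_index("Name") buys you, on a small hypothetical frame shaped like the cleaned output above: rows can then be looked up by name with .loc, and reset_index() turns Name back into an ordinary column if you need the original layout again.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data, shaped like the output above before set_index.
df = pd.DataFrame({
    "Name": ["A", "B", "D"],
    "Amount": [28223, np.nan, 21871],
    "Percentage": ["8.70%", np.nan, "6.80%"],
})

# Fill the gaps, then move Name from a column into the row index.
by_name = df.fillna("Not Listed").set_index("Name")

# Rows can now be looked up by name instead of by position.
print(by_name.loc["B"])

# set_index is reversible: reset_index turns the index back into a column.
back = by_name.reset_index()
```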

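I tried your approach, but even this code does not replace the NaN values with "Not Listed". I don't understand what I'm missing here. Also, out of curiosity: if I want to delete this row, what should I do? I tried df = df.drop([0, 1, 2], axis=0), but it doesn't work.

Did you get any error? If you try to drop row 0 after running my code, you will get an error, because row 0 has already been dropped.

I figured that part out; I used df1 = df1.drop(df1.index[[0, 1, 2]]). But this line raises an error: df1.drop(0, inplace=True) gives KeyError: "[0] not found in axis". So what should I do when inplace=True is used for both the columns and the index?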
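A minimal sketch of the two problems raised in these comments, on a hypothetical frame. drop(0) works on index labels, not positions, so once label 0 is gone a second drop(0) raises the KeyError above; dropping by position through df.index, passing errors="ignore", or renumbering with reset_index(drop=True) are alternatives. Separately, replace and fillna return a new DataFrame, so without assigning the result the original keeps its NaN. (If the "missing" cells are actually strings such as "NAN", replace(np.nan, ...) will not match them at all; that is an assumption worth checking on the real data.)

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 holds leftover header text, as in the answers above.
df = pd.DataFrame({"Name": ["Name", "A", "B", "C"],
                   "Amount": ["Amount", 28223, np.nan, 21871]})

df.drop(0, inplace=True)  # label 0 is removed; the index is now [1, 2, 3]

try:
    df.drop(0, inplace=True)  # label 0 no longer exists
except KeyError as exc:
    print("KeyError:", exc)  # pandas reports that [0] is not found in axis

# Drop by position instead of by label: df.index[[0]] selects label 1 here.
by_position = df.drop(df.index[[0]])

# Or skip labels that are already gone.
ignored = df.drop(0, errors="ignore")

# Or renumber first, so label 0 is the first data row again.
renumbered = df.reset_index(drop=True)

# fillna/replace return a new frame: without the assignment df is unchanged.
df.fillna("Not Listed")
still_has_nan = bool(df["Amount"].isna().any())
df = df.fillna("Not Listed")
```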

   Name      Amount  Percentage
0     A       28223       8.70%
1     B  Not listed  Not listed
2     C  Not listed  Not listed
3     D       21871       6.80%
4     E  Not listed  Not listed
5     F  Not listed  Not listed
6     G       21380       6.64%
7     H  Not listed  Not listed
8     I  Not listed  Not listed
9     J       20784       6.46%
10    K  Not listed  Not listed
11    L  Not listed  Not listed