Python: How to order based on 'Parent'? Problem 1

I have a DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})

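How can I order it based on the Parent column? To begin with, the Code that has no parent comes first, and its 'Index' is 1. Every other Code gets its Index from its parent and its order of appearance (df.index):

Index = parent's Index + '.' + order of appearance

I believe

df.loc[df['Parent'].isna(), 'Index'] = 1

is a good start. After that, the direct 'children' of that first Code will have only one dot and will be ordered according to the DataFrame order, so we will have: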

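Since Code 2 is the starting parent, we move on to its children (in this order: Codes 1, 4 and 45).

So, after this iteration, we will have:

In the next step, we look for the children of Codes 1, 4 and 45, and so on, until there are no NaN values left in the Index column.

The expected final result (after ordering as described in Problem 2) is:

Instead of 'parents', we can think of it as chapters, sections and subsections. A Code is inside its Parent, or comes from its Parent. The Parent is the direct superior.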
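The labelling rule described above (root gets 1, each child gets its parent's Index plus a dot plus its order of appearance) can be sketched as a breadth-first walk over the parent/child relation. This is only a minimal sketch, not the approach taken in the answers below: the function name hierarchical_index is hypothetical, and it works on plain lists so the logic stays visible.

```python
import math
from collections import defaultdict, deque


def hierarchical_index(codes, parents):
    """Return {code: label}, e.g. {2: '1', 1: '1.1', ...}.

    hierarchical_index is a hypothetical helper, not part of pandas.
    A missing parent (None or NaN) marks a root.
    """
    children = defaultdict(list)
    roots = []
    for code, parent in zip(codes, parents):
        if parent is None or (isinstance(parent, float) and math.isnan(parent)):
            roots.append(code)
        else:
            # Appending in input order preserves the order of appearance.
            children[int(parent)].append(code)

    labels = {}
    queue = deque()
    # Roots are numbered 1, 2, 3, ... in order of appearance.
    for n, root in enumerate(roots, start=1):
        labels[root] = str(n)
        queue.append(root)

    # Each child label is the parent's label + '.' + position among siblings.
    while queue:
        parent = queue.popleft()
        for n, child in enumerate(children[parent], start=1):
            labels[child] = f"{labels[parent]}.{n}"
            queue.append(child)
    return labels


codes = [1, 2, 3, 4, 5, 10, 45]
parents = [2, float("nan"), 4, 2, 3, 45, 2]
labels = hierarchical_index(codes, parents)
for code in codes:
    print(code, labels[code])
```

With the DataFrame from the question, the resulting dictionary could presumably be attached with df['Index'] = df['Code'].map(hierarchical_index(df['Code'], df['Parent'])). Codes that are not reachable from any root are left out of the dictionary and would therefore stay NaN.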

Problem 2: There is also another issue. Suppose

I want 1.2.16.1 to come after 1.2.2.1. However, if I try:

df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})
df['Index'] = pd.Series(['1.1','1','1.2.1','1.2.16.1','1.2.2.1','1.3.1','1.3'])
df = df.sort_values(by=['Index'])

What I get is:

The expected result is:

One way is to use natsorted:

from natsort import natsorted

df = df.set_index('Index').reindex(natsorted(df.Index)).reset_index()
Out[42]: 
      Index  Code  Parent
0         1     2     NaN
1       1.1     1     2.0
2     1.2.1     3     4.0
3   1.2.2.1     5     3.0
4  1.2.16.1     4     2.0
5       1.3    45     2.0
6     1.3.1    10    45.0
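To see why the plain sort_values call puts 1.2.16.1 before 1.2.2.1, it may help to compare string ordering with numeric ordering of the dotted parts. The sketch below does not use natsort; natural_key is a hypothetical helper that assumes every part between dots is an integer.

```python
def natural_key(label):
    """Turn '1.2.16.1' into (1, 2, 16, 1) so parts compare as numbers.

    natural_key is a hypothetical helper; it assumes purely numeric parts.
    """
    return tuple(int(part) for part in label.split("."))


labels = ['1.1', '1', '1.2.1', '1.2.16.1', '1.2.2.1', '1.3.1', '1.3']

# Strings compare character by character, so '1.2.16.1' < '1.2.2.1'
# because '1' < '2' at the first differing position.
plain_sorted = sorted(labels)

# Tuples of ints compare part by part, so 2 < 16 as expected.
natural_sorted = sorted(labels, key=natural_key)

print(plain_sorted)
print(natural_sorted)
```

The same key could drive a reordering of the DataFrame, for instance by sorting the positional row numbers with sorted(range(len(df)), key=lambda i: natural_key(df['Index'].iloc[i])) and passing the result to df.iloc.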

Well, I did it. Not efficiently, but effectively.

import pandas as pd
import numpy as np
from natsort import natsorted
df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})

#First I got the Indexes for the Codes that don't have Parents, those will be 1,2,3,...
df.loc[df['Parent'].isna(),"Index"] = range(1,len(df.loc[df['Parent'].isna(), df.columns[1]])+1)

#Then I saved the columns from the dataframe at this point
initial_columns = df.columns

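#Then I converted Index column to string and kept the part before the dot, since the values were like 1.0, 2.0, etc.
#(.str[0] takes the first piece; assigning the expand=True frame to one column fails in recent pandas)
df['Index'] = df['Index'].astype(str).str.split('.').str[0]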

#Then I started a loop that will go on until there is no more "nan" in the index column

while (len(df.loc[df['Index'] == "nan",'Index']) > 0):
    #Since the Codes that have 'Index' will now be the Parents, I put them in another dataframe so I could merge
    df2 = df[['Index','Code']]
    df2 = df2.rename(columns = {'Index':'Parent Index','Code': 'Parent Code'})
    df = df.merge(df2, left_on = 'Parent' , right_on = 'Parent Code',how = 'left')

    #Then I created an auxiliary column for each Parent in which I make a range depending on the number of occurrences
    for i in range(len(df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique())):
        df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]), 'Aux Col' + str(i)] = \
                                range(1,len(df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]),df2.columns[1]])+1)
        #Then I, again, split on dots, this time right split, keeping the part before the last dot (the trailing .0 comes from the range)
        df['Aux Col' + str(i)] = df['Aux Col' + str(i)].astype(str).str.rsplit('.',n = 1).str[0]

    #Now I just define the Index for the new children (currently na, but with Parents not na)     
        df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]), 'Index'] = \
                                df['Parent Index'].astype(str) + "." + df['Aux Col'+str(i)].astype(str)
    #Then I clean the dataset, since I created a big mess and a lot of columns
    df = df[initial_columns]
# Finally, just reordering as WeNYoBen suggested.
df = df.set_index('Index').reindex(natsorted(df.Index)).reset_index()

Output:

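Nice :) Good catch... Also, I guess

index = df['Code'] + '.' + df['Parent'].replacena('0')

You solved the second problem! But the starting point is

df = pd.DataFrame({'Code': [1,2,3,4,5,10,45], 'Parent': [2,np.nan,4,2,3,45,2]})

and df['Index'] still has to be created. Sorry, I wasn't very clear.

Can you explain the logic of Problem 1 in more detail? It is not clear to me how you build the Index column values.

Is it better now?

Yes, it is clear now.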
