Python: How to order rows based on 'Parent'?


Question 1

I have a DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})

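How can I order the rows based on the Parent column? The starting point is the code that has no parent: its "Index" is 1. Every other code gets an Index built from its parent's Index and its order of appearance (df.index):

Index = parent's Index + '.' + order of appearance

I believe
df.loc[df['Parent'].isna(),'Index']=1
is a good start. After that, the direct children of that first row get a single dot and are numbered in DataFrame order.

Since code 2 is the starting parent, we move on to its children (codes 1, 4 and 45, in that order), so after this iteration they have the Indexes 1.1, 1.2 and 1.3.

In the next step we look for the children of codes 1, 4 and 45, and so on, until there is no NaN left in the Index column.

In the expected final result, the rows are then ordered by this Index (see question 2).

We can think of it as chapters, sections and subsections rather than "parents". A code sits inside its parent, or comes from it. The parent is the direct superior.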
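
The rule described above (a child's Index is its parent's Index, a dot, and its order of appearance) can be sketched in plain Python as a level-by-level walk. This is only an illustrative sketch, not the approach from the answers below: the function name hierarchical_index is hypothetical, None stands in for NaN, and it assumes every code has at most one parent and that there are no cycles.

```python
from collections import defaultdict, deque

# Same data as the DataFrame above; None stands in for NaN (assumption).
codes = [1, 2, 3, 4, 5, 10, 45]
parents = [2, None, 4, 2, 3, 45, 2]

def hierarchical_index(codes, parents):
    """Map each code to a dotted label such as '1.2.1' (hypothetical helper)."""
    # Group children under their parent, preserving order of appearance.
    children = defaultdict(list)
    for code, parent in zip(codes, parents):
        children[parent].append(code)

    labels = {}
    queue = deque()
    # Codes without a parent are numbered 1, 2, 3, ...
    for pos, root in enumerate(children[None], start=1):
        labels[root] = str(pos)
        queue.append(root)

    # Walk down one level at a time: label = parent's label + '.' + position.
    while queue:
        parent = queue.popleft()
        for pos, child in enumerate(children.get(parent, []), start=1):
            labels[child] = f"{labels[parent]}.{pos}"
            queue.append(child)
    return labels

labels = hierarchical_index(codes, parents)
print(labels)
```

With a DataFrame, the resulting dictionary could presumably be attached with something like df['Index'] = df['Code'].map(labels), after which only the ordering problem from question 2 remains.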


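Question 2

There is also a second problem. Suppose the Index column contains values such as 1.2.16.1 and 1.2.2.1.

I want 1.2.16.1 to come after 1.2.2.1. But if I try: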

df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})
df['Index'] = pd.Series(['1.1','1','1.2.1','1.2.16.1','1.2.2.1','1.3.1','1.3'])
df = df.sort_values(by=['Index'])
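what I get has 1.2.16.1 before 1.2.2.1, because the Index values are strings and are compared character by character.

The expected result has 1.2.2.1 before 1.2.16.1.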


One way is to use natsorted:

from natsort import natsorted

df = df.set_index('Index').reindex(natsorted(df.Index)).reset_index()
Out[42]: 
      Index  Code  Parent
0         1     2     NaN
1       1.1     1     2.0
2     1.2.1     3     4.0
3   1.2.2.1     5     3.0
4  1.2.16.1     4     2.0
5       1.3    45     2.0
6     1.3.1    10    45.0
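
If installing natsort is not an option, a similar ordering can be sketched with the standard library alone by comparing each label as a tuple of integers. The helper name dotted_key is hypothetical, and the sketch assumes every label consists only of integers separated by dots.

```python
def dotted_key(label):
    """Turn '1.2.16.1' into (1, 2, 16, 1) so parts compare numerically."""
    return tuple(int(part) for part in label.split("."))

index_values = ['1.1', '1', '1.2.1', '1.2.16.1', '1.2.2.1', '1.3.1', '1.3']

# Plain string sort compares character by character, so '1.2.16.1' < '1.2.2.1'.
lexicographic = sorted(index_values)
# Sorting on integer tuples puts 2 before 16, as wanted.
natural = sorted(index_values, key=dotted_key)

print(lexicographic)
print(natural)
```

The sorted list could then be passed to reindex in place of natsorted(df.Index), in the same way as in the snippet above.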

Well, I got it working. It is not efficient, but it is effective.

import pandas as pd
import numpy as np
from natsort import natsorted
df = pd.DataFrame({'Code': [1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})

#First I got the Indexes for the Codes that don't have Parents, those will be 1,2,3,...
df.loc[df['Parent'].isna(),"Index"] = range(1,len(df.loc[df['Parent'].isna(), df.columns[1]])+1)

#Then I saved the columns from the dataframe at this point
initial_columns = df.columns

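#Then I converted the Index column to string and kept the part before the dot, since the values were like 1.0, 2.0, etc.
#(expand=True returns a DataFrame; recent pandas refuses to assign it to a single column, so take column 0)
df['Index']  = df['Index'].astype(str).str.split('.',expand = True)[0]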

#Then I started a loop that will go on until there is no more "nan" in the index column

while (len(df.loc[df['Index'] == "nan",'Index']) > 0):
    #Since the Codes that have 'Index' will now be the Parents, I put them in another dataframe so I could merge
    df2 = df[['Index','Code']]
    df2 = df2.rename(columns = {'Index':'Parent Index','Code': 'Parent Code'})
    df = df.merge(df2, left_on = 'Parent' , right_on = 'Parent Code',how = 'left')

    #Then I created an auxiliary column for each Parent, in which I build a range that depends on the number of occurrences
    for i in range(len(df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique())):
        df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]), 'Aux Col' + str(i)] = \
                                range(1,len(df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]),df2.columns[1]])+1)
    #Then I, again, split on dots, this time right split (so, if I have, 1.2.2.1.0 (this .0 comes from the range), I stay with 1.2.2.1)
        df['Aux Col' + str(i)] = df['Aux Col' + str(i)].astype(str).str.rsplit('.',n = 1,expand = True)[0]

    #Now I just define the Index for the new children (currently na, but with Parents not na)     
        df.loc[(df['Index'] == "nan") & (df['Parent Index'] == df2.loc[df2['Parent Index'] != "nan", "Parent Index"].unique()[i]), 'Index'] = \
                                df['Parent Index'].astype(str) + "." + df['Aux Col'+str(i)].astype(str)
    #Then I clean the dataset, since I created a big mess and a lot of columns
    df = df[initial_columns]
# Finally, just reordering as WeNYoBen suggested.
df = df.set_index('Index').reindex(natsorted(df.Index)).reset_index()
Output:


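Nice :) Good catch... Also, I guess
index=df['code']+'.'+df['Parent'].replacena('0')
You solved the second issue! But the starting point is
df=pd.DataFrame({'code':[1,2,3,4,5,10,45],'Parent':[2,np.nan,4,2,3,45,2]})
and the code has to create df['Index']. Sorry, I was not very clear.
Could you explain the logic of the first question in more detail? It is not clear to me how you build the values of the Index column.
Is it better now?
Yes, it is clear now.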