Python: group and transpose rows in a DataFrame based on specific row names


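This question follows on from an earlier post.

I have a DataFrame in which every row holds text scraped online containing sports selection information (all in a single column). The solution in the linked post worked well, but I have run into more problems because the text does not follow a consistent pattern. Here is my DataFrame: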

print(df): 
Col A    

Race 1 - Handicap
14 - NAME
3  - NAME
5  - NAME
6  - NAME
4  - NAME
Race Overview: lorem ipsum etc etc
Race 2 - Sprint
12 - NAME
10 - NAME
8 - NAME
11 - NAME
Race Overview: Second lorem ipsum etc etc
Race 3 - Sprint
1  - NAME
14 - NAME
8  - NAME
6  - NAME
Race 4 - Handicap
1  - NAME
14 - NAME
8  - NAME
#Race numbers may run up to 15-20
This is what I am trying to convert it to:

print(df):
Race Name             | Selection No    | Selection    | Race Overview

Race 1 - Handicap     |  1              |  14 - Name   | Race Overview: lorem ipsum etc etc
Race 1 - Handicap     |  2              |  3  - Name   | Race Overview: lorem ipsum etc etc
Race 1 - Handicap     |  3              |  5  - Name   | Race Overview: lorem ipsum etc etc
Race 1 - Handicap     |  4              |  6  - Name   | Race Overview: lorem ipsum etc etc
Race 1 - Handicap     |  5              |  4  - Name   | Race Overview: lorem ipsum etc etc
Race 2 - Sprint       |  1              |  12 - Name   | Race Overview: Second lorem ipsum etc etc
Race 2 - Sprint       |  2              |  10 - Name   | Race Overview: Second lorem ipsum etc etc
Race 2 - Sprint       |  3              |  8  - Name   | Race Overview: Second lorem ipsum etc etc
Race 2 - Sprint       |  4              |  11 - Name   | Race Overview: Second lorem ipsum etc etc
Race 3 - Sprint       |  1              |  1  - Name   | 
Race 3 - Sprint       |  2              |  14 - Name   | 
Race 3 - Sprint       |  3              |  8  - Name   | 
Race 3 - Sprint       |  4              |  6  - Name   | 
Race 4 - Handicap     |  1              |  1  - Name   | 
Race 4 - Handicap     |  2              |  14 - Name   | 
Race 4 - Handicap     |  3              |  8  - Name   | 
If the data follows a repeating cycle of exactly 6 rows, this code does the transpose:

df2 = (
    pd.DataFrame(data = df['Col A'].values.reshape(-1, 6))
    .set_index([0, 5])
    .stack()
    .rename_axis(index=['Race Name','Race Overview','Selection No'])
    .to_frame('Selection')
    .reset_index()
)
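Is there a way to find the rows between each "Race [0-9] -" line and then run the df2 code above on each block?

Any help would be much appreciated. Thanks!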

Use:

# keep only the race header rows ("Race <number> - ..."), NaN elsewhere
df['Race Name'] = df['Col A'].where(df['Col A'].str.contains('Race [0-9]+ -'))
# keep only the selection rows (text starting with a number), NaN elsewhere
df['Selection'] = df['Col A'].where(df['Col A'].str.contains('^[0-9]+'))
# anything that is neither a header nor a selection is the overview text
df['Race Overview'] = df['Col A'].where(df['Race Name'].isna() & df['Selection'].isna())

# helper groups: forward fill the race name, back fill the overview
s1 = df['Selection'].isna().cumsum()
s2 = df['Race Overview'].notna().iloc[::-1].cumsum()
df['Race Name'] = df.groupby(s1)['Race Name'].ffill()
df['Race Overview'] = df.groupby(s2)['Race Overview'].bfill()

# keep only the selection rows and drop the original column
df = df.dropna(subset=['Race Name', 'Selection']).drop('Col A', axis=1)
# add a per-race selection counter starting at 1
df.insert(1, 'Selection No', df.groupby('Race Name').cumcount().add(1))
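
For anyone who wants to try the steps above end to end, here is a minimal self-contained sketch. The sample frame is a shortened copy of the data in the question, and the wrapper name `tidy_races` is only an illustrative choice, not part of the original answer.

```python
import pandas as pd

# Shortened sample copied from the question: race 1 has an overview, race 2 does not
lines = [
    "Race 1 - Handicap",
    "14 - NAME",
    "3  - NAME",
    "Race Overview: lorem ipsum etc etc",
    "Race 2 - Sprint",
    "12 - NAME",
    "10 - NAME",
]
df = pd.DataFrame({"Col A": lines})


def tidy_races(df):
    """Hypothetical wrapper around the steps above; expects a 'Col A' text column."""
    out = df.copy()
    col = out["Col A"]
    # classify each row as header, selection, or overview text
    out["Race Name"] = col.where(col.str.contains(r"Race [0-9]+ -"))
    out["Selection"] = col.where(col.str.contains(r"^[0-9]+"))
    out["Race Overview"] = col.where(out["Race Name"].isna() & out["Selection"].isna())
    # helper groups for the forward and backward fills
    s1 = out["Selection"].isna().cumsum()
    s2 = out["Race Overview"].notna().iloc[::-1].cumsum()
    out["Race Name"] = out.groupby(s1)["Race Name"].ffill()
    out["Race Overview"] = out.groupby(s2)["Race Overview"].bfill()
    # keep only selection rows, then number them within each race
    out = out.dropna(subset=["Race Name", "Selection"]).drop("Col A", axis=1)
    out.insert(1, "Selection No", out.groupby("Race Name").cumcount().add(1))
    return out


result = tidy_races(df)
print(result)
```

Working on a copy inside the function leaves the scraped frame untouched, which makes it easier to rerun while adjusting the patterns.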


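Hi @jezrael. This code works really well; the only issue is that it does not seem to pick up races from "Race 10" onwards. Do I need to adjust the
df['Race Name'] = df['Col A'].where(df['Col A'].str.contains('Race [0-9] -'))
line so that it captures races that run into double digits?

@SOK - Can you check
df['Race Name'] = df['Col A'].where(df['Col A'].str.contains('Race [0-9]+ -'))
? The + means one or more digits [0-9].

Perfect! The + is a handy tip, so I appreciate the help @jezrael. Thanks!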
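
To see the effect of the + quantifier in isolation, here is a small sketch on a made-up Series (the four strings are illustrative, modelled on the rows in the question):

```python
import pandas as pd

s = pd.Series([
    "Race 1 - Handicap",
    "Race 10 - Sprint",
    "Race Overview: text",
    "10 - NAME",
])

# exactly one digit between "Race " and " -"
single = s.str.contains(r"Race [0-9] -")
# one or more digits between "Race " and " -"
multi = s.str.contains(r"Race [0-9]+ -")

print(single.tolist())  # [True, False, False, False]
print(multi.tolist())   # [True, True, False, False]
```

Neither pattern matches the "Race Overview" row, because no digit follows "Race " there.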
print (df)
            Race Name  Selection No  Selection  \
4   Race 1 - Handicap             1  14 - NAME   
5   Race 1 - Handicap             2  3  - NAME   
6   Race 1 - Handicap             3  5  - NAME   
7   Race 1 - Handicap             4  6  - NAME   
8   Race 1 - Handicap             5  4  - NAME   
11    Race 2 - Sprint             1  12 - NAME   
12    Race 2 - Sprint             2  10 - NAME   
13    Race 2 - Sprint             3   8 - NAME   
14    Race 2 - Sprint             4  11 - NAME   
17    Race 3 - Sprint             1  1  - NAME   
18    Race 3 - Sprint             2  14 - NAME   
19    Race 3 - Sprint             3  8  - NAME   
20    Race 3 - Sprint             4  6  - NAME   
22  Race 4 - Handicap             1  1  - NAME   
23  Race 4 - Handicap             2  14 - NAME   
24  Race 4 - Handicap             3  8  - NAME   

                                Race Overview  
4          Race Overview: lorem ipsum etc etc  
5          Race Overview: lorem ipsum etc etc  
6          Race Overview: lorem ipsum etc etc  
7          Race Overview: lorem ipsum etc etc  
8          Race Overview: lorem ipsum etc etc  
11  Race Overview: Second lorem ipsum etc etc  
12  Race Overview: Second lorem ipsum etc etc  
13  Race Overview: Second lorem ipsum etc etc  
14  Race Overview: Second lorem ipsum etc etc  
17                                        NaN  
18                                        NaN  
19                                        NaN  
20                                        NaN  
22                                        NaN  
23                                        NaN  
24                                        NaN
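
One thing to watch with the back-fill step: if a race without an overview is followed later by a race that has one, the reversed cumulative groups will, as far as I can tell, pull the later overview into the earlier race. The sample data does not hit this case, because the races without overviews come last. A possible alternative, sketched here under the assumption that every block starts with a "Race <number> -" header, is to number the blocks and fill within each block. The names `is_race`, `is_sel`, `is_info` and `race_id` are illustrative, and the sample rows are made up to trigger the case.

```python
import pandas as pd

lines = [
    "Race 1 - Handicap", "14 - NAME", "3  - NAME",   # no overview
    "Race 2 - Sprint", "12 - NAME",                  # no overview
    "Race 3 - Sprint", "1  - NAME", "Race Overview: third race text",
]
df = pd.DataFrame({"Col A": lines})
col = df["Col A"]

# classify every row
is_race = col.str.contains(r"^Race [0-9]+ -")
is_sel = col.str.contains(r"^[0-9]+")
is_info = ~is_race & ~is_sel

# each header starts a new block; rows before the first header get id 0
race_id = is_race.cumsum()

# broadcast the header and the first overview line (if any) across each block
race_name = col.where(is_race).groupby(race_id).transform("first")
overview = col.where(is_info).groupby(race_id).transform("first")
# running count of selection rows inside each block
sel_no = is_sel.astype(int).groupby(race_id).cumsum()

out = pd.DataFrame({
    "Race Name": race_name,
    "Selection No": sel_no,
    "Selection": col,
    "Race Overview": overview,
})
# keep only selection rows that belong to some race
out = out[is_sel & race_id.gt(0)]
print(out)
```

In this sketch only the first non-header, non-selection line of each block is kept as the overview, so a race with several free-text lines would need the lines joined first.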