Python: separating columns using regex

I have a DataFrame with a column titled "File Name" that contains long strings, and I need to split those strings into separate columns.

import pandas as pd

df = pd.DataFrame({'File Name':['92.00.88 / Z.89 / JK89Y3333 Test File Name',
                                 '94.00.21 / W.22 / JK89Y3333 Sample Document Title Here',
                                 '94.10.31 / Y.88 / JK89Y3333 File Document Name',
                                 'Phase 1',
                                 'Phase 2']})

| File Name                                              |
|--------------------------------------------------------|
| 92.00.88 / Z.89 / JK89Y3333 Test File Name             |
| 94.00.21 / W.22 / JK89Y3333 Sample Document Title Here |
| 94.10.31 / Y.88 / JK89Y3333 File Document Name         |
| Phase 1                                                |
| Phase 2                                                |

This is what I need the DataFrame to look like:

| File Number | Site | Barcode   | Title                      | Phase   |
|-------------|------|-----------|----------------------------|---------|
| 92.00.88    | Z.89 | JK89Y3333 | Test File Name             |         |
| 94.00.21    | W.22 | JK89Y3333 | Sample Document Title Here |         |
| 94.10.31    | Y.88 | JK89Y3333 | File Document Name         |         |
|             |      |           |                            | Phase 1 |
|             |      |           |                            | Phase 2 |

I can't seem to figure out how to use regex to achieve this.

For some advanced splitting, we can use a positive lookbehind and a positive lookahead:

# split on '/' or on the whitespace that follows three digits and precedes an uppercase letter
data = df['File Name'].str.split(r'/|(?<=\d{3})\s(?=[A-Z])', expand=True)
df2 = pd.DataFrame(data.to_numpy(), columns=['File Number', 'Site', 'Barcode', 'Title'])

# clean up the File Number column and create the Phase column
phase = df2['File Number'].str.contains('Phase')
df2.loc[phase, 'Phase'] = df2.loc[phase, 'File Number']
df2.loc[phase, 'File Number'] = ''
# replace missing values with empty strings
df2 = df2.fillna('')

  File Number    Site     Barcode                       Title    Phase
0   92.00.88    Z.89    JK89Y3333              Test File Name         
1   94.00.21    W.22    JK89Y3333  Sample Document Title Here         
2   94.10.31    Y.88    JK89Y3333          File Document Name         
3                                                              Phase 1
4                                                              Phase 2
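
The split pattern above can be tried in isolation with the standard `re` module. The sketch below is a variation, not the answer's exact pattern: it assumes the spaces around each slash should be consumed by the separator so the pieces come out without surrounding whitespace. The names `PATTERN` and `split_file_name` are made up for illustration.

```python
import re

# First alternative: a slash together with any whitespace around it.
# Second alternative: a single whitespace character that is preceded by
# three digits (lookbehind) and followed by an uppercase letter (lookahead).
# The lookarounds only assert context; they do not consume characters.
PATTERN = re.compile(r'\s*/\s*|(?<=\d{3})\s(?=[A-Z])')


def split_file_name(text):
    """Split one 'File Name' string into its parts."""
    return PATTERN.split(text)


parts = split_file_name('92.00.88 / Z.89 / JK89Y3333 Test File Name')
print(parts)

# A row with no separator comes back as a single piece.
print(split_file_name('Phase 1'))
```

The same pattern string should also work with `df['File Name'].str.split(..., expand=True)`, in which case the resulting columns would not need stripping.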

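Wow, what a magical regex. How did you come up with it?
I don't think you need expand in extract.
What does ... mean in the regex?
@Kenan It is a named group, and pandas seems to use the name as the column name. I just read that in the docs; it is also in the module documentation.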
# each named group becomes a column; rows that match only the Phase alternative leave the others empty
df = (
    df['File Name']
    .str.extract(r'(?P<File_Number>.*)\s/\s(?P<Site>.*)\s/\s(?P<Barcode>.*?)\s(?P<Title>.*)|(?P<Phase>Phase.*)')
    .fillna('')
    .rename(columns={'File_Number': 'File Number'})
)

print(df)
  File Number  Site    Barcode                       Title    Phase
0    92.00.88  Z.89  JK89Y3333              Test File Name         
1    94.00.21  W.22  JK89Y3333  Sample Document Title Here         
2    94.10.31  Y.88  JK89Y3333          File Document Name         
3                                                           Phase 1
4                                                           Phase 2
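
The named groups discussed in the comments can be seen without pandas. This is a minimal sketch using the standard `re` module and the same pattern as the answer, with the last group spelled `Title`; the names `ROW` and `parse_row` are made up for illustration. Groups that did not take part in the match come back as `None` and are replaced with empty strings here, which roughly mirrors the `fillna('')` step.

```python
import re

# Each (?P<name>...) is a named group; the name becomes a key in groupdict().
ROW = re.compile(
    r'(?P<File_Number>.*)\s/\s(?P<Site>.*)\s/\s(?P<Barcode>.*?)\s(?P<Title>.*)'
    r'|(?P<Phase>Phase.*)'
)


def parse_row(text):
    """Return a dict of group name -> captured text ('' when not captured)."""
    match = ROW.search(text)
    if match is None:
        return {}
    return {name: value or '' for name, value in match.groupdict().items()}


print(parse_row('92.00.88 / Z.89 / JK89Y3333 Test File Name'))
print(parse_row('Phase 1'))
```

Because the two layouts are alternatives in one pattern, a row fills either the first four groups or only `Phase`, never both.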