How to split strings in a DataFrame column on every occurrence of a specific pattern in Python


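I have the following DataFrame, which contains many authors and their affiliations.

In the Affiliation column there is a pattern, "Department of ...,", that I need to extract for each author. Note that this pattern may occur more than once per row (author). I need to pull out every "Department of ...," occurrence for each author and store it in a separate column or row assigned to that author. (I need to do this in Python.) The image below shows the expected result.

I would greatly appreciate your help.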

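This can be done with the "re" module by searching for the pattern "(Department of .*?),". The non-greedy ".*?" stops at the first comma after each "Department of", so every occurrence is captured separately.

Suggested snippet: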

import re
re.findall("(Department of .*?),","Department of Oncology, aadsf, afasdf, Department of Computer science, asf asfa, asfas, ")
Output:
['Department of Oncology','Department of Computer science']
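One caveat with the pattern above is that it requires a comma after each department name, so a department at the very end of a string is skipped. The sketch below compares it with a variant that stops at the next comma or at the end of the string. The variant pattern and the sample text are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical sample: the last department has no trailing comma
text = "Department of Oncology, aadsf, afasdf, Department of Computer science"

# Pattern from the answer: needs a comma after the department name
with_comma = re.findall(r"(Department of .*?),", text)

# Assumed variant: take everything up to the next comma or the end of the string
without_comma = re.findall(r"Department of [^,]*", text)

print(with_comma)     # ['Department of Oncology']
print(without_comma)  # ['Department of Oncology', 'Department of Computer science']
```

Neither pattern matches a truncated fragment such as "Depart", because both require the full "Department of " prefix.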

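To make the separation and the later assignment to new columns easier, you can use a pandas string method that returns one row per match with a MultiIndex; those rows can then easily be reshaped into column format.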

Input used as data.csv

Author_ID,Affiliation
6504356384,"Department of Cell and Developmental Biology, University of Michigan, Ann Arbor, Ml 48109, United States, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Ml 48109, United States"
57194644787,"Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX, United States, Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, United States, Department of Pathology, Texas Children's Ho:"
57194687826,"Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 2C1, Canada, Department of Computer Science, Faculty of Science, Western University, London, ON N6A 2C1, Canada, Depart"
123456789,"Department of RegexTest, Address and Numbers, Department of RegexTest, Faculty of Patterns, Department of RegexTest, Department of RegexTest, City and Place"
Output from AFDF

     Author_ID              Affiliation0              Affiliation1             Affiliation2             Affiliation3
0   6504356384  Department of Cell an...  Department of Computa...                      NaN                      NaN
1  57194644787  Department of Patholo...   Department of Pathology                      NaN                      NaN
2  57194687826  Department of Biochem...  Department of Compute...                      NaN                      NaN
3    123456789   Department of RegexTest   Department of RegexTest  Department of RegexTest  Department of RegexTest
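The code that produced this table is not included here. A table of this shape could be built roughly as follows, assuming the method meant is `Series.str.extractall` and that the reshaping step is `unstack`. The two sample rows are hypothetical and are inlined instead of read from data.csv; the `Affiliation0`, `Affiliation1`, ... column names follow the output above.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv, kept short for readability
csv_text = '''Author_ID,Affiliation
1,"Department of A, Some University, Department of B, Some City"
2,"Department of C, Another University"
'''
df = pd.read_csv(io.StringIO(csv_text))

# extractall returns one row per match, indexed by (original row, match number)
matches = df["Affiliation"].str.extractall(r"(Department of .*?),")

# unstack moves the match number from the index into the columns
wide = matches[0].unstack()
wide.columns = [f"Affiliation{i}" for i in wide.columns]

# Attach the per-author columns back to the author IDs; missing matches become NaN
result = df[["Author_ID"]].join(wide)
print(result)
```

Authors with fewer departments than the widest row get NaN in the unused columns, which matches the shape of the table above.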

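Thanks for your comment! But I need to split based on this pattern and assign the results to the corresponding author in separate columns.

Each row contains an author ID and an Affiliation. You need to iterate over the rows to get the author ID and Affiliation of each one. The "re" snippet above can then be applied to each Affiliation to get its list of departments, so iterating over the rows gives you the author-to-department mapping.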
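The row-by-row idea from this comment can be sketched as follows. The sample DataFrame and the names `author_departments` and `long_df` are hypothetical; the column names `Author_ID` and `Affiliation` are taken from the CSV shown earlier.

```python
import re
import pandas as pd

# Hypothetical sample data with the same columns as the CSV above
df = pd.DataFrame({
    "Author_ID": [1, 2],
    "Affiliation": [
        "Department of A, Some University, Department of B, Some City",
        "Department of C, Another University",
    ],
})

pattern = re.compile(r"(Department of .*?),")

# Iterate over the rows and map each author to the departments found in that row
author_departments = {
    row.Author_ID: pattern.findall(row.Affiliation)
    for row in df.itertuples(index=False)
}
print(author_departments)

# Alternative long format: one row per (author, department) pair
long_df = (
    df.assign(Department=df["Affiliation"].apply(pattern.findall))
      .explode("Department")[["Author_ID", "Department"]]
      .reset_index(drop=True)
)
print(long_df)
```

The dictionary is convenient for lookups, while the long format covers the "separate rows" option mentioned in the question.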