Python: splitting strings with exceptions


I split strings into rows using a comma as the delimiter:

for col in [col for col in df.loc[:,df.columns.str.contains(">")]]: #only on colnames containing ">"
    df[col] = df[col].str.split(", ")
    df = df.explode(col).reset_index(drop=True)

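However, there are three substrings in which the comma occurs "naturally" and should not trigger a split:

Data related to sexual preferences, sex life, and/or sexual orientation

Contract, salary and benefits

Procurement, subcontracting and vendor management

I was wondering whether there is a way to make an exception for just these three cases, keyed on something like "preferences,", "sex life,", "Contract," and "Procurement,". Or is there a more elegant solution?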

Here is an example:

df = pd.DataFrame({"col > 1": ["Personals, Financials, Data related to sexual preferences, sex life, and/or sexual orientation", "Personals, Financials", "Vendors, Procurement, subcontracting and vendor management"]})

Here is what it should output:

+-------------------------------------------------------------------------+
|                                 col > 1                                 |
+-------------------------------------------------------------------------+
| Personals                                                               |
| Financials                                                              |
| Data related to sexual preferences, sex life, and/or sexual orientation |
| Personals                                                               |
| Financials                                                              |
| Vendors                                                                 |
| Procurement, subcontracting and vendor management                       |
+-------------------------------------------------------------------------+

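You can temporarily replace the commas in those exceptions with something else (let's use ";"), then:

Split on commas to create lists

Explode the DataFrame

Replace the semicolons with commas again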
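The steps above can be sketched as follows. This is only an illustration under some assumptions: the list name protected is made up for the example, the exact wording of the three phrases is taken from the question (the "Contract" one is a guess at the original text), and ";" is assumed not to appear anywhere else in the data.

```python
import pandas as pd

df = pd.DataFrame({"col > 1": [
    "Personals, Financials, Data related to sexual preferences, sex life, and/or sexual orientation",
    "Personals, Financials",
    "Vendors, Procurement, subcontracting and vendor management",
]})

# Hypothetical helper list: phrases whose internal commas must survive the split.
protected = [
    "Data related to sexual preferences, sex life, and/or sexual orientation",
    "Contract, salary and benefits",
    "Procurement, subcontracting and vendor management",
]

for col in df.columns[df.columns.str.contains(">")]:
    s = df[col]
    # Mask the commas inside each protected phrase with ";".
    for phrase in protected:
        s = s.str.replace(phrase, phrase.replace(",", ";"), regex=False)
    # Split on the remaining commas and explode the lists into rows.
    df[col] = s.str.split(", ")
    df = df.explode(col).reset_index(drop=True)
    # Turn the masked ";" back into ",".
    df[col] = df[col].str.replace(";", ",", regex=False)

print(df)
```

If ";" can occur in the real data, any other character that is guaranteed to be absent would serve as the placeholder.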

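You can use a regular expression pattern with multiple negative lookbehind assertions in Series.str.split(). It essentially says: split the row on "," unless the "," is preceded by ...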

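To do this in Python, it is best to chain several negative lookbehind assertions. Python's re module requires lookbehinds to be fixed-width, so you cannot simply write a single negative lookbehind containing clauses of different lengths separated by "|".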
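A small check of that restriction, using only the standard re module. The variable names and the sample string are made up for the illustration, and the behaviour shown is what current CPython releases do.

```python
import re

# One lookbehind with alternatives of different lengths is rejected by re.
try:
    re.compile(r"(?<!preferences|sex life),")
    single_ok = True
except re.error:
    single_ok = False

# One fixed-width lookbehind per phrase, chained, compiles without complaint.
chained = re.compile(r"(?<!preferences)(?<!sex life),")
parts = chained.split("sexual preferences, sex life, and more, Financials")

print(single_ok)
print(parts)
```

Each chained lookbehind is checked at the same position, so the comma is only a split point when none of the phrases immediately precedes it.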

Using the phrases from the example, to split on "," unless it is preceded by any of the listed phrases, you can use:

r"(?<!preferences)(?<!sex life)(?<!Contract)(?<!Procurement),"


                                             col > 1
0                                          Personals
1                                         Financials
2   Data related to sexual preferences, sex life,...
3                                          Personals
4                                         Financials
5                                            Vendors
6   Procurement, subcontracting and vendor manage...
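For completeness, a minimal end-to-end sketch of how the pattern might be plugged into the loop from the question. One thing here is an addition and not part of the pattern above: the trailing \s*, which swallows the space after each splitting comma so the resulting rows have no leading blank.

```python
import pandas as pd

df = pd.DataFrame({"col > 1": [
    "Personals, Financials, Data related to sexual preferences, sex life, and/or sexual orientation",
    "Personals, Financials",
    "Vendors, Procurement, subcontracting and vendor management",
]})

# One negative lookbehind per protected phrase; \s* is an assumption added
# here to strip the whitespace that follows a splitting comma.
pattern = r"(?<!preferences)(?<!sex life)(?<!Contract)(?<!Procurement),\s*"

# Only touch columns whose name contains ">".
for col in df.columns[df.columns.str.contains(">")]:
    df[col] = df[col].str.split(pattern)
    df = df.explode(col).reset_index(drop=True)

print(df)
```

Note that the lookbehinds protect every comma that directly follows one of these words, not only the commas inside the three full phrases, so a standalone item ending in "Procurement" would not be split off either.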