How to extract a substring from a variable-length column in a DataFrame in Python?


python python-3.x pandas dataframe


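Hi all, I'm trying to reproduce something like Excel's MID function on a column of a pandas DataFrame in Python. The column holds medication names plus their strength and other details, so the values vary in length. I only want to pull out the first "part" of each name and put the result in another column of the DataFrame.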

For example:

DataFrame column:

MEDICATION_NAME
acetaminophen 325 mg
a-hydrocort 100 mg/2 ml

Basically, I want to apply

df['GENERIC_NAME'] = df['MEDICATION_NAME'].apply(lambda x: x.find(' '))

inside a str[:] slice, i.e. use the position of the first space as the end index of the slice.

Thanks
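For reference, here is the slicing approach the question describes, written directly with `apply`. This is a minimal sketch that assumes every value is a string; a NaN would make `find` fail. The helper `mid_to_first_space` and the third row `aspirin` are hypothetical. That row has no space, and it is included because `str.find` returns -1 in that case, so a plain `x[:x.find(' ')]` would silently drop the last character:

```python
import pandas as pd

# Example rows from the question, plus a hypothetical value without a space
df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"]})

def mid_to_first_space(text):
    """Hypothetical helper: return text up to (not including) the first space."""
    idx = text.find(" ")
    # find() returns -1 when there is no space; slicing with -1 would cut off the last char
    return text if idx == -1 else text[:idx]

df["GENERIC_NAME"] = df["MEDICATION_NAME"].apply(mid_to_first_space)

# The naive slice shows the pitfall on a value with no space
naive = "aspirin"[: "aspirin".find(" ")]  # "aspiri"

print(df["GENERIC_NAME"].tolist())
```

The vectorized `.str` methods in the answers below avoid the explicit loop and propagate missing values instead of raising.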

Use str.split with n=1 (split on whitespace at most once), then take the first piece of each row with .str[0]:

df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]: 
0    acetaminophen
1      a-hydrocort
Name: MEDICATION_NAME, dtype: object
#df['GENERIC_NAME']=df['MEDICATION_NAME'].str.split(n=1).str[0]
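Here is a self-contained version of the split answer, as a sketch. `STRENGTH` is a hypothetical column name, used only to show that `expand=True` also lets you keep the remainder of the string:

```python
import pandas as pd

df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]})

# With no pattern, split() breaks on whitespace; n=1 stops after the first break
df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.split(n=1).str[0]

# expand=True returns the pieces as DataFrame columns 0 and 1 instead of lists
parts = df["MEDICATION_NAME"].str.split(n=1, expand=True)
df["STRENGTH"] = parts[1]  # hypothetical column for the rest of the value

print(df[["GENERIC_NAME", "STRENGTH"]])
```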

Use str.extract, which gives you the full power of regular expressions:

df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)', expand=False)

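This captures the first run of non-whitespace characters, so it also works when a value starts with a space. (expand=False makes extract return a Series rather than a one-column DataFrame.)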
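To illustrate the leading-space point, here is a small sketch comparing the extract pattern with a split on a literal `" "`. The leading space in the first value is a hypothetical input chosen for the comparison:

```python
import pandas as pd

# First value has a hypothetical leading space
s = pd.Series([" acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

# extract searches for the first run of non-whitespace characters
by_extract = s.str.extract(r"([^\s]+)", expand=False)

# Splitting on a literal " " produces an empty first token when the value starts with a space
by_split = s.str.split(" ").str[0]

print(by_extract.tolist())
print(by_split.tolist())
```

Calling `str.split()` with no pattern splits on runs of whitespace and ignores leading whitespace, so the split answer above (which passes no pattern) is not affected by this case either.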

Try the following:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
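Why `.str[0]` rather than `[0]` matters: `.str.split(" ")` returns a Series whose values are lists. A plain `[0]` on that Series looks up row label 0 and returns one whole list, not the first token of every row. Here is a minimal sketch on the two example rows:

```python
import pandas as pd

df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]})

lists = df["MEDICATION_NAME"].str.split(" ")

# Plain [0] is a label lookup on the Series: it returns row 0's entire list
row_zero = lists[0]

# .str[0] indexes into each row's list, giving the first token per row
first_tokens = lists.str[0]

print(row_zero)
print(first_tokens.tolist())
```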

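You can use str.partition here:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]

For the given column, it gives: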

>>> df['MEDICATION_NAME'].str.partition(' ')[0]
0    acetaminophen
1      a-hydrocort
Name: 0, dtype: object

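On its own, str.partition turns the Series into a three-column DataFrame: the part before the separator, the separator itself, and the part after it: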

>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
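Here is a self-contained sketch of the partition approach. The `aspirin` row and the `STRENGTH` column name are hypothetical. The extra row shows that a value without the separator keeps its full text in column 0 and gets empty strings in columns 1 and 2:

```python
import pandas as pd

# Two rows from the question plus a hypothetical value with no space
df = pd.DataFrame({"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"]})

# Split at the first occurrence of " " into columns 0 (before), 1 (separator), 2 (after)
parts = df["MEDICATION_NAME"].str.partition(" ")

df["GENERIC_NAME"] = parts[0]
df["STRENGTH"] = parts[2]  # hypothetical column for the remainder

print(parts)
```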

Can you provide more examples? Is the name always followed by a space and a number? Are there generic names that contain spaces?
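If some generic names do contain spaces, splitting on the first space is not enough. Here is one possible sketch. It assumes, without confirmation from the question, that the strength always starts at the first digit preceded by whitespace. The `sodium chloride 0.9 %` and `aspirin` rows are hypothetical:

```python
import pandas as pd

# Hypothetical extra rows: a two-word name and a value with no strength at all
s = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "sodium chloride 0.9 %", "aspirin"])

# Assumption: the strength begins at the first digit that follows whitespace.
# The lazy (.*?) stops there; rows with no such digit give NaN.
names = s.str.extract(r"^(.*?)\s+\d", expand=False)

# Fall back to the whole (stripped) value when no strength was found
names = names.fillna(s.str.strip())

print(names.tolist())
```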
