How do I find a specific numeric pattern in a string column and replace the value with the text version of that ordinal?


python-3.x regex pandas


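Please bear with me, I'm new to Python. I'm building a function that I can use to clean up text from various surveys. I feel I'm close to converting the numeric version of an ordinal into its text version, but I can't quite get there. In the function, I tried two ways of finding the regex pattern on the *nbr =* line, and both fail, as explained below.

Error: when I run `words.str.findall` on the "nbr =" line in the function, I get:

AttributeError: 'str' object has no attribute 'str'

When I run `re.findall` instead, I do get a DataFrame back, but the `the_string_clean` column does not reflect the string on each row. Instead, I get: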

    record  the_string                  the_string_clean
0   47      This is the first string    "0This is the first string 1This is the 2nd string 2nothing to 
                                        see here 3 4th string has the date: today is the 8th 4This has 
                                        a typo10th"
Name: the_string, dtype: object
1   56      This is the 2nd string      "0This is the first string 1This is the 2nd string 2 nothing to 
                                        see here3 4th string has the date: today is the 8th 4This has a 
                                        typo10th"
Name: the_string, dtype: object
2   59       nothing to see here        "0This is the first string 1This is the 2nd string 2 nothing to 
                                        see here3 4th string has the date: today is the 8th 4This has a 
                                        typo10th"
Name: the_string, dtype: object
3   134      4th string has the         "0This is the first string 1This is the 2nd string 2 nothing to
             date: today is the 8th     see here3 4th string has the date: today is the 8th 4This has a 
                                        typo10th"
Name: the_string, dtype: object
4   454      this has a typo10th        "0This is the first string 1This is the 2nd string 2 nothing to 
                                        see here3 4th string has the date: today is the 8th 4This has a 
                                        typo10th"
Name: the_string, dtype: object

Expected output:

record    the_string                                 the_string_clean
47        this is the first string                   this is the first string
56        this is the 2nd string                     this is the second string
59        nothing to see here                        nothing to see here
134       4th string has the date: today is the 8th  fourth string has the date: today is the eighth
454       this has a typo10th                        this has a typotenth

I hope I've been clear enough. I'm new to Python and would really appreciate any help.

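You can simplify your `replace_ordinals` function by using `re.sub` and calling `num2words` in a lambda function as the replacement. Then just run the function on the column with `apply`:

    import re

    import pandas as pd
    from num2words import num2words

    my_df = pd.DataFrame({'record': [47, 56, 59, 134, 454],
                          'the_string': ['this is the first string',
                                         'this is the 2nd string',
                                         'nothing to see here',
                                         '4th string has the date: today is the 8th',
                                         'this has a typo10th']})

    def replace_ordinals(words):
        # Group 1 captures the digits; int() keeps this working on
        # num2words versions that do not accept a string argument.
        return re.sub(r'(\d+)(?:st|nd|rd|th)',
                      lambda m: num2words(int(m.group(1)), ordinal=True),
                      words)

    my_df['the_string'] = my_df['the_string'].apply(replace_ordinals)
    my_df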

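Output:

       record                                       the_string
    0      47                         this is the first string
    1      56                        this is the second string
    2      59                              nothing to see here
    3     134  fourth string has the date: today is the eighth
    4     454                             this has a typotenth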
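As for the AttributeError in the question: `.str` is an accessor on a pandas Series, so it is not available on the plain Python string that `apply` hands to the function one row at a time. The following is only a minimal sketch of that distinction, not the asker's original function; the sample Series `s` and the helper name `find_ordinals` are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data, mirroring two rows from the question.
s = pd.Series(["this is the 2nd string", "this has a typo10th"])

pattern = r"\d+(?:st|nd|rd|th)"

# Called on the whole Series, the .str accessor works and
# returns one list of matches per row.
per_row = s.str.findall(pattern)

# Inside a function used with .apply, each argument is a plain str,
# which has no .str accessor, so the re module is used instead.
def find_ordinals(words):
    return re.findall(pattern, words)

via_apply = s.apply(find_ordinals)

# A plain str has no .str attribute, hence the AttributeError.
has_accessor = hasattr("this is the 2nd string", "str")
```

Both routes give the same per-row matches; the choice is between calling the `.str` methods once on the column or calling `re` functions inside a per-row function.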

Note that you need to use an alternation in the regex, `(?:st|nd|rd|th)`, to match one of `st`, `nd`, `rd` or `th`. The character class you were using, `[st|nd|rd|th]`, matches exactly one character from the set `d`, `h`, `n`, `r`, `s`, `t` or `|`, so it does not match a whole suffix.
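To make the difference between the two patterns concrete, here is a small sketch using only the standard `re` module. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "today is the 8th, tomorrow the 21st"

# Non-capturing alternation: digits followed by one whole suffix.
alternation = re.compile(r"\d+(?:st|nd|rd|th)")

# Character class: digits followed by exactly ONE character from the set
# {d, h, n, r, s, t, |}; inside [...] the | is a literal character.
char_class = re.compile(r"\d+[st|nd|rd|th]")

alt_matches = alternation.findall(text)    # whole ordinals
class_matches = char_class.findall(text)   # digits plus a single letter
pipe_matches = char_class.findall("3|")    # the literal pipe also matches
```

With the character class, a substitution would replace only the digits and the first suffix letter, leaving the rest of the suffix behind in the string.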
