Using a function as an argument to re.sub in Python?

python regex string replace

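I am writing a program to split the words contained in a hashtag.

For example, I want to split the hashtags:

#Whatthehello #goback

into:

What the hello go back

I am having trouble using re.sub with a function argument.

The code I've written is: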

import re,pdb

def func_replace(each_func):
    i=0
    wordsineach_func=[] 
    while len(each_func) >0:
        i=i+1
        word_found=longest_word(each_func)
        if len(word_found)>0:
            wordsineach_func.append(word_found)
            each_func=each_func.replace(word_found,"")
    return ' '.join(wordsineach_func)

def longest_word(phrase):
    phrase_length=len(phrase)
    words_found=[];index=0
    outerstring=""
    while index < phrase_length:
        outerstring=outerstring+phrase[index]
        index=index+1
        if outerstring in words or outerstring.lower() in words:
            words_found.append(outerstring)
    if len(words_found) ==0:
        words_found.append(phrase)
    return max(words_found, key=len)        

words=[]
# The file corncob_lowercase.txt contains a list of dictionary words
with open('corncob_lowercase.txt') as f:
    read_words=f.readlines()

for read_word in read_words:
    words.append(read_word.replace("\n","").replace("\r",""))

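The result I get is:

What the hello
#Whatthehello #goback

which is not the result I expected:

What the hello
What the hello go back

Why is this happening? In particular, I followed a suggestion from another post, but I don't understand what goes wrong in this code.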

Notice that m.group() returns the entire string that matched, whether or not it was part of a capturing group:

In [19]: m = re.search(r"#(\w+)", s)

In [20]: m.group()
Out[20]: '#Whatthehello'

m.group(0) also returns the entire match:

In [23]: m.group(0)
Out[23]: '#Whatthehello'

In contrast, m.groups() returns all capturing groups:

In [21]: m.groups()
Out[21]: ('Whatthehello',)

and m.group(1) returns the first capturing group:

In [22]: m.group(1)
Out[22]: 'Whatthehello'
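The distinctions above can be checked outside an interactive session. The following is a minimal sketch that assumes the same sample string used in the answer; the variable names are chosen here for illustration only.

```python
import re

s = "#Whatthehello #goback"
m = re.search(r"#(\w+)", s)   # first match only

whole = m.group()        # entire match, including the leading '#'
also_whole = m.group(0)  # group(0) is equivalent to group()
first = m.group(1)       # only the text captured by (\w+)
all_groups = m.groups()  # tuple with one entry per capturing group

print(whole, also_whole, first, all_groups)
```

Only group(1) and groups() drop the '#', because the '#' sits outside the parentheses in the pattern.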

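So the problem in your code originates with the use of m.group() in

re.sub(r"#(\w+)", lambda m: func_replace(m.group()), s)

since m.group() returns the entire match, '#Whatthehello', with the leading # still attached,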

whereas if you had used .group(1), you would have gotten

In [24]: re.search(r"#(\w+)", s).group(1)
Out[24]: 'Whatthehello'

and the preceding # makes all the difference:

In [25]: func_replace('#Whatthehello')
Out[25]: '#Whatthehello'

In [26]: func_replace('Whatthehello')
Out[26]: 'What the hello'
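The effect of the leading # on a function passed to re.sub can be reproduced in a self-contained sketch. Everything here is an assumption made for illustration: WORDS is a tiny hypothetical stand-in for the dictionary file, and split_words is a simplified splitter that slices off each prefix rather than calling str.replace as func_replace does.

```python
import re

# Hypothetical stand-in for the dictionary file; a real run would load a full word list.
WORDS = {"what", "the", "hello", "go", "back"}

def split_words(text):
    """Greedily peel the longest known prefix off text until nothing is left."""
    parts = []
    while text:
        # Longest prefix found in WORDS; fall back to the whole remainder.
        prefix = next(
            (text[:n] for n in range(len(text), 0, -1) if text[:n].lower() in WORDS),
            text,
        )
        parts.append(prefix)
        text = text[len(prefix):]
    return " ".join(parts)

s = "#Whatthehello #goback"

# m.group() still carries '#', so no dictionary prefix ever matches.
with_hash = re.sub(r"#(\w+)", lambda m: split_words(m.group()), s)

# m.group(1) is just the captured word, so the splitter can do its job.
without_hash = re.sub(r"#(\w+)", lambda m: split_words(m.group(1)), s)

print(with_hash)
print(without_hash)
```

With this small word set the first call returns the input unchanged and the second returns the split words. A real dictionary can give different splits, because it contains many more candidate prefixes.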

Thus, changing m.group() to m.group(1), and substituting /usr/share/dict/words for corncob_lowercase.txt:

import re

def func_replace(each_func):
    i = 0
    wordsineach_func = []
    while len(each_func) > 0:
        i = i + 1
        word_found = longest_word(each_func)
        if len(word_found) > 0:
            wordsineach_func.append(word_found)
            each_func = each_func.replace(word_found, "")
    return ' '.join(wordsineach_func)


def longest_word(phrase):
    phrase_length = len(phrase)
    words_found = []
    index = 0
    outerstring = ""
    while index < phrase_length:
        outerstring = outerstring + phrase[index]
        index = index + 1
        if outerstring in words or outerstring.lower() in words:
            words_found.append(outerstring)
    if len(words_found) == 0:
        words_found.append(phrase)
    return max(words_found, key=len)

words = set()
# /usr/share/dict/words stands in here for corncob_lowercase.txt
with open('/usr/share/dict/words') as f:
    for read_word in f:
        words.add(read_word.strip())

s = "#Whatthehello #goback"
hashtags = re.findall(r"#(\w+)", s)
print(func_replace(hashtags[0]))
print(re.sub(r"#(\w+)", lambda m: func_replace(m.group(1)), s))

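This prints 'What the hello' for the first hashtag, but #goback still does not come out as 'go back', since, alas, 'gob' is longer than 'go'.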
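The 'gob' versus 'go' issue comes from always taking the longest matching prefix. The sketch below is not part of the original answer. It assumes a hypothetical three-word dictionary, shows the greedy choice, and contrasts it with one possible alternative: a backtracking split that gives up on a prefix when the rest of the string cannot be segmented.

```python
# Hypothetical mini dictionary that, like a real word list, contains 'gob'.
WORDS = {"go", "gob", "back"}

def longest_prefix(text):
    """Return the longest prefix of text found in WORDS, or None."""
    for n in range(len(text), 0, -1):
        if text[:n] in WORDS:
            return text[:n]
    return None

def segment(text):
    """Backtracking split: a list of dictionary words covering text, or None."""
    if not text:
        return []
    for n in range(len(text), 0, -1):   # still try longer words first
        head = text[:n]
        if head in WORDS:
            rest = segment(text[n:])
            if rest is not None:        # keep head only if the remainder also splits
                return [head] + rest
    return None

greedy_first = longest_prefix("goback")  # the greedy choice that derails the split
split = segment("goback")

print(greedy_first, split)
```

The backtracking version tries 'gob' first as well, finds that 'ack' cannot be segmented with this dictionary, and falls back to 'go'. It can be slow on long inputs without memoization, so treat it as an illustration rather than a drop-in replacement.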

One way you could have debugged this is to replace the lambda function with a regular function and then add print statements:

def foo(m):
    result = func_replace(m.group())
    print(m.group(), result)
    return result

In [35]: re.sub(r"#(\w+)", foo, s)
('#Whatthehello', '#Whatthehello')   <-- This shows you what `m.group()` and `func_replace(m.group())` return
('#goback', '#goback')
Out[35]: '#Whatthehello #goback'
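The same debugging idea can be tried without the dictionary file. In this sketch debug_repl is a hypothetical name, and the uppercase replacement is only a placeholder so that the effect of the callback is visible in the result.

```python
import re

def debug_repl(m):
    # Print what the match object exposes before deciding on a replacement.
    print("group():", repr(m.group()),
          "group(1):", repr(m.group(1)),
          "span:", m.span())
    return m.group(1).upper()  # placeholder replacement for illustration

s = "#Whatthehello #goback"
out = re.sub(r"#(\w+)", debug_repl, s)
print(out)
```

re.sub calls the function once per non-overlapping match, so the printed lines show exactly which text each call receives.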

You can compare this to

In [26]: func_replace(hashtags[0])
Out[26]: 'What the hello'

In [27]: func_replace('Whatthehello')
Out[27]: 'What the hello'

This would lead you to ask: if m.group() returns '#Whatthehello', what method do I need to call to get 'Whatthehello'? Digging into the documentation then solves the problem.

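Hmmm.. what is the question?

Why the downvote? This is about programming!!

Being concise is good, but your question should at least be readable. Use English sentences, not things like "Goal: do this. Code: ...; Output: ...; Why? See here".

@Bakuriu Thanks for the edit!

To be clear, I only wanted to give an example of how to write a good question. You did well in providing complete code, the output, and what you expected, but you should include at least one paragraph of text describing what you want to do (and maybe why, with a bit of background) and how the code fits into that. That way your question will attract more people and be more useful.

Thanks! This is the best explanatory answer I have read so far. Using the interpreter to explain the problem step by step is great.

Thanks. Once you understand the problem, the solution jumps out at you. Plus, you can carry what you learned into future coding work.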
