Identifying Chinese-character retweeters in a tweet with Python regular expressions

Given a tweet from Sina Weibo:

  tweet = "//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘// @诺什: 吸引优质  客户，摆脱屌丝男！！！//@MarkGreene: 转发微博"

Note that there is a space between // and @诺什.

I want to get a list of the retweeters, like this:

  result = ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']

I have been thinking of using the following script:

import re

RTpattern = r'''//?@(\w+)'''
rt = re.findall(RTpattern, tweet)

However, this does not give me the Chinese name '魏武'.
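The symptom can be reproduced in Python 3 as a sketch, assuming the tweet was a UTF-8 encoded byte string (which is what a plain string literal in a UTF-8 Python 2 source file holds). For a bytes pattern, \w only matches ASCII letters, digits and underscore, so the multi-byte sequence for 魏武 never matches and that name is silently skipped; the same pattern applied to decoded text finds it.

```python
import re

# The tweet from the question, as a text (Unicode) string.
tweet = ("//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘"
         "// @诺什: 吸引优质  客户，摆脱屌丝男！！！//@MarkGreene: 转发微博")

# Assumption: the original tweet was UTF-8 bytes, like a Python 2 str literal.
tweet_bytes = tweet.encode("utf-8")

# Bytes pattern: \w is ASCII-only, so the bytes of 魏武 are never matched.
bytes_result = re.findall(rb"//?@(\w+)", tweet_bytes)

# Text pattern: in Python 3, \w covers Unicode word characters by default.
text_result = re.findall(r"//?@(\w+)", tweet)

print(bytes_result)  # [b'lilei', b'Bob', b'Girl', b'MarkGreene']
print(text_result)   # ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']
```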

Use the re.UNICODE flag:

re.UNICODE
Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character
properties database.

# -*- coding: utf-8 -*-
import re

tweet = u"//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘// @诺什: 吸引优质  客户，摆脱屌丝男！！！//@MarkGreene: 转发微博"
RTpattern = r'''//?@(\w+)'''
# With re.UNICODE, \w also matches Chinese characters such as 魏武
for word in re.findall(RTpattern, tweet, re.UNICODE):
    print(word)

# lilei
# Bob
# Girl
# 魏武
# MarkGreene

Thanks a lot. What I get is ['lilei', 'Bob', 'Girl', '\xe9', 'MarkGreene'] rather than ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene'].

You have to make tweet a unicode string (note the u prefix in u"..."). To do that, just add:

tweet = tweet.decode('utf-8')
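In Python 3 the same answer can be sketched without the flag: str patterns are Unicode-aware by default, so re.UNICODE is redundant, and decoding is only needed when the tweet arrives as bytes. The helper name extract_retweeters and its allow_space parameter are hypothetical, added for illustration; the tolerant pattern //?\s*@ is an assumed variant showing how the spaced "// @诺什" form could also be accepted, which the question's expected result deliberately excludes.

```python
import re


def extract_retweeters(tweet, allow_space=False):
    """Return retweeter names from a Weibo tweet (hypothetical helper)."""
    if isinstance(tweet, bytes):
        # As in the answer: turn encoded bytes into text before matching.
        tweet = tweet.decode("utf-8")
    if allow_space:
        # Assumed variant: also accept whitespace between the slashes and "@".
        pattern = r"//?\s*@(\w+)"
    else:
        # The question's pattern: "@" must directly follow the slash(es).
        pattern = r"//?@(\w+)"
    # On str, \w already covers CJK characters in Python 3; no re.UNICODE needed.
    return re.findall(pattern, tweet)


tweet = ("//@lilei: dd //@Bob: cc//@Girl: dd//@魏武: 利益所致 自然念念不忘"
         "// @诺什: 吸引优质  客户，摆脱屌丝男！！！//@MarkGreene: 转发微博")

print(extract_retweeters(tweet))
# ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']
print(extract_retweeters(tweet.encode("utf-8")))
# ['lilei', 'Bob', 'Girl', '魏武', 'MarkGreene']
print(extract_retweeters(tweet, allow_space=True))
# ['lilei', 'Bob', 'Girl', '魏武', '诺什', 'MarkGreene']
```

The strict pattern reproduces the expected result from the question; the tolerant one additionally returns 诺什, because it no longer requires "@" to follow the slashes immediately.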