Separating joined words in a string using Python
What I am trying to do is read a string with Python and separate the words that have been joined together. What I want is a regular expression that separates the joined words in the string. I want to read the string below from a file and get output in which the joined words are split apart:
"10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
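I need to write a regular expression that separates joined words such as:
'10JAN2015AirMail', 'HyderabadAddress', 'details:John', 'DriveAdelaide'
The regular expression should identify joined words like these and separate them with spaces within the same string, for example:
'10 JAN 2015 AirMail', 'Hyderabad Address', 'details : John'
The expected output is: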
"10 JAN 2015 AirMail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide. Also calculated: Nil Action Taken: Goods referred to USG for further action. Attachments : Nil 60 FEB 2004."
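The code below is my attempt, and it does not work:
import re

# Raw string so the backslash in the Windows path is not treated as an escape.
text = open(r'C:\sample.txt', 'r').read().replace("\n", "").replace("\t", "").replace("-", "").replace("/", " ")
newtext = re.sub('[a-zA-Z0-9_:]', '', text)  # This regex does not work. Please assist.
print(text)
print(newtext)
Sometimes we only need to be pointed in the right direction.
What we need is evidence that you have tried to solve the problem yourself. Have you tried anything? You can't just post a question and ask for a solution without first showing your own attempt.
Thanks, Rafael. As you mentioned, the logic is clear; I am looking for a more efficient way to do it with regular expressions.
Do you want one magic regex that does it all, or could it be done with a mix of regular expressions? I can try further, but the result may be neither pretty nor efficient.
I know this could be solved very simply by classifying characters into sets (uppercase, lowercase, digits), but I preferred a more verbose solution: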
test_text = "10JAN2015AirMail standard envelope from HyderabadAddress details:John Cena Palm DriveAdelaide.Also Contained:NilAction Taken:Goods referred to HGI QLD for further action.Attachments:Nil34FEB2004"
splitted_text = test_text.split(' ')
# Flags tracking the class of the previous character: digit, lowercase, uppercase.
num = False
low = False
upp = False
result = []
for word in splitted_text:
    new_word = ''
    # Only mixed words need splitting; all-uppercase or all-lowercase words are kept as-is.
    if not word.isupper() and not word.islower():
        # Initialise the flags from the first character so no leading space is added.
        if word[0].isnumeric():
            num = True
            low = False
            upp = False
        elif word[0].islower():
            num = False
            low = True
            upp = False
        elif word[0].isupper():
            num = False
            low = False
            upp = True
        for letter in word:
            if letter.isnumeric():
                # A digit continues a number; otherwise it starts a new token.
                if num:
                    new_word += letter
                else:
                    new_word += ' ' + letter
                low = False
                upp = False
                num = True
            elif letter.islower():
                # Lowercase continues a word that began with lowercase or uppercase.
                if low or upp:
                    new_word += letter
                else:
                    new_word += ' ' + letter
                low = True
                upp = False
                num = False
            elif letter.isupper():
                # Uppercase after lowercase or a digit starts a new token.
                if low or num:
                    new_word += ' ' + letter
                else:
                    new_word += letter
                low = False
                upp = True
                num = False
            else:
                # Punctuation such as ':' or '.' becomes its own token.
                new_word += ' ' + letter
        result.append(new_word)
    else:
        result.append(word)
print(' '.join(result))
# '10 JAN 2015 Air Mail standard envelope from Hyderabad Address details : John Cena Palm Drive Adelaide . Also Contained : Nil Action Taken : Goods referred to HGI QLD for further action . Attachments : Nil 34 FEB 2004'
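Since the question asked for a regular expression, here is a minimal sketch of the same character-class idea written as a single pattern with lookarounds. The name `split_joined_words` and the exact boundary rules are my own assumptions, not something from the question: a space is inserted between a digit and a letter, between a lowercase and an uppercase letter, and after a `.` or `:` that is followed directly by a word. Unlike the loop above, this version leaves punctuation attached to the preceding word.

```python
import re

# Zero-width positions where a space should be inserted (assumed rules).
BOUNDARY = re.compile(
    r"(?<=[0-9])(?=[A-Za-z])"       # digit then letter:      10|JAN
    r"|(?<=[A-Za-z])(?=[0-9])"      # letter then digit:      JAN|2015
    r"|(?<=[a-z])(?=[A-Z])"         # lowercase then upper:   Air|Mail
    r"|(?<=[.:])(?=[A-Za-z0-9])"    # '.' or ':' then a word: details:|John
)


def split_joined_words(text):
    """Insert a space at every boundary matched by BOUNDARY."""
    return BOUNDARY.sub(" ", text)


sample = ("10JAN2015AirMail standard envelope from HyderabadAddress "
          "details:John Cena Palm DriveAdelaide.Also Contained:NilAction "
          "Taken:Goods referred to HGI QLD for further action."
          "Attachments:Nil34FEB2004")
print(split_joined_words(sample))
# 10 JAN 2015 Air Mail standard envelope from Hyderabad Address details: John
# Cena Palm Drive Adelaide. Also Contained: Nil Action Taken: Goods referred to
# HGI QLD for further action. Attachments: Nil 34 FEB 2004
```

Like the loop, this relies only on changes in character class, so it cannot split words joined without a case or digit change (for example two all-lowercase words), and it will also split names such as "McDonald" at the capital letter.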