Splitting tweets received from the Twitter streaming API with Python

python string mongodb twitter

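I'm using Twitter's streaming API and receiving JSON results, which I import into a MongoDB database with Python so that I can run queries. From the query results I get a text file containing the user id and the text of each tweet. The format is as follows: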

u'"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF': 651322435355181056L,

I'd like to isolate the text part and the user id. Ideally, a Python implementation would produce a list with two entries:

list[0] = #Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF

list[1] = 651322435355181056L

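I'm a Python beginner, so any help is much appreciated! I've tried the split() method, but I can't work out how to keep the whole sentence together and remove any punctuation. Thanks, everyone!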

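Read each line of the text file into a string; you can then use a split method on it. This only works if the lines are consistent and the user id is always separated by the same character (here a colon, :). Because the tweet text contains colons of its own (after the quoted phrase and in the URL), split only once, from the right, with rsplit.

Given:

inP = """u'"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF': 651322435355181056L"""

# Split once, at the last colon, so the colons inside the tweet are left alone
parts = inP.rsplit(':', 1)

This gives you two values:

parts[0] = u'"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF'
parts[1] =  651322435355181056L

Note that parts[1] keeps its leading space; parts[1].strip() removes it.

You can then use the replace method:

rep = ['.', ',']  # add any other characters you want removed
for i in rep:
    parts[0] = parts[0].replace(i, '')

There may be a faster way.

Hope that helps :)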
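The steps above can be pulled together into one small helper. This is only a sketch under assumptions: the function name split_line is made up for illustration, and it assumes every line looks like the sample in the question (a Python 2 style u'...' text, a colon, a numeric id with an optional L suffix, and an optional trailing comma). Lines whose text contains escaped quotes would need more careful parsing.

```python
def split_line(line):
    """Split one exported line into [tweet_text, user_id] (hypothetical helper)."""
    # Drop surrounding whitespace and the trailing comma left by the export.
    line = line.strip().rstrip(",")
    # Split once from the right so colons inside the tweet are left alone.
    text, user_id = line.rsplit(":", 1)
    text = text.strip()
    # Remove the Python 2 unicode prefix and the enclosing quotes, if present.
    if len(text) >= 3 and text[0] == "u" and text[1] in "'\"" and text[-1] == text[1]:
        text = text[2:-1]
    # Remove the Python 2 long-integer suffix.
    user_id = user_id.strip().rstrip("L")
    return [text, user_id]


sample = """u'"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF': 651322435355181056L,"""
result = split_line(sample)
print(result[0])
print(result[1])
```

The id is returned as a string; wrap it in int() if you need a number. To process the whole file, call the helper once per line while iterating over the open file.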

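To split a string on a colon, you must pass the colon to the split function as an argument. Since the tweet text contains colons of its own, use rsplit with a maximum of one split, so that only the final colon (the one before the user id) is used:

inputStr = """u'"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g. The shocking power of normality in #IS #propaganda, from @charliewinter @QuilliamF': 651322435355181056L"""

inputStrSplit = inputStr.rsplit(":", 1)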

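To remove the punctuation from the first element of the list, use the following. In Python 3 the deletion table is built with str.maketrans; the two-argument translate call with string.maketrans("","") works only on Python 2 byte strings.

import string
outputStr = inputStrSplit[0].translate(str.maketrans("", "", string.punctuation))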
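Stripping every punctuation character also removes the # and @ that mark hashtags and mentions, and it mangles URLs. As a sketch of one possible variation (the function name remove_punctuation and its keep parameter are assumptions for illustration, not part of any library), the set of deleted characters can be made configurable:

```python
import string


def remove_punctuation(text, keep=""):
    """Delete ASCII punctuation from text, except characters listed in keep."""
    to_delete = "".join(c for c in string.punctuation if c not in keep)
    return text.translate(str.maketrans("", "", to_delete))


tweet = '"#Fishing on the #Euphrates": http://t.co/sA1uGz8c2g.'
print(remove_punctuation(tweet))             # Fishing on the Euphrates httptcosA1uGz8c2g
print(remove_punctuation(tweet, keep="#@"))  # #Fishing on the #Euphrates httptcosA1uGz8c2g
```

string.punctuation covers ASCII punctuation only, so typographic quotes or other Unicode punctuation in a tweet would be left in place.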