Python: parsing tweets stored in a DataFrame column


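I am trying to parse tweets that are stored in a .csv file, in a column named "text". I would like to use regular expressions, TweetTokenizer, and so on, but all of these require the text to be in string form (as far as I know).

I saw this post:

But the code there is too specific to finding hashtags for my purposes. I do want to do that too, but does anyone know how to convert the text in the "text" column to strings more generally, so that I can parse it?

Thanks,
punpun

When you read the CSV file, the text column should already be imported as strings: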

import pandas as pd
df = pd.read_csv('tweet.csv')
print(df)
print(df.dtypes)
Output:

            user                                               text
0  scotthamilton  is upset that he can't update his Facebook by ...
1       mattycus  @Kenichan I dived many times for the ball. Man...
2        ElleCTF     my whole body feels itchy and like its on fire
3         Karoli  @nationwideclass no, it's not behaving at all....
4       joy_wolf                       @Kwesidei not the whole crew
5        mybirch                                         Need a hug
user    object
text    object
dtype: object
The pandas object dtype is what pandas uses for text columns here; each value in such a column is an ordinary Python str.

If you really do need to convert the column type to str, you can use:

df.text = df.text.astype(str)
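
Once the column holds strings, a regular expression can be applied to every row through the .str accessor, with no loop. The following is only a minimal sketch: the sample data is made up and stands in for tweet.csv, the "hashtags" column name is hypothetical, and the fillna("") step assumes that empty cells (which read_csv loads as NaN) should be treated as empty tweets.

```python
import io

import pandas as pd

# Made-up sample data standing in for tweet.csv (assumption for illustration)
csv_data = io.StringIO(
    "user,text\n"
    "alice,Loving the #sunny weather in #Paris\n"
    "bob,@alice no hashtags here\n"
    "carol,\n"
)
df = pd.read_csv(csv_data)

# Empty cells are read as NaN, so replace them before forcing str
df["text"] = df["text"].fillna("").astype(str)

# Series.str.findall runs re.findall on each row and returns one list per tweet
df["hashtags"] = df["text"].str.findall(r"#(\w+)")
hashtags = df["hashtags"].tolist()
print(hashtags)
```

The same pattern works for other per-row string operations such as df["text"].str.lower() or df["text"].str.contains(...).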


You should be able to extract the "text" column from the data frame, save it as a list, and parse the elements of the list. Unless I am missing the point.

@fulaphex Do you know how to parse all the elements of the list at once? For example, running re.findall(r"#(\w+)", tweetlist) returns TypeError: expected string or bytes-like object. Basically, I want to turn all the tweets into one big string and be able to parse that.

This worked for me.
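
The TypeError mentioned in the comments arises because re.findall expects a single string, not a list. A rough sketch of two ways around it, using only the standard library: apply the pattern to each tweet in turn, or join the tweets into one big string first. The contents of tweetlist below are made up; in practice the list would come from something like df["text"].tolist().

```python
import re

# Made-up tweets (assumption); in practice: tweetlist = df["text"].tolist()
tweetlist = [
    "Loving the #sunny weather in #Paris",
    "@alice no hashtags here",
    "Need a hug #sad",
]

# Compile once, reuse for every tweet
HASHTAG = re.compile(r"#(\w+)")

# Option 1: one list of hashtags per tweet (keeps track of which tweet had what)
per_tweet = [HASHTAG.findall(tweet) for tweet in tweetlist]

# Option 2: join everything into one big string and search it once
all_text = " ".join(tweetlist)
all_hashtags = HASHTAG.findall(all_text)

print(per_tweet)
print(all_hashtags)
```

Option 1 is usually the better fit when the results need to be linked back to individual tweets; Option 2 matches the "one big string" approach asked about in the comment.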