Python TextBlob: looping over articles to compute polarity & subjectivity scores


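I'm looking at TextBlob to compute sentiment scores (polarity, subjectivity) for a list of articles in an Excel sheet I put together.

Here is a sample of the sheet:

11/03/2004 04:03 At least 60 people were killed in three bomb attacks on crowded Madrid trains in Spain's worst-ever terrorist attack, said the Efe news wire and other media. The Red Cross said at least 200 people were injured. ``This is a massacre,'' said Socialist party leader Jose Luis Rodriguez Zapatero, who blamed the Basque terrorist group ETA

07/07/2005 04:41 London closed its Underground system and evacuated all passengers after emergency services were called to stations following explosions in and around the financial district

01/12/2009 04:00 American International Group (AIG) announced today that it has closed two previously announced transactions with the Federal Reserve Bank of New York (FRBNY) that reduce the debt AIG owes the FRBNY by $25 billion in exchange for the FRBNY's acquisition of preferred equity interests in certain newly formed subsidiaries

22/08/2013 11:38 NASDAQ shuts down for 3 hours due to a computer problem

I've been able to use TextBlob in the simplest way, one line at a time, like this: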

from textblob import TextBlob

analysis = TextBlob("NASDAQ shuts down for 3 hours due to a computer problem")
print(analysis.sentiment)

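I'd like to import the Excel file, which has two columns (date and time, and the article), then loop over every row, compute the polarity and subjectivity scores, and save them to a file.

I tried adapting some code from Thomson Reuters news analytics as follows: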

import pandas as pd
import numpy as np
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

df['Polarity'] = np.nan
df['Subjectivity'] = np.nan

pd.options.mode.chained_assignment = None

for idx, articles in enumerate(df['articles'].values):  # for each row in our df dataframe
    sentA = TextBlob("articles")  # pass the text only article to TextBlob to analyze
    df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
    df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df
df.head()

df.to_csv("out.csv", index=False)

But the code doesn't work... I don't get any scores.

Any suggestions on how I can do this?

I'm new to Python (I'm using PyCharm). I mostly code in Stata and Matlab.

Please help.

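You should move the logic into a function and then use pd.Series.map() to apply that function to each row of the DataFrame. Using .map() or .apply() is usually faster and cleaner than looping by hand.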

import pandas as pd
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

# function to extract polarity and subjectivity from text
def process_text(text):
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

# apply to each row of the 'articles' Series using the pd.Series.map method
df["polarity"], df["subjectivity"] = zip(*df.articles.map(process_text))

df.head()

df.to_csv("out.csv", index=False)

Disclaimer: I haven't tested this.
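One thing the snippet above does not cover is empty cells: pd.read_csv turns them into NaN (a float), and TextBlob expects a string. Below is a minimal sketch of the same .map() idea with a guard for that case. The names add_sentiment_columns, safe_score and textblob_scorer are made up for illustration, and the scorer is passed in as a parameter only so the helper can be tried out without TextBlob installed.

```python
import pandas as pd


def textblob_scorer(text):
    """Return (polarity, subjectivity) for one article using TextBlob."""
    # Imported here so add_sentiment_columns can also be run with another scorer.
    from textblob import TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def add_sentiment_columns(df, text_col="articles", scorer=textblob_scorer):
    """Return a copy of df with 'polarity' and 'subjectivity' columns added."""
    def safe_score(value):
        # Skip anything that is not a non-blank string (NaN, None, "  ").
        if not isinstance(value, str) or not value.strip():
            return (float("nan"), float("nan"))
        return scorer(value)

    # One (polarity, subjectivity) tuple per row.
    scores = df[text_col].map(safe_score)

    out = df.copy()
    out["polarity"] = [polarity for polarity, _ in scores]
    out["subjectivity"] = [subjectivity for _, subjectivity in scores]
    return out
```

With the default scorer this would be used as add_sentiment_columns(df).to_csv("out.csv", index=False); rows without text keep NaN in both score columns.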


Thanks for reaching out.

I did get the code working a while ago.

This is what it looks like:

import pandas as pd
import numpy as np
from textblob import TextBlob

path_to_file = "C:/Users/Parvesh/Desktop/New Project/Sentiment Analysis/events.csv"
df = pd.read_csv(path_to_file, encoding='latin-1')
df.head()

df['Polarity'] = np.nan
df['Subjectivity'] = np.nan

pd.options.mode.chained_assignment = None

for idx, articles in enumerate(df['articles'].values):  # for each row in our df DataFrame
    if articles:
        sentA = TextBlob(articles)  # pass the text-only article to TextBlob to analyse
        df['Polarity'].iloc[idx] = sentA.sentiment.polarity  # write sentiment polarity back to df
        df['Subjectivity'].iloc[idx] = sentA.sentiment.subjectivity  # write sentiment subjectivity score back to df

df.head()

df.to_csv("Sentiment_Scores.csv", index=False)

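So I was basically missing the if articles bit; with it in place, the loop goes through every article and retrieves the scores.

Still, I really appreciate you taking up my question.

Many thanks

Regards

Parvesh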
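Comparing the two loops in this thread, the working one also calls TextBlob(articles) with the loop variable, where the first attempt called TextBlob("articles") with a string literal, so every row was scored on the single word "articles". Below is a rough sketch of the same loop that writes with df.at instead of the chained df['Polarity'].iloc[idx] = ... assignment, so the chained_assignment warning does not have to be switched off. The function name score_rows and the scorer parameter are made up for illustration.

```python
import pandas as pd


def score_rows(df, scorer, text_col="articles"):
    """Loop over the articles and write both scores back with df.at."""
    out = df.copy()
    out["Polarity"] = float("nan")
    out["Subjectivity"] = float("nan")

    for label, article in out[text_col].items():
        # NaN is truthy in Python, so test for a non-empty string explicitly.
        if isinstance(article, str) and article:
            # Pass the variable `article`, not the string literal "articles".
            polarity, subjectivity = scorer(article)
            out.at[label, "Polarity"] = polarity
            out.at[label, "Subjectivity"] = subjectivity
    return out
```

With TextBlob this could be called as score_rows(df, lambda text: TextBlob(text).sentiment), since the sentiment property is a (polarity, subjectivity) named tuple and unpacks into the two values.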
