Python: how do I iterate over the rows of a specific DataFrame column, given a condition on another column?
So what I basically want to do is the following, based on a DataFrame with columns 'date' and 'polarity', where 'date' has seven distinct values (days) and 'polarity' holds values between -1 and 1:
For each of the seven days:
i) count all values in the 'polarity' column that are positive
ii) count all values in the 'polarity' column that are negative
iii) count all values in the 'polarity' column for a given day (neg, neutral, pos)
Edit: The output of i)-iii) for each day should be integers, stored in lists.
Edit 2: I tried to implement it with the following code (only for values > 0):
However, this returned 0, which is wrong when checked against the Excel sheet.
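Since the attempted code isn't included here, below is a minimal, self-contained sketch of one way to get i)-iii) as integer lists per day. The `df_tweets` name and the sample data are made up for illustration, standing in for the Excel sheet described in the question:

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet described in the question
df_tweets = pd.DataFrame({
    'date': ['2020-02-01', '2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02'],
    'polarity': [0.5, -0.3, 0.0, 0.8, -0.1],
})

days = sorted(df_tweets['date'].unique())

# i) positives, ii) negatives, iii) all rows per day, as plain integer lists
pos = [int((df_tweets.loc[df_tweets['date'] == d, 'polarity'] > 0).sum()) for d in days]
neg = [int((df_tweets.loc[df_tweets['date'] == d, 'polarity'] < 0).sum()) for d in days]
total = [int((df_tweets['date'] == d).sum()) for d in days]

print(pos, neg, total)  # [1, 1] [1, 1] [2, 3]
```

One common cause of a count of 0 is comparing against a column that was read as strings rather than numbers, so it may be worth checking `df_tweets['polarity'].dtype` first.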
Any help is much appreciated.
Cheers,
IG

If I understand correctly, you want a count of the polarity values for each day. Maybe something like this:
positive = df_tweets[df_tweets['polarity'] > 0].groupby('date').count().reset_index()
negative = df_tweets[df_tweets['polarity'] < 0].groupby('date').count().reset_index()
neutral = df_tweets[df_tweets['polarity'] == 0].groupby('date').count().reset_index()
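One caveat with the three-filter approach above: a day with no matching rows simply disappears from that group's output, so the three result lists may not line up day by day. A `reindex` with `fill_value=0` guards against that; a sketch with made-up data (where 2020-02-02 has no negative rows):

```python
import pandas as pd

# Hypothetical stand-in for df_tweets: 2020-02-02 has no negative rows
df_tweets = pd.DataFrame({
    'date': pd.to_datetime(['2020-02-01'] * 3 + ['2020-02-02'] * 2),
    'polarity': [1, -1, 0, 1, 1],
})

all_days = sorted(df_tweets['date'].unique())

# size() counts rows per day; reindex keeps every day, filling gaps with 0
neg_list = (df_tweets[df_tweets['polarity'] < 0]
            .groupby('date').size()
            .reindex(all_days, fill_value=0)
            .tolist())

print(neg_list)  # [1, 0]
```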
The output of this code is three DataFrames, each with two columns: one with the unique date values and one with the count of polarities greater than, less than, or equal to 0. Alternatively, consider a single DataFrame with margins, demonstrated below with seeded random data:
Data
import numpy as np
import pandas as pd
np.random.seed(2112020)
random_df = pd.DataFrame({'date': np.random.choice(pd.date_range('2020-02-01', '2020-02-11'), 500),
                          'polarity': np.random.randint(-1, 2, 500)})
print(random_df.head(10))
# date polarity
# 0 2020-02-08 -1
# 1 2020-02-08 1
# 2 2020-02-06 0
# 3 2020-02-10 -1
# 4 2020-02-04 -1
# 5 2020-02-02 1
# 6 2020-02-05 -1
# 7 2020-02-04 0
# 8 2020-02-10 1
# 9 2020-02-09 0
Aggregation
pvt_df = (random_df.assign(day_date = lambda x: x['date'].dt.normalize(),
                           polarity_indicator = lambda x: np.select([x['polarity'] > 0, x['polarity'] < 0, x['polarity'] == 0],
                                                                    ['positive', 'negative', 'neutral']))
          .pivot_table(index = 'day_date',
                       columns = 'polarity_indicator',
                       values = 'polarity',
                       aggfunc = 'count',
                       margins = True)
          )
print(pvt_df)
# polarity_indicator negative neutral positive All
# day_date
# 2020-02-01 00:00:00 17 14 16 47
# 2020-02-02 00:00:00 19 14 12 45
# 2020-02-03 00:00:00 11 16 12 39
# 2020-02-04 00:00:00 17 18 13 48
# 2020-02-05 00:00:00 11 15 22 48
# 2020-02-06 00:00:00 12 12 16 40
# 2020-02-07 00:00:00 16 15 21 52
# 2020-02-08 00:00:00 15 10 13 38
# 2020-02-09 00:00:00 17 15 19 51
# 2020-02-10 00:00:00 13 16 19 48
# 2020-02-11 00:00:00 13 12 19 44
# All 161 157 182 500
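If the goal is still integer lists per day, the pivot table can be post-processed directly. This sketch rebuilds `pvt_df` from the seeded data above and assumes `margins=True` was used, so there is an `'All'` row to drop before extracting per-day counts:

```python
import numpy as np
import pandas as pd

np.random.seed(2112020)
random_df = pd.DataFrame({'date': np.random.choice(pd.date_range('2020-02-01', '2020-02-11'), 500),
                          'polarity': np.random.randint(-1, 2, 500)})

pvt_df = (random_df.assign(day_date=lambda x: x['date'].dt.normalize(),
                           polarity_indicator=lambda x: np.select(
                               [x['polarity'] > 0, x['polarity'] < 0, x['polarity'] == 0],
                               ['positive', 'negative', 'neutral']))
          .pivot_table(index='day_date',
                       columns='polarity_indicator',
                       values='polarity',
                       aggfunc='count',
                       margins=True))

# Drop the 'All' margin row, then extract plain integer lists per day
per_day = pvt_df.drop('All')
positive_list = per_day['positive'].astype(int).tolist()
total_list = per_day['All'].astype(int).tolist()

print(len(positive_list), sum(total_list))  # 11 days, 500 rows in total
```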
Could you provide a sample dataset and the expected output? — Added the expected output. The dataset is an Excel sheet with a 'date' column (YYYY-MM-DD format) and a 'polarity' column (a value between -1 and 1 for each row).