How to iterate a formula over a Python dictionary and save the results in a DataFrame?
I have a dictionary called "reviews". Each key is a review ID (1 and 2 in this example), and each value maps the words of that review to a pair of [neg, pos] values.

For each review in the dictionary, I need to iterate two formulas over the values of its words. These formulas compute the "neg_post_prob" and "pos_post_prob" of each review. The formulas are:

neg_post_prob = (neg_prior * neg) / (neg_prior * neg + pos_prior * pos)
pos_post_prob = (pos_prior * pos) / (neg_prior * neg + pos_prior * pos)

where:
- "neg_prior" is the "neg_post_prob" calculated in the previous word's iteration, and
- "pos_prior" is the "pos_post_prob" calculated in the previous word's iteration
# Review 1:
# the prior before starting the iteration is 0.5
prior = 0.5
# priors after the first word "like"
neg_prior_like = (prior * 0.0005) / (prior * 0.0005 + prior * 0.0025)
pos_prior_like = (prior * 0.0025) / (prior * 0.0005 + prior * 0.0025)
# priors after the second word "the"
neg_prior_like_the = (neg_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
pos_prior_like_the = (pos_prior_like * 0.5) / (neg_prior_like * 0.5 + pos_prior_like * 0.5)
# post_prob after last word "acting"
neg_post_prob = (neg_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
pos_post_prob = (pos_prior_like_the * 0.5) / (neg_prior_like_the * 0.5 + pos_prior_like_the * 0.5)
validation = neg_post_prob + pos_post_prob
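The hand calculation above repeats the same two-line update for every word, so it can be sketched as a loop. This is only a minimal illustration: the function name `posterior` and the list-of-pairs input are assumptions made for the example and do not come from the original post.

```python
def posterior(word_values, prior=0.5):
    """Apply the update formula once per word, carrying the priors forward."""
    neg_prior, pos_prior = prior, prior
    for neg, pos in word_values:
        # Shared denominator of both formulas
        scale = neg_prior * neg + pos_prior * pos
        # The posteriors of this word become the priors of the next word
        neg_prior, pos_prior = neg_prior * neg / scale, pos_prior * pos / scale
    return neg_prior, pos_prior


# [neg, pos] values for "like", "the", "acting" from the hand calculation
review_1 = [(0.0005, 0.0025), (0.5, 0.5), (0.5, 0.5)]
neg_post_prob, pos_post_prob = posterior(review_1)
print(round(neg_post_prob, 2), round(pos_post_prob, 2))  # 0.17 0.83
```

The same function would be called once per review, which is the part that still needs to be wired up to the dictionary and the DataFrame.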
The result I expect is:
import pandas as pd

sentiment = {'review': [1, 2],
             'neg_post_prob': [0.17, 0.94],
             'pos_post_prob': [0.83, 0.06],
             'validation': [1, 1]}
sentiment = pd.DataFrame(sentiment, columns=['review', 'neg_post_prob', 'pos_post_prob', 'validation'])
print(sentiment)
Use reduce from the functools module.
Code
from functools import reduce
import pandas as pd


def update(priors, values):
    '''
    Provides updated probabilities based upon previous pair of neg, pos
    '''
    # Previous neg, pos pair
    neg, pos = priors

    # New negative and positive (using OP update equation)
    # Denominator: neg_prior * neg + pos_prior * pos
    scale = neg * values[0] + pos * values[1]
    new_neg = (neg * values[0]) / scale
    new_pos = (pos * values[1]) / scale

    return new_neg, new_pos  # new update pair


def calc(reviews):
    ''' Main function to perform calculations and
        produce pandas data frame
    '''
    sentiment = {'review': [],
                 'neg_post_prob': [],
                 'pos_post_prob': [],
                 'validation': []}

    for review_id, word_values in reviews.items():
        # word_values is a dictionary of [neg, pos] values for the words in the review
        values = word_values.values()  # sequence of [neg, pos] pairs

        # Use reduce to iteratively apply the update function to the sequence of values,
        # starting from a prior of 0.5 for both neg and pos
        result = reduce(update, values, [0.5, 0.5])
        neg, pos = result
        validation = neg + pos

        # Update results
        sentiment['review'].append(review_id)
        sentiment['neg_post_prob'].append(neg)
        sentiment['pos_post_prob'].append(pos)
        sentiment['validation'].append(validation)

    return pd.DataFrame(sentiment)
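To see what reduce does with the update function at each word, the intermediate pairs can be listed with itertools.accumulate. This is a sketch for inspection only, not part of the answer's code: it redefines a compact `update` so that it runs on its own, and it assumes Python 3.8 or later for the `initial` keyword.

```python
from itertools import accumulate


def update(priors, values):
    """One step of the update formula: (neg_prior, pos_prior) -> new pair."""
    neg, pos = priors
    scale = neg * values[0] + pos * values[1]
    return neg * values[0] / scale, pos * values[1] / scale


# Word values of review 2 from the example data
words = {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}

# accumulate yields the starting pair first, then one pair per word;
# the last pair is what reduce would return
steps = list(accumulate(words.values(), update, initial=(0.5, 0.5)))

for word, (neg, pos) in zip(words, steps[1:]):
    print(f"{word}: neg={neg:.4f}, pos={pos:.4f}")
```

Only the final pair is needed for the DataFrame, which is why reduce is enough in the main function.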
Test
reviews = {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
           2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}
df = calc(reviews)
df
Output
   review  neg_post_prob  pos_post_prob  validation
0       1       0.166667       0.833333         1.0
1       2       0.935484       0.064516         1.0
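As a cross-check on these numbers: each step only rescales the pair so that it sums to 1, so the final result should equal the product of each column of values, normalized once at the end. The sketch below relies on that observation. The name `direct_posterior` is hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def direct_posterior(word_values, prior=(0.5, 0.5)):
    """Multiply all neg values and all pos values, then normalize once."""
    neg_score = prior[0] * prod(neg for neg, _ in word_values.values())
    pos_score = prior[1] * prod(pos for _, pos in word_values.values())
    total = neg_score + pos_score
    return neg_score / total, pos_score / total


reviews = {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
           2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

for review_id, word_values in reviews.items():
    neg, pos = direct_posterior(word_values)
    print(review_id, round(neg, 6), round(pos, 6))
# 1 0.166667 0.833333
# 2 0.935484 0.064516
```

Both reviews agree with the table above to six decimal places.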