Python: create a new column in an existing CSV using percentage values obtained by dividing the values of a pivot-table vector?
I want to create a new column for an existing CSV. The column holds percentages obtained by a division multiplied by 100, as below (see the commented arrow in the full code):

dfb['cm_target_perc']=cm_inc/[dfb['cm_target']*100*len(cm_inc)

What I want is a new column in which each value is obtained by dividing the pivot-table vector cm_inc by dfb['cm_target'], whose value is 40 for every row, and multiplying by 100.

Here is the full code of my Jupyter notebook:
from plotly.offline import init_notebook_mode, iplot
from plotly import graph_objs as go
init_notebook_mode(connected = True)
import pandas as pd
import numpy as np
from datetime import timedelta, datetime, tzinfo
import time
from datetime import datetime as dt
dfb=pd.read_csv('https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1', encoding="latin-1", infer_datetime_format=True, parse_dates=['date'], skipinitialspace=True)
dfb["date"]=pd.to_datetime(dfb['date'])
dfb["site"]=dfb["site"].astype("category")
cm_inc=dfb[dfb.site == 5].pivot_table(index='date', values = 'site', aggfunc = { 'site' : 'count' } )
dfb['cm_target'] = [40]*len(dfb)
#===>>>#dfb['cm_target_perc']=cm_inc/[dfb['cm_target']*100*len(cm_inc)
dfb.to_csv('test_data.csv', index=False)
indexes =pd.to_datetime(cm_inc.index)
dates_indexes = pd.to_datetime(cm_inc.index)
data = [
    go.Bar(
        x=indexes,
        y=dfb['cm_target'],
        text=dfb['cm_target'],
        textposition='auto',
        name='Target Site A',
        base=0
    ),
    go.Bar(
        x=indexes,
        y=cm_inc['site'],
        text=cm_inc['site'],
        textposition='auto',
        name='Enroll Site A',
        base=0,
        # width=2  # Width value varies depending on number of samples in data
    )
]
layout = go.Layout(
    barmode='stack',
    xaxis=dict(
        showticklabels=True,
        ticktext=dates_indexes,
        tickvals=[i for i in indexes],
    )
)
fig = dict(data = data, layout = layout)
iplot(fig, show_link=False)
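Question: how can I change the code to fix this error?

ValueError: Wrong number of items passed 1239, placement implies 1

Thanks in advance.

Although this does not create a new column, the following gives the desired result: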
cm_achived_perc=cm_inc.loc[:]/40*100
%matplotlib inline
cm_achived_perc.plot(kind = 'bar')
Is this what you want?
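Dividing the pivot table by the scalar 40 works, while dividing it by dfb['cm_target'] does not. The difference can be seen on toy data. This is a minimal sketch: the three rows below are made up, and only the pivot_table call mirrors the question. When a DataFrame is divided by a Series, pandas matches the Series' index labels against the DataFrame's column labels, so the result has many columns and cannot be stored in a single column.

```python
import pandas as pd

# Toy stand-ins for the question's objects (hypothetical data).
dfb = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11"]),
    "site": [5, 5, 5],
})
dfb["cm_target"] = 40

# Same pivot as in the question: one row per date, one column named 'site'.
cm_inc = dfb[dfb.site == 5].pivot_table(index="date", values="site", aggfunc="count")

# Scalar division keeps the shape of cm_inc.
pct = cm_inc / 40 * 100

# DataFrame / Series matches the Series' row labels (0, 1, 2) against the
# DataFrame's column label ('site'); nothing lines up, so the result is a
# wide frame filled with NaN.
ratio = cm_inc / (dfb["cm_target"] * 100)

print(pct.shape)    # same shape as cm_inc
print(ratio.shape)  # more than one column
```

Assigning a frame like ratio to dfb['cm_target_perc'] is the kind of operation that triggers the "Wrong number of items passed" error.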
Replace your lines

dfb['cm_target'] = [40]*len(dfb)
dfb['cm_target_perc']=cm_inc/[dfb['cm_target']*100*len(cm_inc)

with

This gives me the following dfb:
            site  received  sent  cm_target  cm_inc  cm_target_perc
date
2018-07-10     2       NaN   NaN         58    20.0       34.482759
2018-07-10     2       NaN   NaN         63    20.0       31.746032
2018-07-11     2       NaN   NaN         67    20.0       29.850746
2018-07-11     2       NaN   NaN        100    20.0       20.000000
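In the table above, every row of dfb carries a per-date cm_inc value, and cm_target_perc equals cm_inc / cm_target * 100. The following is a minimal sketch of one way to build a frame of that shape. It is not necessarily the code used in this answer: the toy rows and the groupby/join approach are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the table above.
dfb = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11", "2018-07-11"]),
    "site": [2, 2, 2, 2],
    "cm_target": [58, 63, 67, 100],
})

# Per-date row counts as a named Series indexed by date.
cm_inc = dfb.groupby("date")["site"].count().rename("cm_inc")

# Index dfb by date so the join lines each row up with its date's count.
dfb = dfb.set_index("date").join(cm_inc)

# Row-wise percentage; both operands now share the same index.
dfb["cm_target_perc"] = dfb["cm_inc"] / dfb["cm_target"] * 100
print(dfb)
```

Because both columns live in the same frame, the division is done row by row and the result fits into a single new column.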
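If your dfb['cm_target'] values are all 40, why not simply use dfb['cm_target_perc']=cm_inc/40*100*len(cm_inc) instead?

@iamanigeeit, cm_inc is a pivot_table vector. You say it is a DataFrame, so how do I refer to its column? Thanks.

You can use cm_inc.site or cm_inc['site'] (cm_inc is a DataFrame with one column).

@iamanigeeit, it is accepted, but it only creates a new column with empty data. Also, I need to plot two bar charts, not just this one.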
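The empty column reported in the last comment is consistent with how pandas aligns on index labels during assignment: dfb is indexed by row number, while cm_inc['site'] is indexed by date, so no labels match. A minimal sketch on made-up rows follows. The column name misaligned and the use of Series.map are illustrative choices, not something taken from the thread.

```python
import pandas as pd

# Toy data (hypothetical); the pivot mirrors the one in the question.
dfb = pd.DataFrame({
    "date": pd.to_datetime(["2018-07-10", "2018-07-10", "2018-07-11"]),
    "site": [5, 5, 5],
})
cm_inc = dfb[dfb.site == 5].pivot_table(index="date", values="site", aggfunc="count")

# Direct assignment aligns on index labels: dfb uses 0, 1, 2 while
# cm_inc['site'] uses dates, so every value becomes NaN.
dfb["misaligned"] = cm_inc["site"] / 40 * 100

# Looking up each row's date in the per-date counts avoids the mismatch.
dfb["cm_target_perc"] = dfb["date"].map(cm_inc["site"]) / 40 * 100
print(dfb)
```

With map, each row receives the count for its own date before the percentage is computed, so the new column is filled rather than empty.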