Python: calculating the difference between values in a DataFrame by group
I have a DataFrame in the following format:
station   num_bikes  Rush hour?  num_racks  hour
Botanic   3          yes-am      9          9
Botanic   2          no          10         14
Botanic   10         no          2          20
Queens    6          no          10         5
Queens    10         yes-pm      6          18
Queens    12         yes-pm      4          19
Queens    1          no          15         7
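num_bikes is the number of bikes available at the station, and num_racks is the number of available racks. I am trying to calculate the total number of bike arrivals and departures for each station in order to determine the total number of transactions. The code I am using produces the following error: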
ValueError: Wrong number of items passed 0, placement implies 1
The code is:
df_filtered['diff'] = df_filtered.groupby(['Rush hour?', 'station']) [['num_bikes']].diff()
Expected output:
station   Rush hour?  arrivals  departures
Botanic   yes-am      0         0
Botanic   no          8         0
Queens    no          0         5
Queens    yes-pm      0         2
What is wrong with my code?
Here is what I tried; let me know if it is not correct:
import pandas as pd
import numpy as np
df_filtered = pd.DataFrame([
('Botanic' , 3 , 'yes-am' , 9 , 9),
('Botanic' , 2 , 'no' , 10 , 14),
('Botanic' , 10 , 'no' , 2 , 20),
('Queens' , 6 , 'no' , 10 , 5),
('Queens' , 10 , 'yes-pm' , 6 , 18),
('Queens' , 12 , 'yes-pm' , 4 , 19),
('Queens' , 1 , 'no' , 15 , 7)
])
df_filtered.columns = ['station', 'num_bikes', 'Rush hour?', 'num_racks', 'hour']
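# Change in num_bikes from the previous row (the first row has no predecessor, so 0)
df_filtered['diff'] = df_filtered['num_bikes'].diff().fillna(0)
# Positive changes are arrivals, negative changes are departures (kept as negative numbers); other rows become NaN
df_filtered['arrivals'] = df_filtered['diff'][df_filtered['diff'] > 0]
df_filtered['departures'] = df_filtered['diff'][df_filtered['diff'] < 0]
df_filtered.drop(columns='diff', inplace=True)
# Replace the NaNs with 0 so the sums below work
df_filtered[['departures','arrivals']] = df_filtered[['departures','arrivals']].astype(float).fillna(0)
# Net arrivals/departures per (rush-hour period, station) group
df_filtered.groupby(['Rush hour?', 'station'])[['arrivals','departures','num_bikes']].sum()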
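Note that df_filtered['num_bikes'].diff() in the code above runs down the whole column, so the first Queens row is compared with the last Botanic row. If changes should only be measured within each station, one variant (a swapped-in approach, not part of the original answer) is to group by station before calling diff() and to split the result with clip(). It also assumes readings should be compared in hour order, so it sorts by station and hour first; the Queens rows in the sample are not in hour order, which changes the numbers. The transactions column (arrivals plus departures) is an illustrative addition.

```python
import pandas as pd

df_filtered = pd.DataFrame([
    ('Botanic', 3, 'yes-am', 9, 9),
    ('Botanic', 2, 'no', 10, 14),
    ('Botanic', 10, 'no', 2, 20),
    ('Queens', 6, 'no', 10, 5),
    ('Queens', 10, 'yes-pm', 6, 18),
    ('Queens', 12, 'yes-pm', 4, 19),
    ('Queens', 1, 'no', 15, 7),
], columns=['station', 'num_bikes', 'Rush hour?', 'num_racks', 'hour'])

# Assumption: readings of one station should be compared in hour order,
# so sort first and reset the index for clean alignment.
df_sorted = df_filtered.sort_values(['station', 'hour']).reset_index(drop=True)

# Change between consecutive readings of the SAME station; the first
# reading of each station has no predecessor and becomes NaN.
delta = df_sorted.groupby('station')['num_bikes'].diff()

# Positive changes count as arrivals, negative changes as departures.
df_sorted['arrivals'] = delta.clip(lower=0).fillna(0)
df_sorted['departures'] = (-delta).clip(lower=0).fillna(0)

summary = (
    df_sorted.groupby(['station', 'Rush hour?'])[['arrivals', 'departures']]
    .sum()
    .reset_index()
)
# Illustrative: total transactions = arrivals + departures
summary['transactions'] = summary['arrivals'] + summary['departures']
print(summary)
```

Departures come out as positive counts here, unlike the answer above, where they stay negative.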
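These groupby results may not preserve the original row order of the input DataFrame, so they can look confusing, but they are the net arrivals/departures for each group of rows, as a snapshot.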
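If keeping the input's order matters, DataFrame.groupby takes a sort parameter: with the default sort=True the group keys are sorted, while with sort=False groups are listed in the order their keys first appear. A small sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["Botanic", "Botanic", "Botanic", "Queens", "Queens", "Queens", "Queens"],
    "Rush hour?": ["yes-am", "no", "no", "no", "yes-pm", "yes-pm", "no"],
    "num_bikes": [3, 2, 10, 6, 10, 12, 1],
})

# Default: group keys are sorted, so 'no' comes before 'yes-am'.
sorted_groups = df.groupby(["Rush hour?", "station"])["num_bikes"].sum()

# sort=False: groups appear in the order their key first occurs in df.
ordered_groups = df.groupby(["Rush hour?", "station"], sort=False)["num_bikes"].sum()
```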
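Try using single brackets around num_bikes, i.e. groupby(…)['num_bikes'].diff()
When I use single brackets I get ValueError: No axis named for object type
Can you provide the expected output?
Your expected output is the same as your expected input? Then problem solved, QED.
Not sure whether this is why you are getting the error, but station != station, at least in *nix environments.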
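The difference the bracket suggestion points at: selecting [['num_bikes']] (a list of columns) makes the grouped diff() return a DataFrame, while ['num_bikes'] (a single column) returns a Series, which is what a single-column assignment such as df['diff'] = ... expects. A minimal sketch using a few rows from the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["Botanic", "Botanic", "Queens", "Queens"],
    "num_bikes": [3, 2, 6, 10],
})
grouped = df.groupby("station")

# Double brackets select a list of columns, so diff() returns a DataFrame.
as_frame = grouped[["num_bikes"]].diff()

# Single brackets select one column, so diff() returns a Series.
as_series = grouped["num_bikes"].diff()

# A Series assigns cleanly to one new column (NaN for each station's first row).
df["diff"] = as_series
```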
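When I try this, I get the error: unsupported operand type(s) for -: 'str' and 'str'
Did you use astype(float)? If you print df_filtered.dtypes, are your numeric columns listed as float/int or as object/str?
Yes. Now the error says it cannot reindex from a duplicate index?
@caston1414 Are you using a Jupyter notebook? If so, you will need to restart the kernel and rerun all of the code, or make sure all of the code can be run multiple times without side effects.
Yes, I am using a Jupyter notebook. I will try that, thank you.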
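The two suggestions above, checking dtypes and making notebook cells safe to rerun, can be combined by converting num_bikes explicitly and computing the new columns on a copy instead of mutating df_filtered in place. This is a sketch under assumptions: add_bike_flows is a hypothetical helper, the string values are made up to mimic a column read as text, and the per-station diff is one possible choice of grouping.

```python
import pandas as pd

def add_bike_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a NEW frame with numeric num_bikes
    plus arrivals/departures columns.

    The input is never modified, so re-running a notebook cell that
    calls this function gives the same result every time.
    """
    out = df.copy()
    # Convert text such as "3" to numbers; a bad value raises ValueError here
    # instead of failing later with "unsupported operand type(s) for -".
    out["num_bikes"] = pd.to_numeric(out["num_bikes"])
    delta = out.groupby("station")["num_bikes"].diff()
    return out.assign(
        arrivals=delta.clip(lower=0).fillna(0),
        departures=(-delta).clip(lower=0).fillna(0),
    )

# Made-up rows with numbers stored as strings; raw.dtypes would list
# num_bikes as object (or a string dtype in newer pandas), not int/float.
raw = pd.DataFrame({
    "station": ["Botanic", "Botanic", "Queens", "Queens"],
    "num_bikes": ["3", "10", "6", "1"],
})

result = add_bike_flows(raw)
```

Because raw is left untouched, running the cell twice cannot stack up duplicate or half-converted columns.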