Python: calculate the difference between values in a DataFrame by group

I have a DataFrame in the following format:

  station   num_bikes   Rush hour? num_racks hour
  Botanic   3           yes-am     9         9
  Botanic   2           no         10        14
  Botanic   10          no         2         20
  Queens    6           no         10        5
  Queens    10          yes-pm     6         18
  Queens    12          yes-pm     4         19
  Queens    1           no         15        7
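num_bikes is the number of bikes available at the station, and num_racks is the number of available racks. I am trying to count the total bike arrivals and departures for each station in order to determine the total number of transactions. The code I am using produces the following error: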

ValueError: Wrong number of items passed 0, placement implies 1
My code is:

df_filtered['diff'] = df_filtered.groupby(['Rush hour?', 'station'])[['num_bikes']].diff()
Expected output:

  station     Rush hour?  arrivals  departures
  Botanic     yes-am      0         0
  Botanic     no          8         0
  Queens      no          0         5
  Queens      yes-pm      0         2

What is wrong with my code?
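The following is a minimal sketch of the per-group diff the question describes. It is not the asker's or answerer's code. It assumes that num_bikes is compared row to row within each (station, Rush hour?) group, in the order the rows appear. It also assumes an increase counts as an arrival and a decrease as a departure.

Two details matter here:
- Single brackets make groupby(...)['num_bikes'].diff() return a Series aligned with the original index.
- pd.to_numeric guards against the column having been read in as strings.

The names df, diff and summary are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        ("Botanic", 3, "yes-am", 9, 9),
        ("Botanic", 2, "no", 10, 14),
        ("Botanic", 10, "no", 2, 20),
        ("Queens", 6, "no", 10, 5),
        ("Queens", 10, "yes-pm", 6, 18),
        ("Queens", 12, "yes-pm", 4, 19),
        ("Queens", 1, "no", 15, 7),
    ],
    columns=["station", "num_bikes", "Rush hour?", "num_racks", "hour"],
)

# Ensure the counts are numeric (string columns cannot be subtracted).
df["num_bikes"] = pd.to_numeric(df["num_bikes"])

# Change in num_bikes from one row to the next *within* each group.
# Single brackets -> a Series aligned on df's index; the first row of each group is NaN.
diff = df.groupby(["station", "Rush hour?"])["num_bikes"].diff().fillna(0)

# Increases count as arrivals; decreases count as departures, reported as positive numbers.
df["arrivals"] = diff.clip(lower=0)
df["departures"] = diff.clip(upper=0).abs()

summary = df.groupby(["station", "Rush hour?"], as_index=False)[["arrivals", "departures"]].sum()
print(summary)
```

With this sign convention, Queens/yes-pm comes out as 2 arrivals, because num_bikes goes from 10 to 12 within that group.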

Here is what I tried; let me know if it is not correct:

import pandas as pd
import numpy as np

df_filtered = pd.DataFrame([
    ('Botanic', 3,  'yes-am', 9,  9),
    ('Botanic', 2,  'no',     10, 14),
    ('Botanic', 10, 'no',     2,  20),
    ('Queens',  6,  'no',     10, 5),
    ('Queens',  10, 'yes-pm', 6,  18),
    ('Queens',  12, 'yes-pm', 4,  19),
    ('Queens',  1,  'no',     15, 7)
])

df_filtered.columns = ['station', 'num_bikes', 'Rush hour?', 'num_racks', 'hour']

# Row-to-row change in num_bikes over the whole frame (not restarted per station)
df_filtered['diff'] = df_filtered['num_bikes'].diff().fillna(0)
# Increases are arrivals, decreases are departures (departures stay negative here)
df_filtered['arrivals'] = df_filtered['diff'][df_filtered['diff'] > 0]
df_filtered['departures'] = df_filtered['diff'][df_filtered['diff'] < 0]
df_filtered.drop(columns='diff', inplace=True)
# Rows that were neither an arrival nor a departure are NaN; set them to 0
df_filtered[['departures', 'arrivals']] = df_filtered[['departures', 'arrivals']].astype(float).fillna(0)
df_filtered.groupby(['Rush hour?', 'station'])[['arrivals', 'departures', 'num_bikes']].sum()


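These groupby results may not preserve the original row order of the input DataFrame, so they can look confusing, but they are the net arrivals/departures for each group of rows.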
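The reordering comes from groupby sorting the group keys by default. This small sketch uses made-up data. It shows that passing sort=False keeps the groups in the order in which their keys first appear; rows within each group keep their original order either way.

```python
import pandas as pd

# Made-up data: Queens appears before Botanic.
df = pd.DataFrame(
    {
        "station": ["Queens", "Botanic", "Queens", "Botanic"],
        "arrivals": [1.0, 2.0, 3.0, 4.0],
    }
)

# Default: group keys are sorted, so Botanic comes first.
sorted_sums = df.groupby("station")["arrivals"].sum()
# sort=False: groups are listed in order of first appearance in df.
unsorted_sums = df.groupby("station", sort=False)["arrivals"].sum()

print(list(sorted_sums.index))    # ['Botanic', 'Queens']
print(list(unsorted_sums.index))  # ['Queens', 'Botanic']
```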

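Try using single brackets around num_bikes, i.e. groupby(…)['num_bikes'].diff()

When I use single brackets I get ValueError: No axis named for object type

Can you provide the expected output?

Your expected output is the same as your input? Then the problem is solved, QED.

Not sure if this is why you get the error, but != station, at least in a *nix environment.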
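This sketch illustrates the single-versus-double-bracket suggestion, using a tiny made-up frame. Selecting one column by name from a groupby gives a SeriesGroupBy, so diff() returns a Series aligned with the original index. Selecting a list of columns gives a one-column DataFrame instead.

```python
import pandas as pd

# Tiny made-up frame for illustration.
df = pd.DataFrame({"station": ["Botanic", "Botanic", "Queens"], "num_bikes": [3, 2, 6]})
gb = df.groupby("station")

# Single brackets select one column -> the result is a Series.
as_series = gb["num_bikes"].diff()
# Double brackets select a list of columns -> the result is a one-column DataFrame.
as_frame = gb[["num_bikes"]].diff()

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame

# A Series aligned on df's index can be assigned directly to a single new column.
df["diff"] = as_series
```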
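When I try this, I get the error: unsupported operand type(s) for -: 'str' and 'str'

Did you use astype(float)? If you print df_filtered.dtypes, are your numeric columns listed as float/int or as object/str?

Yes, now the error says: cannot reindex from a duplicate axis?

@caston1414 Are you using a Jupyter notebook? If so, you will need to restart the kernel and rerun all the code, or make sure all of the code can be run multiple times without side effects.

Yes, I am using a Jupyter notebook, I'll try that. Thanks.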
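This is a minimal sketch of the dtype check discussed in the comments. It assumes the counts were read in as strings; the sample values, including the unparseable entry "x", are made up for illustration.

pd.to_numeric converts the column to a numeric dtype. With errors="coerce", anything that cannot be parsed becomes NaN instead of raising, so it is worth checking for NaN afterwards.

```python
import pandas as pd

# Hypothetical example: the counts arrived as strings, so the column dtype is object.
df = pd.DataFrame({"station": ["Botanic", "Botanic", "Queens"], "num_bikes": ["3", "2", "x"]})
print(df.dtypes)  # num_bikes is listed as object

# Convert to numbers; errors="coerce" turns unparseable values (here "x") into NaN.
df["num_bikes"] = pd.to_numeric(df["num_bikes"], errors="coerce")
print(df.dtypes)  # num_bikes is now float64 (float because of the NaN)

# The grouped diff now works on numbers instead of strings.
df["diff"] = df.groupby("station")["num_bikes"].diff()
```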