How to sum DataFrame rows in Python based only on month, day, or year, and then build a report of all the results


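I am analyzing stock data from Yahoo Finance. At the moment my DataFrame, df, is filtered to show only the trading days in March since 1991. I want to work out the monthly return for March of every year, or for any other combination of months, e.g. the return from January through March since 1991. I would also like the result broken down by year.

I also want to be able to do the same by day of the week, e.g. how much Apple's stock has changed on all Fridays since 1991. That would be another example question.

I am trying to get this to the point where I can print an actual paper copy, broken down by year, like a report.

I tried reading the MultiIndex and groupby tutorials at pandas.pydata.org/, but they are very confusing and I am not sure whether that is what I need.

Here is my current code: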

from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import dateutil.parser
from tkinter import *


# Sets the max rows that can be displayed
# when the program is executed
pd.options.display.max_rows = 120



# df is the name of the dataframe, it is 
# reading the csv file containing data loaded
# from yahoo finance(Date,Open,High,Low,Close
# volume,adj close,)the name of the ticker
# is placed before _data.csv i.e. the ticker aapl
# would have a csv file named aapl_data.csv.
df = pd.read_csv("cde_data.csv")


# resets the index back to the pandas default
# i.e. index starts at 0 for the first row and
# 1 for the second and continues by one till the
# end of the data in the above csv file. 
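# reset_index returns a new DataFrame, so the result has to be
# assigned back; drop=True discards the old index.
df = df.reset_index(drop=True)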



# the following code will allow for filtering of the dataframe
# based on the year, day of week (dow), and month. It then gets
# applied to the dataframe and then can be used to sort data i.e
# print(df[(df.year == 2015) & (df.month == 5) & (df.dow == 4)])
# which will give you all the days in the month of May(df.month == 5), 
# that fall on a Thursday(df.dow == 4), in the year 2015 
# (df.year == 2015)
#
#      Month          Dow                      Year
# January    = 1  Monday    = 1  The year will be displayed in a four
# February   = 2  Tuesday   = 2  digit format i.e. 2015
# March      = 3  Wednesday = 3
# April      = 4  Thursday  = 4
# May        = 5  Friday    = 5
# June       = 6
# July       = 7
# August     = 8
# September  = 9
# October    = 10
# November   = 11
# December   = 12
# Parse the Date column once, then derive the filter columns with the
# vectorized .dt accessor instead of applying a function per row.
df['Date']         = pd.to_datetime(df['Date'])
df['year']         = df['Date'].dt.year
df['dow']          = df['Date'].dt.dayofweek + 1   # Monday = 1, as with isoweekday()
df['month']        = df['Date'].dt.month


# The code below has a total of five sections all labeled by number.
# They are #1, #2, #3, #4, #5. Number one adds new columns to the df
# and populates them with data, number two filters out all the days
# that the market went down or flat for the day, number three filters
# out all of the days that the market went up or flat, number four
# filters out all of the days that the market went up or down, and
# number five drops the excess columns and concats steps #2, #3, & #4.


# 1
# there are five columns that are being added: up_down, up, down,
# flat, and %chg. up, down, and flat are temporary and will be
# deleted later on; the other two, up_down and %chg, will be permanent.
# The up_down column is derived from taking the 'Close' column minus the
# 'Open' column, this tells you how much the stock has moved for the day.
# The 'up' column is temporary and has a value of 'up' for all the rows
# of the DataFrame df. The 'down' column is temporary and has a value of
# 'down' for all the rows of the DataFrame df. The 'flat' column is
# temporary and has a value of 'flat' for all the rows of the DataFrame
# df. The '%chg' column is calculated by taking the results of the
# 'up_down' divided by the 'Close' column, and then times 100, which
# turns it into a percentage showing what percent the stock moved up or
# down for the day. All of the columns added below are added to the
# DataFrame called df, which contains a csv file (see the comments above
# pd.read_csv for information on the csv file contained in the DataFrame df).

df['up']           = 'up'   
df['down']         = 'down'
df['flat']         = 'flat'
df['up_down']      = df['Close'] - df['Open'] 
df['%chg']         = ((df['up_down']/df['Close'])*100)      


# 2
# df column[up_down] is first filtered on the greater than zero
# criteria for the years after 1984 and then is turned into df2.
# If the up_down column is greater than zero then this means that
# the stock went up. Next df3 is set = to df2['up'], df3 now holds
# just the days where the asset went up
df2= (df[(df.year > 1984) & (df.up_down > 0)])
df3 = df2['up']



# 3
# df column[up_down] is first filtered on the less than zero
# criteria for the years after 1984 and then is turned into df4.
# If the up_down column is less than zero then this means that
# the stock went down. Next df5 is set = to df4['down'], df5 now holds
# just the days where the asset went down
df4= (df[(df.year > 1984) & (df.up_down < 0)])
df5 = df4['down']



# 4
# df column[up_down] is first filtered on the equal to zero
# criteria for the years after 1984 and then is turned into df6.
# If the up_down column is equal to zero then this means that
# the stock did not move. Next df7 is set = to df6['flat'], df7
# now holds just the days where the asset did not move at all
df6= (df[(df.year > 1984) & (df.up_down == 0)])
df7 = df6['flat']



# 5
# The code below starts by dropping the columns 'up', 'down', and 'flat'.
# These were temporary and were used to help filter data in the above
# code in sections two, three, and four. Finally we concat the
# DataFrames df, df3, df5, and df7. We now have new 'up', 'down' and
# 'flat' columns that only display up, down, or flat when the criteria
# is true.
df = df.drop(columns=['up', 'down', 'flat'])
# join_axes was removed in pandas 1.0; reindexing on df.index keeps
# the original row order and gives the same result.
df = pd.concat([df, df3, df5, df7], axis=1).reindex(df.index)


# The difference between the close of current day and the previous day
# non percentage
df['Up_Down']          = df.Close.diff()


# The percentage of change on the Up_Down column (note the capitalized
# name; the lowercase 'up_down' column is the intraday close minus open)
df['%Chg']             = ((df['Up_Down']/df['Close'])*100)

# How much the current opening price has moved up from the previous
# opening price in terms of percentage,
df['Open%pd']          = df.Open.pct_change()*100


# How much the current high price has moved up from the previous high
# price in terms of percentage.
df['High%pd']          = df.High.pct_change()*100


# How much the current low price has moved up from the previous low
# price in terms of percentage
df['Low%pd']           = df.Low.pct_change()*100


# How much the current close price has moved up from the previous close
# price in terms of percentage
df['Close%pd']         = df.Close.pct_change()*100


# How much the current volume has moved up from the previous day's
# volume in terms of percentage
df['Volume%pd']        = df.Volume.pct_change()*100


# Both columns take the percentage of change from open to high and open
# to low
df['High%fo']          = ((df.High - df.Open)/(df.Open))*100
df['Low%fo']           = ((df.Open - df.Low) / (df.Open))*100


# Takes the difference from the high price and the low price non 
# percentage
df['HighLowRange']     = df.High - df.Low


# Measures how much the range (the high minus the low) has changed versus
# the previous day
df['HighLowRange%pd']  = df.HighLowRange.pct_change()*100


# df now is equal to only the months of March and only has the Date and
# Close%pd columns
df = df.loc[df.month == 3, ['Date', 'Close%pd']]


print(df)

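As noted in the comments, you can use groupby to find the monthly totals:

# change the previous last line of code to this, so that the
# 'year' and 'month' columns survive the column selection
df = df.loc[df.month == 3, ['Date', 'year', 'month', 'Close%pd']]

# make a new dataframe; only the numeric column is summed, because
# the datetime 'Date' column cannot be summed
new_df = df.groupby(['year', 'month'])[['Close%pd']].sum()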
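
The same groupby idea extends to the other cases in the question (a range of months, or a single weekday, broken down by year). The following is a minimal sketch rather than the asker's actual data: the `prices` frame, its constant 0.1 daily change, and the result names are made up for illustration. Summing daily percentage changes, as done above, only approximates a compounded return.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: one row per business day
# with a made-up constant daily percentage change.
dates = pd.bdate_range("1991-01-01", "1993-12-31")
prices = pd.DataFrame({"Date": dates, "Close%pd": 0.1})

# Derive the grouping keys directly from the datetime column.
prices["year"] = prices["Date"].dt.year
prices["month"] = prices["Date"].dt.month
prices["dow"] = prices["Date"].dt.dayofweek + 1  # Monday = 1 ... Friday = 5

# Filter first, then group by year and sum the daily changes.
march_by_year = prices[prices["month"] == 3].groupby("year")["Close%pd"].sum()
q1_by_year = prices[prices["month"].between(1, 3)].groupby("year")["Close%pd"].sum()
fridays_by_year = prices[prices["dow"] == 5].groupby("year")["Close%pd"].sum()

print(march_by_year)
print(q1_by_year)
print(fridays_by_year)
```

Each result is a Series indexed by year, so the filter (month, month range, or weekday) can be swapped without changing the grouping step.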

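Another approach is to use the resample command. This is probably the best way to compute weekly totals, especially since you do not have a variable indicating the "week of the year", which is what you would pass to groupby.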

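# resample needs a DatetimeIndex, and the how= argument has been removed
# from pandas, so index by Date and call .sum() on the resampler instead
by_date = df.set_index('Date')['Close%pd']
weekly  = by_date.resample('W').sum()   # weekly totals
monthly = by_date.resample('M').sum()   # monthly totals (use 'ME' on pandas 2.2 and later)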
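
As a self-contained illustration of the resample approach, here is a small sketch on made-up data (the `prices` frame and its constant 0.5 daily change are assumptions, not the asker's file). The monthly totals are computed by grouping on the monthly period of the index, which is a substitute for `resample` chosen here only so the example does not depend on which month-end alias the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical data: business days in March and April 1991,
# each with a made-up daily percentage change of 0.5.
dates = pd.bdate_range("1991-03-01", "1991-04-30")
prices = pd.DataFrame({"Date": dates, "Close%pd": 0.5})

# resample groups rows by calendar period, so it needs a DatetimeIndex.
by_date = prices.set_index("Date")["Close%pd"]

# Weekly totals; by default each bin is a week ending on Sunday.
weekly_totals = by_date.resample("W").sum()

# Monthly totals, grouped on the month each date falls in.
monthly_totals = by_date.groupby(by_date.index.to_period("M")).sum()

print(weekly_totals.head())
print(monthly_totals)
```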

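Comments:

Have you heard of groupby?

Yes, I have heard of it, but can I sum the values based on the year alone? The tutorials I have read and watched are also confusing.

groupby(['year', 'month']).sum() will give you the monthly totals by month and year.

I get an error: KeyError: 'year'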
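
For the printable, year-by-year report the question asks for, one possible layout is a table with one row per year and one column per month. This is only a sketch under assumptions: the `prices` frame and its values are made up, and `pivot_table` is one option among several. It also shows why the `year` and `month` columns must still be present when grouping; dropping them beforehand is what produces KeyError: 'year'.

```python
import pandas as pd

# Hypothetical data: two years of business days with a made-up
# constant daily percentage change.
dates = pd.bdate_range("1991-01-01", "1992-12-31")
prices = pd.DataFrame({"Date": dates, "Close%pd": 0.1})

# Keep the grouping columns in the frame; they are the pivot keys.
prices["year"] = prices["Date"].dt.year
prices["month"] = prices["Date"].dt.month

# One row per year, one column per month, each cell the summed change.
report = prices.pivot_table(index="year", columns="month",
                            values="Close%pd", aggfunc="sum")
report["total"] = report.sum(axis=1)

# Plain-text rendering of the whole table, suitable for printing.
report_text = report.round(2).to_string()
print(report_text)
```

The text in `report_text` can be written to a file and printed, or the table can be exported with `DataFrame.to_csv` and opened in a spreadsheet.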
