Python: how to sum rows of a DataFrame by month, day, or year only, then build a report containing all the results
Tags: python, pandas, dataframe, report, filtering
I'm analyzing stock data from Yahoo Finance. At the moment my DataFrame (df) is filtered to show only the days in March since 1991. I want to know the monthly return for March of each year, or for any other combination of months, e.g. the January-through-March return since 1991. I'd also like it broken down by year.
I'd also like to be able to do this by day of the week, e.g. how much Apple's stock has changed on all Fridays since 1991. That would be another example question.
I'm trying to get this to the point where I can print an actual paper copy broken down by year, like a report.
I've tried reading the MultiIndex and groupby tutorials at pandas.pydata.org/, but they are very confusing and I'm not sure they cover what I need.
Here is my current code:
from pandas_datareader import data as dreader
import pandas as pd
from datetime import datetime
import dateutil.parser
from tkinter import *
# Sets the max rows that can be displayed
# when the program is executed
pd.options.display.max_rows = 120
# df is the name of the dataframe, it is
# reading the csv file containing data loaded
# from yahoo finance(Date,Open,High,Low,Close
# volume,adj close,)the name of the ticker
# is placed before _data.csv i.e. the ticker aapl
# would have a csv file named aapl_data.csv.
df = pd.read_csv("cde_data.csv")
# resets the index back to the pandas default
# i.e. index starts at 0 for the first row and
# 1 for the second and continues by one till the
# end of the data in the above csv file.
df = df.reset_index(drop=True)
# the following code will allow for filtering of the dataframe
# based on the year, day of week (dow), and month. It then gets
# applied to the dataframe and then can be used to sort data i.e
# print(df[(df.year == 2015) & (df.month == 5) & (df.dow == 4)])
# which will give you all the days in the month of May(df.month == 5),
# that fall on a Thursday(df.dow == 4), in the year 2015
# (df.year == 2015)
#
# Month             Dow              Year
# January   = 1     Monday    = 1    The year is displayed in a
# February  = 2     Tuesday   = 2    four-digit format i.e. 2015
# March     = 3     Wednesday = 3
# April     = 4     Thursday  = 4
# May       = 5     Friday    = 5
# June      = 6
# July      = 7
# August    = 8
# September = 9
# October   = 10
# November  = 11
# December  = 12
def year(x):
    return x.year
def dow(x):
    return x.isoweekday()
def month(x):
    return x.month
df.Date = df.Date.apply(dateutil.parser.parse)
df['year'] = df.Date.apply(year)
df['dow'] = df.Date.apply(dow)
df['month'] = df.Date.apply(month)
# The code below has a total of five sections all labeled by number.
# They are #1, #2, #3, #4, #5. Number one adds new columns to the df
# and populates them with data, number two filters out all the days
# that the market went down or flat for the day, number three filters
# out all of the days that the market went up or flat, number four
# filters all of the days that the market went up or down, and
# number five drops the excess columns and concats steps #2, #3, & #4.
# 1
# Five columns are added: up_down, up, down, flat, and %chg. up, down,
# and flat are temporary and will be deleted later on; the other two,
# up_down and %chg, are permanent.
# The up_down column is the 'Close' column minus the 'Open' column,
# which tells you how much the stock has moved for the day.
# The 'up' column is temporary and has a value of 'up' for all the rows
# of the DataFrame df. The 'down' column is temporary and has a value of
# 'down' for all the rows of the DataFrame df. The 'flat' column is
# temporary and has a value of 'flat' for all the rows of the DataFrame
# df. The '%chg' column is calculated by dividing 'up_down' by the
# 'Close' column and multiplying by 100, which gives the percentage the
# stock moved up or down for the day. All of the columns added below
# are added to the DataFrame called df, which contains a csv file (see
# code lines 14-20 for information on the csv file contained in the
# DataFrame df).
df['up'] = 'up'
df['down'] = 'down'
df['flat'] = 'flat'
df['up_down'] = df['Close'] - df['Open']
df['%chg'] = ((df['up_down']/df['Close'])*100)
# 2
# df column[up_down] is first filtered on the greater than zero
# criteria for the years after 1984 and then is turned into df2.
# If the up_down column is greater than zero, this means that
# the stock went up. Next df3 is set = to df2['up'], df3 now holds
# just the days where the asset went up
df2= (df[(df.year > 1984) & (df.up_down > 0)])
df3 = df2['up']
# 3
# df column[up_down] is first filtered on the less than zero
# criteria for the years after 1984 and then is turned into df4.
# If the up_down column is less than zero, this means that
# the stock went down. Next df5 is set = to df4['down'], df5 now holds
# just the days where the asset went down
df4= (df[(df.year > 1984) & (df.up_down < 0)])
df5 = df4['down']
# 4
# df column[up_down] is first filtered on the equal to zero
# criteria for the years after 1984 and then is turned into df6.
# If the up_down column is equal to zero, this means that
# the stock did not move. Next df7 is set = to df6['flat'], df7
# now holds just the days where the asset did not move at all
df6= (df[(df.year > 1984) & (df.up_down == 0)])
df7 = df6['flat']
# 5
# The code below starts by dropping the columns 'up', 'down', and 'flat'.
# These were temporary and were used to help filter data in the above
# code in sections two, three, and four. Finally we concat the
# DataFrames df, df3, df5, and df7. We now have new 'up', 'down' and
# 'flat' columns that only display up, down, or flat when the criteria
# is true.
df = df.drop(['up'], axis = 1)
df = df.drop(['down'], axis = 1)
df = df.drop(['flat'], axis = 1)
# join_axes was removed in pandas 1.0; reindexing to df.index replaces it
df = pd.concat([df, df3, df5, df7], axis=1).reindex(df.index)
# The difference between the close of current day and the previous day
# non percentage
df['Up_Down'] = df.Close.diff()
# The percentage of change on the Up_Down column
df['%Chg'] = ((df['Up_Down']/df['Close'])*100)
# How much the current opening price has moved up from the previous
# opening price in terms of percentage,
df['Open%pd'] = df.Open.pct_change()*100
# How much the current high price has moved up from the previous high
# price in terms of percentage.
df['High%pd'] = df.High.pct_change()*100
# How much the current low price has moved up from the previous low
# price in terms of percentage
df['Low%pd'] = df.Low.pct_change()*100
# How much the current close price has moved up from the previous close
# price in terms of percentage
df['Close%pd'] = df.Close.pct_change()*100
# How much the current volume has moved up from the previous day's
# volume in terms of percentage
df['Volume%pd'] = df.Volume.pct_change()*100
# Both columns take the percentage of change from open to high and open
# to low
df['High%fo'] = ((df.High - df.Open)/(df.Open))*100
df['Low%fo'] = ((df.Open - df.Low) / (df.Open))*100
# Takes the difference from the high price and the low price non
# percentage
df['HighLowRange'] = df.High - df.Low
# Measures how much the range (the high minus the low) has changed
# versus the previous day
df['HighLowRange%pd'] = df.HighLowRange.pct_change()*100
# df now is equal to only the months of March and only has the date and
# Close%pd column
df=df[['Date','Close%pd']][(df.month == 3)]
print(df)
As noted in the comments, you can use groupby to find the monthly totals:
#change the previous last line of code to this
df=df[['Date','year','month','Close%pd']][(df.month == 3)]
#make a new dataframe
# select Close%pd so the Date column is not summed as well
new_df = df.groupby(['year','month'])['Close%pd'].sum()
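To make the groupby approach concrete, here is a minimal sketch on made-up data. The `prices` frame, its linearly rising `Close` values, and the names `monthly`, `report`, `march`, and `q1` are assumptions for illustration, not part of the original code. It groups by year and month, then reshapes the result into a year-by-month table that can be printed as a report. Note that summing daily percentage changes only approximates the return over the period; it is not a compounded return.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Yahoo Finance CSV: business days with a
# made-up, steadily rising Close column.
dates = pd.bdate_range("2014-01-01", "2016-12-31")
prices = pd.DataFrame({"Date": dates,
                       "Close": np.linspace(100.0, 150.0, len(dates))})
prices["Close%pd"] = prices["Close"].pct_change() * 100
prices["year"] = prices["Date"].dt.year
prices["month"] = prices["Date"].dt.month

# Sum of the daily percentage changes for each (year, month) pair.
monthly = prices.groupby(["year", "month"])["Close%pd"].sum()

# Reshape into a report: one row per year, one column per month.
report = monthly.unstack("month")

# March only, one value per year.
march = report[3]

# January through March combined, one value per year.
q1 = report.loc[:, 1:3].sum(axis=1)

# Plain-text table suitable for printing on paper.
print(report.round(2).to_string())
```

Because the year and month columns are built with the `.dt` accessor, any other month combination is just a different column slice of `report`.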
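Another approach is to use the resample command. This is probably the best way to compute weekly totals, especially since you don't have a variable indicating the week of the year, which is what you would otherwise pass to groupby.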
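df = df.set_index('Date')  # resample requires a DatetimeIndex
weekly = df['Close%pd'].resample('W').sum()   # weekly totals
monthly = df['Close%pd'].resample('M').sum()  # monthly totals (alias 'ME' in pandas 2.2+)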
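As a self-contained illustration of resampling, the sketch below uses made-up data; the `prices` and `by_date` frames are assumptions, not the asker's file. It relies on two things that hold in current pandas: `resample` needs a DatetimeIndex, and the aggregation is chained as a method (`.sum()`) rather than passed through the old `how=` argument. The month-start alias `'MS'` is used here because the month-end alias differs between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical one-year price series on business days.
dates = pd.bdate_range("2015-01-01", "2015-12-31")
prices = pd.DataFrame({"Date": dates,
                       "Close": np.linspace(100.0, 150.0, len(dates))})
prices["Close%pd"] = prices["Close"].pct_change() * 100

# resample works on a DatetimeIndex, so move Date into the index first.
by_date = prices.set_index("Date")

# Weekly totals, each bin labelled by the Sunday that ends the week.
weekly = by_date["Close%pd"].resample("W").sum()

# Monthly totals, each bin labelled by the first day of the month.
monthly = by_date["Close%pd"].resample("MS").sum()

print(monthly.round(2))
```

Unlike groupby on year and month columns, resample also emits bins for periods that contain no rows, which matters if the data has already been filtered down to a single month.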
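Have you heard of groupby?
Yes, I have, but can I sum the values based on just the year? The tutorials I've read and watched so far are also very confusing.
groupby(['year','month']).sum() will give you the monthly totals by month and year.
I get an error: KeyError: 'year'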
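The KeyError most likely appears because the frame was reduced to the Date and Close%pd columns before grouping, so there is no 'year' column left to group on. The sketch below reproduces that on made-up data and shows two ways around it, plus grouping by year alone and a Fridays-only breakdown. The `prices` and `trimmed` frames and the other variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-year price series on business days.
dates = pd.bdate_range("2014-01-01", "2015-12-31")
prices = pd.DataFrame({"Date": dates,
                       "Close": np.linspace(100.0, 150.0, len(dates))})
prices["Close%pd"] = prices["Close"].pct_change() * 100
prices["year"] = prices["Date"].dt.year
# 1 = Monday ... 5 = Friday, matching the isoweekday() numbering.
prices["dow"] = prices["Date"].dt.dayofweek + 1

# Keeping only Date and Close%pd drops 'year', so grouping by it fails.
trimmed = prices[["Date", "Close%pd"]]
try:
    trimmed.groupby("year").sum()
    raised = False
except KeyError:
    raised = True

# Option 1: group the frame that still has the 'year' column.
by_year = prices.groupby("year")["Close%pd"].sum()

# Option 2: derive the key from Date at grouping time.
by_year_alt = trimmed.groupby(trimmed["Date"].dt.year)["Close%pd"].sum()

# Fridays only, broken down by year.
fridays = prices[prices["dow"] == 5].groupby("year")["Close%pd"].sum()

print(by_year.round(2))
print(fridays.round(2))
```

Grouping by a single key such as "year" answers the "only by year" question directly; passing a list of keys adds further levels to the breakdown.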