Python DataFrame min/max range


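Thanks in advance for your help! (Code below) / Data here:

I'm trying to add two more columns to my DataFrame that represent the data range of the Topsoil column, just as mean['maxx20']=maxx['20 cm'] and mean['minn20']=minn['20 cm'] do for the 20 cm column.

I tried to do this by adding the following: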

mean['topsoilMax']=maxx['Topsoil']
mean['topsoilMin']=minn['Topsoil']
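Instead of adding the extra columns as I had hoped, this raises KeyError: 'Topsoil', even though Topsoil is already a column in the DataFrame, just as 20 cm was when I added its range.

Why am I getting this error, and what is the correct way to add these columns?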

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

#Importing data, creating a copy, and assigning it to a variable
raw_data = pd.read_csv('all-deep-soil-temperatures.csv', index_col=1, parse_dates=True)
df_all_stations = raw_data.copy()

#Setting the program to iterate based off of the station of the users choice
selected_soil_station = 'Minot'
#Taking an explicit copy so the forward fill does not operate on a view of df_all_stations
df_selected_station = df_all_stations[df_all_stations['Station'] == selected_soil_station].copy()
df_selected_station = df_selected_station.ffill()

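# Resamples the data to daily means and creates a column that keeps track of the day of the year
# (numeric_only=True skips the text 'Station' column, which newer pandas versions refuse to average)
df_selected_station_D=df_selected_station.resample(rule='D').mean(numeric_only=True)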
df_selected_station_D['Day'] = df_selected_station_D.index.dayofyear


#Assigning variable so that mean represents df_selected_station_D but indexed by day
mean=df_selected_station_D.groupby(by='Day').mean()
mean['Day']=mean.index

#This inserts a new column named 'Topsoil' at the end that represents the average between 5 cm, 10 cm, and 20 cm
mean['Topsoil']=mean[['5 cm', '10 cm','20 cm']].mean(axis=1)


#Creating the range in which the line graph will fill in 
maxx=df_selected_station_D.groupby(by='Day').max()
minn=df_selected_station_D.groupby(by='Day').min()

mean['maxx20']=maxx['20 cm']
mean['minn20']=minn['20 cm']

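If I understand your question correctly, this is how I would solve it:

topsoil = [-2.971686, -2.599278, -2.264897, -2.083117, -1.946969]

max_num = max(topsoil)
min_num = min(topsoil)
print(max_num)  # the maximum of the topsoil list
print(min_num)  # the minimum of the topsoil list
print(max_num - min_num)  # the range (maximum minus minimum) of the topsoil list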


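The solution here is probably to add the 'Topsoil' column to the maxx and minn DataFrames as well, since it was only ever created on mean: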

maxx['Topsoil']=maxx[['5 cm', '10 cm','20 cm']].max(axis=1)
minn['Topsoil']=minn[['5 cm', '10 cm','20 cm']].min(axis=1)
After those assignments:

mean['topsoilMax']=maxx['Topsoil']
mean['topsoilMin']=minn['Topsoil']

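I think this picks the minimum and maximum out of those three values, but the Topsoil column is supposed to be the average of the three. So I think the logic should be that the new columns are the range of the average of those three columns for that day. Something like this (you wouldn't actually code it this way, haha): maxx['Topsoil']=maxx[[Average('5cm','10cm','20cm')]].max(axis=1)

Maybe I don't quite understand, but why not use maxx['Topsoil']=mean[['5 cm', '10 cm', '20 cm']].max(axis=1)? mean[['5 cm', '10 cm', '20 cm']] should already contain the averages.

I figured it out. I ended up adding more columns representing the ranges for 5 cm and 10 cm, like this: mean['maxx05']=maxx['5 cm'] mean['minn05']=minn['5 cm'] mean['maxx10']=maxx['10 cm'] mean['minn10']=minn['10 cm'] Then I averaged the three and got the result I wanted. I couldn't have done it without your guidance. Thank you!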
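
For anyone landing here later, the following sketch shows two ways to read "the range of Topsoil". The data is made up, and the `daily`, `depths` and `daily_topsoil` names are assumptions. The first part follows the description in the last comment (average the per-depth maxima and minima); the exact code the asker used was not posted, so treat it as one plausible reconstruction. The second part is an alternative that averages the three depths for each daily row first and only then takes the per-day extremes.

```python
import pandas as pd

# Hypothetical daily data: two days of the year, two years each.
daily = pd.DataFrame(
    {
        "Day": [1, 2, 1, 2],
        "5 cm": [3.0, 6.0, 0.0, 3.0],
        "10 cm": [0.0, 3.0, 3.0, 3.0],
        "20 cm": [0.0, 0.0, 3.0, 3.0],
    }
)
depths = ["5 cm", "10 cm", "20 cm"]

mean = daily.groupby("Day").mean()
maxx = daily.groupby("Day").max()
minn = daily.groupby("Day").min()

# Reading 1 (as described in the comment): per-depth extremes, then averaged.
mean["maxx05"], mean["minn05"] = maxx["5 cm"], minn["5 cm"]
mean["maxx10"], mean["minn10"] = maxx["10 cm"], minn["10 cm"]
mean["maxx20"], mean["minn20"] = maxx["20 cm"], minn["20 cm"]
mean["topsoilMax"] = mean[["maxx05", "maxx10", "maxx20"]].mean(axis=1)
mean["topsoilMin"] = mean[["minn05", "minn10", "minn20"]].mean(axis=1)

# Reading 2 (alternative): average the depths per daily row, then take extremes per day.
daily_topsoil = daily[depths].mean(axis=1)
topsoil_max_alt = daily_topsoil.groupby(daily["Day"]).max()
topsoil_min_alt = daily_topsoil.groupby(daily["Day"]).min()

print(mean[["topsoilMax", "topsoilMin"]])
print(topsoil_max_alt, topsoil_min_alt, sep="\n")
```

The two readings generally give different numbers, because the three depths need not peak in the same year; which one is appropriate depends on what the shaded band in the plot is meant to show.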