Function arguments in a Python pivot table

I have wind direction and wind speed data in a pandas DataFrame, in 10-minute slices. It looks like this:

      year  month  day  hour  minutes  direction  speed        filename
0   1999.0      1    1     0        0       84.0    7.1  mlrf1c1999.txt
1   1999.0      1    1     0       10       75.0    7.5  mlrf1c1999.txt
2   1999.0      1    1     0       20       79.0    7.2  mlrf1c1999.txt
3   1999.0      1    1     0       30       77.0    7.2  mlrf1c1999.txt
4   1999.0      1    1     0       40       76.0    6.7  mlrf1c1999.txt
5   1999.0      1    1     0       50       76.0    7.5  mlrf1c1999.txt
6   1999.0      1    1     1        0       81.0    6.9  mlrf1c1999.txt
7   1999.0      1    1     1       10       75.0    7.3  mlrf1c1999.txt
8   1999.0      1    1     1       20       77.0    7.4  mlrf1c1999.txt
9   1999.0      1    1     1       30       73.0    6.9  mlrf1c1999.txt
10  1999.0      1    1     1       40       78.0    6.5  mlrf1c1999.txt
11  1999.0      1    1     1       50       75.0    7.3  mlrf1c1999.txt
...
1147812  1997.0     12   31    21        0      261.0    6.0  mlrf1c1997.txt
1147813  1997.0     12   31    21       10      260.0    5.9  mlrf1c1997.txt
1147814  1997.0     12   31    21       20      262.0    5.5  mlrf1c1997.txt
1147815  1997.0     12   31    21       30      279.0    6.5  mlrf1c1997.txt
1147816  1997.0     12   31    21       40      283.0    7.3  mlrf1c1997.txt
1147817  1997.0     12   31    21       50      282.0    7.2  mlrf1c1997.txt
1147818  1997.0     12   31    22        0      277.0    6.9  mlrf1c1997.txt
1147819  1997.0     12   31    22       10      283.0    7.6  mlrf1c1997.txt
1147820  1997.0     12   31    22       20      283.0    7.2  mlrf1c1997.txt
1147821  1997.0     12   31    22       30      290.0    7.5  mlrf1c1997.txt
1147822  1997.0     12   31    22       40      289.0    7.2  mlrf1c1997.txt
1147823  1997.0     12   31    22       50      292.0    7.6  mlrf1c1997.txt
1147824  1997.0     12   31    23        0      296.0    7.7  mlrf1c1997.txt

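I'm trying to examine the data with a pivot table so that I can get the average direction and speed for each hourly slice. I need to apply SciPy's circmean function to the direction data, which requires specifying the high and low parameters for the dataset. When I try this, I get TypeError: 'numpy.float64' object is not callable: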

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean(df.direction, high=df.direction.max(), low=df.direction.min()))

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean(df.direction, high=360, low=0))
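A likely explanation of the TypeError, as a minimal sketch: in aggfunc = circmean(df.direction, ...) the call runs once, before pivot_table starts, so pivot_table receives the numeric result and later tries to call that number for each group. The sample values below are the question's day 1, hour 0 directions; the helper name deferred is hypothetical.

```python
import pandas as pd
from scipy.stats import circmean

# First hour of directions from the question (day 1, hour 0), in degrees
directions = pd.Series([84.0, 75.0, 79.0, 77.0, 76.0, 76.0])

# What the failing call hands to pivot_table: circmean(...) has already run,
# so aggfunc receives a single number rather than a function.
eager = circmean(directions, high=360, low=0)
print(type(eager).__name__, callable(eager))


# A function (or lambda) postpones the call; pivot_table can then
# invoke it once per group, passing that group's values as a Series.
def deferred(x):
    return circmean(x, high=360, low=0)


print(callable(deferred), deferred(directions))
```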

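As far as I understand, circmean needs the high and low parameters to produce accurate output. When I try to get the mean of the wind speed readings with np.mean, I have no difficulty: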

df.pivot_table(values = ['speed'], index = ['day', 'hour'], aggfunc = np.mean)

This yields:

             speed
day hour          
1   0     6.085055
    1     6.144919
    2     6.253006
    3     6.315291
    4     6.305656
    5     6.241176
    6     6.205701
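Because the goal is hourly averages of both speed and direction, one option (a sketch, assuming directions are compass degrees from 0 to 360) is to give pivot_table a dict that assigns each column its own aggregation. The helper circmean_deg and the four-row sample frame are hypothetical.

```python
import pandas as pd
from scipy.stats import circmean


def circmean_deg(x):
    """Circular mean of compass directions given in degrees (0-360)."""
    return circmean(x, high=360, low=0)


# Hypothetical two-hour sample in the question's column layout
df = pd.DataFrame({
    "day": [1, 1, 1, 1],
    "hour": [0, 0, 1, 1],
    "direction": [350.0, 10.0, 80.0, 100.0],
    "speed": [7.0, 8.0, 6.0, 6.5],
})

# One pivot_table call: circular mean for direction, ordinary mean for speed
hourly = df.pivot_table(
    values=["direction", "speed"],
    index=["day", "hour"],
    aggfunc={"direction": circmean_deg, "speed": "mean"},
)
print(hourly)
```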

I can also apply the circmean function without arguments, like this:

df.pivot_table(values = ['direction'], index = ['day', 'hour'], aggfunc = circmean)

When I do this, I get results that make no sense to me (i.e., they are not on a 0 to 360 degree scale).
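A probable reason the argument-free results look wrong: circmean defaults to low=0 and high=2*pi, so degree values are treated as radians and the mean comes back in radians. The sketch below, using two made-up readings either side of north, compares the arithmetic mean, the default bounds, and explicit degree bounds.

```python
import numpy as np
from scipy.stats import circmean

# Two hypothetical readings either side of north, in degrees
directions = np.array([350.0, 10.0])

# The arithmetic mean ignores wrap-around and points due south
naive = directions.mean()

# Default bounds (low=0, high=2*pi) read the numbers as radians,
# so the result is an angle between 0 and 2*pi, not a compass bearing
default_bounds = circmean(directions)

# Declaring the 0-360 degree circle gives a bearing close to north (0 or 360)
degree_bounds = circmean(directions, high=360, low=0)

print(naive, default_bounds, degree_bounds)
```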

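Is there a way to apply a function together with its arguments in the aggfunc parameter of pivot_table? If not, does anyone have a suggestion for how I can get the circular means I need from the DataFrame?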
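aggfunc accepts any callable, so one way to attach keyword arguments is to wrap circmean before passing it. The sketch below shows a lambda and functools.partial side by side, assuming the directions are compass degrees (hence high=360, low=0 rather than the data's own maximum and minimum); the four-row frame is hypothetical.

```python
from functools import partial

import pandas as pd
from scipy.stats import circmean

# Hypothetical two-hour sample in the question's column layout
df = pd.DataFrame({
    "day": [1, 1, 1, 1],
    "hour": [0, 0, 1, 1],
    "direction": [350.0, 10.0, 80.0, 100.0],
})

# Option 1: a lambda that forwards the keyword arguments
by_lambda = df.pivot_table(values="direction", index=["day", "hour"],
                           aggfunc=lambda x: circmean(x, high=360, low=0))

# Option 2: functools.partial pre-binds high/low, leaving the data argument open
by_partial = df.pivot_table(values="direction", index=["day", "hour"],
                            aggfunc=partial(circmean, high=360, low=0))

print(by_lambda)
```

Both produce the same table; partial avoids writing a throwaway function, while a named def is easier to reuse and document.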

Here is some code that reproduces your problem:

import io
import pandas as pd
from scipy.stats import circmean

doc = """      year  month  day  hour  minutes  direction  speed        filename
0   1999.0      1    1     0        0       84.0    7.1  mlrf1c1999.txt
1   1999.0      1    1     0       10       75.0    7.5  mlrf1c1999.txt
2   1999.0      1    1     0       20       79.0    7.2  mlrf1c1999.txt
3   1999.0      1    1     0       30       77.0    7.2  mlrf1c1999.txt
4   1999.0      1    1     0       40       76.0    6.7  mlrf1c1999.txt
5   1999.0      1    1     0       50       76.0    7.5  mlrf1c1999.txt
6   1999.0      1    1     1        0       81.0    6.9  mlrf1c1999.txt
7   1999.0      1    1     1       10       75.0    7.3  mlrf1c1999.txt
8   1999.0      1    1     1       20       77.0    7.4  mlrf1c1999.txt
9   1999.0      1    1     1       30       73.0    6.9  mlrf1c1999.txt
10  1999.0      1    1     1       40       78.0    6.5  mlrf1c1999.txt
11  1999.0      1    1     1       50       75.0    7.3  mlrf1c1999.txt
1147812  1997.0     12   31    21        0      261.0    6.0  mlrf1c1997.txt
1147813  1997.0     12   31    21       10      260.0    5.9  mlrf1c1997.txt
1147814  1997.0     12   31    21       20      262.0    5.5  mlrf1c1997.txt
1147815  1997.0     12   31    21       30      279.0    6.5  mlrf1c1997.txt
1147816  1997.0     12   31    21       40      283.0    7.3  mlrf1c1997.txt
1147817  1997.0     12   31    21       50      282.0    7.2  mlrf1c1997.txt
1147818  1997.0     12   31    22        0      277.0    6.9  mlrf1c1997.txt
1147819  1997.0     12   31    22       10      283.0    7.6  mlrf1c1997.txt
1147820  1997.0     12   31    22       20      283.0    7.2  mlrf1c1997.txt
1147821  1997.0     12   31    22       30      290.0    7.5  mlrf1c1997.txt
1147822  1997.0     12   31    22       40      289.0    7.2  mlrf1c1997.txt
1147823  1997.0     12   31    22       50      292.0    7.6  mlrf1c1997.txt
1147824  1997.0     12   31    23        0      296.0    7.7  mlrf1c1997.txt"""    

# raw string avoids the invalid "\s" escape warning in newer Python versions
df = pd.read_csv(io.StringIO(doc), sep=r'\s+')

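Grumpy note: in a better question, the code above would already be part of the question; as it is, reproducing the problem takes unnecessary effort and time before anyone can answer. For details, see …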

There is one warning, but I hope you know how to deal with it (perhaps you want to convert degrees to radians inside avg):

RuntimeWarning: invalid value encountered in true_divide
  ang = (samples - low)*2*pi / (high - low)
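A sketch of where this warning likely comes from, assuming it is triggered by bounds computed from each group's own data (high=x.max(), low=x.min()). The quoted line divides by (high - low); a group with a single reading, or with identical readings, has high == low, so the division is 0/0 and yields nan. In the sample data, day 31, hour 23 holds only one row. Data-derived bounds also place a group's largest and smallest readings at the same point on the circle. The helper to_angle below is a hypothetical re-implementation of the quoted line, not SciPy's code.

```python
import numpy as np


def to_angle(samples, low, high):
    # Same scaling as the line quoted in the warning:
    # ang = (samples - low)*2*pi / (high - low)
    return (samples - low) * 2 * np.pi / (high - low)


hour0 = np.array([84.0, 75.0, 79.0, 77.0, 76.0, 76.0])  # day 1, hour 0

# Bounds from the data: 84 maps to 2*pi and 75 maps to 0, i.e. the same point
a = to_angle(hour0, low=hour0.min(), high=hour0.max())
print(np.cos(a[0]), np.cos(a[1]))

# Fixed compass bounds: 84 and 75 stay 9 degrees apart on the circle
b = to_angle(hour0, low=0, high=360)
gap = np.degrees(b[0] - b[1])
print(gap)

# A one-reading group has high - low == 0, so every angle is 0/0 -> nan
with np.errstate(invalid="ignore"):
    c = to_angle(np.array([296.0]), low=296.0, high=296.0)
print(c)
```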

Hope it helps.

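What is your original data, df? Please provide runnable code that reproduces it.

Does this help?

The original code is more than 100 lines; it basically just loads the DataFrame and reduces it to the data above. np.mean and circmean were the first calculations I tried to run on it.

It worked! Thank you, and sorry for my poor documentation. I'm still learning this, so I appreciate the link. By the way, I didn't get the runtime warning. Thanks again, much appreciated!

Maybe I have an older library, but do make sure foo does what it is meant to do. Writing a good, focused question is hard; a good check is to look at the question from an outsider's perspective and let others reproduce the problem by running code. Please accept the answer if it fits.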


# Now you need a function accepting an argument for `aggfunc`

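def avg(x):
    # x is a pd.Series holding one group's df.direction values.
    # high/low describe the circle (compass degrees), not the data range:
    # x.max()/x.min() would map a group's largest and smallest readings
    # to the same angle, and give 0/0 when they are equal.
    return circmean(x, high=360, low=0)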
    
# just to learn how it works with 'mean'
df2 = df.pivot_table(values='direction', index=['day', 'hour'], aggfunc = 'mean')

# now putting the desired function
df3 = df.pivot_table(values='direction', index=['day', 'hour'], aggfunc = avg)
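A different technique, not the one used in the answer: compute the circular mean directly from each group's mean sine and cosine with vectorised groupby operations, which avoids calling a Python function once per group. This is the same quantity circmean computes (the angle of the mean direction vector), assuming degrees on a 0-360 circle; the sample frame is hypothetical, and the last lines cross-check the result against circmean.

```python
import numpy as np
import pandas as pd
from scipy.stats import circmean

# Hypothetical sample in the question's layout (directions in degrees)
df = pd.DataFrame({
    "day": [1, 1, 1, 1],
    "hour": [0, 0, 1, 1],
    "direction": [350.0, 10.0, 80.0, 100.0],
})

# Mean sine and cosine per (day, hour) group in one vectorised pass
rad = np.deg2rad(df["direction"])
means = (df.assign(sin=np.sin(rad), cos=np.cos(rad))
           .groupby(["day", "hour"])[["sin", "cos"]]
           .mean())

# Angle of the mean vector, converted back to degrees; % 360 wraps negatives
hourly_direction = np.rad2deg(np.arctan2(means["sin"], means["cos"])) % 360

# Cross-check against the per-group circmean approach
reference = df.groupby(["day", "hour"])["direction"].agg(
    lambda x: circmean(x, high=360, low=0))
print(pd.DataFrame({"vectorised": hourly_direction, "circmean": reference}))
```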