Python: Is it possible to create a for loop over a DataFrame using two variables?


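I have a pandas DataFrame containing weight measurements (weight column) for different users (Id column) on different dates (date column, stored as pandas datetime objects).

I want to calculate the difference in weight between each user's earliest and latest measurement.

To find the earliest and latest measurement dates, I use the following loop: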

earliest_date = []
latest_date = []
for x in Id_list:
    a = weight_info[weight_info['Id']==x]
    earliest_date.append(a['date'].min())
    latest_date.append(a['date'].max())

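Next I want to write a for loop that takes a date (the earliest or the latest one) together with the user Id and returns the matching weight, for example: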

df = weight_info[(weight_info['date']==x) & (weight_info['Id']==y)]
df['weight']

But I don't know how to write a for loop that iterates over two variables at once. Or is there a simpler way to do the whole calculation?

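You could try "pandasql". This library lets you query the data in a DataFrame with SQL. I find it very useful for working with DataFrames loaded from ad hoc CSV files.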

import pandasql as psql

# Must be an actual DataFrame, not a string; the query refers to it by variable name
df = weight_info

# Shows the record counts in your dataset
record_count = psql.sqldf('''
SELECT
COUNT(*) as record_count
FROM df''')

Use groupby to get the min/max date for each user:

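# agg expects {column: function}; as_index=False keeps 'Id' as a regular column for the merge
min_dates = weight_info.groupby('Id', as_index=False).agg({'date': 'min'})
max_dates = weight_info.groupby('Id', as_index=False).agg({'date': 'max'})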

Then merge back onto the weight data to get each user's weight on their min/max date:

min_weights = weight_info.merge( min_dates[['Id', 'date']], 
                                 on = ['Id', 'date'], how='inner' )

max_weights = weight_info.merge( max_dates[['Id', 'date']], 
                                 on = ['Id', 'date'], how='inner' )

Finally, subtract the two weights for the same user.
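The subtraction step might look like the sketch below. It is only an illustration: the sample data is made up, the `_first`/`_last` suffixes and the `weight_change` column are names chosen here, and it assumes each user has at most one measurement per date.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question (Id, date, weight)
weight_info = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-01-15", "2021-02-15"]
    ),
    "weight": [80.0, 78.5, 77.0, 65.0, 66.5],
})

# Earliest and latest measurement date per user
min_dates = weight_info.groupby("Id", as_index=False).agg({"date": "min"})
max_dates = weight_info.groupby("Id", as_index=False).agg({"date": "max"})

# Weight recorded on each of those dates
min_weights = weight_info.merge(min_dates, on=["Id", "date"], how="inner")
max_weights = weight_info.merge(max_dates, on=["Id", "date"], how="inner")

# Line the two results up by user and subtract (latest minus earliest)
diff = min_weights.merge(max_weights, on="Id", suffixes=("_first", "_last"))
diff["weight_change"] = diff["weight_last"] - diff["weight_first"]

print(diff[["Id", "weight_first", "weight_last", "weight_change"]])
```

If a user has several measurements on the same earliest or latest date, the merges return more than one row for that user, so the data would need to be deduplicated or averaged first.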

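Thanks a lot, I'll try it later! I didn't know about the .agg() function until now. I only started learning Python a few months ago.

Thanks a lot, my SQL isn't that advanced, but I'll give it a try!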