Python nested for loop: creating columns from values

python pandas for-loop

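I am fairly new to Python programming. I read a csv file into a data frame, with the median house price for each month as the columns. Now I want to create columns holding the mean for each quarter, e.g. create column Housing['2000q1'] as the mean of 2000-01, 2000-02 and 2000-03, column Housing['2000q2'] as the mean of 2000-04, 2000-05 and 2000-06, and so on.

I tried to use a nested for loop as below, but I always get errors: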

for i in range (2000,2017):
      for j in range (1,5):
            Housing[i 'q' j] = Housing[[i'-'j*3-2, i'-'j*3-1, i'_'j*3]].mean(axis=1)

Thanks, everyone!
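For reference, the loop above fails because `i 'q' j` is not valid Python: integers and strings have to be combined explicitly, the month numbers need zero padding, and the last label uses `_` instead of `-`. A minimal sketch of a working loop follows, assuming the column labels are strings such as '2000-01'. The small frame is a hypothetical stand-in for the real csv data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: one row per county, one column per month.
Housing = pd.DataFrame({
    '2000-01': [100.0, 200.0], '2000-02': [110.0, np.nan], '2000-03': [120.0, 220.0],
    '2000-04': [130.0, 230.0], '2000-05': [140.0, 240.0], '2000-06': [150.0, 250.0],
})

for year in range(2000, 2017):
    for q in range(1, 5):
        # Build the three month labels of this quarter, e.g. '2000-01', '2000-02', '2000-03'.
        months = [f'{year}-{m:02d}' for m in range(3 * q - 2, 3 * q + 1)]
        # Keep only the months that actually exist as columns.
        present = [m for m in months if m in Housing.columns]
        if present:
            # mean(axis=1) averages across the selected columns and skips missing values.
            Housing[f'{year}q{q}'] = Housing[present].mean(axis=1)
```

Quarters with no data at all are skipped, so no empty columns are created.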

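Usually we work with data in which the rows are time, so it is good practice to do the same here and transpose your data, starting from df = Housing.set_index('CountyName').T (also, variable names should usually start with a lowercase letter, but that is not important here).
Since your data is already in such a nice format, there is a pragmatic solution (in the sense that you do not need to know much about datetime objects and methods), starting with df = Housing.set_index('CountyName').T: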
df.drop('SizeRank', inplace = True) # Drop the SizeRank row first, to avoid including it in the calculation of means
df.reset_index(inplace = True) # This moves the dates to a column named 'index'
df.rename(columns = {'index':'quarter'}, inplace = True) # Rename this column into something more meaningful
# Rename the months into the appropriate quarters (str.replace has no inplace option, so assign the result back)
df['quarter'] = df['quarter'].str.replace('-01|-02|-03', 'q1', regex = True)
df['quarter'] = df['quarter'].str.replace('-04|-05|-06', 'q2', regex = True)
df['quarter'] = df['quarter'].str.replace('-07|-08|-09', 'q3', regex = True)
df['quarter'] = df['quarter'].str.replace('-10|-11|-12', 'q4', regex = True)
values = df.drop(columns = 'quarter') # The county columns only, so the labels do not interfere
df['total'] = values.sum(axis = 1) # The totals on each month
df['c'] = values.notnull().sum(axis = 1) # Count the number of non-empty entries on each month
g = df.groupby('quarter')[['total','c']].sum()
g['q_mean'] = g['total']/g['c']
g

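g['q_mean'] or g[['q_mean']] should give you the required answer.
Note that we needed to compute the mean manually because you have missing data; otherwise, df.groupby('quarter').mean().mean(axis=1) would have given you the desired answer immediately.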
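To illustrate why missing data matters here, the following small sketch compares the manual "total divided by count" mean with a mean of per-county means. The frame, the county names A and B, and all numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical single quarter: three months, two counties, one missing value.
df = pd.DataFrame({
    'quarter': ['2000q1', '2000q1', '2000q1'],
    'A': [100.0, 110.0, 120.0],
    'B': [200.0, np.nan, 220.0],
})

values = df.drop(columns='quarter')

# Manual mean: sum of all available entries divided by how many there are.
pooled = values.sum().sum() / values.notnull().sum().sum()   # 750 / 5

# Mean of the per-county means within the quarter.
mean_of_means = df.groupby('quarter').mean().mean(axis=1)    # (110 + 210) / 2
```

Here the pooled value is 150 while the mean of means is 160, because county B contributes only two observations. Without the missing entry the two would agree.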
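A remark: the technically "right" way would be to convert your dates into datetime-like objects (which you can do with pd.to_datetime()) and then run a groupby with a pd.TimeGrouper() argument (pd.Grouper(freq=...) in current pandas, where pd.TimeGrouper has been removed); this is certainly worth learning more about if you are going to work a lot with time-indexed data.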
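A rough sketch of the datetime-based route might look as follows. Note that it groups by `index.to_period('Q')` rather than by a grouper object. That is a substitution made here so the example does not depend on frequency alias spellings, which differ between pandas versions. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
Housing = pd.DataFrame({
    'CountyName': ['A', 'B'],
    'SizeRank': [1, 2],
    '2000-01': [100.0, 200.0], '2000-02': [110.0, np.nan], '2000-03': [120.0, 220.0],
    '2000-04': [130.0, 230.0], '2000-05': [140.0, 240.0], '2000-06': [150.0, 250.0],
})

# Rows become months, columns become counties; SizeRank is not a date, so drop it first.
df = Housing.drop(columns='SizeRank').set_index('CountyName').T

# Convert the 'YYYY-MM' labels into real timestamps.
df.index = pd.to_datetime(df.index, format='%Y-%m')

# Group the months by the calendar quarter they fall in and average per county.
quarterly = df.groupby(df.index.to_period('Q')).mean()
```

The result has one row per quarter and one column per county, and missing months are skipped when averaging.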
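You can use the pandas resample function to compute the quarterly means in a very simple way.
resample:
Offset alias summary:
To use this function the columns must contain only the time labels, so you should temporarily set CountryName and SizeRank as the index.
Code:
QuarterlyAverage = Housing.set_index(['CountryName', 'SizeRank'], append = True)
QuarterlyAverage.columns = pd.to_datetime(QuarterlyAverage.columns) # resample needs datetime-like labels, not strings
QuarterlyAverage = QuarterlyAverage.resample('Q', axis = 1).mean()\
                                   .reset_index(['CountryName', 'SizeRank'], drop = False)
Thanks to @jezrael for suggesting axis=1 in resample.
Thanks for the formatting, @Woody Pride.
I would suggest transposing (so that the cities are the columns and the rows are dates) and then doing something like: df.groupby(pd.TimeGrouper(freq='3M')).sum(). Note: make sure your dates are of datetime type. Also, it would be better if you could include the sample data as plain text (which can be copied and pasted) rather than as an image. It makes it easier for others to read your data and test answers.
Transposing is not necessary, only axis=1 is needed: QuarterlyAverage = Housing.resample('Q', axis=1).mean()
Thanks for your comment, I have edited my answer. It is hard to say without knowing more about the data or the application, but I think the base case is to have time as the index, don't you agree?
If in doubt, it seems the first and second columns must first be set as the index, followed by the resample and finally a reset of the index.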
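As a variant that keeps the original layout and appends columns named like '2000q1', as the question asks, one could map each month label to its quarter with pd.PeriodIndex and average per quarter. This is only a sketch under assumptions: the identifier column is taken to be 'CountyName' as in the first answer, the sample data is invented, and the transpose-then-groupby step is used in place of resample with axis=1, which I believe newer pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
Housing = pd.DataFrame({
    'CountyName': ['A', 'B'],
    'SizeRank': [1, 2],
    '2000-01': [100.0, 200.0], '2000-02': [110.0, np.nan], '2000-03': [120.0, 220.0],
    '2000-04': [130.0, 230.0], '2000-05': [140.0, 240.0], '2000-06': [150.0, 250.0],
})

id_cols = ['CountyName', 'SizeRank']
monthly = Housing.drop(columns=id_cols)

# Map every month label to its quarter, e.g. '2000-02' -> 2000Q1.
quarters = pd.PeriodIndex(monthly.columns, freq='Q')

# Transpose so months are rows, average per quarter, then transpose back.
quarterly = monthly.T.groupby(quarters).mean().T

# Rename the quarter labels to the '2000q1' style used in the question.
quarterly.columns = [f'{p.year}q{p.quarter}' for p in quarterly.columns]

# Append the quarterly columns to the original frame (aligned on the row index).
result = Housing.join(quarterly)
```

The monthly columns stay in place, so they can be dropped afterwards if only the quarterly values are needed.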