Python: organizing rows by the Cartesian product of 3 columns


I have 3 lists, as shown in the following reproducible example:

year = [2015, 2016, 2017] 
month = [1, 2] 
ids = ['x', 'y', 'z', 'w']

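What I want to do is fairly simple: build a final DataFrame whose 3 columns have their rows arranged as the permutations, or Cartesian product, of the column values.

For example:

Finally, I want to add an "Epoque" column with the following reference: December 2014 equals "1", January 2015 equals "2", February 2015 equals "3", and so on (the sequence continues from the initial reference Dec-2014 = "1").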
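The "Epoque" rule described above amounts to counting months from a base month. A minimal sketch of that arithmetic, assuming December 2014 is the base and is numbered 1 (the function name epoch_index and its parameters are made up for illustration):

```python
def epoch_index(year, month, base_year=2014, base_month=12):
    """Months elapsed since the base month, counting the base month itself as 1."""
    return (year - base_year) * 12 + (month - base_month) + 1


print(epoch_index(2014, 12))  # 1
print(epoch_index(2015, 1))   # 2
print(epoch_index(2015, 2))   # 3
```

The same expression can be applied to whole DataFrame columns, since it only uses addition, subtraction and multiplication.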

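The final desired output would look like this:

Edit:

Question edited thanks to extensive feedback from @jezrael. He kindly provided the lines I was missing to get the desired df, with only the "Epoque" column still missing.

My suggested code is as follows (it lacks the desired "Epoque" column):

Any help on how to implement the "Epoque" column efficiently would be much appreciated. Thanks.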

Use map with a dictionary created from date_range, defined by start and end dates:

import itertools
import pandas as pd

# Cartesian product of years, months and ids
s = [[2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w']]
z = list(itertools.product(*s))

# Map every month start between the two dates to a running counter (Dec-2014 -> 1)
a = 'Dec-2014'
b = 'Dec-2018'
r = pd.date_range(a, b, freq='MS')
d = dict(zip(r, range(1, len(r) + 1)))

df = pd.DataFrame(z, columns=['year','month','id'])
df['epoch'] = pd.to_datetime(df[['year','month']].assign(day=1)).map(d)
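As a side note, the Cartesian product itself can also be built without itertools. The following is only a sketch of one alternative, using pandas' MultiIndex.from_product; the variable names index and df_product are chosen here for illustration and do not come from the answer.

```python
import pandas as pd

year = [2015, 2016, 2017]
month = [1, 2]
ids = ['x', 'y', 'z', 'w']

# Build every (year, month, id) combination as index levels,
# then turn those levels into ordinary columns.
index = pd.MultiIndex.from_product([year, month, ids], names=['year', 'month', 'id'])
df_product = index.to_frame(index=False)

print(len(df_product))  # 3 * 2 * 4 = 24 rows
print(df_product.head())
```

The row order matches itertools.product: the last list (ids) varies fastest and the first (year) slowest.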

You can use Pandas datetime:

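import itertools
import pandas as pd

# z: Cartesian product of the three lists from the question
z = list(itertools.product([2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w']))
df = pd.DataFrame(z, columns=['year', 'month', 'id'])

base = pd.Timestamp('2014-12-01')
dates = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Period difference; newer pandas versions return offset objects rather than
# integers here, so take the offset's month count (.n) when it is present
diff = dates.dt.to_period('M') - base.to_period('M')
df['epoch'] = diff.apply(lambda m: getattr(m, 'n', m)) + 1

# alternative: plain year/month arithmetic
df['epoch'] = (dates.dt.year - base.year)*12 + (dates.dt.month - base.month) + 1

print(df)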

    year  month id  epoch
0   2015      1  x      2
1   2015      1  y      2
2   2015      1  z      2
3   2015      1  w      2
4   2015      2  x      3
5   2015      2  y      3
...
18  2017      1  z     26
19  2017      1  w     26
20  2017      2  x     27
21  2017      2  y     27
22  2017      2  z     27
23  2017      2  w     27
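The dictionary lookup and the month arithmetic should give the same numbers. The sketch below is one way to check that, assuming the same lists and the same Dec-2014 = 1 reference as in the question; the names check, epoch_map and epoch_arith are illustrative only.

```python
import itertools
import pandas as pd

z = list(itertools.product([2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w']))
check = pd.DataFrame(z, columns=['year', 'month', 'id'])

# Approach 1: look up each month start in a date -> counter dictionary.
r = pd.date_range('2014-12-01', '2018-12-01', freq='MS')
d = dict(zip(r, range(1, len(r) + 1)))
dates = pd.to_datetime(check[['year', 'month']].assign(day=1))
epoch_map = dates.map(d)

# Approach 2: plain month arithmetic relative to December 2014.
epoch_arith = (check['year'] - 2014) * 12 + (check['month'] - 12) + 1

# Both columns are expected to agree row by row.
print((epoch_map == epoch_arith).all())
```

The arithmetic version needs no end date, whereas the dictionary only covers months inside the chosen date range; dates outside it would map to NaN.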

One solution is to use nested for loops to iterate over all the variables:

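import pandas as pd

year = [2015, 2016, 2017]
month = [1, 2]
ids = ['x', 'y', 'z', 'w']

# Set the start date of your epoch (here November 2014 is epoch 0, so December 2014 is epoch 1)
month_0 = 11
year_0 = 2014
year_col = []
month_col = []
id_col = []
epoch_col = []
# Loop order (id, then month, then year) determines the row order of the result
for j1 in ids:
    for j2 in month:
        for j3 in year:
            year_col.append(j3)
            month_col.append(j2)
            id_col.append(j1)
            epoch = (j3 - year_0) * 12 + (j2 - month_0)
            epoch_col.append(epoch)
df = pd.DataFrame({'year': year_col, 'month': month_col, 'id': id_col, 'epoch': epoch_col})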

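Do you want to create the DataFrame from these 3 lists, or from the list of tuples you mention at the end of your post? The image of the final desired output does not match the ordering you would get from list(itertools.product(*s)).

Please check the edited answer. Is it possible to define the last, maximum month-year value?

@StatisticDean I mean the first lists I defined at the beginning of the post. I noticed I made a mistake earlier: I had defined the character list x, y, z, w as objects (no quotes in the list). I have now changed this so it is a list of character values: ids = ['x', 'y', 'z', 'w'].

I used the following helpful line of code in my Jupyter notebook: df['timestamp'] = pd.to_datetime(df['timestamp'], format="%Y-%m-%d") + MonthEnd(1). Thanks for the suggestion.

After trying several examples with different timestamps, I found your answer to be the simplest and easiest to apply, so I have accepted it as correct. Thanks.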
