Python: generating iterated dates in a DataFrame

python pandas dataframe datetime

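I have the following problem statement:

At every exam center, the exam is conducted in two shifts, Batch I and Batch II (reporting times 9:00 AM and 2:00 PM). The exam can be held in a district on any day between 1 and 30 December 2020, depending on the number of candidates in that district. Note: each district can have only one exam center, and at most 20 students can sit in one shift. Based on the above information, complete the exam database by assigning the following: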

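Rollno: candidates' roll numbers start from NL2000001 (e.g. NL2000001, NL2000002, NL2000003, ...)
cent_allot: allot the center by entering the exam city code
cent_add: at every location, use NL "district name" as the center address (e.g. if the district name is ADI, the center address is NL ADI)
examDate: assign any exam date between 12 December and 30 December 2020, keeping the number of exam days to a minimum and without violating any of the conditions above
batch: assign batch I or II, making sure all of the conditions above are met
rep_time: the reporting time is 9:00 AM for Batch I and 2:00 PM for Batch II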

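Based on the description above, I need to build a table that satisfies these conditions. I have already created the Rollno, cent_allot and cent_add columns, but I am struggling with the examDate column, because the same date has to be repeated for every 40 rows of a district.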

Here is the list of districts and how often each one occurs:

Dist    Count
WGL     299
MAHB    289
KUN     249
GUN     198
KARN    196
KRS     171
CTT     169
VIZ     150
PRA     145
NALG    130
MED     128
ADI     123
KPM     119
TRI     107
ANA     107
KHAM    85
NEL     85
VIZI    84
EGOD    84
SOA     84
SIR     80
NIZA    73
PUD     70
KRK     69
WGOD    56

Here are the first 25 rows of the DataFrame:

Rollno     cent_allot   cent_add    examDate    batch   rep_time
NL2000001   WGL          NL WGL       NaN        NaN    NaN
NL2000002   WGL          NL WGL       NaN        NaN    NaN
NL2000003   WGL          NL WGL       NaN        NaN    NaN
NL2000004   KUN          NL KUN       NaN        NaN    NaN
NL2000005   KUN          NL KUN       NaN        NaN    NaN
NL2000006   KUN          NL KUN       NaN        NaN    NaN
NL2000007   GUN          NL GUN       NaN        NaN    NaN
NL2000008   GUN          NL GUN       NaN        NaN    NaN
NL2000009   GUN          NL GUN       NaN        NaN    NaN
NL2000010   GUN          NL GUN       NaN        NaN    NaN
NL2000011   VIZ          NL VIZ       NaN        NaN    NaN
NL2000012   VIZ          NL VIZ       NaN        NaN    NaN
NL2000013   VIZ          NL VIZ       NaN        NaN    NaN
NL2000014   VIZ          NL VIZ       NaN        NaN    NaN
NL2000015   MAHB         NL MAHB      NaN        NaN    NaN
NL2000016   MAHB         NL MAHB      NaN        NaN    NaN
NL2000017   MAHB         NL MAHB      NaN        NaN    NaN
NL2000018   WGOD         NL WGOD      NaN        NaN    NaN
NL2000019   WGOD         NL WGOD      NaN        NaN    NaN
NL2000020   WGOD         NL WGOD      NaN        NaN    NaN
NL2000021   WGOD         NL WGOD      NaN        NaN    NaN
NL2000022   EGOD         NL EGOD      NaN        NaN    NaN
NL2000023   EGOD         NL EGOD      NaN        NaN    NaN
NL2000024   EGOD         NL EGOD      NaN        NaN    NaN
NL2000025   EGOD         NL EGOD      NaN        NaN    NaN

The last three columns are all NaN because they have not been created yet.

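Take WGL as an example. As stated above, each district allows at most 20 candidates per shift, which means that within each district the same date has to be assigned 40 times, and the same batch and reporting time 20 times.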
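To make the arithmetic concrete, here is a small sketch (not part of the original question) of how many batches and exam days a district needs under the stated limits of 20 candidates per batch and 2 batches per day. The function name `slots_needed` is made up for illustration.

```python
import math

N_STUD = 20   # candidates per batch (from the problem statement)
N_BATCH = 2   # batches per day (from the problem statement)

def slots_needed(count):
    """Return (exam_days, batches) needed for `count` candidates in one district."""
    batches = math.ceil(count / N_STUD)
    days = math.ceil(batches / N_BATCH)
    return days, batches

# counts taken from the district table above
for dist, count in [("WGL", 299), ("MAHB", 289), ("WGOD", 56)]:
    days, batches = slots_needed(count)
    print(dist, count, days, batches)
```

For WGL, 299 candidates need 15 batches spread over 8 days, so the largest district sets the lower bound on the number of exam days.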

Does anyone know how to do this?

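The key is to use .groupby().cumcount() to get a running number within each district first. examDate and batch can then be derived from that running number: integer division by 40 gives the day, and the remainder modulo 40, divided by 20, gives the batch.

Data

Generate random rows using the given total count for each Dist: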

import numpy as np
import pandas as pd
import io
import datetime

df_count = pd.read_csv(io.StringIO("""
Dist    Count
WGL     299
MAHB    289
KUN     249
GUN     198
KARN    196
KRS     171
CTT     169
VIZ     150
PRA     145
NALG    130
MED     128
ADI     123
KPM     119
TRI     107
ANA     107
KHAM    85
NEL     85
VIZI    84
EGOD    84
SOA     84
SIR     80
NIZA    73
PUD     70
KRK     69
WGOD    56
"""), sep=r"\s{2,}", engine="python")

# generate random cent_allot
df = df_count.loc[np.repeat(df_count.index.values, df_count["Count"]), "Dist"]\
    .sample(frac=1)\
    .reset_index(drop=True)\
    .to_frame()\
    .rename(columns={"Dist": "cent_allot"})

df["Rollno"] = df.index.map(lambda s: f"NL2{s+1:06}")
df["cent_add"] = df["cent_allot"].map(lambda s: f"NL {s}")

df

Up to this point, this should resemble what you already have.
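The cumcount idea described above could be sketched roughly as follows. This is an illustrative reconstruction, not the answer's original code: the function name `assign_slots`, the start date of 1 December 2020, the use of consecutive exam days, and the integer batch labels 1 and 2 are all assumptions.

```python
import pandas as pd

N_STUD = 20    # candidates per batch
N_BATCH = 2    # batches per day
START = pd.Timestamp("2020-12-01")  # assumed first exam day

def assign_slots(df, center_col="cent_allot"):
    """Add examDate, batch and rep_time columns to a copy of `df`."""
    out = df.copy()
    # running number 0, 1, 2, ... within each center, in row order
    run = out.groupby(center_col).cumcount()
    per_day = N_STUD * N_BATCH
    # every block of 40 candidates of a center shares one date
    out["examDate"] = START + pd.to_timedelta(run // per_day, unit="D")
    # within a day, the first 20 go to batch 1 and the next 20 to batch 2
    out["batch"] = (run % per_day) // N_STUD + 1
    out["rep_time"] = out["batch"].map({1: "9:00 AM", 2: "2:00 PM"})
    return out

# tiny demonstration: 45 candidates in one center, 3 in another
demo = pd.DataFrame({"cent_allot": ["WGL"] * 45 + ["KUN"] * 3})
result = assign_slots(demo)
print(result.groupby(["cent_allot", "examDate", "batch"]).size())
```

Because the running number is computed per center, the rows do not need to be sorted by center first; each center simply fills its own days and batches in the order its rows appear.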

I struggled to find a solution, but by the end of the day on which I asked this question I had found one:

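# examDate column
from datetime import date, timedelta

import pandas as pd

n_stud = 20   # number of students per batch
n_batch = 2   # number of batches per day

# `data` is the existing DataFrame; its 'TH_CENT_CH' column holds the center code.
# Centers and their counts, sorted by center; the columns are named explicitly
# because the default names produced by value_counts() differ between pandas versions.
temp = data['TH_CENT_CH'].value_counts().sort_index().rename_axis('cent').reset_index(name='cnt')
cent = temp['cent'].to_list()  # storing centers in a list
cnt = temp['cnt'].to_list()    # storing counts in a list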
cent1 = []
cnt1 = []

# repeat each center `count` times, numbering its candidates 1..count
for c, n in zip(cent, cnt):
    for i in range(1, n + 1):
        cent1.append(c)
        cnt1.append(i)

df1 = pd.DataFrame(list(zip(cent1, cnt1)), columns = ['cent','cnt'])  # dataframe to store the centers and new count list

counts = df1['cnt'].to_list() # storing the new counts in a list
helper = {}  # helper dictionary
max_no = max(cnt)

# for-while loops to map each count number to a helper number:
# helper j means the candidate falls in the j-th block of n_stud * n_batch candidates
for i in counts:
    j = 0
    while j < (round(max_no / (n_stud * n_batch)) + 1):
        if (i > n_stud * n_batch * j) and (i <= n_stud * n_batch * (j + 1)):
            helper[i] = j
        j += 1

# mapping the helper with counts
counts = pd.Series(counts)
helper = pd.Series(helper)
hel = counts.map(helper).to_list()
df1['helper'] = hel

examDate = {}  # dictionary to store exam dates

# for loop to map dates to each helper number
for i in hel:
    examDate[i] = pd.to_datetime(date(2020, 12, 1) + timedelta(days = (2 * i)))

# mapping the dates with helpers
hel = pd.Series(hel)
examDate = pd.Series(examDate)
exam = hel.map(examDate).to_list()
df1['examDate'] = exam
        
# adding the dates to the original dataframe
# (this relies on the rows of `data` being sorted by center, in the same order as df1)
examDate = df1['examDate'].to_list()
data['examDate'] = examDate
data['examDate']
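For what it is worth, the helper loop above appears to boil down to an integer division: a candidate with within-center number i (starting at 1) lands in block (i - 1) // (n_stud * n_batch). The following self-contained check compares the two; `helper_by_loop` and `helper_by_division` are hypothetical names, and the loop is written with the intended upper bound of (j + 1) blocks.

```python
n_stud = 20   # students per batch
n_batch = 2   # batches per day

def helper_by_loop(i, max_no):
    """Search for the block of n_stud * n_batch candidates that contains number i (1-based)."""
    per_day = n_stud * n_batch
    found = None
    j = 0
    while j < round(max_no / per_day) + 1:
        if per_day * j < i <= per_day * (j + 1):
            found = j
        j += 1
    return found

def helper_by_division(i):
    """Same block index, computed directly."""
    return (i - 1) // (n_stud * n_batch)

max_no = 299  # largest district count in the table (WGL)
assert all(
    helper_by_loop(i, max_no) == helper_by_division(i)
    for i in range(1, max_no + 1)
)
```

If the equivalence holds for your data as well, the whole helper column can be computed in one step from the count column instead of looping.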

Here is the code for the remaining two columns:

# batch column

counts = df1['cnt'].to_list()  # storing the new counts in a list
helper2 = {}  # helper dictionary

# for-while loops to map each count number to a helper number:
# helper2 j means the candidate falls in the j-th block of n_stud candidates
for i in counts:
    j = 0
    while j < (round(max_no / n_stud) + 1):
        if (i > n_stud * j) and (i <= n_stud * (j + 1)):
            helper2[i] = j
        j += 1

# mapping the helper with counts
counts = pd.Series(counts)
helper2 = pd.Series(helper2)
hel2 = counts.map(helper2).to_list()
df1['helper2'] = hel2

batch = {}   # dictionary to store batch numbers

# for loop to map batch numbers to each helper number
for i in hel2:
    if(i % 2 == 0):
        batch[i] = 1
    else:
        batch[i] = 2
        
# mapping the batches with helpers
hel2 = pd.Series(hel2)
batch = pd.Series(batch)
bat = hel2.map(batch).to_list()
df1['batch'] = bat

# adding the batches to the original dataframe
batch = df1['batch'].to_list()
data['batch'] = batch
data['batch'].unique()

# rep_time column
data.loc[data['batch'] == 1, 'rep_time'] = '9:00 AM'
data.loc[data['batch'] == 2, 'rep_time'] = '2:00 PM'
data['rep_time'].unique()
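Whichever approach is used, it may be worth checking the result against the constraints from the problem statement. The sketch below is an assumption-laden illustration: `check_schedule` is a hypothetical helper, and the column names follow the table in the question (pass `center_col="TH_CENT_CH"` for the DataFrame used in the code above).

```python
import pandas as pd

def check_schedule(df, center_col="cent_allot", max_per_batch=20):
    """Return True if no (center, date, batch) slot exceeds the limit
    and every batch has exactly one reporting time."""
    sizes = df.groupby([center_col, "examDate", "batch"]).size()
    within_limit = bool((sizes <= max_per_batch).all())
    one_time_per_batch = bool((df.groupby("batch")["rep_time"].nunique() == 1).all())
    return within_limit and one_time_per_batch

# 25 candidates on one day: 20 in batch 1 and 5 in batch 2
good = pd.DataFrame({
    "cent_allot": ["WGL"] * 25,
    "examDate": [pd.Timestamp("2020-12-01")] * 25,
    "batch": [1] * 20 + [2] * 5,
    "rep_time": ["9:00 AM"] * 20 + ["2:00 PM"] * 5,
})
# the same 25 candidates squeezed into a single batch
bad = good.assign(batch=1, rep_time="9:00 AM")

print(check_schedule(good), check_schedule(bad))  # True False
```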

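Thank you for your code. It looks concise and appropriate. I did not run it, because I had already come up with a tedious solution of my own, but I still really appreciate your help.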

        Rollno  cent_allot  cent_add  examDate   batch  rep_time
0     NL2000001        ADI   NL ADI  2020-12-01      1  09:00:00
1     NL2000002        ADI   NL ADI  2020-12-01      1  09:00:00
2     NL2000003        ADI   NL ADI  2020-12-01      1  09:00:00
3     NL2000004        ADI   NL ADI  2020-12-01      1  09:00:00
4     NL2000005        ADI   NL ADI  2020-12-01      1  09:00:00
         ...        ...      ...         ...    ...       ...
3345  NL2003346        WGOD  NL WGOD 2020-12-03      1  09:00:00
3346  NL2003347        WGOD  NL WGOD 2020-12-04      1  09:00:00
3347  NL2003348         KRS  NL KRS  2020-12-05      1  09:00:00
3348  NL2003349        WGOD  NL WGOD 2020-12-02      1  09:00:00
3349  NL2003350        WGOD  NL WGOD 2020-12-04      1  09:00:00
