Python 3: create a column based on the values of other columns
I have a dataset that contains the year, the month, and the day of the week. However, it is missing the actual day of the month (i.e. day 1 through day 30). The dataset looks like this:
# Year Month Day_Of_Week
22024 2002 January Tuesday
22101 2002 January Wednesday
22146 2002 January Thursday
22201 2002 January Friday
22247 2002 January Saturday
22280 2002 January Sunday
22335 2002 January Monday
22383 2002 January Tuesday
22384 2002 January Wednesday
22424 2002 January Thursday
22459 2002 January Friday
22511 2002 January Saturday
22598 2002 January Sunday
22599 2002 January Monday
22686 2002 January Tuesday
22687 2002 January Wednesday
22688 2002 January Wednesday
22689 2002 January Wednesday
22761 2002 January Wednesday
22762 2002 January Wednesday
22763 2002 January Wednesday
22764 2002 January Wednesday
22765 2002 January Thursday
22766 2002 January Thursday
22767 2002 January Thursday
22768 2002 January Thursday
22814 2002 January Friday
22815 2002 January Friday
22816 2002 January Friday
22817 2002 January Friday
22818 2002 January Friday
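The logic for finding the day is simple. The first record in the table belongs to day 1. The second record is day 2, and so on: every time "Day_Of_Week" changes from the previous record, we increment the day.
When the month is "January" we count up to 31 days, for "February" 28 days, and so on.
Using pandas, I want to create a new column named "Crash_Day". How can I iterate over the records and populate the new column following the logic above?
How do I construct a for loop that reads the records of each column and fills in the new column accordingly?
This is my code so far: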
import pandas as pd
crash_data = pd.read_csv('data.csv')
print('Length: {} rows.'.format(len(crash_data)))
print(crash_data.head())
If anyone is interested in looking at the data, it is available at the following link:
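If all the dates are consecutive and none are missing between them, you can use groupby with transform and a lambda function: compare each value with the shifted one using ne (!=) to mark the start of each run of consecutive values, then use cumsum as the counter: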
df['day'] = (df.groupby(['Year','Month'])['Day_Of_Week']
.transform(lambda x: x.ne(x.shift()).cumsum()))
Alternative solution:
s = df['Day_Of_Week'].ne(df['Day_Of_Week'].shift())
df['day'] = s.groupby([df['Year'],df['Month']]).cumsum().astype(int)
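To make the two snippets above easier to try out, here is a minimal self-contained sketch that runs both on a small frame. The sample rows are made up for illustration and only mimic the shape of the question's data; the variable names day_transform and day_cumsum are likewise hypothetical.

```python
import pandas as pd

# Made-up sample in the same shape as the question's data (an assumption).
df = pd.DataFrame({
    "Year": [2002] * 8,
    "Month": ["January"] * 6 + ["February"] * 2,
    "Day_Of_Week": ["Tuesday", "Wednesday", "Wednesday", "Thursday",
                    "Friday", "Friday", "Wednesday", "Wednesday"],
})

# Approach 1: inside each (Year, Month) group, flag rows whose weekday
# differs from the previous row and take the running sum of the flags.
day_transform = (
    df.groupby(["Year", "Month"])["Day_Of_Week"]
      .transform(lambda x: x.ne(x.shift()).cumsum())
)

# Approach 2: flag the changes once over the whole column, then take the
# running sum of the flags within each (Year, Month) group.
changed = df["Day_Of_Week"].ne(df["Day_Of_Week"].shift())
day_cumsum = changed.groupby([df["Year"], df["Month"]]).cumsum().astype(int)

df["day"] = day_transform
print(df)
```

One difference that seems worth knowing: the second form compares across group boundaries, so if the first row of a month had the same weekday as the last row of the previous month, that month's counter would start at 0 instead of 1. With truly consecutive dates this cannot happen, because adjacent days never share a weekday.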
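What I would do is take your "Day_Of_Week" column and compare it with itself, shifted by one. Wherever the two differ is where the day changes. If you write the differences to a new column as 1s and 0s, you can take the cumulative sum of that column, which counts all the days (ever increasing). You can then work out how to turn that column (e.g. [0,0,0,1,1,…,31,31,32,32,33,…]) into one that wraps correctly at each month (in fact, you could do exactly the same thing with the month to reset the counter…). I'm not saying this approach is particularly efficient. I just haven't done enough with the date functionality in pandas to give a better answer :)

Thanks a lot for stopping by, Alexander. The code below worked ;)

You are awesome, jezrael, thank you very much for your help.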
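The idea in the comment above (flag the changes, take a running total, then reset it at each month) can be sketched roughly as follows. This is one possible reading of the comment, not the commenter's own code; the sample rows and the intermediate names month_changed, day_changed, running_day and month_block are assumptions. The sample deliberately starts February on the same weekday that January ends on, which real consecutive data would not do, to show that the reset does not depend on the weekday changing.

```python
import pandas as pd

# Contrived sample (an assumption): February starts on the same weekday
# that January ends on, which only happens when dates are missing.
df = pd.DataFrame({
    "Year": [2002] * 7,
    "Month": ["January"] * 4 + ["February"] * 3,
    "Day_Of_Week": ["Wednesday", "Thursday", "Thursday", "Friday",
                    "Friday", "Friday", "Saturday"],
})

# True where the (Year, Month) pair differs from the previous row.
month_changed = (
    df["Year"].ne(df["Year"].shift()) | df["Month"].ne(df["Month"].shift())
)
# True where the weekday differs from the previous row or a new month starts.
day_changed = df["Day_Of_Week"].ne(df["Day_Of_Week"].shift()) | month_changed

# Ever-increasing day counter over the whole table.
running_day = day_changed.astype(int).cumsum()
# Label each month block, then subtract the counter value at the block's
# first row so that every month restarts at 1.
month_block = month_changed.cumsum()
df["day"] = running_day - running_day.groupby(month_block).transform("first") + 1

print(df)
```

Because a month change is itself treated as a day change, every month restarts at 1 even when the weekday does not change across the boundary.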
print (df)
Year Month Day_Of_Week day
22024 2002 January Tuesday 1
22101 2002 January Wednesday 2
22146 2002 January Thursday 3
22201 2002 January Friday 4
22247 2002 January Saturday 5
22280 2002 January Sunday 6
22335 2002 January Monday 7
22383 2002 January Tuesday 8
22384 2002 January Wednesday 9
22424 2002 January Thursday 10
22459 2002 January Friday 11
22511 2002 January Saturday 12
22598 2002 January Sunday 13
22599 2002 January Monday 14
22686 2002 January Tuesday 15
22687 2002 January Wednesday 16
22688 2002 January Wednesday 16
22689 2002 January Wednesday 16
22761 2002 January Wednesday 16
22762 2002 January Wednesday 16
22763 2002 January Wednesday 16
22764 2002 January Wednesday 16
22765 2002 January Thursday 17
22766 2002 January Thursday 17
22767 2002 January Thursday 17
22768 2002 January Thursday 17
22814 2002 January Friday 18
22815 2002 January Friday 18
22816 2002 January Friday 18
22817 2002 January Friday 18
22818 2002 January Friday 18
22817 2002 February Wednesday 1
22818 2002 February Wednesday 1
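The question also asks how to write this as a for loop. The vectorized answers are usually the more idiomatic choice in pandas, but for reference, a plain-loop sketch of the same rule might look like the following. The function name crash_day_loop and the sample rows are hypothetical; the column name "Crash_Day" is the one the question asks for.

```python
import pandas as pd

def crash_day_loop(frame):
    """Return the day-of-month counter computed with an explicit loop."""
    days = []
    prev_key = None   # (Year, Month) of the previous row
    prev_dow = None   # Day_Of_Week of the previous row
    counter = 0
    for year, month, dow in zip(frame["Year"], frame["Month"],
                                frame["Day_Of_Week"]):
        key = (year, month)
        if key != prev_key:
            counter = 1      # a new month starts: restart at day 1
        elif dow != prev_dow:
            counter += 1     # the weekday changed: move to the next day
        days.append(counter)
        prev_key, prev_dow = key, dow
    return pd.Series(days, index=frame.index, name="Crash_Day")

# Made-up sample rows (an assumption), in the same shape as the question's data.
sample = pd.DataFrame({
    "Year": [2002] * 6,
    "Month": ["January"] * 4 + ["February"] * 2,
    "Day_Of_Week": ["Tuesday", "Wednesday", "Wednesday", "Thursday",
                    "Wednesday", "Wednesday"],
})
sample["Crash_Day"] = crash_day_loop(sample)
print(sample)
```

Like the answers above, this assumes the rows are already sorted chronologically and that no day is missing from the data.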