Python: How to select the columns that contain a particular month of a year?
I have a dataframe that looks like this:
+------------+-----------+-----+-----+----------+----------+-----+------------+----------+----------+-----+------------+
| State_Name | City_Name | ID1 | ID2 | 1/1/2020 | 1/2/2020 | ... | 12/31/2020 | 1/1/2021 | 1/2/2021 | ... | 12/31/2021 |
+------------+-----------+-----+-----+----------+----------+-----+------------+----------+----------+-----+------------+
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
+------------+-----------+-----+-----+----------+----------+-----+------------+----------+----------+-----+------------+
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
+------------+-----------+-----+-----+----------+----------+-----+------------+----------+----------+-----+------------+
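I have many columns, one per day from 1/1/2020 to 12/31/2021. How can I select the columns that contain a particular month of a year? For example, to select the columns with data for July 2021, I want to subset the columns named '7/1/2021', '7/2/2021', ..., '7/31/2021'.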
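Thank you very much for your help.

I suggest moving all non-datetime columns into a MultiIndex, then converting all the remaining columns to datetimes: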
print(df)
  State_Name City_Name ID1 ID2  7/1/2021  1/7/2020  7/05/2021  1/1/2021  \
0          a         b   s   d         7         8          5         6

   1/2/2021  12/31/2021
0         3           8

df = df.set_index(['State_Name', 'City_Name', 'ID1', 'ID2'])
df.columns = pd.to_datetime(df.columns)
With datetime columns, July 2021 can be selected directly, or by comparing the columns by month.
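The selection code for this step is not shown here, so the following is only a sketch of two ways it might look, assuming the columns have already been converted to a DatetimeIndex as above. The sample frame mirrors the printed one; the explicit `format` argument and the names `july_by_mask` and `july_by_period` are assumptions made for illustration.

```python
import pandas as pd

# Sample frame mirroring the printed example above
df = pd.DataFrame({
    "State_Name": ["a"], "City_Name": ["b"], "ID1": ["s"], "ID2": ["d"],
    "7/1/2021": [7], "1/7/2020": [8], "7/05/2021": [5], "1/1/2021": [6],
    "1/2/2021": [3], "12/31/2021": [8],
})
df = df.set_index(["State_Name", "City_Name", "ID1", "ID2"])
# Assumption: the column labels are month/day/year strings
df.columns = pd.to_datetime(df.columns, format="%m/%d/%Y")

# Option 1: boolean mask built from the year and month of each column
mask = (df.columns.year == 2021) & (df.columns.month == 7)
july_by_mask = df.loc[:, mask]

# Option 2: convert the columns to monthly periods and compare to July 2021
is_july = df.columns.to_period("M") == pd.Period("2021-07", freq="M")
july_by_period = df.loc[:, is_july]
```

Both variants keep the identifier columns, because they live in the index rather than among the columns being filtered.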
You can use the filter method, as long as the column names are strings:
df.filter(regex=r'^7/\d{1,2}/2021$', axis=1)
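Because `filter` returns only the matching columns, the name and ID columns are dropped from the result. A minimal sketch of one way to keep them, assuming string column labels and the sample frame from the earlier answer (`id_cols`, `july_cols` and `subset` are illustrative names):

```python
import pandas as pd

# Sample frame with string column labels
df = pd.DataFrame({
    "State_Name": ["a"], "City_Name": ["b"], "ID1": ["s"], "ID2": ["d"],
    "7/1/2021": [7], "1/7/2020": [8], "7/05/2021": [5], "1/1/2021": [6],
    "1/2/2021": [3], "12/31/2021": [8],
})

id_cols = ["State_Name", "City_Name", "ID1", "ID2"]

# Column labels of the form 7/<day>/2021; the raw string avoids
# invalid-escape warnings for \d
july_cols = df.filter(regex=r"^7/\d{1,2}/2021$", axis=1).columns.tolist()

# Re-attach the identifier columns in front of the July columns
subset = df[id_cols + july_cols]
```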
A good solution is to use melt to turn the many date columns into values in a single date column.
For example:
import pandas as pd

# make dataframe from sample data
data = {
    "State_Name": ['state1', 'state2'],
    "City_Name": ['city1', 'city2'],
    "ID1": ['ID1_A', 'ID1_B'],
    "ID2": ['ID2_A', 'ID2_B'],
    "1/1/2020": ['dog', 'cat'],
    "1/2/2020": ['house', 'mouse']
}
df = pd.DataFrame(data)

# melt the date columns into rows, one row per original row and date
melted_df = df.melt(
    id_vars=["State_Name", "City_Name", "ID1", "ID2"],
    var_name="date")
df looks like this:
+---+------------+-----------+-------+-------+----------+----------+
|   | State_Name | City_Name | ID1   | ID2   | 1/1/2020 | 1/2/2020 |
+---+------------+-----------+-------+-------+----------+----------+
| 0 | state1     | city1     | ID1_A | ID2_A | dog      | house    |
| 1 | state2     | city2     | ID1_B | ID2_B | cat      | mouse    |
+---+------------+-----------+-------+-------+----------+----------+
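Once the dates are values in a single column, selecting a month becomes an ordinary row filter. The following is a sketch under assumptions: the `7/1/2021` column and its values are hypothetical additions to the sample data so that the filter has something to match, and `value_name="value"` and `july_2021` are illustrative names.

```python
import pandas as pd

data = {
    "State_Name": ["state1", "state2"],
    "City_Name": ["city1", "city2"],
    "ID1": ["ID1_A", "ID1_B"],
    "ID2": ["ID2_A", "ID2_B"],
    "1/1/2020": ["dog", "cat"],
    "1/2/2020": ["house", "mouse"],
    "7/1/2021": ["tree", "bee"],  # hypothetical column, added for illustration
}
df = pd.DataFrame(data)

# One row per original row and date column
melted_df = df.melt(
    id_vars=["State_Name", "City_Name", "ID1", "ID2"],
    var_name="date",
    value_name="value",
)

# Assumption: the date labels are month/day/year strings
melted_df["date"] = pd.to_datetime(melted_df["date"], format="%m/%d/%Y")

# Keep only the rows whose date falls in July 2021
in_july = (melted_df["date"].dt.year == 2021) & (melted_df["date"].dt.month == 7)
july_2021 = melted_df[in_july]
```

If the wide layout is needed again afterwards, the filtered rows can be pivoted back, at the cost of an extra reshaping step.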
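Is 1/1/2020 a string? If so, df.filter(regex='^7') works, but only for strings. By the way, the downvote is not mine.
How about df.columns = df.columns.astype(str) and then using filter?
Yes, but you will drop the name and ID columns...
I am only changing them to str type, am I not?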