Python: how to perform an ordered selection by value across multiple columns


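I have a DataFrame with month and year columns. Both contain strings, e.g. "September" and "2013". How can I select all rows between September 2013 and May 2008 in one line?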

df1 = stats_month_census_2[(stats_month_census_2['year'] <= '2013')
                 & (stats_month_census_2['year'] >= '2008')]

df2 = df1[...]


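After the code above I plan to do the same thing again, but I'm struggling to come up with clever code that simply removes the rows later than September 2013 (October to December) and earlier than May 2008. I could easily hard-code this, but there must be a more Pythonic way to do it...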

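You can create a DatetimeIndex and then select rows by partial string indexing, or create a datetime column and use Series.between.

Unfortunately, this does not work with a datetime column when you want to select whole years in between; in that case you need @pygo's solution with the year column, because between on the datetime column gives the wrong output: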

#wrong output
df = stats_month_census_2[stats_month_census_2['date'].between('2008', '2013')]
print (df)

   year  month  data       date
0  2008  April     1 2008-04-01
1  2008    May     3 2008-05-01
2  2008   June     4 2008-06-01
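The code for the two approaches named above (DatetimeIndex selection and between on a datetime column) is missing from this page. The sketch below reconstructs them under stated assumptions: a small hypothetical frame with string year and month columns as in the question, and English month names.

```python
import pandas as pd

# Hypothetical sample data in the question's format: both columns hold strings.
df = pd.DataFrame({
    "year": ["2008", "2008", "2008", "2013", "2013", "2014", "2014"],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Combine e.g. "September" + "2013" into a timestamp on the first of that month.
df["date"] = pd.to_datetime(df["month"] + " " + df["year"], format="%B %Y")

# Approach 1: DatetimeIndex + partial string slicing.
# Sorting first keeps the slice valid; the end label "2013-09"
# covers the whole of September 2013.
by_index = df.set_index("date").sort_index().loc["2008-05":"2013-09"]

# Approach 2: keep a regular column and use Series.between with
# month-level bounds (both ends inclusive by default).
by_between = df[df["date"].between("2008-05-01", "2013-09-01")]

print(by_index)
print(by_between)
```

Month-level bounds are what matter here: in the #wrong output example, '2013' is parsed as 2013-01-01, so every row after January 2013 falls outside the range.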

You can easily convert the columns into a datetime column with the following command:

>>> df
      month  year
0   January  2000
1     April  2001
2      July  2002
3  February  2010
4  February  2018
5     March  2014
6      June  2012
7      June  2011
8       May  2009
9  November  2016
>>> df['date'] = pd.to_datetime(df['month'].astype(str) + '-' + df['year'].astype(str), format="%B-%Y")
>>> df
      month  year       date
0   January  2000 2000-01-01
1     April  2001 2001-04-01
2      July  2002 2002-07-01
3  February  2010 2010-02-01
4  February  2018 2018-02-01
5     March  2014 2014-03-01
6      June  2012 2012-06-01
7      June  2011 2011-06-01
8       May  2009 2009-05-01
9  November  2016 2016-11-01
>>> df[(df.date <= "2013-09") & (df.date >= "2008-05")]
      month  year       date
3  February  2010 2010-02-01
6      June  2012 2012-06-01
7      June  2011 2011-06-01
8       May  2009 2009-05-01
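The question asks for a way to avoid hard-coding the bounds. One option is to wrap the same mask in a small function. rows_between is a hypothetical helper, not part of pandas; it assumes English month names and 'YYYY-MM' bounds, and it builds the dates on the fly instead of adding a date column to the frame.

```python
import pandas as pd


def rows_between(frame: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose month/year fall in [start, end].

    Assumes 'month' holds full English month names and 'year' holds
    four-digit years (strings or ints); start/end are 'YYYY-MM' strings.
    """
    # Build first-of-month timestamps without modifying `frame`.
    dates = pd.to_datetime(
        frame["month"].astype(str) + "-" + frame["year"].astype(str),
        format="%B-%Y",
    )
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return frame[mask]


# Same sample values as the session above.
df = pd.DataFrame({
    "month": ["January", "April", "July", "February", "February",
              "March", "June", "June", "May", "November"],
    "year": [2000, 2001, 2002, 2010, 2018, 2014, 2012, 2011, 2009, 2016],
})
print(rows_between(df, "2008-05", "2013-09"))
```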

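Alternatively, if by "select all rows between September 2013 and May 2008" you mean the rows from 2008 to 2013, you can try the following.

Dataset borrowed from @jezrael.

DataFrame used for the demonstration: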

>>> stats_month_census_2
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7


Using pandas.Series.between():

>>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013, inclusive="both")]
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5

If the year column is already in datetime format, you can simply try the following:

>>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09', inclusive="both")]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Using DataFrame.query():


>>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Alternatively, you can use isin as follows:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
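pd.date_range defaults to daily frequency, so the isin call above builds one timestamp per day (about 1,950 of them) only to match first-of-month dates. Assuming, as this answer does, that every year value is the first day of a month, month-start frequency (freq="MS") is enough. The frame below is a hypothetical copy of the converted demo data:

```python
import pandas as pd

df = pd.DataFrame({
    "year": pd.to_datetime(["2008-04-01", "2008-05-01", "2008-06-01", "2013-09-01",
                            "2013-10-01", "2014-11-01", "2014-12-01"]),
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# freq="MS" yields only month starts: 2008-05-01, 2008-06-01, ..., 2013-09-01.
months = pd.date_range("2008-05", "2013-09", freq="MS")
selected = df[df["year"].isin(months)]

print(len(months))
print(selected)
```

This only matches dates normalized to the first of the month; a date such as 2010-02-15 would not be found by either version of isin.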

Using slicing based on the index labels of the start and end dates:

>>> Start = stats_month_census_2[stats_month_census_2['year'] == '2008-05'].index[0]
>>> End = stats_month_census_2[stats_month_census_2['year'] == '2013-09'].index[0]
>>> stats_month_census_2.loc[Start:End]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
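Label-based slicing like this only works when rows for exactly '2008-05' and '2013-09' exist (otherwise .index[0] raises an IndexError) and when the frame is sorted by date. A sketch that relaxes the first requirement uses Series.searchsorted on the sorted column; the sample rows are hypothetical and deliberately leave out May 2008:

```python
import pandas as pd

df = pd.DataFrame({
    "year": pd.to_datetime(["2013-10-01", "2008-04-01", "2013-09-01", "2008-06-01"]),
    "data": [5, 1, 6, 4],
})

# Positional slicing is only meaningful on date-sorted rows.
df = df.sort_values("year").reset_index(drop=True)

# First position with year >= 2008-05-01, and first position with year > 2013-09-01.
start = df["year"].searchsorted(pd.Timestamp("2008-05-01"), side="left")
end = df["year"].searchsorted(pd.Timestamp("2013-09-01"), side="right")

print(df.iloc[start:end])
```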
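Note: to satisfy the curiosity @jezrael raised in the comments, here is how I converted the year column to datetime format.
We have the sample DataFrame below with two separate columns, year and month: the year column holds only years, and the month column holds month names as text strings. First we use pandas' pd.to_datetime to convert the month names to integers, then we join the year and month together, setting the day to 1.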

>>> df
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7
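That is the original DataFrame before the datetime conversion. I used the approach below, which I learned on SO:
1 - First, convert the month names to integers and assign them to a new column named Month, so we can use it for the conversion later: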

df['Month'] = pd.to_datetime(df.month, format='%B').dt.month
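2 - Second, and finally, convert the year column to a proper datetime by assigning the result directly back to the year column itself, using what we could call the built-in format:

df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))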
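Steps 1 and 2 can be run end to end as a self-contained sketch. The sample values are the ones from the DataFrame above, and the last line applies the month-level between filter from earlier in this answer to the converted column:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    "month": ["April", "May", "June", "September", "October", "November", "December"],
    "data": [1, 3, 4, 6, 5, 6, 7],
})

# Step 1: parse each month name with %B and keep only its month number.
df["Month"] = pd.to_datetime(df["month"], format="%B").dt.month

# Step 2: pd.to_datetime can assemble dates from a DataFrame whose columns
# are named after date parts; Day=1 pins every row to the first of the month.
df["year"] = pd.to_datetime(df[["year", "Month"]].assign(Day=1))

# The converted column now supports month-level range selection.
selected = df[df["year"].between("2008-05", "2013-09")]
print(selected)
```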
Now the DataFrame has the year column in datetime form:
>>> print(df)
        year      month  data  Month
0 2008-04-01      April     1      4
1 2008-05-01        May     3      5
2 2008-06-01       June     4      6
3 2013-09-01  September     6      9
4 2013-10-01    October     5     10
5 2014-11-01   November     6     11
6 2014-12-01   December     7     12

Comments:
What do you mean by "in one line"? Please include a small example with the input and the expected output. Could you post a few rows of the DataFrame so we can see what it looks like?
Does df[df['year'].between(2008, 2013, inclusive=True)] make sense?
How does the year column get filled with datetimes?
@jezrael, I'm not at my desk; I was thinking of it with the year in datetime format. For your curiosity, I'll add the way I know of converting the year column to datetime.
@jezrael, added... please check.

Another solution:
Suppose df looks like this:

      series name      Month  Year
0  fertility rate        May  2008
1   CO2 emissions       June  2009
2  fertility rate  September  2013
3  fertility rate    October  2013
4   CO2 emissions   December  2014

Create a calendar dictionary mapping and save it in a new column:

import calendar

# Map month abbreviations to numbers: 'Jan' -> 1, ..., 'Dec' -> 12
d = dict((v, k) for k, v in enumerate(calendar.month_abbr))
stats_month_census_2['month_int'] = stats_month_census_2.Month.apply(lambda x: x[:3]).map(d)

>>> stats_month_census_2
      series name      Month  Year  month_int
0  fertility rate        May  2008          5
1   CO2 emissions       June  2009          6
2  fertility rate  September  2013          9
3  fertility rate    October  2013         10
4   CO2 emissions   December  2014         12

Filter using Series.between():

stats_month_census_2[stats_month_census_2.month_int.between(5, 9, inclusive="both")
                     & stats_month_census_2.Year.between(2008, 2013, inclusive="both")]

Output:

      series name      Month  Year  month_int
0  fertility rate        May  2008          5
1   CO2 emissions       June  2009          6
2  fertility rate  September  2013          9
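The calendar-based filter above checks the month range (5 to 9) and the year range (2008 to 2013) independently, so it keeps only May through September of each year. A hypothetical October 2010 row, for instance, would be dropped even though it lies inside the requested period. One alternative, sketched here as an assumption rather than as the answer's own method, folds year and month into a single ordinal (Year * 12 + month_int) and filters that with one between call:

```python
import calendar

import pandas as pd

# Sample rows from the answer plus a hypothetical October 2010 row.
df = pd.DataFrame({
    "series name": ["fertility rate", "CO2 emissions", "fertility rate",
                    "fertility rate", "fertility rate", "CO2 emissions"],
    "Month": ["May", "June", "October", "September", "October", "December"],
    "Year": [2008, 2009, 2010, 2013, 2013, 2014],
})

# Same calendar-based mapping as above, skipping the empty entry at index 0.
abbr_to_num = {abbr: num for num, abbr in enumerate(calendar.month_abbr) if abbr}
df["month_int"] = df["Month"].str[:3].map(abbr_to_num)

# Separate month/year checks (the answer's filter) vs. one combined ordinal.
naive = df[df["month_int"].between(5, 9) & df["Year"].between(2008, 2013)]
ordinal = df["Year"] * 12 + df["month_int"]
result = df[ordinal.between(2008 * 12 + 5, 2013 * 12 + 9)]

print(naive)
print(result)
```

Because the ordinal grows by exactly one per month, a single inclusive range on it is equivalent to "May 2008 through September 2013".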