Python: apply row-wise date logic while extracting only certain columns of a DataFrame

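I am extracting a DataFrame with pandas and only want to keep the rows whose date is after the date stored in a variable.

I can do this in several steps, but I would like to know whether all of the logic can be applied in a single call, as a matter of best practice.

Here is my code: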

        import pandas as pd


        self.min_date = "2020-05-01"

        #Extract DF from URL
        self.df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

        #Here is where the error lies, I want to extract the columns ["Subject","Reference","Date of case"] but where the date is after min_date.
        self.df = self.df.loc[["Date of case" < self.min_date], ["Subject","Reference","Date of case"]]

        return(self.df)

I keep getting the error: "IndexError: Boolean index has wrong length: 1 instead of 100"

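I have not been able to find a solution online, because every answer is too specific to the asker's own scenario.

For example, this solution only works when a single column is selected:

Any help is much appreciated.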

Replace this:

["Date of case" < self.min_date]

with this:

self.df["Date of case"] < self.min_date

That is:

self.df = self.df.loc[self.df["Date of case"] < self.min_date,
                      ["Subject","Reference","Date of case"]]
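
To see why the original expression raises the IndexError, it may help to look at what each version of the mask evaluates to. The sketch below uses a small made-up DataFrame (the rows and the names `scalar_mask` and `row_mask` are illustrative assumptions, not data from the RASFF page): comparing the column name to the date compares two plain strings and yields a single bool, while comparing the column itself yields one bool per row.

```python
import pandas as pd

# Hypothetical stand-in for the table returned by pd.read_html
df = pd.DataFrame({
    "Subject": ["a", "b", "c"],
    "Reference": ["2020.0001", "2020.0002", "2020.0003"],
    "Date of case": ["2020-04-30", "2020-05-02", "2020-05-22"],
})
min_date = "2020-05-01"

# The column *name* is compared with the date string: one Python bool,
# wrapped in a list of length 1, which cannot index a 3-row frame.
scalar_mask = ["Date of case" < min_date]

# The column *values* are compared: a boolean Series with one entry per row.
row_mask = df["Date of case"] < min_date

# Row filter and column selection in a single .loc call
result = df.loc[row_mask, ["Subject", "Reference", "Date of case"]]
```

With the 100-row table from the question, the one-element list is what produces "wrong length: 1 instead of 100".
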
You have a slight syntax issue. Keep in mind that it is best to convert string dates to pandas datetime objects with pd.to_datetime.

import pandas as pd

min_date = pd.to_datetime("2020-05-01")

# Extract the DataFrame from the URL
df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

# Convert the date column to datetime, then keep the rows after min_date
# and only the columns ["Subject","Reference","Date of case"].
df['Date of case'] = pd.to_datetime(df['Date of case'])
df = df.loc[df["Date of case"] > min_date, ["Subject","Reference","Date of case"]]

Output:

                                             Subject  Reference Date of case
0  Salmonella enterica ser. Enteritidis (presence...  2020.2145   2020-05-22
1  migration of primary aromatic amines (0.4737 m...  2020.2131   2020-05-22
2  celery undeclared on green juice drink from Ge...  2020.2118   2020-05-22
3  aflatoxins (B1 = 29.4 µg/kg - ppb) in shelled ...  2020.2146   2020-05-22
4  too high content of E 200 - sorbic acid (1772 ...  2020.2125   2020-05-22
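
If the goal is to do everything in one expression, as the question asks, the conversion and the filter can probably be chained. This is only a sketch on made-up rows: the sample data, the extra "Country" column, and the day-first date format are assumptions for illustration, and the real page may deliver dates in a different format.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped table
raw = pd.DataFrame({
    "Subject": ["old case", "new case", "newer case"],
    "Reference": ["2020.0001", "2020.0002", "2020.0003"],
    "Date of case": ["30/04/2020", "02/05/2020", "22/05/2020"],  # assumed format
    "Country": ["DE", "FR", "IT"],
})
min_date = pd.Timestamp("2020-05-01")
wanted = ["Subject", "Reference", "Date of case"]

recent = (
    raw
    # Parse the strings into datetimes; the dict unpacking is needed
    # because the column name contains spaces.
    .assign(**{"Date of case": lambda d: pd.to_datetime(d["Date of case"], format="%d/%m/%Y")})
    # The callable receives the converted frame, so the mask is built
    # from real datetimes rather than strings.
    .loc[lambda d: d["Date of case"] > min_date, wanted]
)
```

Passing callables to assign and .loc keeps `raw` untouched and avoids the intermediate reassignment, at the cost of being a little denser to read than the two-step version.
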

@TobiasFunke If you want the dates after self.min_date, you should probably use > rather than <.

Yes, I always get date comparisons mixed up. Cheers

Thanks a lot, this solved another error in my application that I had been stuck on for a while. Cheers