Python: apply row logic on dates while extracting only certain columns of a DataFrame


python python-3.x pandas


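I'm pulling a DataFrame into pandas and only want to keep the rows whose date is after a given variable.

I can do this in several steps, but as a matter of best practice I'd like to know whether all the logic can be applied in a single call.

Here is my code: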

        import pandas as pd


        self.min_date = "2020-05-01"

        #Extract DF from URL
        self.df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

        #Here is where the error lies, I want to extract the columns ["Subject","Reference","Date of case"] but where the date is after min_date.
        self.df = self.df.loc[["Date of case" < self.min_date], ["Subject","Reference","Date of case"]]

        return(self.df)



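I keep getting the error: "IndexError: Boolean index has wrong length: 1 instead of 100"
I couldn't find a solution online because every answer is too specific to the asker's scenario,
e.g. this solution only works when selecting a single column:
Any help is much appreciated.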
Replace this:
["Date of case" < self.min_date]

With this:
self.df["Date of case"] < self.min_date

i.e.:
self.df = self.df.loc[self.df["Date of case"] < self.min_date, 
                      ["Subject","Reference","Date of case"]]
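To see why the original line fails, here is a small offline sketch. The DataFrame is hypothetical sample data (not the real RASFF table), and it assumes "Date of case" holds zero-padded ISO strings such as "2020-05-01", so plain string comparison happens to sort chronologically. The original expression compares two string literals and produces a one-element list, which `.loc` rejects because a boolean row mask needs one value per row.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; values are made up.
df = pd.DataFrame({
    "Subject": ["a", "b", "c"],
    "Reference": ["2020.0001", "2020.0002", "2020.0003"],
    "Date of case": ["2020-04-30", "2020-05-01", "2020-05-22"],
})
min_date = "2020-05-01"

# Original expression: compares the literal "Date of case" with min_date,
# so the "mask" is a single-element list, not one boolean per row.
bad_mask = ["Date of case" < min_date]
print(bad_mask)  # [False]

# Using a one-element mask on a 3-row frame raises IndexError.
try:
    df.loc[bad_mask, ["Subject"]]
except IndexError as exc:
    print("IndexError:", exc)

# Comparing the column itself gives a boolean Series aligned with the rows.
good_mask = df["Date of case"] < min_date
print(good_mask.tolist())  # [True, False, False]

subset = df.loc[good_mask, ["Subject", "Reference", "Date of case"]]
print(subset)
```

String comparison is only safe for zero-padded YYYY-MM-DD text; for any other date format, convert the column with `pd.to_datetime` first.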

You have a slight syntax problem.
Keep in mind that it's better to convert string dates into pandas datetime objects with pd.to_datetime:
import pandas as pd

min_date = pd.to_datetime("2020-05-01")

#Extract DF from URL
df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

#Convert the column to datetime, then keep only rows after min_date and only the three columns you need
df['Date of case'] = pd.to_datetime(df['Date of case'])
df = df.loc[df["Date of case"] > min_date, ["Subject","Reference","Date of case"]]

Output:
                                             Subject  Reference Date of case
0  Salmonella enterica ser. Enteritidis (presence...  2020.2145   2020-05-22
1  migration of primary aromatic amines (0.4737 m...  2020.2131   2020-05-22
2  celery undeclared on green juice drink from Ge...  2020.2118   2020-05-22
3  aflatoxins (B1 = 29.4 µg/kg - ppb) in shelled ...  2020.2146   2020-05-22
4  too high content of E 200 - sorbic acid (1772 ...  2020.2125   2020-05-22
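Since the question asks whether everything can happen in one call, the conversion and the filter can also be chained into a single expression. This is a sketch on made-up sample data standing in for `pd.read_html(...)[0]` (the extra "Country" column is hypothetical, there only to show that unwanted columns get dropped). It uses `DataFrame.assign` to convert the column and a callable inside `.loc`, so the row mask is computed on the already-converted frame.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped table; values are made up.
raw = pd.DataFrame({
    "Subject": ["x", "y", "z"],
    "Reference": ["2020.1000", "2020.2000", "2020.3000"],
    "Date of case": ["2020-04-15", "2020-05-01", "2020-05-22"],
    "Country": ["A", "B", "C"],
})
min_date = pd.to_datetime("2020-05-01")

result = (
    # Convert the date column (the key has spaces, so pass it via **{...}).
    raw.assign(**{"Date of case": lambda d: pd.to_datetime(d["Date of case"])})
       # The lambda receives the converted frame; keep rows strictly after
       # min_date and only the three requested columns.
       .loc[lambda d: d["Date of case"] > min_date,
            ["Subject", "Reference", "Date of case"]]
)
print(result)
```

This does the same work as the two-step version above; chaining mainly avoids reassigning `df` between steps and leaves `raw` untouched.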


@TobiasFunke If you want the dates after self.min_date, you should probably use > instead of <.
Yes, I always get dates mixed up. Cheers
Thanks a lot, this also fixed another bug in my application that I had been stuck on for a while. Cheers
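The comparison the comments point at, plus the boundary case, in a short sketch with made-up dates: `>` keeps only dates strictly after min_date, while `>=` also keeps min_date itself. Whether "after" should include the boundary is an assumption to settle for your own data.

```python
import pandas as pd

# Made-up dates around the cutoff.
dates = pd.to_datetime(pd.Series(["2020-04-30", "2020-05-01", "2020-05-02"]))
min_date = pd.to_datetime("2020-05-01")

after = dates > min_date         # strictly after: excludes 2020-05-01
on_or_after = dates >= min_date  # also includes the cutoff date
before = dates < min_date        # what the original "<" selects

print(after.tolist())        # [False, False, True]
print(on_or_after.tolist())  # [False, True, True]
print(before.tolist())       # [True, False, False]
```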