Python: How to remove the timestamps for specific dates from a large DataFrame (python, pandas, dataframe)

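I have a large DataFrame covering 600 days of data, with 100 timestamps per day. I also have a separate list of 30 days whose data I want to remove. How can I drop the data for those 30 days from the DataFrame? I tried a for loop, but it did not work. I know there must be a simple way, but I don't know how to implement it.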

# df is the main DataFrame, with many columns and rows. Its index is a timestamp.

# Slice out the date part of each timestamp into a new column.
# I want to compare this column, rather than the index, against the bad list.
df['dates'] = df.index.strftime('%Y-%m-%d')

# bad_list is a list of bad dates.
for i in range(0,len(df)):
    for j in range(0,len(bad_list)):
        if str(df['dates'][i])== bad_list[j]:
            df.drop(df[i].index,inplace=True)

You can do the following:

df['dates'] = df.index.strftime('%Y-%m-%d')
# strftime returns strings, so badlist must hold strings in the same '%Y-%m-%d' format.

newdf = df[~df['dates'].isin(badlist)]
# ~ negates the boolean mask, so only rows whose date is NOT in badlist are kept.

# For example, if Jan 1, 2000 is a bad date, it should be in the list as '2000-01-01'.
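
A minimal, self-contained sketch of the approach above, assuming the index is a DatetimeIndex and the bad dates are given as 'YYYY-MM-DD' strings. The sample data and the names bad_list, clean and clean2 are made up for illustration. The second variant skips the helper column by normalizing the index to midnight and matching against the parsed bad dates; it should give the same rows.

```python
import pandas as pd

# Hypothetical sample data: 4 days with 3 timestamps per day.
idx = pd.date_range("2000-01-01", periods=12, freq="8h")
df = pd.DataFrame({"value": range(12)}, index=idx)

# Hypothetical list of bad dates, as strings in '%Y-%m-%d' format.
bad_list = ["2000-01-01", "2000-01-03"]

# Variant 1: helper column of date strings, as in the answer above.
df["dates"] = df.index.strftime("%Y-%m-%d")
clean = df[~df["dates"].isin(bad_list)]

# Variant 2: no helper column. normalize() sets every time to midnight,
# so each timestamp can be matched against the parsed bad dates.
mask = df.index.normalize().isin(pd.to_datetime(bad_list))
clean2 = df[~mask]
```

Both variants are vectorized, so they avoid the nested Python loop from the question, which matters with 600 days of 100 timestamps each.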

You can do a simple comparison:

>>> from time import time
>>> import numpy as np
>>> import pandas as pd
>>> dates = pd.Series(pd.to_datetime(np.random.randint(int(time()) - 60 * 60 * 24 * 5, int(time()), 12), unit='s'))
>>> dates
0    2019-03-19 05:25:32
1    2019-03-20 00:58:29
2    2019-03-19 01:03:36
3    2019-03-22 11:45:24
4    2019-03-19 08:14:29
5    2019-03-21 10:17:13
6    2019-03-18 09:09:15
7    2019-03-20 00:14:16
8    2019-03-21 19:47:02
9    2019-03-23 06:19:35
10   2019-03-23 05:42:34
11   2019-03-21 11:37:46

>>> start_date = pd.to_datetime('2019-03-20')
>>> end_date = pd.to_datetime('2019-03-22')
>>> dates[(dates > start_date) & (dates < end_date)]
1    2019-03-20 00:58:29
5    2019-03-21 10:17:13
7    2019-03-20 00:14:16
8    2019-03-21 19:47:02
11   2019-03-21 11:37:46
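
The comparison above selects the rows inside the range, while the question asks to remove them. A small sketch of the inverse, assuming the same start_date and end_date; the fixed sample timestamps and the names in_range and kept are illustrative only:

```python
import pandas as pd

# Fixed sample timestamps (hypothetical), so the result is reproducible.
dates = pd.Series(pd.to_datetime([
    "2019-03-19 05:25:32",
    "2019-03-20 00:58:29",
    "2019-03-21 10:17:13",
    "2019-03-22 11:45:24",
]))

start_date = pd.to_datetime("2019-03-20")
end_date = pd.to_datetime("2019-03-22")

# Boolean mask of rows strictly inside the range.
in_range = (dates > start_date) & (dates < end_date)

# ~ inverts the mask, keeping only the rows outside the range.
kept = dates[~in_range]
```

Here kept holds only the first and last timestamps, since the two middle ones fall between start_date and end_date.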
If the source Series is not in datetime format, you need to convert it with pd.to_datetime.
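
As a rough illustration of that conversion, assuming the timestamps arrive as plain strings (the Series raw and its values are hypothetical):

```python
import pandas as pd

# Hypothetical Series holding timestamps as plain strings.
raw = pd.Series(["2019-03-19 05:25:32", "2019-03-21 10:17:13"])

# Convert to a datetime64 Series so comparisons are chronological.
dates = pd.to_datetime(raw)

# The converted Series can now be compared against a Timestamp.
after = dates[dates > pd.to_datetime("2019-03-20")]
```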