Python: How do I loop over all rows of a DataFrame based on given columns, and then build a dictionary of the results?

Tags: python, date, pandas, dataframe

I have a pandas.DataFrame like the following:

0      Main_1      Main_2        Date1       Date2      ....     Date99  
1      1994-11-05  1997-11-07    1993-11-07  1994-11-07          2002-11-07
2      1994-1-07   1997-11-07    1993-11-07  1999-11-07          2002-10-07
3      1994-8-09   1997-11-07    1999-11-07  2000-11-07          2003-11-07
.      .           .             .           .                   .
.      .           .             .           .                   .
30,000 .           .             .           .                   .
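
Main_1: a column of dates

Main_2: another date column

Date1 to Date99: 99 date columns. These 99 columns hold different dates, and they are sequential: in every row, Date1 is earlier than Date2, Date2 is earlier than Date3, and so on. Date99 is the latest date in the row.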

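I am trying to write a loop that, for each row:

  • checks the columns Date1 through Date99 to see whether each date falls between (or is equal to) the Main_1 and Main_2 events
  • if a value does fall between (or is equal to) Main_1 and Main_2, records the name of that date column, such as Date78, Date79, Date80

For example, supposing only Date90, Date91, and Date92 meet the condition for the first row, I would like to see a dictionary such as: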

{
    (row0: Date90, Date91, and Date92)
    (row1: DateX, DateY)
    .  
    .  
    (row30000: DateK, DateM, DateN, DateZ)
}
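
Note: the columns were created with pd.to_datetime, using format='%Y/%m/%d'.

So, for each row of the DataFrame, I would like to know which of the date columns (they all represent events) occurred between these main events. I tried using itertools, but so far I have failed, so I would appreciate it if someone could guide me on this.

My failed attempt: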

Date_sequence={}
for item, frame in sample_data.ix[:,2:102].iteritems():
    if frame>=sample_data.ix[:,1] and frame<=sample_data.ix[:,2]:
        Date_sequence['item'] = frame

You can use pandas.DataFrame.apply() to process each row in turn. For each row, you can collect the names of the columns whose value falls within that row's date range.
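
To see what a function passed to apply actually receives, here is a small sketch on a made-up two-column frame. The frame `toy` and the helper name `describe_row` are hypothetical and used only for illustration. With axis=1, each call gets one row as a Series whose index holds the column names, which is why the column names can be read from row.index.

```python
import pandas as pd

# Hypothetical toy frame, unrelated to the date data in the question.
toy = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
seen = []

def describe_row(row):
    # Record the type of the object passed in and the labels on its index.
    seen.append((type(row).__name__, list(row.index)))
    return row["a"] + row["b"]

# axis=1 means "call the function once per row".
sums = toy.apply(describe_row, axis=1)

print(seen[0])        # ('Series', ['a', 'b'])
print(sums.tolist())  # [4, 6]
```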

Code:

def find_dates_in_range(row):
    # The first two columns hold the start and end of the range.
    start = row.Main_1
    end = row.Main_2
    # Keep the name of every remaining column whose date lies in the range.
    return [column for column in row.index.values[2:]
            if start <= row[column] <= end]

Test code:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(
    """Main_1      Main_2      Date1       Date2       Date3
      1994-11-05  1997-11-07  1993-11-07  1994-11-07  2002-11-07
      1994-01-07  1997-11-07  1993-11-07  1999-11-07  2002-10-07
      1994-08-09  1997-11-07  1995-11-07  2000-11-07  2003-11-07
    """), sep=r'\s+',
    parse_dates='Main_1 Main_2 Date1 Date2 Date3'.split())

print(df.apply(find_dates_in_range, axis=1))

Results:

0    [Date2]
1         []
2    [Date1]

And to convert it into the form asked for in the question:

results = df.apply(find_dates_in_range, axis=1)
as_a_dict = {'row%s' % i: v for i, v in enumerate(results)}
print(as_a_dict)

Gives:

{'row0': ['Date2'], 'row1': [], 'row2': ['Date1']}
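
For a frame with tens of thousands of rows, the same test can also be run on whole arrays at once instead of row by row. The following is a minimal sketch, not part of the answer above. It assumes every column already holds datetime64 values and that the date columns start at position 2. The names `date_cols`, `in_range` and `as_a_dict_vectorized` are illustrative only.

```python
import pandas as pd

# Same sample data as in the test code, built directly as datetime columns.
df = pd.DataFrame({
    "Main_1": pd.to_datetime(["1994-11-05", "1994-01-07", "1994-08-09"]),
    "Main_2": pd.to_datetime(["1997-11-07", "1997-11-07", "1997-11-07"]),
    "Date1": pd.to_datetime(["1993-11-07", "1993-11-07", "1995-11-07"]),
    "Date2": pd.to_datetime(["1994-11-07", "1999-11-07", "2000-11-07"]),
    "Date3": pd.to_datetime(["2002-11-07", "2002-10-07", "2003-11-07"]),
})

# Assumption: everything after the first two columns is a date column.
date_cols = df.columns[2:]
values = df[date_cols].to_numpy()            # shape (n_rows, n_date_cols)
start = df["Main_1"].to_numpy()[:, None]     # shape (n_rows, 1), broadcasts per row
end = df["Main_2"].to_numpy()[:, None]

# Boolean matrix: True where start <= date <= end for that row.
in_range = (values >= start) & (values <= end)

# Use each row of the mask to select the matching column names.
as_a_dict_vectorized = {
    "row%s" % i: list(date_cols[mask]) for i, mask in enumerate(in_range)
}
print(as_a_dict_vectorized)
```

On the three sample rows this produces the same dictionary as the apply-based version. The comparison step is done in NumPy, so only the final dictionary construction loops in Python.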

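If you can tell me which part is unclear, I will try to clarify. My code is not getting me anywhere. I typed some code in an edit. Thanks for the reply.
OK, posted a possible solution, and removed the close vote + obsolete comments.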