Python: using pandas to extract the data I need


I have an xlsx file that looks like this:

Name     01/09/16        02/09/16          03/09/16       
Jack        In            Out                 In          
Lisa        Out           In                  Out             
Tom         Out           In                  In  
I am trying to use pandas to print this data in a table like the following:

+--------+----------+----------+----------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+--------+----------+----------+----------+
|   In   | Jack     | Tom      | Tom      |
|        |          | Lisa     | Jack     |
+--------+----------+----------+----------+
|  Out   | Lisa     | Jack     | Lisa     |
|        | Tom      |          |          |
+--------+----------+----------+----------+
I am struggling to find a way to do this with pandas. I would like to know whether there is any simple way to go down the date columns, match each value against its row, and get the cell value.

For example, take the first column, 01/09/16: how do I use pandas to go down that column, find the cell value 'In', match it with the row name 'Jack', and then add it to a nested dictionary like this:

dictionary = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}}

If I can get it into that form, I can then use something like PrettyTable to organise it into a table like the second one shown above.
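
For reference, the nested dictionary described above can be built with a plain loop over the date columns; a minimal sketch, assuming the workbook is read with pandas.read_excel (attendance.xlsx is a placeholder file name):

import pandas as pd

# 'attendance.xlsx' is a placeholder; the first column (Name) becomes the index
df = pd.read_excel('attendance.xlsx', index_col=0)

dictionary = {}
for date in df.columns:                              # one entry per date column
    col = df[date]
    dictionary[date] = {
        'In':  col.index[col == 'In'].tolist(),      # names marked 'In' on that date
        'Out': col.index[col == 'Out'].tolist(),     # names marked 'Out' on that date
    }

print(dictionary['01/09/16'])
# {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}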

Consider a dictionary comprehension running across all the series columns of the dataframe. But first, be sure to make Name the index of the dataframe:

from io import StringIO
import pandas as pd

data = '''
Name     01/09/16        02/09/16          03/09/16       
Jack        In            Out                 In          
Lisa        Out           In                  Out             
Tom         Out           In                  In
'''
df = pd.read_table(StringIO(data), sep=r"\s+", index_col=0)
print(df)

#      01/09/16 02/09/16 03/09/16
# Name                           
# Jack       In      Out       In
# Lisa      Out       In      Out
# Tom       Out       In       In

# BUILD DICTIONARY
dfdict = {col: (df[col][df[col] == 'In'].index.values,
                df[col][df[col] == 'Out'].index.values) for col in df.columns}
dfdict['Status'] = ['In', 'Out']

# CAST TO DATAFRAME 
finaldf = pd.DataFrame(dfdict)
finaldf = finaldf[['Status'] + [col for col in df.columns]]             # RE-ORDER COLS
print(finaldf)

#   Status     01/09/16     02/09/16     03/09/16
# 0     In       [Jack]  [Lisa, Tom]  [Jack, Tom]
# 1    Out  [Lisa, Tom]       [Jack]       [Lisa]
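
Since the question mentions PrettyTable for the final display, the finaldf above can also be rendered that way; a sketch, assuming the prettytable package is installed:

from prettytable import PrettyTable

pt = PrettyTable(['Status'] + list(df.columns))
for _, row in finaldf.iterrows():
    # each date cell holds an array of names; put them on separate lines in the cell
    pt.add_row([row['Status']] + ['\n'.join(row[col]) for col in df.columns])
print(pt)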
IIUC (if I understand correctly):
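
One explicit way to build such a table, shown here only as a sketch (not necessarily this answer's original longer code), assuming df has Name as a regular column:

d1 = df.set_index('Name')      # if Name is already the index, use d1 = df instead

rows = {}
for status in ('In', 'Out'):
    # for every date column, collect the names whose cell equals this status
    rows[status] = {date: d1.index[d1[date] == status].tolist() for date in d1.columns}

result = pd.DataFrame(rows).T  # rows: In/Out, columns: the dates, cells: lists of names
print(result)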

Or, with less code:

# assumes Name is a regular column (call df.reset_index() first if Name was made the index above)
df.set_index('Name').unstack().reset_index().groupby(['level_0', 0]) \
    .Name.apply(list).rename_axis([None, None]).unstack(0)
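
To get back to the nested dictionary shape from the question, the frame that chain returns can be converted with to_dict(); res below is just a hypothetical name for that result:

res = df.set_index('Name').unstack().reset_index().groupby(['level_0', 0]) \
    .Name.apply(list).rename_axis([None, None]).unstack(0)

dictionary = res.to_dict()
# {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom']},
#  '02/09/16': {'In': ['Lisa', 'Tom'], 'Out': ['Jack']},
#  '03/09/16': {'In': ['Jack', 'Tom'], 'Out': ['Lisa']}}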

You were faster ;)