Python: using pandas to extract the data I need
I have an xlsx file that looks like this:
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
I'm trying to use pandas to print this data as a table like this:
+--------+----------+----------+----------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+--------+----------+----------+----------+
| In     | Jack     | Tom      | Tom      |
|        |          | Lisa     | Jack     |
+--------+----------+----------+----------+
| Out    | Lisa     | Jack     | Lisa     |
|        | Tom      |          |          |
+--------+----------+----------+----------+
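I'm struggling to find a way to do this with pandas. Is there a simple way to iterate over the date columns, match each cell to its row, and get the cell value?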
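For example, take the first column, 01/09/16. How can I use pandas to go down that column, find the cells whose value is 'In', match each one to its row name (e.g. 'Jack'), and then add it to a nested dictionary like this: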
dictionary = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}}
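If I can get it into this form, I can then use something like PrettyTable to organize it into a table like the second one above.

Consider a dictionary comprehension that runs through every column (Series) of the DataFrame. But first, make sure Name is set as the DataFrame's index: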
from io import StringIO

import pandas as pd

data = '''
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
'''
# Raw string for the regex separator; index_col=0 makes Name the index
df = pd.read_table(StringIO(data), sep=r"\s+", index_col=0)
print(df)
#      01/09/16 02/09/16 03/09/16
# Name
# Jack       In      Out       In
# Lisa      Out       In      Out
# Tom       Out       In       In

# BUILD DICTIONARY: for each date column, the names marked 'In' and those marked 'Out'
dfdict = {col: [df[col][df[col] == 'In'].index.tolist(),
                df[col][df[col] == 'Out'].index.tolist()] for col in df.columns}
dfdict['Status'] = ['In', 'Out']

# CAST TO DATAFRAME
finaldf = pd.DataFrame(dfdict)
finaldf = finaldf[['Status'] + list(df.columns)]  # RE-ORDER COLS
print(finaldf)
#    Status     01/09/16     02/09/16     03/09/16
# 0      In       [Jack]  [Lisa, Tom]  [Jack, Tom]
# 1     Out  [Lisa, Tom]       [Jack]       [Lisa]
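If you want exactly the nested dictionary from the question (`{date: {'In': [...], 'Out': [...]}}`) rather than a DataFrame, a plain nested loop over the same indexed `df` is enough: `DataFrame.items()` yields each date column as a Series, and `Series.items()` yields (row name, cell value) pairs. This is a minimal sketch that rebuilds the sample frame so it runs standalone; the `read_excel` call mentioned in the comment, including the file name, is only a placeholder assumption about how the real xlsx would be loaded.

```python
from io import StringIO

import pandas as pd

data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
# Sample data from the question, with Name as the index. For the real file,
# something like pd.read_excel("file.xlsx", index_col=0) should give the same
# shape ("file.xlsx" is a placeholder; reading xlsx requires openpyxl).
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col=0)

dictionary = {}
for date, column in df.items():          # one Series per date column
    for name, status in column.items():  # (row name, cell value) pairs
        # Create the inner dict and list on first use, then record the name.
        dictionary.setdefault(date, {}).setdefault(status, []).append(name)

print(dictionary["01/09/16"])
# {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}
```

Using `setdefault` avoids special-casing the first time a date or status is seen; the order of names in each list follows the row order of the sheet.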
IIUC
Or, with less code:
df.set_index('Name').unstack().reset_index().groupby(['level_0', 0]) \
.Name.apply(list).rename_axis([None, None]).unstack(0)
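If the chained one-liner is hard to follow, here is a sketch of the same reshape written with `DataFrame.melt` instead of `unstack`/`reset_index`. This is a swapped-in technique, not what the answer used. Like the one-liner, it assumes Name is an ordinary column rather than the index.

```python
from io import StringIO

import pandas as pd

data = """Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
"""
# Name stays a regular column here, as the one-liner above expects.
df = pd.read_csv(StringIO(data), sep=r"\s+")

table = (
    # Long format: one row per (Name, Date) with its Status
    df.melt(id_vars="Name", var_name="Date", value_name="Status")
      # Collect the names for each (Status, Date) pair into a list
      .groupby(["Status", "Date"])["Name"]
      .agg(list)
      # Move the dates back out into columns; Status becomes the row index
      .unstack("Date")
)
print(table)
```

Each step can be printed on its own, which makes this version easier to debug than a single chained expression.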
You were faster ;)
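The question mentions PrettyTable for the final layout. As a dependency-free alternative, the hypothetical `render_grid` helper below prints a nested `{date: {status: [names]}}` dictionary (the shape asked for in the question) in roughly the bordered layout shown there, stacking one name per line inside each cell. It is a sketch under that assumption, not a replacement for PrettyTable's features.

```python
def render_grid(nested, statuses=("In", "Out")):
    """Render {date: {status: [names]}} as a bordered text grid.

    Hypothetical helper, not part of pandas or PrettyTable.
    """
    dates = list(nested)
    header = ["Status"] + dates
    # Each body row is a list of cells; each cell is a list of text lines.
    rows = [[[status]] + [nested[d].get(status, []) for d in dates]
            for status in statuses]

    # Column width = longest text in the header or in any cell line.
    widths = [len(h) for h in header]
    for row in rows:
        for i, cell in enumerate(row):
            for line in cell:
                widths[i] = max(widths[i], len(line))

    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(values):
        # Left-align each value inside its column, with one space of padding.
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    out = [border, fmt(header), border]
    for row in rows:
        # A row is as tall as its tallest cell; shorter cells get blank lines.
        height = max(len(cell) for cell in row)
        for k in range(height):
            out.append(fmt([cell[k] if k < len(cell) else "" for cell in row]))
        out.append(border)
    return "\n".join(out)


dictionary = {
    "01/09/16": {"In": ["Jack"], "Out": ["Lisa", "Tom"]},
    "02/09/16": {"In": ["Lisa", "Tom"], "Out": ["Jack"]},
    "03/09/16": {"In": ["Jack", "Tom"], "Out": ["Lisa"]},
}
print(render_grid(dictionary))
```

Computing the widths in a first pass and formatting in a second keeps every border aligned regardless of how long the names are.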