Python: turning the keys of a key/value dataframe (from an event log) into new columns
I have a dataframe like Frame1. How can I take all the items in the key column, turn them into new columns, fill in the corresponding values, and lay the result out as shown below? I've also included the real dataset below.
Frame1:
name name2 key value
matt face money 100
matt face junk True
james face money 50
james face junk False
james face wife True
adam face money found
adam face wife False
adam face strange yes
Expected output:
name name2 money junk wife strange
matt face 100 True NAN NAN
adam face found False False yes
james face 50 False True NAN
The number of distinct keys, and their values, vary. Thanks for your help.
Actual data:
machinename eventid entrytype source timegenerated timewritten username message action keys vals
0 mycompname 4688 successaudit microsoft-windows-security... 3/7/2017 10:38:16 am 3/7/2017 10:38:16 am NONE a new process has been cre... a new process has been cre... subject NaN
1 mycompname 4688 successaudit microsoft-windows-security... 3/7/2017 10:38:16 am 3/7/2017 10:38:16 am NONE a new process has been cre... a new process has been cre... security id s-1-5-18
2 mycompname 4656 failureaudit microsoft-windows-security... 3/7/2017 10:38:05 am 3/7/2017 10:38:05 am NONE a handle to an object was ... a handle to an object was ... account domain my domain
3 mycompname 4656 failureaudit microsoft-windows-security... 3/7/2017 10:38:05 am 3/7/2017 10:38:05 am NONE a handle to an object was ... a handle to an object was ... logon id 0x3e7
... ... ... ... ... ... ... ... ... ... ... ...
1381 mycompname 4688 successaudit microsoft-windows-security... 3/7/2017 6:47:40 am 3/7/2017 6:47:40 am NONE a new process has been cre... a new process has been cre... source network address NaN
1382 mycompname 4673 successaudit microsoft-windows-security... 3/7/2017 6:47:40 am 3/7/2017 6:47:40 am NONE a privileged service was c... a privileged service was c... source port -
1383 mycompname 4656 failureaudit microsoft-windows-security... 3/7/2017 6:47:40 am 3/7/2017 6:47:40 am NONE a handle to an object was ... a handle to an object was ... detailed authentication i... NaN
1384 mycompname 4656 failureaudit microsoft-windows-security... 3/7/2017 6:47:40 am 3/7/2017 6:47:40 am NONE a handle to an object was ... a handle to an object was ... logon process advapi
1385 mycompname 4656 failureaudit microsoft-windows-security... 3/7/2017 6:47:40 am 3/7/2017 6:47:40 am NONE a handle to an object was ... a handle to an object was ... authentication package NaN
Update
This pushes the keys into the column names, but not the corresponding values:
df = pd.pivot_table(df, values="vals",index=["MachineName", "EventID","EntryType", "Source", "TimeGenerated", "TimeWritten","UserName", "Message"], columns=['keys'], aggfunc=np.sum)
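A likely reason the value cells stay empty: `aggfunc=np.sum` is a numeric aggregation, while the `vals` column holds text such as `s-1-5-18` and `0x3e7`, so depending on the pandas version those values may be dropped, concatenated, or rejected. A minimal sketch, assuming each (event, key) pair occurs once, is to pass `aggfunc='first'` so each cell simply receives its value. The small frame below is hypothetical: it borrows a few key/value pairs from the data above and uses the lowercase column names shown in the printed data.

```python
import pandas as pd

# Hypothetical slice of the event log: one row per (event, key) pair,
# with the values stored as strings.
df = pd.DataFrame({
    "machinename": ["mycompname"] * 4,
    "eventid": [4688, 4656, 4656, 4656],
    "keys": ["security id", "account domain", "logon id", "logon process"],
    "vals": ["s-1-5-18", "my domain", "0x3e7", "advapi"],
})

# aggfunc="first" places each value in its cell as-is instead of trying
# to add strings together.
wide = pd.pivot_table(
    df,
    values="vals",
    index=["machinename", "eventid"],
    columns="keys",
    aggfunc="first",
).reset_index()
print(wide)
```

Be aware that `'first'` also tolerates repeated (index, key) combinations by keeping the first non-null value, which can hide conflicting entries.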
Try the set_index/unstack functions:
Frame1.set_index(['name','name2','key'])['value'].unstack('key')
Or use:
Frame1.pivot_table(columns='key', index=['name','name2'], values='value', aggfunc='first')
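Both approaches can be checked end to end on Frame1. The sketch below types Frame1 in by hand with every value stored as a string (an assumption; the real frame may mix booleans and numbers). `aggfunc='first'` is passed to `pivot_table` because its default aggregation, the mean, only works for numeric values.

```python
import pandas as pd

# Frame1 from the question, with all values kept as strings.
frame1 = pd.DataFrame(
    [
        ["matt", "face", "money", "100"],
        ["matt", "face", "junk", "True"],
        ["james", "face", "money", "50"],
        ["james", "face", "junk", "False"],
        ["james", "face", "wife", "True"],
        ["adam", "face", "money", "found"],
        ["adam", "face", "wife", "False"],
        ["adam", "face", "strange", "yes"],
    ],
    columns=["name", "name2", "key", "value"],
)

# Option 1: index by (name, name2, key), then move the "key" level into columns.
wide = frame1.set_index(["name", "name2", "key"])["value"].unstack("key").reset_index()

# Option 2: the same reshape with pivot_table.
wide2 = frame1.pivot_table(
    index=["name", "name2"], columns="key", values="value", aggfunc="first"
).reset_index()

print(wide)
```

Missing combinations, such as matt's `wife`, come out as NaN, matching the expected output. The unstack route raises an error if a (name, name2, key) combination occurs more than once, whereas `pivot_table` with `'first'` keeps the first non-null value.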
Tried this solution; I think it's a good one. It works well for me: df.set_index(['name','name2','key'])['value'].unstack('key').reset_index()
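My hunch is that this happens because there are duplicate rows in the ['name','name2','key'] columns, e.g. two identical "matt, face, money, 100" rows. Maybe try df.drop_duplicates(subset=['name','name2','key']).set_index(['name','name2','key'])['value'].unstack('key')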
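The duplicate-row hunch is easy to reproduce. In this hypothetical three-row frame, ("matt", "face", "money") appears twice, so `unstack` has two values competing for one cell and raises a `ValueError`; dropping duplicates on the three index columns first lets the reshape succeed.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["matt", "face", "money", "100"],
        ["matt", "face", "money", "100"],  # exact repeat of the row above
        ["matt", "face", "junk", "True"],
    ],
    columns=["name", "name2", "key", "value"],
)
id_cols = ["name", "name2", "key"]

try:
    df.set_index(id_cols)["value"].unstack("key")
    unstack_failed = False
except ValueError as err:
    # pandas refuses to reshape when two rows map to the same (row, column) cell
    print("unstack failed:", err)
    unstack_failed = True

# Keep one row per (name, name2, key) before reshaping.
wide = (
    df.drop_duplicates(subset=id_cols)
    .set_index(id_cols)["value"]
    .unstack("key")
)
print(wide)
```

`drop_duplicates` keeps the first occurrence by default, so if the repeated rows carry different values, the later ones are discarded without warning.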
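Yes, that helps. My guess is that for each ['name', 'name2', 'key'] combination there may be several rows with different actions. The ['name', 'name2', 'key'] combinations must be unique for unstack to work. Please try df[['name','name2','key','vals']].drop_duplicates(subset=['name','name2','key']).set_index(['name','name2','key'])['vals'].unstack('key')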
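Before discarding anything, it can help to look at the rows that actually collide. `duplicated(subset=..., keep=False)` marks every member of a duplicate group rather than only the repeats. The frame below is hypothetical: its `action` column stands in for the real data's `action` column, and the labels `"a"` and `"b"` are placeholders.

```python
import pandas as pd

# Hypothetical rows: ("matt", "face", "money") appears under two different actions.
df = pd.DataFrame({
    "name": ["matt", "matt", "james"],
    "name2": ["face", "face", "face"],
    "action": ["a", "b", "a"],
    "key": ["money", "money", "money"],
    "vals": ["100", "200", "50"],
})
id_cols = ["name", "name2", "key"]

# keep=False flags all rows of a duplicate group, so both "matt" rows show up.
collisions = df[df.duplicated(subset=id_cols, keep=False)]
print(collisions)

# The reshape from the comment: keep only the needed columns, one row per combination.
wide = (
    df[id_cols + ["vals"]]
    .drop_duplicates(subset=id_cols)
    .set_index(id_cols)["vals"]
    .unstack("key")
)
print(wide)
```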
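BTW, which columns do you want to use as the index, 'name' and 'name2'? If it should be every column other than the values: Frame1 = Frame1.set_index(Frame1.columns.difference(['vals']).tolist())['vals'].unstack('key').reset_index()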
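Using every column except the value as the index keeps rows that differ only in, say, `action` apart instead of dropping them. A sketch on a hypothetical Frame1-like frame with an extra `action` column (the value column is called `value` here, as in Frame1). Note the `.tolist()`: `set_index` treats a bare `Index` object as a single array of row labels, not as a list of column names.

```python
import pandas as pd

# Hypothetical frame: "action" distinguishes the two otherwise-identical "money" rows.
df = pd.DataFrame({
    "name": ["matt", "matt", "matt"],
    "name2": ["face", "face", "face"],
    "action": ["a", "a", "b"],
    "key": ["money", "junk", "money"],
    "value": ["100", "True", "200"],
})

# All columns except the value become index levels; "key" is then unstacked into columns.
index_cols = df.columns.difference(["value"]).tolist()
wide = df.set_index(index_cols)["value"].unstack("key").reset_index()
print(wide)
```

`columns.difference` returns the names sorted, so the remaining index columns come back in alphabetical order (`action`, `name`, `name2`).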