Python: Changing the DataFrame format to get the expected output


In the DataFrame below:

df = pd.DataFrame({'session': ["1", "1", "2", "2", "3", "3"],
                   'path': ["p1", "p2", "p1", "p2", "p2", "p3"],
                   'seconds': ["20", "21", "132", "10", "24", "45"]})
I need output with pages (paths) as columns, sessions as rows, and the number of seconds in each cell.

What I have done so far:

In [76]: wordlist = ['p1', 'p2', 'p3']
In [77]: df2 = pd.DataFrame(df.groupby('session').apply(lambda x: ','.join(x.path)))
In [78]: df2 #I have renamed the columns
Out[78]: 

                  path
        session       
        1        p1,p2
        2        p1,p2
        3        p2,p3

In [79]: df3 = pd.DataFrame(df.groupby('session').apply(lambda x: ','.join(x.seconds.astype(str))))
In [80]: df3 #I have renamed the columns
Out[80]: 
                   path
        session        
        1         20,21
        2        132,10
        3         24,45
The following gives a Boolean result, but I need my expected output. Can anyone help?

In [84]: pd.DataFrame({name : df2["path"].str.contains(name) for name in wordlist})
Out[84]: 
            p1    p2     p3
session                    
1         True  True  False
2         True  True  False
3        False  True   True
Use a pivot:

df1 = df.pivot(index='session', columns='path')
Then replace all NaN values with zero:

df1 = df1.fillna(0)
This gives you the following output:

        seconds        
path         p1  p2  p3
session                
1            20  21   0
2           132  10   0
3             0  24  45
Then you may want to drop the top level of the multi-index columns:

df1.columns = df1.columns.droplevel(0)
This yields the desired result (without commas).

Finally, you can use StringIO to convert it to a comma-separated string:

import io
s = io.StringIO()  # in-memory text buffer (Python 3)
df1.to_csv(s)
print(s.getvalue())
With the following output:

session,p1,p2,p3
1,20,21,0
2,132,10,0
3,0,24,45
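
For reference, the steps above can be put together as one runnable sketch. It makes two choices that are assumptions, not part of the original answer: the seconds are converted from strings to integers, and values='seconds' is passed to pivot so the columns come out single-level and droplevel is not needed. The names wide and csv_text are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'session': ["1", "1", "2", "2", "3", "3"],
    'path': ["p1", "p2", "p1", "p2", "p2", "p3"],
    'seconds': ["20", "21", "132", "10", "24", "45"],
})

# Assumption: seconds are meant to be numbers, so convert from strings first.
df['seconds'] = df['seconds'].astype(int)

# values='seconds' keeps the result columns single-level.
wide = df.pivot(index='session', columns='path', values='seconds')

# Missing session/path pairs come back as NaN; fill with 0 and restore ints.
wide = wide.fillna(0).astype(int)

# Called without a target, to_csv returns the CSV text as a string.
csv_text = wide.to_csv()
print(csv_text)
```

Note that pivot raises an error if the same session/path pair occurs more than once; in that case the rows would have to be aggregated first.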