Row operations in Python


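I am trying to convert a CSV into a .gexf file for a dynamic Gephi graph. The idea is to fold all parallel edges (edges with the same source and target but different posting dates) into the attribute data. In this example, all the dates in the attributes correspond to the dates on which John posted answers to Jan's questions in an online course discussion forum.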

How can I convert a CSV like this:

Jan John    2012-04-07  2012-06-06
Jan Jason   2012-05-07  2012-06-06
Jan John    2012-03-02  2012-06-07
Jan Jason   2012-03-20  2012-06-08
Jan Jack    2012-03-26  2012-06-09
Jan Janet   2012-05-01  2012-06-10
Jan Jack    2012-05-04  2012-06-11
Jan Jason   2012-05-07  2012-06-12
Jan Jack    2012-05-09  2012-06-13
Jan John    2012-05-15  2012-06-14
Jan Janet   2012-05-15  2012-06-15
Jan Jason   2012-05-20  2012-06-16
Jan Jack    2012-05-23  2012-06-17
Jan Josh    2012-05-25  2012-06-18
Jan Jack    2012-05-28  2012-06-19
Jan Josh    2012-06-01  2012-06-20

into a format like this:

<edge source="Jan" target="John" start="2012-02-20" end="2012-06-06" weight="1" id="133">
        <attvalues>
          <attvalue for="0" value="1" start="2012-04-07" end="2012-06-06"/>
          <attvalue for="0" value="2" start="2012-06-06" end="2012-06-06"/>
          <attvalue for="0" value="3" start="2012-06-06" end="2012-06-06"/>
        </attvalues>
 </edge>
<next edge...
</next edge>


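Take a look at pandas, a project designed to simplify this kind of operation. Here is an example of how it can group and parse your data: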

# Load your CSV as a pandas 'DataFrame'.
In [12]: import pandas as pd

In [13]: df = pd.read_csv('your file', names=['source', 'target', 'start', 'end'])

# Look at the first few rows. It worked.
In [14]: df.head()
Out[14]: 
  source target       start         end
0    Jan  Jason  2012-05-07  2012-06-06
1    Jan   John  2012-03-02  2012-06-07
2    Jan  Jason  2012-03-20  2012-06-08
3    Jan   Jack  2012-03-26  2012-06-09
4    Jan  Janet  2012-05-01  2012-06-10

# Group the rows by the name columns. Each unique pair gets its own group.
In [15]: edges = df.groupby(['source', 'target'])

In [16]: for (source, target), edge in edges:  # consider each unique name pair an edge
   ....:     print(source, target)
   ....:     for _, row in edge.iterrows():  # loop through all the rows belonging to these names
   ....:         print(row['start'], row['end'])
   ....:
Jan Jack
2012-03-26 2012-06-09
2012-05-04 2012-06-11
2012-05-09 2012-06-13
2012-05-23 2012-06-17
2012-05-28 2012-06-19
Jan Janet
2012-05-01 2012-06-10
2012-05-15 2012-06-15
Jan Jason
2012-05-07 2012-06-06
2012-03-20 2012-06-08
2012-05-07 2012-06-12
2012-05-20 2012-06-16
Jan John
2012-03-02 2012-06-07
2012-05-15 2012-06-14
Jan Josh
2012-05-25 2012-06-18
2012-06-01 2012-06-20
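
The grouping above can also be collapsed into one summary row per edge, which is a convenient way to get candidate values for the edge-level start and end attributes. The sketch below is only an illustration: it assumes the file is whitespace-delimited as it appears in the question (hence `sep=r"\s+"`), and it assumes an edge should span its earliest start to its latest end, which the question does not confirm. The three-row `SAMPLE` is a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real file: the first three rows from the question,
# assumed to be whitespace-delimited.
SAMPLE = """\
Jan John    2012-04-07  2012-06-06
Jan Jason   2012-05-07  2012-06-06
Jan John    2012-03-02  2012-06-07
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",  # assumption: columns are separated by runs of spaces or tabs
    names=["source", "target", "start", "end"],
)

# One row per (source, target) pair. ISO dates sort correctly as strings,
# so min/max work without parsing them into datetimes.
summary = (
    df.groupby(["source", "target"])
    .agg(start=("start", "min"), end=("end", "max"), posts=("start", "count"))
    .reset_index()
)
print(summary)
```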

All that remains is to flesh out those print statements with XML, and perhaps write to a file instead of printing.
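
One way that last step might look is sketched below, using the standard library's `xml.etree.ElementTree` to build the elements instead of printing strings. Several things here are assumptions rather than part of the answer: the helper name `build_edges` is hypothetical, the edge is assumed to span its earliest start to its latest end, `value` is assumed to be a running count of posts ordered by start date, and the ids are simply enumerated. The comma-separated `SAMPLE` stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Stand-in for the real file, in the same four-column layout as the question.
SAMPLE = """\
Jan,John,2012-04-07,2012-06-06
Jan,Jason,2012-05-07,2012-06-06
Jan,John,2012-03-02,2012-06-07
"""


def build_edges(df):
    """Return an <edges> element with one <edge> per (source, target) pair."""
    edges_el = ET.Element("edges")
    grouped = df.groupby(["source", "target"])
    for edge_id, ((source, target), rows) in enumerate(grouped):
        edge_el = ET.SubElement(
            edges_el,
            "edge",
            {
                "id": str(edge_id),  # assumption: ids are just enumerated
                "source": source,
                "target": target,
                # Assumption: the edge spans its earliest start to its latest end.
                "start": rows["start"].min(),
                "end": rows["end"].max(),
                "weight": "1",
            },
        )
        attvalues_el = ET.SubElement(edge_el, "attvalues")
        # Assumption: "value" is a running count of posts, ordered by start date.
        ordered = rows.sort_values("start")
        for count, (_, row) in enumerate(ordered.iterrows(), start=1):
            ET.SubElement(
                attvalues_el,
                "attvalue",
                {
                    "for": "0",
                    "value": str(count),
                    "start": row["start"],
                    "end": row["end"],
                },
            )
    return edges_el


df = pd.read_csv(io.StringIO(SAMPLE), names=["source", "target", "start", "end"])
edges_xml = ET.tostring(build_edges(df), encoding="unicode")
print(edges_xml)
```

This produces only the edges fragment. A complete GEXF file also needs the enclosing gexf and graph elements and a list of nodes, which are not shown here. To write to a file instead of printing, `ET.ElementTree(build_edges(df)).write(path)` is one option.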

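Well, have you tried anything yourself?

Where do the start date and end date come from?

I don't see what the question is. I don't see the relationship between "John" and the dates in the target. Please clarify…

Where does "2007-01-22" come from? Is it a fixed string that appears in every edge?

I would use this. Parse out the data, put it into a dictionary, and use a template for the substitution. Good luck, and if you get stuck, update your question with some code and we will be happy to help!

There are plenty of examples in the provided link to get you started…

Thanks Dan! This looks very useful. I will start playing with it.

It runs on all platforms. I use it on a Mac as well.

See this? Is it available for Mac? I tried running pip install, but it failed with error code 1.

Without a more specific error, I don't know how to help you. I have installed and used it on several Macs. If pip doesn't work, searching StackOverflow for "install pandas Mac" turns up many threads discussing alternatives. I believe there is a binary somewhere online, as there is for Windows, but I can't find it either. @Jeff?

@goldisfine You need to install numpy and cython before installing pandas.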
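
For readers who cannot install pandas, the dictionary-and-template approach suggested in the comments can be sketched with the standard library alone. This is an illustration under assumptions, not the commenter's actual code: the input is assumed to be tab-delimited, the function name `edges_from_rows` and both templates are hypothetical, and the edge span and running `value` count follow the same guesses about the desired output as any other reading of the question.

```python
import csv
import io
from collections import defaultdict
from string import Template
from xml.sax.saxutils import quoteattr

# Stand-in for the real file; assumed to be tab-delimited.
SAMPLE = (
    "Jan\tJohn\t2012-04-07\t2012-06-06\n"
    "Jan\tJason\t2012-05-07\t2012-06-06\n"
    "Jan\tJohn\t2012-03-02\t2012-06-07\n"
)

# quoteattr() supplies the surrounding quotes for the edge attributes.
EDGE = Template(
    '<edge source=$source target=$target start=$start end=$end weight="1" id=$id>\n'
    "  <attvalues>\n$attvalues\n  </attvalues>\n"
    "</edge>"
)
ATTVALUE = Template('    <attvalue for="0" value="$value" start="$start" end="$end"/>')


def edges_from_rows(rows):
    """Group (source, target, start, end) rows and render one <edge> per pair."""
    posts = defaultdict(list)
    for source, target, start, end in rows:
        posts[(source, target)].append((start, end))

    chunks = []
    for edge_id, ((source, target), spans) in enumerate(sorted(posts.items())):
        spans.sort()  # ISO dates sort chronologically as plain strings
        attvalues = "\n".join(
            ATTVALUE.substitute(value=i, start=s, end=e)
            for i, (s, e) in enumerate(spans, start=1)
        )
        chunks.append(
            EDGE.substitute(
                source=quoteattr(source),
                target=quoteattr(target),
                # Assumption: the edge runs from earliest start to latest end.
                start=quoteattr(spans[0][0]),
                end=quoteattr(max(e for _, e in spans)),
                id=quoteattr(str(edge_id)),
                attvalues=attvalues,
            )
        )
    return "\n".join(chunks)


xml_text = edges_from_rows(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))
print(xml_text)
```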