Parsing columns and cell data in a CSV file with Python


python csv


I am trying to write code for the following data:

I imported the data with the following code:

import csv
import itertools
import pandas as pd

input_file="computation.csv"
cmd=pd.read_csv(input_file)
subset = cmd[['Carbon A', 'Carbon B']]
carbon_pairs = [tuple(y) for y in subset.values]
c_pairs = carbon_pairs

I want to write code that produces the following output:

1 is connected to
  2
  4
  6
  7
  8
2 is connected to
  1
  4
  5

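Note that for carbon 2, I want the connection back to carbon 1 to be repeated. I suspect some kind of permutation could show this, but I am not sure where to start. Basically, the code needs to output: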

for every cell with the same value, print adjacent cell

Starting from the end of your question:

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]

You probably want a result that looks more like this:

groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}

There are many ways to get there.

If you know your data is sorted, you can use itertools.groupby, for example:

import itertools

first_item = lambda pair: pair[0]  # key: first element of each (a, b) pair
for key, items in itertools.groupby(c_pairs, first_item):
    print('%s is connected to' % key)
    for a, b in items:
        print('  %s' % b)

If your data is not sorted, this is probably still the fastest approach; just sort it first:

c_pairs = sorted(c_pairs, key=first_item)
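
As a rough sketch of how the sort-then-group step could be used to build the groups dictionary shown earlier, rather than printing inside the loop (Python 3; the helper name group_pairs is made up for illustration and is not part of the original answer):

```python
import itertools
from operator import itemgetter

def group_pairs(pairs):
    """Map each first element to the list of second elements paired with it."""
    # groupby only merges consecutive items, so sort by the key first.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: [b for _, b in items]
        for key, items in itertools.groupby(ordered, key=itemgetter(0))
    }

# Deliberately shuffled input to show that the sort makes the order irrelevant.
c_pairs = [(1, 2), (2, 1), (1, 4), (2, 4), (1, 6), (2, 5), (1, 7), (1, 8)]
groups = group_pairs(c_pairs)
print(groups)
```

Because sorted is stable and only the first element is used as the key, the second elements keep their original relative order within each group.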

A more do-it-yourself solution is to build a mapping from one value to the other, using a defaultdict or a standard dict:

import collections

groups = collections.defaultdict(list)
for a, b in c_pairs:
    groups[a].append(b)

This is equivalent, without collections:

groups = {}
for a, b in c_pairs:
    groups.setdefault(a, [])  # many ways to do this as well
    groups[a].append(b)
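
Once the mapping exists, producing the requested report is just a loop over it. A minimal sketch, assuming Python 3 and the output layout from the question (format_groups is a hypothetical helper name):

```python
from collections import defaultdict

def format_groups(pairs):
    """Build the 'X is connected to' report as a single string."""
    groups = defaultdict(list)
    for a, b in pairs:
        groups[a].append(b)

    lines = []
    for key in sorted(groups):
        lines.append('%s is connected to' % key)
        # Indent each connected value by two spaces, as in the question.
        lines.extend('  %s' % b for b in groups[key])
    return '\n'.join(lines)

c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 4), (2, 5)]
print(format_groups(c_pairs))
```

Unlike the groupby approach, this does not need the input to be sorted, because the dictionary collects values for a key no matter where they appear.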

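With the following function (Python 2), you can get the desired output without depending on pandas. It lets you pass in any file name and choose which columns to query through zero-based indexes. This solution assumes the data is sorted the way it is in the example you provided.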

import csv

def printAdjacentNums(filename, firstIdx, secondIdx):
    with open(filename, 'rb') as csvfile:
        # handle header line
        header = next(csvfile)
        reader = csv.reader(csvfile)
        current_val = ''
        current_adj = []
        # dict of lists for lookback
        lookback = {}
        for row in reader:
            if current_val == '':
                current_val = row[firstIdx]
            if row[firstIdx] == current_val:
                current_adj.append(row[secondIdx])
            else:
                # check lookback
                for k, v in lookback.items():
                    if current_val in v:
                        current_adj.append(k)

                # print what we need to
                print current_val + ' is connected to'
                for i in current_adj:
                    print i

                # append current vals to lookback
                lookback[current_val] = current_adj

                # reassign
                current_val = row[firstIdx]
                current_adj = [row[secondIdx]]

    # print final set
    for k, v in lookback.items():
        if current_val in v:
            current_adj.append(k)
    print current_val + ' is connected to'
    for i in current_adj:
        print i

Then, based on your example, call it like this:

printAdjacentNums('computation.csv', 0, 1)
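
For readers on Python 3, the same pandas-free idea could be sketched roughly as follows. This is not a port of the function above: it swaps the running current_val bookkeeping for a defaultdict, so it does not depend on the rows being sorted, and it has no lookback step. The names read_connections and print_connections are illustrative, and the file is assumed to have a single header row.

```python
import csv
from collections import defaultdict

def read_connections(filename, first_idx=0, second_idx=1):
    """Return {value in first column: [values in second column]}."""
    groups = defaultdict(list)
    # newline='' is what the csv module documentation recommends in Python 3.
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if not row:  # ignore blank lines
                continue
            groups[row[first_idx]].append(row[second_idx])
    return dict(groups)

def print_connections(groups):
    """Print each key followed by its indented connected values."""
    for key, neighbours in groups.items():
        print('%s is connected to' % key)
        for n in neighbours:
            print('  %s' % n)
```

With the file from the question, this would be called as print_connections(read_connections('computation.csv')). Note that csv.reader yields strings, so the keys and values are '1', '2', and so on rather than integers.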

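I would suggest simplifying your question and putting the backstory below it: "Given a list of (a, b) pairs such as [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 1), (2, 1), (2, 4), (2, 5)], how do I find all the b values associated with each a value?"

But (for example) it prints "2 is connected to: 4, 5", and I want it to include the connection to 1 (from the first row). Do you know how to add this?

Added lookback logic. This logic does not handle duplicate rows, since that is outside the scope of the original question. If you need that, use a set instead of a list as the data type for current_adj. Hope this helps.

Is there a way to add some kind of lookback logic to itertools.groupby? As you said, I want to end up with groups = {1: [2, 4, 6, 7, 8], 2: [1, 4, 5]}, but this code gives me groups = {1: [2, 4, 6, 7, 8], 2: [4, 5]}

@AlexIndeglia Which code? Depending on how you read it, there are 3-4 options above. When I run them, all but one version give the desired output shown above. The exception is running groupby on unsorted data without the sort line mentioned above, and that code does not return a dictionary anyway; it is used inside a loop that prints. Are you sure you didn't forget to add the (2, 1) value?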
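
One way to get the lookback behaviour discussed above without relying on row order is to treat every pair as a two-way connection and record it in both directions. The sketch below is written under that assumption (Python 3; undirected_groups is an illustrative name). It is a different technique from the lookback dictionary in the answer, and because it stores neighbours in a set, duplicate rows collapse into one entry.

```python
from collections import defaultdict

def undirected_groups(pairs):
    """Map every value to the sorted list of values it shares a pair with."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        # Record the connection in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {key: sorted(vals) for key, vals in sorted(neighbours.items())}

# The (2, 1) row is left out on purpose: it is implied by (1, 2).
c_pairs = [(1, 2), (1, 4), (1, 6), (1, 7), (1, 8), (2, 4), (2, 5)]
groups = undirected_groups(c_pairs)
print(groups[1])
print(groups[2])
```

Every value that appears in either column becomes a key, so 4 through 8 also get entries here. If only the values from the first column should be reported, the result can be filtered down to those keys afterwards.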