Pythonic way to compare values in arrays?


python arrays csv


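The problem:

The input is a tab-delimited file. Rows are variables and columns are samples. Each variable can take one of three values (00, 01, 11), and the variables are listed in an order that must be preserved (v1 -> vN). There are a large number of rows and columns, so the input file has to be read in chunks.

The input looks like this: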

   s1 s2 s3 s4
v1 00 00 11 01
v2 00 00 00 00
v3 01 11 00 00
v4 00 00 00 00
(...)

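What I want to do is split the input into blocks of consecutive rows, each block just large enough that every sample in it is unique. In the example above, starting from v1, the first block should end at v3, because at that point there is enough information to show that all samples are distinct. The next block starts at v4 and the process repeats. The task ends when the last row is reached. The blocks should be written to an output file.

My attempt:

What I tried was to use the csv module to build an array made of lists, where each list holds the state of a single variable across all samples (00, 01, 00). Alternatively, by pivoting the input, I could build lists that hold the states of every variable for a single sample. What I am asking is whether the work should focus on columns or on rows, i.e. whether it is better to use v1 = ['00', '00', '11', '01'] or s1 = ['00', '00', '01', '00', ...].

The code below refers to the pivot operation, by which I tried to turn the column problem into a row problem. (Sorry for the clumsy Python syntax; this is the best I can do.)

What is the best way to approach this? Does pivoting help at all? Are there any built-in functions I can rely on?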
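On the pivot and built-in question, one possible sketch: the built-in `zip(*rows)` transposes a list of variable rows into one tuple per sample, and a `set` then shows whether those tuples are all distinct. The helper name `samples_are_unique` is hypothetical and is used here only for illustration.

```python
def samples_are_unique(rows):
    """Return True if every column (sample) of `rows` is distinct."""
    columns = list(zip(*rows))  # pivot: one tuple of values per sample
    return len(set(columns)) == len(columns)


v1 = ['00', '00', '11', '01']
v2 = ['00', '00', '00', '00']
v3 = ['01', '11', '00', '00']

print(list(zip(v1, v2, v3))[0])          # s1 across v1..v3 -> ('00', '00', '01')
print(samples_are_unique([v1]))          # False: s1 and s2 are both '00'
print(samples_are_unique([v1, v2]))      # False: v2 adds no information
print(samples_are_unique([v1, v2, v3]))  # True: the block can end at v3
```

With this check the rows can stay as they are read (one list per variable), and the pivot is only applied to the rows collected for the current block.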

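I don't understand your question at all ("coordinate variables"? "univocally determine samples"?), but I do know that you are using the csv module incorrectly and that your indentation is wrong.

I don't know exactly what the input file looks like, but assuming it is tab-delimited, the following (untested) script shows one way to pull blocks out of the input file, transform them, and write them back to an output file.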

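import csv

# This is not strictly necessary, but you can define a custom dialect for input and output
class SampleDialect(csv.Dialect):
    delimiter = "\t"
    quoting = csv.QUOTE_NONE
    lineterminator = "\n"  # a csv.Dialect subclass must set a line terminator

sampledialect = SampleDialect()

ifn = 'my_file.txt'
ofn = 'transposed.' + ifn
# Python 3: open csv files in text mode with newline=''
ifp = open(ifn, newline='')
ofp = open(ofn, 'w', newline='')
incsv = csv.reader(ifp, dialect=sampledialect)
outcsv = csv.writer(ofp, dialect=sampledialect)
header = None
block = []
for lineno, samples in enumerate(incsv):
    if lineno == 0:  # header
        header = samples
        continue
    block.append(samples)
    if lineno%3:  # placeholder condition; replace with your own end-of-block test
        # end of block
        # do something with block
        # and then write it out
        outcsv.writerows(block)
        block = []
ifp.close()
ofp.close()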
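As a rough sketch of how the placeholder end-of-block condition could be replaced by the uniqueness rule from the question: the loop below keeps one growing tuple per sample and closes the block as soon as all tuples differ. The function name `split_into_blocks`, the empty row used as a block separator, and the assumption that the first column holds the variable name are illustrative choices, not something the question specifies.

```python
import csv
import io


def split_into_blocks(infile, outfile):
    """Copy tab-separated rows, writing an empty row after each complete block."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))      # header row with the sample names
    keys = None                        # one growing tuple of values per sample
    n_blocks = 0
    for row in reader:
        if not row:                    # skip blank input lines
            continue
        values = row[1:]               # assumption: column 0 is the variable name
        if keys is None:
            keys = [()] * len(values)
        keys = [k + (v,) for k, v in zip(keys, values)]
        writer.writerow(row)
        if len(set(keys)) == len(keys):  # every sample is now distinct
            writer.writerow([])          # block separator (illustrative choice)
            keys = None
            n_blocks += 1
    return n_blocks


# In-memory demo using the sample data from the question
data = (
    "\ts1\ts2\ts3\ts4\n"
    "v1\t00\t00\t11\t01\n"
    "v2\t00\t00\t00\t00\n"
    "v3\t01\t11\t00\t00\n"
    "v4\t00\t00\t00\t00\n"
)
out = io.StringIO()
n = split_into_blocks(io.StringIO(data), out)
print(n)               # number of complete blocks
print(out.getvalue())
```

Only the rows of the current block are summarised in memory, so the file is processed one row at a time. For real files, pass handles opened with `newline=''` instead of the `io.StringIO` objects.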

Assuming you have imported the csv data as a list of equal-length lists, how does this work for you?

def get_block(data_rows):
    samples = []

    for cell in data_rows[0]:
        samples.append('')

    # add one row at a time to each sample and see if all are unique
    for row_index, row in enumerate(data_rows):
        for cell_index, cell in enumerate(row):
            samples[cell_index] = '%s%s' % (samples[cell_index], cell)

        are_all_unique = True
        sample_dict = {} # use dictionary keys to find repeats
        for sample in samples:
            if sample_dict.get(sample):
                # already there, so another row needed
                are_all_unique = False
                break
            sample_dict[sample] = True # add the key to the dictionary
        if are_all_unique:
            return True, row_index

    return False, None

def get_all_blocks(all_rows):
    remaining_rows = all_rows[:] # make a copy    
    blocks = []

    while True:
        found_block, block_end_index = get_block(remaining_rows)
        if found_block:
            blocks.append(remaining_rows[:block_end_index+1])
            remaining_rows = remaining_rows[block_end_index+1:]
            if not remaining_rows:
                break
        else:
            blocks.append(remaining_rows[:])
            break

    return blocks


if __name__ == "__main__":
    v1 = ['00', '00', '11', '01']
    v2 = ['00', '00', '00', '00']
    v3 = ['01', '11', '00', '00']
    v4 = ['00', '00', '00', '00']

    all_rows = [v1, v2, v3, v4]

    blocks = get_all_blocks(all_rows)

    for index, block in enumerate(blocks):
        print("This is block %s." % index)
        for row in block:
            print(row)
        print()

=================

This is block 0.
['00', '00', '11', '01']
['00', '00', '00', '00']
['01', '11', '00', '00']

This is block 1.
['00', '00', '00', '00']
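Note that in the output above, block 1 is a leftover: v4 alone does not make the samples unique, yet it is returned like any other block. A possible variant, sketched below, is a generator that accepts any iterable of rows and flags each block as complete or not. The name `iter_blocks` and the `(block, is_complete)` return shape are assumptions made for illustration.

```python
def iter_blocks(rows):
    """Yield (block, is_complete) pairs from any iterable of equal-length rows."""
    block, keys = [], None
    for row in rows:
        if keys is None:
            keys = [()] * len(row)       # start an empty key for each sample
        keys = [k + (v,) for k, v in zip(keys, row)]
        block.append(row)
        if len(set(keys)) == len(keys):  # all samples distinct: close the block
            yield block, True
            block, keys = [], None
    if block:                            # trailing rows that never became unique
        yield block, False


rows = [
    ['00', '00', '11', '01'],
    ['00', '00', '00', '00'],
    ['01', '11', '00', '00'],
    ['00', '00', '00', '00'],
]
result = list(iter_blocks(rows))
for block, is_complete in result:
    print(len(block), is_complete)
# 3 True
# 1 False
```

Because it consumes rows one at a time, this form avoids copying and re-slicing the remaining rows for every block.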

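Basically, how do I write a script with a loop that lets me identify the minimum number of ordered variables needed to tell every sample apart?

How does this relate to the code you posted? Which part of the problem is giving you trouble? Right now, this post sounds like "I tried to solve this problem but couldn't, so can you do it?"

The code I posted should give an idea of how far I got, and it suggests pivoting as a possible approach. I am not asking for a script, nor for someone to solve my problem for me; I am asking for some insight that I couldn't find elsewhere.

You don't need to convince me. The fact is that you have no answers and three close votes. I strongly suggest you update your question and break up the dense paragraphs.

I don't understand your question. Describe exactly, step by step, what you want to do. What does "find the minimum number of subsequent variables needed to univocally determine all samples" mean? I know all of those words, but I can't work out what they mean together.

It looks like exactly what I wanted. Thanks for your help; I'll let you know as soon as I try it.

Works wonders. I'd upvote you, but my reputation is too low!

The blocks are not all the same size. He wants each block to have enough rows to make each column different from the other samples in the block.

I was mostly demonstrating how to use the csv reader and writer. He would have to change the

if lineno%3:

line to whatever condition he needs.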
