Python 行之间的条件数学运算

Python 行之间的条件数学运算,python,python-2.7,Python,Python 2.7,在收到否决票后重新发布这篇文章,我确实回去尝试了一些东西,但我想现在还没有 包含如下数据的文件: name count count1 count3 add1 add2 jack 70 55 31 100174766 100170715 jack 45 656 48 100174766 100174052 john 41 22 89 102268764 102267805 john 47 31 63 102268764

在收到否决票后重新发布这篇文章,我确实回去尝试了一些东西,但我想现在还没有

包含如下数据的文件:

name    count   count1  count3  add1    add2
jack    70  55  31  100174766   100170715
jack    45  656 48  100174766   100174052
john    41  22  89  102268764   102267805
john    47  31  63  102268764   102267908
david   10  56  78  103361093   103368592
我需要检查的两个条件和一个稍后需要完成的数学运算: A) 哪些行/行在add1中具有重复的值(始终==2) B) 如果它们等于2,那么add2中哪一行/行的值更大

以杰克为例:

jack    70  55  31  100174766   100170715
jack    45  656 48  100174766   100174052
jack有两个add1==2(出现两次),且
100174052
更大,因此:

row1 = jack 45  656 48  100174766   100174052
row2 = jack 70  55  31  100174766   100170715
数学: 对于两行之间的每个单元格
row1/(row1+row2)

插孔的输出: 最终期望输出 迄今为止的代码: 我知道我没有说明哪一个add2更大,不知道在哪里以及如何做

info = []
with open('file.tsv', 'r') as j:
    for i,line in enumerate(j):
        lines = line.strip().split('\t')
        info.append(lines)

uniq = {}
for index,row in enumerate(info, start =1):
    if row.count(row[4]) == 2:
       key = row[4] + ':' + row[5]
    if key not in uniq:
        uniq[key] = row[1:3]

for k, v in sorted(uniq.iteritems()):
    row1 = k,v
    row2 = k,v
    print 'row1: ', row1[0], '\n', 'row2: ',row2[0]
我所看到的是:

row1:  100174766:100170715 
row2:  100174766:100170715
row1:  100174766:100174052 
row2:  100174766:100174052
而不是

row1:  100174766:100170715
row2:  100174766:100174052

熊猫可以做的任何事情都可以用纯python完成-只需要更多的代码:

要使其成为在内部运行f.e.的完整文件,您需要创建文件:

# create file
with open("d.txt","w") as f:
    f.write("""name    count   count1  count3  add1    add2
jack    70  55  31  100174766   100170715
jack    45  656 48  100174766   100174052
john    41  22  89  102268764   102267805
john    47  31  63  102268764   102267908
david   10  56  78  103361093   103368592""")
顺便说一下,我定义了一些助手:

def printMe(gg):
    """Pretty prints a dictionary"""
    print ""
    for k in gg:
        print k, "\t:  ", gg[k]

def spaceEm(s):
    """Returns a string of input s with 2 spaces prepended"""
    return "  {}".format(s)
开始阅读并计算你的价值观:

data = {}
with open("d.txt","r") as f:
    headers = f.readline().split() # store header line for later
    for line in f:
        if line.strip(): # just a guard against empty lines
            # name, *splitted = line.split() # python 3.x, you specced 2.7
            tmp = line.split()
            name = tmp[0]
            splitted = tmp[1:]
            nums = list(map(float,splitted))
            data.setdefault((name,nums[3]),[]).append(nums)
printMe(data)

# sort data
for nameAdd1 in data:
    # name     :  count   count1  count3  add1    add2 
    data[nameAdd1].sort(key = lambda x: -x[4]) # - "trick" to sort descending, you 
                                               # could use reverse=True instead 
printMe(data)


# calculate stuff and store in result
result = {}
for nameAdd1 in data:
    try:
        values = zip(*data[nameAdd1])

        # this results in value error if you can not decompose in r1,r2
        result[nameAdd1] = [r1 / (r1+r2) for r1,r2 in values]

    except ValueError:
        # this catches the case of only 1 value for a person 
        result[nameAdd1] = data[nameAdd1][0]
printMe(result)


# store as resultfile (will be overwritten each time)
with open("d2.txt","w") as f:
    # header
    f.write(headers[0])
    for h in headers[1:]:
        f.write(spaceEm(h))
    f.write("\n")

    # data
    for key in result:
        f.write(key[0]) # name
        for t in map(spaceEm,result[key]):
            f.write(t) # numbers
        f.write("\n")
输出:

# read from file
('jack', 100174766.0)   :   [[70.0, 55.0, 31.0, 100174766.0, 100170715.0], [45.0, 656.0, 48.0, 100174766.0, 100174052.0]]
('david', 103361093.0)  :   [[10.0, 56.0, 78.0, 103361093.0, 103368592.0]]
('john', 102268764.0)   :   [[41.0, 22.0, 89.0, 102268764.0, 102267805.0], [47.0, 31.0, 63.0, 102268764.0, 102267908.0]]

# sorted by add1
('jack', 100174766.0)   :   [[45.0, 656.0, 48.0, 100174766.0, 100174052.0], [70.0, 55.0, 31.0, 100174766.0, 100170715.0]]
('david', 103361093.0)  :   [[10.0, 56.0, 78.0, 103361093.0, 103368592.0]]
('john', 102268764.0)   :   [[47.0, 31.0, 63.0, 102268764.0, 102267908.0], [41.0, 22.0, 89.0, 102268764.0, 102267805.0]]

# result of calculations
('jack', 100174766.0)   :   [0.391304347826087, 0.9226441631504922, 0.6075949367088608, 0.5, 0.5000083281436545]
('david', 103361093.0)  :   [10.0, 56.0, 78.0, 103361093.0, 103368592.0]
('john', 102268764.0)   :   [0.5340909090909091, 0.5849056603773585, 0.4144736842105263, 0.5, 0.5000002517897694]
输入文件:

输出文件:

name  count  count1  count3  add1  add2
jack  0.391304347826087  0.9226441631504922  0.6075949367088608  0.5  0.5000083281436545
john  0.5340909090909091  0.5849056603773585  0.4144736842105263  0.5  0.5000002517897694
david  10.0  56.0  78.0  103361093.0  103368592.0


免责声明:我在3.x中编写了代码,后来将其修改为2.7,可能会有一些“不需要的”中间变量使其工作…

hi Onyanbu,感谢您的回答,这可以使用纯python完成吗?只是好奇和学习basics@novicebioinforesearcher你说的纯python是什么意思?。或者你的意思是R?我的意思是用pandas编写python代码,你的答案也行。在
add1
中是否有其他非“Jack”具有相同的值?不,在
add1
中不可能有非Jack具有相同的值,这是完美的,你还有py3版本吗?我应该使用py3,因为py27正在逐渐消失。我也不知道这对我来说是个好问题。@nearyorbioinforesearcher使用的python3代码
name,*splited=line.split()
decomposition(它仍然在代码中,刚刚注释)。您必须更改
print
-语句以使用
(..)
,因为print是Python3.x中的一个函数,这是主要的修改<此代码中使用的code>map和
zip
不需要任何更改(它们被迭代一次),因此不需要在它们周围使用列表(它们在2.7中返回列表,在3.x中返回生成器-因此,如果您想多次使用它们,您需要将它们放在列表中)。@newerbioinformeresarcher至于pyfiddle.io-。只需确保正确切换2.7/3.6,并熟悉其有关参数传递和input()值传递的怪癖(即,您需要先在代码中创建文件才能对文件进行操作)。您需要首先在输入字段中输入内容。如果你可以选择2.7或3.6,我会使用3.6-2.7将在2020年停业,几乎所有的东西都是移植的,所以没有必要停留在2.x-ish@novicebioinforesearcher请仔细阅读,其中展示了2.x和3.x中使用的某些内置组件的一些差异
data = {}
with open("d.txt","r") as f:
    headers = f.readline().split() # store header line for later
    for line in f:
        if line.strip(): # just a guard against empty lines
            # name, *splitted = line.split() # python 3.x, you specced 2.7
            tmp = line.split()
            name = tmp[0]
            splitted = tmp[1:]
            nums = list(map(float,splitted))
            data.setdefault((name,nums[3]),[]).append(nums)
printMe(data)

# sort data
for nameAdd1 in data:
    # name     :  count   count1  count3  add1    add2 
    data[nameAdd1].sort(key = lambda x: -x[4]) # - "trick" to sort descending, you 
                                               # could use reverse=True instead 
printMe(data)


# calculate stuff and store in result
result = {}
for nameAdd1 in data:
    try:
        values = zip(*data[nameAdd1])

        # this results in value error if you can not decompose in r1,r2
        result[nameAdd1] = [r1 / (r1+r2) for r1,r2 in values]

    except ValueError:
        # this catches the case of only 1 value for a person 
        result[nameAdd1] = data[nameAdd1][0]
printMe(result)


# store as resultfile (will be overwritten each time)
with open("d2.txt","w") as f:
    # header
    f.write(headers[0])
    for h in headers[1:]:
        f.write(spaceEm(h))
    f.write("\n")

    # data
    for key in result:
        f.write(key[0]) # name
        for t in map(spaceEm,result[key]):
            f.write(t) # numbers
        f.write("\n")
# read from file
('jack', 100174766.0)   :   [[70.0, 55.0, 31.0, 100174766.0, 100170715.0], [45.0, 656.0, 48.0, 100174766.0, 100174052.0]]
('david', 103361093.0)  :   [[10.0, 56.0, 78.0, 103361093.0, 103368592.0]]
('john', 102268764.0)   :   [[41.0, 22.0, 89.0, 102268764.0, 102267805.0], [47.0, 31.0, 63.0, 102268764.0, 102267908.0]]

# sorted by add1
('jack', 100174766.0)   :   [[45.0, 656.0, 48.0, 100174766.0, 100174052.0], [70.0, 55.0, 31.0, 100174766.0, 100170715.0]]
('david', 103361093.0)  :   [[10.0, 56.0, 78.0, 103361093.0, 103368592.0]]
('john', 102268764.0)   :   [[47.0, 31.0, 63.0, 102268764.0, 102267908.0], [41.0, 22.0, 89.0, 102268764.0, 102267805.0]]

# result of calculations
('jack', 100174766.0)   :   [0.391304347826087, 0.9226441631504922, 0.6075949367088608, 0.5, 0.5000083281436545]
('david', 103361093.0)  :   [10.0, 56.0, 78.0, 103361093.0, 103368592.0]
('john', 102268764.0)   :   [0.5340909090909091, 0.5849056603773585, 0.4144736842105263, 0.5, 0.5000002517897694]
name    count   count1  count3  add1    add2
jack    70  55  31  100174766   100170715
jack    45  656 48  100174766   100174052
john    41  22  89  102268764   102267805
john    47  31  63  102268764   102267908
david   10  56  78  103361093   103368592
name  count  count1  count3  add1  add2
jack  0.391304347826087  0.9226441631504922  0.6075949367088608  0.5  0.5000083281436545
john  0.5340909090909091  0.5849056603773585  0.4144736842105263  0.5  0.5000002517897694
david  10.0  56.0  78.0  103361093.0  103368592.0