How to code the mathematical combinations formula in Python


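My dataset looks like this:

Input file:

I need Python code that prints all possible pairwise combinations of the addr values that share a common id. The main test file has millions of records of ids and corresponding addr values, so the code should be able to read the columns from a text file. The output looks like this (only 301 and 302 are shown; the rest continue the same pattern):

So far I have done the following, but I don't know how to code the pairwise-combination part. I'm new to Python, and I would appreciate it if someone could help me with some explanation.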

# coding: utf-8

# sample tested in python 3.6

import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit(1)

fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# addr values collected for the current id
s = []
txid_prev = None
with open(fn, "r") as fin:
    for line in fin:
        f = line.rstrip().split("\t")
        if len(f) < 2:
            continue  # skip blank or malformed lines
        txid, addr = f[0], f[1]
        if txid == txid_prev:
            s.append(addr)
        else:
            # TODO: connect all pairs in s
            # TODO: print all pairs as edges
            s = [addr]
        txid_prev = txid
if s:
    # TODO: connect and print all edges for the last id
    pass
How about this (if you don't mind duplicates):

Output:

     addr  addr_2
id               
301     1       1
301     1       2
301     1       3
301     1       4
301     2       1
301     2       2
301     2       3
301     2       4
301     3       1
...
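
The code that produced this table is not shown here. As a rough sketch of how output of the same shape can be generated, the following uses itertools.product over each id's addr list, which keeps self-pairs such as (1, 1) and both orderings of every pair. The sample rows and the helper name all_ordered_pairs are illustrative assumptions, not the answerer's original code.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample rows (id, addr), mirroring the ids used in this thread.
rows = [(301, 1), (301, 2), (301, 3), (301, 4)]


def all_ordered_pairs(rows):
    """Yield (id, addr, addr_2) for every ordered pair within one id,
    including self-pairs and both orderings."""
    by_id = defaultdict(list)
    for id_, addr in rows:
        by_id[id_].append(addr)
    for id_, addrs in by_id.items():
        for a, b in product(addrs, repeat=2):
            yield id_, a, b


for id_, a, b in all_ordered_pairs(rows):
    print(id_, a, b)
```

With four addr values under one id this yields 4 x 4 = 16 rows, which is why this variant only fits if duplicates are acceptable.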

How about something like this:

import pandas as pd
import io
import itertools

file="""id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1"""

df= pd.read_csv(io.StringIO(file), sep=" ")

for key,value in df.set_index("addr").groupby("id").groups.items():
    print(key)
    for item in list(itertools.combinations(value.values, 2)):
        print("{} {}".format(*item))
Prints:

301
1 2
1 3
1 4
2 3
2 4
3 4
302
6 7
6 8
6 9
6 1
7 8
7 9
7 1
8 9
8 1
9 1

Alternatively, we can put the values into a dictionary:

a = {} 

for id_,addr in df.values.tolist():
    a.setdefault(str(id_),[]).append(addr)

output = {key:list(itertools.combinations(value, 2)) for key,value in a.items()}


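def return_combos(dict_, keys):
    # collect the addr lists of the requested ids
    values = []
    for i in keys:
        values.append(dict_[i])  # use the dict_ argument, not the global a
    # flatten, drop duplicate addr values, and sort for a stable order
    values = sorted(set(i for item in values for i in item))
    return {','.join(keys): list(itertools.combinations(values, 2))}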


output2 = return_combos(a, ["301","302"])
output prints:

{'301': [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)],
 '302': [(6, 7),
  (6, 8),
  (6, 9),
  (6, 1),
  (7, 8),
  (7, 9),
  (7, 1),
  (8, 9),
  (8, 1),
  (9, 1)]} 
And output2 prints:

{'301,302': [(1, 2),
  (1, 3),
  (1, 4),
  (1, 6),
  (1, 7),
  (1, 8),
  (1, 9),
  (2, 3),
  (2, 4),
  (2, 6),
  (2, 7),
  (2, 8),
  (2, 9),
  (3, 4),
  (3, 6),
  (3, 7),
  (3, 8),
  (3, 9),
  (4, 6),
  (4, 7),
  (4, 8),
  (4, 9),
  (6, 7),
  (6, 8),
  (6, 9),
  (7, 8),
  (7, 9),
  (8, 9)]}
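Update 2 or 3: is this the desired output?

The problem can be split into two parts. First, build a dictionary that maps each identifier to a list of addresses. Second, generate the length-2 combinations of each list.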

from collections import defaultdict
from itertools import combinations

# build mapping from file, skipping the header line
iden_mapping = defaultdict(list)
with open('input_file.txt') as f:
    next(f, None)  # header row
    for row in f:
        fields = row.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        iden, addr = fields
        iden_mapping[iden].append(addr)

# generate combinations from the address lists
for iden in sorted(iden_mapping):
    for c in combinations(iden_mapping[iden], 2):
        print(c)
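
Since the real file is said to hold millions of records, it may help to avoid keeping every id in memory at once. The sketch below is one possible variant, assuming rows with the same id are adjacent in the file (as in the sample data) and that each row holds an id followed by an addr, separated by whitespace. The function name pairs_by_id and the sample list are hypothetical. It also puts the id in the first column of each printed row.

```python
from itertools import combinations, groupby


def pairs_by_id(lines):
    """Yield (id, addr_a, addr_b) for each unordered addr pair sharing an id.

    Assumes rows with the same id are adjacent, as in the sample data.
    """
    rows = (line.split() for line in lines)
    rows = (r for r in rows if len(r) == 2)  # drop blank or malformed lines
    for iden, group in groupby(rows, key=lambda r: r[0]):
        addrs = [addr for _, addr in group]
        for a, b in combinations(addrs, 2):
            yield iden, a, b


# Hypothetical sample; a real run would pass an open file object instead.
sample = ["id addr", "301 1", "301 2", "301 3", "", "302 6", "302 7"]
for iden, a, b in pairs_by_id(sample[1:]):  # [1:] skips the header row
    print(iden, a, b)
```

With a real file, calling next(f) once on the open file object before passing it to pairs_by_id would skip the header in the same way. If ids are not grouped together in the file, the dictionary approach above is the safer choice.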

numpy has a function loadtxt, so you don't have to handle the parsing yourself. For a numpy solution, check it out, or just use pandas and work with a DataFrame.
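You can use your code to generate these combinations, but several lines in it are not valid Python. They look like pseudocode, so I turned them into comments.

From (2, 6) onward, there seem to be a few lines too many in your output.

Thanks a lot for your quick reply. I'll check it and get back to you.

Some output combinations are missing. For ids 301 and 302 we have 8 numbers (1, 2, 3, 4, 6, 7, 8, 9), and pairwise there should be 8C2 = 28 combinations. Please see my desired output list. Thanks.

@rubz I think you miscounted: for id 301 we have 4 numbers = 6 combinations, and for id 302, 5 numbers = 10 combinations. That is 16 combinations in total. Or are you thinking of something else?

Compare output (16 combinations) with output2 (28 combinations): one is the combinations per id (301, 302), the other is the combinations for id 301 + id 302 together.

What I want is output2. It would be very helpful if they were printed column-wise. Thanks.

Thanks a lot for your quick reply. However, if you consider 301, I am asking for the pairwise combinations of four numbers, so the total number of pair combinations should be 4C2 = 6. Please recheck my desired output.

Your output structure looks good! It would be great if the id could be shown in the first column.

Thanks for the quick solution. But it fails to split the row values, causing a traceback: line 12, in iden, addr = row.split() ValueError: not enough values to unpack (expected 2, got 0)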
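
The counts discussed in these comments can be checked directly. This is a small sketch using the sample values from the thread; math.comb needs Python 3.8 or later, and the variable names are illustrative. Note that addr 1 appears under both 301 and 302, so merging the two ids leaves 8 distinct values rather than 9.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+

addr_301 = [1, 2, 3, 4]
addr_302 = [6, 7, 8, 9, 1]

# Per-id pairs: 4C2 + 5C2
per_id = comb(len(addr_301), 2) + comb(len(addr_302), 2)

# Pairs over the merged, de-duplicated values of both ids: 8C2
merged = sorted(set(addr_301) | set(addr_302))
across = comb(len(merged), 2)

print(per_id)               # 16
print(len(merged), across)  # 8 28

# The formula agrees with actually enumerating the pairs.
assert across == len(list(combinations(merged, 2)))
```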
Code and output for the update (combinations across each pair of consecutive ids):

import pandas as pd
import io
import itertools
from collections import OrderedDict

file="""id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1
303 14
303 12"""

df= pd.read_csv(io.StringIO(file), sep=" ")

b = OrderedDict()

for id_,addr in df.values.tolist():
    b.setdefault(str(id_),[]).append((id_,addr))

pairs = [(list(b.keys())[i],list(b.keys())[i+1]) for i in range(len(list(b.keys()))-1)]

output = {}
for pair in pairs:
    output[pair] = [[(i[0][0],i[1][0]),i[0][1],i[1][1]] for i in list(itertools.combinations(b[pair[0]]+b[pair[1]], 2))]

output    

{('301', '302'): [[(301, 301), 1, 2],
  [(301, 301), 1, 3],
  [(301, 301), 1, 4],
  [(301, 302), 1, 6],
  [(301, 302), 1, 7],
  [(301, 302), 1, 8],
  [(301, 302), 1, 9],
  [(301, 302), 1, 1],
  [(301, 301), 2, 3],
  [(301, 301), 2, 4],
  [(301, 302), 2, 6],
  [(301, 302), 2, 7],
  [(301, 302), 2, 8],
  [(301, 302), 2, 9],
  [(301, 302), 2, 1],
  [(301, 301), 3, 4],
  [(301, 302), 3, 6],
  [(301, 302), 3, 7],
  [(301, 302), 3, 8],
  [(301, 302), 3, 9],
  [(301, 302), 3, 1],
  [(301, 302), 4, 6],
  [(301, 302), 4, 7],
  [(301, 302), 4, 8],
  [(301, 302), 4, 9],
  [(301, 302), 4, 1],
  [(302, 302), 6, 7],
  [(302, 302), 6, 8],
  [(302, 302), 6, 9],
  [(302, 302), 6, 1],
  [(302, 302), 7, 8],
  [(302, 302), 7, 9],
  [(302, 302), 7, 1],
  [(302, 302), 8, 9],
  [(302, 302), 8, 1],
  [(302, 302), 9, 1]],
 ('302', '303'): [[(302, 302), 6, 7],
  [(302, 302), 6, 8],
  [(302, 302), 6, 9],
  [(302, 302), 6, 1],
  [(302, 303), 6, 14],
  [(302, 303), 6, 12],
  [(302, 302), 7, 8],
  [(302, 302), 7, 9],
  [(302, 302), 7, 1],
  [(302, 303), 7, 14],
  [(302, 303), 7, 12],
  [(302, 302), 8, 9],
  [(302, 302), 8, 1],
  [(302, 303), 8, 14],
  [(302, 303), 8, 12],
  [(302, 302), 9, 1],
  [(302, 303), 9, 14],
  [(302, 303), 9, 12],
  [(302, 303), 1, 14],
  [(302, 303), 1, 12],
  [(303, 303), 14, 12]]}