How to code the mathematical combination formula in Python

python python-2.7 python-3.x


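My dataset looks like this:

Input file:

I need Python code that prints all possible pairwise combinations of the addr values that share a common id. The main test file holds millions of records of ids and their corresponding addr values, so the code should be able to read the columns from a text file. The output should look like this (only 301 and 302 are shown; the rest continues in the same pattern):

This is what I have so far, but I don't know how to code the pairwise-combination part. I'm new to Python, so I would appreciate any help that comes with some explanation.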

# coding: utf-8
# sample tested in python 3.6

import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit(1)

fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# addr values collected for the current txid
s = []
txid_prev = None
with open(fn, "r") as fin:
    for line in fin:
        f = line.rstrip().split("\t")
        if len(f) < 2:
            continue  # skip blank or malformed lines
        txid, addr = f[0], f[1]
        if txid == txid_prev:
            s.append(addr)
        else:
            # connect all pairs in s
            # print all pairs as edges
            s = [addr]
        txid_prev = txid
if s:
    # connect and print all edges
    pass


How about this (if you don't mind duplicates):

Output:

     addr  addr_2
id               
301     1       1
301     1       2
301     1       3
301     1       4
301     2       1
301     2       2
301     2       3
301     2       4
301     3       1
...
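
The code for this answer did not survive on the page. As a rough sketch of one way to get rows like the ones above (every addr paired with every addr under the same id, self-pairs and both orderings included), `itertools.product` can be applied per id. The `rows` sample and the `pairs_with_repeats` name are assumptions made for illustration; the table layout above suggests the original answer used pandas instead.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample rows (id, addr); real data would come from the input file.
rows = [(301, 1), (301, 2), (301, 3), (301, 4),
        (302, 6), (302, 7), (302, 8), (302, 9), (302, 1)]


def pairs_with_repeats(rows):
    """Return (id, addr, addr_2) for every ordered pair of addrs sharing an id."""
    by_id = defaultdict(list)
    for id_, addr in rows:
        by_id[id_].append(addr)
    result = []
    for id_, addrs in by_id.items():
        # product(..., repeat=2) includes (x, x) and both (x, y) and (y, x)
        for first, second in product(addrs, repeat=2):
            result.append((id_, first, second))
    return result


table = pairs_with_repeats(rows)
for id_, first, second in table[:9]:
    print(id_, first, second)
```

Swapping `product(addrs, repeat=2)` for `itertools.combinations(addrs, 2)` would drop the self-pairs and the mirrored duplicates.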

How about something like this:

import pandas as pd
import io
import itertools

file="""id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1"""

df= pd.read_csv(io.StringIO(file), sep=" ")

for key,value in df.set_index("addr").groupby("id").groups.items():
    print(key)
    for item in list(itertools.combinations(value.values, 2)):
        print("{} {}".format(*item))

Prints:

Or we could put the values in a dictionary:

a = {} 

for id_,addr in df.values.tolist():
    a.setdefault(str(id_),[]).append(addr)

output = {key:list(itertools.combinations(value, 2)) for key,value in a.items()}


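def return_combos(dict_, keys):
    # collect the address lists of the requested ids from the dict passed in
    values = []
    for i in keys:
        values.append(dict_[i])
    # flatten, drop duplicates, and sort so the pair order is deterministic
    values = sorted(set(i for item in values for i in item))
    return {','.join(keys): list(itertools.combinations(values, 2))}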


output2 = return_combos(a, ["301","302"])

output prints:

{'301': [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)],
 '302': [(6, 7),
  (6, 8),
  (6, 9),
  (6, 1),
  (7, 8),
  (7, 9),
  (7, 1),
  (8, 9),
  (8, 1),
  (9, 1)]}

and output2 prints:

{'301,302': [(1, 2),
  (1, 3),
  (1, 4),
  (1, 6),
  (1, 7),
  (1, 8),
  (1, 9),
  (2, 3),
  (2, 4),
  (2, 6),
  (2, 7),
  (2, 8),
  (2, 9),
  (3, 4),
  (3, 6),
  (3, 7),
  (3, 8),
  (3, 9),
  (4, 6),
  (4, 7),
  (4, 8),
  (4, 9),
  (6, 7),
  (6, 8),
  (6, 9),
  (7, 8),
  (7, 9),
  (8, 9)]}
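
If a column layout is easier to read than the dictionary repr, the merged pairs can be formatted one per line. This is only a sketch: `addrs_by_id` and `merged_pair_lines` are hypothetical names, and the sample values simply mirror the data used above.

```python
from itertools import combinations

# Hypothetical address lists keyed by id, mirroring the sample data above.
addrs_by_id = {"301": [1, 2, 3, 4], "302": [6, 7, 8, 9, 1]}


def merged_pair_lines(addrs_by_id, keys):
    """Return tab-separated lines: joined ids, first addr, second addr."""
    # merge the lists of all requested ids, dropping repeated addrs
    merged = sorted({addr for key in keys for addr in addrs_by_id[key]})
    label = ",".join(keys)
    return ["{}\t{}\t{}".format(label, a, b) for a, b in combinations(merged, 2)]


lines = merged_pair_lines(addrs_by_id, ["301", "302"])
print("\n".join(lines))
```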

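Update 2 or 3: is this the desired output?

This problem can be split into two parts. First, build a dictionary that maps each identifier to a list of addresses. Second, generate the length-2 combinations of each list.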

from collections import defaultdict
from itertools import combinations

# read the file and build the mapping from id to list of addresses
iden_mapping = defaultdict(list)
with open('input_file.txt') as f:
    next(f, None)  # skip the header row
    for row in f:
        fields = row.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines instead of failing to unpack
        iden, addr = fields
        iden_mapping[iden].append(addr)

# generate combinations from the address lists
for iden in sorted(iden_mapping):
    for c in combinations(iden_mapping[iden], 2):
        print(c)
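
Since the real file has millions of records, a streaming variant may be worth considering: it keeps only one id's addresses in memory at a time and prints the id in the first column. This is a sketch under one loud assumption, namely that rows sharing an id are adjacent in the file (the question's own loop makes the same assumption). `iter_pairs` and `sample` are hypothetical names.

```python
from itertools import combinations, groupby


def iter_pairs(lines):
    """Yield (id, addr_a, addr_b) for each pair of addrs under one id.

    Assumes rows that share an id are adjacent in the input.
    """
    # keep only rows with exactly two fields; this drops blank lines
    rows = (fields for fields in (line.split() for line in lines)
            if len(fields) == 2)
    for iden, group in groupby(rows, key=lambda fields: fields[0]):
        addrs = [addr for _, addr in group]
        for a, b in combinations(addrs, 2):
            yield iden, a, b


# Hypothetical in-memory sample; with a real file, pass the open file object
# instead of this list.
sample = ["301 1", "301 2", "301 3", "", "302 6", "302 7"]
for iden, a, b in iter_pairs(sample):
    print(iden, a, b, sep="\t")
```

A header line such as `id addr` forms a group with a single row, so it produces no pairs and does not need special handling here.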

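numpy has a function, loadtxt, so you don't have to handle the file parsing yourself. For a numpy solution, check that out, or just use pandas and work with a DataFrame.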

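You can use your code to generate these combinations. Several lines in it are not valid Python code. They look like pseudocode, so I turned them into comments.

Starting from (2,6), your output seems to have a few lines too many.

Thank you very much for the quick reply. I'll check it and get back to you.

Some output combinations are missing. For ids 301 and 302 we have 8 numbers (1,2,3,4,6,7,8,9), and pairwise there should be 8C2 = 28 combinations. Please see my desired output list. Thanks.

@rubz I think you miscounted: for id 301 we have 4 numbers = 6 combinations; for id 302, 5 numbers = 10 combinations. That is 16 combinations in total. Or do you have something else in mind?

Compare output (16 combinations) with output2 (28 combinations): one holds the combinations for each id (301, 302) separately, the other holds the combinations for id 301 + id 302 taken together.

What I want is output2. It would be very helpful if the pairs were printed in columns. Thanks.

Thank you very much for the quick reply. However, if you consider 301, I am asking for the pairwise combinations of four numbers, so the total number of pairs should be 4C2 = 6. Please recheck my desired output.

Your output structure looks good! It would be great if the id could be shown in the first column.

Thanks for the quick solution. But it fails to split the row values, resulting in a traceback: line 12, iden, addr = row.split() ValueError: not enough values to unpack (expected 2, got 0)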
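
The counts argued over in the comments can be checked directly. The sketch below assumes the sample data from the answers (four addrs under 301, five under 302, with addr 1 shared by both) and uses `math.comb` for the nCr formula; the variable names are illustrative.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+

addrs_301 = [1, 2, 3, 4]
addrs_302 = [6, 7, 8, 9, 1]
merged = sorted(set(addrs_301) | set(addrs_302))  # 1 appears under both ids

per_id_total = comb(len(addrs_301), 2) + comb(len(addrs_302), 2)
merged_total = comb(len(merged), 2)

print(per_id_total)   # 4C2 + 5C2
print(merged_total)   # 8C2

# the formula agrees with actually enumerating the pairs
assert merged_total == len(list(combinations(merged, 2)))
```

So both counts are right for what they measure: 16 pairs when each id is handled separately, 28 when the addrs of both ids are merged first.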

Code and output for the update above, which combines each id with the id that follows it:

import pandas as pd
import io
import itertools
from collections import OrderedDict

file="""id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1
303 14
303 12"""

df= pd.read_csv(io.StringIO(file), sep=" ")

b = OrderedDict()

for id_,addr in df.values.tolist():
    b.setdefault(str(id_),[]).append((id_,addr))

# pair each id with the id that follows it
keys = list(b)
pairs = list(zip(keys, keys[1:]))

output = {}
for pair in pairs:
    merged = b[pair[0]] + b[pair[1]]
    output[pair] = [[(x[0], y[0]), x[1], y[1]] for x, y in itertools.combinations(merged, 2)]

output

{('301', '302'): [[(301, 301), 1, 2],
  [(301, 301), 1, 3],
  [(301, 301), 1, 4],
  [(301, 302), 1, 6],
  [(301, 302), 1, 7],
  [(301, 302), 1, 8],
  [(301, 302), 1, 9],
  [(301, 302), 1, 1],
  [(301, 301), 2, 3],
  [(301, 301), 2, 4],
  [(301, 302), 2, 6],
  [(301, 302), 2, 7],
  [(301, 302), 2, 8],
  [(301, 302), 2, 9],
  [(301, 302), 2, 1],
  [(301, 301), 3, 4],
  [(301, 302), 3, 6],
  [(301, 302), 3, 7],
  [(301, 302), 3, 8],
  [(301, 302), 3, 9],
  [(301, 302), 3, 1],
  [(301, 302), 4, 6],
  [(301, 302), 4, 7],
  [(301, 302), 4, 8],
  [(301, 302), 4, 9],
  [(301, 302), 4, 1],
  [(302, 302), 6, 7],
  [(302, 302), 6, 8],
  [(302, 302), 6, 9],
  [(302, 302), 6, 1],
  [(302, 302), 7, 8],
  [(302, 302), 7, 9],
  [(302, 302), 7, 1],
  [(302, 302), 8, 9],
  [(302, 302), 8, 1],
  [(302, 302), 9, 1]],
 ('302', '303'): [[(302, 302), 6, 7],
  [(302, 302), 6, 8],
  [(302, 302), 6, 9],
  [(302, 302), 6, 1],
  [(302, 303), 6, 14],
  [(302, 303), 6, 12],
  [(302, 302), 7, 8],
  [(302, 302), 7, 9],
  [(302, 302), 7, 1],
  [(302, 303), 7, 14],
  [(302, 303), 7, 12],
  [(302, 302), 8, 9],
  [(302, 302), 8, 1],
  [(302, 303), 8, 14],
  [(302, 303), 8, 12],
  [(302, 302), 9, 1],
  [(302, 303), 9, 14],
  [(302, 303), 9, 12],
  [(302, 303), 1, 14],
  [(302, 303), 1, 12],
  [(303, 303), 14, 12]]}
