How to code the mathematical combination formula in Python
Tags: python, python-2.7, python-3.x

My dataset looks like this. Input file:

I need Python code that prints every possible pairwise combination of the addr values that share a common id. The main test file holds millions of id records with their corresponding addr values, so the code must be able to read the columns from a text file. The output should look like this (only 301 and 302 are shown; the rest continues the same pattern):

This is what I have done so far, but I don't know how to code the pairwise-combination part. I am new to Python, so I would appreciate it if someone could help me with some explanation.
# coding: utf-8
# sample tested in python 3.6
import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit(1)

fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# addr values collected for the current id
s = []
txid_prev = None

with open(fn, "r") as fin:
    for line in fin:
        f = line.rstrip().split("\t")
        if len(f) < 2:
            continue  # skip blank or malformed lines
        txid, addr = f[0], f[1]
        if txid == txid_prev:
            s.append(addr)
        else:
            # connect all pairs in s
            # print all pairs as edges
            s = [addr]
            txid_prev = txid

if s:
    # connect and print all edges
    pass
How about this (if you don't mind duplicates):
Output:
addr addr_2
id
301 1 1
301 1 2
301 1 3
301 1 4
301 2 1
301 2 2
301 2 3
301 2 4
301 3 1
...
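The output above is consistent with pairing every addr with every addr under the same id (a Cartesian product, so self-pairs and both orderings appear). The following is a minimal standard-library sketch of that idea, not the answerer's original code; `rows` and `pairs_with_repeats` are illustrative names, and the sample rows are assumed from the data used elsewhere in this thread.

```python
from itertools import product

# Hypothetical sample rows (id, addr), mirroring the data used in this thread.
rows = [(301, 1), (301, 2), (301, 3), (301, 4),
        (302, 6), (302, 7), (302, 8), (302, 9), (302, 1)]


def pairs_with_repeats(rows):
    """Return (id, addr, addr_2) for every ordered pair of addrs sharing an id."""
    by_id = {}
    for id_, addr in rows:
        by_id.setdefault(id_, []).append(addr)
    result = []
    for id_, addrs in by_id.items():
        # product(..., repeat=2) keeps self-pairs such as (1, 1) and both orders
        for a, b in product(addrs, repeat=2):
            result.append((id_, a, b))
    return result


table = pairs_with_repeats(rows)
for id_, a, b in table[:9]:
    print(id_, a, b)
```

Because every ordering and every self-pair is kept, an id with n addr values yields n*n rows, which is why this variant only suits the case where duplicates are acceptable.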
How about something like this:
import io
import itertools

import pandas as pd

file = """id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1"""

df = pd.read_csv(io.StringIO(file), sep=" ")

# groups maps each id to the index labels (here: the addr values) of its rows
for key, value in df.set_index("addr").groupby("id").groups.items():
    print(key)
    for item in itertools.combinations(value.values, 2):
        print("{} {}".format(*item))
Prints:
301
1 2
1 3
1 4
2 3
2 4
3 4
302
6 7
6 8
6 9
6 1
7 8
7 9
7 1
8 9
8 1
9 1
Alternatively, we can put the values into a dictionary:
a = {}
for id_, addr in df.values.tolist():
    a.setdefault(str(id_), []).append(addr)

# pairwise combinations per id
output = {key: list(itertools.combinations(value, 2)) for key, value in a.items()}

def return_combos(dict_, keys):
    # merge the addr values of the given ids, dropping duplicates
    values = []
    for i in keys:
        values.append(dict_[i])
    values = sorted(set(i for item in values for i in item))
    return {','.join(keys): list(itertools.combinations(values, 2))}

output2 = return_combos(a, ["301", "302"])
output prints:
{'301': [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)],
'302': [(6, 7),
(6, 8),
(6, 9),
(6, 1),
(7, 8),
(7, 9),
(7, 1),
(8, 9),
(8, 1),
(9, 1)]}
while output2 prints:
{'301,302': [(1, 2),
(1, 3),
(1, 4),
(1, 6),
(1, 7),
(1, 8),
(1, 9),
(2, 3),
(2, 4),
(2, 6),
(2, 7),
(2, 8),
(2, 9),
(3, 4),
(3, 6),
(3, 7),
(3, 8),
(3, 9),
(4, 6),
(4, 7),
(4, 8),
(4, 9),
(6, 7),
(6, 8),
(6, 9),
(7, 8),
(7, 9),
(8, 9)]}
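If the merged pairs are wanted one per line in columns rather than as a dictionary, something along these lines might do. This is only a sketch: `format_pairs` is a hypothetical helper, the mapping is retyped by hand so the block runs on its own, and the tab-separated layout with the id label in the first column is an assumption about the desired format.

```python
from itertools import combinations

# Hypothetical mapping of id -> addr list, same shape as the dictionary `a`.
a = {"301": [1, 2, 3, 4], "302": [6, 7, 8, 9, 1]}


def format_pairs(mapping, keys):
    """Return 'ids<TAB>addr<TAB>addr' lines for pairs from the merged addr values."""
    # A set drops the addr shared by both ids; sorting makes the order stable.
    values = sorted({addr for key in keys for addr in mapping[key]})
    label = ",".join(keys)
    return ["{}\t{}\t{}".format(label, x, y) for x, y in combinations(values, 2)]


lines = format_pairs(a, ["301", "302"])
print("\n".join(lines))
```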
Update 2 or 3: is this the desired output?
The problem can be split into two parts. First, build a dictionary that maps each identifier to its list of addresses. Second, generate the length-2 combinations of each list.
from collections import defaultdict
from itertools import combinations

# get lines from your file
with open('input_file.txt') as f:
    lines = f.readlines()

# build mapping from file
iden_mapping = defaultdict(list)
for row in lines[1:]:
    fields = row.split()
    if len(fields) != 2:
        continue  # skip blank or malformed lines
    iden, addr = fields
    iden_mapping[iden].append(addr)

# generate combination from address lists
for iden in sorted(iden_mapping):
    for c in combinations(iden_mapping[iden], 2):
        print(c)
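Since the question mentions millions of records, reading the whole file into a dictionary may be more memory than necessary. A streaming variant could look like the sketch below. It assumes that rows with the same id are adjacent in the file (as in the sample data) and that the first line is a header; `iter_pairs` and the in-memory `sample` are illustrative stand-ins, not part of the original answer.

```python
import io
from itertools import combinations, groupby


def iter_pairs(lines):
    """Yield (id, addr1, addr2) for each run of adjacent rows sharing an id."""
    rows = (line.split() for line in lines)
    rows = (row for row in rows if len(row) == 2)  # drop blank/malformed lines
    for iden, group in groupby(rows, key=lambda row: row[0]):
        addrs = [addr for _, addr in group]
        for a, b in combinations(addrs, 2):
            yield iden, a, b


# Hypothetical stand-in for the real file; with a file, use open(path) instead.
sample = io.StringIO("id addr\n301 1\n301 2\n301 3\n\n302 6\n302 7\n")
next(sample)  # skip the header line
result = list(iter_pairs(sample))
for iden, a, b in result:
    print(iden, a, b, sep="\t")
```

Only one id's addresses are held in memory at a time, and the id is printed in the first column. If the same id can reappear later in the file, the rows would need to be sorted by id first, or the dictionary approach used instead.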
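numpy has a loadtxt function, so you don't have to handle the file parsing yourself. For a numpy solution look into that, or just use pandas and work with a DataFrame.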
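Comments:

You can generate these combinations with your code, in which several lines are not valid Python. They look like pseudocode, so I turned them into comments.

Starting from (2,6), there seem to be a few lines too many in your output.

Thank you very much for the quick reply. I will check it and get back to you.

Some output combinations are missing. For ids 301 and 302 we have 8 numbers (1,2,3,4,6,7,8,9), and pairwise there should be 8C2=28 combinations. Please see my desired output list. Thanks.

@rubz I think you miscounted: for id 301 we have 4 numbers = 6 combinations, and for id 302, 5 numbers = 10 combinations. That is 16 combinations in total. Or are you thinking of something else?

Compare output (16 combinations) with output2 (28 combinations): one is the combinations per id (301, 302), the other is the combinations for id 301 + id 302.

What I want is output2. It would be very helpful if they were printed in columns. Thanks.

Thank you very much for the quick reply. However, if you consider 301, I am asking for the pairwise combinations of four numbers, so the total number of pairs should be 4C2=6. Please recheck my desired output.

Your output structure looks good! It would be great if the id number were shown in the first column.

Thanks for the quick solution. But it fails to split the row values, which causes a traceback: line 12, in iden, addr = row.split() ValueError: not enough values to unpack (expected 2, got 0)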
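The counts argued over in the comments can be checked directly, since the number of pairs from n items is C(n, 2) = n(n-1)/2. This is a small sketch using `math.comb`, which requires Python 3.8 or later; the two lists are retyped from the sample data.

```python
from math import comb  # available in Python 3.8+

addrs_301 = [1, 2, 3, 4]
addrs_302 = [6, 7, 8, 9, 1]

# Pairs counted separately per id: C(4, 2) + C(5, 2)
per_id_total = comb(len(addrs_301), 2) + comb(len(addrs_302), 2)

# Pairs over the merged, de-duplicated addr values of both ids: C(8, 2)
merged = set(addrs_301) | set(addrs_302)
merged_total = comb(len(merged), 2)

print(comb(4, 2), comb(5, 2), per_id_total, merged_total)  # 6 10 16 28
```

Both figures are therefore right for what they count: 16 is the per-id total, and 28 is the total once the two ids are merged and the shared addr 1 is counted once.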
Code and output for the update (pairs across consecutive ids):
import io
import itertools
from collections import OrderedDict

import pandas as pd

file = """id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1
303 14
303 12"""

df = pd.read_csv(io.StringIO(file), sep=" ")

# map each id to a list of (id, addr) tuples, keeping file order
b = OrderedDict()
for id_, addr in df.values.tolist():
    b.setdefault(str(id_), []).append((id_, addr))

# consecutive id pairs: ('301', '302'), ('302', '303'), ...
keys = list(b.keys())
pairs = list(zip(keys, keys[1:]))

output = {}
for pair in pairs:
    merged = b[pair[0]] + b[pair[1]]
    output[pair] = [[(x[0], y[0]), x[1], y[1]]
                    for x, y in itertools.combinations(merged, 2)]
output
{('301', '302'): [[(301, 301), 1, 2],
[(301, 301), 1, 3],
[(301, 301), 1, 4],
[(301, 302), 1, 6],
[(301, 302), 1, 7],
[(301, 302), 1, 8],
[(301, 302), 1, 9],
[(301, 302), 1, 1],
[(301, 301), 2, 3],
[(301, 301), 2, 4],
[(301, 302), 2, 6],
[(301, 302), 2, 7],
[(301, 302), 2, 8],
[(301, 302), 2, 9],
[(301, 302), 2, 1],
[(301, 301), 3, 4],
[(301, 302), 3, 6],
[(301, 302), 3, 7],
[(301, 302), 3, 8],
[(301, 302), 3, 9],
[(301, 302), 3, 1],
[(301, 302), 4, 6],
[(301, 302), 4, 7],
[(301, 302), 4, 8],
[(301, 302), 4, 9],
[(301, 302), 4, 1],
[(302, 302), 6, 7],
[(302, 302), 6, 8],
[(302, 302), 6, 9],
[(302, 302), 6, 1],
[(302, 302), 7, 8],
[(302, 302), 7, 9],
[(302, 302), 7, 1],
[(302, 302), 8, 9],
[(302, 302), 8, 1],
[(302, 302), 9, 1]],
('302', '303'): [[(302, 302), 6, 7],
[(302, 302), 6, 8],
[(302, 302), 6, 9],
[(302, 302), 6, 1],
[(302, 303), 6, 14],
[(302, 303), 6, 12],
[(302, 302), 7, 8],
[(302, 302), 7, 9],
[(302, 302), 7, 1],
[(302, 303), 7, 14],
[(302, 303), 7, 12],
[(302, 302), 8, 9],
[(302, 302), 8, 1],
[(302, 303), 8, 14],
[(302, 303), 8, 12],
[(302, 302), 9, 1],
[(302, 303), 9, 14],
[(302, 303), 9, 12],
[(302, 303), 1, 14],
[(302, 303), 1, 12],
[(303, 303), 14, 12]]}