Python: How to shorten this DataFrame join code


Below is a template in the form of a list of tuples, which I want to instantiate using DataFrame joins:

rule = [('#1', 'X', 'Y'), ('#2', 'X', 'Z'), ('#3', 'Z', 'Y')]
I also have the instances of each component of the template, stored as a dictionary:

rComp_substitution =

{('#1', 'X', 'Y'):           pred  subj  obj
                   0  nationality  BART  USA, 
 
 ('#2', 'X', 'Z'):            pred  subj      obj
                   0  placeOfBirth  BART  NEWYORK
                   1     hasFather  BART   HOMMER, 
 
 ('#3', 'Z', 'Y'):           pred     subj  obj
                   0    locatedIn  NEWYORK  USA
                   1  nationality   HOMMER  USA }
The instance for each component is a DataFrame with three columns. For ('#1', 'X', 'Y'), #1 corresponds to pred, X corresponds to subj, and Y corresponds to obj.
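As a small illustration of this mapping, the positional correspondence between a component and the pred/subj/obj columns can be written as a rename. This is only a sketch: `label_columns` is a hypothetical helper name that does not appear in the original code, and it assumes the columns always come in the order pred, subj, obj.

```python
import pandas as pd

def label_columns(component, frame):
    """Return a copy of `frame` whose columns are named after `component`.

    Hypothetical helper: assumes the frame's columns are ordered pred, subj, obj.
    """
    return frame.rename(columns=dict(zip(["pred", "subj", "obj"], component)))

frame = pd.DataFrame({"pred": ["nationality"], "subj": ["BART"], "obj": ["USA"]})
labeled = label_columns(("#1", "X", "Y"), frame)
print(list(labeled.columns))  # ['#1', 'X', 'Y']
```

Because `rename` returns a new DataFrame, the original frame keeps its pred/subj/obj column names.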

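For example, to instantiate ('#1', 'X', 'Y'), ('#2', 'X', 'Z') first,

we can check which variable ('#1', 'X', 'Y') and ('#2', 'X', 'Z') have in common,

and then join the two DataFrames using that common variable X (subj) as the key, which gives the instances of ('#1', 'X', 'Y'), ('#2', 'X', 'Z').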
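The common-variable check described here could be sketched as a comparison of the variable slots of the two components. `common_variables` is a hypothetical name used only for illustration, and the sketch assumes the first element of each tuple is always the predicate label.

```python
def common_variables(current, nxt):
    """Return the variables shared by two rule components.

    Hypothetical helper: position 0 is assumed to be the predicate label,
    so only positions 1 and 2 are compared.
    """
    return [var for var in current[1:] if var in nxt[1:]]

rule = [('#1', 'X', 'Y'), ('#2', 'X', 'Z'), ('#3', 'Z', 'Y')]
print(common_variables(rule[0], rule[1]))  # ['X']
print(common_variables(rule[1], rule[2]))  # ['Z']
```

An empty list means the two components share no variable, a case the if/elif chain in the code would leave unhandled.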

Here is my code:

import pandas as pd

depth = 0
# Step 1: find the variable shared by the current and the next component
current_subj = rule[depth][1]    # 'X'
current_obj = rule[depth][2]     # 'Y'
next_subj = rule[depth+1][1]     # 'X'
next_obj = rule[depth+1][2]      # 'Z'
if current_subj == next_subj or current_subj == next_obj:
    comVar = current_subj
elif current_obj == next_subj or current_obj == next_obj:
    comVar = current_obj

# Step 2: rebuild current_rComp with the common variable as a key column for joining
current_rComp = rComp_substitution[rule[depth]]
unified_rComp = []
for col in current_rComp.itertuples(index=False):
    if comVar == current_subj:
        unified_rComp.append([col.subj, [list(col)]])
    elif comVar == current_obj:
        unified_rComp.append([col.obj, [list(col)]])
current_rComp = pd.DataFrame(unified_rComp, columns=['comVar', 'triples'])

# Step 3: rebuild next_rComp with the common variable as a key column for joining
next_rComp = rComp_substitution[rule[depth+1]]
unified_rComp = []
for col in next_rComp.itertuples(index=False):
    if comVar == next_subj:
        unified_rComp.append([col.subj, [list(col)]])
    elif comVar == next_obj:
        unified_rComp.append([col.obj, [list(col)]])
next_rComp = pd.DataFrame(unified_rComp, columns=['comVar', 'triples'])

# Step 4: join current_rComp and next_rComp on the common variable
partial_proof_path = pd.merge(current_rComp, next_rComp, how='inner', on='comVar')
print(partial_proof_path)
This code outputs:

  comVar                   triples_x                        triples_y
0   BART  [[nationality, BART, USA]]  [[placeOfBirth, BART, NEWYORK]]
1   BART  [[nationality, BART, USA]]      [[hasFather, BART, HOMMER]]
I think this code is too long. Is there a way to do the same thing with simpler code?

Input data:

import pandas as pd

rComp_substitution = {
    ('#1', 'X', 'Y'): pd.DataFrame({'pred': ['nationality'], 'subj': ['BART'], 'obj': ['USA']}),
    ('#2', 'X', 'Z'): pd.DataFrame({'pred': ['placeOfBirth', 'hasFather'], 'subj': ['BART', 'BART'], 'obj': ['NEWYORK', 'HOMMER']}),
    ('#3', 'Z', 'Y'): pd.DataFrame({'pred': ['locatedIn', 'nationality'], 'subj': ['NEWYORK', 'HOMMER'], 'obj': ['USA', 'USA']}),
}
rules = list(rComp_substitution.keys())
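Main function:

def merge_from_common_key(rule0, rule1):
    # Load the dataframes (copied so the originals keep their column names)
    df0 = rComp_substitution[rule0].copy()
    df1 = rComp_substitution[rule1].copy()
    # Rename ["pred", "subj", "obj"] to the labels of each rule
    df0.columns = list(rule0)
    df1.columns = list(rule1)
    # Find the common key and merge the two dataframes
    key = df0.columns.intersection(df1.columns).tolist()
    df = pd.merge(df0, df1, on=key)
    # Build the new dataframe
    return pd.DataFrame({"common": df[key[0]].values.tolist(),
                         "left": df[list(rule0)].values.tolist(),
                         "right": df[list(rule1)].values.tolist()})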
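Usage:

>>> merge_from_common_key(rules[0], rules[1])
  common                      left                          right
0   BART  [nationality, BART, USA]  [placeOfBirth, BART, NEWYORK]
1   BART  [nationality, BART, USA]      [hasFather, BART, HOMMER]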
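The same renaming idea might also extend to the whole template at once. The following is a rough sketch, not part of the answer above: it assumes that an inner join on every shared variable is the intended semantics, and the helper name `labeled` is hypothetical. When `pd.merge` is called without `on`, it joins on all column names the two frames have in common.

```python
from functools import reduce

import pandas as pd

rComp_substitution = {
    ('#1', 'X', 'Y'): pd.DataFrame({'pred': ['nationality'], 'subj': ['BART'], 'obj': ['USA']}),
    ('#2', 'X', 'Z'): pd.DataFrame({'pred': ['placeOfBirth', 'hasFather'],
                                    'subj': ['BART', 'BART'],
                                    'obj': ['NEWYORK', 'HOMMER']}),
    ('#3', 'Z', 'Y'): pd.DataFrame({'pred': ['locatedIn', 'nationality'],
                                    'subj': ['NEWYORK', 'HOMMER'],
                                    'obj': ['USA', 'USA']}),
}
rules = list(rComp_substitution)

def labeled(component):
    """Return a copy of the component's frame with columns named after the component."""
    frame = rComp_substitution[component]
    return frame.rename(columns=dict(zip(frame.columns, component)))

# Fold the components together; each merge is an inner join on the
# variables (column names) that the two frames share.
full = reduce(pd.merge, (labeled(component) for component in rules))
print(full.shape)  # (2, 6)
```

Each row of `full` is one complete instantiation of the template, with one column per predicate label and per variable.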