Python pandas cross join with no common columns

How would you perform a full outer join and a cross join of two DataFrames with pandas when they have no columns in common?

In MySQL, you can simply do:

SELECT *
FROM table_1
[CROSS] JOIN table_2;
But in pandas, doing:

df_1.merge(df_2, how='outer')
gives an error:

MergeError: No common columns to perform merge on
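A minimal reproduction of this error, assuming two small hypothetical frames whose column names do not overlap. `pandas.errors.MergeError` is the exception class pandas raises here; newer versions append the merge options to the message, so the sketch only checks how the message starts.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical frames that share no column names
df_1 = pd.DataFrame({'a': [1, 2]})
df_2 = pd.DataFrame({'b': ['x', 'y', 'z']})

error_message = None
try:
    # With no on/left_on/right_on, merge uses the intersection of the
    # column names as the key, and that intersection is empty here
    df_1.merge(df_2, how='outer')
except MergeError as exc:
    error_message = str(exc)

print(error_message)
```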

The best solution I have so far is to use sqlite:

import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine('sqlite:///tmp.db')
# index=False stops pandas from writing an 'index' column into each table
df_1.to_sql('df_1', engine, index=False)
df_2.to_sql('df_2', engine, index=False)
# In SQLite, JOIN without an ON clause returns the cross product
df = pd.read_sql_query('SELECT * FROM df_1 JOIN df_2', engine)
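The same SQL route can be sketched without SQLAlchemy or a file on disk, assuming the standard-library `sqlite3` module and an in-memory database; pandas accepts a `sqlite3.Connection` directly in `to_sql` and `read_sql_query`. The example frames are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical frames with no columns in common
df_1 = pd.DataFrame({'fld1': ['x', 'y'], 'fld2': ['a', 'b1']})
df_2 = pd.DataFrame({'fld3': ['y', 'x', 'y'], 'fld4': ['a', 'b1', 'c2']})

# ':memory:' keeps the database in RAM, so no tmp.db file is created
conn = sqlite3.connect(':memory:')
df_1.to_sql('df_1', conn, index=False)
df_2.to_sql('df_2', conn, index=False)
# CROSS JOIN makes the intent explicit; it returns every pair of rows
df = pd.read_sql_query('SELECT * FROM df_1 CROSS JOIN df_2', conn)
conn.close()

print(df)
```

This still round-trips the data through SQLite, so it is mainly useful when the join logic is already written in SQL.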
Even in MySQL, you have to specify which fields you are joining on.

For example:

SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);
Pandas has the same concept:

Parameters: 
right : DataFrame
how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’
left: use only keys from left frame (SQL: left outer join)
right: use only keys from right frame (SQL: right outer join)
outer: use union of keys from both frames (SQL: full outer join)
inner: use intersection of keys from both frames (SQL: inner join)
on : label or list
Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
left_on : label or list, or array-like
Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
right_on : label or list, or array-like
Field names to join on in right DataFrame or vector/list of vectors per left_on docs
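A short sketch of how the parameters above map onto the SQL `LEFT JOIN` example, using hypothetical frames `t1` and `t2`. Unlike SQL's `SELECT *`, which returns both `t1.a` and `t2.a`, merging with `on='a'` keeps a single `a` column; when the key columns are named differently, `left_on` and `right_on` name each side.

```python
import pandas as pd

# Hypothetical tables standing in for t1 and t2 from the SQL example
t1 = pd.DataFrame({'a': [1, 2, 3], 'x': ['p', 'q', 'r']})
t2 = pd.DataFrame({'a': [1, 3], 'y': ['u', 'v']})

# SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);
left = pd.merge(t1, t2, how='left', on='a')

# Same join when the right-hand key column has a different name
t3 = t2.rename(columns={'a': 'a_key'})
left_named = pd.merge(t1, t3, how='left', left_on='a', right_on='a_key')

print(left)
print(left_named)
```

Rows of `t1` with no match (here `a == 2`) get `NaN` in the right-hand columns, as in a SQL left outer join.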

IIUC, you need a temporary column tmp in both DataFrames:

import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'],
                    'fld2': ['a', 'b1']})

df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'],
                    'fld4': ['a', 'b1', 'c2']})

print(df1)
  fld1 fld2
0    x    a
1    y   b1

print(df2)
  fld3 fld4
0    y    a
1    x   b1
2    y   c2

df1['tmp'] = 1
df2['tmp'] = 1

df = pd.merge(df1, df2, on=['tmp'])
df = df.drop('tmp', axis=1)
print(df)
  fld1 fld2 fld3 fld4
0    x    a    y    a
1    x    a    x   b1
2    x    a    y   c2
3    y   b1    y    a
4    y   b1    x   b1
5    y   b1    y   c2
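Since pandas 1.2, `merge` accepts `how='cross'`, which builds the Cartesian product directly without a helper column. For older versions, the temporary-key approach above can be written without mutating `df1` and `df2` by adding the key to copies via `assign`; the key name `_tmp_key` is a hypothetical choice and must not clash with an existing column.

```python
import pandas as pd

df1 = pd.DataFrame({'fld1': ['x', 'y'], 'fld2': ['a', 'b1']})
df2 = pd.DataFrame({'fld3': ['y', 'x', 'y'], 'fld4': ['a', 'b1', 'c2']})

# pandas >= 1.2: native cross join
cross = pd.merge(df1, df2, how='cross')

# Older pandas: merge on a constant key added to copies of both frames,
# so df1 and df2 themselves are left unchanged
key = '_tmp_key'  # hypothetical name; must not collide with existing columns
cross_legacy = (
    df1.assign(**{key: 1})
    .merge(df2.assign(**{key: 1}), on=key)
    .drop(columns=key)
)

print(cross)
```

Both variants return every combination of rows, so the result has `len(df1) * len(df2)` rows; this grows quickly for large inputs.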

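In MySQL, the ON clause is optional; without it, MySQL performs a cross join. I have edited the question to make this clearer.
It looks like a hack, but it works! And it is faster than sqlite. I was hoping there would be a cleaner way to do this... :(
This is needed, so in later versions, pandas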