Python clickhouse-driver: ValueError: Parameters are expected in dict form

I have some ETL that saves data to ClickHouse using clickhouse-driver.

The save function looks exactly like this:

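from typing import Dict, Iterable

from clickhouse_driver import Client

def insert_data(data: Iterable[Dict], table: str, client: Client = None):
    columns = get_table_cols(table)      # project helper: the table's column names
    client = client or get_ch_client(0)  # project helper: returns a Client
    query = f"insert into {table} ({', '.join(columns)}) values"
    # keep only the table's columns from each row dict
    data = map(lambda row: {key: row[key] for key in columns}, data)
    client.execute(query, data)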
The function that calls insert_data looks like this:

def save_data(data: DataFrame, client: Client):
    mapper = get_mapper(my_table_map)
    # rename the DataFrame's columns to the ClickHouse column names, row by row
    data = map(lambda x: {col_new: getattr(x, col_old)
                          for col_old, col_new in map_dataframe_to_ch.items()},
               data.collect())
    data = map(mapper, data)
    insert_data(data, 'my_table_name', client)
get_mapper returns a mapping function that looks like this:

def map_row(row: Dict[str, Any]) -> Dict[str, Any]:
    nonlocal map_  # map_row is nested inside get_mapper; map_ maps each column name to a converter
    return {key: map_[key](val) for key, val in row.items()}
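Because map_row uses nonlocal, it only compiles inside an enclosing function. Here is a minimal sketch of what get_mapper might look like, assuming map_ is a dict from column name to a one-argument converter. The body of get_mapper, the converters and the sample row are hypothetical, not the asker's code:

```python
from typing import Any, Callable, Dict

Converter = Callable[[Any], Any]


def get_mapper(map_: Dict[str, Converter]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical reconstruction: return a function that converts every value
    in a row with the converter registered for its column."""
    def map_row(row: Dict[str, Any]) -> Dict[str, Any]:
        # Reading map_ only needs the closure; `nonlocal` would be required
        # only if map_row rebound map_ itself.
        return {key: map_[key](val) for key, val in row.items()}
    return map_row


mapper = get_mapper({"country": str.upper, "platform": str})
print(mapper({"country": "cn", "platform": "web"}))  # {'country': 'CN', 'platform': 'web'}
```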
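So basically, in the end I have some nested generators that yield dicts. To confirm this, if I put print(next(data)) right before client.execute, I get the dict I expect. Here is an example with sensitive information hidden: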

{'account_currency': '***', 
 'instrument': '***',
 'operation': 'open',
 'event_time': datetime.datetime(2020, 7, 7, 19, 11, 49),
 'country': 'CN',
 'region': 'Asia and Pacific',
 'registration_source': '***',
 'account_type': '***',
 'platform': '***',
 'server_key': '***'}
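Strictly speaking, the pipeline is built from chained map calls. A stand-alone sketch with made-up rows shows that next() does return the expected dict, while the object itself is a map iterator rather than a generator. The column names and the upper-casing step are illustrative only:

```python
import types

rows = [{"country": "cn", "platform": "web", "extra": 1}]  # made-up sample rows
columns = ["country", "platform"]

# Same shape as the pipeline above: map() over map() over a list.
data = map(lambda r: {k: r[k] for k in columns}, rows)
data = map(lambda r: {k: v.upper() for k, v in r.items()}, data)

print(type(data))                             # <class 'map'>
print(isinstance(data, types.GeneratorType))  # False
print(next(data))                             # {'country': 'CN', 'platform': 'WEB'}
```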
The table schema looks like this:

"account_currency": "String",
"instrument": "String",
"operation": "String",
"event_time": "DateTime",
"country": "String",
"region": "String",
"registration_source": "String",
"account_type": "String",
"platform": "String",
"server_key": "String"
But for whatever reason, I get this error:

  File "src/etl/usd_volume/prepare_users.py", line 356, in <module>
    main()
  File "src/etl/usd_volume/prepare_users.py", line 348, in main
    save_data(data, client)
  File "src/etl/usd_volume/prepare_users.py", line 302, in save_data
    insert_data(data, 'report_traded_volume_usd', client)
  File "/root/data/src/common/clickhouse_helper.py", line 14, in insert_data
    client.execute(query, data)
  File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 224, in execute
    columnar=columnar
  File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 341, in process_ordinary_query
    query = self.substitute_params(query, params)
  File "/usr/local/lib/python3.6/dist-packages/clickhouse_driver/client.py", line 422, in substitute_params
    raise ValueError('Parameters are expected in dict form')
I really couldn't find where they check for generators. The clickhouse-driver version is 0.1.4.


Any help with this issue is much appreciated.

OK, further digging into the source code revealed the root cause.

The function that throws the error, substitute_params, is called in the process_ordinary_query method of the Client class. Basically, this method is called for every query except INSERT.

This part of the execute method decides whether the query is an INSERT or something else:

is_insert = isinstance(params, (list, tuple, types.GeneratorType))

if is_insert:
    rv = self.process_insert_query(
        query, params, external_tables=external_tables,
        query_id=query_id, types_check=types_check,
        columnar=columnar
    )
else:
    rv = self.process_ordinary_query(
        query, params=params, with_column_types=with_column_types,
        external_tables=external_tables,
        query_id=query_id, types_check=types_check,
        columnar=columnar
    )
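The dispatch above can be mirrored in a few lines to see which parameter types take the INSERT path. This is a sketch of the isinstance check exactly as quoted, not the library's own code; the helper name would_be_treated_as_insert is hypothetical:

```python
import types
from typing import Any


def would_be_treated_as_insert(params: Any) -> bool:
    # Mirrors the check quoted from Client.execute (clickhouse-driver 0.1.4).
    # Anything else goes to process_ordinary_query, whose substitute_params
    # raises "Parameters are expected in dict form" for non-dict params.
    return isinstance(params, (list, tuple, types.GeneratorType))


rows = [{"a": 1}, {"a": 2}]
for label, params in [
    ("list", rows),
    ("tuple", tuple(rows)),
    ("generator", (r for r in rows)),
    ("map", map(dict, rows)),
    ("dict", {"a": 1}),
]:
    print(label, would_be_treated_as_insert(params))
# list True, tuple True, generator True, map False, dict False
```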
The key is isinstance(params, (list, tuple, types.GeneratorType)).

types.GeneratorType is defined as follows:

def _g():
    yield 1
GeneratorType = type(_g())
This results in:

>>> GeneratorType
<class 'generator'>
So the simplest way to avoid this issue is to convert data into a generator with a generator expression. That solved the problem completely:

>>> data = (i for i in data)
>>> isinstance(data, GeneratorType)
True
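The same idea can be applied where the rows are built instead of wrapping afterwards: in insert_data, writing the column projection as a generator expression rather than map(lambda ...) makes the last stage a real generator, even when the upstream stages are still map objects. A sketch, with project_rows as a hypothetical helper name and made-up input rows:

```python
import types
from typing import Dict, Iterable, Iterator, List


def project_rows(data: Iterable[Dict], columns: List[str]) -> Iterator[Dict]:
    # Hypothetical helper: the same projection as the map(lambda ...) line in
    # insert_data, written as a generator expression so the result is a
    # types.GeneratorType and passes the isinstance check quoted above.
    return ({key: row[key] for key in columns} for row in data)


src = map(lambda r: r, [{"country": "CN", "platform": "web", "extra": 0}])
rows = project_rows(src, ["country", "platform"])
print(isinstance(rows, types.GeneratorType))  # True
print(list(rows))                             # [{'country': 'CN', 'platform': 'web'}]
```

Inside insert_data this would become client.execute(query, project_rows(data, columns)), assuming the 0.1.4 dispatch shown earlier.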
Alternatively, if you only ever run INSERT queries, you can call process_insert_query directly, so there is no need to convert data into a generator.


I think the type check in clickhouse-driver is a bit ambiguous, but that's what we have.

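Can you provide the schema of the CH table? Which version of clickhouse-driver are you using? In insert_data, is data a generator? Have you tried making data a list, i.e. client.execute(query, list(data))?

Yes, it's a generator. I won't turn it into a list, because it will contain millions of rows; it has to be a generator. Also, I don't think that's the problem, because according to the source code data should basically be any iterable, whatever its type.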
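On the concern that list(data) would hold millions of rows in memory: a middle ground is to insert in fixed-size list batches, since lists take the INSERT path in the check quoted above. A minimal sketch, assuming one execute call per batch (each call is a separate INSERT) is acceptable; batched is a hypothetical helper built on itertools.islice:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, so only one batch is in memory."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Usage would look like for batch in batched(data, 100_000): client.execute(query, batch), where the batch size is an arbitrary example value to tune against row width and available memory.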