Loading JSON data into Cassandra with dsbulk
I feel the dsbulk documentation is missing guidance on loading JSON files into Cassandra. Here is part of the JSON file I am trying to load:
[
{
"tags": [
"r"
],
"owner": {
"reputation": 23,
"user_id": 12235281,
"user_type": "registered",
"profile_image": "https://www.gravatar.com/avatar/60e28f52215bff12adb9758fc2cf86dd?s=128&d=identicon&r=PG&f=1",
"display_name": "Me28",
"link": "https://stackoverflow.com/users/12235281/me28"
},
"is_answered": false,
"view_count": 3,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053659,
"creation_date": 1589053659,
"question_id": 61702762,
"link": "https://stackoverflow.com/questions/61702762/merge-dataframes-in-r-with-different-size-and-condition",
"title": "Merge dataframes in R with different size and condition"
},
{
"tags": [
"python",
"location",
"pyautogui"
],
"owner": {
"reputation": 1,
"user_id": 13507535,
"user_type": "registered",
"profile_image": "https://lh3.googleusercontent.com/a-/AOh14GgtdM9KrbH3X5Z33RCtz6xm_TJUSQS_S31deNYUcA=k-s128",
"display_name": "lowhatex",
"link": "https://stackoverflow.com/users/13507535/lowhatex"
},
"is_answered": false,
"view_count": 2,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053657,
"creation_date": 1589053657,
"question_id": 61702761,
"link": "https://stackoverflow.com/questions/61702761/want-to-get-a-grip-of-this-pyautogui-command",
"title": "Want to get a grip of this pyautogui command"
}
]
This is how I tried to load the file:
dsbulk load -url ./data_so1.json -k stackoverflow_t -t staging_t -h '182.14.0.1' -header false -u username -p password
This is the closest I have got. It pushes the values into Cassandra line by line, like this:
data
-------------------------------------------------------------------------------------------------------------------------------
"title": "'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine giving exception on client"
"profile_image": "https://www.gravatar.com/avatar/05085ede54486bdaebefcf8363e081e2?s=128&d=identicon&r=PG&f=1",
"view_count": 422,
"question_id": 61702768,
"user_id": 12235281,
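This just ingests each line of the file as-is (including the trailing commas). I tried mapping with the -m option, but did not get anywhere.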
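How do I correctly load these values into their respective columns?

There are a few inconsistencies in your command line: 1) you need to specify the JSON connector with -c json; 2) -header false is a CSV option, so you can remove it; 3) given the structure of your file, you need to add --connector.json.mode SINGLE_DOCUMENT.

@adutra I looked through Brian's blog post series and did not find any JSON examples. Maybe we need one more post?

@adutra Thanks, that worked. Now that I have you here, do you know a good trick for getting at the nested values in the "owner" section in the dsbulk mapping?

You cannot map a nested JSON structure to individual columns. You need to map the whole "owner" section to a UDT (user-defined type) with the same structure.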
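
To make the two suggestions in the comments concrete, here is a minimal Python sketch. It only assembles strings; it does not run dsbulk or connect to Cassandra. The first function rebuilds the load command with the three fixes applied, using the file, keyspace, table and host values from the question. The second derives a CREATE TYPE statement from the sample "owner" object. The helper names, the UDT name `owner` and the JSON-to-CQL type table are assumptions made for illustration, not part of dsbulk, and the chosen CQL types should be checked against the real data.

```python
import shlex

# Assumed mapping from JSON value types (as parsed by Python) to CQL types.
# Illustration only: choose bigint, timestamp, etc. where the real data needs them.
CQL_TYPES = {bool: "boolean", int: "int", float: "double", str: "text"}

# The "owner" object of the first record in the question.
SAMPLE_OWNER = {
    "reputation": 23,
    "user_id": 12235281,
    "user_type": "registered",
    "profile_image": "https://www.gravatar.com/avatar/60e28f52215bff12adb9758fc2cf86dd?s=128&d=identicon&r=PG&f=1",
    "display_name": "Me28",
    "link": "https://stackoverflow.com/users/12235281/me28",
}


def build_dsbulk_load_command(url, keyspace, table, host, username, password):
    """Return the dsbulk load command line with the fixes from the comment."""
    args = [
        "dsbulk", "load",
        "-c", "json",              # fix 1: select the JSON connector
        "-url", url,
        "-k", keyspace,
        "-t", table,
        "-h", host,
        "-u", username,
        "-p", password,
        # fix 3: the file is a single top-level JSON array
        "--connector.json.mode", "SINGLE_DOCUMENT",
    ]
    # fix 2: "-header false" applies to CSV only, so it is left out
    return shlex.join(args)


def udt_ddl(keyspace, type_name, sample):
    """Build a CREATE TYPE statement whose fields mirror a sample JSON object."""
    fields = ",\n".join(
        f"  {name} {CQL_TYPES[type(value)]}" for name, value in sample.items()
    )
    return f"CREATE TYPE IF NOT EXISTS {keyspace}.{type_name} (\n{fields}\n);"


if __name__ == "__main__":
    print(build_dsbulk_load_command(
        "./data_so1.json", "stackoverflow_t", "staging_t",
        "182.14.0.1", "username", "password",
    ))
    print(udt_ddl("stackoverflow_t", "owner", SAMPLE_OWNER))
```

With a type like this in place, the target table would need a column of that type (for example `owner frozen<owner>`) so that the whole "owner" object can be loaded into it, as the last comment describes.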