Connecting to Cassandra from Pig


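I am trying to connect to Cassandra from Pig. Cassandra is installed on a different cluster, so I need to connect to it remotely from Pig.

I am following the link below.

I get an error like this: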

Failed to parse: Can not retrieve schema from loader org.apache.cassandra.hadoop.pig.CqlStorage@1216d9bf
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:484)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

My Pig script is as follows:

A = LOAD 'cql://userName:password/mykeyspace/mycolumnfamily'
USING org.apache.cassandra.hadoop.pig.CqlStorage()
AS (user_id:long, fname:chararray, last_update_date:chararray, lname:chararray);
DUMP A;


Please let me know where we have to provide the IP of the system on which Cassandra is installed.

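Here is what I found by searching the internet:

Querying Cassandra with Pig

Start the Pig client through DataStax Enterprise.

No setup is required other than starting the cluster in analytics mode.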

 (14:52:17)[~/BlogPosts/CassPig_Libraries]dse pig
 2013-08-26 14:52:27,166 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/russellspitzer/BlogPosts/CassPig_Libraries/pig_1377553947163.log
 2013-08-26 14:52:27,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: cfs://127.0.0.1/
 2013-08-26 14:52:27.488 java[64588:1503] Unable to load realm info from SCDynamicStore
 2013-08-26 14:52:28,348 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 127.0.0.1:8012
grunt>

Next we construct our Pig commands, starting with loading our data from Cassandra. We'll be using the cql:// URL and the CqlStorage() connector. The format of the command is basically load 'cql://keyspace/table'. More info on CQL3 and Pig.


grunt> libdata = load 'cql://libdata/libout' USING CqlStorage(); 
grunt> DESCRIBE libdata;

Set the following as environment variables (uppercase, underscored), or as Hadoop configuration variables (lowercase, dotted):

 * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
 * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
 * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

For example, for a local node with default settings, you could use:

 export PIG_INITIAL_ADDRESS=localhost
 export PIG_RPC_PORT=9160
 export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
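
For a Cassandra node on another machine, which is the situation in the question, the same variables would presumably point at that node instead of localhost. A minimal sketch, assuming the remote node is reachable at 192.0.2.10 (a placeholder documentation address, not a real host) and uses the default Thrift port and the Murmur3 partitioner:

```shell
# Hypothetical address of a node in the remote Cassandra cluster;
# replace 192.0.2.10 with the real IP or hostname.
export PIG_INITIAL_ADDRESS=192.0.2.10
# Thrift RPC port on that node; 9160 is the value used in the local example.
export PIG_RPC_PORT=9160
# Assumed partitioner; it has to match what the remote cluster is configured with.
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Print the target so it can be checked before starting Pig from this shell.
echo "Pig will contact Cassandra at ${PIG_INITIAL_ADDRESS}:${PIG_RPC_PORT}"
```

Export these in the same shell session that launches Pig, so the client inherits them.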

If you use different clusters for input and output, you can override these properties with:

 * PIG_INPUT_INITIAL_ADDRESS : initial address to connect to for reading
 * PIG_INPUT_RPC_PORT : the port thrift is listening on for reading
 * PIG_INPUT_PARTITIONER : cluster partitioner for reading
 * PIG_OUTPUT_INITIAL_ADDRESS : initial address to connect to for writing
 * PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
 * PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
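
As a rough illustration of the overrides listed above, reading from one cluster and writing to another might look like the following. Both addresses are made-up placeholders, and the ports and partitioners are assumed to be the defaults on both sides:

```shell
# Cluster to read from (hypothetical address).
export PIG_INPUT_INITIAL_ADDRESS=192.0.2.10
export PIG_INPUT_RPC_PORT=9160
export PIG_INPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Cluster to write to (hypothetical address).
export PIG_OUTPUT_INITIAL_ADDRESS=198.51.100.20
export PIG_OUTPUT_RPC_PORT=9160
export PIG_OUTPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Show both endpoints for a quick sanity check.
echo "read:  ${PIG_INPUT_INITIAL_ADDRESS}:${PIG_INPUT_RPC_PORT}"
echo "write: ${PIG_OUTPUT_INITIAL_ADDRESS}:${PIG_OUTPUT_RPC_PORT}"
```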

For further reference, see the URL below.

Hope this helps.