Issues ingesting a Titan graph into Faunus

Tags: graph, titan, tinkerpop, faunus


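I have installed both Titan and Faunus, and each seems to work properly on its own (Titan-0.4.4 and Faunus-0.4.4).

However, after ingesting a sizable graph into Titan and trying to import it into Faunus via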

FaunusFactory.open(    )

I ran into problems. More precisely, the FaunusFactory.open() call does seem to return a Faunus graph.

However, even a simple query such as

g.v(10)

produces this error:

Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)

My properties file is taken straight from the Faunus page on Titan HBase input, except of course for the changed URL of the Hadoop cluster:

faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000

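Can anyone help?

ADDENDUM:

To address the comment below, here is some context: as I mentioned, I have a graph in Titan and can run basic Gremlin queries against it.

However, I need to run a global Gremlin query which, because of the size of the graph, requires Faunus and its underlying MapReduce capabilities. Hence the need to import the graph. As far as I can tell, the error I get does not point to an inconsistency in the graph itself.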

I'm not sure your Faunus "flow" is right. If your end goal is a global query over the graph, consider this approach:

1. Pull the graph into a sequence file

2. Issue the global query over the sequence file

More specifically, create hbase-seq.properties:

# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000

# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true

In the Faunus REPL, run:

g = FaunusFactory.open('hbase-seq.properties')
g._()

This reads the graph from HBase and writes it to a sequence file in HDFS. Next, create seq-noop.properties with the following contents:

# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0

# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true

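The configuration above reads the sequence file from the previous step without re-writing the graph (that is what NoOpOutputFormat is for). Now, in Faunus, run: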

g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()

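This computes a degree distribution and writes the results to the 'analysis' directory in HDFS. Obviously, you can run whatever Faunus-flavored Gremlin you like here; I just wanted to provide an example. I think this is a fairly standard "flow" or pattern for using Faunus from a graph analysis perspective.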
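To make the result of that query concrete, here is a small local sketch in Python (not Faunus code) of what a degree distribution is: count the edges touching each vertex, then count how many vertices share each degree. The toy edge list and the function name degree_distribution are made up for illustration, and the sketch ignores isolated vertices and self-loops; a real run goes through the Faunus job above and writes to HDFS.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree to the number of vertices that have that degree.

    `edges` is an iterable of (source, target) pairs. Direction is ignored,
    mirroring the idea of counting both incoming and outgoing edges.
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # the edge touches its source vertex
        degree[dst] += 1  # and its target vertex
    # Group vertices by degree and count the size of each group.
    return dict(Counter(degree.values()))


# Hypothetical toy graph: vertex 1 has three edges, vertices 2 and 3 have two,
# and vertex 4 has one.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
print(sorted(degree_distribution(edges).items()))
```

On this toy graph the output pairs each degree with a vertex count, which is the same shape of result the groupCount() step produces at scale.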

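Getting a FaunusGraph instance basically means that your configuration file was read properly; it does not mean that a job will actually execute. Could you amend your question with the Gremlin you are trying to execute? Also, what are you trying to achieve with: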

faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]

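OK, I amended the query: g.v(1000). I just typed it in to see whether I could get a vertex back. The actual query I intend to run is not that simple (it is a global query over paths in the graph), but first I need to ingest the graph correctly. I have explained above why I am interested in this. Why am I getting this error?

Your answer is useful for better understanding how to use Faunus. It is well presented and clear. Unfortunately, it does not solve my current problem: I used your hbase-seq.properties (changing only the IP), but it returns the same error. I am fairly sure the exception is related to the overall setup, though I do not know where: Hadoop works fine, and so do HBase, Titan, and Faunus (when each is used on its own). I will dig into com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager...

OK, I didn't think it would necessarily solve your problem, but I wanted to get the "flow" straight first. Reading sequence files is the simplest (and most common) thing you can do with Faunus. Since you did that and still get the same error, I agree it must be something in the setup. Is there anything more to the exception you provided? Perhaps a "caused by" exception deeper in the stack trace?

My colleague Varun has added the full stack trace here: (no answers so far). As Google shows, this exception has turned up in a few places, though in different contexts. I still do not know its cause, although it clearly has to do with Faunus trying to access HBase through Titan. The fact that Titan itself works fine makes me think its own communication with HBase is correct. The mystery continues...

@MircoMannucci I'm going to reopen that issue and continue the discussion in the comments on issue 714, since this doesn't seem like a great fit for SO's Q&A format.

@DanLaRocque Thanks! I have just asked Varun to look into it. Meanwhile, is there any news on Titan/Faunus integration with Hadoop 2? Our troubles started because we had to "downgrade" our environment to Hadoop 1 in order to use Faunus...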
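One of the comments above asks whether a "caused by" exception sits deeper in the stack trace. A minimal Python sketch for pulling that chain out of a saved task log might look like the following. The function name cause_chain and the sample log text are hypothetical; in particular, the "Caused by" line in the sample is a made-up placeholder, not output from the failing job.

```python
import re

# Java stack traces introduce each nested exception with "Caused by: ...".
CAUSE_LINE = re.compile(r"^\s*Caused by:\s*(.+?)\s*$")


def cause_chain(log_text):
    """Return the nested exceptions in the order they appear in the log."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSE_LINE.match(line)
        if match:
            chain.append(match.group(1))
    return chain


# Hypothetical sample: the first two lines follow the shape of the error in
# the question; the "Caused by" line is an invented placeholder.
sample_log = """\
com.thinkaurelius.titan.core.TitanException: Exception in Titan
    at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
Caused by: example.PlaceholderException: placeholder message
    at example.Placeholder.method(Placeholder.java:1)
"""

chain = cause_chain(sample_log)
print(chain[-1] if chain else "no nested cause found")
```

The last entry in the chain is usually the most specific one, so it is a reasonable first thing to search for.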
