Scala Hadoop: java.io.IOException: Pass a Delete or a Put

I am getting these error logs on the console:

java.io.IOException: Pass a Delete or a Put
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:125)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:586)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:156)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
15/01/06 14:13:34 INFO mapred.JobClient: Job complete: job_local259887539_0001
15/01/06 14:13:34 INFO mapred.JobClient: Counters: 19
15/01/06 14:13:34 INFO mapred.JobClient:   File Input Format Counters 
15/01/06 14:13:34 INFO mapred.JobClient:     Bytes Read=0
15/01/06 14:13:34 INFO mapred.JobClient:   FileSystemCounters
15/01/06 14:13:34 INFO mapred.JobClient:     FILE_BYTES_READ=12384691
15/01/06 14:13:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=12567287
15/01/06 14:13:34 INFO mapred.JobClient:   Map-Reduce Framework
15/01/06 14:13:34 INFO mapred.JobClient:     Reduce input groups=0
15/01/06 14:13:34 INFO mapred.JobClient:     Map output materialized bytes=8188
15/01/06 14:13:34 INFO mapred.JobClient:     Combine output records=0
15/01/06 14:13:34 INFO mapred.JobClient:     Map input records=285
15/01/06 14:13:34 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/01/06 14:13:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/01/06 14:13:34 INFO mapred.JobClient:     Reduce output records=0
15/01/06 14:13:34 INFO mapred.JobClient:     Spilled Records=285
15/01/06 14:13:34 INFO mapred.JobClient:     Map output bytes=7612
15/01/06 14:13:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=1029046272
15/01/06 14:13:34 INFO mapred.JobClient:     CPU time spent (ms)=0
15/01/06 14:13:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/01/06 14:13:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=77
15/01/06 14:13:34 INFO mapred.JobClient:     Map output records=285
15/01/06 14:13:34 INFO mapred.JobClient:     Combine input records=0
15/01/06 14:13:34 INFO mapred.JobClient:     Reduce input records=0
The errors appear when I try to perform a CopyTable using my Scala implementation.

Below is my code sample. Is there a better way to do this?

package com.example

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Get
import java.io.IOException
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io._
import org.apache.hadoop.hbase.mapreduce._
import org.apache.hadoop.io._
import org.apache.hadoop.mapreduce._
import scala.collection.JavaConversions._

case class HString(name: String) {
        lazy val bytes = name.getBytes
        override def toString = name
}
object HString {
        import scala.language.implicitConversions
        implicit def hstring2String(src: HString): String = src.name
        implicit def hstring2Bytes(src: HString): Array[Byte] = src.bytes
}

object Families {
        val stream = HString("stream")
        val identity = HString("identity")
}
object Qualifiers {
        val title = HString("title")
        val url = HString("url")
        val media = HString("media")
        val media_source = HString("media_source")
        val content = HString("content")
        val nolimitid_timestamp = HString("nolimitid.timestamp")
        val original_id = HString("original_id")
        val timestamp = HString("timestamp")
        val date_created = HString("date_created")
        val count = HString("count")
}
object Tables {
        val rawstream100 = HString("raw_stream_1.0.0")
        val rawstream = HString("rawstream")
}

class tmapper extends TableMapper[ImmutableBytesWritable, Put]{
  def map (row: ImmutableBytesWritable, value: Result, context: Context) {
    val put = new Put(row.get())
    for (kv <- value.raw()) {
        put.add(kv)
    }
    context.write(row, put)
  }
}

object Hello {
  val hbaseMaster = "127.0.0.1:60000"
  val hbaseZookeper = "127.0.0.1"
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.master", hbaseMaster)
    conf.set("hbase.zookeeper.quorum", hbaseZookeper)
    val hbaseAdmin = new HBaseAdmin(conf)

    val job = Job.getInstance(conf, "CopyTable")
    job.setJarByClass(classOf[Hello])
    job.setMapperClass(classOf[tmapper])
    job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setMapOutputValueClass(classOf[Result])
    //
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setOutputValueClass(classOf[Put])

    val scan = new Scan()
    scan.setCaching(500)         // 1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false)   // don't set to true for MR jobs

    TableMapReduceUtil.initTableMapperJob(
      Tables.rawstream100.bytes, // input HBase table name
      scan,                      // Scan instance to control CF and attribute selection
      classOf[tmapper],          // mapper class
      null,                      // mapper output key class
      null,                      // mapper output value class
      job
    )

    TableMapReduceUtil.initTableReducerJob(
      Tables.rawstream,          // table name
      null,                      // reducer class
      job
    )

    val b = job.waitForCompletion(true)
    if (!b) {
      throw new IOException("error with job!")
    }
  }
}

class Hello {}

If your task is just to copy the table (rather than implementing MapReduce over HBase in Scala), you can use the CopyTable class from the hbase-server package, like this:

import org.apache.hadoop.hbase.mapreduce.CopyTable
CopyTable.main(Array("--peer.adr=127.0.0.1:2181:/hbase", "--new.name=rawstream", "raw_stream_1.0.0"))

See the documentation for the other available parameters.
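
For example, here is a minimal sketch of a more selective copy. The family name and timestamps below are only placeholders, and the --families, --starttime and --endtime flags come from CopyTable's usage string, so check them against the HBase version you run:

import org.apache.hadoop.hbase.mapreduce.CopyTable

// Hypothetical example: copy only the "stream" column family, limited to a
// time window (milliseconds since the epoch), into the "rawstream" table.
CopyTable.main(Array(
  "--peer.adr=127.0.0.1:2181:/hbase",
  "--new.name=rawstream",
  "--families=stream",
  "--starttime=1420502400000",
  "--endtime=1420588800000",
  "raw_stream_1.0.0"
))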

If you don't need the commented-out code, please remove it from the question.

Ah, thanks for the suggestion. I have already tried it and it works like a charm on a standalone setup :) But I still need to know what is wrong with my code, and I also want to learn how to use MapReduce. Do you know where the problem is?