Java: Storing documents (.pdf, .doc and .txt files) in MapR-DB

java hbase nosql


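I need to store files such as .pdf, .doc and .txt in MapR-DB. I saw an example for HBase that stores files in binary form and retrieves them in Hue, but I'm not sure how to implement it. Do you know how to store documents in MapR-DB?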

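First of all, I don't know MapR-DB, since I work with Cloudera. But I do have experience storing many types of objects in HBase as byte arrays, as described below.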

The most primitive way to store data in HBase, or in any other database, is as a byte array.
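As a minimal sketch of that idea, assuming a document small enough to fit in memory, the file content itself can be read straight into a byte array with the standard library and written back out later. The class name, the helper names and the temporary files are only illustrative and are not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileBytesRoundTrip {

    /** Reads the whole file into memory; suitable for small documents only. */
    public static byte[] toBytes(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    /** Writes the bytes back out, recreating the original file content. */
    public static void toFile(byte[] content, Path target) throws IOException {
        Files.write(target, content);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a temp .txt file stands in for a real .pdf or .doc.
        Path source = Files.createTempFile("sample", ".txt");
        Files.write(source, "hello document".getBytes(StandardCharsets.UTF_8));

        // This byte[] is what would be stored as a cell value.
        byte[] content = toBytes(source);

        // Later, the same bytes read back from the database rebuild the file.
        Path copy = Files.createTempFile("copy", ".txt");
        toFile(content, copy);

        System.out.println(Arrays.equals(content, Files.readAllBytes(copy)));
    }
}
```

Because the bytes are the raw file content, the regenerated file is byte-for-byte identical to the original, whatever its type.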

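You can do this with the Apache Commons Lang API as shown below. It is probably the best option, and it works for any object that implements Serializable, which covers images, audio, video and so on once they are loaded as bytes.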

Please test this method with an object of one of your file types.

SerializationUtils.serialize returns a byte array, which you can then insert:

import java.io.Serializable;
import org.apache.commons.lang.SerializationUtils;

/**
 * Round-trips an object through SerializationUtils.
 */
public void testSerializeAndDeserialize() throws Exception {
    // Placeholder: replace with your own Serializable object,
    // for example the byte[] content of a .pdf, .doc or .txt file.
    Serializable original = "your object here";

    // Serialize here.
    byte[] bytes = SerializationUtils.serialize(original);

    // Deserialize the same bytes and check that you get the object back.
    Object restored = SerializationUtils.deserialize(bytes);
}

Note: the Apache Commons Lang jar is always available on a Hadoop cluster (it is not an external dependency).

Another example:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.commons.lang.SerializationUtils;

public class SerializationUtilsTrial {
  public static void main(String[] args) {
    try {
      // File to serialize object to
      String fileName = "testSerialization.ser";

      // New file output stream for the file
      FileOutputStream fos = new FileOutputStream(fileName);

      // Serialize String
      SerializationUtils.serialize("SERIALIZE THIS", fos);
      fos.close();

      // Open FileInputStream to the file
      FileInputStream fis = new FileInputStream(fileName);

      // Deserialize and cast into String
      String ser = (String) SerializationUtils.deserialize(fis);
      System.out.println(ser);
      fis.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

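If for any reason you don't want to use the SerializationUtils class provided by Apache Commons Lang, you can look at the PDF serialization and deserialization example below for a better understanding. It is lengthy code, though; with SerializationUtils the code is shorter.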
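To give a feel for the extra code involved, here is a rough sketch of the same round trip using only java.io. It is an illustration of standard Java serialization, not necessarily how SerializationUtils is implemented internally, and the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PlainJavaSerialization {

    /** Serializes any Serializable object into a byte array. */
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the object from bytes produced by serialize. */
    public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize("SERIALIZE THIS");
        String restored = (String) deserialize(bytes);
        System.out.println(restored);
    }
}
```

Note that Java serialization wraps the payload in its own stream format, so the resulting array is slightly larger than the raw content. For plain files, storing the raw file bytes is usually enough.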

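What you have above is a byte array, which you can use to prepare a put request and upload it to the database, i.e. HBase or any other database.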

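Once it is persisted, you can use an HBase get or scan to fetch your PDF bytes and regenerate the same file from them, in this case someFile.pdf, using the file-writing code further below.

Edit: since you asked for an HBase example, I'm adding this one. In the method below, yourcolumnasBytearray is your document file (for example a PDF), already converted to a byte array as in the examples above (using SerializationUtils.serialize).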

/**
 * Put (or insert) a row.
 * HBaseConnection, getTable, LOG and printstackTrace are helpers of the surrounding class and are not shown here.
 */
@Override
public void addRecord(final String tableName, final String rowKey, final String family, final String qualifier,
                final byte[] yourcolumnasBytearray) throws Exception {
    try {
        final HTableInterface table = HBaseConnection.getHTable(getTable(tableName));
        final Put put = new Put(Bytes.toBytes(rowKey));
        // The document bytes become the cell value under family:qualifier.
        // put.add is the older client API; newer HBase clients use put.addColumn instead.
        put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), yourcolumnasBytearray);
        table.put(put);
        LOG.info("INSERT record " + rowKey + " to table " + tableName + " OK.");
    } catch (final IOException e) {
        printstackTrace(e);
    }
}

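Thanks for your reply, I'll try to get this working on MapR-DB. Do you have an example of doing this on HBase?

Added one more PDF serialization/deserialization example, this one without SerializationUtils: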

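// bytes holds the document content read back from the database
File someFile = new File("someFile.pdf");
// try-with-resources closes the stream even if the write fails
try (FileOutputStream fos = new FileOutputStream(someFile)) {
    fos.write(bytes);
    fos.flush();
}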
