How do I archive Hive tables in Unix?

unix hive

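Is there a way to find Hive external tables that were created more than 90 days ago and drop them together with the underlying HDFS data? Can this be done in a Unix script?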

Suppose the Hive table path is /path/your_hive_table_path/ and its listing looks like this:

hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/

drwxrwxrwx+  - h_mifi supergroup          0 2019-01-24 10:33 /path/your_hive_table_path//mifidw_car_insurance_expire_month_data
drwxrwxrwx+  - h_mifi supergroup          0 2019-01-24 10:39 /path/your_hive_table_path//mifidw_car_owner
drwxr-xr-x+  - h_mifi supergroup          0 2019-05-30 03:01 /path/your_hive_table_path//push_credit_card_mine_result_new
drwxr-xr-x+  - h_mifi supergroup          0 2019-05-30 03:41 /path/your_hive_table_path//push_live_payment_bill_mine_result_new

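We can get the last update date of each table directory as follows. Note that HDFS reports the modification time, which serves here as a proxy for the table's age: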

hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/ | awk -F'[ ]+' '{print $6}'
2019-01-24
2019-01-24
2019-05-30
2019-05-30

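We need a loop that checks whether each table is more than 90 days old and, if it is, performs the remove and drop operations. Below is the complete shell script. I have tested it and it works well; I hope it helps.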

hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/ | grep '/path/your_hive_table_path/' | while read -r line
do
   # Last update date of the table directory (field 6 of the ls output)
   date_str=$(echo "$line" | awk '{print $6}')
   # HDFS path of the table directory (field 8 of the ls output)
   table_path=$(echo "$line" | awk '{print $8}')
   # The table name is the last path component, whatever the path depth
   table_name=$(basename "$table_path")

   today_date_stamp=$(date +%s)
   # date -d requires GNU date
   table_date_stamp=$(date -d "$date_str" +%s)

   # Age of the table in whole days
   days_diff=$(( (today_date_stamp - table_date_stamp) / 86400 ))

   # If the table is more than 90 days old, remove the data and drop the table
   if [ "$days_diff" -gt 90 ]; then
      # Remove the HDFS data; -r is needed because the path is a directory
      hadoop --cluster your-hadoop-cluster fs -rm -r "$table_path"
      # Drop the Hive table
      hive -e "drop table $table_name"
   fi
done
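
Since the script deletes data irreversibly, it may be worth previewing what it would remove before running it. The following is a minimal dry-run sketch, not part of the original answer: the function name list_expired and its two arguments are hypothetical, it assumes GNU date (for -d) and the eight-column fs -ls layout shown above, and it only prints the matching paths without deleting anything.

```shell
# Dry-run sketch: read `fs -ls` output on stdin and print the paths whose
# date column is more than max_days days before "now" (epoch seconds).
# list_expired is a hypothetical helper name; GNU date is assumed.
list_expired() {
    max_days=$1
    # Optional second argument fixes "now"; it defaults to the current time
    now=${2:-$(date +%s)}
    while read -r perms repl owner group size day time path; do
        # Skip lines that are not directory entries, such as "Found 4 items"
        case $day in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;
        esac
        ts=$(date -d "$day" +%s) || continue
        age_days=$(( (now - ts) / 86400 ))
        if [ "$age_days" -gt "$max_days" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

It could be used as: hadoop --cluster your-hadoop-cluster fs -ls /path/your_hive_table_path/ | list_expired 90. Once the printed list looks right, the same paths can be handed to the remove and drop steps.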

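Of course, this can be done with a shell script: run hadoop fs -ls /path/your_hive_table_path/ to get the time the data was written, then run hadoop fs -rm to delete the data that is more than 90 days old.

I also need to drop the table. How do I do that?

Add code that executes the HiveQL drop table statement.

Can you help me with how to do that?

I'll give you some examples.

I think the best approach is to change the table property to internal and then drop the table. Is that a good idea?

Ah, that is another way to do it, but if you don't want to alter every table, you can run the script above. You can set up a crontab schedule for the script so that it checks for and removes expired tables every day. Please be careful with it. We use shell scripts to maintain our Hive data warehouse in our daily work, and it is convenient. If you don't mind, please give me an upvote, since I spent more than half an hour editing this answer on a workday morning. Thanks a lot! :)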
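
The alternative raised in the comments, converting a table from external to internal (managed) so that drop table also removes its data, could be sketched as follows. This is an illustration rather than something from the original answer: drop_as_managed_sql is a hypothetical helper, and it assumes a Hive version in which setting the EXTERNAL table property to FALSE turns an external table into a managed one.

```shell
# Hypothetical helper: print the HiveQL that converts an external table
# into a managed one and then drops it, so that Hive removes the data too.
drop_as_managed_sql() {
    table_name=$1
    printf "ALTER TABLE %s SET TBLPROPERTIES('EXTERNAL'='FALSE');\n" "$table_name"
    printf 'DROP TABLE %s;\n' "$table_name"
}
```

The generated statements could then be run with hive -e "$(drop_as_managed_sql mifidw_car_owner)". For the daily schedule mentioned above, a crontab line such as 0 2 * * * /path/to/your_cleanup_script.sh (the script path is a placeholder) would run the cleanup once a day at 02:00.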