Hive on Docker: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I recently started using Hive on Docker. I have two tables with the following structure:
0: jdbc:hive2://localhost:10000> describe users;
+-------------+------------+----------+
| col_name | data_type | comment |
+-------------+------------+----------+
| userid | int | |
| gender | string | |
| age | int | |
| occupation | int | |
| zipcode | string | |
+-------------+------------+----------+
0: jdbc:hive2://localhost:10000> describe users_2;
+-------------+------------+----------+
| col_name | data_type | comment |
+-------------+------------+----------+
| userid | int | |
| gender | string | |
| age | int | |
| occupation | string | |
| zipcode | string | |
+-------------+------------+----------+
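What I want to do is copy the contents of users into users_2, replacing each INT in the occupation column with its corresponding string. To do that, I wrote the following Python script: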
import sys
occupation_dict = {
    0: "other or not specified",
    1: "academic/educator",
    2: "artist",
    3: "clerical/admin",
    4: "college/grad student",
    5: "customer service",
    6: "doctor/health care",
    7: "executive/managerial",
    8: "farmer",
    9: "homemaker",
    10: "K-12 student",
    11: "lawyer",
    12: "programmer",
    13: "retired",
    14: "sales/marketing",
    15: "scientist",
    16: "self-employed",
    17: "technician/engineer",
    18: "tradesman/craftsman",
    19: "unemployed",
    20: "writer"
}
for line in sys.stdin:
    line = line.strip()
    userid, gender, age, occupation, zipcode = line.split('#')
    occupation_str = occupation_dict[occupation]
    print '#'.join([userid, gender, age, occupation_str, zipcode])
Then, inside the Docker container, I run the following commands:
0: jdbc:hive2://localhost:10000> add FILE /data/ml-1m/map_fun.py;
No rows affected (0.007 seconds)
0: jdbc:hive2://localhost:10000> INSERT OVERWRITE TABLE users_2
. . . . . . . . . . . . . . . .> SELECT
. . . . . . . . . . . . . . . .> TRANSFORM (userid, gender, age, occupation, zipcode)
. . . . . . . . . . . . . . . .> USING 'python map_fun.py'
. . . . . . . . . . . . . . . .> AS (userid, gender, age, occupation_str, zipcode)
. . . . . . . . . . . . . . . .> FROM users;
But I get the following error, and I can't get past it:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748) (state=08S01,code=2)
Sorry if this post is too long; I hope I have included everything you need.
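One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: return code 2 from MapRedTask is a generic failure, so the script itself is a plausible suspect. Unless a ROW FORMAT is given, Hive's TRANSFORM passes each row to the script as tab-separated text, so splitting on '#' would not yield five fields. The fields also arrive as strings, so looking up occupation_dict with a string key would raise a KeyError against int keys. The version below assumes tab-delimited input and output, converts the occupation code with int(), and avoids the Python 2 print statement so it runs under either Python version. The function name transform_line and the "unknown" fallback are illustrative choices, not part of the original script.

```python
import sys

# Same mapping as in the script above, keyed by int.
OCCUPATIONS = {
    0: "other or not specified", 1: "academic/educator", 2: "artist",
    3: "clerical/admin", 4: "college/grad student", 5: "customer service",
    6: "doctor/health care", 7: "executive/managerial", 8: "farmer",
    9: "homemaker", 10: "K-12 student", 11: "lawyer", 12: "programmer",
    13: "retired", 14: "sales/marketing", 15: "scientist",
    16: "self-employed", 17: "technician/engineer",
    18: "tradesman/craftsman", 19: "unemployed", 20: "writer",
}


def transform_line(line, sep="\t"):
    """Replace the numeric occupation code in one row with its label.

    Assumes five fields separated by `sep` (tab is what Hive's TRANSFORM
    uses when no ROW FORMAT is specified).
    """
    userid, gender, age, occupation, zipcode = line.rstrip("\n").split(sep)
    # The field arrives as text, so convert it before the dict lookup.
    # "unknown" is a hypothetical fallback for codes outside 0..20.
    occupation_str = OCCUPATIONS.get(int(occupation), "unknown")
    return sep.join([userid, gender, age, occupation_str, zipcode])


def main():
    for line in sys.stdin:
        if line.strip():  # skip blank lines
            sys.stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

A script like this can be tried outside Hive first by piping a tab-separated sample row into it (for example with printf and a pipe), which makes a Python traceback visible directly instead of hidden behind the MapRedTask error.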