After the data computation finishes, some of the results need to be imported into ES. The most common and simplest approach is a Hive-to-Elasticsearch mapped table; see "Associating a Hive Table with ElasticSearch". With ElasticSearch 7.8.0 on CDH 5.9, writing data from Hive into ElasticSearch works fine, but after switching to CDH 6.3 the following error appears:

2020-11-04 19:10:19,363 INFO [IPC Server handler 6 on 36328] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1604387526070_2011_m_000000_0: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:89)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:585)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ProtocolSocketFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 25 more

Quite puzzling. At first glance this looks like httpclient's fault, so I added httpclient-4.5.12.jar to Hive, but the error remained. Only after digging further did I realize the jar needed here is commons-httpclient-3.1.jar: the missing class lives in the package org.apache.commons.httpclient (the legacy Commons HttpClient 3.x), not org.apache.http (HttpComponents HttpClient 4.x), which is exactly what the stack trace shows. After adding commons-httpclient-3.1.jar, the job ran successfully. There are several ways to add the jars:

1. Add the jars in the hive shell

This approach is session-level: it only affects the current session, and no restart of the hiveserver2 service is needed.

## local jars
add jar /opt/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar;
add jar /opt/expansion/jars/commons-httpclient-3.1.jar;
## jars on HDFS
add jar hdfs://nameservice1/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar;
add jar hdfs://nameservice1/expansion/jars/commons-httpclient-3.1.jar;
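
With both jars added for the session, the write goes through the mapped table as usual. A minimal sketch, assuming a hypothetical target index `user_demo` on `es-host:9200` and a hypothetical source table `src_users` (all names and hosts are placeholders; the storage-handler class and the `es.nodes`/`es.resource` properties come from elasticsearch-hadoop):

```sql
-- Mapped table: rows written here are indexed into Elasticsearch.
CREATE EXTERNAL TABLE es_user_demo (
  id   BIGINT,
  name STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'    = 'es-host:9200',   -- placeholder ES node
  'es.resource' = 'user_demo'       -- placeholder target index
);

-- Writing into the mapped table triggers the EsHiveRecordWriter from the stack trace above.
INSERT OVERWRITE TABLE es_user_demo
SELECT id, name FROM src_users;
```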

2. Add the jars when starting the hive shell

This approach is also session-level; multiple jars are comma-separated.

hive --hiveconf hive.aux.jars.path=/opt/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar,/opt/expansion/jars/commons-httpclient-3.1.jar

3. Add the jars in hive-site.xml

This approach is service-level: it takes effect in both the hive shell and hiveserver2, and requires restarting the hiveserver2 service. Edit hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>/opt/hive-auxlib/elasticsearch-hadoop-hive-7.8.0.jar</value>
</property>

The third approach is usually the recommended one: place the jars (including commons-httpclient-3.1.jar) under hive.aux.jars.path. commons-httpclient-3.1.jar typically ships with CDH, at:

/opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar