After the data pipeline finishes, some of the results need to be loaded into Elasticsearch. The most common and simplest way to do this is a Hive-Elasticsearch mapped table; see the earlier post 将Hive表与ElasticSearch关联. With Elasticsearch 7.8.0 and CDH 5.9 this works fine and data can be written from Hive into Elasticsearch, but after switching to CDH 6.3 the job fails with the following error:
```
2020-11-04 19:10:19,363 INFO [IPC Server handler 6 on 36328] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1604387526070_2011_m_000000_0: Error: java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/httpclient/protocol/ProtocolSocketFactory
	at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
	at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
	at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
	at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
	at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
	at org.elasticsearch.hadoop.rest.InitializationUtils.discoverNodesIfNeeded(InitializationUtils.java:89)
	at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:585)
	at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
	at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
	... 8 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.protocol.ProtocolSocketFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 25 more
```
This was puzzling. At first glance it looks like an httpclient problem, so I added httpclient-4.5.12.jar to Hive, but the error persisted. Only after some digging did I find that the jar required here is the old commons-httpclient-3.1.jar; once that was added, the job ran successfully.
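For context, the write that triggers the error is an INSERT into a Hive external table backed by the elasticsearch-hadoop storage handler, roughly like the sketch below. The table name, columns, index name, node list and the source table dw.user_log are placeholders, not taken from the original setup.

```sql
-- Hypothetical ES-mapped Hive table: columns, index name and node list are placeholders.
-- 'es.resource' is the target index, 'es.nodes' lists the Elasticsearch hosts.
CREATE EXTERNAL TABLE es_user_log (
  user_id STRING,
  event   STRING,
  ts      TIMESTAMP
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'user_log',
  'es.nodes' = 'es-node1:9200',
  'es.index.auto.create' = 'true'
);

-- Writing into the mapped table is the step that fails on CDH 6.3
-- until commons-httpclient-3.1.jar is on the classpath.
INSERT OVERWRITE TABLE es_user_log
SELECT user_id, event, ts
FROM dw.user_log;
```

There are three common ways to make the two jars (elasticsearch-hadoop-hive and commons-httpclient) visible to Hive.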
1. Add the jars in the hive shell
This approach is session-scoped: it only takes effect in the current session and does not require restarting the hiveserver2 service.
```sql
-- local jars
add jar /opt/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar;
add jar /opt/expansion/jars/commons-httpclient-3.1.jar;

-- jars on HDFS
add jar hdfs://nameservice1/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar;
add jar hdfs://nameservice1/expansion/jars/commons-httpclient-3.1.jar;
```
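As a quick sanity check (not part of the original fix), Hive's list jars command shows what has been registered in the current session:

```sql
-- verify that both jars are now on the session classpath
list jars;
```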
2. Add the jars when starting the hive shell
This approach is also session-scoped.

```bash
hive -hiveconf hive.aux.jars.path=/opt/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar
```
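hive.aux.jars.path accepts a comma-separated list, so the commons-httpclient jar identified above can be passed in the same command; the paths below are simply the example paths used earlier in this post:

```bash
hive -hiveconf hive.aux.jars.path=/opt/expansion/jars/elasticsearch-hadoop-hive-7.8.0.jar,/opt/expansion/jars/commons-httpclient-3.1.jar
```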
3. Add the jars in hive-site.xml
This approach is service-level: it takes effect in both the hive shell and hiveserver2, and the hiveserver2 service must be restarted afterwards.
Edit hive-site.xml:
```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>/opt/hive-auxlib/elasticsearch-hadoop-hive-7.8.0.jar</value>
</property>
```
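Since the fix described above also needs commons-httpclient-3.1.jar, the value can list both jars, comma separated (the /opt/hive-auxlib directory is just the example location used here):

```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>/opt/hive-auxlib/elasticsearch-hadoop-hive-7.8.0.jar,/opt/hive-auxlib/commons-httpclient-3.1.jar</value>
</property>
```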
The third approach is usually the recommended one: keep the jars under the hive.aux.jars.path location and use them from there.
commons-httpclient-3.1.jar is normally already bundled with CDH, at:

`/opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar`
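So, assuming the /opt/hive-auxlib layout from the example above (the elasticsearch-hadoop connector itself still has to be downloaded separately), the bundled jar can simply be copied next to it:

```bash
# copy the CDH-bundled commons-httpclient into the aux-lib directory referenced by hive.aux.jars.path
cp /opt/cloudera/parcels/CDH/jars/commons-httpclient-3.1.jar /opt/hive-auxlib/
```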