Guide to Connecting Spark to the Hive 3.1.2 ThriftServer

To use Spark with Hive 3.1.2 and its ThriftServer (HiveServer2), follow these steps:

Download Hive 3.1.2 and extract it into the directory where you want it installed.
Create a hive-site.xml file in Hive's conf directory and set the following properties:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>Disable impersonation in HiveServer2</description>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>Disable authentication in HiveServer2</description>
  </property>
  <property>
    <name>hive.server2.allow.user.substitution</name>
    <value>true</value>
    <description>Enable user substitution in HiveServer2</description>
  </property>
  <property>
    <name>hive.stats.fetch.partition.stats</name>
    <value>false</value>
    <description>Disable fetching partition statistics in HiveServer2</description>
  </property>
</configuration>

Start the Hive Metastore service:

$ cd /path/to/hive-3.1.2
$ ./bin/hive --service metastore

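Start the Hive ThriftServer (HiveServer2) service. An embedded Derby database can be opened by only one process at a time, so if the metastore service above is already running against it, HiveServer2 should reach that service through hive.metastore.uris (the metastore listens on port 9083 by default) rather than opening the database itself: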

$ cd /path/to/hive-3.1.2
$ ./bin/hiveserver2 --hiveconf hive.server2.enable.doAs=false --hiveconf hive.server2.authentication=NONE --hiveconf hive.server2.allow.user.substitution=true --hiveconf hive.stats.fetch.partition.stats=false

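Create a Hive-enabled SparkSession in Spark. Note that enableHiveSupport() connects Spark to the Hive metastore, not to HiveServer2 itself: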

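import org.apache.spark.sql.SparkSession

// Scala string literals need double quotes; single quotes denote a Char.
val spark = SparkSession.builder()
  .appName("Hive ThriftServer Example")
  .config("spark.sql.hive.thriftServer.singleSession", "true")
  .enableHiveSupport()
  .getOrCreate()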
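
When the metastore runs as a separate service and its Hive version differs from the one bundled with Spark, the session usually needs a few extra settings. The following is a minimal sketch, not a definitive setup: the metastore address thrift://localhost:9083, the choice of "maven" for loading the Hive client jars, and the object name MetastoreSessionSketch are all assumptions for illustration. It also assumes Spark 3.x with the spark-hive module available and a master supplied by spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object MetastoreSessionSketch {
  // All values below are assumptions for illustration; adjust to your deployment.
  val hiveSettings: Map[String, String] = Map(
    // Remote metastore started with `hive --service metastore` (9083 is the default port).
    "hive.metastore.uris" -> "thrift://localhost:9083",
    // Ask Spark to talk to the metastore with Hive 3.1.2 client classes.
    "spark.sql.hive.metastore.version" -> "3.1.2",
    // "maven" downloads matching Hive jars at startup; pointing Spark at
    // local jars instead is the usual choice on machines without network access.
    "spark.sql.hive.metastore.jars" -> "maven"
  )

  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("Hive 3.1.2 metastore sketch")

    // Apply every setting to the builder, then enable Hive support.
    val spark = hiveSettings
      .foldLeft(builder) { case (b, (key, value)) => b.config(key, value) }
      .enableHiveSupport()
      .getOrCreate()

    // Listing databases is a cheap way to confirm the metastore is reachable.
    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}
```

Keeping the settings in a plain Map makes it easy to move them into spark-defaults.conf or --conf flags later without touching the application code.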

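You can now use the SparkSession to run SQL queries against Hive tables. Spark executes these queries itself and uses the metastore only for table metadata; they are not forwarded to HiveServer2: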

val df = spark.sql("SELECT * FROM my_table")
df.show()
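
If the goal is to have HiveServer2 itself execute the query, one option is a plain JDBC client. The sketch below is an illustration under stated assumptions rather than the only way to do it: it assumes the hive-jdbc driver is on the classpath, that HiveServer2 listens on localhost:10000 (its default port) with authentication set to NONE as configured above, and that my_table lives in the default database. The object name HiveServer2JdbcSketch and the helper jdbcUrl are hypothetical.

```scala
import java.sql.DriverManager

object HiveServer2JdbcSketch {
  // Builds a HiveServer2 JDBC URL; host, port and database defaults are assumptions.
  def jdbcUrl(host: String = "localhost", port: Int = 10000, database: String = "default"): String =
    s"jdbc:hive2://$host:$port/$database"

  def main(args: Array[String]): Unit = {
    // With hive.server2.authentication=NONE the password is not checked;
    // the current OS user name is passed only as a label.
    val user = System.getProperty("user.name")
    val conn = DriverManager.getConnection(jdbcUrl(), user, "")
    try {
      val stmt = conn.createStatement()
      try {
        // This statement is executed by HiveServer2, not by Spark.
        val rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10")
        val columnCount = rs.getMetaData.getColumnCount
        while (rs.next()) {
          val row = (1 to columnCount).map(i => String.valueOf(rs.getObject(i)))
          println(row.mkString("\t"))
        }
      } finally stmt.close()
    } finally conn.close()
  }
}
```

The LIMIT clause is there only to keep the sketch from pulling a whole table through a single connection.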

Note: for Hive support to work, the Hive-related jar files must be on Spark's classpath. You can list them in the spark.jars configuration property or place them in Spark's jars directory (named lib in Spark 1.x).
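
A quick way to see whether the required jars are visible is to try loading a class from each of them. This is a minimal sketch; the two class names are the ones Hive support is generally expected to need (one from Hive, one from Spark's spark-hive module) and should be treated as assumptions, as should the helper name isOnClasspath.

```scala
object HiveClasspathCheck {
  // Returns true if the named class can be located, without initializing it.
  def isOnClasspath(className: String): Boolean = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      // LinkageError covers a class that is present but whose dependencies are missing.
      case _: ClassNotFoundException | _: LinkageError => false
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed marker classes for the Hive and spark-hive jars.
    val expected = Seq(
      "org.apache.hadoop.hive.conf.HiveConf",
      "org.apache.spark.sql.hive.HiveSessionStateBuilder"
    )
    expected.foreach { name =>
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$status  $name")
    }
  }
}
```

Running this with the same classpath as the Spark application shows which jar is absent before a job fails at enableHiveSupport().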