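1. First, download the PostgreSQL JDBC driver. The jar must also be on Spark's classpath when the code runs, for example via the --jars option of spark-shell. It can be fetched with the following command: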
wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
2. In Spark, create a DataFrame that reads the data of PostgreSQL table A:
import org.apache.spark.sql.SparkSession

// Start (or reuse) a Spark session running in local mode
val spark = SparkSession.builder()
    .appName("Spark-PostgreSQL")
    .master("local")
    .getOrCreate()

// Replace the <...> placeholders with your own connection details
val jdbcUrl = "jdbc:postgresql://<host>:<port>/<database>"

// Load table A through Spark's JDBC data source
val tableA = spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .option("dbtable", "A")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()
3. Count the rows in table A:
val count = tableA.count()
println(s"Table A has $count rows.")
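
Depending on the Spark version, count() on a JDBC DataFrame may still fetch one value per row from PostgreSQL before counting on the Spark side. As a hedged alternative, the sketch below lets PostgreSQL compute the count through the JDBC source's "query" option (available in Spark 2.4 and later), so only one row is returned. The helper names countQuery and countRows are hypothetical and not part of the original answer; the table name is interpolated as-is and is assumed to come from trusted input.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: builds the aggregate statement that PostgreSQL will run.
// The table name is inserted verbatim, so it must not come from untrusted input.
def countQuery(table: String): String =
  s"SELECT COUNT(*) AS row_count FROM $table"

// Hypothetical helper: runs the count inside PostgreSQL and returns it as a Long.
def countRows(spark: SparkSession, jdbcUrl: String,
              user: String, password: String, table: String): Long = {
  val result = spark.read.format("jdbc")
    .option("url", jdbcUrl)
    // "query" replaces "dbtable"; the two options cannot be combined
    .option("query", countQuery(table))
    .option("user", user)
    .option("password", password)
    .option("driver", "org.postgresql.Driver")
    .load()

  // COUNT(*) is a bigint in PostgreSQL, which Spark reads as a Long
  result.first().getLong(0)
}
```

With the session and URL from step 2, this could be called as countRows(spark, jdbcUrl, "<user>", "<password>", "A").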

The complete code is as follows:

import org.apache.spark.sql.SparkSession

// Start (or reuse) a Spark session running in local mode
val spark = SparkSession.builder()
    .appName("Spark-PostgreSQL")
    .master("local")
    .getOrCreate()

// Replace the <...> placeholders with your own connection details
val jdbcUrl = "jdbc:postgresql://<host>:<port>/<database>"

// Load table A through Spark's JDBC data source
val tableA = spark.read.format("jdbc")
    .option("url", jdbcUrl)
    .option("dbtable", "A")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "org.postgresql.Driver")
    .load()

// Count the rows and print the result
val count = tableA.count()
println(s"Table A has $count rows.")
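
One detail worth checking: Spark passes the "dbtable" value into its SQL unchanged, and PostgreSQL folds unquoted identifiers to lowercase, so "A" resolves to a table named a. If the table was created with a quoted, mixed-case name, the identifier needs double quotes. The helper below is a minimal sketch of that quoting; quoteIdent is a hypothetical name, not something from the original answer.

```scala
// Hypothetical helper: wraps a PostgreSQL identifier in double quotes so its
// case is preserved. Embedded double quotes are doubled, per SQL quoting rules.
def quoteIdent(name: String): String =
  "\"" + name.replace("\"", "\"\"") + "\""

// Unquoted A is folded to lowercase by PostgreSQL; the quoted form keeps the capital letter.
val dbtable = quoteIdent("A")
println(dbtable) // prints "A" including the double quotes
```

The result can then be passed to the reader as .option("dbtable", quoteIdent("A")).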
Using spark-scala.sh to count the rows in PostgreSQL table A

