Counting the number of rows in PostgreSQL table A with spark-scala.sh
- First, download the PostgreSQL JDBC driver, for example with:
wget https://jdbc.postgresql.org/download/postgresql-42.2.23.jar
Downloading the jar is not enough on its own: it also has to be on Spark's classpath when the shell is started (see the launch command at the end).
- In Spark, create a DataFrame that reads table A from PostgreSQL over JDBC:
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession; in spark-shell one already exists as `spark`
val spark = SparkSession.builder()
  .appName("Spark-PostgreSQL")
  .config("spark.master", "local")
  .getOrCreate()

// JDBC connection URL: fill in your host, port and database name
val jdbcUrl = "jdbc:postgresql://<host>:<port>/<database>"

// Expose table A as a DataFrame via the JDBC data source
val tableA = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "A")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("driver", "org.postgresql.Driver")
  .load()
- Count the rows of table A (note that count() still scans every row over the JDBC connection; a variant that lets PostgreSQL do the counting is sketched after this list):
val count = tableA.count()
println(s"Table A has $count rows.")
The complete code is as follows:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Spark-PostgreSQL")
  .config("spark.master", "local")
  .getOrCreate()

// JDBC connection URL: fill in your host, port and database name
val jdbcUrl = "jdbc:postgresql://<host>:<port>/<database>"

val tableA = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "A")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("driver", "org.postgresql.Driver")
  .load()

val count = tableA.count()
println(s"Table A has $count rows.")