Spark SQL column-count mismatch on insert: target table has 3 columns, but the inserted data has 4

When inserting data with Spark SQL, you may encounter the following error:

Exception in thread "main" 
org.apache.spark.sql.AnalysisException: `deploy_uedsy`.`test_spark_sql` requires that the data to be inserted have the same number of columns as the target table: target table has 3 column(s) but the inserted data has 4 column(s), including 1 partition column(s) having constant value(s).;
	at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:341)
	at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:373)
	at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:368)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
	at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:368)
	at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:328)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:124)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:118)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:103)
	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
	at com.startdt.spark.simbaspark.SparkMr.main(SparkMr.java:146)

The error message states that the number of columns in the inserted data does not match the target table: the target table has 3 columns, but the inserted data has 4, one of which is a partition column with a constant value.
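Note how the count is computed: when the statement specifies a constant partition value, Spark appends that constant to the query's output before comparing against the table schema, so selecting the partition column *and* supplying it as a constant counts it twice. A simplified sketch of the check (table and column names are hypothetical, mirroring the error above):

```python
# Simplified sketch of Spark's column-count check; names are hypothetical.
# The target table has 3 columns in total, the last one a partition column.
target_columns = ["id", "name", "dt"]

# INSERT INTO test_spark_sql PARTITION (dt='2024-01-01')
# SELECT id, name, dt FROM source   -- dt is supplied twice!
select_columns = ["id", "name", "dt"]
static_partition_values = {"dt": "2024-01-01"}

# Spark appends the constant partition values to the query output,
# then compares the total against the target schema.
inserted = len(select_columns) + len(static_partition_values)
print(inserted, "vs", len(target_columns))  # 4 vs 3 -> AnalysisException

# Fix: leave the statically-partitioned column out of the SELECT list.
fixed_select = [c for c in select_columns if c not in static_partition_values]
print(len(fixed_select) + len(static_partition_values))  # 3 -> matches
```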

Solution:

The inserted data must have the same number of columns as the target table. The following steps help resolve the problem:

  1. Check whether the number of columns in the inserted data matches the target table.
  2. If the insert supplies a constant partition value (e.g. PARTITION (dt='...')), do not select that partition column again in the query, because the constant already counts as one of the inserted columns.
  3. Use DESCRIBE your_table (or SELECT * FROM your_table) to inspect the target table's columns.
  4. Use df.printSchema() or df.columns to inspect the columns of the DataFrame being inserted.
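The comparison in the steps above can be sketched in plain Python; in a real job the two lists would come from `spark.table(...).columns` and `df.columns`, and the names here are hypothetical:

```python
# Hypothetical schemas; in practice obtain them via
# spark.table("deploy_uedsy.test_spark_sql").columns and df.columns.
target_columns = ["id", "name", "age"]
source_columns = ["id", "name", "age", "gender"]

# Columns present in the source DataFrame but missing from the target table.
extra = [c for c in source_columns if c not in target_columns]
print(extra)  # ['gender']

# Columns the target table expects but the source does not provide.
missing = [c for c in target_columns if c not in source_columns]
print(missing)  # []
```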

Example:

Suppose the target table test_spark_sql has 3 columns (id, name, age), while the inserted data has 4 (id, name, age, gender). To fix the mismatch, drop the gender column from the inserted data, or set the gender column to a constant such as null. (Note that setting it to null keeps the column count at 4, so that variant only helps when the target table actually has a gender column.)

Code example:

from pyspark.sql.functions import lit

# Option 1: drop the gender column so the inserted data matches the 3-column target
df = df.drop('gender')

# Option 2: set the gender column to null
# (this keeps 4 columns, so it only helps if the target table has a gender column)
df = df.withColumn('gender', lit(None))

With these methods, the number of columns in the inserted data matches the target table, resolving the error.
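When the insert is issued through `spark.sql(...)`, selecting the target's columns explicitly (rather than `SELECT *`) keeps the column counts aligned even if the source gains columns later. A sketch of building such a statement; the table name, column list, and `source_view` are hypothetical:

```python
# Build an INSERT whose SELECT list is derived from the target schema, so
# extra source columns can never leak in. In a real job you would obtain
# target_columns via spark.table(target_table).columns and then run
# spark.sql(sql). All names here are hypothetical.
target_table = "deploy_uedsy.test_spark_sql"
target_columns = ["id", "name", "age"]

select_list = ", ".join(target_columns)
sql = f"INSERT INTO {target_table} SELECT {select_list} FROM source_view"
print(sql)
# INSERT INTO deploy_uedsy.test_spark_sql SELECT id, name, age FROM source_view
```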


