Spark SQL column-count mismatch on insert: target table has 3 columns, but the inserted data has 4
When inserting data with Spark SQL, you may encounter the following error:
Exception in thread "main"
org.apache.spark.sql.AnalysisException: `deploy_uedsy`.`test_spark_sql` requires that the data to be inserted have the same number of columns as the target table: target table has 3 column(s) but the inserted data has 4 column(s), including 1 partition column(s) having constant value(s).;
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:341)
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:373)
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion$$anonfun$apply$3.applyOrElse(rules.scala:368)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:368)
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.apply(rules.scala:328)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:124)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:118)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:103)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at com.startdt.spark.simbaspark.SparkMr.main(SparkMr.java:146)
The error message says that the number of columns in the inserted data does not match the target table: the target table has 3 columns, but the inserted data has 4, one of which is a partition column with a constant value.
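A common way to hit exactly this message is a static-partition INSERT whose SELECT list still includes the partition column: when the partition value is given as a constant in the PARTITION clause, Spark counts that constant toward the inserted columns, so the SELECT must supply only the non-partition columns. A hypothetical sketch (the table and column names here are made up for illustration, not taken from the original report):

```sql
-- Hypothetical partitioned target table: 2 data columns + 1 partition column = 3 columns
CREATE TABLE demo_partitioned (id INT, name STRING) PARTITIONED BY (dt STRING);

-- Fails: the constant dt already counts as one inserted column, so this
-- 3-column SELECT yields 4 inserted columns for a 3-column table
INSERT INTO demo_partitioned PARTITION (dt = '2023-01-01')
SELECT id, name, age FROM source_table;

-- Works: supply only the non-partition columns in the SELECT
INSERT INTO demo_partitioned PARTITION (dt = '2023-01-01')
SELECT id, name FROM source_table;
```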
Solution:
The inserted data must have the same number of columns as the target table. The following checks usually resolve the problem:
- Verify that the column count of the inserted data matches the target table.
- If the insert involves a partition column, make sure it lines up with the target table's partition column; when the partition value is supplied as a constant in the PARTITION clause, do not supply that column again in the data.
- Use `select * from your_table` (or `DESCRIBE your_table`) to inspect the target table's columns.
- Use `select * from your_dataframe` to inspect the columns of the data to be inserted.
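The checks above can be sketched as a small pre-flight arity test run before issuing the INSERT. This is a minimal pure-Python sketch; the helper name and schemas are hypothetical, not a Spark API:

```python
def check_insert_arity(target_columns, data_columns, static_partition_columns=()):
    """Return True if the insert column count matches the target table.

    Partition columns given constant values in the PARTITION clause count
    toward the inserted columns, mirroring how Spark's analyzer counts them.
    """
    inserted = len(data_columns) + len(static_partition_columns)
    return inserted == len(target_columns)

# Target table has 3 columns; the data supplies 4 -> mismatch
print(check_insert_arity(["id", "name", "age"], ["id", "name", "age", "gender"]))  # False

# After dropping the extra column, the counts line up
print(check_insert_arity(["id", "name", "age"], ["id", "name", "age"]))  # True
```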
Example:
Suppose the target table test_spark_sql has 3 columns (id, name, age), while the inserted data has 4 (id, name, age, gender). To fix this, drop the gender column from the inserted data. Alternatively, if the target table really does have a gender column you cannot populate, fill it with a constant such as null.
Code example (PySpark):
from pyspark.sql.functions import lit

# Drop the extra gender column so the DataFrame matches the 3-column table
df = df.drop('gender')

# Alternatively, if the target table does have a gender column, fill it with null
df = df.withColumn('gender', lit(None))
With these adjustments, the number of inserted columns matches the target table's column count, and the error no longer occurs.
Original article: https://www.cveoy.top/t/topic/okr9. All rights reserved by the author. Please do not repost or scrape.