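Setting the features column on a DataFrame in PySpark

Date: 2027-05-08

Tags: Education

In PySpark, you can use VectorAssembler to combine several columns into a single feature-vector column. Here is an example: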

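from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler

# Reuse the active SparkSession, or create one when running outside the pyspark shell
spark = SparkSession.builder.getOrCreate()

# Suppose we have a DataFrame with two columns: col1 and col2
df = spark.createDataFrame([(1, 2), (3, 4), (5, 6)], ["col1", "col2"])

# Create a VectorAssembler and specify the input columns and the output column
assembler = VectorAssembler(inputCols=["col1", "col2"], outputCol="features")

# Transform the DataFrame with the VectorAssembler
output_df = assembler.transform(df)

# Print the result
output_df.show()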

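The code above combines the "col1" and "col2" columns of the DataFrame into a new column named "features". The output contains the original columns plus the new "features" column.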

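Note that VectorAssembler only accepts numeric, boolean, and vector columns as input. Columns of other types, such as strings, must first be encoded or converted to a numeric type.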
