To transform a Vector of Strings into a DataFrame using Scala Spark, you can follow these steps:
Assuming that you have the following vector of strings:
val vector = Vector("John,30", "Jane,25", "Bob,40")
// Parallelize the Vector into an RDD, then split each string on the comma
val rdd = spark.sparkContext.parallelize(vector)
val splitRDD = rdd.map(_.split(","))
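If you want to sanity-check the split before involving Spark, the same transformation can be applied to the plain Vector locally (this preview step is an addition, not part of the original answer):

```scala
// Hypothetical local check: apply the same split to the Vector directly
val vector = Vector("John,30", "Jane,25", "Bob,40")
val preview = vector.map(_.split(","))
preview.foreach(a => println(a.mkString(" | ")))
// John | 30
// Jane | 25
// Bob | 40
```

Each element becomes an Array[String] of length two, which is exactly the shape splitRDD holds in the distributed version.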
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
// Define the schema: "Name" as a String, "Age" as an Integer
val schema = StructType(Array(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))
// Map each Array[String] to a Row, converting the age to Int, then apply the schema
val dataframe = spark.createDataFrame(splitRDD.map(row => Row(row(0), row(1).toInt)), schema)
Note that in this example, we have defined the schema of the DataFrame to have two columns ("Name" and "Age"), with "Name" being a StringType and "Age" being an IntegerType.
You can then check the contents of the DataFrame using:
dataframe.show()
which should output:
+----+---+
|Name|Age|
+----+---+
|John| 30|
|Jane| 25|
| Bob| 40|
+----+---+
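As an alternative sketch (assuming the same "Name,Age" input format and that spark is an active SparkSession), you can skip the explicit schema and RDD entirely by converting to tuples and calling toDF via the session's implicits:

```scala
import spark.implicits._

// Split each string locally, convert the age, and name the columns directly
val df = Vector("John,30", "Jane,25", "Bob,40")
  .map { s =>
    val Array(name, age) = s.split(",")
    (name, age.toInt)
  }
  .toDF("Name", "Age")
```

Here toDF infers the column types from the tuple (String, Int), so the resulting schema matches the one built manually with StructType above.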