To transform a Vector of Strings into a DataFrame using Scala Spark, you can follow these steps:

Assuming you have the following Vector of strings:

val vector = Vector("John,30", "Jane,25", "Bob,40")
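The steps below assume a SparkSession named spark is already in scope (as it is in spark-shell). If not, a minimal sketch for creating one locally might look like this (the app name and master shown here are placeholders):

import org.apache.spark.sql.SparkSession

// Hypothetical local session; adjust appName and master for your environment
val spark = SparkSession.builder()
  .appName("VectorToDataFrame")
  .master("local[*]")
  .getOrCreate()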
  1. Convert the Vector to an RDD using:
val rdd = spark.sparkContext.parallelize(vector)
  2. Split each string in the RDD using the "," delimiter:
val splitRDD = rdd.map(_.split(","))
  3. Convert the splitRDD to a DataFrame:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val schema = StructType(Array(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))

val dataframe = spark.createDataFrame(splitRDD.map(row => Row(row(0), row(1).toInt)), schema)

Note that in this example we have defined the schema explicitly, with two columns: "Name" as a StringType and "Age" as an IntegerType. This is why each age string is converted with toInt before building the Row.
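Since the schema is supplied explicitly rather than inferred, you can verify it with printSchema() if you like:

dataframe.printSchema()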

You can then check the contents of the dataframe using:

dataframe.show()

which should output:

+----+---+
|Name|Age|
+----+---+
|John| 30|
|Jane| 25|
| Bob| 40|
+----+---+
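As an alternative sketch, assuming the same spark session is available, you could skip the explicit schema by mapping the Vector to a case class and calling toDF via spark.implicits, which derives the column names and types from the case class fields:

import spark.implicits._

// Hypothetical case class; column names are taken from the field names
case class Person(Name: String, Age: Int)

val dataframe2 = vector
  .map(_.split(","))
  .map(fields => Person(fields(0), fields(1).toInt))
  .toDF()

dataframe2.show()

This avoids building Row objects and a StructType by hand, at the cost of defining a case class for the data.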