How can I transform a Vector of Strings into a Dataframe using Scala Spark?

asked 2023-04-29 05:37:27 +0000

answered 2023-04-29 05:46:02 +0000

To transform a Vector of Strings into a DataFrame using Scala Spark, follow the steps below. They assume an active SparkSession named spark and the following vector of strings:

val vector = Vector("John,30", "Jane,25", "Bob,40")
  1. Convert the Vector to an RDD using:
val rdd = spark.sparkContext.parallelize(vector)
  2. Split each string in the RDD using the "," delimiter:
val splitRDD = rdd.map(_.split(","))
  3. Convert the splitRDD to a DataFrame:
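import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val schema = StructType(Array(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))

// Build one Row per record, converting the age field to Int to match the schema
val dataframe = spark.createDataFrame(splitRDD.map(row => Row(row(0), row(1).toInt)), schema)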

Note that in this example, we have defined the schema of the dataframe to have two columns ("Name" and "Age"), with "Name" being a StringType and "Age" being an IntegerType.
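
When each record fits in a tuple, a shorter route may be enough. The following is a minimal sketch, not the only way to do it: it imports Spark's implicits and calls toDF, so Spark infers the column types from the tuple instead of using a hand-built StructType. The object name, the toDataFrame helper, the app name, and the local[*] master are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object VectorToDataFrame {
  // Hypothetical helper: parses "name,age" strings into (String, Int) tuples
  // and lets Spark infer the column types from the tuple.
  def toDataFrame(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._ // brings .toDF into scope for local collections
    lines
      .map(_.split(","))
      .map(parts => (parts(0), parts(1).trim.toInt))
      .toDF("Name", "Age")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings)
    val spark = SparkSession.builder()
      .appName("VectorToDataFrame")
      .master("local[*]")
      .getOrCreate()

    val vector = Vector("John,30", "Jane,25", "Bob,40")
    val dataframe = toDataFrame(spark, vector)
    dataframe.printSchema()
    dataframe.show()

    spark.stop()
  }
}
```

This variant parses the data on the driver before handing it to Spark, which is fine for a small in-memory Vector; the RDD-based steps above remain the better fit when the parsing itself should be distributed.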

You can then check the contents of the dataframe using:

dataframe.show()

which should output:

+----+---+
|Name|Age|
+----+---+
|John| 30|
|Jane| 25|
| Bob| 40|
+----+---+
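
One caveat with the conversion in step 3: row(1).toInt throws a NumberFormatException when the age is not numeric, and row(1) fails with an ArrayIndexOutOfBoundsException when a line contains no comma. A minimal plain-Scala sketch of filtering such lines out before building the DataFrame follows; the parseLine helper and the two malformed sample strings are hypothetical and not part of the original example.

```scala
import scala.util.Try

object ParseRecords {
  // Hypothetical helper: returns Some((name, age)) for a well-formed
  // "name,age" line, or None when the shape or the age is invalid.
  def parseLine(line: String): Option[(String, Int)] =
    line.split(",").map(_.trim) match {
      case Array(name, age) => Try(age.toInt).toOption.map(a => (name, a))
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    // The last two entries are deliberately malformed (assumed test data)
    val vector = Vector("John,30", "Jane,25", "Bob,40", "Broken", "Eve,abc")
    val parsed = vector.flatMap(parseLine) // keeps only well-formed records
    parsed.foreach(println)
  }
}
```

The resulting Vector of (String, Int) pairs can then be passed to parallelize and mapped to Row objects as in step 3, without the risk of a single bad line failing the job.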
