To transform a Vector of Strings into a DataFrame using Scala Spark, you can follow these steps.

Assuming that you have the following vector of strings:

val vector = Vector("John,30", "Jane,25", "Bob,40")

First, parallelize the vector into an RDD and split each string on the comma:

val rdd = spark.sparkContext.parallelize(vector)
val splitRDD = rdd.map(_.split(","))

Next, define a schema for the DataFrame:

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val schema = StructType(Array(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))

Finally, convert each array into a Row (note the Row import, which the original snippet was missing) and create the DataFrame:

import org.apache.spark.sql.Row

val dataframe = spark.createDataFrame(splitRDD.map(row => Row(row(0), row(1).toInt)), schema)
Note that in this example, the schema defines two columns, "Name" (a StringType) and "Age" (an IntegerType), and the Row values must match those types, which is why row(1) is converted with .toInt.

You can then check the contents of the DataFrame using:
dataframe.show()
which should output:
+----+---+
|Name|Age|
+----+---+
|John| 30|
|Jane| 25|
| Bob| 40|
+----+---+
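For a small in-memory collection like this, you can also skip the explicit RDD and schema by using Spark's implicits. The following is an alternative sketch, assuming the same `spark` session and `vector` value are in scope: each string is parsed into a typed tuple, and toDF names the columns while Spark infers the types.

```scala
import spark.implicits._

// Parse each "Name,Age" string into a (String, Int) tuple,
// then let Spark infer the schema from the tuple types.
val df = vector
  .map(_.split(","))
  .map(a => (a(0), a(1).toInt))
  .toDF("Name", "Age")

df.show()
```

This produces the same result as the explicit-schema version; the trade-off is that the schema is inferred from Scala types rather than declared, so you lose the ability to set nullability or column metadata explicitly.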
Asked: 2023-04-29 05:37:27 +0000
Last updated: Apr 29 '23