How can I transform a Vector of Strings into a DataFrame using Scala Spark?

asked 2023-04-29 05:37:27 +0000

david

1 Answer

3

answered 2023-04-29 05:46:02 +0000

woof

To transform a Vector of Strings into a DataFrame using Scala Spark, you can follow the steps below.

Assume that you have the following vector of strings, and that a SparkSession named spark is already available (as it is in spark-shell):

val vector = Vector("John,30", "Jane,25", "Bob,40")
  1. Convert the Vector to an RDD using:
val rdd = spark.sparkContext.parallelize(vector)
  2. Split each string in the RDD using the "," delimiter:
val splitRDD = rdd.map(_.split(","))
  3. Convert the splitRDD to a DataFrame:
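import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Explicit schema: one string column and one integer column
val schema = StructType(Array(
  StructField("Name", StringType),
  StructField("Age", IntegerType)
))

// Wrap each split line in a Row, converting the age field to Int to match the schema
val dataframe = spark.createDataFrame(splitRDD.map(row => Row(row(0), row(1).toInt)), schema)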

Note that the schema in this example defines two columns: "Name" as a StringType and "Age" as an IntegerType. The values placed in each Row must match these types, which is why the age is converted with toInt.
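
Because "Age" is an IntegerType, row(1).toInt will throw a NumberFormatException if any string in the vector is malformed. As a minimal sketch of one way to guard against that, the plain Scala helper below (parseLine is a hypothetical name, not part of Spark) keeps only the lines that split into exactly two fields with a numeric age. The extra input strings are made up for illustration.

```scala
import scala.util.Try

object ParseRows {
  // Hypothetical helper: returns Some((name, age)) for a well-formed
  // "name,age" string and None for anything else.
  def parseLine(line: String): Option[(String, Int)] =
    line.split(",") match {
      case Array(name, age) =>
        // Try(...).toOption turns a failed Int conversion into None
        Try(age.trim.toInt).toOption.map(a => (name.trim, a))
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Two well-formed lines plus two illustrative malformed ones
    val vector = Vector("John,30", "Jane,25", "Bob,forty", "malformed")
    val parsed = vector.flatMap(parseLine)
    println(parsed) // prints: Vector((John,30), (Jane,25))
  }
}
```

The surviving (name, age) pairs could then be parallelized and mapped to Row(name, age) in the same way as in step 3.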

You can then check the contents of the dataframe using:

dataframe.show()

which should output:

+----+---+
|Name|Age|
+----+---+
|John| 30|
|Jane| 25|
| Bob| 40|
+----+---+
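
As an alternative to building the schema by hand, Spark can also derive it from a case class. The sketch below is one possible variant, not the only way to do it: Person, parse, and the VectorToDataFrame object are hypothetical names chosen for illustration, and the local[*] master is an assumption for running outside spark-shell. It assumes well-formed input like the vector above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; its field names become the column names.
// It is defined at the top level so Spark can derive an encoder for it.
case class Person(Name: String, Age: Int)

object VectorToDataFrame {
  // Parse one "name,age" string; assumes exactly two comma-separated fields.
  def parse(line: String): Person = {
    val Array(name, age) = line.split(",")
    Person(name, age.toInt)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("VectorToDataFrame")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()
    import spark.implicits._ // brings toDF() into scope for local collections

    val vector = Vector("John,30", "Jane,25", "Bob,40")

    // Parse on the driver, then let Spark infer the schema from Person
    val dataframe = vector.map(parse).toDF()

    dataframe.printSchema()
    dataframe.show()

    spark.stop()
  }
}
```

This skips the explicit RDD and StructType, at the cost of parsing the vector on the driver before the data reaches Spark, which is fine for a small in-memory collection like this one.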

