Here is one approach to transforming a dictionary presented as a string into a structured DataFrame in Scala:

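  1. Parse the string with the parseJson method from the spray-json library, which provides JSON parsing and serialization for Scala, and take the fields of the resulting JSON object. spray-json has no JsonFormat for Any, so the values are kept as JsValue rather than converted to Map[String, Any].
import spray.json._
import DefaultJsonProtocol._

val jsonString = "{ \"name\":\"John\", \"age\":30, \"city\":\"New York\" }"
val jsonMap: Map[String, JsValue] = jsonString.parseJson.asJsObject.fields
  2. Turn the Map into a Row of typed values, then convert it into a Spark RDD using the parallelize() method.
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.appName("DictionaryStringToDataFrame").getOrCreate()

// The field order here must match the schema defined in the next step
val row = Row(jsonMap("age").convertTo[Int], jsonMap("city").convertTo[String], jsonMap("name").convertTo[String])
val rdd = spark.sparkContext.parallelize(Seq(row))
  3. Convert the RDD into a DataFrame using the createDataFrame() method from the SparkSession. An explicit schema is required because Spark cannot infer column names or types from a generic Row.
import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("age", IntegerType, nullable = false),
  StructField("city", StringType, nullable = true),
  StructField("name", StringType, nullable = true)))
val df = spark.createDataFrame(rdd, schema)
  4. Print the DataFrame schema and data.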
df.printSchema()
df.show()

The output should look something like this:

root
 |-- age: integer (nullable = false)
 |-- city: string (nullable = true)
 |-- name: string (nullable = true)

+---+--------+----+
|age|    city|name|
+---+--------+----+
| 30|New York|John|
+---+--------+----+
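
If the string is already valid JSON, Spark's built-in JSON reader may be a simpler route than parsing it by hand. The sketch below is an alternative to the steps above, not part of the original answer. It assumes Spark 2.2 or later, where spark.read.json accepts a Dataset[String], and it sets master("local[*]") only so that it can run outside spark-shell or spark-submit. The name inferred is illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local master, so the snippet runs without a cluster
val spark = SparkSession.builder
  .appName("DictionaryStringToDataFrame")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // provides toDS() on local collections

val jsonString = "{ \"name\":\"John\", \"age\":30, \"city\":\"New York\" }"

// Each element of the Dataset is treated as one JSON document;
// Spark infers the column names and types from the data
val inferred = spark.read.json(Seq(jsonString).toDS())

inferred.printSchema()
inferred.show()
```

Because the schema is inferred here, the column types can differ from a hand-written schema: whole numbers typically come back as long, and every column is nullable. Keep an explicit schema when the exact types matter.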