How can a Scala function be utilized to replace null column values in DataFrame1 with the corresponding values in DataFrame2?

asked 2023-07-12 13:38:31 +0000

scrum

1 Answer


answered 2023-07-12 13:40:02 +0000

bukephalos

Assuming that the two data frames have the same schema and the same number of rows, the following Scala function replaces null values in each column of DataFrame1 with a value taken from the same column of DataFrame2:

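import org.apache.spark.sql.DataFrame

def replaceNullValues(df1: DataFrame, df2: DataFrame): DataFrame = {
  // Use one row of df2 as the source of replacement values.
  // Without an explicit orderBy, Spark does not guarantee which row this is.
  val fillRow = df2.head()
  df1.columns.foldLeft(df1) { (tempDF, colName) =>
    // na.fill accepts only numeric, string, and boolean values and cannot
    // fill with null, so columns whose replacement is null are left as-is.
    Option(fillRow.getAs[Any](colName)) match {
      case Some(value) => tempDF.na.fill(Map(colName -> value))
      case None        => tempDF
    }
  }
}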

The function takes two data frames and returns a new one. It reads the column names of df1 and folds over them, so each step passes the partially filled data frame on to the next column. For each column it calls na.fill() to replace the nulls in that column of df1 with a single value read from the same column of df2. That value comes from the first row that df2 returns, so every null in the column receives the same replacement; it is not a row-by-row match. Without an explicit sort on df2, Spark does not guarantee which row comes first, and the value in that row may itself be null. Note also that na.fill() accepts only numeric, string, and boolean replacement values.

To use the function, simply call it with the two data frames as arguments:

val df1 = ... // original data frame with some null values
val df2 = ... // data frame with replacement values for nulls
val filledDF = replaceNullValues(df1, df2)

The filledDF data frame will have the same schema and number of rows as df1, with the nulls in each column replaced by the value that df2 supplies for that column.
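
If the goal is a row-by-row replacement (each null in df1 takes the value from the matching row of df2), the rows need something to match on, because Spark data frames have no inherent row order. A minimal sketch, assuming both data frames share a unique key column: the function name fillNullsFrom and the default key name "id" are hypothetical and not part of the question. It joins the two data frames on the key and uses coalesce to keep df1's value unless it is null.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col}

// Assumption: df1 and df2 share a unique key column (hypothetically "id")
// that identifies which rows correspond to each other.
def fillNullsFrom(df1: DataFrame, df2: DataFrame, key: String = "id"): DataFrame = {
  // Every column except the key is a candidate for filling.
  val valueCols = df1.columns.filter(_ != key)

  // coalesce returns its first non-null argument: df1's value when present,
  // otherwise the value from the matching row of df2.
  val filled = valueCols.map { c =>
    coalesce(col(s"l.`$c`"), col(s"r.`$c`")).alias(c)
  }

  // A left join keeps every row of df1, even when df2 has no matching key.
  df1.alias("l")
    .join(df2.alias("r"), col(s"l.`$key`") === col(s"r.`$key`"), "left")
    .select(col(s"l.`$key`").alias(key) +: filled: _*)
}
```

Called as fillNullsFrom(df1, df2), this returns the key column followed by the other columns of df1. A value stays null only where df2 has no matching row or its value is also null. Unlike na.fill(), coalesce is not restricted to numeric, string, and boolean columns.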


Stats

Asked: 2023-07-12 13:38:31 +0000

Seen: 11 times

Last updated: Jul 12 '23