
How can we convert the values of specific columns from a given list in a pyspark dataframe into separate rows with their respective values?

asked 2021-06-23 11:00:00 +0000

lalupa

1 Answer


answered 2021-10-18 22:00:00 +0000

scrum

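You can use the explode function in PySpark for this. Given an array column, explode returns one row per array element and repeats the values of the other selected columns on each of those rows. Note that rows whose array is null or empty are dropped; explode_outer keeps them with a null value instead.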

Here is an example:

Suppose you have a PySpark dataframe with the following structure:

+---+---------------+
| id|         fruits|
+---+---------------+
|  1|[apple, banana]|
|  2| [orange, kiwi]|
+---+---------------+

You want to convert the fruits column into separate rows with their respective values.

To achieve this, you can use the explode function as follows:

from pyspark.sql.functions import explode

# Keep id and emit one output row per element of the fruits array.
df = df.select(df.id, explode(df.fruits).alias("fruit"))

df.show()

This will produce the following output:

+---+------+
| id| fruit|
+---+------+
|  1| apple|
|  1|banana|
|  2|orange|
|  2|  kiwi|
+---+------+

As you can see, the fruits column has been exploded into separate rows with their respective values.
