What is the issue with the .display() function on a Pandas DataFrame in PySpark on Databricks?

asked 2023-07-02 03:00:27 +0000

djk

1 Answer

answered 2023-07-02 03:09:02 +0000

plato

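The issue is that pandas DataFrames do not have a `.display()` method, so calling it on one raises `AttributeError: 'DataFrame' object has no attribute 'display'`. In Databricks notebooks, `display()` is a built-in notebook function that is called as `display(df)`, and it accepts both PySpark and pandas DataFrames. Databricks also attaches a `.display()` method to PySpark DataFrames, which are distributed across many nodes, and that is why the method call works on a PySpark DataFrame but fails on a pandas DataFrame, which is a single-node object. Hence, to render a pandas DataFrame in Databricks, pass it to `display(pdf)` or convert it to a PySpark DataFrame first with `spark.createDataFrame(pdf)`. Going the other way, converting a PySpark DataFrame with `toPandas()` collects every row onto the driver, which may not be feasible for large datasets.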
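
As a rough illustration, here is a minimal sketch of a helper that avoids calling `.display()` as a method at all, since pandas DataFrames do not define one. The name `show_frame` and the `max_rows` parameter are hypothetical, not part of Databricks or pandas. It assumes that `display` is the function a Databricks (or IPython) notebook injects into the notebook namespace, and it falls back to the standard `show()` / `head()` calls when that function is not defined.

```python
def show_frame(df, max_rows=20):
    """Render a pandas or PySpark DataFrame without calling df.display().

    Returns the name of the rendering path taken, for easy checking.
    """
    try:
        # Assumption: display() is injected into the notebook namespace by
        # Databricks (or IPython); it is not defined in plain Python.
        renderer = display  # noqa: F821
    except NameError:
        renderer = None

    if renderer is not None:
        # Notebook path: the function form accepts either kind of DataFrame.
        renderer(df)
        return "display"
    if hasattr(df, "toPandas"):
        # PySpark DataFrame outside a notebook: show() prints the first rows.
        df.show(max_rows)
        return "show"
    # pandas DataFrame outside a notebook: print the first rows as text.
    print(df.head(max_rows))
    return "print"
```

In a Databricks notebook, `show_frame(pdf)` and `show_frame(spark_df)` should both end up in the `display(df)` branch, so no conversion between the two DataFrame types is needed just to look at the data.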

Last updated: Jul 02 '23