A Spark DataFrame cannot be written to Parquet when a column of type ArrayType contains a nested NullType because Parquet has no representation for NullType. Spark assigns NullType when it cannot infer a concrete type for a value, which typically happens when every value in a column, or every element of an array, is null. Parquet requires each field in the schema to map to a concrete physical type; nulls are encoded per value (via definition levels), not as a type of their own, so a schema containing array&lt;null&gt; cannot be translated and the write fails with an error about the nested NullType. The usual fix is to cast the offending column to a concrete type, or drop it, before writing.
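A minimal PySpark sketch of how the problem arises and the usual cast workaround; the column name, output path, and exact error wording below are illustrative and vary by Spark version:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.appName("nulltype-demo").getOrCreate()

# An array built only from null literals is inferred as array<null>
# (ArrayType(NullType)), which Parquet cannot represent.
df = (spark.createDataFrame([(1,), (2,)], ["id"])
           .withColumn("colname", F.array(F.lit(None))))

df.printSchema()
# root
#  |-- id: long (nullable = true)
#  |-- colname: array (nullable = false)
#  |    |-- element: void (containsNull = true)

# df.write.parquet("/tmp/out")  # fails: Parquet does not support the
#                               # nested NullType in column 'colname'

# Workaround: cast the array to a concrete element type before writing.
fixed = df.withColumn("colname",
                      F.col("colname").cast(ArrayType(StringType())))
fixed.write.mode("overwrite").parquet("/tmp/out")  # example path
```

Any concrete element type works for the cast; string is a common neutral choice when the real type is unknown. If the column carries no information at all, dropping it with `df.drop("colname")` is simpler.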