A Spark dataframe cannot be written when a column of type ArrayType contains a nested NullType (a schema like `array<null>`) because the Parquet file format has no type corresponding to Spark's NullType. Spark infers NullType when a column or array element is built entirely from nulls (for example, `lit(None)` or an array of null literals), so the inferred schema ends up as `ArrayType(NullType)` and the Parquet writer rejects it. Null *values* themselves are fine in Parquet (nullability is tracked via definition levels); the problem is that the element *type* is unknown. The fix is to cast the column to a concrete element type before writing.
Asked: 2022-08-13 11:00:00 +0000
Last updated: Apr 02 '22