One option would be to decrease the row group sizes until the total allocation falls within 95% of the driver memory. This can be done by adjusting the parquet.block.size and spark.sql.parquet.row.group.size parameters, as sketched below.
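Here is a minimal sketch of lowering the target row group size when rewriting a dataset with PySpark. The 32 MiB value and the /data/... paths are illustrative assumptions; parquet.block.size is the standard Parquet writer setting for row group size, set here on the Hadoop configuration so it applies to Parquet writes in this session:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("row-group-tuning").getOrCreate()

# parquet.block.size is the Parquet writer's target row group size in bytes.
# Setting it on the Hadoop configuration applies it to subsequent Parquet writes.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "parquet.block.size", str(32 * 1024 * 1024)  # 32 MiB instead of the usual 128 MiB default
)

# Row groups only shrink when the data is rewritten; existing files keep their layout.
df = spark.read.parquet("/data/events")                   # hypothetical input path
df.write.mode("overwrite").parquet("/data/events_rg32")  # hypothetical output path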
Another option would be to increase the driver memory to accommodate the total allocation (see the sketch below). However, this may not be feasible if there are other constraints or limitations in the system.
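A sketch of raising the driver memory, assuming you control how the session is created; the 8g value is illustrative. Note that spark.driver.memory must be set before the driver JVM starts, so configuring it in code only takes effect when this process launches the JVM (e.g. local mode); when using spark-submit, pass --driver-memory on the command line instead:

from pyspark.sql import SparkSession

# spark.driver.memory is read at JVM startup; setting it on an already-running
# session has no effect. For spark-submit, use: --driver-memory 8g
spark = (
    SparkSession.builder
    .appName("bigger-driver")
    .config("spark.driver.memory", "8g")  # illustrative value
    .getOrCreate()
)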
It's important to note that row group sizes should generally be chosen based on the data characteristics and workload requirements, rather than solely based on memory constraints.