There are a few possible reasons why only the first file is read when using Polars and glob to read Parquet files from S3:
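Incorrect file path: Python's glob.glob() function only searches the local filesystem, so it cannot expand a pattern against an S3 bucket. Make sure the path you pass to Polars is a complete S3 URI, including the bucket name, the key prefix, and the file extension, and check that the pattern really matches more than one object.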
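To illustrate the path issue, here is a minimal sketch using only the standard library. The bucket name and the object keys are made up for the example; in practice the key list would come from whatever S3 listing call you use.

```python
import fnmatch
import glob

# glob.glob() walks the local filesystem only, so an S3 URI matches nothing
# (unless a local directory literally named "s3:" happens to exist).
local_matches = glob.glob("s3://example-bucket/data/*.parquet")
print(local_matches)

# Hypothetical object keys, standing in for the result of an S3 listing.
keys = [
    "data/part-0.parquet",
    "data/part-1.parquet",
    "data/_SUCCESS",
]

# Filter the keys by pattern and build full S3 URIs from the matches.
uris = [
    f"s3://example-bucket/{key}"
    for key in fnmatch.filter(keys, "data/*.parquet")
]
print(uris)
```

If the list of matches is empty or has a single entry, the problem is in the path or pattern rather than in Polars.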
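Parallelism settings: Parallelism does not decide how many files are read. Polars has no n_workers parameter, and the parallel argument of pl.read_parquet() only controls how a read is parallelized. Polars reads whatever source you give it, so if your code passes a single path (for example, only the first element of the matched list), only that file is loaded. Either read every matched file and combine the results with pl.concat(), or pass the glob pattern directly to pl.read_parquet() or pl.scan_parquet().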
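Memory limitations: It's possible that your machine does not have enough memory to hold all the files at once, although this usually raises an error rather than silently returning one file. In this case, try reading lazily with pl.scan_parquet() and selecting only the columns and rows you need before calling collect(), or process the files in batches. Note that row_group_size is an option for writing Parquet files, not for reading them.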
Parquet file compatibility: Make sure every Parquet file you are trying to read is compatible with Polars and that all files share the same schema. If column names or data types differ between files, combining them may raise an error or leave you with only part of the data, so compare the schemas before concatenating.
Asked: 2021-12-04 11:00:00 +0000
Last updated: Jan 31 '22