There are a few possible reasons why only the first file is read when you use Polars and glob to read Parquet files from S3:

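  1. Incorrect file path: Python's glob.glob() only searches the local filesystem, so it will not expand an s3:// URL. Make sure the pattern includes the bucket name, the full prefix, and the file extension, and expand it with something that understands S3: either pass the pattern straight to pl.scan_parquet() or pl.read_parquet(), or list the objects with an S3-aware library such as s3fs.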

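  2. Only one path reaches the reader: glob.glob() does not read anything; it only returns a list of paths. If you pass a single element of that list (for example files[0]) to pl.read_parquet(), only that file is read. As far as I know, Polars has no n_workers parameter, and it already parallelizes reads on its own. Read every path and combine the results with pl.concat(), or pass the glob pattern directly to pl.read_parquet() or pl.scan_parquet().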

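  3. Memory limitations: Your machine may not have enough memory to hold all the files at once. Note that row_group_size is a parameter for writing Parquet (DataFrame.write_parquet()), so it does not help when reading. Instead, use pl.scan_parquet() so that Polars reads lazily and loads only the columns and rows your query needs, or process the files one at a time.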

  4. Parquet file compatibility: Make sure every Parquet file is readable by Polars and that all the files share the same column names and types. If the schemas differ, combining the files can raise an error or the read may stop partway. You can inspect each file's schema with pl.read_parquet_schema().