PySpark's apparent inability to overwrite a CSV file stored in S3 stems from the nature of S3 itself. S3 is object storage and does not support in-place updates: once an object is stored in a bucket it cannot be modified, only deleted and replaced with a new version. PySpark therefore cannot edit an existing CSV file in S3 directly, but its "overwrite" save mode achieves the same result by removing the old data under the target path and writing new files in its place.
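The delete-and-rewrite behavior described above is what Spark's `overwrite` save mode does. A minimal sketch follows; the bucket name and paths are placeholders, and it assumes the cluster already has S3 credentials and the `hadoop-aws` connector configured:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-overwrite-example").getOrCreate()

# Hypothetical source path -- replace with your own bucket and key.
df = spark.read.csv("s3a://my-bucket/input/data.csv", header=True)

# mode("overwrite") deletes any existing objects under the target
# prefix and writes new part files in their place; S3 itself never
# updates an object in place.
df.write.mode("overwrite").option("header", True).csv(
    "s3a://my-bucket/output/data_csv/"
)
```

Note that Spark writes a directory of part files rather than a single CSV; if one file is required, `df.coalesce(1)` before the write forces a single partition, at the cost of funneling all data through one executor.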
Asked: 2023-06-11 09:57:38 +0000
Seen: 15 times
Last updated: Jun 11 '23