Why can't PySpark overwrite a CSV file stored in S3?

asked 2023-06-11 09:57:38 +0000

nofretete

1 Answer

answered 2023-06-11 10:01:01 +0000

ladyg

The behaviour comes mainly from the nature of S3 storage. S3 is object storage and does not support in-place updates: an object cannot be partially modified or appended to once it is stored in the bucket. It can only be replaced as a whole, either by deleting it and writing a new object or by writing a new object under the same key. PySpark therefore cannot edit a CSV file in S3 in place, but it can replace it by removing the old data and writing the new data to the same location. Note also that Spark writes a directory of part files at the target path rather than a single CSV file, so an "overwrite" replaces the contents of that path, not one named file.
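As a rough illustration of the replace-rather-than-modify behaviour described above, here is a minimal sketch using the DataFrameWriter's overwrite save mode. The helper name overwrite_csv and the bucket and path in the usage comment are made up for the example, and it assumes the cluster already has an S3 connector (for example the s3a filesystem) and credentials configured.

```python
def overwrite_csv(df, target_uri):
    """Replace whatever is stored under target_uri with df written as CSV.

    df is expected to be a pyspark.sql.DataFrame. Spark writes a directory
    of part files under target_uri, not a single CSV file.
    """
    (
        df.write
        .mode("overwrite")       # remove existing data at the path, then write
        .option("header", True)  # include a header row in each part file
        .csv(target_uri)
    )


# Hypothetical usage (bucket and path are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
#   overwrite_csv(df, "s3a://my-example-bucket/exports/report")
```

If a single CSV object with a fixed name is required, that usually needs an extra step outside this sketch, since Spark chooses the part file names itself.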



Last updated: Jun 11 '23