There are various ways to store the output of a Scrapy spider outside of a Docker container. Here are a few options:
Mount a host directory into the container as a volume and point the spider's output there, so the file persists on the host after the container exits:

docker run -v /path/on/host:/output scrapy-image scrapy crawl spider -o /output/output.json
Use Scrapy's feed exports to write the output directly to remote storage such as Amazon S3, by configuring FEED_URI in the project settings:

FEED_URI = 's3://bucket-name/spider-output/%(name)s-%(time)s.json'
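On Scrapy 2.1 and later, FEED_URI is deprecated in favor of the FEEDS setting, a dictionary mapping each feed URI to its options. An equivalent settings.py entry might look like the sketch below; the bucket name and path are placeholders, and delivering to S3 requires the botocore or boto3 package plus AWS credentials:

```python
# settings.py -- hypothetical feed-export configuration.
# The bucket name and key pattern are placeholders; %(name)s and
# %(time)s are expanded by Scrapy at export time.
FEEDS = {
    "s3://bucket-name/spider-output/%(name)s-%(time)s.json": {
        "format": "json",
        "encoding": "utf8",
    },
}
```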
Use a container orchestration platform, such as Kubernetes or Docker Swarm, to manage your containers. Both can attach persistent volumes that outlive any individual container, so the output survives when a container is destroyed or recreated.
Store the output in a database, such as MongoDB, by writing a Scrapy item pipeline that saves each scraped item as it is processed. The data then lives outside the container and can be queried directly.
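A minimal item pipeline for the MongoDB option might look like the following sketch. The setting names (MONGO_URI, MONGO_DATABASE) and the collection name are illustrative, and pymongo is only imported when the spider opens, so it must be installed in the container at run time:

```python
# Hypothetical Scrapy item pipeline that inserts each scraped item into
# MongoDB. Enable it in settings.py via ITEM_PIPELINES, e.g.
#   ITEM_PIPELINES = {"myproject.pipelines.MongoPipeline": 300}
class MongoPipeline:
    collection_name = "scraped_items"  # illustrative collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Pull connection details from the project settings.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "items"),
        )

    def open_spider(self, spider):
        import pymongo  # requires the pymongo package
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # One MongoDB document per scraped item; dict() accepts both
        # plain dicts and Scrapy Item objects.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

With this enabled, running the spider needs no -o flag at all; every item is written to the database as the crawl proceeds.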
Overall, the best approach will depend on your specific use case and requirements.
Asked: 2022-09-01 11:00:00 +0000
Last updated: Dec 08 '21