There are various ways to store the output of a Scrapy spider outside of a Docker container. Here are a few options:
Mount a host directory into the container as a volume and point Scrapy's output at it. The file then lives on the host and survives after the container exits:

docker run -v /path/on/host:/output scrapy-image scrapy crawl spider -o /output/output.json
Use Scrapy's feed exports to write the output to remote storage, such as an Amazon S3 bucket, by setting the feed URI in your project settings:

FEED_URI = 's3://bucket-name/spider-output/%(name)s-%(time)s.json'
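Note that FEED_URI is deprecated as of Scrapy 2.1 in favour of the FEEDS setting. A minimal settings.py sketch of the same idea, assuming botocore is installed (Scrapy needs it for s3:// URIs) and treating the bucket name and credentials as placeholders:

# settings.py -- bucket name and credentials below are placeholders
AWS_ACCESS_KEY_ID = 'your-access-key'
AWS_SECRET_ACCESS_KEY = 'your-secret-key'

FEEDS = {
    's3://bucket-name/spider-output/%(name)s-%(time)s.json': {
        'format': 'json',
    },
}

Scrapy uploads the finished feed to the bucket when the crawl ends, so nothing has to persist inside the container.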
Use a container orchestration platform, such as Kubernetes or Docker Swarm, to manage your containers. These platforms provide persistent storage, for example a Kubernetes PersistentVolumeClaim mounted into the pod, so the output survives even if the container is destroyed or recreated.
Store the output in a database, such as MongoDB, by writing a Scrapy item pipeline that inserts each scraped item into a collection. The data is then accessible from outside the container and easy to query; a sketch follows.
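A minimal pipeline sketch, closely following the MongoDB example in the Scrapy documentation; it assumes pymongo is installed and that MONGO_URI and MONGO_DATABASE (illustrative setting names) are defined in settings.py:

import pymongo
from itemadapter import ItemAdapter

class MongoPipeline:
    collection_name = 'scraped_items'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Pull connection details out of settings.py
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Each scraped item becomes one document in the collection
        self.db[self.collection_name].insert_one(ItemAdapter(item).asdict())
        return item

Enable it by adding the pipeline to ITEM_PIPELINES in settings.py, e.g. ITEM_PIPELINES = {'myproject.pipelines.MongoPipeline': 300}.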
Overall, the best approach will depend on your specific use case and requirements.