How can I extract text that is separated by <wbr> from an anchor tag using Scrapy?

answered 2023-06-27 20:24:01 +0000

ladyg

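In the parsed page, <wbr> is an element rather than text, so it splits the anchor's content into several separate text nodes. To get the full string with Scrapy, use response.xpath to select every text node under the anchor with .//text(), then join the pieces. Here's an example: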

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # select all anchor tags that have a <wbr> element somewhere inside them
        for anchor in response.xpath('//a[.//wbr]'):
            # <wbr> is an element, so it never shows up in the text nodes;
            # joining the nodes is enough to rebuild the full string
            text = ''.join(anchor.xpath('.//text()').getall())
            # do something with the text
            yield {
                'text': text
            }

In this example, we first select every anchor tag that has a <wbr> element inside it, using the //a[.//wbr] predicate. A test such as contains(., "<wbr>") would not work here, because contains() compares against the anchor's text and the tag itself is not part of that text. Then we collect all text nodes under each anchor with .//text() and join them into one string. No replace() call is needed, since the <wbr> elements never appear in the text nodes. Finally, we yield a dictionary containing the extracted text.
