You can use Scrapy to extract text that is split by <wbr> tags inside an anchor tag. Call response.xpath, select all the text() nodes under the anchor tag, and join them. Here's an example:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # select all anchor tags that have a <wbr> element among their descendants
        # (<wbr> is an element, not text, so contains(., "<wbr>") would never match)
        for anchor in response.xpath('//a[.//wbr]'):
            # <wbr> splits the link text into separate text nodes;
            # joining every descendant text node restores the full string
            text = ''.join(anchor.xpath('.//text()').getall()).strip()
            # do something with the text
            yield {
                'text': text
            }

In this example, we first use the XPath predicate [.//wbr] to select every anchor tag that has a <wbr> descendant. Matching with contains(., "<wbr>") does not work. contains() tests the element's string value, which consists only of text, and <wbr> is an element, so the literal "<wbr>" never appears in it. We then collect all text nodes inside each anchor with .//text() and join them with ''.join(). Since <wbr> contributes no text of its own, no replace() call is needed. Finally, we yield a dictionary containing the extracted text.