How can I extract text that is separated by <wbr> from an anchor tag using Scrapy?

asked 2023-06-27 19:54:49 +0000

david

1 Answer


answered 2023-06-27 20:24:01 +0000

ladyg

You can extract text that is separated by <wbr> from an anchor tag in Scrapy by calling response.xpath, selecting all descendant text() nodes of the anchor, and joining them. Here's an example:

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # <wbr> is an element, not text, so match anchors that
        # have a <wbr> element among their descendants
        for anchor in response.xpath('//a[.//wbr]'):
            # .//text() returns the text nodes on either side of each <wbr>;
            # joining them rebuilds the full string
            text = ''.join(anchor.xpath('.//text()').getall()).strip()
            # do something with the text
            yield {
                'text': text
            }

In this example, the XPath //a[.//wbr] selects every anchor tag that has a <wbr> element among its descendants. A test such as contains(., "<wbr>") would not work here, because contains() inspects the anchor's text content and the <wbr> markup is never part of that text. Since <wbr> is an element, it splits the anchor's content into several text nodes; .//text() returns those nodes in document order, and ''.join() concatenates them back into a single string. No replace() call is needed, because the tags never appear in the extracted text. Finally, we yield a dictionary containing the extracted text.

