You can extract text that is split up by <wbr> tags inside an anchor tag in Scrapy by calling the response.xpath method and selecting the text() nodes of the anchor tag. Here's an example:
    import scrapy


    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['https://www.example.com']

        def parse(self, response):
            # select all anchor tags that contain a <wbr> element
            for anchor in response.xpath('//a[.//wbr]'):
                # <wbr> is an element, not text, so joining the anchor's
                # text nodes gives the full string without it
                text = ''.join(anchor.xpath('.//text()').getall())
                # do something with the text
                yield {
                    'text': text
                }

In this example, we first select all anchor tags that contain a <wbr> element using the XPath predicate [.//wbr]. Then we extract every text node under each anchor with .//text() and join the pieces into one string with ''.join(). The <wbr> tags never appear in the text nodes, so nothing has to be stripped out afterwards. Finally, we yield a dictionary containing the extracted text.