You can extract text that is split by <wbr> tags inside an anchor tag with Scrapy by calling response.xpath, selecting the text() nodes of the anchor, and joining them. Here's an example:
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://www.example.com']

    def parse(self, response):
        # select all anchor tags that contain at least one <wbr> element
        for anchor in response.xpath('//a[.//wbr]'):
            # <wbr> splits the anchor's text into separate text nodes;
            # joining them with an empty string restores the full text
            text = ''.join(anchor.xpath('.//text()').getall())
            # do something with the text
            yield {
                'text': text
            }

In this example, we first select all anchor tags that contain a <wbr> element using the XPath predicate [.//wbr]. A test such as contains(., "<wbr>") would not work here, because contains() compares against the anchor's string value, which never includes markup. Then we collect the text nodes of each anchor using text() and join them with ''.join(). Because <wbr> is an element rather than text, it never appears in the extracted strings, so no replace() call is needed. Finally, we yield a dictionary containing the extracted text.
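The text nodes extracted from such an anchor often carry leading or trailing whitespace and newlines from the page's source formatting. As a minimal sketch of one way to tidy them, the helper below joins the parts and collapses runs of whitespace. The name join_wbr_parts is hypothetical and not part of Scrapy; it only assumes it receives the list of strings that anchor.xpath('.//text()').getall() returns.

```python
def join_wbr_parts(parts):
    """Join text nodes that were split by <wbr> and normalize whitespace.

    `parts` is assumed to be a list of strings, such as the result of
    anchor.xpath('.//text()').getall() in a Scrapy callback.
    """
    # Concatenate with no separator: <wbr> marks an optional break
    # inside a word, so the pieces belong together.
    joined = ''.join(parts)
    # Collapse any run of whitespace (including newlines) to one space
    # and trim both ends.
    return ' '.join(joined.split())


if __name__ == '__main__':
    sample = ['\n  Extremely', 'Long', 'Identifier\n']
    print(join_wbr_parts(sample))
```

Inside the spider, this could replace the plain join, for example text = join_wbr_parts(anchor.xpath('.//text()').getall()).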
Asked: 2023-06-27 19:54:49 +0000
Last updated: Jun 27 '23