There are several ways to extract the links from a web page using Python. Here are two common methods:
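1. Using requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404
soup = BeautifulSoup(response.text, 'html.parser')

# Find all links on the page (only <a> tags that actually have an href)
links = []
for link in soup.find_all('a', href=True):
    links.append(link['href'])
print(links)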
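2. Using Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

url = 'https://www.example.com'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('path/to/chromedriver.exe'))
try:
    driver.get(url)
    # Find all links on the page
    links = []
    for link in driver.find_elements(By.TAG_NAME, 'a'):
        links.append(link.get_attribute('href'))
    print(links)
finally:
    driver.quit()  # always close the browser, even if an error occurs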
Note that the requests and BeautifulSoup method needs less setup and does not launch a browser, but it only sees the HTML that the server returns, so it is limited to static pages. If the links are added by JavaScript or appear only after user interaction, Selenium is the better choice because it drives a real browser and reads the rendered page.
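One practical detail: with BeautifulSoup the href values come back exactly as written in the markup, so they may be relative paths, fragments, or mailto: links. Below is a minimal sketch of cleaning such a list, assuming you want unique absolute http(s) URLs. The helper name `absolute_links` and the sample values are hypothetical, not part of either library.

```python
from urllib.parse import urljoin, urldefrag, urlparse


def absolute_links(base_url, hrefs):
    """Resolve raw href values against base_url and keep unique http(s) URLs."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip None and empty strings
        # Resolve relative paths, then drop any "#fragment" part
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue  # skip mailto:, javascript:, tel: and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical sample input, standing in for the scraped `links` list
raw = ['/about', 'contact.html', 'mailto:me@example.com', None, '/about']
print(absolute_links('https://www.example.com/docs/', raw))
```

The base URL passed in should be the address of the page that was fetched, since relative links are resolved against it.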
Asked: 2023-07-09 06:52:36 +0000
Last updated: Jul 09 '23