There are several ways to retrieve the links on a web page using Python. Here are two common methods:
Method 1: requests + BeautifulSoup (static pages)

import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

# Find all links on the page; href=True skips <a> tags with no href attribute
links = []
for link in soup.find_all('a', href=True):
    links.append(link.get('href'))
print(links)
Method 2: Selenium (JavaScript-rendered pages)

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.example.com'
# Selenium 4 locates the browser driver automatically via Selenium Manager,
# so no chromedriver path is needed
driver = webdriver.Chrome()
driver.get(url)

# Find all links on the page (find_elements_by_tag_name was removed in Selenium 4)
links = []
for link in driver.find_elements(By.TAG_NAME, 'a'):
    links.append(link.get_attribute('href'))
print(links)
driver.quit()
Note that the requests and BeautifulSoup method requires less setup and avoids the overhead of launching a browser, but it only sees the static HTML returned by the server. If you need to scrape pages that require JavaScript rendering or user interaction to populate their content, Selenium is the better choice.
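With either method, the collected hrefs appear exactly as they were written in the HTML, so many of them will be relative paths or fragment anchors. A small helper built on the standard library's urljoin can resolve them against the page URL; normalize_links is a hypothetical helper name for illustration:

```python
from urllib.parse import urljoin

def normalize_links(base_url, hrefs):
    """Resolve relative hrefs against the page URL, dropping empty values and fragment-only anchors."""
    links = []
    for href in hrefs:
        if not href or href.startswith('#'):
            continue  # skip empty hrefs and same-page anchors like '#top'
        links.append(urljoin(base_url, href))
    return links

# Example: mixed relative, absolute, and anchor hrefs
print(normalize_links('https://www.example.com/docs/',
                      ['/about', 'page.html', '#top', 'https://other.com']))
```

urljoin handles root-relative paths (`/about` resolves against the site root) and leaves already-absolute URLs untouched, which makes the resulting list safe to feed back into requests or Selenium for crawling.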