To remove all HTML tags from a string in Python, you can use the re module (regular expressions) to find each tag and replace it with an empty string. Here's a function that does this:
import re

def remove_html_tags(text):
    """Remove all HTML tags from a given string."""
    # The pattern to match any HTML tag: <tag_name attribute1="value1" attribute2="value2"> or </tag_name>
    html_tag_pattern = re.compile(r'<[^>]+>')
    return html_tag_pattern.sub('', text)
You can use this function by passing a string with HTML tags to it:
html_string = "<p>Hello, <strong>world</strong>! This is an <em>example</em> string with <a href='https://www.example.com'>HTML tags</a>.</p>"
plain_text = remove_html_tags(html_string)
print(plain_text)
This will output:
Hello, world! This is an example string with HTML tags.
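Keep in mind that this is a simple approach and does not handle every case. The pattern stops at the first > it sees, so a > inside a quoted attribute value cuts the match short. The contents of <script> and <style> elements are left in the output as text, entities such as &amp; are not decoded, and a literal < followed later by > in ordinary text (for example "1 < 2 and 3 > 2") is removed as if it were a tag. If you need to handle more complex HTML, consider using an HTML parsing library like Beautiful Soup.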
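If installing a third-party library is not an option, a parser-based approach can also be sketched with the standard library's html.parser module. This is a stdlib alternative to the Beautiful Soup suggestion, not a replacement for it. The names _TextExtractor and strip_tags_with_parser are made up for this illustration, and skipping <script> and <style> contents is an assumption about what you want in the output:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    # Assumption: text inside these elements is not wanted in the output.
    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags_with_parser(html_text):
    """Return the text content of html_text with tags removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


print(strip_tags_with_parser('<a title="a > b">Tom &amp; Jerry</a>'))
```

Because the parser tokenizes the markup instead of pattern-matching it, a > inside a quoted attribute value no longer ends the tag early, and entities are decoded along the way. For badly malformed HTML, a dedicated library like Beautiful Soup is still likely to be the more forgiving choice.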