To remove HTML tags from a string in Python, you can use the re module (regular expressions) to replace every tag with an empty string. Here's a function that does this:

import re

def remove_html_tags(text):
    """Remove all HTML tags from a given string."""
    # Matches '<', then one or more characters other than '>', then '>'.
    # This covers typical tags such as <tag_name attribute1="value1"> or </tag_name>.
    html_tag_pattern = re.compile(r'<[^>]+>')
    return html_tag_pattern.sub('', text)

You can use this function by passing a string with HTML tags to it:

html_string = "<p>Hello, <strong>world</strong>! This is an <em>example</em> string with <a href='https://www.example.com'>HTML tags</a>.</p>"
plain_text = remove_html_tags(html_string)
print(plain_text)

This will output:

Hello, world! This is an example string with HTML tags.

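Keep in mind that this is a simple approach and does not handle every edge case. For example, a '>' inside a quoted attribute value ends the match early, the contents of <script> and <style> elements are kept as text, and character references such as &amp; are not decoded. If you need to handle more complex HTML, consider using an HTML parsing library like Beautiful Soup.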
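
If installing a third-party library is not an option, a parser-based version can also be sketched with the standard library's html.parser module. This is not Beautiful Soup, and it is only a minimal sketch: the names TagStripper and strip_tags are made up for this example, and skipping the contents of <script> and <style> is an assumption about what you want to keep.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text content while discarding tags (hypothetical helper)."""

    # Elements whose contents are dropped entirely (an assumption of this sketch).
    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(html_text):
    """Return the text content of html_text with tags removed."""
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.text()
```

For example, strip_tags("<style>p{color:red}</style><p>Tom &amp; Jerry</p>") returns "Tom & Jerry", whereas the regex version would keep the CSS and leave &amp; undecoded.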