
Function in Python that removes all HTML Tags from a String?

asked 2023-04-25 20:24:38 +0000

wagner

1 Answer


answered 2023-04-25 20:26:06 +0000

wagner

To remove all HTML tags from a string in Python, you can use the re module (regular expressions) to match each tag and replace it with an empty string. Here's a function that does this:

import re

# Matches an opening or closing tag such as <a href="..."> or </a>.
# Compiled once at import time rather than on every call.
HTML_TAG_PATTERN = re.compile(r'<[^>]+>')

def remove_html_tags(text):
    """Remove all HTML tags from a given string."""
    return HTML_TAG_PATTERN.sub('', text)

You can use this function by passing a string with HTML tags to it:

html_string = "<p>Hello, <strong>world</strong>! This is an <em>example</em> string with <a href='https://www.example.com'>HTML tags</a>.</p>"
plain_text = remove_html_tags(html_string)
print(plain_text)

This will output:

Hello, world! This is an example string with HTML tags.

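Keep in mind that this is a simple approach with known gaps: it leaves the contents of <script> and <style> elements in the output, does not decode entities such as &amp;, and can cut a tag short when an attribute value or comment contains a > character. If you need to handle more complex HTML, consider using an HTML parsing library like Beautiful Soup.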
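
If you would rather not install anything, a parser-based version can also be sketched with the standard library's html.parser module. This is a swapped-in alternative, not Beautiful Soup itself. The names _TextExtractor and strip_tags_with_parser are made up for this example, and skipping <script> and <style> contents is my own assumption about what most people want from "plain text":

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}  # assumption: this text is not wanted

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_tags_with_parser(text):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


print(strip_tags_with_parser("<p>Tom &amp; Jerry</p>"))
```

Unlike the regex version, this decodes entities and drops script and style contents, but it is still only a sketch: it does not insert whitespace between block elements, so adjacent paragraphs run together.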
