What is the HTML parser in Python?

Methods

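HTMLParser.feed(data): used to feed text to the HTML parser. It can be called several times with successive chunks of a document, and the handler methods below are called as complete elements are parsed.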
HTMLParser.handle_starttag(tag, attrs): used to handle the start tags in the HTML. The parameter tag contains the name of the opening tag in lowercase, and the attrs parameter contains the attributes of that tag as a list of (name, value) pairs.
HTMLParser.handle_endtag(tag): used to handle the end tags in the HTML. The parameter tag contains the name of the closing tag in lowercase. Unlike handle_starttag, it takes no attrs parameter.
HTMLParser.handle_data(data): used to handle the data contained between the HTML tags.
HTMLParser.handle_comment(data): used to handle HTML comments. The parameter data contains the text between <!-- and -->.
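
To make the shape of the attrs parameter concrete, here is a minimal sketch. The class name AttrRecorder and the sample markup are made up for illustration and are not part of the standard library; only HTMLParser, feed, close, and handle_starttag come from html.parser.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.seen = []  # (tag, attrs) pairs in document order

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.seen.append((tag, attrs))


recorder = AttrRecorder()
recorder.feed('<A HREF="https://example.com" class="link">Example</a>')
recorder.close()
print(recorder.seen)
# [('a', [('href', 'https://example.com'), ('class', 'link')])]
```

Both the tag name and the attribute names are converted to lowercase, and the quotes around attribute values are removed.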

Example

The handler methods of HTMLParser do nothing by default, so the example below overrides them to record each start tag, end tag, piece of text, and comment in a separate list. Note that the class Parser inherits from the HTMLParser class.

from html.parser import HTMLParser
class Parser(HTMLParser):
  # Append each start tag's name to the list start_tags.
  # The lists are only mutated in place, so no global declaration is needed.
  def handle_starttag(self, tag, attrs):
    start_tags.append(tag)
  # Append each end tag's name to the list end_tags.
  def handle_endtag(self, tag):
    end_tags.append(tag)
  # Append the text found between tags to the list all_data.
  def handle_data(self, data):
    all_data.append(data)
  # Append the text of each comment to the list comments.
  def handle_comment(self, data):
    comments.append(data)
start_tags = []
end_tags = []
all_data = []
comments = []
# Creating an instance of our class.
parser = Parser()
# Providing the input.
parser.feed('<html><title>Desserts</title><body><p>'
            'I am a fan of frozen yoghurt.</p><'
            '/body><!--My first webpage--></html>')
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments:", comments)
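
As a possible variation on the example above, the collected items can be stored on the parser instance instead of in module-level lists, so that two parsers never share state. This is only a sketch of that design choice: the class name CollectingParser and the variable name collector are assumptions made for illustration, while the input string and the handler methods are the same as in the example.

```python
from html.parser import HTMLParser


class CollectingParser(HTMLParser):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Each instance gets its own lists.
        self.start_tags = []
        self.end_tags = []
        self.all_data = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

    def handle_data(self, data):
        self.all_data.append(data)

    def handle_comment(self, data):
        self.comments.append(data)


collector = CollectingParser()
collector.feed('<html><title>Desserts</title><body><p>'
               'I am a fan of frozen yoghurt.</p><'
               '/body><!--My first webpage--></html>')
collector.close()  # signal that no more input is coming

print("start tags:", collector.start_tags)  # ['html', 'title', 'body', 'p']
print("end tags:", collector.end_tags)      # ['title', 'p', 'body', 'html']
print("data:", collector.all_data)          # ['Desserts', 'I am a fan of frozen yoghurt.']
print("comments:", collector.comments)      # ['My first webpage']
```

Because the results live on the instance, a second CollectingParser can parse another document without clearing any lists first.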
