Use lxml, which is the best XML/HTML library for Python.
    import lxml.html
    t = lxml.html.fromstring("...")
    t.text_content()
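To make that concrete, here is a minimal sketch with a made-up sample string. The helper name `html_to_text` is hypothetical and only there for illustration; the lxml calls are the same two shown above.

```python
import lxml.html


def html_to_text(markup: str) -> str:
    """Parse an HTML fragment and return its text with all tags removed."""
    # fromstring() parses the markup into an element tree.
    tree = lxml.html.fromstring(markup)
    # text_content() concatenates the text of the element and its descendants.
    return tree.text_content()


# Sample input invented for this example.
print(html_to_text("<p>Hello <b>world</b>!</p>"))  # prints: Hello world!
```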
And if you just want to sanitize the HTML, look at the lxml.html.clean module (in lxml 5.2 and later it is distributed separately as the lxml_html_clean package).