elgeopaso.utils.text_toolbelt module
Tool.
- class elgeopaso.utils.text_toolbelt.TextToolbelt
Bases: object
Tools to manipulate text: tokenize, clean, etc.
- classmethod remove_html_markups(html_text, cleaner='bs-lxml')
Very basic cleaner that strips HTML markup from a text.
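As an illustration of what such a cleaner does, here is a minimal sketch that relies only on the standard library's html.parser. It is not the implementation behind remove_html_markups: the default cleaner='bs-lxml' suggests a BeautifulSoup/lxml backend, which is an assumption on our part, and the names _TextExtractor and strip_html_markup are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; or &eacute;
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags
        self._chunks.append(data)

    def get_text(self):
        # Join chunks with a space so adjacent blocks do not merge,
        # then collapse any run of whitespace into a single space
        return " ".join(" ".join(self._chunks).split())


def strip_html_markup(html_text: str) -> str:
    """Return the visible text of an HTML fragment (stdlib-only sketch)."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


cleaned = strip_html_markup("<p>Offre <b>SIG</b> &amp; t&eacute;l&eacute;d&eacute;tection</p>")
```

Joining the chunks with a space keeps neighbouring paragraphs apart, at the cost of splitting a word whose letters are wrapped in separate inline tags.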
- classmethod tokenize(input_content)
Extracts the words mentioned in the offers so that a semantic analysis can be performed on them. Mainly based on NLTK: https://www.nltk.org/.
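To make the idea of word extraction concrete, the sketch below tokenizes an offer with a regular expression instead of NLTK, so that it runs without downloading any NLTK data. It only approximates the kind of output tokenize might produce. The function name tokenize_offer, the STOPWORDS set and the filtering rules are all assumptions, not the module's actual behaviour.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real analysis would use
# a fuller one (NLTK ships stopword corpora for several languages).
STOPWORDS = frozenset({"de", "la", "le", "les", "et", "en", "un", "une", "des", "du"})

# One or more Unicode letters, optionally joined by hyphens (e.g. "géo-data")
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokenize_offer(text: str) -> list:
    """Lowercase the text, extract words, drop stopwords and single letters."""
    words = WORD_RE.findall(text.lower())
    return [w for w in words if len(w) > 1 and w not in STOPWORDS]


tokens = tokenize_offer("Ingénieur SIG et télédétection, maîtrise de QGIS et PostGIS.")
# tokens == ['ingénieur', 'sig', 'télédétection', 'maîtrise', 'qgis', 'postgis']
```

The resulting list can be passed to collections.Counter to count how often each term appears across offers.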