The goal of this project was to develop an AI backend for a FAQ chatbot. A client had collected a significant number of questions and answers, which were used to train machine learning models. Besides text, the questions had assigned tags, which were used to cluster the questions into topic-based segments. The overall algorithm was trained as follows:
First, we trained a topic classifier based on the assigned tags; for this, we applied a TF-IDF transformation to the text and trained an XGBoost classifier on the resulting features. Second, for each topic we built a separate Doc2Vec embedding model trained only on the data for that specific topic.