Machine Learning-Backed FAQ Chatbot
The goal of this project was to develop an AI backend for a frequently asked questions (FAQ) chatbot. A client had collected a large number of questions and answers, which were used to train natural language processing (NLP) models.
In addition to their text, the questions carried assigned tags, which were used to group them into topic-based segments.
The overall pipeline was trained in two stages:
- First, we trained a topic classifier on the assigned tags: the question text was transformed into TF-IDF features, which were used to fit an XGBoost classifier.
- Second, for each topic we built Doc2Vec embeddings from that topic's questions only. This approach reduced computational cost and memory footprint, because only one relatively small model had to be loaded into main memory to process a question.
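The first training stage can be illustrated with a small sketch. It is not the project's code: to stay runnable with the standard library alone, the TF-IDF weighting is written out by hand and a nearest-centroid rule stands in for the XGBoost classifier. The sample questions, the tags, and all function names (`tokenize`, `fit_idf`, `tfidf_vector`, `train_topic_classifier`, `predict_topic`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Map a text to a sparse, L2-normalised TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_topic_classifier(questions, tags):
    """Stand-in for the XGBoost step: one mean TF-IDF vector (centroid) per tag."""
    idf = fit_idf(questions)
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(tags)
    for question, tag in zip(questions, tags):
        for term, weight in tfidf_vector(question, idf).items():
            sums[tag][term] += weight
    centroids = {
        tag: {term: w / counts[tag] for term, w in vec.items()}
        for tag, vec in sums.items()
    }
    return idf, centroids


def predict_topic(question, idf, centroids):
    """Return the tag whose centroid has the largest dot product with the question."""
    vec = tfidf_vector(question, idf)

    def score(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda tag: score(centroids[tag]))


# Hypothetical training data: question text plus one assigned tag each.
QUESTIONS = [
    "How do I reset my password?",
    "Can I change my account email address?",
    "When will my invoice be issued?",
    "Which payment methods do you accept?",
]
TAGS = ["account", "account", "billing", "billing"]

idf, centroids = train_topic_classifier(QUESTIONS, TAGS)
topic = predict_topic("I forgot my password", idf, centroids)
```

With real libraries, the hand-written parts would typically be replaced by a TF-IDF vectoriser and a gradient-boosted classifier; the data flow (text, then TF-IDF features, then a tag prediction) stays the same.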
An incoming question was processed as follows:
- First, the generic topic classifier identified the right topic.
- Then, we loaded the embeddings for that topic.
- Finally, we found the embedding most similar to the question's, which identified the best-matching FAQ entry.
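The three steps above can be sketched as a small routing function. This is an illustration under stated assumptions rather than the deployed code: a keyword rule stands in for the topic classifier, bag-of-words count vectors stand in for Doc2Vec embeddings, and the FAQ entries and function names are hypothetical. The part it is meant to show is the lazy, one-topic-at-a-time loading, done here with `functools.lru_cache(maxsize=1)`.

```python
import math
import re
from collections import Counter
from functools import lru_cache

# Hypothetical per-topic FAQ data: (question, answer) pairs grouped by topic.
FAQ_BY_TOPIC = {
    "account": [
        ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
        ("How do I change my email address?", "Edit it under Profile settings."),
    ],
    "billing": [
        ("When is my invoice issued?", "Invoices are issued on the first day of each month."),
        ("Which payment methods do you accept?", "We accept cards and bank transfers."),
    ],
}


def embed(text):
    """Stand-in embedding: a bag-of-words count vector instead of a Doc2Vec vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_topic(question):
    """Stand-in for the generic topic classifier: a simple keyword rule."""
    billing_words = {"invoice", "payment", "pay", "refund"}
    return "billing" if billing_words & set(embed(question)) else "account"


@lru_cache(maxsize=1)
def load_topic_index(topic):
    """Build the index for one topic (a real system would load it from disk).

    maxsize=1 means only the most recently used topic stays in memory.
    """
    return tuple((embed(q), q, a) for q, a in FAQ_BY_TOPIC[topic])


def answer(question):
    topic = classify_topic(question)      # step 1: find the topic
    index = load_topic_index(topic)       # step 2: load that topic's embeddings
    query = embed(question)
    # step 3: pick the stored entry whose embedding is most similar to the query
    _, matched, reply = max(index, key=lambda item: cosine(query, item[0]))
    return topic, matched, reply


result = answer("I forgot my password, how can I reset it?")
```

Keeping a single cached topic trades a reload on every topic switch for a small, predictable memory footprint; a larger `maxsize` would shift that trade-off the other way.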
The model was deployed and served as a REST API inside a Docker container, which allowed the solution to scale horizontally.
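A stateless request handler is what makes this kind of scaling straightforward: any container replica can serve any request. The sketch below uses only Python's built-in `http.server`. The source does not say which web framework, route, or payload format was used, so the `/ask` path, the `{"question": ...}` JSON shape, and the `find_answer` placeholder are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def find_answer(question):
    """Hypothetical placeholder for the classify, load, and match pipeline."""
    return "Hypothetical answer for: " + question


def handle_question(body):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    question = payload.get("question") if isinstance(payload, dict) else None
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "field 'question' is required"}
    return 200, {"question": question, "answer": find_answer(question)}


class FaqHandler(BaseHTTPRequestHandler):
    """Handles POST /ask; keeps no state between requests."""

    def do_POST(self):
        if self.path != "/ask":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_question(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=8000):
    """Create the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("0.0.0.0", port), FaqHandler)
```

Inside a container, the entry point would call `make_server().serve_forever()`, and several identical containers could then sit behind a load balancer.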