Question Answering systems using BERT and Transformers


Introduction

Question answering has fascinated people for ages. From examinations to quiz competitions, QA is ubiquitous. There are several ways to find answers to questions: search, FAQ-based QA, extractive QA and others. Each has its own pros and cons.

To set some context on what these terms mean:

  • Search - Finding relevant documents in a text corpus
  • FAQ - Finding the answer based on a similar question already in the FAQ (Frequently Asked Questions)
  • Extractive QA - Extracting the answer automatically from passages in the text corpus

The introduction of the Transformer architecture and, subsequently, BERT-based models in recent years has moved the needle on accuracy and made these techniques usable in production applications.

We explain how to build QA applications at scale using these recent trends.

Our Work

We have applied BERT-based architectures to FAQ-based and document-based QA systems. The overall procedure is as follows.

Data Processing

Text data from different sources (webpages, documents, social media, emails and others) is collected, cleaned, pre-processed and indexed into a search platform such as Elasticsearch.
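
As a rough illustration of the clean-and-index step, here is a minimal sketch that uses only the Python standard library. The in-memory inverted index is a stand-in for the search platform, not its API, and the names clean_text, tokenize and build_index as well as the sample documents are assumptions made for this example.

```python
import html
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def clean_text(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", raw)
    decoded = html.unescape(no_tags)
    return " ".join(decoded.split())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map each token to the ids of the documents that contain it.

    This toy inverted index only stands in for a real search platform.
    """
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for token in tokenize(clean_text(raw)):
            index[token].add(doc_id)
    return dict(index)


# Hypothetical documents from two different sources.
docs = {
    "faq-1": "<p>BERT is a   Transformer-based model.</p>",
    "mail-7": "Reset your password from the &quot;Account&quot; page.",
}
index = build_index(docs)
```

In a real deployment the cleaned documents would be sent to the search platform's own indexing interface instead of a Python dictionary.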

Model Training & Fine Tuning

There are a variety of BERT-based models. A model pre-trained on a large text corpus has to be fine-tuned on the required task, in this case question answering, using Q&A datasets. The data can come from open-source datasets or be annotated from the particular customer's data.
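
To make the fine-tuning data more concrete: extractive Q&A datasets commonly store each answer as text plus a character offset into the context, and training needs that converted into start and end token positions. The sketch below shows the conversion under simplifying assumptions. It splits on whitespace, whereas BERT models use subword tokenization, and the record layout and the name char_span_to_token_span are hypothetical.

```python
import re


def char_span_to_token_span(context, answer_text, answer_start):
    """Return the inclusive (start_token, end_token) span of an answer.

    Tokens are whitespace-separated here; a real BERT pipeline would use
    the model's own subword tokenizer and its character offsets instead.
    """
    answer_end = answer_start + len(answer_text)
    if context[answer_start:answer_end] != answer_text:
        raise ValueError("answer_start does not match answer_text")
    # Character offsets of every whitespace-separated token.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", context)]
    # Indices of the tokens that overlap the answer's character range.
    overlapping = [
        i for i, (start, end) in enumerate(spans)
        if start < answer_end and end > answer_start
    ]
    return overlapping[0], overlapping[-1]


# Hypothetical annotated record in a question/context/answer layout.
example = {
    "question": "Where is the text indexed?",
    "context": "Cleaned text is indexed into a search platform.",
    "answer_text": "a search platform",
    "answer_start": 29,
}

token_span = char_span_to_token_span(
    example["context"], example["answer_text"], example["answer_start"]
)
```

The resulting start and end positions are the labels a span-prediction model is trained to reproduce.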

Ask a Question

When the user enters a query in the platform, the following happens:

  • Candidate passages from the search platform will be retrieved using a text algorithm
  • The candidate passages will be scored based on relevance
  • The top N passages will be passed to the model, which generates potential answers for every passage along with confidence scores
  • The produced answers will be scored using an ML approach to finalize the best k answers
  • The answers will be presented to the user
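
The steps above can be sketched end to end as follows. This is a toy outline, not the production system: term overlap stands in for the search platform's relevance score, toy_reader stands in for a fine-tuned BERT reader, and a fixed weighted sum stands in for the ML-based answer scorer. All function names, parameters and sample passages are assumptions made for the example.

```python
import re
from typing import NamedTuple


class Answer(NamedTuple):
    text: str
    passage_id: str
    score: float


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query, passage):
    """Fraction of query terms found in the passage (toy relevance score)."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def answer_question(query, passages, reader, top_n=3, top_k=2, weight=0.5):
    # Steps 1-2: retrieve candidate passages and score them for relevance.
    scored = [(relevance(query, text), pid, text) for pid, text in passages.items()]
    candidates = sorted((c for c in scored if c[0] > 0), reverse=True)[:top_n]
    # Step 3: the reader returns (answer_text, confidence) for each passage.
    answers = []
    for rel, pid, text in candidates:
        answer_text, confidence = reader(query, text)
        # Step 4: a fixed weighted sum stands in for a learned scorer.
        final = weight * rel + (1 - weight) * confidence
        answers.append(Answer(answer_text, pid, final))
    # Step 5: return the best k answers, highest score first.
    return sorted(answers, key=lambda a: a.score, reverse=True)[:top_k]


def toy_reader(query, passage):
    """Hypothetical reader: first sentence plus an overlap-based confidence."""
    first_sentence = passage.split(".")[0]
    return first_sentence, relevance(query, first_sentence)


passages = {
    "p1": "BERT is fine tuned on Q&A datasets. It then extracts answer spans.",
    "p2": "A fine tuned model reads the top passages. The search platform retrieves them.",
    "p3": "User feedback improves the scoring.",
}
results = answer_question("How is BERT fine tuned?", passages, toy_reader)
```

Passing the reader in as a function keeps retrieval, reading and final scoring separate, so each stage can be replaced independently.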

User Feedback

The platform will let the user provide feedback on the answers shown. The collected feedback will be stored and used to improve the scoring algorithms.
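
One simple way to picture the feedback loop is an append-only log of feedback events that is later aggregated into signals for the scorer. The sketch below is only an assumption about how such a log could look: the event fields, the JSON Lines layout and the function names are hypothetical, and an in-memory buffer stands in for real storage.

```python
import io
import json
from collections import defaultdict


def record_feedback(log, query, passage_id, answer, helpful):
    """Append one feedback event to the log as a JSON line."""
    event = {
        "query": query,
        "passage_id": passage_id,
        "answer": answer,
        "helpful": bool(helpful),
    }
    log.write(json.dumps(event) + "\n")


def helpful_rate_by_passage(log_text):
    """Return the share of helpful votes for each passage id."""
    counts = defaultdict(lambda: [0, 0])  # passage_id -> [helpful, total]
    for line in log_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        counts[event["passage_id"]][0] += int(event["helpful"])
        counts[event["passage_id"]][1] += 1
    return {pid: helpful / total for pid, (helpful, total) in counts.items()}


# An in-memory buffer stands in for a file or database table.
log = io.StringIO()
record_feedback(log, "How is BERT fine tuned?", "p1", "on Q&A datasets", True)
record_feedback(log, "How is BERT fine tuned?", "p1", "on Q&A datasets", False)
record_feedback(log, "Where is text indexed?", "p2", "a search platform", True)
rates = helpful_rate_by_passage(log.getvalue())
```

Aggregates like these could serve as extra features or training labels when the scoring algorithms are improved.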

A schematic is below

Summary

Overall, QA systems reduce the time taken to find answers in a text corpus. We have experience in building and deploying these applications in the cloud as well as on-premises.

A couple of demo applications:
