Introduction to Transformer Models for NLP

Timeline of NLP

  • 1990 : RNN
  • 1997 : LSTM
  • 2013 : Training Recurrent Neural Networks — Ilya Sutskever's PhD thesis
  • 2014 : Sequence to sequence models → Encoder (RNN) + Decoder (RNN)
  • 2015 : Attention → the decoder can attend to the encoder's hidden states
  • 2017 : Transformers
  • 2018 : Pre-Trained Language Models

Transformer Architecture

Matrix multiplication shapes: a 2×4 matrix times a 4×2 matrix yields a 2×2 matrix (the inner dimensions must match; the outer dimensions give the result's shape).
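A minimal sketch of this shape rule in NumPy (the specific matrices are illustrative, not from the notes):

```python
import numpy as np

# A 2x4 matrix multiplied by a 4x2 matrix yields a 2x2 matrix:
# inner dimensions (4 and 4) must match, outer dimensions (2 and 2)
# determine the shape of the result.
A = np.arange(8).reshape(2, 4)  # shape (2, 4)
B = np.arange(8).reshape(4, 2)  # shape (4, 2)
C = A @ B                       # shape (2, 2)
print(C.shape)  # (2, 2)
```

The same rule governs attention: multiplying queries of shape (seq_len × d_k) by transposed keys of shape (d_k × seq_len) produces a (seq_len × seq_len) score matrix.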

BERT