Evolution of Language Models

Soumendra Kumar Sahoo

What are Language Models?

A language model (LM) is a tool that predicts the next word in a given sequence of words.

Evolution of Language Models

The evolution of LMs can be broadly classified into five stages.
  1. Rule-based LM
  2. Statistical LM
  3. Neural LM
  4. Pre-Trained LM
  5. Large Language Models (LLM)

Rule-based Language Models

  • Grammatical rules of a specific language were used to predict the next word in a sentence.
  • For example, in English `I` is followed by `am`, not `are`, and `They` can be followed by `have` or `are`.
  • However, languages have many exceptions, and it is quite difficult to hand-write rules that cover all of them.
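A rule-based predictor can be sketched as a hand-written lookup table. The rules below are a hypothetical toy set for illustration, not from any real system:

```python
# Toy rule-based predictor: a hand-written table of subject-verb
# agreement rules (hypothetical, for illustration only).
RULES = {
    "I": ["am", "have"],
    "They": ["are", "have"],
    "She": ["is", "has"],
}

def predict_next(word):
    """Return the words a rule allows after `word`, or [] if no rule exists."""
    return RULES.get(word, [])

print(predict_next("I"))      # ['am', 'have']
print(predict_next("They"))   # ['are', 'have']
```

The weakness above is visible immediately: any word without a rule returns nothing, and every exception needs another hand-written entry.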

Statistical Language Models

  • In this method, a large corpus of text is analyzed and the probability of a word appearing after a given sequence of words is estimated statistically.
  • For example, the number of times `am` appears after `I` yields a probability that is compared against other candidates such as `are` or `is`.
  • More advanced SLMs use n-gram models: instead of conditioning on a single previous word, the last two words (bigram) or three words (trigram) are used to estimate the probability of the next word.
  • However, a single word can have multiple meanings depending on the context of the sentence, and an SLM cannot capture that context.
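The counting idea behind an SLM can be sketched as a minimal bigram model. The corpus below is a made-up toy string, not real training data:

```python
from collections import Counter, defaultdict

# Minimal bigram model: estimate P(next | previous) by counting
# adjacent word pairs in a tiny toy corpus (illustrative only).
corpus = "i am happy . i am sad . they are happy . they have time .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def prob(prev, nxt):
    """Relative frequency of `nxt` among all words seen after `prev`."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(prob("i", "am"))      # 1.0 -- "am" always follows "i" in this corpus
print(prob("they", "are"))  # 0.5 -- "are" and "have" each follow "they" once
```

A trigram model would condition on the last two words instead, at the cost of far sparser counts.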

Neural Language Models

  • With word embeddings such as Word2Vec (Word to Vector), these models compute the probability of the next word using neural networks. Examples: RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory)
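The shape of a neural LM can be sketched in a few lines: words become vectors, a recurrent hidden state accumulates context, and scores over the vocabulary are turned into probabilities with a softmax. All weights below are tiny hand-picked numbers (an assumption for illustration), not trained values:

```python
import math

# Minimal neural-LM sketch: embedding lookup -> recurrent update -> softmax.
# Embeddings and output weights are hand-picked (hypothetical), not trained.
vocab = ["i", "am", "are"]
embed = {"i": [1.0, 0.0], "am": [0.0, 1.0], "are": [0.5, 0.5]}
out_w = {"i": [-1.0, 0.0], "am": [1.0, 0.0], "are": [0.0, 1.0]}

def rnn_step(hidden, word):
    """One recurrent update: new_h = tanh(h + embedding(word))."""
    return [math.tanh(h + v) for h, v in zip(hidden, embed[word])]

def next_word_probs(hidden):
    """Score each candidate next word against the hidden state,
    then softmax the scores into a probability distribution."""
    scores = [sum(h * w for h, w in zip(hidden, out_w[word])) for word in vocab]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {word: e / total for word, e in zip(vocab, exps)}

h = rnn_step([0.0, 0.0], "i")   # read the word "i"
probs = next_word_probs(h)
print(max(probs, key=probs.get))  # "am"
```

Unlike an n-gram table, the hidden state is a continuous vector, so a trained version generalizes across similar words instead of memorizing exact word sequences.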

Pre-Trained Language Models

  • Context-aware word embeddings such as ELMo, and self-attention through the Transformer architecture, raised the performance bar on NLP tasks. Examples: BERT and GPT-2
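The self-attention mentioned above can be sketched as scaled dot-product attention, the core Transformer operation. The token vectors below are tiny hand-made examples (an assumption), not trained embeddings:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, mix the value
    vectors weighted by softmax(q . k / sqrt(d)) over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Self-attention: the same token vectors serve as queries, keys, and values,
# so every token's new representation is a context-weighted mix of all tokens.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x))
```

Because every token attends to every other token, the model can resolve word meaning from context — the limitation noted for statistical LMs above.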

Large Language Models (LLM)

  • There is a thin line between PLM and LLM.
  • Scaling up the model size and training data of PLMs revealed new emergent abilities. Examples: ChatGPT, LLaMA, Claude
  • An LLM differs from a PLM broadly in three ways:
    • Emergent abilities
    • A prompting/conversational interface
    • To reach that scale, engineering and research problems must be solved.

