Evolution of Language Models

What are Language Models?

A language model (LM) is a model that predicts the next word in a given sequence of words, typically by assigning a probability to each candidate word.

Evolution of Language Models

The evolution of LMs can be broadly classified into five stages.

Rule-based LM

Statistical LM

Neural LM

Pre-Trained LM

Large Language Models (LLM)

Rule-based Language Models

Grammatical rules of a specific language were used to predict the next word in a sentence.

For example, in English `I` is followed by `am`, not `are`, while `They` can be followed by `have` or `are`. Grammatical rules like these decided which words could come next.

However, there are many exceptions and it is quite difficult to handle all the rules of a language.
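As a minimal sketch of the idea, the rules can be pictured as a hand-written lookup table. The table `AGREEMENT_RULES` and the helper names below are hypothetical and cover only the two examples mentioned above; they are not taken from any real system.

```python
# Hypothetical toy rule table: a word -> the words allowed to follow it.
# It only encodes the two examples from the text.
AGREEMENT_RULES = {
    "i": {"am"},
    "they": {"have", "are"},
}


def allowed_next_words(previous_word):
    """Return the set of words the rules permit after previous_word."""
    return AGREEMENT_RULES.get(previous_word.lower(), set())


def is_allowed(previous_word, candidate):
    """Check whether candidate may follow previous_word under the rules."""
    return candidate.lower() in allowed_next_words(previous_word)


print(is_allowed("I", "am"))
print(is_allowed("I", "are"))
print(allowed_next_words("We"))  # no rule was written for "We"
```

Any word without a rule, such as `We` here, gets no prediction at all, which hints at why covering a whole language this way is so difficult.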

Statistical Language Models

In this method, a large collection of text was analyzed, and the probability of a word following a given sequence of words was estimated from how often that word occurred after it.

For example, the number of times `am` appears after `I` gives a probability that can be compared with that of other words such as `are` or `is`.

More advanced SLMs used n-gram models: instead of estimating the probability from a single previous word, they used the last two words (a bi-gram) or three words (a tri-gram) to estimate the probability of the next word.
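The counting idea can be sketched in a few lines of Python. This is an illustrative sketch only: the tiny `corpus`, the function names, and the use of plain relative frequencies (no smoothing) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        counts[context][next_word] += 1
    return counts


def next_word_probability(counts, context, word):
    """Relative frequency of word after context; 0.0 if the context is unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


# Made-up toy corpus, already split into lowercase tokens.
corpus = "i am here . i am happy . i was late . they are here .".split()

one_word_context = train_ngram(corpus, 2)  # condition on one previous word
two_word_context = train_ngram(corpus, 3)  # condition on two previous words

print(next_word_probability(one_word_context, ["i"], "am"))
print(next_word_probability(one_word_context, ["i"], "are"))
print(next_word_probability(two_word_context, ["i", "am"], "happy"))
```

In this toy corpus `am` follows `i` in two of its three occurrences, while `are` never does, so the model prefers `am`. A longer context gives sharper estimates but is seen less often in the training text.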

However, in English a single word can have multiple meanings depending on the context of the sentence, and an SLM cannot determine that context.

Neural Language Models

Building on word embeddings such as Word2Vec (Word to Vector), these models use neural networks to calculate the probability of the following words. Examples: RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory).
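To make the pipeline concrete, here is a rough sketch of a single-layer recurrent network that turns word vectors into a next-word distribution. Everything in it is an assumption for illustration: the four-word `vocab`, the layer sizes, and the random (untrained) weights, so the resulting probabilities are meaningless until the weights are learned from data.

```python
import math
import random

random.seed(0)

vocab = ["i", "am", "are", "they"]  # hypothetical toy vocabulary
embed_dim, hidden_dim = 3, 4        # arbitrary sizes chosen for the sketch


def random_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


embeddings = random_matrix(len(vocab), embed_dim)  # one vector per word
W_in = random_matrix(hidden_dim, embed_dim)        # word vector -> hidden state
W_rec = random_matrix(hidden_dim, hidden_dim)      # previous hidden -> hidden
W_out = random_matrix(len(vocab), hidden_dim)      # hidden state -> word scores


def next_word_distribution(words):
    """Read the words one by one, then score every vocabulary word."""
    hidden = [0.0] * hidden_dim
    for word in words:
        x = embeddings[vocab.index(word)]
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
    return dict(zip(vocab, softmax(matvec(W_out, hidden))))


print(next_word_distribution(["they"]))
```

The hidden state is updated after every word, so the prediction depends on the whole sequence read so far rather than on a fixed window of previous words.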

Pre-Trained Language Models

ELMo (context-aware word embeddings) and self-attention through the Transformer raised the performance bar for NLP tasks. Examples: BERT and GPT-2.
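The following sketch illustrates how self-attention makes a word's representation depend on its neighbours. It is heavily simplified: the two-dimensional `toy_embeddings` are made up, and the inputs are used directly as queries, keys, and values, whereas a real Transformer applies learned projections and several attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of all vectors in the sentence.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical static embeddings: "bank" sits halfway between two senses.
toy_embeddings = {"river": [1.0, 0.0], "money": [0.0, 1.0], "bank": [0.5, 0.5]}


def contextual_vector(sentence, target):
    """Return the attention output for target within sentence."""
    vectors = [toy_embeddings[word] for word in sentence]
    return self_attention(vectors)[sentence.index(target)]


bank_near_river = contextual_vector(["river", "bank"], "bank")
bank_near_money = contextual_vector(["money", "bank"], "bank")
print(bank_near_river, bank_near_money)
```

The same word `bank` ends up with a different vector in each sentence, pulled toward whichever word appears beside it. This is the sense in which such embeddings are context-aware.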

Large Language Models (LLM)

There is a thin line between PLM and LLM.

Scaling up the model size and training data size of PLMs revealed new emergent abilities. Examples: ChatGPT, LLaMA, Claude.

LLMs differ from PLMs in three broad ways:

Emergent abilities
Prompting/Conversational Interface
Reaching this scale requires solving both engineering and research problems.

References

[1]: Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., ... Wen, J. (2023). A Survey of Large Language Models. arXiv:2303.18223
