A language model (LM) is a model that predicts the next word in a given sequence of words.
The evolution of LMs can be broadly classified into five stages.
- Rule-based LM
- Statistical LM
- Neural LM
- Pre-Trained LM
- Large Language Models (LLM)
- Rule-based LM: hand-written grammatical rules of a specific language were used to predict the next word in a sentence.
- For example, in English `I` is followed by `am`, not `are`, and `They` can be followed by `have` or `are`.
- However, natural languages have many exceptions, and it is very difficult to hand-code every rule.
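A rule-based predictor can be sketched as a hand-written lookup table. The rule table and word lists below are illustrative assumptions, not taken from any real grammar engine:

```python
# Toy rule-based next-word predictor. The rules are hand-written and
# illustrative only; a real system would need far more of them.
RULES = {
    "I": ["am", "have", "was"],
    "They": ["are", "have", "were"],
    "She": ["is", "has", "was"],
}

def predict_next(word):
    """Return the words the hand-written grammar allows after `word`."""
    return RULES.get(word, [])

print(predict_next("I"))     # candidates the rules allow after "I"
print(predict_next("They"))  # candidates the rules allow after "They"
```

The weakness described above is visible immediately: any word missing from the table (or any exceptional construction) yields no prediction at all.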
- Statistical LM (SLM): a large corpus of text was analyzed, and the probability of a word following a given sequence of words was estimated from counts.
- For example, the number of times `am` appears after `I` is counted, and the resulting probability is compared with those of other words such as `are` or `is`.
- More advanced SLMs used n-gram models: instead of conditioning on a single previous word, the last two words (bigram) or three words (trigram) were used to estimate the probability of the next word.
- However, a single word can have multiple meanings depending on the context of the sentence, and an SLM cannot capture that context.
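The counting step above can be sketched with a bigram model over a made-up toy corpus (a real SLM would be trained on a large text collection):

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = ("i am happy . i am tired . i have a dog . "
          "they are happy . they have a cat .").split()

# Count how often each word follows each previous word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def p_next(prev, word):
    """Maximum-likelihood estimate of P(word | prev) from bigram counts."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(p_next("i", "am"))    # "am" follows "i" in 2 of 3 occurrences
print(p_next("i", "are"))   # "are" never follows "i" in this corpus
```

Extending the keys from single words to word pairs or triples gives the bigram- and trigram-conditioned models mentioned above.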
- Neural LM: with word embeddings such as Word2Vec (Word to Vector), these models compute the probability of the next word using neural networks. Examples: RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory).
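The idea can be sketched as a minimal forward pass: embed the previous word, apply one hidden layer, and softmax over the vocabulary. The weights below are random stand-ins; a real neural LM (e.g. an RNN or LSTM) would learn them from data:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["i", "am", "are", "they"]
V, E, H = len(vocab), 8, 16          # vocabulary, embedding, hidden sizes

emb = rng.normal(size=(V, E))        # Word2Vec-style embedding lookup table
W1 = rng.normal(size=(E, H))         # hidden-layer weights (untrained)
W2 = rng.normal(size=(H, V))         # output-layer weights (untrained)

def next_word_probs(word):
    """Distribution over the vocabulary for the word after `word`."""
    h = np.tanh(emb[vocab.index(word)] @ W1)
    logits = h @ W2
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()

probs = next_word_probs("i")
print(dict(zip(vocab, probs.round(3))))      # a probability distribution
```

Unlike the count-based SLM, the prediction here flows through learned continuous representations, which is what lets neural LMs generalize to word combinations never seen verbatim in training.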
- Pre-Trained LM (PLM): context-aware word embeddings (ELMo) and self-attention in the Transformer raised the performance bar on NLP tasks. Examples: BERT and GPT-2.
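The self-attention operation at the heart of the Transformer can be sketched as follows. The inputs are random stand-ins for token representations, and the query/key/value projection matrices are omitted for brevity:

```python
import numpy as np

def self_attention(X):
    """Each row of X attends to every row, weighted by scaled similarity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                        # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax per row
    return weights @ X                                   # context-aware outputs

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))    # 5 tokens, 4-dimensional representations
out = self_attention(X)
print(out.shape)               # same shape as the input: (5, 4)
```

Because every output row is a weighted mixture of all input rows, the representation of a word depends on its sentence context, which is exactly the ability the SLM lacked.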
- Large Language Model (LLM): there is a thin line between a PLM and an LLM. Scaling the model size and training-data size of PLMs revealed new emergent abilities. Examples: ChatGPT, LLaMA, Claude.
- An LLM differs from a PLM broadly in three ways:
	- Emergent abilities that do not appear in smaller models
	- A prompting/conversational interface rather than task-specific fine-tuning
	- The need to solve hard engineering and research problems to reach that scale
Reference: Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., . . . Wen, J. (2023). A Survey of Large Language Models. arXiv:2303.18223.