Most ML algorithms can only take low-dimensional numerical data as inputs. Why? → It is faster for computation.

That means we must transform non-numeric variables (e.g., items and users) into numbers and vectors.

We could try to represent items with raw numerical values, but neural networks treat numerical inputs as continuous variables. That means higher numbers are given greater significance than lower numbers.

A network also treats numbers that are close together as similar items. This makes perfect sense for a field like “age” but is nonsensical when the numbers represent a categorical variable.
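A minimal sketch of the problem (the category names are made up for illustration): assigning arbitrary integer IDs to categories invents an ordering and a notion of distance that the data never had.

```python
# Assign arbitrary integer IDs to categorical items.
categories = ["shoes", "hats", "books"]
encoding = {cat: i for i, cat in enumerate(categories)}
# encoding == {"shoes": 0, "hats": 1, "books": 2}

# A model consuming these raw integers would conclude that
# "books" (2) is twice "hats" (1), and that "hats" is closer
# to "shoes" than "books" is, even though the categories
# have no numeric order at all.
```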

One-hot encoding does turn a category into a set of continuous variables, but we literally end up with a huge vector of 0s containing a single 1, or a handful of 1s (a sparse vector), which creates an unmanageable number of dimensions.

Since every item is equidistant from every other item in this vector space, the encoding omits any context about similarity. As a result, we have no way of evaluating the relationship between two entities.
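The equidistance claim can be checked directly: the Euclidean distance between any two distinct one-hot vectors is always √2, regardless of which items they represent.

```python
import math

def one_hot(index, n):
    vec = [0.0] * n
    vec[index] = 1.0
    return vec

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Every pair of distinct one-hot vectors is exactly sqrt(2) apart,
# so the encoding says nothing about which items are alike.
d1 = euclidean(one_hot(0, 4), one_hot(1, 4))
d2 = euclidean(one_hot(0, 4), one_hot(3, 4))
# d1 == d2 == sqrt(2)
```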

We could generate more one-to-one mappings, or attempt to group items and look for similarities, but this requires extensive work and manual labeling that is typically infeasible.

Embeddings are dense (no, or few, 0s) numerical representations of real-world objects and relationships, expressed as vectors.

The vector space quantifies the semantic similarity between categories. Embedding vectors that are close to each other are considered similar.
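Closeness in the embedding space is commonly measured with cosine similarity. A sketch with hypothetical 3-dimensional embeddings (the vectors below are invented for illustration; learned embeddings typically have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: two related items and one unrelated item.
running_shoes = [0.9, 0.8, 0.1]
trail_shoes   = [0.88, 0.82, 0.15]
coffee_maker  = [0.1, -0.2, 0.9]

# Related items point in nearly the same direction; unrelated ones do not.
similar = cosine_similarity(running_shoes, trail_shoes)
dissimilar = cosine_similarity(running_shoes, coffee_maker)
```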

Sometimes, embeddings are used directly, for example to power the “Similar items to this” section in an e-commerce store. Other times, they are passed to other models. In those cases, the model can share learnings across similar items rather than treating them as completely unique categories, as is the case with one-hot encodings.

For this reason, embeddings can be used to accurately represent sparse data like clickstreams, text, and e-commerce purchases as features for downstream models.

On the other hand, embeddings are much more expensive to compute than one-hot encodings and are far less interpretable.