The Transformer Architecture

The Transformer is an attention-based model that speeds up training by allowing computation to be parallelized, and this parallelization is what sets it apart from earlier sequence models; many later models are built on this foundation. Researchers at Google and the University of Toronto developed the Transformer in 2017, originally for machine translation. The three main concepts behind Transformers are:

1. Positional Encodings: In language processing, the order of words matters. The Transformer takes all the words in the input sequence (for example, an English sentence) and attaches a positional encoding to each word to denote its place in the order.

2. Attention: The attention mechanism weighs individual words in the input sequence according to their impact on generating the target sequence.

3. Self-Attention: The idea behind self-attention is that language tasks become more manageable when a neural network learns a better internal representation of the language itself. Compared to recurrent layers, self-attention layers connect all positions with an equal number of sequentially executed operations.
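
To make the first and third concepts concrete, here is a minimal NumPy sketch of sinusoidal positional encodings and single-head scaled dot-product self-attention. The function names, shapes, and random toy inputs are illustrative assumptions, not the exact formulation or code from the 2017 paper.

```python
# Minimal sketch: sinusoidal positional encodings + scaled dot-product self-attention.
# Shapes and names are illustrative assumptions for a toy example.
import numpy as np

def positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position signals."""
    positions = np.arange(seq_len)[:, np.newaxis]            # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return encoding

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                      # project input to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])                  # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over the key positions
    return weights @ v                                        # each position is a weighted sum of values

# Toy usage: a "sentence" of 4 words with model dimension 8.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(seq_len, d_model))              # stand-in word embeddings
x = embeddings + positional_encoding(seq_len, d_model)        # inject word order
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                 # (4, 8)
```

Note how every position attends to every other position in one matrix multiplication, which is why the layer parallelizes well compared with a recurrent layer that must process positions one after another.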