To go the information about the relative dependencies of various tokens appearing at diverse destinations inside the sequence, a relative positional encoding is calculated by some form of Studying. Two popular types of relative encodings are:As a result, architectural aspects are the same as the baselines. Furthermore, optimization configurations f