Algorithmic bit-wise boolean task on a transformer
We carried out an interpretability assessment of single-layer transformers trained to solve boolean logical operations. In both a 1-bit/1-operation task and a 6-bit/3-operation task, the attention-layer activations were interpretable: attention was paid to the correct tokens, i.e. those that carried information needed for the task. In the 6-bit/3-operation task we additionally found that the operator token was attended to, though less strongly than we had anticipated.

The MLP layers appeared to achieve significant separation in the representations of the input tokens, which is not a surprising result. However, there were some unexpected results in the representations of the begin-of-sequence token and of the operator in the 1-bit task that we did not understand. Our best guess is that the MLP layers separate these tokens from the others because they have no variability, and hence carry no information, for the task: in the 1-bit task there is only one operation, so the operator is as constant as the begin-of-sequence token.
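For concreteness, here is a minimal sketch of the kind of setup described: a single-layer transformer (one attention layer followed by an MLP) on a 1-bit task, with the attention pattern at the readout position extracted directly. The token format ([BOS] bit OP bit [EQ]), the vocabulary, and the model sizes below are illustrative assumptions, not the exact encoding we used.

```python
import torch
import torch.nn as nn

# Illustrative vocabulary for a 1-bit task: [BOS] bit_a OP bit_b [EQ],
# with the result bit predicted at the [EQ] position. This encoding is
# an assumption for the sketch, not the exact one used in the project.
VOCAB = {"BOS": 0, "0": 1, "1": 2, "AND": 3, "OR": 4, "XOR": 5, "EQ": 6}

class OneLayerTransformer(nn.Module):
    """One attention layer + one MLP block, as in the setup described above."""

    def __init__(self, vocab_size=len(VOCAB), d_model=32, n_heads=1, seq_len=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.unembed = nn.Linear(d_model, 2)  # logits for output bit 0/1

    def forward(self, tokens):
        x = self.embed(tokens) + self.pos
        # need_weights=True returns the attention pattern we want to inspect.
        attn_out, attn_weights = self.attn(x, x, x, need_weights=True)
        x = x + attn_out
        x = x + self.mlp(x)
        logits = self.unembed(x[:, -1])  # read the prediction at [EQ]
        return logits, attn_weights

# After training, interpretability amounts to checking where the [EQ]
# position attends, e.g. for the sequence "1 XOR 0":
model = OneLayerTransformer()
seq = torch.tensor([[VOCAB["BOS"], VOCAB["1"], VOCAB["XOR"],
                     VOCAB["0"], VOCAB["EQ"]]])
_, weights = model(seq)
print(weights[0, -1])  # attention from [EQ] over all five input tokens
```

In a trained model, the interpretable pattern described above would show up here as most of the attention mass at `weights[0, -1]` landing on the two bit positions (and, in the multi-operation task, some on the operator token).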