Algorithmic bit-wise boolean task on a transformer
We carried out an interpretability assessment of single-layer transformers trained to solve boolean logical operations. In both a 1-bit/1-operation task and a 6-bit/3-operation task, the attention-layer activations were interpretable: attention was paid to the correct tokens, i.e. those that carried information needed for the task. In the 6-bit/3-operation task we additionally found that the operator token was attended to, though less strongly than we had anticipated.

The MLP layers appeared to achieve significant separation in the representations of the input tokens, which is not a surprising result. However, there were some unexpected results in the representations of the begin-of-sequence token and of the operator in the 1-bit task that we did not understand. Our best guess is that the MLP layers separate these tokens from the others because they have no variability, and hence carry no information, for the task: in the 1-bit task there is only one operation, so the operator is as constant as the begin-of-sequence token.
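For concreteness, here is a minimal sketch of the kind of setup described: a single-layer transformer (one attention layer followed by an MLP) on a 1-bit task, with the attention pattern at the readout position extracted directly. The token format ([BOS] bit OP bit [EQ]), the vocabulary, and the model sizes below are illustrative assumptions, not the exact encoding we used.

```python
import torch
import torch.nn as nn

# Illustrative vocabulary for a 1-bit task: [BOS] bit_a OP bit_b [EQ],
# with the result bit predicted at the [EQ] position. This encoding is
# an assumption for the sketch, not the exact one used in the project.
VOCAB = {"BOS": 0, "0": 1, "1": 2, "AND": 3, "OR": 4, "XOR": 5, "EQ": 6}

class OneLayerTransformer(nn.Module):
    """One attention layer + one MLP block, as in the setup described above."""

    def __init__(self, vocab_size=len(VOCAB), d_model=32, n_heads=1, seq_len=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(seq_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.unembed = nn.Linear(d_model, 2)  # logits for output bit 0/1

    def forward(self, tokens):
        x = self.embed(tokens) + self.pos
        # need_weights=True returns the attention pattern we want to inspect.
        attn_out, attn_weights = self.attn(x, x, x, need_weights=True)
        x = x + attn_out
        x = x + self.mlp(x)
        logits = self.unembed(x[:, -1])  # read the prediction at [EQ]
        return logits, attn_weights

# After training, interpretability amounts to checking where the [EQ]
# position attends, e.g. for the sequence "1 XOR 0":
model = OneLayerTransformer()
seq = torch.tensor([[VOCAB["BOS"], VOCAB["1"], VOCAB["XOR"],
                     VOCAB["0"], VOCAB["EQ"]]])
_, weights = model(seq)
print(weights[0, -1])  # attention from [EQ] over all five input tokens
```

In a trained model, the interpretable pattern described above would show up here as most of the attention mass at `weights[0, -1]` landing on the two bit positions (and, in the multi-operation task, some on the operator token).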