Thank you for your work; it is much appreciated. I am not a native English speaker, so please excuse any mistakes. I have a question: does your work only consider the similarity among the first n-1 layers, and not the similarity between layer n-1 and layer n? Is this because the last layer is connected to the classification head, so the last transformer layer is always kept by default?
As I understand it, for a given threshold, whenever the similarity exceeds that threshold we always delete the later layer. For example, if the similarity between layer i and layer i+1 exceeds the threshold and a layer has to be deleted or replaced, then the target must be layer i+1.
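
To make my understanding concrete, here is a minimal sketch of the rule as I read it. The function name `layers_to_drop`, the `keep_last` flag, and the similarity values are my own assumptions for illustration and are not taken from your code. I also assume each similarity is computed between the original adjacent layers and is not recomputed after a layer is removed.

```python
from typing import List


def layers_to_drop(adjacent_sim: List[float], threshold: float,
                   keep_last: bool = True) -> List[int]:
    """Return 0-based indices of layers to delete or replace.

    adjacent_sim[i] is the similarity between layer i and layer i+1,
    so a model with N layers has N-1 entries. (Hypothetical helper.)
    """
    num_layers = len(adjacent_sim) + 1
    drop = []
    for i, sim in enumerate(adjacent_sim):
        later = i + 1  # the rule always targets the later layer of the pair
        if keep_last and later == num_layers - 1:
            # Assumption: the last layer feeds the classification head,
            # so it is never a candidate for removal.
            continue
        if sim > threshold:
            drop.append(later)
    return drop


# Made-up similarities for a 5-layer model (layers 0..4).
sims = [0.2, 0.95, 0.5, 0.99]
print(layers_to_drop(sims, threshold=0.9))                   # [2]
print(layers_to_drop(sims, threshold=0.9, keep_last=False))  # [2, 4]
```

With `keep_last=True`, the pair (layer 3, layer 4) is skipped even though its similarity exceeds the threshold, which is the behavior I am asking about in my first question.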
