
Is the last transformer layer of the model always kept by default? #3

@Jayce0625

Thank you for your work; it is much appreciated. I am not a native English speaker, so please excuse any mistakes. I have a question: does your work only consider the similarity between adjacent layers among the first n-1 layers, and not the similarity between layer n-1 and layer n? Is this because the last layer is connected to the classification head, so the last transformer layer is always kept by default?
As I understand it, for a given threshold, whenever the similarity between two adjacent layers exceeds that threshold, it is always the later layer that is removed. For example, if the similarity between layer i and layer i+1 exceeds the threshold and a layer has to be deleted or replaced, then the target must be layer i+1.
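To make the question concrete, here is a minimal Python sketch of the rule as I understand it. The function name `layers_to_drop`, the `keep_last` flag, and the similarity values are hypothetical and are not taken from the repository. The sketch assumes `similarities[i]` is the similarity between layer i and layer i+1 (0-indexed), measured on the original model.

```python
def layers_to_drop(similarities, threshold, keep_last=True):
    """Return indices of layers that would be removed (hypothetical helper).

    similarities[i] is assumed to be the similarity between layer i and
    layer i+1 (0-indexed), so a model with n layers has n-1 entries.
    """
    n_layers = len(similarities) + 1
    drop = []
    for i, sim in enumerate(similarities):
        later = i + 1  # the later layer of the pair (i, i+1)
        if keep_last and later == n_layers - 1:
            # Assumption: the final layer feeds the classification head,
            # so it is never a removal candidate.
            continue
        if sim > threshold:
            drop.append(later)
    return drop


# Illustrative values only, for a 4-layer model.
example_similarities = [0.9, 0.5, 0.95]
print(layers_to_drop(example_similarities, threshold=0.8))
print(layers_to_drop(example_similarities, threshold=0.8, keep_last=False))
```

With `keep_last=True` the pair (n-1, n) is never examined, which is the behavior I am asking about; with `keep_last=False` the last layer can be removed like any other.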
