I was deeply impressed by the window-major feature map organization proposed in the paper, so I checked the implementation.
However, it seems that PyTorch already performs the window-major reorganization for us whenever we compute window attention: calling `reshape` on a non-contiguous tensor copies and reorders the data. I could not come up with a way to compute window attention directly on a row-major feature map.
I would like to write code that makes a clear efficiency comparison between the two organization schemes. Is any such code available, or do you have any suggestions?
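To make the copy I am describing concrete, here is a minimal sketch of what I believe happens. The helper name `window_partition`, the `(B, H, W, C)` layout, and the tensor sizes are my own illustrative assumptions, not code taken from the repository.

```python
import torch


def window_partition(x, ws):
    """Split a row-major (B, H, W, C) map into windows of size ws x ws.

    Returns the permuted (non-contiguous) view and the reshaped result.
    Hypothetical helper, written only to illustrate the copy.
    """
    B, H, W, C = x.shape
    # Pure view: no data movement, memory is still row-major.
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    # Still a view, but the strides no longer describe a contiguous tensor.
    permuted = x.permute(0, 1, 3, 2, 4, 5)
    # reshape cannot express this as a view, so it copies the data
    # into new storage laid out window by window.
    windows = permuted.reshape(-1, ws * ws, C)
    return permuted, windows


x = torch.arange(2 * 8 * 8 * 4, dtype=torch.float32).view(2, 8, 8, 4)
permuted, windows = window_partition(x, ws=4)

print(permuted.is_contiguous())             # False
print(windows.is_contiguous())              # True
print(windows.data_ptr() == x.data_ptr())   # False: reshape allocated new storage
```

If this reading is right, the reorganization cost is hidden inside `reshape`, which is why I cannot isolate a purely row-major baseline.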