Mariusz Kurman’s Post

Take note of the following:
1. Your cross-entropy loss remains consistently high during SFT.
2. Your LoRA fine-tuning fails to meet its objectives.
3. You wish to enhance your models' capabilities even further.

If any of these apply, consider trying my implementation of LinearMoE, which I use to replace all the linear layers (a rough sketch of the idea is below). If you have any improvements in mind, feel free to share them in the comments. Many thanks for any suggestions!

Link: https://lnkd.in/d-yWjPdb
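For readers who want the gist before opening the repo: the idea is to swap each nn.Linear for a small mixture of experts, where a router picks a few expert linear layers per token and combines their outputs. The sketch below is a minimal illustration of that pattern, not the repo's exact code; the class names, the `top_k` routing, and the `swap_linear_for_moe` helper are my own assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearMoE(nn.Module):
    """Illustrative mixture-of-experts drop-in for nn.Linear (sketch only,
    not the linearmoe_pytorch implementation)."""

    def __init__(self, in_features, out_features, num_experts=4, top_k=2, bias=True):
        super().__init__()
        # Each expert is an ordinary linear layer with the same shape.
        self.experts = nn.ModuleList(
            nn.Linear(in_features, out_features, bias=bias) for _ in range(num_experts)
        )
        # The router scores each token and selects the top-k experts.
        self.router = nn.Linear(in_features, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        # x: (..., in_features)
        logits = self.router(x)                             # (..., num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros(*x.shape[:-1], self.experts[0].out_features,
                          device=x.device, dtype=x.dtype)
        # Dispatch each token to its selected experts and sum the weighted outputs.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


def swap_linear_for_moe(module, num_experts=4, top_k=2):
    """Hypothetical helper: recursively replace every nn.Linear with LinearMoE."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LinearMoE(child.in_features, child.out_features,
                                            num_experts=num_experts, top_k=top_k,
                                            bias=child.bias is not None))
        else:
            swap_linear_for_moe(child, num_experts, top_k)
```

A design note on why this can help the scenarios above: when a single linear projection underfits (flat cross-entropy) or LoRA's low-rank update lacks capacity, routing tokens across several expert projections adds capacity without activating all parameters for every token.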

GitHub - mkurman/linearmoe_pytorch: This repo contains my custom implementation of a mixture of experts as an extension of the linear layer.
