qlora and gptq
#author_luna #large_language_models #peft #artificial_intelligence
if i'm wrong about this email me.
from what i gathered, those two do not mix together very well. qlora does its own quantization into the NF4 datatype, which requires loading the full-precision model first, so i can't take a 7B or 13B GPTQ model (which uses INT4) and qlora-finetune it on my RTX3060 12GB
however it seems you can just run normal LoRA on a GPTQ model:
- hf peft pr merged on August 10th 2023
- Sathish Gangichetty talks about it on September 1st 2023
- axolotl provides a config example on September 5th, making it very easy to do this type of run
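for reference, the relevant bits of an axolotl config for this kind of run look roughly like this. this is a sketch from memory of axolotl's config format, not their actual example file; the model id and hyperparameter values are illustrative:

```yaml
base_model: TheBloke/Llama-2-7B-GPTQ  # example gptq checkpoint id
gptq: true                # load the gptq-quantized weights
adapter: lora             # plain lora, not qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true  # target all linear layers
sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 4
```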
i'm doing some private test training runs on llama-2-7b-gptq and seeing around 5~6GB of vram usage, which leaves a lot of headroom and makes me wonder whether 13B finetunes would fit.
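rough napkin math on why 13B might fit: at 4 bits per weight, the base model weights alone come out to about half a gigabyte per billion parameters. this ignores activations, the lora adapter and its optimizer state, and cuda overhead, so the real number is higher, as the 5~6GB observed for 7B shows:

```python
def weight_gb(params_billion, bits=4):
    """Back-of-envelope GB for model weights alone at the given bit width
    (ignores activations, kv cache, adapter and optimizer state)."""
    return params_billion * 1e9 * bits / 8 / 1e9  # bytes -> GB

print(weight_gb(7))   # ~3.5 GB of 4-bit weights for a 7B model
print(weight_gb(13))  # ~6.5 GB for 13B, plausibly within 12GB total
```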
i am using `lora_target_linear: true`, as suggested by the qlora paper's "why all linear layers?" discussion.
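the idea behind targeting all linear layers is to attach lora adapters to every linear projection (attention and mlp), not just the usual q/v pair. a toy sketch of how a trainer could collect those target names; this is my own illustration, not axolotl's actual implementation, and the module names are llama-style examples:

```python
def find_linear_targets(named_modules):
    """Collect leaf names of all Linear-type modules, the way a
    lora_target_linear-style option would pick targets for peft."""
    targets = set()
    for name, cls_name in named_modules:
        if cls_name == "Linear":
            # peft targets modules by leaf name, e.g. "q_proj"
            targets.add(name.rsplit(".", 1)[-1])
    return sorted(targets)

# illustrative llama-style (name, class) pairs
modules = [
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.self_attn.k_proj", "Linear"),
    ("model.layers.0.self_attn.v_proj", "Linear"),
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.0.mlp.gate_proj", "Linear"),
    ("model.layers.0.mlp.up_proj", "Linear"),
    ("model.layers.0.mlp.down_proj", "Linear"),
    ("model.layers.0.input_layernorm", "LlamaRMSNorm"),  # skipped: not Linear
]
print(find_linear_targets(modules))
```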