qlora and gptq

#author_luna #large_language_models #peft #artificial_intelligence

if i'm wrong about this, email me.


from what i gathered those two do not mix together very well. qlora does its own quantization into the NF4 datatype, which means it has to load the full-precision model first and quantize it on the fly. so i can't take a 7B or 13B GPTQ model (which is already quantized to INT4) and finetune it with qlora on my RTX3060 12GB.
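a minimal sketch of what i mean, assuming bitsandbytes + transformers (the model id and config values are just illustrative): qlora-style NF4 quantization happens at load time, starting from the original fp16/bf16 weights, not from a GPTQ checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# qlora-style config: quantize to NF4 on the fly while loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # qlora's NF4 datatype
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# this loads the full-precision checkpoint and quantizes it during load,
# which is why you can't just point it at an already-INT4 GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # original fp16 weights, not a GPTQ repo
    quantization_config=bnb_config,
    device_map="auto",
)
```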

however it seems you can just run normal LoRA on a GPTQ model:

i'm doing some private test training runs on llama-2-7b-gptq and getting around 5~6gb of vram usage, which leaves me with a lot of headroom and makes me wonder whether i could do 13B finetunes. a rough sketch of the setup is below.
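roughly what that looks like, assuming transformers' gptq integration (optimum + auto-gptq installed) and peft; the repo id and lora hyperparameters here are assumptions, not my exact setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# an already-quantized INT4 GPTQ checkpoint; stays frozen during training
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",             # illustrative repo id
    device_map="auto",
)

# enable gradient checkpointing / input grads for training on a quantized base
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],    # minimal set; see the note on linear layers below
)

# only the LoRA adapter weights are trainable; the INT4 base is untouched
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```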

i am using lora_target_linear: true, as suggested by qlora (see: why all linear layers?).
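for reference, this is my reading of what "target all linear layers" roughly expands to for a llama-style model; it's a sketch, not the exact code axolotl runs for lora_target_linear:

```python
from peft import LoraConfig

all_linear_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # attention projections + MLP projections; the lm_head is typically excluded
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```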