qlora and gptq

#author_luna #large_language_models #peft #artificial_intelligence

if i'm wrong about this, email me.


from what i gathered those two do not mix together very well. qlora does its own quantization into the NF4 datatype, which means it has to load the full-precision model first and quantize it on the fly. so i can't take a 7B or 13B GPTQ model (which is already quantized to INT4) and finetune it with qlora on my RTX3060 12GB.
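a minimal sketch of what i mean, assuming bitsandbytes + transformers (the model id and config values are just illustrative): qlora-style NF4 quantization happens at load time, starting from the original fp16/bf16 weights, not from a GPTQ checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# qlora-style config: quantize to NF4 on the fly while loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # qlora's NF4 datatype
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# this loads the full-precision checkpoint and quantizes it during load,
# which is why you can't just point it at an already-INT4 GPTQ model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # original fp16 weights, not a GPTQ repo
    quantization_config=bnb_config,
    device_map="auto",
)
```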

however it seems you can just run normal LoRA on a GPTQ model:

i'm doing some private test training runs on llama-2-7b-gptq and getting around 5~6gb of vram usage, which leaves me with a lot of headroom and makes me wonder whether i could do 13B finetunes. a rough sketch of the setup is below.
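roughly what that looks like, assuming transformers' gptq integration (optimum + auto-gptq installed) and peft; the repo id and lora hyperparameters here are assumptions, not my exact setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# an already-quantized INT4 GPTQ checkpoint; stays frozen during training
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",             # illustrative repo id
    device_map="auto",
)

# enable gradient checkpointing / input grads for training on a quantized base
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],    # minimal set; see the note on linear layers below
)

# only the LoRA adapter weights are trainable; the INT4 base is untouched
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```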

i am using lora_target_linear: true, as suggested by qlora (see: why all linear layers?).
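for reference, this is my reading of what "target all linear layers" roughly expands to for a llama-style model; it's a sketch, not the exact code axolotl runs for lora_target_linear:

```python
from peft import LoraConfig

all_linear_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # attention projections + MLP projections; the lm_head is typically excluded
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```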