model formats
just jotting these down for future reference
there are multiple formats of a network
- pytorch (.pth, .bin), an insecure format since it's just python's pickle
- if you download models from huggingface, they already do pickle scanning, but it's generally on you to check whether a model is safe before downloading
- safetensors (usually .safetensors), the supposed future, audited too
- ggml variants (from the ggml project, made for cpu inference). an unstable format (an old ggml file won't run on the latest llama.cpp), but there is a proposal to make it more future-proof than it is right now
- whatever tensorflow does
- whatever JAX has
- i dont know everything
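on the pickle point above: a minimal sketch of why loading a .pth/.bin file from an untrusted source is dangerous. pickle lets an object's `__reduce__` method return an arbitrary callable to run at load time; the payload here is a harmless `str` call, but it could just as easily be `os.system`:

```python
import pickle

class Payload:
    # pickle calls __reduce__ to decide how to rebuild the object;
    # it returns (callable, args), and loads() will invoke it blindly.
    def __reduce__(self):
        # harmless stand-in; a malicious file would use os.system etc.
        return (str, ("arbitrary code ran here",))

data = pickle.dumps(Payload())
obj = pickle.loads(data)  # executes str("arbitrary code ran here")
print(obj)
```

this is why safetensors exists: it stores raw tensor bytes plus a JSON header, with no code execution path at load time.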
there are multiple data types of a network
- float32
- float16
- int8, also see this funny thing
- GPTQ, generally used for 4-bit quantization in the wild
- SpQR, released on june 6th 2023, so it's not really in the wild yet
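the basic idea behind the int8 entry above can be sketched as simple symmetric quantization (this is a toy illustration, not the actual LLM.int8() algorithm, which also handles outlier columns separately): map floats into [-127, 127] with a single scale factor, then multiply back to recover an approximation.

```python
import numpy as np

def quantize_int8(w):
    # one scale for the whole tensor: largest magnitude maps to 127
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover float32 approximations of the original weights
    return q.astype(np.float32) * scale

w = np.array([0.1, -0.5, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# rounding error is at most half a quantization step (s / 2)
print(np.abs(w - w_hat).max())
```

GPTQ and SpQR are much cleverer about *which* weights get how much precision, but the store-small-ints-plus-scales framing is the same.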
key people:
- TheBloke has literally become Sir Quantizer. all he does is quantize models, and it's the best public service
- tim dettmers, who made int8 and spqr