llama2 inference numbers
#llama #large_language_models #technology #performance %at=2023-07-25T01:18:10 remember, the gold **SUBJECTIVE** target is 10 tok/s, which works out to about 100 ms per token (i think that's roughly what chatgpt runs at, though it may be closer to 40 tok/s; i haven't measured it yet). goi…