Trying to think through future personal use of #LLMs on the desktop. Assuming people roll their own #models, what is the "natural" #API for a service running models locally?
#ChatGPT seems to choose which model to run on its own: it usually starts fast (probably GPT-3.5 Turbo), and if the question is complex enough, or if it's prompted to correct itself, it probably switches to #GPT4.
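A minimal sketch of what a local service with that kind of tiered routing could look like. Everything here is invented for illustration: the names (`LocalModelService`, `complete`, the model tags) and the escalation heuristic are assumptions, not any real runtime's API.

```python
# Hypothetical local model service that routes between a fast default
# model and a stronger fallback, mirroring the start-fast /
# escalate-on-complexity behavior described above.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    generate: Callable[[str], str]  # prompt -> completion

class LocalModelService:
    def __init__(self, fast: Model, strong: Model):
        self.fast = fast
        self.strong = strong

    def _looks_complex(self, prompt: str) -> bool:
        # Placeholder heuristic: long prompts or explicit correction
        # requests escalate to the stronger model. A real router would
        # use something smarter (a classifier, logprobs, user settings).
        return len(prompt) > 500 or "correct yourself" in prompt.lower()

    def complete(self, prompt: str) -> str:
        model = self.strong if self._looks_complex(prompt) else self.fast
        return model.generate(prompt)

# Stub models so the sketch runs end to end.
fast = Model("small-local-7b", lambda p: f"[fast] answer to: {p[:40]}")
strong = Model("big-local-70b", lambda p: f"[strong] answer to: {p[:40]}")

service = LocalModelService(fast, strong)
print(service.complete("What's 2 + 2?"))                  # stays on the fast model
print(service.complete("Please correct yourself: ..."))   # escalates to the strong one
```

The open design question is where the routing lives: inside the service and opaque to the caller (as ChatGPT appears to do), or exposed in the API so the user picks the tier themselves.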