My guy, we’re not talking about just leaving a model loaded, we’re talking about actual usage in a cloud setting with far more GPUs and users involved.
So you think they’re all at full load at all times? Does that seem reasonable to you?
Given that cloud providers are desperately trying to get more compute resources, but are limited by chip production - yes, of course? Why would they be trying to expand their resources if their existing resources weren’t already maxed out?
My assertion would be that they want the majority of the new chips for training models, not for running the existing ones. Two different use cases.
Sure, and that’s why many cloud providers - even ones that don’t train their own models - are only slowly onboarding new customers onto bigger models. Sure. Makes total sense.
I mean, do you actually know, or are you just assuming?
I know.