• PeriodicallyPedantic@lemmy.ca
    1 day ago

    I’m not sure that’s true. If you look up things like “tokens per kWh” or “tokens per second per watt”, you’ll find people measuring their power usage while running specific models on specific hardware. This is mostly consumer hardware, since it’s people looking to run their own AI servers who post about it, but it still sets an upper bound on energy per token.

    The AI providers are tight-lipped about how much energy they use for inference and how many tokens they complete per hour.

    You can also infer a bit by looking up the power draw of a 4090, finding the tokens-per-second performance someone is getting from a particular model on a 4090 (people love posting their tokens-per-second numbers every time a new model comes out), and extrapolating from there.
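
    The extrapolation above is just unit arithmetic. A minimal sketch, where both the 40 tokens/s throughput and the 450 W draw are illustrative placeholder numbers rather than measurements:

```python
# Back-of-envelope estimate of inference energy efficiency.
# Hypothetical inputs: ~40 tokens/s on a GPU drawing ~450 W under load.
def tokens_per_kwh(tokens_per_second: float, watts: float) -> float:
    """Convert a tokens/sec benchmark and a power draw into tokens per kWh."""
    kilowatts = watts / 1000.0                  # power in kW
    tokens_per_hour = tokens_per_second * 3600  # throughput per hour
    return tokens_per_hour / kilowatts          # tokens generated per kWh

# 40 tok/s at 450 W works out to 320,000 tokens per kWh,
# i.e. roughly 3.1 Wh per 1,000 tokens.
print(f"{tokens_per_kwh(40, 450):,.0f} tokens/kWh")
```

    This ignores the rest of the machine (CPU, RAM, cooling) and datacenter overhead like PUE, so it understates real-world energy per token somewhat, which is consistent with treating consumer benchmarks as a rough bound rather than a precise figure.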