• Jakeroxs@sh.itjust.works · 10 hours ago (edited)

    I mean, I literally run a local LLM. While the model sits in memory it’s really not using a crazy amount of resources, though to be fair I should hook something up to actually measure exactly how much it’s pulling instead of just looking at htop/atop and guesstimating based on load.

    Compare that to when I play a game: the fans start blaring, the machine heats up, and you can clearly see the usage increase across various metrics.
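
    For what it’s worth, here’s roughly how I’d do that measurement on an NVIDIA card: poll nvidia-smi while the model answers a prompt, then repeat at idle and compare. Just a rough sketch, assuming nvidia-smi is on PATH and a single box; the duration and interval are arbitrary.

    ```python
    # Rough GPU power sampling while a local model answers a prompt.
    # Run a prompt in another window during the sampling window, then
    # repeat at idle and subtract to estimate what inference itself costs.
    import statistics
    import subprocess
    import time

    DURATION_S = 30   # how long to sample
    INTERVAL_S = 0.5  # seconds between samples

    def gpu_power_watts() -> float:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        # One value per GPU; sum them on multi-GPU boxes.
        return sum(float(line) for line in out.stdout.splitlines() if line.strip())

    samples = []
    end = time.time() + DURATION_S
    while time.time() < end:
        samples.append(gpu_power_watts())
        time.sleep(INTERVAL_S)

    avg_w = statistics.mean(samples)
    energy_wh = avg_w * DURATION_S / 3600  # average watts x hours
    print(f"avg {avg_w:.1f} W, peak {max(samples):.1f} W, ~{energy_wh:.2f} Wh over {DURATION_S}s")
    ```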

    • PeriodicallyPedantic@lemmy.ca · 10 hours ago

      He isn’t talking about running it locally; he’s talking about what it takes for the AI providers to provide the AI.

      Whether “it takes more energy during training” is true depends entirely on the load put on the inference servers and on the size of the inference server farm.
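
      To make that concrete: whether lifetime inference ends up bigger than training is just per-request energy times request volume versus a one-off training cost. A toy sketch below; every number in it is a made-up placeholder to show the shape of the calculation, not a measurement of any real provider.

      ```python
      # Toy break-even calculation: one-off training energy vs. cumulative inference.
      # All three inputs are placeholders, not real figures.
      TRAINING_MWH = 1_000           # assumed one-off training cost, in MWh
      WH_PER_REQUEST = 2.0           # assumed energy per served request, in Wh
      REQUESTS_PER_DAY = 50_000_000  # assumed fleet-wide daily load

      daily_inference_mwh = WH_PER_REQUEST * REQUESTS_PER_DAY / 1e6  # Wh -> MWh
      breakeven_days = TRAINING_MWH / daily_inference_mwh
      print(f"inference per day: {daily_inference_mwh:.1f} MWh")
      print(f"inference passes training after ~{breakeven_days:.0f} days at this load")
      ```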

      • Jakeroxs@sh.itjust.works · 9 hours ago

        There’s no functional difference aside from usage and scale, which is my point.

        I find it interesting that the only actual energy calculations I see from researchers cover the training and everything that goes along with it, rather than the usage per actual request after training.

        People then conflate training energy costs with normal usage costs without data to back it up. I don’t have the data either, but I do have what I can do and see on my side.

        • PeriodicallyPedantic@lemmy.ca · 5 hours ago

          I’m not sure that’s true. If you look up things like “tokens per kWh” or “tokens per second per watt” you’ll find people measuring their power usage while running specific models on specific hardware. It’s mainly consumer hardware, since it’s people looking to run their own AI servers who post about it, but it sets an upper bound.

          The AI providers are tight-lipped about how much energy they use for inference and how many tokens they complete per hour.

          You can also infer a bit by looking up the power draw of a 4090, then looking at the tokens-per-second performance someone gets from a particular model on a 4090 (people love posting their tokens-per-second numbers every time a new model comes out), and extrapolating from that.
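
          The extrapolation itself is just a unit conversion once you have those two numbers. Quick sketch; the 350 W and 100 tokens/s below are illustrative assumptions, not benchmarks of any particular card or model.

          ```python
          # Energy per token from sustained power draw and measured throughput.
          # Both inputs are illustrative; plug in real measurements for a given setup.
          POWER_W = 350.0          # assumed sustained board power during generation
          TOKENS_PER_SECOND = 100  # assumed throughput for some model on that card

          joules_per_token = POWER_W / TOKENS_PER_SECOND     # watts are joules/second
          wh_per_1k_tokens = joules_per_token * 1000 / 3600  # 1 Wh = 3600 J
          tokens_per_kwh = 3_600_000 / joules_per_token      # 1 kWh = 3.6e6 J
          print(f"{joules_per_token:.2f} J/token, "
                f"{wh_per_1k_tokens:.2f} Wh per 1k tokens, "
                f"{tokens_per_kwh:,.0f} tokens per kWh")
          ```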

    • MotoAsh@lemmy.world · 10 hours ago

      One user vs a public service is apples to oranges, and it’s actually hilarious that you’re so willing to compare them.

      • Jakeroxs@sh.itjust.works · 9 hours ago

        It’s literally the same thing; the obvious difference is how much usage each GPU is getting at a time. But everyone seems to assume all these data centers are running at full load at all times, for some reason?

    • FooBarrington@lemmy.world · 10 hours ago

      My guy, we’re not talking about just leaving a model loaded; we’re talking about actual usage in a cloud setting with far more GPUs and users involved.

        • FooBarrington@lemmy.world · 9 hours ago

          Given that cloud providers are desperately trying to get more compute resources but are limited by chip production: yes, of course? Why would they be trying to expand their resources if the existing ones weren’t already at their limit?

          • Jakeroxs@sh.itjust.works · 8 hours ago

            My assertion would be that they want the majority of the new chips for training models, not for running the existing ones. Two different use cases.

            • FooBarrington@lemmy.world · 8 hours ago

              Sure, and that’s why many cloud providers - even ones that don’t train their own models - are only slowly onboarding new customers onto bigger models. Sure. Makes total sense.