I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect).

They go “waaa waaa, it’s not a calculator,” and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.

  • diz@awful.systems (OP)
    4 months ago

    lmao: they have fixed this issue, it seems to always run Python now. Got to love how they just put this shit in production as “stable” Gemini 2.5 Pro with that idiotic multiplication thing that everyone knows about, and expect what? To Eliza Effect people into marrying Gemini 2.5 Pro?

    • scruiser@awful.systems
      4 months ago

      Have they fixed it, as in it genuinely uses Python completely reliably, or “fixed” it, as in they tweaked the prompt and now it uses Python 95% of the time instead of 50/50? I’m betting on the latter.

      • aramova@infosec.pub
        4 months ago

        Non-deterministic LLMs will always have randomness in their output. The best they can hope for is layers of sanity checks, which slow things down and cost more.
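
For the multiplication case discussed in this thread, one such sanity-check layer could be sketched roughly as below. This is only an illustration, not anything Gemini is known to do: the function name `check_product` and its return fields are hypothetical. It recomputes the product exactly (Python integers are arbitrary precision) and reports how much of a claimed answer matches, including the "leading digits and last digit" comparison mentioned earlier.

```python
def check_product(a: int, b: int, claimed: int) -> dict:
    """Compare a claimed product against the exact one (hypothetical helper)."""
    exact = a * b  # exact: Python ints do not overflow or round
    exact_s, claimed_s = str(exact), str(claimed)

    # Count how many leading digits agree before the first mismatch.
    leading = 0
    for x, y in zip(exact_s, claimed_s):
        if x != y:
            break
        leading += 1

    return {
        "exact": exact,
        "correct": exact == claimed,
        "same_length": len(exact_s) == len(claimed_s),
        "leading_digits_matching": leading,
        "last_digit_matches": exact_s[-1] == claimed_s[-1],
    }


if __name__ == "__main__":
    # Made-up numbers for illustration: 12 * 12 is 144, the claim is 154.
    print(check_product(12, 12, 154))
```

A check like this only catches the error after the fact; it says nothing about how often the model decides to call a tool in the first place, which is the reliability question raised above.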