I love to show that kind of shit to AI boosters. (In case you’re wondering, the numbers were chosen randomly and the answer is incorrect).

They go waaa waaa its not a calculator, and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the “softer” parts of the test.

  • scruiser@awful.systems
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    Have they fixed it as in genuinely uses python completely reliably or “fixed” it, like they tweaked the prompt and now it use python 95% of the time instead of 50/50? I’m betting on the later.

    • aramova@infosec.pub
      link
      fedilink
      English
      arrow-up
      1
      ·
      4 months ago

      Non-deterministic LLMs will always have randomness in their output. Best they can hope for is layers of sanity checke slowing things down and costing more.