

We barely understand how LLMs actually work
I would be careful how you say this. Eliezer likes to go on about giant inscrutable matrices to fearmonger, and the promptfarmers use the (supposed) mysteriousness as another avenue for crithype.
It’s true that reverse engineering any specific output or task takes a lot of effort, requires access to the model’s internal weights, and hasn’t been done for most tasks, but the techniques for doing so exist. And in general there is a good high-level conceptual understanding of what makes LLMs work.
which means LLMs don’t understand their own functioning (not that they “understand” anything strictly speaking).
This part is absolutely true. If you catch them in a mistake, most of their data about how to respond comes from how humans respond or, at best, from fine-tuning on other LLM output, and they don’t have any way of checking their own internals, so the words they say in response to mistakes are just more bs unrelated to anything.
Have they fixed it, as in it genuinely uses python completely reliably, or “fixed” it, as in they tweaked the prompt and now it uses python 95% of the time instead of 50/50? I’m betting on the latter.