• Frezik@lemmy.blahaj.zone
    2 months ago

    To those who have played around with LLM code generation more than me, how are they at debugging?

    I’m thinking of Kernighan’s Law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” If vibe coding reduces the complexity of writing code by 10x, but debugging remains just as difficult as before, then Kernighan’s Law needs to be updated to say debugging is 20x as hard as vibe coding. Vibe coders have no hope of bridging that gap.

    • Ledivin@lemmy.world
      2 months ago

      They’re not good at debugging. The article is pretty spot on, IMO - they’re great at doing the work; but you are still the brain. You’re still deciding what to do, and maybe 50% of the time how to do it, you’re just not executing the lowest level anymore. Similar for debugging - this is not an exercise at the lowest level, and needs you to run it.

      • hisao@ani.social
        2 months ago

        deciding what to do, and maybe 50% of the time how to do it, you’re just not executing the lowest level anymore

        And that’s exactly what I want. I don’t get why people want more. Having more means you have less and less control or influence over the result. What I want is for other fields to become like programming is now, where you micromanage every step and have great control over the result.

    • very_well_lost@lemmy.world
      2 months ago

      The company I work for has recently mandated that we must start using AI tools in our workflow and is tracking our usage, so I’ve been experimenting with it a lot lately.

      In my experience, it’s worse than useless when it comes to debugging code. The class of errors that it can solve is generally simple stuff like typos and syntax errors — the sort of thing that a human would solve in 30 seconds by looking at a stack trace. The much more important class of problem, errors in the business logic, it really really sucks at solving.

      For those problems, it very confidently identifies the wrong answer about 95% of the time. And if you’re a dev who’s desperate enough to ask AI for help debugging something, you probably don’t know what’s wrong either, so it won’t be immediately clear if the AI just gave you garbage or if its suggestion has any real merit. So you go check and manually confirm that the LLM is full of shit, which costs you time… then you go back to the LLM with more context and ask it to try again. Its second suggestion will sound even more confident than the first (“Aha! I see the real cause of the issue now!”), but it will still be nonsense. You go waste more time to rule out the second suggestion, then go back to the AI to scold it for being wrong again.

      Rinse and repeat this cycle enough times until your manager is happy you’ve hit the desired usage metrics, then go open your debugging tool of choice and do the actual work.

      • HubertManne@piefed.social
        2 months ago

        Maybe it’s just me, but I find typos to be the most difficult because my brain can easily see them as correct, so the whole code looks correct. It’s like the way you can take the vowels out of sentences and people can still immediately read them.

        • ganryuu@lemmy.ca
          2 months ago

          That’s probably why they talked about looking at a stack trace: you’ll see immediately that you made a typo in a variable’s name or a language keyword when compiling or executing.

        • wols@lemmy.zip
          2 months ago

          The nastiest typos are autocompleted similarly named (and correctly typed) variables, functions, or types. Which is why it’s a good idea to avoid such name clashes in the first place. If this is impossible or not practical, at least put the part that differs at the start of the name.
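
One way to see the effect of that naming advice: autocomplete narrows its candidates by the prefix typed so far, so the longer the prefix two names share, the longer they stay indistinguishable. A minimal Python sketch (the variable names are made up for illustration) that measures the shared prefix:

```python
from os.path import commonprefix


def shared_prefix_len(names):
    """Length of the character-wise prefix that all names have in common."""
    return len(commonprefix(list(names)))


# Hypothetical names: the differing part sits at the end...
suffix_style = ["total_price_net", "total_price_gross"]
# ...versus at the start, as the comment above suggests.
prefix_style = ["net_total_price", "gross_total_price"]

print(shared_prefix_len(suffix_style))  # 12: "total_price_" must be typed first
print(shared_prefix_len(prefix_style))  # 0: the first keystroke already differs
```

With the differing part at the front, an autocomplete pick is wrong from the first character typed rather than from the thirteenth, which makes the mistake much harder to overlook.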

            • wols@lemmy.zip
              2 months ago

              What do you mean? For what purpose would you sort variables or functions?

              • HubertManne@piefed.social
                2 months ago

                Sorry. I was thinking hostnames or other endpoints and was thinking that way back with typos. dev78usc03 instead of dev78usc02 or such.

      • HarkMahlberg@kbin.earth
        2 months ago

        we must start using AI tools in our workflow and is tracking our usage

        Reads to me as “Please help us justify the very expensive license we just purchased and all the talented engineers we just laid off.”

        I know the pain. Leadership’s desperation is so thick you can smell it. They got FOMO’d, now they’re humiliated, so they start lashing out.

        • frog_brawler@lemmy.world
          2 months ago

          Funny enough, the AI shift is really just covering for the over-hiring mistakes in 2021. They can’t admit they fucked up in hiring too many people during Covid, so they’re using AI as the scapegoat. We all know it’s not able to actually replace people yet; but that’s happening anyway.

          There won’t be any immediate ramifications, we’ll start to see that in probably 12-18 months or so. It’s just another form of kicking the can down the road.

      • TrooBloo@lemmy.dbzer0.com
        2 months ago

        As seems to be the case in all of these situations, AI fails hard at tasks when compared to tools specifically designed for that task. I use Ruff in all my Python projects because it formats my code and finds (and often fixes) the kind of low-complexity, high-probability problems that are likely to pop up as a result of human imperfection. It does it with great accuracy and incredible speed, uses very few computing resources, and provides levels of safety in automating fixes. I can run it as an automation step when someone proposes code changes, adding all of 3 or 4 seconds to the runtime. I can run it on my local machine to instantly resolve my ID10T errors. If AI can’t solve these problems as quickly, and if it can’t solve anything more complicated reliably, I don’t understand why it would be a tool I would use.
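
To illustrate why this class of problem can be caught cheaply and deterministically, here is a toy unused-import check built on Python's standard `ast` module. It is only a sketch of the idea behind a rule like Ruff's F401 (unused import), not how Ruff works internally (Ruff is written in Rust), and it ignores cases such as `__all__` re-exports. The function name and sample source are made up.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


SAMPLE = """
import os
import sys
from json import dumps, loads

print(dumps({"cwd": os.getcwd()}))
"""

print(unused_imports(SAMPLE))  # ['loads', 'sys']
```

The check is a single pass over the syntax tree with no guessing involved, which is the contrast the comment draws with asking an LLM to spot the same mistake.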

    • Pechente@feddit.org
      2 months ago

      Definitely not good. Sometimes they can solve issues but you gotta point them in the direction of the issue. Other times they write hacky workarounds that do the job for the moment but crash catastrophically with the next major dependency update.

      • HarkMahlberg@kbin.earth
        2 months ago

        I saw an LLM override the casting operator in C#. An evangelist would say “genius! what a novel solution!” I said “nobody at this company is going to know what this code is doing 6 months from now.”

        It didn’t even solve our problem.

        • hisao@ani.social
          2 months ago

          I saw an LLM override the casting operator in C#. An evangelist would say “genius! what a novel solution!” I said “nobody at this company is going to know what this code is doing 6 months from now.”

          Before LLMs, people were often saying this about people smarter than the rest of the group: “Yeah, he was too smart and overengineered solutions that no one could understand after he left.” This is, btw, one of the reasons why I’ve increasingly disliked programming as a field over the years and happily delegate the coding part to AI nowadays. This field celebrates conformism, and that’s why humans shouldn’t write code manually. Perfect field to automate away via LLMs.

          • very_well_lost@lemmy.world
            2 months ago

            Before LLMs people were often saying this about people smarter than the rest of the group.

            Smarter by whose metric? If you can’t write software that meets the bare minimum of comprehensibility, you’re probably not as smart as you think you are.

            Software engineering is an engineering discipline, and conformity is exactly what you want in engineering — because in engineering you don’t call it ‘conformity’, you call it ‘standardization’. Nobody wants to hire a maverick bridge-builder, they wanna hire the guy who follows standards and best practices because that’s how you build a bridge that doesn’t fall down. The engineers who don’t follow standards and who deride others as being too stupid or too conservative to understand their vision are the ones who end up crushed to death by their imploding carbon fiber submarine at the bottom of the Atlantic.

            AI has exactly the same “maverick” tendencies as human developers (because, surprise surprise, it’s trained on human output), and until that gets ironed out, it’s not suitable for writing anything more than the most basic boilerplate — which is stuff you can usually just copy-paste together in five minutes anyway.

            • hisao@ani.social
              2 months ago

              You’re right of course, and engineering as a whole is first in line for AI. Everything that has strict specs, standards, and invariants will benefit massively from it, and conforming is what AI inherently excels at, as opposed to humans. Complaints like the one this subthread started with usually come from people being bad at writing requirements rather than from AI being bad at following them. If you approach requirements like in actual engineering fields, you will get corresponding results, while humans will struggle to fully conform or even try to find tricks and loopholes in your requirements to sidestep them and assert their will while technically still remaining in “barely legal” territory.

              • TechLich@lemmy.world
                2 months ago

                I feel like this isn’t quite true, and it’s something I hear a lot of people say about AI: that it’s good at following requirements and conforming and being a mechanical and logical robot, because that’s what computers are like and that’s how it is in sci-fi.

                In reality, it seems like that’s what they’re worst at. They’re great at seeing patterns and creating ideas but terrible at following instructions or staying on task. As soon as something is a bit bigger than they can track context for, they’ll get “creative”, and if they see a pattern that they can complete, they will, even if it’s not correct. I’ve had Copilot start writing poetry in my code because there was a string it could complete.

                Get it to make a pretty looking static web page with fancy css where it gets to make all the decisions? It does it fast.

                Give it an actual, specific programming task in a full sized application with multiple interconnected pieces and strict requirements? It confidently breaks most of the requirements, and spits out garbage. If it can’t hold the entire thing in its context, or if there’s a lot of strict rules to follow, it’ll struggle and forget what it’s doing or why. Like a particularly bad human programmer would.

                This is why AI is automating art and music and writing and not more mundane/logical/engineering tasks. Great at being creative and balls at following instructions for more than a few steps.

                • hisao@ani.social
                  2 months ago

                  That it’s good at following requirements and conforming and being a mechanical and logical robot because that’s what computers are like and that’s how it is in sci fi.

                  They’re good at that because they are ANNs.

                  In reality, it seems like that’s what they’re worst at. They’re great at seeing patterns and creating ideas but terrible at following instructions or staying on task. As soon as something is a bit bigger than they can track context for, they’ll get “creative” and if they see a pattern that they can complete, they will, even if it’s not correct. I’ve had copilot start writing poetry in my code because there was a string it could complete.

                  Get it to make a pretty looking static web page with fancy css where it gets to make all the decisions? It does it fast.

                  Give it an actual, specific programming task in a full sized application with multiple interconnected pieces and strict requirements? It confidently breaks most of the requirements, and spits out garbage. If it can’t hold the entire thing in its context, or if there’s a lot of strict rules to follow, it’ll struggle and forget what it’s doing or why. Like a particularly bad human programmer would.

                  This is why AI is automating art and music and writing and not more mundane/logical/engineering tasks. Great at being creative and balls at following instructions for more than a few steps.

                  My experience is the opposite.

          • Feyd@programming.dev
            2 months ago

            Wow you just completely destroyed any credibility about your software development opinions.

            • hisao@ani.social
              2 months ago

              Why though? I think hating and maybe even disrespecting programming, and wanting your job to be as redundant and replaceable as possible, is actually the best mindset for a programmer. Maybe in the past it was a good mindset for becoming a team lead or a project manager, but nowadays with AI it’s a mindset for programmers.

              • Feyd@programming.dev
                2 months ago

                Before LLMs people were often saying this about people smarter than the rest of the group. “Yeah he was too smart and overengineered solutions that no one could understand after he left,”.

                This part.

                • hisao@ani.social
                  2 months ago

                  The fact that I dislike it that it turned out that software engineering is not a good place for self-expression or for demonstrating your power level or the beauty and depth of your intricate thought patterns through advanced constructs and structures you come up with, doesn’t mean that I disagree that this is true.

                  • Feyd@programming.dev
                    2 months ago

                    The problem is that you don’t realize that writing code that is difficult to maintain is in fact not a sign of intelligence, or “power level”.

                  • very_well_lost@lemmy.world
                    2 months ago

                    demonstrating your power level

                    lolwut? I’m so tired of tech people acting like they’re the next Genghis Khan or Julius Caesar…

                  • chunkystyles@sopuli.xyz
                    2 months ago

                    If your code is as comprehensible as that run-on sentence, I can understand why coworkers would ask you to please write simpler code.

    • hisao@ani.social
      2 months ago

      My first level of debugging is logging things to console. LLMs here do a decent job at “reading your mind” and autocompleting “pri” into something like println!("i = {}, x = {}, y = {}", i, x, y); with very good context awareness of what and how exactly it makes most sense to debug print in the current location in code.
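
The snippet in the comment above is Rust. For readers more at home in Python, a rough analogue of the same debug print uses the f-string `=` specifier (Python 3.8+), which prints each expression next to its value. The function name and values are placeholders.

```python
def debug_line(i, x, y):
    """Format the kind of debug print an autocomplete might suggest."""
    # "{i=}" expands to the text "i=" followed by repr(i)
    return f"{i=}, {x=}, {y=}"


print(debug_line(3, 1.5, -2))  # i=3, x=1.5, y=-2
```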

    • 0x01@lemmy.ml
      2 months ago

      I use it extensively daily.

      It cannot step through code right now, so true debugging is not something you use it for. Most of the time the LLM will take the junior engineer approach of “guess and check” unless you explicitly give it better guidance.

      My process is generally to start with unit tests and type definitions, then a large multipage prompt for every segment of the app the llm will be tasked with. Then I’ll make a snapshot of the code, give the tool access to the markdown prompt, and validate its work. When there are failures and the project has extensive unit tests it generally follows the same pattern of “I see that this failure should be added to the unit tests” which it does and then re-executes them during iterative development.

      If tests are not available or if it is not something directly accessible to the tool then it will generally rely on logs either directly generated or provided by the user.

      My role these days is to provide long, well-thought-out prompts, verify the integrity of the code after every commit, and generally just kind of treat the LLM as a reckless junior dev. Sometimes junior devs can surprise you. Yesterday, for example, I was very surprised by a one-shot result: I asked for a mobile rn app that takes my rambling voice recordings and summarizes them into prompts. It was immediately remarkably successful, and now I’ve been walking around mic’d up to generate prompts.
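
A minimal sketch of the "unit tests and type definitions first" step described above, in Python. Everything here is hypothetical (`LineItem`, `order_total_cents`, the test names): the point is only that the types and tests are written by hand up front, and the function body is what the LLM would then be asked to fill in and iterate on.

```python
import unittest
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Type definition written before any implementation exists."""
    unit_price_cents: int
    quantity: int


def order_total_cents(items: list[LineItem]) -> int:
    """The part delegated to the LLM; it must satisfy the tests below."""
    if any(item.quantity < 0 for item in items):
        raise ValueError("quantity must not be negative")
    return sum(item.unit_price_cents * item.quantity for item in items)


class OrderTotalTests(unittest.TestCase):
    """Hand-written tests that pin down the business rules."""

    def test_empty_order_is_zero(self):
        self.assertEqual(order_total_cents([]), 0)

    def test_sums_price_times_quantity(self):
        items = [LineItem(250, 2), LineItem(199, 1)]
        self.assertEqual(order_total_cents(items), 699)

    def test_rejects_negative_quantity(self):
        with self.assertRaises(ValueError):
            order_total_cents([LineItem(100, -1)])


def run_suite() -> unittest.TestResult:
    """Run the tests programmatically, as a tool could after each change."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

When a new failure shows up, the pattern described in the comment amounts to adding one more test method here and re-running `run_suite()` until it passes.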

    • frog_brawler@lemmy.world
      2 months ago

      How are they at debugging? In a silo, they’re shit.

      I’ve been using one LLM to debug the other this past week for a personal project, and it can be a bit tedious sometimes, but it eventually does a decent enough job. I’m pretty much vibe coding things that are a bit out of my immediate knowledge and skill set, but I know how they’re supposed to work. For example, I’ve got some Python scripts using Rekognition to scan photos for porn or other explicit stuff before they get sent to an S3 bucket. After that happens, there’s now a dashboard that’s going to give me results on how many images were scanned and then marked as either acceptable or flagged as inappropriate. After a threshold of too many inappropriate images being sent in, it’ll shadowban the sender from sending any more dick pics in.

      For someone that’s never taken a coding course, I’m relatively happy with the results I’m getting so far. Granted, this may be small potatoes for someone with an actual development background; but as someone that’s been working adjacent to those folks for several years, I’m happy with the output.
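
A rough sketch of the scan-then-shadowban flow described above, not the commenter's actual scripts. The only real API assumed is the boto3 Rekognition client's `detect_moderation_labels` call (with `Image` and `MinConfidence`, returning a `ModerationLabels` list). `UploadGate`, `FakeClient`, and both thresholds are hypothetical, and the client is injected so the logic can be exercised without AWS credentials.

```python
from collections import Counter

FLAG_CONFIDENCE = 80.0  # hypothetical minimum confidence for a label to count
BAN_THRESHOLD = 3       # hypothetical number of flagged uploads before a ban


def is_explicit(client, image_bytes, min_confidence=FLAG_CONFIDENCE):
    """True if the moderation call returns any label at or above the cutoff."""
    response = client.detect_moderation_labels(
        Image={"Bytes": image_bytes}, MinConfidence=min_confidence
    )
    return bool(response.get("ModerationLabels"))


class UploadGate:
    """Decides whether an upload may proceed to storage; keeps dashboard counts."""

    def __init__(self, client, ban_threshold=BAN_THRESHOLD):
        self.client = client  # in production: boto3.client("rekognition")
        self.ban_threshold = ban_threshold
        self.flagged_by_user = Counter()
        self.scanned = 0
        self.accepted = 0

    def is_shadowbanned(self, user_id):
        return self.flagged_by_user[user_id] >= self.ban_threshold

    def should_store(self, user_id, image_bytes):
        if self.is_shadowbanned(user_id):
            return False  # silently dropped, not even scanned
        self.scanned += 1
        if is_explicit(self.client, image_bytes):
            self.flagged_by_user[user_id] += 1
            return False
        self.accepted += 1
        return True


class FakeClient:
    """Stand-in for the real client: flags any payload starting with b'BAD'."""

    def detect_moderation_labels(self, Image, MinConfidence):
        flagged = Image["Bytes"].startswith(b"BAD")
        # "Explicit" is a placeholder label name for this fake response
        labels = [{"Name": "Explicit", "Confidence": 99.0}] if flagged else []
        return {"ModerationLabels": labels}
```

Only when `should_store` returns `True` would the caller go on to upload the image to the bucket; the `scanned`, `accepted`, and `flagged_by_user` counters are the numbers a dashboard like the one described could display.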

    • demizerone@lemmy.world
      2 months ago

      I am working at a big AI company on LLM-generated code for automation. I’ve had Cursor solve a bug that was occurring in prod after prompting and asking questions of the responses. It took a few rounds, but it found a really obscure interaction between the app and the host, and it thanked me for the insight. 😀 I deployed the fix and it worked.

      The problem I have is that I remember it solving this bug, and I remember being impressed, but I don’t remember the bug. I took a screenshot of it, but currently don’t have access to those. I am disconnected from the code that the LLM has generated, but I am very aware of how the app works and what it should do, because I had to write the requirements and design doc.

    • sobchak@programming.dev
      2 months ago

      I’ve used AI by just pasting code, then asking if there’s anything wrong with it. It would find things wrong with it, but would also say some things were wrong when it was actually fine.

      I’ve used it in an agentic AI tool (Cursor), and it’s not good at debugging any slightly complex code. It would often get “stuck” on errors that were obvious to me, while making wrong, sometimes nonsensical changes.

    • Zexks@lemmy.world
      2 months ago

      Working just fine. It one-shot a Kodi TV channel addon for me last weekend. Used it to integrate Kofax into DocuSign. Building 2 Blazor apps, one new and one an upgrade. Used it to create a stack of MC servers for the kids with a dashboard of statuses and control switches. My son is working on his own MC mod with it. Use it almost daily for random file organization and management scripts. Using it to clean up my media library metadata. Anytime I have to do something to more than 5 or so files I pull it up and ask for a script.

      It’s a tool like any other. There will be people who adapt and people who fail to, just like we had with computers and the internet. It seems to be long forgotten now, but literally ALL of these anti-AI arguments were made against computers and the internet 30-50 years ago. Very similar ones were made when books and writing became commonplace as well.
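
For a sense of what the "more than 5 or so files" scripts mentioned above might look like, here is a small Python sketch that sorts the files in one folder into subfolders by extension. The function names and the by-extension rule are made up for illustration; the dry-run default is a design choice so the plan can be read before anything moves.

```python
import shutil
from pathlib import Path


def plan_moves(folder: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs without touching the filesystem."""
    moves = []
    for path in sorted(folder.iterdir()):
        if path.is_file():
            # "photo.JPG" goes to "jpg/"; files without a suffix to "no_extension/"
            ext = path.suffix.lower().lstrip(".") or "no_extension"
            moves.append((path, folder / ext / path.name))
    return moves


def apply_moves(moves, dry_run=True) -> int:
    """Carry out the plan; returns how many files were actually moved."""
    moved = 0
    for src, dst in moves:
        if dry_run:
            print(f"would move {src} -> {dst}")
            continue
        if dst.exists():
            print(f"skipping {src}: {dst} already exists")
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved += 1
    return moved
```

A typical run would call `apply_moves(plan_moves(Path("some/folder")))` first to print the plan, then repeat with `dry_run=False` once the output looks right.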

      • Feyd@programming.dev
        2 months ago

        “Some random people were wrong about something in the past so nobody is allowed to speculate that any technology isn’t as revolutionary as it’s hyped to be ever again” is not a useful or compelling argument.

      • TheFinn@discuss.tchncs.de
        2 months ago

        Apparently I’m not up to date. I’ve been impressed by some things and turned off by others. But I haven’t seen any workflows or setups that enabled access to my file system. How is that accomplished, and are there any safeguards around it?

        • Zexks@lemmy.world
          2 months ago

          I didn’t give it access, just had it make scripts in various languages to handle large repetitive file tasks. Something that would take me 30-45 minutes to look up and piece together it can do in 30-45 seconds. And the more simply you are able to describe the task at hand, the better it can do. Even when I know what I want to type, like during the Blazor conversion, it simply types faster for a much simpler prompt. Once I had a single page sorted I asked it for a step-by-step of what we did. Then took that and said ‘hey do this to the following page abc.html’ and done. Then just tell it ‘now this page …’ etc. etc. That was in Copilot though, so it could see my solution files.

        • Feyd@programming.dev
          2 months ago

          You’re looking for an MCP server, which is the standard way to hook things into chatbots now, and safeguards would depend on the particular server.