Hey y’all! Recently my homeserver (an old laptop) has started crashing every night (after weeks of uptime just working), without anything useful in the logs. Any suggestions about what it might be? (I just started logging battery info to test tonight.)
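For anyone wanting to try the same kind of battery logging, here is a minimal sketch. It assumes a Linux machine that exposes the battery under `/sys/class/power_supply/BAT*` with `capacity` and `status` files; the function names and the `battery.log` filename are made up for illustration and are not what the OP actually ran.

```python
import os
import time
from pathlib import Path

def read_battery(supply_dir="/sys/class/power_supply"):
    """Return {battery_name: {"capacity": ..., "status": ...}} read from sysfs."""
    readings = {}
    for bat in sorted(Path(supply_dir).glob("BAT*")):
        entry = {}
        for field in ("capacity", "status"):
            try:
                entry[field] = (bat / field).read_text().strip()
            except OSError:
                entry[field] = None  # attribute missing or unreadable
        readings[bat.name] = entry
    return readings

def format_line(readings, now=None):
    """Build one timestamped log line from the readings."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    parts = [f"{name} {r['capacity']}% {r['status']}" for name, r in readings.items()]
    return f"{stamp} " + ("; ".join(parts) or "no battery found")

def log_forever(path="battery.log", interval=60):
    """Append one line per interval; fsync so the last entry survives a power cut."""
    with open(path, "a") as log:
        while True:
            log.write(format_line(read_battery()) + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling `log_forever()` from a terminal or a service keeps appending until the machine dies. The `fsync` after every write matters here because a hard power-off would otherwise lose whatever was still sitting in the page cache.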

  • Eugenia@lemmy.ml
    2 months ago

    I don’t know the root of your ills on your server, but I have an interesting story to share (shared by my husband who was an engineer at the company mentioned below).

    Back in 1998, the engineers at Be, Inc. (who were developing BeOS, a beloved OS at the time) were experiencing kernel panics right after 7 am on a specific computer. All of the crashes happened at around the same time, while the computer was running tests all night. It had become a big mystery because they couldn’t find the bug.

    It took them days, but they decided to sit around at 7 am to see what was happening. They saw that a single, strong sun ray was entering the room from the window, and was directly hitting the PC’s floppy drive (the PC was not completely closed up with its cover, since it was a test machine). They found that the sun ray would alter some bits in the electronics and what not, and would crash the kernel! :o)

    • paper_moon@lemmy.world
      1 month ago

      I’ve got a fun one to share from my college programming professor. Similar situation, they had a machine that kept locking up, and this was back in the days of huge mainframes the size of rooms. So they call the repair tech from the manufacturer.

      So the repair tech shows up to the office, gets the rundown on what’s been going on, goes out to his car, brings in a huge piece of wood, and just starts wailing on the thing as hard as he could. The whole office was freaking out thinking this guy had lost it, and he later explained that the memory was a grid of magnetic coils, and the coils would rust and the rust shavings would fall between the coils below, corrupting the memory bits. So he was shaking them loose by slamming the machine with this piece of wood. Lol wild times.


    • setVeryLoud(true);@lemmy.ca
      2 months ago

      That’s a great story, thanks! Old electronics were particularly sensitive to light and other EM disturbance 😄

      • irotsoma@lemmy.blahaj.zone
        1 month ago

        Still are, though most often it’s heat rather than photons from sunlight, since it’s not really necessary to disassemble hardware to that extent these days. And there’s available processing power to retry or do other error handling for any interference. Like running an unshielded Ethernet cable through a wall next to a power cable or through a room with heavy machinery can definitely cause data corruption from EM interference, but it will likely manifest as slowness rather than crashing a whole system. But there are lots of things that still cause computers or applications to crash that are related to stray energy; we’re just so used to buggy software now that it’s rarely noticed. 😁

  • TerHu@lemmy.dbzer0.com
    1 month ago

    a friend of mine uses a non modern ryzen that has issues with sleep states. his home server ran fine until the os got an update and managed to idle much lower than before. that made his machine crash and was a really weird error to catch. dunno if this could apply to you at all. just throwing it out there.

  • hirnpilot89@lemmy.world
    2 months ago

    Had the same problem. Reason in my case was high ssd temperature (passive case) caused by high load from some jellyfin job.
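A rough way to check the temperature theory is to log every sensor the kernel exposes and look at the last readings before a crash. The sketch below assumes Linux's hwmon sysfs layout (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius). Whether the SSD shows up there depends on the drive and kernel (NVMe drives usually do; SATA drives may need the `drivetemp` module), so treat that as an assumption to verify. The function name is hypothetical.

```python
from pathlib import Path

def read_temperatures(hwmon_dir="/sys/class/hwmon"):
    """Return {"hwmonN chip label": degrees_celsius} for every readable sensor."""
    temps = {}
    for chip in sorted(Path(hwmon_dir).glob("hwmon*")):
        try:
            chip_name = (chip / "name").read_text().strip()
        except OSError:
            chip_name = "unknown"
        for sensor in sorted(chip.glob("temp*_input")):
            prefix = sensor.name[: -len("_input")]  # e.g. "temp1"
            try:
                label = (chip / f"{prefix}_label").read_text().strip()
            except OSError:
                label = prefix  # not every sensor has a label file
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read or parsed
            temps[f"{chip.name} {chip_name} {label}"] = millidegrees / 1000.0
    return temps
```

Printing `read_temperatures()` once a minute into a file that is flushed to disk would show whether something climbs steadily during the nightly job.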

    • deathbird@mander.xyz
      1 month ago

      Looking for overnight jobs might help. He could try disabling them and see if the issue stops, and if it does then re-enable them in a controlled way to determine which one was causing it.

    • First_Thunder@lemmy.zip (OP)
      2 months ago

      worth a try (though somewhat useless given the number of shenanigans i’ve been changing constantly, lol, will be hard to find the correct commit)

  • MangoPenguin@lemmy.blahaj.zone
    1 month ago

    Running memtest to check if it’s a RAM issue might be worth it.

    Also could be overheating storage, that can cause weird issues.

    • First_Thunder@lemmy.zip (OP)
      2 months ago

      Turning off without leaving logs. When I turn it on (by pressing the power button), it does so normally.

      • frongt@lemmy.zip
        1 month ago

        Does it have a system log in the BIOS? What’s the battery like, present and healthy or dead/removed? Fans working?

        • First_Thunder@lemmy.zip (OP)
          1 month ago

          I’ve tried stress testing for about an hour: it didn’t die, fans work, it does run hot and thermal throttle, but apparently under control. Battery is in and the level is stable; even during stress testing the level doesn’t budge.

          • frongt@lemmy.zip
            1 month ago

            Strange. Memtest? I can’t think of anything else off the top of my head, unless you sit up and watch it.

  • drspod@lemmy.ml
    2 months ago

    Check your cron and systemd timers to see if a regular scheduled job is running at that time.
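As a sketch of that check: the snippet below filters crontab text for entries whose hour field covers a given hour, and wraps `systemctl list-timers --all` for the systemd side. The helper names are hypothetical, and the parser only covers the common hour syntax (`*`, single numbers, ranges, lists, steps) plus the `@hourly`/`@daily`-style shortcuts, so it is not a full cron implementation.

```python
import subprocess

# Hour at which each shortcut fires; None means "every hour". @reboot is ignored.
SHORTCUT_HOURS = {"@hourly": None, "@daily": 0, "@midnight": 0, "@weekly": 0,
                  "@monthly": 0, "@yearly": 0, "@annually": 0}

def hour_field_matches(field, hour):
    """True if a cron hour field such as '*', '3', '1-5', '*/6' or '0,12' includes hour."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            low, high = 0, 23
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-", 1))
        else:
            low = high = int(rng)
        if low <= hour <= high and (hour - low) % step == 0:
            return True
    return False

def jobs_in_hour(crontab_text, hour):
    """Return the crontab lines that can run during the given hour (0-23)."""
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0].startswith("@"):
            if fields[0] in SHORTCUT_HOURS and SHORTCUT_HOURS[fields[0]] in (None, hour):
                hits.append(line)
            continue
        if len(fields) < 6:
            continue  # environment assignment or malformed entry
        try:
            if hour_field_matches(fields[1], hour):
                hits.append(line)
        except ValueError:
            continue  # hour field uses syntax this sketch does not handle
    return hits

def list_systemd_timers():
    """Return the raw output of `systemctl list-timers --all`."""
    result = subprocess.run(["systemctl", "list-timers", "--all", "--no-pager"],
                            capture_output=True, text=True)
    return result.stdout
```

For example, feeding it the contents of `/etc/crontab` with `hour=3` lists everything that may fire between 03:00 and 03:59. Per-user crontabs and the files under `/etc/cron.d` would need to be checked the same way.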

  • SteveTech@programming.dev
    2 months ago

    Maybe look into using pstore; it can store kernel panics in ACPI or UEFI variables to be read on the next boot. Usually this is accessible at /sys/fs/pstore. If systemd-pstore is installed, the records should show up in the journal, and they can also end up in /var/lib/systemd/pstore.
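A small sketch for checking both locations after a reboot. The two paths come from the comment above; the function name is hypothetical. Reading `/sys/fs/pstore` generally needs root, and an empty result only means no record was saved, not that nothing crashed.

```python
from pathlib import Path

PSTORE_DIRS = ("/sys/fs/pstore", "/var/lib/systemd/pstore")

def find_pstore_records(dirs=PSTORE_DIRS):
    """Return (path, size_in_bytes) for every file found under the given directories."""
    records = []
    for d in dirs:
        root = Path(d)
        try:
            if not root.is_dir():
                continue  # directory absent: pstore not mounted or never used
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    records.append((str(path), path.stat().st_size))
        except PermissionError:
            continue  # run as root to read protected directories
    return records

if __name__ == "__main__":
    for path, size in find_pstore_records():
        print(f"{size:>8}  {path}")
```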

    • First_Thunder@lemmy.zip (OP)
      2 months ago

      Thanks, I checked some forums, and despite allegedly being enabled by default, pstore doesn’t exist as a folder and /sys/fs/pstore sits empty.