SD Card Corruption

I’m not sure what’s happening, but this is the second time in 6 months that I’ve encountered irreparable SD card corruption. I say irreparable in the sense that WebThings won’t come up on reboot, and even after using fsck to repair the OS partition, the image still won’t boot. I’ve tried two different SD cards from two different vendors, and both cards work again once they are reformatted after the corruption.

I have the Raspberry Pi connected to a UPS and my entire house is on a different UPS. The WebThings gateway was running fine up until the other day, when it started showing problems with the UI. I rebooted the gateway and it never came back up.

I have several other Raspberry Pis running off the same network/setup without issue.

I’m in the process of reconfiguring the gateway for a third time, using a different Raspberry Pi and SD card.

Maybe it’s time to experiment with booting from an SSD? In another recent post, the poster described similar filesystem corruption and concluded the cause was powering off the Pi to restart it rather than doing a clean shutdown and reboot.

A costly endeavor. I’d have to purchase not only an SSD but also a USB Adapter and a Raspberry Pi 4. At that price point I might just consider options for a different platform entirely.

Regarding clean shutdowns and reboots, I’m pretty diligent when it comes to that. But, outside of any automatic updates, this Pi wasn’t rebooted or shut down. It runs 24/7 in a climate-controlled environment from a 5 V supply attached to an uninterruptible power supply AND our whole house is on a battery backup.

It’s like wearing two condoms. :slight_smile:

Sorry you’re having these problems.

What brands of SD card have you tried? I’ve found SanDisk Ultra cards to be very reliable.

How much logging are you doing? SD cards do not cope well with a large number of writes.
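If it helps to put a number on that, here is a rough Python sketch for estimating how much is being written to the card. It reads the kernel's block statistics from `/sys/block/<device>/stat`, where the seventh field is the count of 512-byte sectors written since boot. The device name `mmcblk0` is an assumption (it is the usual name for the SD card on a Raspberry Pi, but check yours), and the function names are just illustrative.

```python
import time

# /sys/block/*/stat reports sectors in fixed 512-byte units.
SECTOR_BYTES = 512


def bytes_written(stat_text):
    """Return bytes written since boot from the contents of a stat file."""
    fields = stat_text.split()
    # Field 7 (index 6) is the number of sectors written.
    return int(fields[6]) * SECTOR_BYTES


def read_bytes_written(device="mmcblk0"):
    """Read the counter for one block device (device name is an assumption)."""
    with open(f"/sys/block/{device}/stat") as f:
        return bytes_written(f.read())


def write_rate(device="mmcblk0", interval_s=60):
    """Average bytes per second written to the device over interval_s seconds."""
    before = read_bytes_written(device)
    time.sleep(interval_s)
    after = read_bytes_written(device)
    return (after - before) / interval_s
```

Running `write_rate()` on the gateway with logging enabled and again with it disabled should give a feel for how much the logs contribute.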

The original card was an AData 16GB card; I think I trashed it after the first corruption issue. The second was a Kingston 16GB, and the current one is a Micro Center branded 32GB card I had lying around.

I had originally set up logging for a couple of Z-Wave blinds to see usage patterns during the day so I could set up schedules, but after the first corruption I never bothered to set them back up. So on the second install, the only logging occurring is whatever happens by default.

I still have the Kingston card. I ran fsck on it in an attempt to repair it, but otherwise the image is still intact, so I can do some diagnostics on it if anyone wants to guide me.
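One read-only check that could be run on a card like this is a plain surface read: go through the whole device (or an image file taken from it) chunk by chunk and note any offsets where the read fails. The sketch below is only that, a minimal sketch. The function name and chunk size are arbitrary, the path you pass in (something like a `/dev/sdX` device node or an `.img` file) is up to you, and reading a raw device normally needs root. A clean pass does not prove the card is healthy, since it only shows that every region could be read at that moment.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read every chunk of a device or image file without writing to it.

    Returns (total_bytes, bad_offsets), where bad_offsets lists the start
    offset of each chunk whose read raised an I/O error.
    """
    bad_offsets = []
    # Unbuffered binary mode so each read goes straight to the OS.
    with open(path, "rb", buffering=0) as f:
        # Seeking to the end gives the size for files and block devices alike.
        total = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < total:
            f.seek(offset)
            try:
                f.read(min(chunk_size, total - offset))
            except OSError:
                # Record the failing chunk and carry on with the next one.
                bad_offsets.append(offset)
            offset += chunk_size
    return total, bad_offsets
```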

Unfortunately, the one iron-clad rule is that computer storage will fail.

The only doubt is whether it will fail today, tomorrow or in 9 years’ time. This was rammed home when I worked in PC support and several business owners were pissed off that their valuable data was gone with little or no chance of recovery. When asked if they would spend thousands of dollars for specialist recovery with a 30% chance of full recovery, they invariably said no (so the data was not that valuable after all).

So the questions are what’s the value of the data you want to keep and what measures you are prepared to take to protect it. It’s entirely reasonable to say that it’s more cost effective for you to have no protection and to rebuild from scratch on each failure.

One relatively low cost approach is to build on a brand new SD card and get it fully configured. Then designate that SD card as your golden backup, then copy it over to another SD card and boot from the copy. When the copy fails, copy the golden backup to yet another SD card.
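A variation on this, assuming you keep the golden backup as an image file on another computer rather than only as a physical card, is to make each working copy from that file and check it with a checksum before trusting it. The Python sketch below shows the copy-and-verify step for image files only. The function names and the file names in the usage line are hypothetical, and writing the image onto the actual card is still done with whatever imaging tool you normally use.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clone_image(golden_path, copy_path):
    """Copy the golden image and confirm the copy matches it byte for byte.

    Returns the shared SHA-256 digest, or raises OSError on a mismatch.
    """
    shutil.copyfile(golden_path, copy_path)
    golden_hash = sha256_of(golden_path)
    copy_hash = sha256_of(copy_path)
    if golden_hash != copy_hash:
        raise OSError(f"copy does not match golden image: {copy_path}")
    return golden_hash
```

For example, `clone_image("golden.img", "working.img")` would produce the working image and return its digest, which is worth writing down so a later copy of the golden image can be checked against it.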

With this method, you may periodically want to copy the currently running SD card to a brand new SD card and make that the new golden backup. Then you can use the previous golden backup after the next SD card failure. A good time to do this might be, for example, after stabilising a major gateway upgrade, or after adding to or reconfiguring the system. Ideally, you should start from a clean operating system installation and just copy the WebThings installation over, for less chance of carrying corruption across.

It is very important not to make a heavily used SD card into a golden backup as it may already have hidden storage failures.