I’ve been running WebThings on an R-Pi 3 for the last nine months or so, and it ran relatively smoothly until about a week ago.
The first problem was that the Zigbee adapter couldn’t connect to its Things. Everything else appeared to be working OK (Z-Wave devices kept working, Rules ran, and I could access the Gateway). This seemed to happen after a Zigbee adapter update, so I logged an issue here. It was closed immediately, and it looks like another update was sent out overnight, but this didn’t fix the problem.
Within a day, I could no longer access the Gateway at all (either locally or externally), and neither the Z-Wave devices nor the Rules activated any more.
I don’t know that these two issues (Zigbee adapter and the Gateway) are necessarily related, only that they occurred within a day or two.
I should also note that since I set up WebThings last year, I have had to get a new router from my ISP. There has been an intermittent problem like the one I have now: every couple of weeks I couldn’t access the Gateway locally or externally, but restarting the R-Pi would fix it. I found references on the ISP’s forums to the same thing happening with other IoT gateways on this router. Obviously, restarting the R-Pi is not working for me now. I have also tried resetting the router, but that didn’t fix or change the problem.
The behaviour seems very similar to this topic and this one, but none of the suggestions in them have helped.
To summarise:
- From what I can tell, the R-Pi appears to boot up OK
- It obtains an IP address (via wireless), and I can see the R-Pi in the router portal (its name is webthings)
- I can ping the R-Pi
- I can SSH to the R-Pi
- I can ping the internet from the R-Pi
Any ideas?
Are there log files that might shed light? If so, please be specific – I’m a novice. I can see that there have been no new log files generated in the .webthings/log directory since the time I think the Gateway became inaccessible – not sure if that is relevant.
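One quick way to see when the gateway last wrote anything is to list the log directory newest first (ls -lt ~/.webthings/log does the same from the shell). The Python sketch below assumes the logs live in ~/.webthings/log, as described above; the function name newest_logs is hypothetical.

```python
import time
from pathlib import Path


def newest_logs(log_dir, limit=5):
    """Return (file name, mtime) pairs for the most recently modified files in log_dir."""
    files = [p for p in Path(log_dir).expanduser().iterdir() if p.is_file()]
    # Newest first, by modification time (seconds since the epoch).
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_mtime) for p in files[:limit]]


if __name__ == "__main__":
    # Assumed location of the gateway logs on the Pi.
    log_dir = Path("~/.webthings/log").expanduser()
    if log_dir.is_dir():
        for name, mtime in newest_logs(log_dir):
            print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), name)
    else:
        print(f"{log_dir} not found")
```

If the newest timestamp stays the same across several reboots, the gateway process is probably failing before its logger starts.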
Sorry you’re having problems. Can I first check that your gateway has updated to 1.0 and that you have transitioned from a mozilla-iot.org domain to a webthings.io domain? (See this blog post for more information.)
That should not prevent you accessing the gateway locally though, so may not be the problem. Can you access the gateway’s web interface by typing its local IP address into your browser (the IP you can see being assigned by your router)?
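To separate "the gateway isn’t running" from "the browser can’t reach it", a sketch like the one below tries a plain TCP connection to the Pi. The ports are assumptions: 80/443 are what a browser uses by default, and 8080/4443 are assumed to be the gateway’s own HTTP/HTTPS listeners, so check them against your own setup. The function names are hypothetical.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and connects; closing immediately is enough.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=(80, 443, 8080, 4443)):
    """Map each port to True/False. The default port list is an assumption."""
    return {port: port_open(host, port) for port in ports}
```

For example, check_ports("192.168.1.50") with a hypothetical address: if SSH works but every web port is closed, the gateway process itself is a more likely culprit than the network.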
What do you see if you type ps aux | grep gateway on the command line on the Raspberry Pi?
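A common gotcha with ps aux | grep gateway is that the grep command matches itself, so a single line of output can look like success. The sketch below does the same filtering in Python and skips the grep line; it assumes the standard 11-column ps aux layout (COMMAND last), and the function name is hypothetical.

```python
def matching_processes(ps_output, needle="gateway"):
    """Return COMMAND fields from `ps aux` output that contain needle, ignoring grep itself."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if needle in command and not command.startswith("grep"):
            matches.append(command)
    return matches
```

On the Pi, feed it subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout; an empty list means no process with "gateway" in its command line is running.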
I received email confirmation of the transition to the webthings.io domain on 22 Dec, and I’m sure I checked that accessing the gateway through this domain worked.
I’m assuming that the gateway updated to 1.0 at that time, but I can’t remember specifically checking that it did. I can see that the R-Pi has folders for .mozilla-iot, .mozilla-iot.old and .webthings. Is this expected after the update? Is there any other way to confirm that the update has been installed properly?
I cannot access the gateway locally through a browser (Edge on PC or Android, Chrome on Android) either via local IP address or webthings.local (or gateway.local).
Typing ps aux | grep gateway on the command line on the Raspberry Pi delivers:
OK, all the errors in the latest run-app.log relate to the weather-adapter. That log file is dated 2021-01-30, and this group of errors occurs three times in the file, presumably because the adapter tried to restart itself after a while? There have been no log files since then, even though I’ve restarted the R-Pi quite a few times.
2021-01-30 09:11:43.361 ERROR : weather-adapter: Failed to poll weather provider: { FetchError: request to https://api.openweathermap.org/data/2.5/weather?lat=-37.95&lon=144.93&units=metric&appid=2bc03e37c3bac538da91803f1a4c2a3b failed, reason: Client network socket disconnected before secure TLS connection was established
2021-01-30 09:12:11.478 ERROR : weather-adapter: at ClientRequest.<anonymous> (/home/pi/.webthings/addons/weather-adapter/node_modules/node-fetch/lib/index.js:1461:11)
2021-01-30 09:12:36.572 ERROR : weather-adapter: at ClientRequest.emit (events.js:198:13)
2021-01-30 09:12:52.733 ERROR : weather-adapter: at TLSSocket.socketErrorListener (_http_client.js:401:9)
2021-01-30 09:13:17.602 ERROR : weather-adapter: at TLSSocket.emit (events.js:198:13)
2021-01-30 09:13:45.800 ERROR : weather-adapter: at emitErrorNT (internal/streams/destroy.js:91:8)
2021-01-30 09:14:05.067 ERROR : weather-adapter: at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
2021-01-30 09:14:19.693 ERROR : weather-adapter: at process._tickCallback (internal/process/next_tick.js:63:19)
2021-01-30 09:14:39.728 ERROR : weather-adapter: message:
2021-01-30 09:14:52.800 ERROR : weather-adapter: 'request to https://api.openweathermap.org/data/2.5/weather?lat=-37.95&lon=144.93&units=metric&appid=2bc03e37c3bac538da91803f1a4c2a3b failed, reason: Client network socket disconnected before secure TLS connection was established',
2021-01-30 09:15:10.396 ERROR : weather-adapter: type: 'system',
2021-01-30 09:15:23.332 ERROR : weather-adapter: errno: 'ECONNRESET',
2021-01-30 09:15:51.924 ERROR : weather-adapter: code: 'ECONNRESET' }
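To check whether weather-adapter really is the only source of errors, rather than eyeballing a long file, a sketch like this counts log lines per source. The line format is inferred from the excerpt above and may not match every gateway version; the script and function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the run-app.log excerpt above:
#   "<date> <time> <LEVEL> : <source>: <message>"
LOG_LINE = re.compile(r"^(\S+ \S+) (\w+)\s*: ([\w.-]+): (.*)$")


def count_by_source(lines, level="ERROR"):
    """Count lines at the given log level per source (adapter or component name)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group(2) == level:
            counts[m.group(3)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_by_source.py ~/.webthings/log/run-app.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for source, n in count_by_source(f).most_common():
            print(f"{n:6d}  {source}")
```

Passing level="INFO" gives the same breakdown for informational lines.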
One option to “test” would be to run a clean/new Gateway image using its Docker container, to isolate it from the core OS. You can easily start it, make changes, and then rinse and repeat until the issue is found.
After starting the container for the first time and running through the installation process, you can shut down the container and copy the base configuration folders to a “golden” backup. Then shut down, restore, boot and tweak as needed. In your case, your first test would be basic access without add-ons, and the next would be adding the weather add-on.
On my RPi, I maintain separate boot images and configuration folders for each WebThings release, allowing me to easily boot into any of them.
At least you would be able to determine whether the problem occurs on a new WT installation with only the Weather add-on installed.
Other things to try: use hard-coded (static) IPs; try alternate DNS servers; verify that the network stays up; use an Ethernet switch rather than the ports on the router; run apt-get update/upgrade on the RPi.
The only other lines in the log files from the days before the gateway stopped are INFO rather than ERROR:
Thousands of the following entry, within minutes or even a few seconds. I wonder if this relates to a poorly constructed timer I’ve set up?
INFO : DeviceProxy: requestAction: restart for: timer-b60a4ac89c575c0abe7a5d52a4b10640
And, the only other entry I can see:
INFO : zigbee-adapter: Kicking WatchDog for 3600 seconds
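To test the runaway-timer theory, one could count how many restart actions each thing received and how bursty they were. This sketch assumes every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the weather-adapter excerpt (the INFO lines quoted above lost theirs), and that the DeviceProxy wording matches the quoted entry; the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern for the DeviceProxy entry quoted above; wording may differ in other versions.
ACTION = re.compile(r"DeviceProxy: requestAction: (\S+) for: (\S+)")


def actions_per_thing(lines):
    """Count (action, thing id) pairs requested through DeviceProxy."""
    counts = Counter()
    for line in lines:
        m = ACTION.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def busiest_second(lines):
    """Return (timestamp to the second, count) for the second with the most actions, or None."""
    per_second = Counter()
    for line in lines:
        if ACTION.search(line):
            per_second[line[:19]] += 1  # assumed "YYYY-MM-DD HH:MM:SS" prefix
    return per_second.most_common(1)[0] if per_second else None
```

A single timer ID with thousands of restarts, many of them in the same second, would point at a rule that keeps restarting the timer.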
When I run ~/webthings/gateway/run-app.sh manually on the command line from the home directory, the last line of output is:
/home/pi/webthings/gateway/run-app.sh: line 50: ./tools/post-upgrade.sh: No such file or directory
If I go to the webthings/gateway directory and run it from there, that error doesn’t appear (there is a file webthings/gateway/tools/post-upgrade.sh). Here is the final section of the output when running from there. The output was quite long – let me know if you want to see it all and I will post. The gateway still hasn’t started and there is no new run-app.log.
Segmentation fault
+ NVM_NPM_PREFIX=
+ nvm_tree_contains_path /home/pi/.nvm
+ [ _0 = _1 ]
+ nvm deactivate
+ nvm_err nvm is not compatible with the npm config "prefix" option: currently set to ""
+ nvm_echo nvm is not compatible with the npm config "prefix" option: currently set to ""
+ command printf %s\n nvm is not compatible with the npm config "prefix" option: currently set to ""
nvm is not compatible with the npm config "prefix" option: currently set to ""
+ nvm_has npm
+ type npm
+ nvm_err Run nvm use --delete-prefix v10.23.0 --silent to unset it.
+ nvm_echo Run nvm use --delete-prefix v10.23.0 --silent to unset it.
+ command printf %s\n Run nvm use --delete-prefix v10.23.0 --silent to unset it.
Run nvm use --delete-prefix v10.23.0 --silent to unset it.
+ return 10
+ return 11
+ EXIT_CODE=11
+ set -e
+ return 11
I am not a node/npm/nvm expert, and don’t even use the tools, but I know enough to see that there appears to be a program version mismatch. According to the URL above, this can occur when different versions of programs are installed by different applications, or when you manually “upgrade” software versions. Often one tool, nvm in this case, depends on specific versions of other tools.
I think the recommended command would remove a version that is not compatible, as the note says.
However, I can’t say what other applications may be affected. Maybe the real developers will comment.
After some digging, I found that one of the main reasons a Pi’s SD card can become corrupted is repeatedly cutting the power at the switch, which is what I’ve been doing whenever my router mysteriously loses its connection with the gateway. I noted this problem in my initial post, but not that I was restarting via the power switch; that might have got us to the solution more quickly!
I tried unsuccessfully to fix any corruption with fsck and related methods. fsck did find and supposedly fix a problem, but rebooting didn’t get the gateway running.
Anyway, I’m going to assume that corruption is the problem. So when I came to a dead end (for me) trying to fix it, I decided to just start again and reflash the SD card.
Let’s consider this (re)solved! If the router loses its connection to the gateway again, I’ll connect directly to the Pi to reboot it, and try to work out why that is happening.
FYI, in another of my previous posts about running the GW on an RPi, I mentioned that I use the Docker image, and also that I mount a USB stick on boot and map a folder on it to install/store WebThings data. This prevents excessive writes to the SD card, which cause premature failure. A couple of times a year, I manually shut down the RPi and perform a “backup” of the USB stick. You can google posts about the RPi that describe how to move other log files to USB and further reduce SD card writes. Finally, I remember reading a nice article about USB thumb drives with “better” chips intended for higher-write applications; I’ve not investigated it, but it may help in case you want to be anal about using the best…
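To confirm that the gateway data really lives on the USB stick rather than the SD card, one could compare the filesystem device IDs of the root directory and the data folder. A minimal sketch using os.stat and os.path.ismount; the mount path /mnt/usb/webthings used below is hypothetical.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def mount_point(path):
    """Walk up from path until reaching the mount point that contains it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):  # "/" is always a mount point, so this terminates
        path = os.path.dirname(path)
    return path
```

For example, same_filesystem("/", "/mnt/usb/webthings") should be False, and mount_point("/mnt/usb/webthings") should name the USB mount, if the data folder really is on the stick.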