Proxy/Telemetry/APRS Error Long Duration Raspberry Pi 3 #190

Closed
kb1lqd opened this issue Apr 28, 2017 · 9 comments

@kb1lqd
Contributor

kb1lqd commented Apr 28, 2017

Summary

While a Raspberry Pi 3 ran the noted commit for ~1.5 days, APRS.py appears to have errored. Telemetry and Proxy remained running, although after a restart of APRS.py, APRS uploading resumed for only 1 of the 2 units. One unit is local while the other is remote (RF).

Problem Explanation

This long-duration test of the proxy, telemetry, and APRS scripts ran for over a day but failed due to what looks like unexpected data/format from unit KB1LQD-25. The error crashed APRS.py. I was logging proxy logs and telemetry logs. Restarting the APRS.py script refreshed KB1LQD-2 but not KB1LQD-25.

Environment

Software

Running Master:

commit 3d91490
Merge: 421aa2f 84c67be
Author: Reilly Grant [email protected]
Date: Mon Apr 24 00:23:35 2017 -0700

Merge pull request #174 from FaradayRF/kb1lqc-proxylink-2

Fixed readme.md Proxy folder links

Hardware

  • Faraday REV D1 - SN:18 (Local)
  • Faraday REV D1 - SN:13 (Remote - powered by external USB HUB)

Supporting Information

Logs

  • Proxy = logs.db
  • Telemetry = telemetry.db

Logs.zip

Debugging

Proxy

image

Telemetry

image

APRS

image

APRS stopped working unexpectedly after being left running for many hours (~1.5 days). It seems like only APRS died.

I restarted APRS:
image

A little while later…

image

KB1LQD-2 seems to have restarted but -25 did not… must be the one erroring:

image

@kb1lqd kb1lqd added the bug label Apr 28, 2017
@kb1lqd kb1lqd added this to the Alpha Software milestone Apr 28, 2017
@kb1lqc
Member

kb1lqc commented Apr 28, 2017

So is the remote unit even logging into telemetry sqlite db anymore?

@kb1lqd
Contributor Author

kb1lqd commented Apr 28, 2017

I'm not sure, looks like -25 is the last log in the system but only -2 is being updated after APRS.py restart...

@kb1lqd
Contributor Author

kb1lqd commented Apr 28, 2017

See the log file I posted? @kb1lqc

@kb1lqc
Member

kb1lqc commented Apr 28, 2017

Per telemetry.db in the Logs.zip attached to this ticket, the problem is with GPSALTITUDE for KB1LQD-25. It occurred at or around EPOCH 1493345336.83446 (Thursday, April 27, 2017 7:08:56 PM GMT-7:00 DST), which is on KEYID = 14456:

GPSALTITUDE = 17.43.90

This is causing the float error since it is not a float as we expect.
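To illustrate the failure, here is a minimal sketch of validating the altitude text before treating it as a number. The helper name `parse_altitude` is hypothetical and is not taken from the Telemetry or APRS code; it only shows that Python's built-in `float()` rejects a string with two decimal points.

```python
import math
from typing import Optional


def parse_altitude(raw) -> Optional[float]:
    """Return the altitude as a float, or None if the text is not a usable number.

    Hypothetical helper for illustration; not part of the project's code.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # float("17.43.90") raises ValueError; float(None) raises TypeError.
        return None
    # float() also accepts "nan" and "inf", which are not usable altitudes.
    return value if math.isfinite(value) else None


print(parse_altitude("17.4"))      # a well-formed reading
print(parse_altitude("17.43.90"))  # the malformed value reported above
```

A caller can then decide what to do with `None` (skip the record, log it, or substitute a default) instead of letting the conversion error end the script.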

Problems

  1. This should not error but should simply raise an exception
  2. This is likely firmware not software, however there's room for it to be Telemetry

It appears as though the GPS Altitude of 17.4 was appended with a 3.90 to create the 17.43.90 we see. Notice the previous GPS Altitudes for KB1LQD-25:

image

@kb1lqd I believe it's firmware because I've seen the firmware do this. However, I guess it could be software, since we really should be checking that GPS Altitude is a float prior to inserting it into the database. It is converted to a string, though, so there is a chance we are messing it up in Telemetry.
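For anyone wanting to look for other malformed rows, a sketch along these lines could scan the database and report the time of each bad entry. The column names KEYID, EPOCH, and GPSALTITUDE come from this ticket; the table name `TELEMETRY`, the helper names, and the first sample row are assumptions made so the example runs against an in-memory database rather than the real telemetry.db.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def is_float(text) -> bool:
    """True if text can be converted by float()."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def find_bad_altitudes(conn):
    """Return (KEYID, EPOCH, GPSALTITUDE) rows whose altitude is not a float.

    The table name TELEMETRY is an assumption; adjust it to the real schema.
    """
    rows = conn.execute(
        "SELECT KEYID, EPOCH, GPSALTITUDE FROM TELEMETRY ORDER BY KEYID"
    )
    return [row for row in rows if not is_float(row[2])]


# In-memory stand-in for telemetry.db so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TELEMETRY (KEYID INTEGER PRIMARY KEY, EPOCH REAL, GPSALTITUDE TEXT)"
)
conn.executemany(
    "INSERT INTO TELEMETRY VALUES (?, ?, ?)",
    [
        (14455, 1493345331.0, "17.4"),           # made-up well-formed row
        (14456, 1493345336.83446, "17.43.90"),   # the row reported in this ticket
    ],
)

pdt = timezone(timedelta(hours=-7))  # fixed GMT-7 offset, as quoted above
bad = find_bad_altitudes(conn)
for keyid, epoch, altitude in bad:
    when = datetime.fromtimestamp(epoch, tz=pdt)
    print(keyid, when.strftime("%Y-%m-%d %H:%M:%S"), altitude)
```

Pointing `sqlite3.connect` at the real file instead of `":memory:"` (and dropping the demo inserts) would run the same scan on the attached logs, provided the table name matches.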

@el-iso
Contributor

el-iso commented Apr 29, 2017

Any idea where the 3.90 came from? It seems like knowing that might help isolate the bug.
Also, could this be a problem with formatting json (a missing comma in the json or something like that)? Just tossing in my thoughts... @kb1lqd @kb1lqc

@kb1lqd
Contributor Author

kb1lqd commented Apr 30, 2017

I only restarted APRS.py on Thursday after the crash, and at that time I did not see KB1LQD-25, which is the unit local to the Raspberry Pi. A check-in tonight shows the station is back!

image

I'm not sure why it is now working but it's a good clue that it cleared itself after the restart, eventually.

@el-iso I'm not sure yet but I think this is a question of both:

  • Where did the invalid data/format come from?
  • Should we be protecting against a hard crash in APRS.py, or is this a situation that should never happen, where crashing is a good indicator that something needs fixing?
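As one possible shape for the "protect against a hard crash" option, the sketch below isolates each station so a malformed record from one unit does not stop uploads for the other. It is only an illustration: `upload_all`, the `upload` callback, and the KB1LQD-2 altitude value are hypothetical and do not reflect how APRS.py is actually structured.

```python
import logging

logger = logging.getLogger("aprs_sketch")


def upload_all(records, upload):
    """Call upload(callsign, altitude) for each record, skipping unparsable ones.

    records is an iterable of (callsign, raw_altitude_text) pairs.
    Returns the list of callsigns that were skipped.
    """
    skipped = []
    for callsign, raw_altitude in records:
        try:
            altitude = float(raw_altitude)
        except (TypeError, ValueError):
            # Log loudly so the bad data is still visible, then move on.
            logger.warning("Skipping %s: bad altitude %r", callsign, raw_altitude)
            skipped.append(callsign)
            continue
        upload(callsign, altitude)
    return skipped


sent = []
skipped = upload_all(
    [("KB1LQD-25", "17.43.90"), ("KB1LQD-2", "17.4")],  # second value is made up
    lambda callsign, altitude: sent.append((callsign, altitude)),
)
print(sent, skipped)
```

The warning log keeps the bad record visible, which goes some way toward the other side of the question: the invalid data still needs to be traced to its source even if the script no longer dies on it.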

@kb1lqd
Contributor Author

kb1lqd commented Apr 30, 2017

Interesting: KB1LQD-2 shows active, valid position data on APRS-IS, but its last valid telemetry data on APRS-IS is over a day old! Weird!
image

It seems like KB1LQD-25 is updating both OK:

image

@kb1lqc
Member

kb1lqc commented Apr 30, 2017

The last telemetry packet from KB1LQD-2 was:

2017-04-29 02:34:56 CDT: KB1LQD-2>GPSFDY,QAR,KB1LQD-25,qAS,KB1LQD:T#525,149,147,140,000,001,00000000

image

Seems spotty/weird in your RAW packet stream. Unsure whether the APRS application is not sending them or APRS-IS is ignoring them...
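For reference when reading the raw stream, here is a small sketch that splits the packet quoted above into its parts. It assumes the standard APRS telemetry report layout (`T#` followed by a sequence number, five analog values, and an eight-character digital field); the function name `parse_telemetry_packet` is hypothetical.

```python
def parse_telemetry_packet(packet: str) -> dict:
    """Split an APRS telemetry packet into source, path, and telemetry fields."""
    header, info = packet.split(":", 1)
    source, path = header.split(">", 1)
    if not info.startswith("T#"):
        raise ValueError("not a telemetry report: %r" % info)
    fields = info[2:].split(",")
    if len(fields) != 7:
        raise ValueError("expected 7 telemetry fields, got %d" % len(fields))
    return {
        "source": source,
        "path": path.split(","),
        "sequence": int(fields[0]),
        "analog": [int(value) for value in fields[1:6]],
        "digital": fields[6],
    }


packet = "KB1LQD-2>GPSFDY,QAR,KB1LQD-25,qAS,KB1LQD:T#525,149,147,140,000,001,00000000"
result = parse_telemetry_packet(packet)
print(result)
```

Comparing the `sequence` values of consecutive telemetry packets in the raw stream is one way to tell whether reports are missing from the feed or are simply arriving far apart.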

@kb1lqd
Contributor Author

kb1lqd commented Sep 28, 2017

PR #273 looks to have fixed this issue in conjunction with PR #271.

@kb1lqd kb1lqd closed this as completed Sep 28, 2017