Superp Posted December 18, 2020 at 09:54 AM

Hello,

I have a problem with a brand new LCD 128x64 Bricklet connected to a HAT Brick with a brand new cable. The Bricklet is responding normally to API calls. brickd.log shows thousands of errors like this:

    2020-12-18 08:52:06.937629 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5721)
    2020-12-18 08:52:24.206204 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5722)
    2020-12-18 08:52:41.476851 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5723)
    2020-12-18 08:52:58.746923 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5724)

Question #1: Can you point me to a document somewhere outlining what to do in a case like this? How do I diagnose and solve this problem?

Neither the Brick nor the Bricklet seems to report these errors through the API (this is Ruby):

    > hat.get_identity
    => ["S2c", "0", "i", [1, 0, 0], [2, 0, 2], 111]
    > hat.get_spitfp_error_count
    => [0, 0, 0, 0]
    > lcd.get_identity
    => ["R3S", "S2c", "g", [1, 0, 0], [2, 0, 9], 298]
    > lcd.get_spitfp_error_count
    => [0, 0, 0, 0]

Question #2: Using the API, how can I check for problems like the above (excessive checksum errors)?

rtrbt Posted December 18, 2020 at 10:45 AM

Hi,

49 minutes ago, Superp said:
    How do I diagnose and solve this problem?

Are you using Brick Daemon 2.4.3? There have been some changes to fix problems with the dynamic clock rate of newer Raspberry Pis in this version.

49 minutes ago, Superp said:
    Using the API, how can I check for problems like the above (excessive checksum errors) ?

Unfortunately you can't. Brick Daemon itself does not have any API to query for these errors.

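Since Brick Daemon has no API for these errors, one possible workaround is to read them from brickd.log itself. The following is a minimal Ruby sketch, assuming the log lines always have the format quoted in the first post. The helper name `checksum_error_stats` is hypothetical and is not part of any Tinkerforge API.

```ruby
# Matches brickd.log lines of the form quoted in the first post, e.g.
# 2020-12-18 08:52:06.937629 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5721)
CHECKSUM_LINE = /
  \A(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.(\d+)\s
  <E>\s<[^>]+>\sMessage\schecksum\serror\s
  \(port:\s(\w+),\scount:\s(\d+)\)
/x

# Hypothetical helper: summarizes checksum errors per port.
# Accepts any enumerable of log lines and returns a hash keyed by port name.
def checksum_error_stats(lines)
  samples_by_port = Hash.new { |hash, port| hash[port] = [] }

  lines.each do |line|
    m = CHECKSUM_LINE.match(line.strip)
    next unless m

    year, month, day, hour, min, sec = m.captures[0, 6].map(&:to_i)
    usec = m[7].ljust(6, '0')[0, 6].to_i
    # The log timestamp carries no zone; treating it as UTC is fine here
    # because only differences between timestamps are used.
    time = Time.utc(year, month, day, hour, min, sec, usec)
    samples_by_port[m[8]] << [time, m[9].to_i]
  end

  samples_by_port.map do |port, samples|
    first_time, first_count = samples.first
    last_time, last_count = samples.last
    seconds = last_time - first_time
    errors = last_count - first_count
    rate = seconds > 0 ? errors / seconds : 0.0
    [port, { lines: samples.size, errors: errors, seconds: seconds, per_second: rate }]
  end.to_h
end
```

It could be called as `checksum_error_stats(File.foreach('/var/log/brickd.log'))`, where the log path is an assumption about the installation. The `per_second` value can then be compared against whatever threshold counts as "excessive" for the setup.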
Superp (Author) Posted December 18, 2020 at 12:37 PM

Yes, brickd is 2.4.3:

    brickd --version
    2.4.3

This is on a Pi 3B+. The problem seems to be specifically with the LCD. Brickv (with the LCD tab active) triggers several errors per second. It may be a thing with callbacks?

Anything else I can do to solve the LCD problem?

(The problem with monitoring brickd health remotely I am parking; I might get back to that later.)

Superp (Author) Posted December 21, 2020 at 01:55 PM

This problem has not been solved yet.

With LCD 128x64 Bricklet: 4-5 "Message checksum errors" per second, logfile flooded.
Without LCD 128x64 Bricklet: no errors.

Do you need more time to investigate, or do you need more info from me?

rtrbt Posted December 21, 2020 at 03:05 PM

I've just tried the same thing here and could reproduce the bug. I will report back when we know more. (Probably next year)

rtrbt Posted December 21, 2020 at 03:09 PM

Another thought: Does this happen if you downgrade to Brick Daemon 2.4.1? This version is too old to be available in the APT repository, however you can download it here: https://download.tinkerforge.com/tools/brickd/linux/brickd-2.4.1_armhf.deb

Superp (Author) Posted December 22, 2020 at 07:07 AM

Okay, thanks. Good to know. I'll keep the LCD disconnected for the moment, until I hear from you. I might try the brickd downgrade later. Let me know if I can help with testing etc.

photron Posted January 11, 2021 at 01:18 PM

Downgrading to brickd 2.4.1 will "hide" these messages, because before brickd 2.4.2 these messages were logged on debug level, which is not visible by default.

Do you actually have an issue with the LCD 128x64 Bricklet's functionality? Or are you just seeing these messages in the brickd.log file, but the Bricklet works fine regardless?

What else do you have connected to the HAT Brick?

Superp (Author) Posted January 12, 2021 at 08:01 AM

Hello Photron,

Thanks for taking the time. This is a problem already reproduced and confirmed by your colleague.

For your benefit, I shut down the system this morning, disconnected all devices, connected the LCD, and booted. I started Brickv, selected the LCD, and the log started to flood. This should be fairly easy for you to reproduce.

API calls are answered okay, except the system will ultimately become unstable and will no longer boot because it will run out of disk space, I suppose.

The real problem, and for me this is a showstopper, is that client-side (through the API), there is no way to detect problems like this or basically find out if brickd is okay. Brickd will happily answer API calls while brickd.log is flooded with error messages. This can happen in other scenarios (not only with the LCD Bricklet), too.

Here is an idea: A basic API call to report the current log size (Tinkerforge::IPConnection#brickd_log_size), or error count, or some other "health indicator", would be a great first step towards making things more robust. Throttling log messages would be helpful, too.

Cheers.

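The throttling idea mentioned above could look roughly like the following Ruby sketch. It is purely illustrative: brickd is not written in Ruby, and the class name `ThrottledLog` and its interface are hypothetical, not an existing brickd or Tinkerforge feature. The sketch writes at most one message per key (for example, per port) within a given interval and notes how many were suppressed in between.

```ruby
# Hypothetical rate limiter for log messages: writes at most one message
# per key per interval and reports how many were suppressed in between.
class ThrottledLog
  def initialize(interval_seconds, sink: $stderr)
    @interval = interval_seconds
    @sink = sink
    @state = {} # key => [time of last written message, suppressed count]
  end

  # Returns true if the message was written, false if it was suppressed.
  # `now` can be injected to make the behavior testable.
  def error(key, message, now: Time.now)
    last_time, suppressed = @state[key]

    if last_time && now - last_time < @interval
      @state[key] = [last_time, suppressed + 1]
      return false
    end

    suffix = suppressed.to_i > 0 ? " (#{suppressed} similar messages suppressed)" : ''
    @sink.puts("#{message}#{suffix}")
    @state[key] = [now, 0]
    true
  end
end
```

With an interval of 60 seconds and the port name as key, a flood of several hundred messages per second would shrink to one line per minute per port, while the suppressed count still shows that something is wrong.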
photron Posted January 12, 2021 at 11:01 AM

2 hours ago, Superp said:
    except the system will ultimately become unstable and will no longer boot because it will run out of disk space

That will not happen, as the brickd.log is rotated.

2 hours ago, Superp said:
    Here is an idea: A basic API call to report the current log size (Tinkerforge::IPConnection#brickd_log_size), or error count, or some other "health indicator", would be a great first step towards making things more robust.

Yes, something like that is on the todo list, but log size is not a good indicator here. All Bricks and 7-pole Bricklets have error counter APIs. Brickd will get something similar.

Regarding the error messages from the LCD Bricklet: I can see the problem here. This is unexpected and I'll look into why the LCD Bricklet does this. But this is not an immediate problem, as the actual functionality of the Bricklet is not affected, because the error recovery mechanisms are working.

This is more a problem of error reporting in brickd. Before 2.4.2 all these errors were logged on debug level and you would not have seen them by default. In 2.4.2 we changed that to give these errors more visibility, with the downside of giving them too much visibility in some cases. Your case is on the edge of this.

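The existing error counter APIs on Bricks and Bricklets can be polled from a client. Below is a minimal Ruby sketch that compares two results of `get_spitfp_error_count` (the call shown in the first post) and reports which counters increased. The helper name `spitfp_error_increase` is hypothetical, and the order of the four field names is an assumption that should be checked against the API documentation of the device in question. In the case discussed in this thread these counters stayed at zero, so such a check would not have caught the errors logged by brickd.

```ruby
# Assumed order of the four values returned by get_spitfp_error_count;
# verify against the API documentation of the Brick or Bricklet in use.
SPITFP_FIELDS = %i[ack_checksum message_checksum frame overflow].freeze

# Hypothetical helper: returns only the counters that increased between
# two samples, as a hash of field name => increase.
# Decreases (for example after a device reset) are ignored.
def spitfp_error_increase(previous, current)
  SPITFP_FIELDS.zip(previous, current)
               .map { |name, before, after| [name, after - before] }
               .select { |_name, delta| delta > 0 }
               .to_h
end

# Possible usage with a connected device object such as `lcd`:
#   before = lcd.get_spitfp_error_count
#   sleep 60
#   after = lcd.get_spitfp_error_count
#   puts spitfp_error_increase(before, after)
```

Keeping the comparison separate from the device access makes it easy to run on a schedule and to test without hardware.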
Superp (Author) Posted January 12, 2021 at 12:44 PM

Photron,

Good to hear there are plans for brickd health reporting in the API. I hope this gets implemented soon. In my priorities, this goes to the top.

Whether a system will fall over or not, I am not going to bicker with you over this. I will say, however, that I have seen brickd producing several hundred messages per second and yet respond normally to calls, and that is a showstopper.

Best!