Posted

Hello,

I have a problem with a brand new LCD 128x64 Bricklet connected to a HAT Brick with a brand new cable. The Bricklet is responding normally to API calls.

brickd.log shows thousands of errors like this:

2020-12-18 08:52:06.937629 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5721)
2020-12-18 08:52:24.206204 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5722)
2020-12-18 08:52:41.476851 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5723)
2020-12-18 08:52:58.746923 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5724)

Question #1: Can you point me to a document somewhere outlining what to do in a case like this? How do I diagnose and solve this problem?

Neither the Brick nor the Bricklet seems to report these errors through the API (this is Ruby):

> hat.get_identity
=> ["S2c", "0", "i", [1, 0, 0], [2, 0, 2], 111]
> hat.get_spitfp_error_count
=> [0, 0, 0, 0]

> lcd.get_identity
=> ["R3S", "S2c", "g", [1, 0, 0], [2, 0, 9], 298]
> lcd.get_spitfp_error_count
=> [0, 0, 0, 0]
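
For readers unfamiliar with the return value: as far as I can tell from the Tinkerforge API documentation, the four numbers are the ACK checksum, message checksum, framing, and overflow error counts, in that order. Below is a minimal Ruby sketch that labels them. The constant and helper names are hypothetical and not part of the Tinkerforge bindings, and the field order is an assumption taken from the documentation.

```ruby
# Assumed field order of get_spitfp_error_count (taken from the Tinkerforge docs).
SPITFP_FIELDS = %i[ack_checksum message_checksum frame overflow].freeze

# Hypothetical helper: turn the raw four-element array into a labeled hash.
def label_spitfp_errors(counts)
  unless counts.size == SPITFP_FIELDS.size
    raise ArgumentError, "expected #{SPITFP_FIELDS.size} counters, got #{counts.size}"
  end
  SPITFP_FIELDS.zip(counts).to_h
end

# Hypothetical helper: true if any of the four counters is non-zero.
def spitfp_errors?(counts)
  label_spitfp_errors(counts).values.any?(&:positive?)
end

# With the values shown above, every counter is zero.
p label_spitfp_errors([0, 0, 0, 0])
p spitfp_errors?([0, 0, 0, 0])
```

With a live connection this would be called as `label_spitfp_errors(lcd.get_spitfp_error_count)`. As the output above shows, these counters stayed at zero while brickd.log kept filling up, so they do not reflect what brickd itself logs.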

Question #2: Using the API, how can I check for problems like the above (excessive checksum errors)?

Posted

Hi,

49 minutes ago, Superp said:

How do I diagnose and solve this problem?

Are you using Brick Daemon 2.4.3? There have been some changes to fix problems with the dynamic clock rate of newer Raspberry Pis in this version.

49 minutes ago, Superp said:

Using the API, how can I check for problems like the above (excessive checksum errors)?

Unfortunately you can't. Brick Daemon itself does not have any API to query for these errors.
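
Since brickd offers no API for this, one possible workaround is to read the log on the host itself. The following is only a sketch, assuming the log line format quoted in the first post. The regular expression and the helper name are illustrative and not part of any Tinkerforge tool, and other brickd versions may format these lines differently.

```ruby
# Pattern modeled on the brickd.log lines quoted in this thread (an assumption,
# not an official format specification).
CHECKSUM_LINE = /<E> <bricklet_stack\.c:\d+> Message checksum error \(port: (?<port>\w+), count: (?<count>\d+)\)/

# Hypothetical helper: highest checksum error count seen per port.
def checksum_counts_by_port(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    m = CHECKSUM_LINE.match(line)
    next unless m
    port = m[:port]
    counts[port] = [counts[port], m[:count].to_i].max
  end
end

sample = [
  "2020-12-18 08:52:06.937629 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5721)",
  "2020-12-18 08:52:24.206204 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5722)",
  "some unrelated log line"
]
p checksum_counts_by_port(sample)
```

On a real system the input could be something like `File.foreach("/var/log/brickd.log")`. That path is an assumption about a default Linux install and should be checked on the machine in question.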

Posted

Yes, brickd is 2.4.3:

brickd --version
2.4.3

This is on a Pi 3B+.

The problem seems to be specific to the LCD. Brickv (with the LCD tab active) triggers several errors per second. Could it be related to callbacks?

Anything else I can do to solve the LCD problem?

(The problem with monitoring brickd health remotely I am parking; I might get back to that later.)

Posted

This problem has not been solved yet.

With LCD 128x64 Bricklet:  4-5 "Message checksum errors" per second, logfile flooded.

Without LCD 128x64 Bricklet: no errors.
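
To put a number on "errors per second", the timestamps in brickd.log can be used directly. This is a rough sketch, assuming the timestamp format of the lines quoted in the first post. The helper name is hypothetical, and timestamps are treated as UTC purely to compute differences, so a local clock change in the middle of the log would skew the result.

```ruby
# Timestamp and message pattern modeled on the log lines quoted in this thread.
CHECKSUM_TS = /\A(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}\.\d+) <E> .*Message checksum error/

# Hypothetical helper: average checksum errors per second over the given lines.
def checksum_errors_per_second(lines)
  times = lines.map do |line|
    m = CHECKSUM_TS.match(line)
    next unless m
    # Seconds are passed as a Rational to keep the fractional part.
    Time.utc(m[1].to_i, m[2].to_i, m[3].to_i, m[4].to_i, m[5].to_i, m[6].to_r)
  end.compact
  return 0.0 if times.size < 2

  span = times.max - times.min # elapsed seconds as a Float
  span.zero? ? Float::INFINITY : (times.size - 1) / span
end

log_excerpt = [
  "2020-12-18 08:52:06.937629 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5721)",
  "2020-12-18 08:52:24.206204 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5722)",
  "2020-12-18 08:52:41.476851 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5723)",
  "2020-12-18 08:52:58.746923 <E> <bricklet_stack.c:478> Message checksum error (port: G, count: 5724)"
]
puts checksum_errors_per_second(log_excerpt)
```

The excerpt from the first post works out to roughly one error every 17 seconds, so the 4-5 per second seen with Brickv open is a much higher rate.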

Do you need more time to investigate, or do you need more info from me?

  • 3 weeks later...
Posted

Downgrading to brickd 2.4.1 will "hide" these messages, because before brickd 2.4.2 they were logged at debug level, which is not visible by default.

Do you actually have an issue with the LCD 128x64 Bricklet's functionality? Or are you just seeing these messages in the brickd.log file, while the Bricklet works fine regardless?

What else do you have connected to the HAT Brick?

Posted

Hello Photron,

Thanks for taking the time.

This problem has already been reproduced and confirmed by your colleague. For your benefit, I shut down the system this morning, disconnected all devices, connected the LCD, and booted. As soon as I start Brickv and select the LCD, the log starts to flood. This should be fairly easy for you to reproduce.

API calls are answered okay, except the system will ultimately become unstable and will no longer boot because it will run out of disk space, I suppose.

The real problem, and for me this is a showstopper, is that on the client side (through the API) there is no way to detect problems like this, or even to find out whether brickd is okay. Brickd will happily answer API calls while brickd.log is flooded with error messages. This can happen in other scenarios (not involving the LCD Bricklet), too.

Here is an idea: A basic API call to report the current log size (Tinkerforge::IPConnection#brickd_log_size), or error count, or some other "health indicator", would be a great first step towards making things more robust.

Throttling log messages would be helpful, too.

Cheers.

Posted

2 hours ago, Superp said:

except the system will ultimately become unstable and will no longer boot because it will run out of disk space

That will not happen, as the brickd.log is rotated.

2 hours ago, Superp said:

Here is an idea: A basic API call to report the current log size (Tinkerforge::IPConnection#brickd_log_size), or error count, or some other "health indicator", would be a great first step towards making things more robust.

Yes, something like that is on the todo list, but log size is not a good indicator here. All Bricks and 7-pole Bricklets have error counter APIs. Brickd will get something similar.

Regarding the error messages from the LCD Bricklet: I can see the problem here. This is unexpected, and I'll look into why the LCD Bricklet does this. But it is not an immediate problem, as the actual functionality of the Bricklet is not affected, because the error recovery mechanisms are working.

This is more a problem of error reporting in brickd. Before 2.4.2 all these errors were logged at debug level, and you would not have seen them by default. In 2.4.2 we changed that to give these errors more visibility, with the downside of giving them too much visibility in some cases. Your case is on the edge of this.

Posted

Photron,

Good to hear there are plans for brickd health reporting in the API. I hope this gets implemented soon. In my priorities, this goes to the top.

I am not going to bicker with you over whether a system will fall over or not. I will say, however, that I have seen brickd produce several hundred messages per second and yet respond normally to calls, and that is a showstopper.

Best!
