openhab is timeouting on masterbrick.

xsherlock · March 23, 2016 at 22:22

Hi,

It looks I hit some bug.

My setup is RPI running openhab 1.8.0 and couple of masterbrick around house for the home automation. All master bricks have Ethernet extension (old one).

All powered by POE.

The particular one is a stack of

2xMasterBrick and Ethernet with 7 bricklets

3 x PTC

2 x Temp

1 x Humidity

1 x DualRelay

I switch the relay 3-4 times in 2 sec intervals and the whole stack becomes unresponsive. Stops pinging back and requires power cycle.

The logs reads


2016-03-23 22:54:58.434 [ERROR] [.t.i.m.impl.PTCTemperatureImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1
2016-03-23 22:55:00.936 [ERROR] [.t.i.m.i.MBrickletHumidityImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1
2016-03-23 22:55:03.437 [ERROR] [t.i.m.i.MDualRelayBrickletImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 2
2016-03-23 22:55:05.946 [ERROR] [i.m.i.MBrickletTemperatureImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1
2016-03-23 22:55:08.452 [ERROR] [.t.i.m.impl.PTCTemperatureImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1
2016-03-23 22:55:10.958 [ERROR] [t.i.m.i.MDualRelayBrickletImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 2
2016-03-23 22:55:13.472 [ERROR] [i.m.i.MBrickletTemperatureImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1
2016-03-23 22:55:15.974 [ERROR] [.t.i.m.impl.PTCTemperatureImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1

Am I doing something wrong?

rwblinn · March 27, 2016 at 17:54

I am experiencing a similiar error:

2016-03-25 07:11:25.198 [ERROR] [t.i.m.i.MBrickletBarometerImpl] - Tinkerforge Error: Tinkerforge timeout occurred : Did not receive response in time for function ID 1

In addition, from time to time after running openHAB for a while (rTS is the UID for the Barometer Bricklet):

- COMMAND no tinkerforge device found for command for item uid: rTS subId: null

After resetting the Master Brick and openHAB, the error disappears, but comes back after openHAB runs for a while.

Setup:

Raspberry Pi 3 with latest Raspian (Jessie)
Master Brick 2.0 with Temperature, Humidity and Barometer Bricklet
openHAB 1.8.1, TinkerForge Bindings org.openhab.action.tinkerforge-1.8.1.jar and org.openhab.binding.tinkerforge-1.8.1.jar

xsherlock · March 28, 2016 at 21:01

Hmm, that is not good, maybe Theo would look into it as I guess that came with the 1.8 version of openhab and he is the developer of the bindings.

Before I decided to do the whole Home Automation on TF bricks, I was hammering a simple setup of 1 Temp bricklet and 1 dual relay for over 2 moths and never seen it time-out this way. That was running 1.7 with some snapshot 1.8 bindings that I needed for one feature. The only other thing is that was a brick connected to the REDbrick and not to the Ethernet Extension.

But I see you have the same problem on the direct connection from RPI so it could be not the Ethernet Extension that was my best suspect.

I will update to 1.8.2 as it came out today. And try to simplify my setup to isolate the bug.

theo · March 28, 2016 at 21:23

Hi xsherlock and rwblinn,

there are no changes to the tinkerforge binding for 1.8.2 therefore I think things would not change if you upgrade.

In both cases it seems like the TF stack stopped working. The log messages indicate that the getSensorValue calls don't get responses in time. This is ok if the stack gets offline (e.g. because of wifi errors) but should heal as soon as the TF stack is back online.

@xsherlock: am i right: you only reseted your TF stack and not openHAB to get the setup running again? May be you can try to set higher values for the the callbackPeriods of the sensors. It would be even better if you can test your stack with another program than openHAB. May be the TF shell bindings could help out?

Regards,

Theo

rwblinn · March 30, 2016 at 09:30

Hi Theo,

thanks for your reply but what are the next steps to resolve this issue?

Please advice.

Much Appreciated.

theo · March 30, 2016 at 18:04

Hi rwblinn,

can you check if the TF stack hangs or openHAB? Try out if the problem disappears if you only reset the TF Stack and don't restart openHAB. Before resetting the stack you can also test the stack with the BrickViewer.

Regards,

Theo

xsherlock · March 30, 2016 at 22:33

I have got a case when the openhab reported a timeout and the stack was still pinging back but I was unable to connect with brickv.

Also rebooting the stack, does not always make the openhab to recover. Sometimes I need to restart openhab as well.

All my sensors are at 1000ms callback. is that stressing the openhab?

One more think is that for the reason not to the get bogus readout every now and then the temp sensors have tinkerforge:temperature.slowI2C=True set.

But not the PTC sensors or Humidity. Can that be a mismatch?

theo · March 31, 2016 at 20:53

All my sensors are at 1000ms callback. is that stressing the openhab?

No, I thought it could be too stressing for the TF master brick. But 1000ms shouldn't hit any limits.

Any chance to get a test of your stack with another program than openHAB?

I will try if I can reproduce the errors with a test setup. We will see.

xsherlock · March 31, 2016 at 21:31

What other way do you want me to stress test this stack. I can do that.

---------

I just made stack crash in a simple way with brickv. I did flop the dual relay couple of times and it stopped pinging back and imediately openhab throwed the timeouts.

------

Some more findings. Reducing the stack to single MasterBrick and Ehternet POE extension and a single DualRelay bricklet will still make it crash.

The stack stops pinging and then sometimes it recovers and some times it does not and requires a reboot.

I can flood ping a stack with small packet at 10hz and it is fine

but it will not ping back on from anything larger then 128 bytes !!!!!

This could be the source of the problem!

pi@rpi-openhab ~ $ ping 172.16.10.20 -s 120
PING 172.16.10.20 (172.16.10.20) 120(148) bytes of data.
128 bytes from 172.16.10.20: icmp_req=1 ttl=128 time=0.354 ms
wrong data byte #119 should be 0x77 but was 0x74
#8      8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27
#40     28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47
#72     48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64 65 66 67
#104    68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 74
^C
--- 172.16.10.20 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.354/0.354/0.354/0.000 ms
pi@rpi-openhab ~ $

theo · April 1, 2016 at 21:19

I've setup a test environment with:

2 Master Bricks

1 Ethernet Extension (without POE)

1 Humidity Bricklet

1 Barometer Bricklet

1 AmbientLightV1 Bricklet

1 Dual Relay Bricklet

1 PTC Bricklet

1 Temperature Bricklet

connected to a Pi3 with openhab 1.8.2 running.

I've tried your ping of death with:

theo$ ping -s 130 pi3

No problems so far.

We'll have to wait and see.

xsherlock · April 1, 2016 at 21:42

If the Masterbrick is connected to the host over USB then it will ping back fine . I have one connected to the Redbrick (in my case 172.16.2.156). Only those stacks that are made of the MasterBrick and Ethernet Extension POE fail to ping back. I have connected 2nd stack with new out of the box MasterB and EthernetEXT POE and it experience the very same problem. It is 100% reproducible and the very first packet >128 bytes does not pass it has little to do with waiting.

I did upgraded to 1.8.2 before today tests.

I have no EthernetEXT without POE but will try to set the switch not to provide power and power the stack over USB. This is the test to do


C:\Users\sherlock>ping 172.16.10.10 -l 128

Pinging 172.16.10.10 with 128 bytes of data:
Reply from 172.16.10.10: bytes=128 - MISCOMPARE at offset 119 - time=220ms TTL=128
Reply from 172.16.10.10: bytes=128 - MISCOMPARE at offset 119 - time=169ms TTL=128
Reply from 172.16.10.10: bytes=128 - MISCOMPARE at offset 119 - time=217ms TTL=128

Ping statistics for 172.16.10.10:
    Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 169ms, Maximum = 220ms, Average = 202ms

C:\Users\sherlock>ping 172.16.2.156 -l 128
Pinging 172.16.2.156 with 128 bytes of data:
Reply from 172.16.2.156: bytes=128 time=90ms TTL=64
Reply from 172.16.2.156: bytes=128 time=10ms TTL=64

Ping statistics for 172.16.2.156:
    Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 10ms, Maximum = 90ms, Average = 50ms

theo · April 1, 2016 at 21:49

Just to be sure: the ping problem is totally unrelated to the openHAB setup?

xsherlock · April 1, 2016 at 22:07

Yep, I was just pinging the stacks from the Windows machine. Same problem

That is some kind of glitch, that could be or not causing problems with openhab. I can imagine that if some sensor updates are to arrive at the same time the relay is fired that could be packed into larger TCP packet that does not pass and hence the timeout.

xsherlock · April 4, 2016 at 11:44

I just swapped Ethernet POE extension for the wifi Extentension.

No other changes.

pi@rpi-openhab ~ $ ping 172.16.10.20 -s 1400
PING 172.16.10.20 (172.16.10.20) 1400(1428) bytes of data.
1408 bytes from 172.16.10.20: icmp_req=1 ttl=255 time=5.89 ms
1408 bytes from 172.16.10.20: icmp_req=2 ttl=255 time=5.71 ms
1408 bytes from 172.16.10.20: icmp_req=3 ttl=255 time=5.71 ms
^C
--- 172.16.10.20 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 5.711/5.772/5.890/0.120 ms

As you can see it pings back nicely for all sizes. But the stack will still crash on numerous dual relay actions. If I use REST to switch it after 3rd 4th it will kill the stack and it stops pinging back!!!!

So the bug is not isolated to the POE extension

theo · April 5, 2016 at 20:39

My test stack is now running fine since 4 days. I switch the relays from time to time. I think you have a hardware problem.

May be you can check all the cables and especially the bricklet connectors. Sometimes it happens that a pin of a connector is bent. May be it is also a good idea to run the stack without the relay and check if it's working.

xsherlock · April 5, 2016 at 21:05

I noticed that on my other MasterBrick with 8 relays on the IO16 bricklet I could not crash the stack no matter how much I would hammer the relays with switch commands.

So I spent whole day rebuilding the panel in the boiler room rewiring it to the IO16 and external relay boards. As I now suspect that the issue is related to the dual relay bricklet (I only have one so cant replace it) I will have a conclusion tomorrow.

xsherlock · April 6, 2016 at 20:54

The new setup was working flawlessly all morning when I had no devices connected to the relays. The relay board is a 5V one and it powered from the power out on the IO16. The very moment I did connect a 230V device ( 3 way heating valve max 10W) the masterbrick started crashing again exactly like it was with dual relay.

Further testing is required to see if that IS something related to POE power or there is no isolation somewhere and there is a clash between the POE in and Relay board but I was hoping that should not be a problem with IO16

-----

Ok Now I'm sure that connecting a 230V Phase to the relay board and switching that relay will cause Masterbrick to hang. It is enough to connect a single relay to the phase and common neutral to the switched device and it will cause problems. The switched phase was the same that powers POE switch

I tested that on 4 separate configurations and 2 different MasterBricks

Masterbrick(1) with POE ext-io16-5v Relay board powered from the io16.

Masterbrick(1) with POE ext- dual relay

Masterbrick(1) with wifi ext, USB power PS - dual relay

Masterbrick(2) with POE ext - io16- 12Vrelay board with separate 12V PS for relays.

The last config will also crash but in slightly different manner. It will stop pinging back for about 7 seconds after first 2-3 relay switches. And then it will recover after a period of non responsiveness.

That is the 2nd picture.

I run out of ideas, what am I doing wrong, and why it does not work.

Is that my cabling? or some earthing problem?

bixator · April 8, 2016 at 13:06

Hi,

I had exactly the same issue when trying to power on a small 3Ways valve engine

using the TKF Dual Relay bricklet.

My stack was using an older Ethernet POE extension but the whole stack,

which includes a RED Brick, is powered using a external source.

The RED brick was not reachable anymore after switching the 230V phase on.

I've tried to replace the Dual Relay bricklet with a brand new one but the issue remained.

So, I've tried to plug a Wifi USB dongle into the RED Brick USB port, instead

of using the Ethernet connection, and this issue did not show up anymore...

Regards.

xsherlock · May 2, 2016 at 22:34

I have made more tests and it looks that over Wifi iy will work reliably but over POE it will always fail after couple switches.

I tried a super save setup with industrial quad relay bricklet switching 12V for the coil of a 2 contact 230V relay (that was cutting both neutral and phase) and that also has time out problems.

As a last resort I ordered a new 1.1 POE to test and will also try to disable power over ethernet and use stepdown power supply while stll use cabled connection, but it looks as the POE extension design is flawed in some way.

-------------

POE extension 1.0 with external power is a no go... same hangups.

A package arrived and instead POE 1.1 I just got new plain Ethernet extension 1.1. Tested it to work with Quad Industrial relay and dual contact external relay. Cant kill that setup and I loose no pings on the way.

Will update when I finaly get new POE extension.

------------------

Just got POE 1.1 and it is working flawless switching the relays under load. So the issue was POE 1.0 having a bug.

openhab is timeouting on masterbrick.

Recommended Posts

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Link zu diesem Kommentar

Share on other sites

Join the conversation