Planning for offline and disconnected devices
Never assume that an IoT environment is stable and constantly online. Gateways and devices will lose connectivity from time to time, especially on 2G/3G/LTE networks. The important thing is to expect such disruptions and to have a plan for handling them. Apart from disconnected networks, you also need to plan for service failures, such as an outage of your IoT Hub or, for that matter, of microServiceBus.com.
In terms of microServiceBus.com, Nodes (the device agents) are architecturally built from two components: microServiceBus-node and microServiceBus-core. microServiceBus-node is responsible for the Device Management communication with microServiceBus.com (the portal), from where you manage all devices. This is where you can start, stop, enable, disable and update your Nodes.
Losing connectivity with microServiceBus.com
If, for any reason, the Node loses its connection with the portal, it will continue to work until it is set into “Disconnected State”. This is determined by the number of missed so-called Heartbeats. How many may be missed and how often they occur can be set in a Disconnect Policy on each Node and also in a Node Template. Once a Node is set into Disconnected State, a predefined “Disconnect Action” will be taken. This action can also be set in the policy, and you can select either “Restart” or “Reboot”.
However, if the Node recovers and reestablishes its connectivity, it will take the “Reconnect Action”, which is either “Update” or “Do nothing”. “Update” means stopping all services, retrieving all Flows, Services and configuration from the portal, and then starting the services again. This option makes sure that the Node is running with the latest configuration, should there have been any updates while it was offline. The “Do nothing” action, as the name implies, continues as if nothing has happened.
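The heartbeat logic described above can be sketched in a few lines of Python. This is only an illustration of the behaviour, not the actual microServiceBus-node implementation; the class HeartbeatMonitor and its method on_heartbeat are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical sketch of the disconnect/reconnect decision logic."""
    missed_limit: int        # missed heartbeats before "Disconnected State"
    disconnect_action: str   # "Restart" or "Reboot"
    reconnect_action: str    # "Update" or "Do nothing"
    missed: int = 0
    disconnected: bool = False

    def on_heartbeat(self, acknowledged: bool) -> Optional[str]:
        """Return the action to take after one heartbeat, or None."""
        if acknowledged:
            # Connectivity is back if we had missed anything before.
            was_offline = self.missed > 0 or self.disconnected
            self.missed = 0
            self.disconnected = False
            return self.reconnect_action if was_offline else None
        self.missed += 1
        if self.missed >= self.missed_limit:
            self.disconnected = True
            self.missed = 0  # assumption: counting starts over after the action
            return self.disconnect_action
        return None


monitor = HeartbeatMonitor(3, "Restart", "Update")
# Two missed heartbeats followed by a response: only the reconnect action fires.
print([monitor.on_heartbeat(ok) for ok in (False, False, True)])
# Three missed heartbeats in a row: the disconnect action fires on the third.
print([monitor.on_heartbeat(ok) for ok in (False, False, False)])
```

Keeping the decision in one small function makes it easy to see that the Reconnect Action is taken whenever connectivity returns, whether or not the Node ever reached Disconnected State.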
Losing connectivity with IoT Hub
As part of the configuration of your Organization, you set up the integration with your IoT Hub. The authentication details and address of this IoT Hub are provided by the portal upon successful sign-in of the Node. If the Node loses connectivity to the IoT Hub, any outbound messages will be persisted locally as long as at least 25% of the storage space on the device is free. Once connectivity with the IoT Hub is reestablished, those messages are sent. While the connectivity to the IoT Hub is broken, exception information will be sent to the portal.
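The store-and-forward behaviour can be illustrated with a short Python sketch. It is a simplified model under a few assumptions: the names OutboundBuffer, publish and flush are hypothetical, an in-memory queue stands in for the on-disk storage, and a message that cannot be persisted is simply reported as "rejected", since the text above does not say what the Node does in that case.

```python
import shutil
from collections import deque

MIN_FREE_RATIO = 0.25  # the 25% free-storage threshold mentioned above


def has_enough_free_space(path=".", min_free_ratio=MIN_FREE_RATIO):
    """True if at least min_free_ratio of the volume holding path is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio


class OutboundBuffer:
    """Hypothetical store-and-forward buffer for outbound messages."""

    def __init__(self, send, free_space_check=has_enough_free_space):
        self._send = send                  # callable; returns True on success
        self._free_space_check = free_space_check
        self._pending = deque()            # stand-in for on-disk persistence

    def publish(self, message):
        # Only send directly when nothing is queued, to preserve ordering.
        if not self._pending and self._send(message):
            return "sent"
        if self._free_space_check():
            self._pending.append(message)
            return "persisted"
        return "rejected"                  # assumption: behaviour not documented

    def flush(self):
        """Send queued messages in order; stop at the first failure."""
        sent = 0
        while self._pending and self._send(self._pending[0]):
            self._pending.popleft()
            sent += 1
        return sent
```

In this sketch, publish would be called for every outgoing message and flush once connectivity to the IoT Hub is back.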
Let’s assume we have configured our Disconnect Policy as follows:
- Heartbeat interval: 30 (number of seconds between heartbeats)
- Missing heartbeat limit: 3 (number of heartbeats without response before the Node takes “Disconnect Action”)
- Disconnect action: Restart
- Reconnect action: Update
- Offline mode: Enabled (allow node to start in offline mode with latest known configuration)
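To make the numbers concrete, the policy above can be written down as a small Python record. The class and field names are hypothetical and are not the portal's actual configuration format; the only thing the sketch computes is how long the Node tolerates silence, assuming the limit is reached after three consecutive intervals.

```python
from dataclasses import dataclass

DISCONNECT_ACTIONS = {"Restart", "Reboot"}
RECONNECT_ACTIONS = {"Update", "Do nothing"}


@dataclass(frozen=True)
class DisconnectPolicy:
    """Hypothetical representation of the example policy."""
    heartbeat_interval_s: int = 30
    missing_heartbeat_limit: int = 3
    disconnect_action: str = "Restart"
    reconnect_action: str = "Update"
    offline_mode: bool = True

    def __post_init__(self):
        # Reject actions that the policy does not offer.
        if self.disconnect_action not in DISCONNECT_ACTIONS:
            raise ValueError(f"unknown disconnect action: {self.disconnect_action}")
        if self.reconnect_action not in RECONNECT_ACTIONS:
            raise ValueError(f"unknown reconnect action: {self.reconnect_action}")

    def seconds_until_disconnect(self) -> int:
        """Roughly how long the portal can be unreachable before the action."""
        return self.heartbeat_interval_s * self.missing_heartbeat_limit


policy = DisconnectPolicy()
print(policy.seconds_until_disconnect())  # 90
```

With these settings, an outage shorter than roughly 90 seconds should not trigger the Disconnect action, while a longer one will.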
Network connectivity recovers after one minute
With a Heartbeat interval of 30 seconds and the limit set to 3, the Node needs roughly 90 seconds of missed Heartbeats before it is set to “Disconnected state”, so a one-minute disruption never gets that far. Once connectivity is reestablished, the Node will take the “Reconnect action”, which in our case is “Update”.
All Services will be stopped, after which they will be downloaded along with their Flows and configuration and then started again.
As stated above, this configuration makes sure our Node is up to date, but the penalty is a disturbance in message transmission while all Services are restarted. It also causes higher data consumption, as all Services and Flows are downloaded again.
Network connectivity disruption for one hour
With three missed Heartbeats, the Node is considered to be in “Disconnected state” and will take the “Disconnect action”, which in our case is “Restart”. Restart means restarting the process, which in this case will happen about every two minutes. For as long as the Node cannot establish internet connectivity (one hour), it will start in Offline mode. Our policy has Offline mode set to “Enabled”, so the Node will start all Services and Flows with its last known state and continue as normal, while of course persisting outgoing messages to disk.
After one hour, when the device comes online again, it will take the “Reconnect action” and restart all Services and Flows.
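As a rough back-of-the-envelope check of the two scenarios, the number of restarts during an outage can be estimated in Python. The function count_restarts and the parameter restart_overhead_s (time the process needs to come back up before Heartbeats resume) are assumptions introduced for this sketch; the real timing depends on the device.

```python
def count_restarts(outage_s, heartbeat_interval_s=30,
                   missing_heartbeat_limit=3, restart_overhead_s=0):
    """Estimate how many times the Disconnect action fires during an outage.

    Each cycle lasts until the heartbeat limit is reached, plus an assumed
    startup overhead before the restarted process begins counting again.
    """
    cycle_s = heartbeat_interval_s * missing_heartbeat_limit + restart_overhead_s
    return outage_s // cycle_s


print(count_restarts(60))                           # 0
print(count_restarts(3600))                         # 40
print(count_restarts(3600, restart_overhead_s=30))  # 30
```

A one-minute outage never completes a cycle, so no restart happens. For the one-hour outage the estimate is 40 restarts with no overhead; assuming about 30 seconds of startup time per cycle gives the "about every two minutes" rhythm mentioned above, or around 30 restarts.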
Let us know if you have any questions.