Introduction
This document describes the options for monitoring the jtel system as part of an operational concept and for integrating it into an alarm system or telemetry. For the architecture and structure of the system, see the other documents and the project-specific specification sheet.
General
There are many tools that can be used to monitor systems during operation. jtel relies on the tool Nagios, although other tools can certainly be used. In individual cases, please ask us.
If you want monitoring set up for your system, jtel requires an additional server in the system architecture on which the Nagios server is installed. If you already have a Nagios server, it can be used. A Nagios client with the corresponding plug-ins is installed on each server of the jtel system. The installation is carried out as a service and generally takes no more than one day.
In addition, a so-called end-to-end test can be carried out at regular intervals. This test verifies that the system's telephony paths actually work: an external system at jtel calls the system on a test number, and the system in turn connects to an external system at jtel.
Prometheus and Grafana
jtel relies on the Prometheus system to acquire telemetry data and Grafana to display it. Other monitoring systems can be implemented as a project if desired.
The monitoring system is configured to monitor various points in the system. Here, measured values such as RAM and CPU utilisation are observed, and individual processes are also monitored to see whether they are actually running. It is recommended to install the monitoring system on a separate physical machine.
If a threshold is exceeded or an individual part of the system does not react, an alarm is triggered. This alarm can be sent by SMS or email.
As a general rule, the first failed attempt to reach a system part should not immediately trigger an alarm, but the test should be repeated a short time later. Only after two failed attempts should an alarm be triggered.
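The repeat-before-alarm rule can be illustrated with a minimal Python sketch. The function name `should_alarm`, the `check` callback, and the 30-second delay are assumptions made for illustration; the document only specifies that the test is repeated "a short time later" and that two failed attempts are needed.

```python
import time
from typing import Callable


def should_alarm(check: Callable[[], bool],
                 attempts: int = 2,
                 delay_seconds: float = 30.0) -> bool:
    """Return True only if every one of `attempts` checks fails.

    `check` is a hypothetical probe that returns True when the
    monitored system part responds. The delay value is an assumption.
    """
    for attempt in range(attempts):
        if check():
            # One successful attempt is enough to suppress the alarm.
            return False
        if attempt < attempts - 1:
            # Wait a short time before repeating the test.
            time.sleep(delay_seconds)
    return True
```

In practice, a monitoring system usually expresses the same idea through its own configuration, for example a hold time on the alert, rather than through custom code.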
Monitoring
Depending on the system, the services to be monitored will vary.
The right services are selected on the basis of the corresponding installed roles. Also see https://wiki.jtel.de/display/JPW/Installation.
To monitor the services used, several so-called "exporters" are installed. These provide telemetry data to Prometheus and are queried regularly. This makes it possible to view a history of telemetry data in Grafana.
Telemetry
In telemetry, values are measured. Values can exceed a certain warning or error threshold at which an action is triggered - i.e. a warning or error message is generated.
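As a rough illustration, a measured value could be mapped to a state as follows. This is a sketch only: the function name `classify` and the state names are hypothetical, and reading the hard drive values 80% and 90% as warning and error thresholds is an assumption, not something the tables state explicitly.

```python
def classify(value: float, warning: float, error: float) -> str:
    """Map a measured value to "ok", "warning" or "error".

    A threshold counts as breached only when the value exceeds it,
    following the wording "exceed" used in this document.
    """
    if value > error:
        return "error"
    if value > warning:
        return "warning"
    return "ok"


# Assumed example: disk usage with 80% as warning and 90% as error.
print(classify(85.0, warning=80.0, error=90.0))
```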
Running services
In the case of running services, a description is given of which services must be running in order to guarantee a certain function.
This measurement is binary: if a service is not running, an error is triggered because a function is then impaired.
Please note that in installations where certain components are not running, some alarms will be disabled. For example, if the email service has not been licensed, this alarm is deactivated.
In the following tables, no special alarm is documented in this case - if the service is not running, an error is always triggered by email and, if connected, SMS.
All Roles
These monitoring points should be set up on all systems.
Telemetry
Monitor point | Measured value | Action on error | Comment |
Ping Test | Ping successful | SMS + email | Machine is accessible. |
Hard drive (operating system + data) | 80% / 90% | SMS + email | When the hard disk capacity is reached, it is to be expected that individual components will no longer function correctly. |
CPU | > 80% for longer than 2 minutes; 100% for longer than 1 minute | SMS + email | The system is overloaded. A process part may not be working correctly. |
Time synchronisation | Time, comparison with NTP server, delta < 2 seconds | SMS + email | Time synchronisation differences between the systems can lead to errors in the calculation of the queues and statistics. |
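The CPU rule in the table combines a threshold with a duration. The following Python sketch shows one way to evaluate such a rule on a series of samples. The sample format and the names `sustained` and `cpu_alarm` are assumptions for illustration. As a simplification, the sketch treats a value at or above the threshold as a breach, so that the 100% rule can fire.

```python
from typing import Sequence, Tuple

# (timestamp in seconds, CPU utilisation in percent), oldest sample first
Sample = Tuple[float, float]


def sustained(samples: Sequence[Sample],
              threshold: float,
              duration_seconds: float) -> bool:
    """True if the most recent unbroken run of samples at or above
    `threshold` spans more than `duration_seconds`."""
    run_start = None
    for timestamp, value in reversed(samples):
        if value < threshold:
            break
        run_start = timestamp
    if run_start is None:
        return False
    return samples[-1][0] - run_start > duration_seconds


def cpu_alarm(samples: Sequence[Sample]) -> bool:
    """80% for longer than 2 minutes, or 100% for longer than 1 minute."""
    return sustained(samples, 80.0, 120.0) or sustained(samples, 100.0, 60.0)
```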
Running services
Service | Measured value | Action on error | Comment |
ntp Service (Windows) | Service running | SMS + email | The time synchronisation service is running. |
chronyd (Linux) | Service running | SMS + email | The time synchronisation service is running. |
Role TEL
Telemetry
Monitor point | Measured value | Action on error | Comment |
error.log | Growing too fast (> 10 kB per second) | SMS + email | If the error.log grows quickly, there may be a system error. |
error.log | DB Connection error (ODBC error) | SMS + email | The database connectivity is disrupted. |
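The growth-rate rule for error.log can be sketched in Python by comparing two file sizes taken a known interval apart. The function name, the interpretation of 10 kB as 10,000 bytes, and the handling of a shrinking file (assumed to be log rotation or truncation) are assumptions for illustration.

```python
def log_growing_too_fast(size_before: int,
                         size_after: int,
                         interval_seconds: float,
                         limit_bytes_per_second: float = 10_000.0) -> bool:
    """True if the log grew faster than the limit during the interval.

    The default limit assumes 10 kB means 10,000 bytes.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if size_after >= size_before:
        grown = size_after - size_before
    else:
        # Assumption: a smaller file means the log was rotated or
        # truncated, so only the content of the new file is counted.
        grown = size_after
    return grown / interval_seconds > limit_bytes_per_second
```

The two sizes could be obtained, for example, by calling `os.path.getsize` on the log file at the start and end of the interval.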
Running services
Service | Program | Comment |
Telephony server | robot5.exe | If this process is not running, telephony is not accessible. |
SIP / RTP Service | giHal.exe | If this process is not running, telephony is not accessible. |
SIP Registration service | giAcu.exe | If this process is not running, the SIP Trunk to PBX will not be registered. Note: this service is not required for all installations. |
Platform UDP Listener | javaw.exe (platform-UDP-listener-1.0-jar-with-dependencies.jar) | If this process is not running, no messages about calls running in the telephony server are transmitted to the rest of the installation. As a result, for example, the data in the agent home is only partially updated. |
Role LB
Telemetry
Monitor point | Measured value | Action on error | Comment |
Port 80 TCP | Port is reachable | SMS + email | The load balancer can be reached via http. |
Port 443 TCP | Port is reachable | SMS + email | The load balancer can be reached via https. |
http call to http://(ip)/admin | 200 OK incl. web content | SMS + email | The web application and SOAP interfaces are accessible via http. |
https call to https://(ip)/admin | 200 OK incl. web content | SMS + email | The web application and SOAP interfaces are accessible via https. |
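A "port is reachable" check amounts to attempting a TCP connection. The following Python sketch uses only the standard library. The function name `port_reachable` and the 3-second timeout are assumptions, and the sketch does not cover the additional check for a 200 OK response with web content.

```python
import socket


def port_reachable(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (for example on refusal
        # or timeout) if the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

For the load balancer, this would be called with the machine's address and ports 80 and 443.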
Running Services
Service | Program | Comment |
Load Balancer Service | haproxy | If this process is not running, the web application or SOAP interface of the solution is not accessible. |
Role WEB
Telemetry
Monitor point | Measured value | Action on error | Comment |
Port 8080 TCP | Port is reachable | SMS + email | The web application or SOAP interface is accessible. |
http call to http://(ip):8080/admin | 200 OK incl. web content | SMS + email | The web application and SOAP interface are accessible. |
Running Services
Service | Program | Comment |
Web Server Service | wildfly | If this process is not running, the web server is not accessible. |
Role DB
Telemetry
Monitor point | Measured value | Action on error | Comment |
MySQL Master | Alarm threshold at 30 seconds for long-running queries on the master database. | SMS + email | Individual database queries do not require a disproportionately long time. |
MySQL Slave | No errors, Seconds behind Master is low (< 5 seconds). | SMS + email | The slave database replicates correctly. |
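The slave check can be sketched as a function over one row of MySQL's `SHOW SLAVE STATUS` output. The field names are those reported by MySQL. The function name, the representation of the row as a dictionary, and the decision to treat a missing lag value as unhealthy are assumptions. How the row is fetched from the database is deliberately left out.

```python
from typing import Mapping


def replication_healthy(status: Mapping[str, object],
                        max_lag_seconds: int = 5) -> bool:
    """Evaluate one row of SHOW SLAVE STATUS, given as a dictionary."""
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # "No errors": the last error message must be empty.
    if status.get("Last_Error"):
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL when the lag cannot be determined.
        return False
    return int(lag) < max_lag_seconds
```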
Running Services
Service | Program | Comment |
MySQL Database Server | mysqld | If this process is not running, the database is not accessible. |
Role REST
Service / Installed where | Program | Comment |
jrest REST Service / Role TEL or one of the Linux servers | javaw.exe (jtel-jrest-1.0.jar); jrest (Linux) | If this process is not running, the jtel REST service is not accessible. |
PBX and Presence Connectors
Service / Installed where | Program | Comment |
Presence Connector (Teams, NFON, Cisco and others) / Role TEL | javaw.exe (jtel-system-presence-aggregator-1.0.jar) | If this process is not running, presence data is no longer transmitted from Teams and the PBX to the jtel system. The transmission of data back to Teams is also disrupted. |
Innovaphone PBX Connector / Role TEL | JTELInnovaphonePBXService.exe | If this process is not running, presence data is no longer transmitted from the PBX to the jtel system. The off-hook function is disrupted. The detection of whether a call has been transferred is disrupted. |
TAPI PBX Connector / Role TEL | jtelTAPIMonitorService.exe | If this process is not running, presence data is no longer transmitted from the PBX to the jtel system. The off-hook function is disrupted. The detection of whether a call has been transferred is disrupted. |
STARFACE v2 Connector / Role TEL | jtelStarface6v2SOAPWindowsService.exe | If this process is not running, no calls are put through to the jtel system. No more presence data is transmitted from the PBX to the jtel system. |
Email Services
Service / Installed where | Program | Comment |
email (IMAP) / Role TEL | jtelIMAPMailService.exe | If this process is not running, emails will not be collected and processed from IMAP mailboxes. |
email (Exchange) / Role TEL | jtelEWSMailService.exe | If this process is not running, emails will not be fetched and processed from Exchange server mailboxes. |