CMS – NHS problem

There is a problem, seen mostly on Mondays, when a user connects from the NHS gateway (IP 81.145.165.2). It causes some threads of the java process that runs sitemanager to hang while still consuming CPU cycles. The server has two CPUs, so the system can operate with one hung thread, but once there are two or more the service degrades steadily. A case has been raised with Terminal 4, who are investigating; in the meantime they advise restarting tomcat. The user interface will keep running for up to an hour, but as publishing slows down and work backs up, performance falls off. The rsync to the live server is badly affected as well. The hung threads build up gradually.

The nagios service can also be used to view the problem. Look under “Apache Status” and select the entry “CMS tomcat”, or go directly to http://cmst4.qub.ac.uk:8080/manager/status

The CPU-guzzling processes will be obvious, but check with top.
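Because the problem is hung threads inside a single java process, a per-thread view can help confirm how many are stuck. The following is only a sketch: the function name hot_threads and the 50% threshold are assumptions, and it relies on the Linux procps options "ps -L" and "-o lwp,pcpu".

```shell
# Filter "ps -L -o lwp,pcpu" output (read on stdin) down to threads at or
# above a CPU threshold, busiest first. The 50% default is an assumption.
hot_threads() {
    min_cpu="${1:-50}"
    # NR > 1 skips the header line; $2 is the %CPU column.
    awk -v min="$min_cpu" 'NR > 1 && $2 + 0 >= min' | sort -k2,2 -nr
}

# Typical use, with the java process id taken from "ps -ef | grep java":
#   ps -L -p {java process id} -o lwp,pcpu | hot_threads 50
```

Note that ps reports CPU averaged over each thread's lifetime, so "top -H -p {java process id}" gives a better live view; two or more threads near 100% would match the degraded state described above.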

There are 2 possible actions: 1. restart tomcat, or 2. renice java and keep it running for a while. (I would do the second if it is 4.30, keeping things ticking over until after 5.00, and then restart tomcat.)

1. restart tomcat

{on jackie}
ps -ef | grep java
kill -9 {java process id}
rm /usr/local/tomcat/temp/catalina.pid
/etc/init.d/tomcat start
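
The same steps can be wrapped in a small function so they are run in the right order. This is a sketch only: the function name restart_tomcat and the RUN switch are hypothetical, and by default it just prints the commands rather than running them.

```shell
# RUN is a hypothetical dry-run switch: left unset it defaults to "echo",
# so the commands are only printed. Set RUN="" to really run them.
RUN="${RUN-echo}"

restart_tomcat() {
    pid="$1"
    if [ -z "$pid" ]; then
        echo "usage: restart_tomcat {java process id}" >&2
        return 1
    fi
    $RUN kill -9 "$pid"
    # -f so that a missing pid file is not treated as an error
    $RUN rm -f /usr/local/tomcat/temp/catalina.pid
    $RUN /etc/init.d/tomcat start
}
```

Run "restart_tomcat {java process id}" first with RUN left at its default to check the commands, then again with RUN="" on jackie.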

2. change priorities

ps -ef | grep java
renice +19 {java process id}
renice -19 {process ids of the rsync process}
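
If there are several rsync processes, the renice commands can be generated in one go. Again a sketch under assumptions: the function name reprioritise and the RUN dry-run switch are hypothetical, and raising a priority (the negative value) needs root.

```shell
# RUN is a hypothetical dry-run switch: left unset it defaults to "echo",
# so the commands are only printed. Set RUN="" to really run them.
RUN="${RUN-echo}"

# First argument: the java process id. Remaining arguments: rsync process ids.
reprioritise() {
    java_pid="$1"
    shift
    # Lowest priority for java so the hung threads get less CPU.
    $RUN renice +19 "$java_pid"
    # Higher priority for each rsync so the copy to the live server keeps moving.
    for pid in "$@"; do
        $RUN renice -19 "$pid"
    done
}

# Typical use; pgrep -x matches processes named exactly "rsync":
#   reprioritise {java process id} $(pgrep -x rsync)
```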

Also of note: the NHS gateway can be blocked by adding the -A line below immediately under the :INPUT directive in the iptables config file in /etc/sysconfig:

:INPUT ACCEPT [0:0]
-A INPUT -s 81.145.165.2 -j DROP

There is a copy of the iptables file with this line included, called iptables-hsblock. This is a measure of last resort, as it also blocks NHS staff from accessing an eform which they are currently using to register for a workshop.
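
For a short-lived block, the same rule can probably be applied to the running firewall without editing the config file. This is a sketch, not a tested procedure: the function names and the RUN dry-run switch are hypothetical, the commands need root, and a rule added this way is lost when the iptables service is restarted.

```shell
# RUN is a hypothetical dry-run switch: left unset it defaults to "echo",
# so the commands are only printed. Set RUN="" to really run them.
RUN="${RUN-echo}"

NHS_GATEWAY=81.145.165.2

# Insert the DROP rule at position 1 of the INPUT chain.
block_nhs() {
    $RUN iptables -I INPUT 1 -s "$NHS_GATEWAY" -j DROP
}

# Delete the matching rule again once the block is no longer needed.
unblock_nhs() {
    $RUN iptables -D INPUT -s "$NHS_GATEWAY" -j DROP
}
```

The same caveat applies as for the config file change: while the rule is in place, NHS staff cannot reach the eform.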