Xymon Mailing List Archive

xymongen hanging

Jeremy Laidman
Mon, 17 Oct 2022 17:12:07 +1100
Message-Id: <CACO=ejzr-EFARgwTBugGdB=user-daa3ac02355a@xymon.invalid>

Hi David

The "snapshot.cgi" runs from the web interface, and creates a snapshot
report. The script snapshot.sh runs snapshot.cgi, and this in turn runs
xymongen with "--snapshot=..." as an argument.

Similarly, the "report.cgi" runs from the web interface, and creates an
availability report, using "--reportopts=..." as an argument.
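
The two CGIs pass different arguments to xymongen, so a stuck process can be classified from its command line alone. The sketch below is illustrative only. `decode_xymongen_args` is a hypothetical helper, and it assumes the argument shapes visible in the ps listing quoted below: `--snapshot=EPOCH` and `--reportopts=START:END:...`, with epoch values in seconds. It decodes only the first two `--reportopts` fields; the xymongen man page documents the rest.

```python
from datetime import datetime, timezone

def decode_xymongen_args(argv):
    """Hypothetical helper: classify a xymongen command line.

    Assumes --snapshot=EPOCH or --reportopts=START:END:... as seen in this
    thread. Returns ("snapshot", [t]), ("report", [start, end]) or (None, []),
    with times as UTC datetimes.
    """
    for arg in argv:
        if arg.startswith("--snapshot="):
            epoch = int(arg.split("=", 1)[1])
            return "snapshot", [datetime.fromtimestamp(epoch, tz=timezone.utc)]
        if arg.startswith("--reportopts="):
            fields = arg.split("=", 1)[1].split(":")
            # Only START and END are decoded; later fields are left alone.
            return "report", [datetime.fromtimestamp(int(f), tz=timezone.utc)
                              for f in fields[:2]]
    return None, []

if __name__ == "__main__":
    cmd = ("/xymon/server/server/bin/xymongen --snapshot=2222867979 "
           "XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723")
    kind, times = decode_xymongen_args(cmd.split())
    print(kind, [t.isoformat() for t in times])
```

For the first process in that listing, it prints `snapshot ['2040-06-09T15:19:39+00:00']`. That is a requested time in 2040, even though the processes started in October 2022. This sketch does not show whether that is related to the hang.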

Also, take a look at the xymonreports.sh script. At the top (of my copy) of
this script there are instructions on creating a crontab entry to run the
script so as to generate daily, weekly and monthly reports. These would
generate xymongen processes with "--reportopts=..." as an argument.

See "man snapshot" and "man report" for more info.

Cheers
Jeremy

On Mon, 17 Oct 2022 at 15:57, David Logan <user-8d29ef0e6ab8@xymon.invalid> wrote:
Hi Folks,

Just wondering if anybody has any experience with xymongen hanging. I have
a large number of xymongen processes that were kicked off sometime over the
weekend. Unfortunately they are owned by apache and have a PPID of 1, so I
can't tell how they were started. I'm presuming xymoncmd, but I can't see
anything in the crontab for xymon or in tasks.cfg that would kick off the
snapshot and reporting processes.
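
The parent has exited (PPID 1), but on Linux root can still read each process's exec-time environment from `/proc/<pid>/environ`, which may hint at where it came from. A web server passes CGI programs variables such as `GATEWAY_INTERFACE`, `REQUEST_METHOD` and `REMOTE_ADDR` (RFC 3875). Child processes inherit these unless the environment is cleared, so finding them would point at the web interface rather than cron or tasks.cfg. This assumes Xymon's CGI wrappers do not scrub the environment, which is not confirmed here. The helper names are hypothetical.

```python
import sys

# CGI meta-variables a web server sets for the programs it runs (RFC 3875).
CGI_VARS = ("GATEWAY_INTERFACE", "REQUEST_METHOD", "SCRIPT_NAME",
            "QUERY_STRING", "REMOTE_ADDR", "HTTP_USER_AGENT")

def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" in record:
            key, _, value = record.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

def cgi_hints(pid):
    """Hypothetical helper: CGI variables present in a process's exec-time environment."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        env = parse_environ(f.read())
    return {k: env[k] for k in CGI_VARS if k in env}

if __name__ == "__main__":
    # Only numeric arguments are treated as PIDs.
    for pid in (int(a) for a in sys.argv[1:] if a.isdigit()):
        print(pid, cgi_hints(pid) or "no CGI variables found")
```

Run it as root with the stuck PIDs as arguments, for example `14749 15238`. If `QUERY_STRING` is present, it would also show the request parameters.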


These then sit for a very long time (over 24 hours) while trying to read a
data file for a specific server.

apache   14749     1 44 Oct16 ?        10:28:39 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14748-1665896723
apache   14867     1 43 Oct16 ?        10:26:32 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/14866-1665896747
apache   15107     1 43 Oct16 ?        10:26:05 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15106-1665896768
apache   15118     1 43 Oct16 ?        10:25:58 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15117-1665896774
apache   15125     1 43 Oct16 ?        10:25:12 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15124-1665896783
apache   15238     1 43 Oct16 ?        10:23:26 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15237-1665896797
apache   15269     1 43 Oct16 ?        10:25:31 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15268-1665896804
apache   15349     1 43 Oct16 ?        10:22:20 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15348-1665896807
apache   15382     1 43 Oct16 ?        10:23:40 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15381-1665896828
apache   15398     1 43 Oct16 ?        10:25:13 /xymon/server/server/bin/xymongen --snapshot=2222867979 XYMONGENSNAPOPTS /xymon/server/server/www/snap/15397-1665896834
apache   15400     1 43 Oct16 ?        10:22:59 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15399-1665896837
apache   15757     1 43 Oct16 ?        10:24:48 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15756-1665896864
apache   15842     1 43 Oct16 ?        10:22:32 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15841-1665896873
apache   15964     1 43 Oct16 ?        10:24:21 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15963-1665896897
apache   15996     1 43 Oct16 ?        10:22:25 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/15995-1665896912
apache   16133     1 43 Oct16 ?        10:22:07 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16132-1665896933
apache   16149     1 43 Oct16 ?        10:23:37 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16148-1665896954
apache   16215     1 43 Oct16 ?        10:23:45 /xymon/server/server/bin/xymongen --reportopts=2222871640:2222958039:1: /xymon/server/server/www/rep/16214-1665896972
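
With this many near-identical processes, a quick tally shows whether they are distinct jobs or repeats of the same request. The sketch below searches the raw text for two things: each process's xymongen mode argument, and the `NNNNN-NNNNNNNNNN` suffix of its output directory. Because it searches rather than splitting lines, it works whether or not the listing is wrapped. It assumes the second number of that suffix is a Unix timestamp, based on its values; this is not documented behaviour. `summarise` is a hypothetical helper.

```python
import re
import sys
from collections import Counter

# Mode argument, e.g. --snapshot=2222867979 or --reportopts=2222871640:2222958039:1:
MODE_RE = re.compile(r"--(snapshot|reportopts)=(\S+)")
# Output directory suffix, e.g. /www/snap/14748-1665896723 (second number assumed epoch)
OUTDIR_RE = re.compile(r"/www/(?:snap|rep)/(\d+)-(\d+)")

def summarise(listing):
    """Hypothetical helper: tally (mode, argument) pairs and the suffix-epoch range."""
    modes = Counter(MODE_RE.findall(listing))
    epochs = [int(epoch) for _prefix, epoch in OUTDIR_RE.findall(listing)]
    span = (min(epochs), max(epochs)) if epochs else None
    return modes, span

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        modes, span = summarise(f.read())
    for (mode, arg), count in modes.most_common():
        print(f"{count:3d}  --{mode}={arg}")
    if span:
        print(f"output-dir suffixes span {span[1] - span[0]} seconds")
```

Given the listing above saved to a file, it reports:

* 10 processes sharing one identical `--reportopts` argument.
* 8 processes sharing one identical `--snapshot` argument.
* Directory suffixes spanning 249 seconds.

That pattern is consistent with a burst of repeated requests for the same two views over about four minutes.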


Here is an strace of the first PID (they all look the same). It is
repeatedly reading file descriptor 3:

[root@dcslmonitor 15238]# strace -f -p 14749
Process 14749 attached
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
read(3, "", 4096)                       = 0
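
For reference, `read(3, "", 4096) = 0` means the read hit end-of-file. On a regular file such a call returns immediately rather than blocking. Code that keeps calling read() without treating 0 as "done" therefore spins on the CPU. That would fit the 43-44% CPU and the ten-plus hours of CPU time in the ps listing, but it is an interpretation of the trace, not a diagnosis of xymongen itself. The Python sketch below uses a temporary 524-byte file, the size lsof reports further down for accessntg.sslcert. It shows the immediate empty reads and the usual loop that stops at EOF; `read_all` is a hypothetical helper.

```python
import os
import tempfile

def read_all(fd, bufsize=4096):
    """Hypothetical helper: read until EOF, where os.read() returns b"" (read(2) returned 0)."""
    chunks = []
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:          # 0 bytes means EOF: stop instead of retrying
            break
        chunks.append(chunk)
    return b"".join(chunks)

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 524)        # same size as the file in the lsof output
        os.lseek(fd, 0, os.SEEK_SET)
        print(len(read_all(fd)))        # 524
        # At EOF every further read returns b"" at once, like the strace lines.
        print(os.read(fd, 4096))        # b''
    finally:
        os.close(fd)
        os.unlink(path)
```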


Here is the lsof output for the same process (fd 3 is the last entry):

xymongen  14749  apache  cwd   DIR   253,0        6  134320195  /xymon/server/data/acks
xymongen  14749  apache  rtd   DIR     8,2      269         64  /
xymongen  14749  apache  txt   REG   253,0  1106256  135222190  /xymon/server/server/bin/xymongen
xymongen  14749  apache  mem   REG     8,6   155784    4448319  /usr/lib64/libselinux.so.1
xymongen  14749  apache  mem   REG     8,6   109976    4873245  /usr/lib64/libresolv-2.17.so
xymongen  14749  apache  mem   REG     8,6    15688    4259351  /usr/lib64/libkeyutils.so.1.5
xymongen  14749  apache  mem   REG     8,6    67104    4471490  /usr/lib64/libkrb5support.so.0.1
xymongen  14749  apache  mem   REG     8,6   142144    4873243  /usr/lib64/libpthread-2.17.so
xymongen  14749  apache  mem   REG     8,6    90632    4195838  /usr/lib64/libz.so.1.2.7
xymongen  14749  apache  mem   REG     8,6    19248    4358022  /usr/lib64/libdl-2.17.so
xymongen  14749  apache  mem   REG     8,6   210824    4471445  /usr/lib64/libk5crypto.so.3.1
xymongen  14749  apache  mem   REG     8,6    15920    4939663  /usr/lib64/libcom_err.so.2.1
xymongen  14749  apache  mem   REG     8,6   967840    4259800  /usr/lib64/libkrb5.so.3.3
xymongen  14749  apache  mem   REG     8,6   320400    4256684  /usr/lib64/libgssapi_krb5.so.2.2
xymongen  14749  apache  mem   REG     8,6  2156272    4262067  /usr/lib64/libc-2.17.so
xymongen  14749  apache  mem   REG     8,6   402384    4259730  /usr/lib64/libpcre.so.1.2.0
xymongen  14749  apache  mem   REG     8,6  2521008    4256674  /usr/lib64/libcrypto.so.1.0.2k
xymongen  14749  apache  mem   REG     8,6   470360    4195836  /usr/lib64/libssl.so.1.0.2k
xymongen  14749  apache  mem   REG     8,6   163312    4448246  /usr/lib64/ld-2.17.so
xymongen  14749  apache  0r    FIFO    0,8      0t0  404824379  pipe
xymongen  14749  apache  1w    FIFO    0,8      0t0  404824380  pipe
xymongen  14749  apache  2w    FIFO    0,8      0t0  404824381  pipe
xymongen  14749  apache  3r    REG   253,0      524   67195718  /xymon/server/data/hist/accessntg.sslcert
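
To see which processes hold a particular file open without running lsof for each PID, a Linux-only sketch can walk the `/proc/<pid>/fd` symlinks. It needs root to see processes owned by other users (here, apache) and silently skips any process it cannot read. `pids_with_file_open` is a hypothetical helper name.

```python
import os
import sys

def pids_with_file_open(target):
    """Hypothetical helper: (pid, fd) pairs whose /proc/<pid>/fd link points at target."""
    target = os.path.realpath(target)
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():          # only numeric entries are processes
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:                  # exited, or not permitted to look
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == target:
                    hits.append((int(entry), int(fd)))
            except OSError:              # fd closed while we were scanning
                continue
    return sorted(hits)

if __name__ == "__main__" and len(sys.argv) > 1:
    for pid, fd in pids_with_file_open(sys.argv[1]):
        print(pid, fd)
```

Passing `/xymon/server/data/hist/accessntg.sslcert` as the argument would print one `pid fd` pair per process that has that file open.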


Every process in the list above has the same file open as fd 3. Are they
locking each other out, or, more to the point, should they be?
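
On the locking question, lsof shows fd 3 as `3r`, i.e. opened read-only. On Linux, concurrent reads of a regular file do not block each other unless the program explicitly takes a lock. A process waiting for a lock would also typically appear in strace stuck inside a call such as `fcntl` or `flock`, not looping on `read`. `/proc/locks` shows whether any lock is held at all: each entry names the file as `MAJOR:MINOR:INODE`, with the device numbers in hex (see proc(5)). The sketch below assumes that layout. `locks_on` is a hypothetical helper, and 253,0 and 67195718 are the device and inode that lsof showed for accessntg.sslcert.

```python
import os
import re

# Assumed /proc/locks layout (proc(5)), e.g.
#   1: POSIX  ADVISORY  WRITE 14749 fd:00:67195718 0 EOF
#   1: -> POSIX  ADVISORY  WRITE 14867 fd:00:67195718 0 EOF   ("->" = waiting)
LOCK_RE = re.compile(
    r"^\s*\d+:\s+(?P<waiting>->\s+)?(?P<kind>\S+)\s+\S+\s+(?P<mode>\S+)\s+"
    r"(?P<pid>-?\d+)\s+(?P<major>[0-9a-fA-F]+):(?P<minor>[0-9a-fA-F]+):(?P<inode>\d+)\s"
)

def locks_on(text, major, minor, inode):
    """Hypothetical helper: return /proc/locks entries that refer to one file."""
    found = []
    for line in text.splitlines():
        m = LOCK_RE.match(line)
        if not m:
            continue
        # Device numbers are printed in hex, the inode in decimal.
        key = (int(m["major"], 16), int(m["minor"], 16), int(m["inode"]))
        if key == (major, minor, inode):
            found.append({"pid": int(m["pid"]), "kind": m["kind"],
                          "mode": m["mode"], "waiting": m["waiting"] is not None})
    return found

if __name__ == "__main__" and os.path.exists("/proc/locks"):
    with open("/proc/locks") as f:
        # 253,0 and 67195718: device and inode that lsof showed for fd 3.
        print(locks_on(f.read(), 253, 0, 67195718) or "no locks on that file")
```

An empty result would suggest the processes are not contending for a lock on that file. In that case, the spinning reads in the strace are the more likely place to look.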


Any ideas on where to look or what to do next?


Thanks


David Logan
Senior Systems Administrator
Data Centre Services
Department of Corporate and Digital Development | Northern Territory Government
GPO Box 2391, Darwin, NT 0801, Australia
