xymongen crashes in 4.3.29

Japheth Cleaver
Thu, 22 Aug 2019 14:11:59 -0700
Message-Id: <user-d0c10d73038e@xymon.invalid>

Hi,

I think this might be xymongen in report mode from the "dailyreport"
file in /tasks.d/; the timing would check out. I believe the problem
here is one of the Terabithia patches now doing the wrong thing after 
some of the string-handling changes in 4.3.29 -- causing core dumps in 
certain situations.
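
One way to sanity-check the timing is to decode the two epoch values in the --reportopts argument from the core's command line quoted below. This is a minimal Python sketch, not part of Xymon: the helper name decode_reportopts is hypothetical, and the field layout (start:end:...) is assumed from that command line rather than taken from the xymongen documentation. Times are shown in UTC because the server's timezone is not stated in this thread.

```python
from datetime import datetime, timezone

def decode_reportopts(value):
    """Decode the first two colon-separated fields as epoch seconds.

    Assumption: those two fields are the report's start and end times.
    """
    start_s, end_s = value.split(":")[:2]
    start = datetime.fromtimestamp(int(start_s), tz=timezone.utc)
    end = datetime.fromtimestamp(int(end_s), tz=timezone.utc)
    return start, end

start, end = decode_reportopts("1566187200:1566273599:0:nongr")
print(start.isoformat())  # 2019-08-19T04:00:00+00:00
print(end.isoformat())    # 2019-08-20T03:59:59+00:00
print((end - start).total_seconds() + 1)  # 86400.0, i.e. exactly one day
```

The window is exactly one day long, which would be consistent with a once-a-day report run rather than the every-minute page generation.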

If you're running actual RHEL7 on this (not CentOS, which hasn't
released 7.7 yet), would you mind trying the xymon-4.3.30-0.5 package
from the EL7 Terabithia testing repo to see whether it helps?
https://repo.terabithia.org/rpms/xymon/testing/el7/x86_64/

Regards,
-jc

On 8/22/2019 11:34 AM, Matt Vander Werf wrote:
Hi Torsten,

No, there wasn't anything running from cron or anything else around 
that time, let alone anything that restarts the network or Xymon.

Thanks.

--
Matt Vander Werf


On Wed, Aug 21, 2019 at 5:43 AM Torsten Richter
<user-c862b499d9fa@xymon.invalid> wrote:

    Hi Matt,

    dumb question: is there any cron job running at this time that is
    restarting Xymon or fiddling with the network, like restarting the
    network for some reason?

    Regards,
    Torsten
    On 20 August 2019 at 17:10, Matt Vander Werf
    <user-dfc3cf2ca434@xymon.invalid> wrote:

    Hi all,

    Every day since we updated our Xymon server to 4.3.29 (from
    4.3.28), I've gotten an e-mail alert due to xymond turning red
    that reads:

    red xymongen program crashed

    Fatal signal caught!

    The strange thing is that this has happened at 1:04 AM every
    day...like clockwork. I have xymongen set to run every 1 minute
    and it has no problems running any other time of the day. We are
    using the Terabithia RPMs and the Xymon server is running RHEL 7.

    I've scoured the system to find anything that is set to run
    at/around that time via cron, etc. and haven't found anything.
    The system logs don't show anything is happening around that time
    either.

    I turned on debug logging for xymond and xymongen and haven't
    been able to find anything unusual in either log around that
    time. But xymongen is dumping a core file every time it
    crashes.

    I used gdb to get the backtrace from all of the core files (so far)
    and they all show the same thing. The same host appears in each
    backtrace too (although I'm fairly confident the problem isn't
    specific to that host; it's probably just the first host xymongen
    has trouble with while processing).

    I've included an example gdb output below (the most recent one) [1].

    Is anyone else running into this by chance? Or any idea what
    might be the cause?

    Thanks!


    [1]
    # gdb -q /usr/libexec/xymon/xymongen core.16327
    Reading symbols from /usr/libexec/xymon/xymongen...Reading symbols from /usr/lib/debug/usr/libexec/xymon/xymongen.debug...done.
    done.
    [New LWP 16327]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    Core was generated by `/usr/libexec/xymon/xymongen --reportopts=1566187200:1566273599:0:nongr --recent'.
    Program terminated with signal 6, Aborted.
    #0  0x00007f4657c49377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
    55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
    (gdb) bt
    #0  0x00007f4657c49377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
    #1  0x00007f4657c4aa68 in __GI_abort () at abort.c:90
    #2  0x00005589375dd455 in sigsegv_handler (signum=<optimized out>) at sig.c:57
    #3  <signal handler called>
    #4  strchrnul () at ../sysdeps/x86_64/strchrnul.S:33
    #5  0x00007f4657c5b681 in __find_specmb (format=0xfce <Address 0xfce out of bounds>) at printf-parse.h:109
    #6  _IO_vfprintf_internal (s=s@entry=0x7ffd5dabcc00,
        format=format@entry=0xfce <Address 0xfce out of bounds>, ap=ap@entry=0x7ffd5dabcd38) at vfprintf.c:1308
    #7  0x00007f4657d28c78 in ___vsprintf_chk (s=0x7ffd5dabcf82 "", flags=1, slen=18446744073709551615,
        format=0xfce <Address 0xfce out of bounds>, args=args@entry=0x7ffd5dabcd38) at vsprintf_chk.c:83
    #8  0x00007f4657d28bcd in ___sprintf_chk (s=<optimized out>, flags=flags@entry=1,
        slen=slen@entry=18446744073709551615, format=<optimized out>) at sprintf_chk.c:32
    #9  0x00005589375ce8ca in sprintf (__fmt=<optimized out>, __s=<optimized out>)
        at /usr/include/bits/stdio2.h:33
    #10 parse_histlogfile (starttime=1566187200,
        timespec=0x558937840f50 <timespec.7157> "Wed_Sep_2_19:34:55_2015", servicename=0x5589383b6d70 "procs",
        hostname=0x558938a335d0 "<client hostname>") at availability.c:174
    #11 parse_historyfile (fd=fd@entry=0x558938a3aea0, repinfo=<optimized out>,
        hostname=0x558938a335d0 "<client hostname>", servicename=0x5589383b6d70 "procs",
        fromtime=<optimized out>, totime=1566273599, for_history=for_history@entry=0, warnlevel=97,
        greenlevel=99.995000000000005, warnstops=-1, reporttime=0x0) at availability.c:475
    #12 0x00005589375c38cc in init_state (filename=<optimized out>,
        filename@entry=0x7ffd5dacf210 "<client hostname>.procs", log=log@entry=0x7ffd5dacf120)
        at loaddata.c:275
    #13 0x00005589375c45ee in load_state (sumhead=sumhead@entry=0x558937809d48 <dispsums>) at loaddata.c:626
    #14 0x00005589375be6f4 in main (argc=5, argv=0x7ffd5dad4418) at xymongen.c:599


    -- 
    Matt Vander Werf