Last two weekends - nagios and more cfengine 2 & 3

So, what started as 'take a week to set up a new nagios server at work' ended up taking almost a month...because there were many days where I'd only have an hour or less to put into the side task. The other stumbling block was that I had decided the new nagios server's configuration files would be managed under subversion, instead of RCS as had been done in the previous two incarnations. New SAs don't seem to understand RCS and that the file is read-only for a reason...and it's not to make them use :w! ... which lately has resulted in the sudden reappearance of monitors for systems that had been shut down long ago.

Though now that I think of it, there used to be the documented procedure for editing zone files (back when it was done directly on the master nameserver and version controlled by RCS.) Which as I recall was to perform an rcsdiff, and then use the appropriate workflow to edit the zone file.

% rcsdiff zonefile

if differences

      % rcs -l zonefile
      % ci -l zonefile
        make rude comment that somebody made edits
      % vi zonefile
      % ci -u zonefile

otherwise (no differences)

      % co -l zonefile
      % vi zonefile
      % ci -u zonefile
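
The procedure above could be wrapped in a small shell function, roughly like this. It is only a sketch: the name edit_zone and the log message are made up for illustration, and it assumes the RCS tools are on the PATH and that rcsdiff exits 0 for no differences, 1 for differences and 2 for trouble.

```shell
#!/bin/sh
# Sketch of the zone file editing procedure (hypothetical helper name).
# Assumes rcsdiff, rcs, ci and co are installed and the file is under RCS.
edit_zone() {
    zonefile=$1

    # rcsdiff: 0 = matches last check-in, 1 = differs, 2 = trouble
    status=0
    rcsdiff -q "$zonefile" >/dev/null 2>&1 || status=$?

    case $status in
        0)  # clean: check out with a lock
            co -l "$zonefile" || return 1
            ;;
        1)  # somebody edited without a lock: lock, then record their edits
            rcs -l "$zonefile" || return 1
            ci -l -m"unlogged edits found by $(id -un)" "$zonefile" || return 1
            ;;
        *)  echo "rcsdiff failed on $zonefile" >&2
            return 2
            ;;
    esac

    # edit, then check in and leave a read-only working copy behind
    "${EDITOR:-vi}" "$zonefile"
    ci -u "$zonefile"
}
```

Usage would be something like `edit_zone zonefile` from the directory holding the zone file. The final ci -u still prompts for a log message, same as doing it by hand.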


But, when I took over managing DNS servers, I switched to having cfengine manage them and the zone files now live under masterfiles, so version control is now done using subversion. Had started butchering the DNS section in the wiki, probably should see about writing something up on all the not so simple things I've done to DNS since taking it over...like split, stealth, sed processing of master zone for different views, DNSSEC, the incomplete work to allow outside secondary to take over as master should we ever get a DR site, and other gotchas, like consistent naming of slave zone files now that they are binary.

Additionally, work on the nagios at work was hampered by the fact that Solaris and legacy provisioning are still on CF2, and the new Chef-based provisioning is still a work in progress...and I haven't had time to get into any of it yet. So, I had to recreate my CF3 promises for nagios in CF2.

But Friday before last weekend it finally reached the point where it was ready to go live. Though I've been rolling in other wishlist items and smashing bugs in its configuration, and still need to decide what the actual procedure will be for delegating sections of nagios to other groups.

One of the things I had done with the new nagios at work was set up PNP4Nagios...as I had done at home. And, while looking to see if I needed to apply performance tweaks to the work nagios, all the pointers were to have mrtg or cacti collect and plot data from nagiostats. Well, a new work cacti is probably not going to happen anytime soon, and the old cacti(s) are struggling to monitor what they have now. (I spent some time a while back trying to tune one of them...but it's probably partly hampered by the fact that its mysql can use double the memory that is allocated to the VM, though reducing it from running 2 spines of 200 threads each on the 2 CPU VM to a single spine with fewer threads has helped. Something like the boost plugin would probably help in this case, but the version of cacti is pre-PIA.) But, it could be a long time before it gets replaced (not sure if an upgrade is possible....) Our old cacti is running on a Dell PowerEdge server that has been out of service over 6 years... with the cacti instance over 8 years old (Jul 8, 2005)....and the OS is RHEL3.

Anyways, it occurs to me that there should be a way to get PNP4Nagios to generate the graphs, and I search around and find check_nagiostats. Though no template for it. Oh, there's a template nagiostats.php, if I create a link for check_nagiostats.php it should get me 'better' graphs. Which is what I have CF2 do at work.
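
Done by hand instead of through CF2, the link would look roughly like the sketch below. The function name is made up, and the template directory is passed in as an argument because its location depends on how PNP4Nagios was installed (the path in the usage note is only a guess).

```shell
#!/bin/sh
# Sketch: make check_nagiostats reuse the existing nagiostats.php template.
# link_pnp_template is a hypothetical helper; $1 is the template directory.
link_pnp_template() {
    dir=$1

    # refuse to create a dangling link if the stock template is missing
    if [ ! -f "$dir/nagiostats.php" ]; then
        echo "no nagiostats.php in $dir" >&2
        return 1
    fi

    # relative link, so it survives the directory being moved; -f makes
    # a re-run replace the old link instead of failing
    ln -sf nagiostats.php "$dir/check_nagiostats.php"
}
```

For example, `link_pnp_template /usr/local/pnp4nagios/share/templates.dist`, adjusting the path to wherever nagiostats.php actually lives.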

  08:32:00 pm, by The Dreamer   , 2306 words  
Categories: ReplayTV, Time Warner/Cox Cable, TiVo HD / Premiere / Elite, Cox HSI

Cox in Manhattan, KS has gone SDV

Sometime around mid-May, I think, I had gotten a plain business size envelope with just the Cox logo in the top left corner and my address on it. No other indication as to what the content was, so basically resembling similar letters pitching new services or such. Though those will probably start appearing again when the students return....

Not sure why, but I opened it...and eventually found it to be a letter informing me that Cox is about to go SDV and that I would need to obtain some 'free' tuning adapters (where the only 'free' part is no new monthly charge on my cable card service). I only needed 4 tuning adapters to go with the 6 cablecards I'm renting.....the two older TiVo HDs have 2 S-Cards each. I had thought about switching both to single M-Cards, but held off given my experience with a previous 'self-install' M-Card turning into a full-service install charge: the M-Card was defective....so a tech had to come out to make that determination and give me a non-defective one.

The letter said that I could visit a Cox Home Solutions store or call to have them delivered. After some thought, I decided that I would call and have them deliver the 4 tuning adapters, since I had no idea how big they were or when I would make it out that way. Memorial Day weekend was out, as was the first weekend of June for sure. And, the switchover was to be around June 25th.

So, I call and delivery is offered to me, which I confirm is what I want. The agent checks to make sure they do shipments to Manhattan, but then later on it's apparently not possible to deliver. But, after some further investigation by the agent, it seems like it might be possible, and eventually I get a call back saying that they have been shipped and I should get them around Friday (June 7th).

From past experience, items are shipped FedEx Home Delivery, so I had expected to find them sitting there waiting for me when I got home from work...but there wasn't anything. Perhaps Saturday instead....while I didn't intend to wait all the way to when they did show up.... cacti consumed more time than I had intended that Saturday...

The delivery was kind of annoying...in that the delivery person dropped the boxes loudly on my door step with a big whomp...and then it was like he tried to kick in my door before running off.... mainly because it set off my burglar alarm.... :##

Anyways, what I found were 4 huge boxes....big enough to contain DVRs or similar components. Perhaps it's the only size package they have for deliveries. But, after a quick unpack....the 4 still-packaged tuning adapters would fit into one of the boxes, minus the ineffective foam insert...because the insert was sized to protect something bigger than the small boxes that had come instead.

At first it looked like I wasn't getting the complete self-install kit...but later I found that the self-install box contained one coax, splitter and filter...while the tuning adapter box contained the other coax along with usb cable and power wart.

Though I had decided before getting these that I was going to go with a splitter closer to the wall and a new, longer cable run to the tuning adapter by each TiVo.... And, it was my intent for June 8th to be a Target run. It didn't pan out, and it's not something that is sold by the union computer store or the union bookstore...so I ended up ordering cables from Amazon.com.

And, then it was a matter of figuring out when I would get around to setting them up. A number of times where I could've done it went by, but it didn't occur to me that I should do it then. Since the operation would be disruptive to my cable hookup, needed to avoid primetime. And, needed to be sufficiently awake and steady too.

And, then I notice there's a $10 self-install charge on my Cox bill.



  10:06:00 pm, by The Dreamer   , 989 words  
Categories: Hardware, Software, Computer, Ubuntu, FreeBSD, CFEngine

Took a diversion from cacti and now it's nagios

So, doing cacti on cbox doesn't seem to be working long term... but, the moment is being prepared for....I'm starting to assemble the pieces to build a new machine to do this and handle some other tasks that I've been looking for a place for.

Back to cfengine, I added a promise for dnetc (distributed.net)....and then a promise to finally configure CUPS on the two servers. And, then I turned to nagios.

I spent a couple of evenings creating the initial configuration of nagios, working in design changes that I wanted to make and initial monitoring of localhost (dbox). Though it wasn't straightforward....there were differences here and there....mostly in FreeBSD layout, paths, and some of the commands taking different options. But, eventually I got everything running. My old check_dyndns worked once, but then stopped working.... problem was that it did 'stat -c "%Y" ...' which doesn't work on FreeBSD, 'stat -f "%m" ...' was the adjustment for that. Also, while all the check_* plugins seem to be there, command definitions were lacking....but I guess having command definitions for everything is part of the debian/ubuntu packaging. There were other frills that came with that, that I don't mind not having...
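
A check that has to run on both ubuntu and FreeBSD could try the GNU form first and fall back to the BSD form, along these lines. This is a sketch, not what check_dyndns actually does; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: portable "modification time in epoch seconds" helper.
# GNU coreutils stat (ubuntu) takes -c '%Y'; FreeBSD stat takes -f '%m'.
file_mtime() {
    [ -e "$1" ] || return 1
    stat -c '%Y' "$1" 2>/dev/null || stat -f '%m' "$1"
}

# Seconds since the file was last modified, which is the sort of number
# a freshness check would compare against its thresholds.
file_age_seconds() {
    mtime=$(file_mtime "$1") || return 1
    now=$(date +%s)
    echo $((now - mtime))
}
```

The order matters: on GNU stat, -f means "report on the filesystem", so trying the BSD form first on ubuntu would treat '%m' as a file name.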

I did run into check_ntp being deprecated....with check_ntp_time and check_ntp_peer being the tests to use....separating and making it clearer whether you're comparing time between servers using ntp or checking the state of the ntp server...
It did show some interesting oddities in holding NTP time on my home network.... I know that I should have 3 or more ntp servers, but it seems that I'm often landing in the state where I only have 2....with lots of delay, resulting in pretty good swings of jitter....almost makes me wonder if this is something I could graph in cacti.... :hmm:

Wonder if I can find a cheap NTP appliance somewhere....

The last stumbling block was check_dhcp, which seems to be broken on FreeBSD. All the discussion on it seemed to point to firewalls, but with no firewalls it still didn't work....tcpdump in both places, and it's saying it's sending stuff, but no packets are appearing on the network. But, I can see the other DHCP traffic on the network.

I remove that check and call it a night. I mull some possible workarounds....the first one I tried was setting up linux compatibility and running the check_dhcp from my working (ubuntu) nagios. Well, it didn't work...it couldn't find an interface. Oh well, guess there's the ugly way....use nrpe to invoke it. Though that didn't work right away.....probably because while I had created new nrpe configs for all my servers in cfengine, I haven't put any of my ubuntu servers under cfengine yet. Most of the other promises haven't been implemented for ubuntu yet. It was pretty simple to include nrpe.cfg for everything.... in fact it condensed to only 3 files.... a freebsd version, an ubuntu version and a host specific version for orac. Well, not right away...that happened more recently...while I was going through and updating the nrpe.cfg's by hand on the ubuntu servers. That was when I noticed that some of the files were only different in comments....so I made further simplifications in cfengine...which'll propagate out eventually....
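
Spotting copies of nrpe.cfg that only differ in comments can be scripted rather than eyeballed. A minimal sketch, assuming comments are whole lines starting with '#' (the function names are hypothetical):

```shell
#!/bin/sh
# Sketch: decide whether two config files differ only in comments/blank lines.

# Print a config file with full-line comments and blank lines removed.
strip_comments() {
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}

# Succeeds (exit 0) when both files carry the same effective settings.
same_config() {
    a=$(strip_comments "$1") || return 2
    b=$(strip_comments "$2") || return 2
    [ "$a" = "$b" ]
}
```

Something like `same_config nrpe.cfg.host1 nrpe.cfg.host2 && echo "can share one file"` would then flag candidates for collapsing into a single copy. The comparison is order-sensitive, so two files with the same commands in a different order still count as different.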

Long term, I'll probably just have to track down some alternate implementation of check_dhcp....

I then add cbox to monitoring...and then looked to see about monitoring things that are on cbox/dbox...so I found checks for freeradius, cups, squid, along with improvements to checks on ntp. The check_squid was tricky....I got it working by hand, after making the suggested change for the default Cache type parsing, which turned out to be changes for squid3 vs. squid2 (but box is still running squid 2.7, since I had re-built it by hand with SSL support and blocked ubuntu from updating it. Orac wasn't blocked, so it eventually turned into squid3.)

It worked by hand, but wouldn't work under nagios...turned out that the embedded perl wasn't liking it. I was going to disable embedded perl for it, when I took a look at what it was complaining about. And, did some reading on embedded perl.... the gist was "use strict", "perl -w" and "perl -c" as starting points. perl -w was fine, but perl -c had one problem....which I fixed. But, no go. And, then I noticed the line "# todo : use strict", guess I'll have to deal with that.

And, making that all happy, got it working.

The only other quirk was that the memory check wouldn't work on FreeBSD, I guess there's no mallinfo() available for that. So, no running that test on those servers....plus no Cache test on box. But, it still left enough variety of tests that worked on all. And, it wasn't so much that I wanted to get all the information, but I chose to define all the different tests with ports set into the test....so running the check would also test that all my squid ports worked. There's actually only two that matter, but I have all my squids configured the same, listening on 5 or 7 ports....depending on whether I have SSL enabled. Though I pretty much only need two now. I'm not doing transparent proxying and I don't need the SSL now that I've split box into dbox/cbox....the SSL was so ddclient could work on box and update dyndns via proxy to DSL....

Next up is adding zen to nagios, and coming up with more tests of things that are specific to zen, whether covered or not covered in the old nagios.

Though as I worked along...there were things I couldn't find monitors for...though I realized that I could have cfengine promise that those services were running. Plus cfengine was also taking care of other things. So, I should probably work on writing some promises for zen. So, I can have promises to make sure things are started up again after a port is updated or that php/extensions.ini is reordered, etc.

But, I'll probably continue adding everything else to nagios first.


  10:52:00 pm, by The Dreamer   , 2431 words  
Categories: Software, Computer, Ubuntu, FreeBSD, CFEngine

Home server migration ran into some cacti

The home server migration that I wrote about on April 7th, hit a delay .... I started working on migrating cacti and nagios.

I probably should've started with nagios, since I don't think that would've taken as long as cacti has.

I had already been monitoring the new servers using my old cacti installation. I had pretty much decided that moving the old installation to the new servers wasn't going to be straightforward.... partly because of versions, and no easy intermediary. But, I wasn't too worried about the historical data in my old cacti....

I figured that once I got things up and running, I'd just export the templates and import them into my new system and I'd be done.

But, then I hit a hitch....the squid templates I had weren't working on the new system....all I could find were old results about issues with doing SNMP to ports other than 161, possibly due to newer versions of net-snmp....though that later turned out to be a wild goose chase.

Anyways...the work around was to use the proxy option in net-snmp. Though I recall having tried net-snmp before discovering bsnmpd on FreeBSD, but I gave it a shot.

Before I got to testing the proxy...I soon saw that it wasn't giving the same information as bsnmpd...specifically, for the HOST-RESOURCES-MIB and parts of UCD-SNMP-MIB. So, I decided that I could proxy net-snmp to bsnmpd and get those. But, that didn't work.....after some reading, the answer was that I needed to either map bsnmpd in somewhere else or exclude those areas from net-snmp.

Well, during the build of net-snmp, it did make reference to being able to set some variables in make.conf -- such as NET_SNMP_WITH_MIB_MODULE_LIST and NET_SNMP_WITHOUT_MIB_MODULE_LIST. And, by default NET_SNMP_WITH_MIB_MODULE_LIST contained "host disman/event-mib smux mibII/mta_sendmail mibII/tcpTable ucd-snmp/diskio sctp-mib if-mib"

So, I tried setting NET_SNMP_WITH_MIB_MODULE_LIST without host and ucd-snmp/diskio and tried to exclude the rest of ucd-snmp in NET_SNMP_WITHOUT_MIB_MODULE_LIST. Which got me a strange error about host being in both lists.

I delved into the Makefile, and found that while the other settable NET_SNMP parameters were done as '?=' in the Makefile, NET_SNMP_WITH_MIB_MODULE_LIST was done as '+='...with conditionals that '+=' the last two modules.

OSVERSION >= 700028 adds 'sctp-mib' and the port option MFD_REWRITES adds 'if-mib'....I had started looking at what the fix might be, but decided that all I needed to do was remove all these lines...since I'm going to have my own definition in my /etc/make.conf file.

Trying to exclude all of ucd-snmp wouldn't make things work....but I did an snmpwalk comparing bsnmpd and net-snmp, and decided that the two areas that were lacking were ucd-snmp/diskio and ucd-snmp/disk_hw. So, I recreated the 'original' NET_SNMP_WITH_MIB_MODULE_LIST in /etc/make.conf, without 'host' and 'ucd-snmp/diskio', and put 'ucd-snmp/disk_hw' in NET_SNMP_WITHOUT_MIB_MODULE_LIST. The build grumbled, but finished.
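
The snmpwalk comparison can be done by saving a walk from each agent and listing the OIDs that only one of them answers. The sketch below assumes numeric output (snmpwalk -On) with one "OID = value" pair per line; the host, ports and community string in the comments are placeholders, and the function names are made up.

```shell
#!/bin/sh
# Sketch: list OIDs that one agent answers and the other does not.
# Inputs are saved walks, e.g. (placeholders throughout):
#   snmpwalk -v2c -c public -On localhost:161  .1.3.6.1.4.1.2021 > bsnmpd.walk
#   snmpwalk -v2c -c public -On localhost:1161 .1.3.6.1.4.1.2021 > netsnmp.walk

# Print just the OID column of a saved walk, sorted for comm(1).
# Multi-line values (long hex strings) are not handled here.
walk_oids() {
    sed -e 's/ = .*$//' "$1" | sort -u
}

# Print OIDs present in the first walk but missing from the second.
missing_oids() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    walk_oids "$1" > "$tmp_a"
    walk_oids "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Running `missing_oids bsnmpd.walk netsnmp.walk` then shows which subtrees bsnmpd serves that net-snmp does not, which is the kind of list that points at individual modules like diskio.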

And that worked.....all my ucd/snmp host graphs were working on my new cacti server in the same detail that I was getting before (i.e., the CPU Utilization gave traces for each of the 8 vCPUs...instead of just one.... I could see all the ZFS filesystems, not just the single zroot).

So, I went back to looking at getting squid graphs to work....that didn't work.

  11:43:05 am, by The Dreamer   , 804 words  
Categories: Stuff, Software, Computer, VoIP, BOINC

Death to Skype! Bring me its head!

Link: http://jeff-duntemann.livejournal.com/281642.html

Years ago, I had installed Skype on various computers thinking that I'd jump into the whole using my computer for VoIP calling, either as a supplement to doing VoIP using ATAs or as their successor. It revived my buying webcams and/or headsets for my newer computers. I bought a webcam for my dad. But, he never used it.

And, I've never really used any of the many new webcams I keep buying either. But, recently I've met some friends online that are into chatting using Skype. I did do it once, and it was a strange experience actually talking to people....rather than the usual type into a window or web form chatting that I'm accustomed to for online chat. But, they weren't into the webcam feature of Skype, so I don't know if I'll ever get to use that part for more than just as the microphone. (Which I might not do, since I bought a headset for one of my computers...)

Meanwhile, yesterday on Gumby....the computer I installed Skype and a(n HP) webcam onto, because ole TARDIS wasn't feeling up to getting new hardware (the plastic tabs in some of its USB ports have broken off, though they kind of still work sometimes....not as bad as the old Gumby where the front USB ports didn't work at all anymore....And, I was using a PCCard USB adapter to help...)

Anyways, yesterday I recall seeing this EasyBits Games or something dialog come up, when I was checking some stuff on the machine. It had been up for over 16 days (reached a max of 16.8 this time, TARDIS is currently at 35.52 days), but it seemed confused. A java app said it was out of memory, though there was plenty on the machine (over 1900MB free)...so I tweaked its JVM settings. Though later I discovered that PRTG had stopped running at some point. Someday I'll install Nagios, or something, and replace the aspect of PRTG that I use. I had switched to running it, because bello-monitors-the.net ceased operation a few months back...and I was using it to monitor the performance and availability of this blog. At first PRTG seemed the solution to doing this...but it's more annoying than 'just works', so I'll probably look at some other online service...

Anyways, suddenly with no explanation my screen went dark and the lights went out on Gumby...a moment later it was booting back up. Nothing in the event log to explain why it had gone down. I came back later to see the finished boot, and dismiss the various dialogs that come up during a boot and see why PRTG was complaining about something I was sure I had told it to stop monitoring (LHAVEN....I haven't finished that post yet). That's when I saw the huge gap in data, and it had apparently stopped working sometime yesterday morning (this unexplained reboot was around 5:27pm).

I vaguely recall seeing a strange dialog from Skype in reference to EasyBits Go games, seemed odd that Skype wanted me to install something like this. And, I have Skype in offline state on this computer...since I had upgraded to Zen...and Skype is online there. But, I dismissed it. I wasn't interested in it. But, later I saw that it had installed itself anyways. :##

Then a strange Flash Player install dialog, not the usual one from Adobe that comes up now and then. But, it seemed to do the normal Adobe Flash Player install...except it was the kind that tried to install crapware with it, where you have to figure out the checkbox to uncheck to not get it before downloading...while needing to check other boxes for it to proceed.

Guess I should've read it, because it was yet another part of the Crapware infestation that was coming to my computer via Skype/Microsoft.

Did a quick google search this morning, and found out that it really is crapware, because the uninstall doesn't do anything...have to do more complicated stuff to remove it.


Found the steps to eradicate it at: Jeff Duntemann's ContraPositive Diary - EasyBits GO, Skype, and The Crapware Problem


Just firing off this quick blog post and then to reboot.... :cool:

Wonder if the Free AVG will catch it later, or if I should install something like Norton Internet Security (I seem to have gotten a 3-seat license, but I'm only using it on one computer at the moment....I do need to upgrade the version of Trend that is running on TARDIS, but getting NIS working and not working on Zen has been quite the hassle...so maybe I'm still looking, at least its SONAR isn't going after BOINC like NAV used to on TARDIS....wouldn't be so bad, if it didn't start doing that just after I extended it for another year.)

Wonder if I shouldn't just remove Skype while I'm at it.... :hmm:

