
Last two weekends - nagios and more cfengine 2 & 3


01:14:00 pm, by The Dreamer, 3144 words
Categories: Software, Operating Systems, Ubuntu, FreeBSD, CFEngine


So, what started as a week-long task of setting up a new nagios server at work ended up taking almost a month, because there were many days where I'd only have an hour or less to put into the side task. The other stumbling block was that I had decided the new nagios server's configuration files would be managed under subversion, instead of RCS as in the previous two incarnations. New SAs don't seem to understand RCS, or that the file is read-only for a reason...and that reason is not to make them use :w! ...which lately has resulted in the sudden reappearance of monitors for systems that were shut down long ago.

Though now that I think of it, there used to be a documented procedure for editing zone files (back when they were edited directly on the master nameserver and version controlled by RCS). As I recall, it was to run rcsdiff first, and then use the appropriate workflow to edit the zone file:

% rcsdiff zonefile

if there are differences (somebody edited the file without checking it out):

      % rcs -l zonefile
      % ci -l zonefile
        (log message: make rude comment that somebody made edits)
      % vi zonefile
      % ci -u zonefile

otherwise:

      % co -l zonefile
      % vi zonefile
      % ci -u zonefile
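
The choice between the two workflows comes down to the exit status of rcsdiff, which per its man page is 0 when the working file matches the head revision, 1 when it differs, and 2 for trouble. A minimal Python sketch of that decision follows; the function names are hypothetical, and it only returns the command sequence rather than running vi for you.

```python
from __future__ import annotations

import shlex
import subprocess


def rcsdiff_status(path: str) -> int:
    """Run rcsdiff on `path` and return its exit status (requires RCS)."""
    result = subprocess.run(["rcsdiff", path],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def zone_edit_plan(path: str, status: int) -> list[str]:
    """Pick the RCS command sequence for editing `path` from rcsdiff's status."""
    f = shlex.quote(path)
    if status == 0:
        # Working file matches the head revision: lock on checkout, edit, check in.
        return [f"co -l {f}", f"vi {f}", f"ci -u {f}"]
    if status == 1:
        # Someone edited without checking out: lock without overwriting their
        # changes, check those in (with a suitably rude log message), then edit.
        return [f"rcs -l {f}", f"ci -l {f}", f"vi {f}", f"ci -u {f}"]
    # Anything else (2) is rcsdiff reporting trouble, e.g. no RCS file.
    raise RuntimeError(f"rcsdiff reported trouble for {path} (exit {status})")
```

On the master nameserver, something like `zone_edit_plan("zonefile", rcsdiff_status("zonefile"))` would give the steps to follow.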


But when I took over managing the DNS servers, I switched to having cfengine manage them; the zone files now live under masterfiles, so version control is done with subversion. I had started butchering the DNS section in the wiki, and probably should see about writing something up on all the not-so-simple things I've done to DNS since taking it over...like split DNS, stealth servers, sed processing of the master zone for different views, DNSSEC, the incomplete work to allow an outside secondary to take over as master should we ever get a DR site, and other gotchas, like consistent naming of slave zone files now that they are binary.

Additionally, work on the nagios server at work was hampered by the fact that provisioning for Solaris and legacy systems is done with CF2, while the new chef-based provisioning is still a work in progress...and I haven't had time to get into any of it yet. So, I had to recreate my CF3 promises for nagios in CF2.

But on the Friday before last weekend, it finally reached the point where it was ready to go live. Since then I've been rolling in other wishlist items and smashing bugs in its configuration, and I still need to decide what the actual procedure will be for delegating sections of nagios to other groups.

One of the things I had done with the new nagios at work was set up PNP4Nagios...as I had done at home. And while looking to see if I needed to apply performance tweaks to the work nagios, I found that all the pointers were to have mrtg or cacti collect and plot data from nagiostats. Well, a new work cacti is probably not going to happen anytime soon, and the old cacti(s) are struggling to monitor what they have now. (I spent some time a while back trying to tune one of them...but it's probably partly hampered by the fact that its mysql can use double the memory allocated to the VM. Reducing it from 2 spines of 200 threads each on the 2-CPU VM to a single spine with fewer threads has helped, though. Something like the boost plugin would probably help in this case, but the version of cacti is pre-PIA.) And it could be a long time before it gets replaced (not sure if an upgrade is possible...). Our old cacti is running on a Dell PowerEdge server that has been out of support for over 6 years...with the cacti instance over 8 years old (Jul 8, 2005)...and the OS is RHEL3.

Anyway, it occurs to me that there should be a way to get PNP4Nagios to generate the graphs, so I search around and find check_nagiostats. There's no template for it, though. Oh, but there is a nagiostats.php template; if I create a check_nagiostats.php link pointing to it, that should get me 'better' graphs. Which is what I have CF2 do at work.
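
PNP4Nagios picks a graph template by the name of the check command, which is why the link has to be called check_nagiostats.php. A minimal Python sketch of that link, written to be idempotent the way a cfengine link promise is; the template directory is passed in as an assumption, so point it at wherever nagiostats.php lives on your install.

```python
import os
from pathlib import Path


def link_template(template_dir: Path,
                  existing: str = "nagiostats.php",
                  alias: str = "check_nagiostats.php") -> Path:
    """Ensure template_dir/alias is a relative symlink to template_dir/existing."""
    target = template_dir / existing
    link = template_dir / alias
    if not target.is_file():
        raise FileNotFoundError(f"no template {target}")
    if link.is_symlink():
        if os.readlink(link) == existing:
            return link          # already correct, nothing to do
        link.unlink()            # points somewhere else: replace it
    elif link.exists():
        # A real file with that name is somebody's custom template; leave it be.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(existing)    # relative link, so the directory can move
    return link
```

Running it a second time changes nothing, which is the property you want from something run on every agent pass.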

Now, last weekend's project was to finally start bringing my Ubuntu servers under CF3 control, because they run unattended-upgrades configured to automatically do more than just security updates. And, for the second time since my newer nagios setup, nagios-plugins has been updated on Ubuntu, and the update removes the setuid bit that nagios needs to do SMART checks of the drives. The obvious solution is to have CF3 promise that the setuid bit is set on that file.
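
What such a promise converges on can be sketched in Python: look at the mode, and only chmod when the setuid bit is missing, so repeated runs are no-ops. The path in the usage line is an assumption (Ubuntu's nagios-plugins packages install under /usr/lib/nagios/plugins, and the SMART check is assumed to be check_ide_smart), and it has to run as root to matter, since that file is owned by root.

```python
import os
import stat


def ensure_setuid(path: str) -> bool:
    """Make sure the setuid bit is set on `path`.

    Returns True if the mode had to be repaired, False if it was already kept.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_ISUID:
        return False
    # Keep the existing permission bits and add setuid on top of them.
    os.chmod(path, stat.S_IMODE(mode) | stat.S_ISUID)
    return True
```

For example, `ensure_setuid("/usr/lib/nagios/plugins/check_ide_smart")` after each unattended upgrade (assumed path).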

So, I start the weekend off by doing more tweaks to nagios at work (from home)...then I decide the first thing I should do is bootstrap the Ubuntu server I want into my cfengine3 system. Except that I've never been able to get bootstrapping to work; the online guide I had followed for my initial configuration involved manually copying keys and failsafe.cf/update.cf between hosts. (At work, the equivalent files have been rolled into our install package, which does some key setup work in its post-install...though we then have to edit the config files, because the name of our policyhost changed after the package was created, and nobody has thought to reroll the packages in the last 3 years...there's probably a blocker on the updated build box that would prevent repeating the extreme efforts the admin had gone through to make completely static executables on Solaris, despite warnings that it's not kosher. It violates the ABI...though so far we've been lucky.)

I decide to find out why bootstrapping isn't working...so I spend some time searching for the errors and trying to figure out what it's complaining about. The error is generic, so tracking down the problem is hard. Eventually, I find that the ACLs in cf-serverd.cf are the problem. I had created this file based on the example provided on the site...so it starts with this:

body server control {
        allowallconnects        => { "", "" };
        allowconnects           => { "", "" };
        trustkeysfrom           => { "", "" };

Whereas the correct form is:

body server control {
        allowallconnects        => { "", "192.168.0.*" };
        allowconnects           => { "", "192.168.0.*" };
        trustkeysfrom           => { "", "192.168.0.*" };

Since the strings are regexes...the '.'s should probably be escaped, but the regex is specific enough that nothing else should match. Though I don't use as my home network, what I do use is also specific enough. Conceivably, somebody else could use a subnet where a regex like this would be too permissive.
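
A quick Python check shows how an unescaped pattern can be too permissive. It assumes, as above, that these entries are treated as regular expressions, and additionally that they must match the whole address (modelled here with re.fullmatch). The 10.1.1.* subnet is a made-up example of where the loose form goes wrong: after "10.1.1", the trailing ".*" accepts anything.

```python
import re


def acl_allows(pattern: str, addr: str) -> bool:
    """True if `pattern` matches the whole of `addr` (anchored match assumed)."""
    return re.fullmatch(pattern, addr) is not None


LOOSE = "10.1.1.*"               # dots unescaped: '.' matches any character
STRICT = r"10\.1\.1\.\d{1,3}"    # literal dots, last octet digits only

for addr in ["10.1.1.7", "10.1.10.7", "10.1.123.4"]:
    print(f"{addr:12} loose={acl_allows(LOOSE, addr)} strict={acl_allows(STRICT, addr)}")
```

Only the first address is in 10.1.1.0/24, yet the loose pattern admits all three, while the escaped one admits only the first. With 192.168.0.* the looseness happens not to matter for real addresses, since a dotted-quad third octet that starts with 0 can only be 0.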

Of course, further down in 'bundle server access_rules', the use of "" is correct. Which, now that I think of it, is the same kind of thing I had to do when allowing new VLANs in CF2, where the regex often allows more than the access rule does. The firewall (ipf & fwsm), though, matches the access rule. I don't have a firewall on my policyhost at home...well, I do, but it's only there to do policy-based routing.

What I find next is that bootstrap has different ideas about what it should do than how I'm doing things. For some reason, it copies my entire masterfiles directory on the policy host into its inputs directory, along with my inputs. I thought my inputs would purge the extra files when they take over later, but evidently they don't...so I clean up inputs by hand. And then I realize I should probably have gone through everything to make sure that promises that weren't ready to apply on Ubuntu don't get done.
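
For comparison, a purging copy (cfengine 3's copy_from body has a purge attribute for this) deletes anything in the destination that has no counterpart in the source. A dry-run Python sketch, with hypothetical directory arguments, that only lists what such a purge would remove from inputs:

```python
from __future__ import annotations

from pathlib import Path


def purge_candidates(source: Path, dest: Path) -> list[Path]:
    """Files under `dest` with no counterpart under `source`, as relative paths.

    These are what a purging copy would delete; nothing is removed here.
    """
    wanted = {p.relative_to(source) for p in source.rglob("*") if p.is_file()}
    return sorted(p.relative_to(dest) for p in dest.rglob("*")
                  if p.is_file() and p.relative_to(dest) not in wanted)
```

Reviewing that list before deleting anything is safer than cleaning up inputs purely by eye.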

Many of the promises are controlled by classes generated from hostnames, though I had written some promises without such a limiter because they would apply to all hosts. Examples are my ssh, cups, and ntp promises.

The ntp promise already supported Ubuntu, while the other two didn't. The cups promise didn't cause any errors, so I didn't notice until later that it wasn't working. I had foolishly done this:



        "cups"  expression => fileexists("$(cupsd)");


I didn't spot the problem until yesterday....

But first, ssh is broken, because none of the ssh2 portion had been implemented for Ubuntu. The non-ssh2 part worked, though remote ssh into the box is lost now.

The weekend before, I had realized that since I run separate sshds on systems that need outside ssh access (where the outside-facing sshd has at least "PermitRootLogin no" set), I should be doing the same with my servers. So I had updated the config files to do this on all servers, but hadn't implemented creating the second sshd on Ubuntu. So, I quickly figure out how to do that...after first figuring out how to create a new /etc/init.d/ssh2, and then a new /etc/init/ssh2.conf. That was pretty easy, having already done something similar on FreeBSD.
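
Deriving the outside-facing sshd's config from the main one can be sketched in Python. The override values (port 2222, the sshd2 pid file) are hypothetical. Each overridden keyword is dropped from the copied lines: Port has to be, because sshd treats every Port line as another port to listen on, and PermitRootLogin is dropped from Match blocks too so nothing re-enables root logins. The overrides go at the top, since anything appended after a Match block would be scoped to it.

```python
from __future__ import annotations

import re

# Hypothetical settings for the outside-facing sshd.
OVERRIDES = {
    "Port": "2222",
    "PidFile": "/var/run/sshd2.pid",
    "PermitRootLogin": "no",
}


def derive_sshd2_config(main_config: str,
                        overrides: dict[str, str] = OVERRIDES) -> str:
    """Build the second sshd's config text from the main sshd_config text."""
    keys = {k.lower() for k in overrides}  # sshd keywords are case-insensitive
    kept = []
    for line in main_config.splitlines():
        # Keyword is the first token; sshd accepts "Key value" and "Key=value".
        keyword = re.split(r"[\s=]", line.strip(), maxsplit=1)[0].lower()
        if keyword in keys:
            continue  # drop it everywhere, Match blocks included
        kept.append(line)
    # Put the overrides first so they land before any Match block.
    header = [f"{k} {v}" for k, v in overrides.items()]
    return "\n".join(header + kept) + "\n"
```

Commented-out lines such as "#Port 22" are kept as-is, since their first token starts with '#'.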

With my Ubuntu server settled down again...it's only getting the promises that apply to any host right now...oh, which also includes my nrpe promise, where I promise that certain check_* files have the setuid bit set (with a different slist for ubuntu:: than for freebsd::).
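
The per-platform slist can be mimicked with a mapping from a class name to the files it covers. The plugin names below are assumptions for illustration, not the actual lists from my nrpe promise; the directories are the usual package locations (/usr/lib/nagios/plugins on Ubuntu, /usr/local/libexec/nagios from the FreeBSD port). As with a class-guarded promise, an unknown class simply gets nothing.

```python
from __future__ import annotations

import posixpath

# Hypothetical setuid lists, keyed the way cfengine classes guard the slists.
SETUID_PLUGINS = {
    "ubuntu": ("/usr/lib/nagios/plugins",
               ["check_ide_smart", "check_icmp", "check_dhcp"]),
    "freebsd": ("/usr/local/libexec/nagios",
                ["check_icmp", "check_dhcp"]),
}


def setuid_targets(os_class: str) -> list[str]:
    """Full paths of plugins that should be setuid for this platform class."""
    if os_class not in SETUID_PLUGINS:
        return []  # no matching class: the promise doesn't apply here
    plugin_dir, names = SETUID_PLUGINS[os_class]
    return [posixpath.join(plugin_dir, name) for name in names]
```

Running a setuid check over each returned path plays the role of the promise iterating over its slist.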

I turn my attention back to setting up check_nagiostats and having PNP4Nagios graph things, even though I already have cacti graphing them (using templates I had found, and then extended).

