The risk of high uptimes....

07/29/13

  09:06:00 pm, by The Dreamer   , 729 words  
Categories: Operating Systems, FreeBSD, CFEngine

There are Unix servers at work with uptimes of >1000 days, there are even servers with uptimes of >2000 days, and in fact there are servers that have now exceeded 2500 days (I'm looking at one with 2562+ days.)

On one hand, there are SAs who see it as a badge of honor to have had a system stay up this long. OTOH, it's a system of great dread.

A while back this system was having problems....it's Solaris and somebody had filled up /tmp....fortunately, I was able to clean things up and recover before another SA resorted to hard rebooting it.

The problem with these long-running servers, especially in an ever-changing, multi-admin shop, is that you can't be sure that the system will come back up correctly after a reboot.

We've lost a few systems at work due to a reboot. Some were simple mistakes: replacing a root disk under vxvm and forgetting to update the Sun partition table, or doing a zpool upgrade and forgetting to reinstall the boot block. Others were more significant: a former SA had temporarily changed the purpose of an existing system, all by command line and running out of /tmp, so that after it had been up for 3+ years and he had been gone over a year, patching and rebooting made it disappear. The hardware that the system was supposed to be on needed repair, but he had never gotten around to it.

It'll be interesting to see what happens should the system ever get rebooted.

So, what brought this post on?

Well, recently FreeBSD 9.1-RELEASE-p5 appeared. For some reason it showed up for zen on Sunday, but I couldn't get it to show up for cbox/dbox. After a few runs of 'freebsd-update fetch' I gave up and decided to wait for it to show up eventually through cron.

I had rebooted zen after installing the update, and everything seemed to be alright...though I didn't look too closely....

But this morning it showed up for dbox/cbox. So, I decided that I would quickly reboot dbox before I left for work. (If memory serves, it probably had about 21 days of uptime. cbox has just 14 days, because it had rebooted itself with no explanation one afternoon while I was at work....)

Three of my Ubuntu servers rebooted themselves automatically this morning as well due to unattended-upgrades. The 4th isn't allowed to automatically reboot.... (I see my work Ubuntu server has also rebooted itself...the Windows 7 VM it hosts is also enabled for self updating/rebooting)

Well, the reboot of dbox didn't go as well. The automatic restart of irssi didn't work; I never did test what I had put in. I suspect it's one of those things where the environment when the command string is called by rc is different from when I run it by hand. It may require some more reboots in the future to get fully resolved. But there were also other problems....namely two copies of bitlbee were running, and the logs were angry that other things were being run twice and colliding.

Suddenly it's obvious why.... Unlike other Unix/Linux systems, FreeBSD doesn't have rc level directories. So, it'll run everything it finds in its rc.d directories....meaning it's also running the files in there that have '.cfsaved' or '.cf-before-edit' appended to them.
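A quick way to see whether a box has this problem is to list any backup copies sitting next to the real rc scripts. The following is only a minimal sketch in Python: the two suffixes come from the post above, while the directory list (/etc/rc.d and /usr/local/etc/rc.d) and the function name `find_stray_backups` are my own assumptions, so adjust them for your layout.

```python
from pathlib import Path

# Suffixes mentioned in the post; CFEngine appends these to backup copies.
BACKUP_SUFFIXES = (".cfsaved", ".cf-before-edit")

# Assumption: the usual rc.d locations on a FreeBSD system.
RC_DIRS = ("/etc/rc.d", "/usr/local/etc/rc.d")


def find_stray_backups(rc_dirs=RC_DIRS, suffixes=BACKUP_SUFFIXES):
    """Return a sorted list of backup files found directly inside rc_dirs."""
    strays = []
    for directory in rc_dirs:
        path = Path(directory)
        if not path.is_dir():
            # Skip directories that do not exist on this host.
            continue
        for entry in path.iterdir():
            # str.endswith accepts a tuple, so either suffix matches.
            if entry.is_file() and entry.name.endswith(suffixes):
                strays.append(entry)
    return sorted(strays)


if __name__ == "__main__":
    for stray in find_stray_backups():
        print(stray)
```

It only reports what it finds and changes nothing, so it is safe to run before deciding where the backups should live.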

It just happened that the things that doubled on zen were things that don't allow multiple execution or things that don't impact my user experience....

At work this only causes a mild annoyance when watching a system boot, with all the timestamped backups that get left behind when files in /var/spool/cron/crontabs are edited.

The obvious quick fix is for these files to not create backups. But backups are good, especially when something goes wrong or you're testing an edit promise. Just put them somewhere else. There's a global parameter to save these in a repository, but in places where the backups don't get in the way....it's my preference that they stay with the files.

Eventually, I tracked down what I needed to do, and updated a bunch of cf files....should be good now. Currently the only thing that the change has done is create the /var/cfengine/repository directory as promised.

I suppose it's probably shown up for my FreeBSD systems at work (at least those that call freebsd-update cron). Wonder which ones will get it and which ones won't (where not interrupting the users is more important than keeping up to date...)

Now to go reboot cbox....

1 comment

Comment from: The Dreamer [Member]  

Well, cbox rebooted fine, except that I forgot to run ‘freebsd-update install’ before doing it.

Meanwhile, rebooting my FreeBSD workstation at work ‘mew’ didn’t go as planned. I rebooted it before going for lunch, though the hardware seems to intermittently not want to reboot…and this was one of those times.

Did reveal that my new nagios is still busted in one area….it still won’t send sms notifications. Maybe I’ll try writing a command object for it from scratch….

07/31/13 @ 09:13