Today this message appeared, and I knew that I needed to find a socket with a QLIM smaller than QLEN=8, but couldn't remember what the formula was.
But, the topic had come up on the bind-users list back on November 14th, 2013, where the message was about '16 already in queue'.
For months before this I had been getting messages for '10 already in queue', and the only thing I found with a QLIM of 10 was the submission port on sendmail, which didn't make sense... and bumping it up didn't help.
And, searching my system for the pcb was a bust (using lsof -i -Tfs | grep LISTEN, or reducing the end digits of the pcb address until I got matches) - the matches that did turn up didn't seem to fit.
So, I tried to ignore it....
Then it popped up on the bind-users list, where the discussion settled on the tcp-listen-queue default being 10. But that didn't seem to apply in my case, until later when I did see some messages for "5 already in queue": the base bind in FreeBSD 9.2 is 9.8.4-P2, where the default tcp-listen-queue is 3. It was changed to 10 in bind-9.9.
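For reference, that default can be overridden in named.conf - a minimal sketch (the value 128 here is just an illustrative choice, not a recommendation):

```
options {
    // tcp-listen-queue sets the backlog named passes to listen()
    tcp-listen-queue 128;
};
```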
Anyways, when the thread came up on the bind-users list, I decided that I needed to really dig for the answer. Searching through the kernel source, I eventually found it.
The message is reported when QLEN > 3 * QLIM / 2, evaluated with C integer arithmetic (the multiplication happens before the truncating division).
Aha... QLEN = 10 => QLIM = 6... which was my Socks5 proxy server (ss5). I couldn't figure out how to change its listen queue through the configuration file, so I stopped using it. And, the messages stopped. I had filled out the proxy settings with squid for http & https and ss5 for Socks5... and evidently some update around the same time as when I upgraded to FreeBSD 9.2 (or perhaps FreeBSD 9.2 itself) made the message show up in dmesg. Switching to using squid for all protocols fixed it.
Meanwhile... while I was looking for that old message, which I had posted back on November 20th, 2013, I stumbled upon some older threads on freebsd-stable.
I was searching on my home computer, where I'm subscribed to the list; my work email isn't subscribed... and all my old freebsd list emails there have since been purged. I'm still trying to get my email back under control after switching providers, both personally and at work. I plan to let an old personal domain expire once the migration is fully done, but it's going so slowly that I let it auto-renew last year... and perhaps forgetting to change the default 2-year auto-renew to 1 year was intentional? The new expiration date is November 20th, 2015.

It was an early domain that I had registered, before I knew that '-'s in domains are considered bad. There were a number of different blogs where I would try to leave comments, and the comments would claim to go to moderation but actually get discarded. The owner of one site eventually responded, saying the system automatically does that to domains with '-'s in them, since most of them are spam. But, he'd whitelist my domain for the future. (IIRC, it was about a different antispam patch he had written for our blogging platform, functionality that never made it into newer releases and hadn't gotten updated. Wishing something like it was back again.)
That made me wonder if another site, running under my employer's domain (which has a '-' in it), was rejecting comments from my work email account because it has a '-' in it. I switched to the form without the '-', and the comments would appear. I suggested to the site owner that he remove that filter, or at least whitelist our employer's domain.
The threads were older, and associated with upgrading to FreeBSD 9.2. The first thread was started on August 1st, 2013; it was for "8 already in queue", and later indicated that the system was for backups, did outgoing rsyncs, and also did NFS and Samba. The discussion talked of the strangeness of having a queue limit that small, when the default limit (128) is like 20 times that. The last reply to the thread was October 7th, 2013. Another thread started on September 30th, 2013, for "193 already in queue", with the last reply on November 12th, 2013.
The main hanging point again was that the pcb couldn't be found... the suspicion was that it's how daemons fork processes to listen on sockets and/or to handle requests, plus that they might create all these things and then use fork to detach and run in the background. The last thread was about using dtrace to maybe see if the process could be found that way.
I've been meaning to play around with that, but when I last tried, I found that it's a module, and kldload dtrace wasn't the right way to load it... it's kldload dtraceall. I've rebooted since then, so it should be in place (and done automatically via /boot/loader.conf). Guess when I have time....
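For the record, the loader.conf line is a sketch like this (assuming the standard modulename_load convention FreeBSD uses for boot-time module loading):

```
# /boot/loader.conf -- load the full DTrace module set at boot
dtraceall_load="YES"
```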
So, I wonder if I should reply to one or both of the threads... but first, it's been a while since I blogged... so here I am.
As for today's message?
QLEN = 8 => QLIM = 5
At first I looked for the full address:
Trimming, I eventually got:
nrpe? Hmmm, did that one new disk check push me over?
What else is 5?
10143, imapproxyd - wasn't accessing roundcube
9032, there shouldn't be anything accessing pyTiVo
2049, NFS - hmmm... well, my MacBook Air might be doing a Power Nap and running its Time Machine backup to the NFS share on my FreeBSD server.
873, rsyncd - BackupPC is constrained against running more than 3 jobs at once, and at most 3 against this server. (I break up my [bigger] systems so it's not all backed up at once, using lockfile in DumpPreUserCmd, though I have exceptions on this server so that certain rsync shares aren't blocked if a really long backup is running. Recently an incremental took 1 day and 11 hours - at least on my FreeBSD/ZFS system I have a command in DumpPreShareCmd to take a snapshot. A couple of weeks earlier, an incremental took 1 day and 15.5 hours.)
Tweaking some sysctls and deleting some old snapshots seems to have sped things back up.
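The snapshot-before-dump trick can be sketched as a BackupPC config entry along these lines ($sshPath and $host are BackupPC's own substitutions; the dataset name and paths here are made up for illustration, not my actual config):

```
# config.pl sketch: snapshot the ZFS dataset before BackupPC dumps the share
$Conf{DumpPreShareCmd} = '$sshPath -q -x root@$host /sbin/zfs snapshot zroot/home@backuppc';
```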
So some of the messages convert to:
QLEN => QLIM
====    ====
 193     128
  16      10
  10       6
   8       5
   5       3
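That conversion can be double-checked with a quick sketch that inverts the kernel's test (the 3/2 overshoot factor comes from sonewconn() in the FreeBSD source; the function names here are my own):

```python
def overflows(qlen, qlim):
    """The kernel's check: qlen > 3 * qlim / 2, with C integer math
    (multiply first, then truncating divide)."""
    return qlen > (3 * qlim) // 2

def max_qlim(qlen):
    """Largest QLIM that a given reported QLEN can correspond to."""
    qlim = 1
    while overflows(qlen, qlim + 1):
        qlim += 1
    return qlim

for qlen in (193, 16, 10, 8, 5):
    print(qlen, "=>", max_qlim(qlen))
```

Running it prints the same pairs as the table above.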
OTOH, "8 already in queue" is what the first thread in August had, and he had added about being a backup server that does output rsync and had also mentioned NFS (and Samba).
Additionally, in the output looking for QLIM == 5 were these lines:
When I was previously looking for QLIM == 6, there were only the two tcp sockets, so it was only 50-50 on picking the culprit; but since the other was minidlna, which I haven't done more than build/install so far, it was really only the one socket that could explain it. And it did clear up immediately once I stopped using it.
As for NRPE, there doesn't seem to be an easy way to change it... so I'll just see if the problem continues to happen before investigating other solutions.
So, the announcement of FreeBSD 9.2 came out on Monday [September 30th], which I missed because I was focused on my UNMC thing. But, once it appeared, I knew that I was going to want to upgrade to it sooner than later.
From its highlights, the main items that caught my attention were:
But, I did start this upgrade on October 4th... where, for some unknown reason, I launched the freebsd-update process on cbox, the busier of my two headless servers. I suspect I went with upgrading the headless servers first because they run entirely on SSD and would likely see the benefit of lz4 compression. And perhaps I did cbox first because it was the system that could gain the most from lz4.
It took a couple of iterations through freebsd-update before I got an upgrade scenario that could proceed. And it took a long time, given the high load that is cbox.
cbox is an Atom D2700 (2.13GHz, dual core), and cacti is the main source of load (especially with the inefficient, processor/memory-intensive percona monitoring scripts -- it might help if script-server support worked, and wasn't just a leftover from what it was based on). Load is usually in the 11.xx area, except during certain other events (like, since 3.5, when cf-agent fires... cbox is set to run it at a lower frequency than my other systems), or when the majority of logs get rotated and bzip'd. There's also some impact when zen connects to rsyncd each day for backuppc. But these spikes weren't that significant. Though the high load would cause cf-agent runs to take orders of magnitude longer than on other systems, including its 'twin' dbox.
Also ran into a problem (again?) where a lot of the differences that freebsd-update needed resolved were differences in revision tags... some as silly as '9.2' vs '9.1', others with new timestamps or usernames, but seldom any changes to the contents of the file. I then discovered a problem: having some of these files under cfengine would revert them back to having '9.1' revision strings, which confused freebsd-update. I ended up updating all the files in cfengine to have the 9.2 versioning, though I thought about just removing/replacing them with something else entirely, but wasn't sure what impact that would have on current/future...
Though it did seem to cause a problem with the other two upgrades, where it would say that some of these files were now removed and ask if I wanted to remove them. Which doesn't make sense, since it didn't say that with the first upgrade. It was probably just angry that these files already claimed to be from FreeBSD 9.2.
It also didn't like that I use sendmail, so my sendmail configs are specific to my configuration, or that my printcap is the one auto-generated by cups, etc.
But, once it got to where it would let me run my first "freebsd-update install", I ran it, rebooted, ran it again, rebooted, and updated stuff. (It didn't complain as much this time, perhaps because some of the troublesome kernel-mod ports had corrected the problem of installing into /boot/kernel, or perhaps enough stayed the same between 9.1 and 9.2 that things didn't freak out like before. That includes the virtualbox kernel mod, when I did the upgrade on zen, and later mew.) But I re-installed those ports, and lsof. I did a quick check of other services, and then upgraded the 'zroot' zpool to have feature flags. That means it no longer has a version: apparently, instead of jumping the numbers to distinguish from Sun/Oracle, ZFS has eliminated version numbers beyond 28 in favor of flags for the features added since. I wonder if the flags capture everything that has changed since 28, since I thought there have been other internal improvements that aren't described by version numbers. Namely, I seem to recall improvements in recoverability... it had been suggested, when I was trying to recover a corrupt 'zroot' on mew, to try finding a v5000 ZFS live CD. I don't think I ever found one, and I gave up anyway when I concluded the level of corruption was too great for any hope of recovery, and that I needed to resort to a netbackup restore before the last successful full got expired. That full was nearly 90 days old; the fulls from the two months between didn't exist, due to the system instability that eventually caused the corrupted zpool (eventually found to be a known-bad revision of the Cougar Point chipset, plus a bad DIMM).

Things finally seem stable after switching to a SiI3132 SATA controller instead of the onboard ports, and getting the bad DIMM replaced. It was weird that this was a Dell Optiplex 990, purchased new over a year after the chipset problem had been identified and a newer revision released. I did eventually convince Dell support to send me a new motherboard and replace the DIMM. The latter was good, since I had been using DIMMs from another Dell that had been upgraded, so I had less memory for a while. At first I did use the onboard SATA again, but eventually I started having problems that would result in losing a disk from the mirrored zpool, and then a reboot where both disks would be present again (though gmirror would need manual intervention)... moving back to the SiI3132 has finally gotten things stable again. The harddrives in mew are SATA-III, so it would've been desirable to stay on the SATA-III onboard ports, but those ports were the main source of problems in the prior defective revision. Perhaps the fact that the prior revision had a heatsink and the new one didn't wasn't because they no longer needed it to compensate for over-driving the silicon for the SATA-III portion, but an oversight with the newer revision motherboard.

The problem did tend to occur in the early morning hours on the weekend, when not only is there a lot of daily disk activity, but also a lot of weekly disk activity, etc. Oh well.
So, after upgrading the zpool and reinstalling the boot block/code, I rebooted the system again. I had already identified the zfs filesystems where I had 'compression=on', so I had written a script to change all of those to 'compression=lz4', which I now ran.
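That script amounted to something like this sketch (in Python for illustration; it assumes the tab-separated output of zfs get -H -o name,value,source compression, and only prints the zfs set commands rather than running them - the dataset names in the sample are made up):

```python
# Sketch: turn `zfs get -H -o name,value,source compression` output into
# the `zfs set` commands for datasets explicitly set to compression=on.
def lz4_commands(zfs_get_output):
    cmds = []
    for line in zfs_get_output.strip().splitlines():
        name, value, source = line.split("\t")[:3]
        # Only touch datasets where compression=on was set locally,
        # not inherited from a parent dataset.
        if value == "on" and source == "local":
            cmds.append("zfs set compression=lz4 " + name)
    return cmds

sample = ("zroot/usr\ton\tlocal\n"
          "zroot/var\tlz4\tlocal\n"
          "zroot/tmp\ton\tinherited from zroot")
for cmd in lz4_commands(sample):
    print(cmd)
```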
And, then I turned my attention to doing dbox.