Friday, 12 November 2010

SBS2008 and the mysterious DNS drops

I've been asked to look into an install of SBS2008 which is occasionally failing to forward DNS requests from clients. Apparently there are two workarounds. 1) A registry change and 2) set up forwarders to DNS servers... My colleague couldn't remember the details...

Hmm.. Google: 945 results from EggHeadCafe alone for SBS2008 DNS. 118 in the past year. 22 in the past month. None in the last week... so it's a dying topic...?

Is XP losing its binding to the DNS servers? Is SBS2008 failing to forward the request? There are some Mac OS X clients too... is Mac OS X losing its binding to DNS?

Restarting DNS solves it... so DHCP is not the problem, and the problem can't be on the client side, otherwise restarting DNS on the server wouldn't fix it. All clients are affected.

Event Log... DNS.. the Event ID sequence 3,2,4 (DNS Shutdown/Start/Ready) has occurred frequently over the last few days, and every 2 or 3 days for at least a month back... That has to be a manual action, doesn't it?

Apparently a user is restarting the DNS service... and has been for some time... I've gone through the log and noted in Excel the date and time of each restart of the DNS service... then converted that into a simple dot chart...
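The same log trawl could be scripted rather than done by hand. This is only a rough sketch: it assumes the DNS events have already been exported as (timestamp, event ID) pairs, oldest first, and the function name `find_restarts` and the sample rows are made up for illustration, not taken from the real log.

```python
from datetime import datetime

# DNS Server event IDs as described in the post: 3 = shutdown, 2 = start, 4 = ready.
RESTART_SEQUENCE = (3, 2, 4)


def find_restarts(events):
    """Return the timestamp of the shutdown event that begins each 3, 2, 4 run.

    `events` is a list of (timestamp, event_id) tuples sorted oldest first.
    """
    ids = [event_id for _, event_id in events]
    n = len(RESTART_SEQUENCE)
    restarts = []
    for i in range(len(ids) - n + 1):
        if tuple(ids[i:i + n]) == RESTART_SEQUENCE:
            restarts.append(events[i][0])
    return restarts


# Made-up sample rows purely to exercise the function; not from the real event log.
sample = [
    (datetime(2010, 1, 1, 10, 0, 0), 3),
    (datetime(2010, 1, 1, 10, 0, 5), 2),
    (datetime(2010, 1, 1, 10, 0, 9), 4),
    (datetime(2010, 1, 3, 10, 20, 0), 3),
    (datetime(2010, 1, 3, 10, 20, 4), 2),
    (datetime(2010, 1, 3, 10, 20, 8), 4),
]

for when in find_restarts(sample):
    print(when.isoformat())
```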


The clumping is occurring with 2 day intervals between the events... except the last clump which is occurring more frequently because of a 'fix' put in place specifically to correct this... it's a weird clumping... 2 days later the users experience the same fault but about 30 mins later in the day. It recurs 2 days and 30 minutes after restarting DNS? That would equate roughly to a TTL, somewhere, of about 2900 minutes (rather than 2880, which is exactly 2 days)...?
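To put numbers on the clumping, the gap between consecutive restarts can be worked out in minutes and compared against a flat two days. A minimal sketch, assuming the restart times are already in a list; the helper name `gaps_in_minutes` and the three timestamps are hypothetical, chosen only to show a "2 days plus 30 minutes" pattern.

```python
from datetime import datetime

TWO_DAYS_MINUTES = 2 * 24 * 60  # 2880


def gaps_in_minutes(restart_times):
    """Minutes between each restart and the next one, oldest first."""
    ordered = sorted(restart_times)
    return [
        (later - earlier).total_seconds() / 60
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical restart times, each one 2 days and 30 minutes after the last.
restarts = [
    datetime(2010, 1, 1, 10, 0),
    datetime(2010, 1, 3, 10, 30),
    datetime(2010, 1, 5, 11, 0),
]

for gap in gaps_in_minutes(restarts):
    over = gap - TWO_DAYS_MINUTES
    print(f"{gap:.0f} min between restarts ({over:.0f} min over two days)")
```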

These events occur between 09:30 and 20:10 purely because no-one is around to restart the DNS outside of those times.

FRS (File Replication Service) reported space issues (almost daily) on the C drive between 23rd Oct and 1st Nov... but there are about 30 GB free on the C drive now... Shadow Copies is using its maximum of 10 GB...
(Note: if space is an issue then DNS might be prevented from working properly...)


Event Source: NETLOGON      ID: 5781
Description:
Dynamic registration or deletion of one or more DNS records associated with DNS domain 'DomainDnsZones.mydomain.local.' failed.


Possible causes of failure include:  
- TCP/IP properties of network connections contain wrong IP addresses of the preferred and alternate DNS servers 
- Specified preferred and alternate DNS servers are not running 
- DNS server(s) primary for the records to be registered is not running 
- Preferred or alternate DNS servers are configured with wrong root hints 
- Parent DNS zone contains incorrect delegation to the child zone authoritative for the DNS records that failed registration  


There's one of these for each DnsZone... and they appear when the DNS service starts up on the SBS2008 server, so they are related to the restarts but not a cause. They occur simply because the NETLOGON service doesn't receive a 'Success' signal back from the DNS Server that owns the zones of the records that are being registered. The zones in question are owned by this server? So this server is not responding to itself?

I've just discovered a utility called DNSLint.exe from Microsoft via (I won't repeat it here):
http://www.computerperformance.co.uk/w2k3/services/DNS_dnslint.htm

Hmm.. DNSLint on 2008?  I have no DVD to hand...  DNS Tools:
http://technet.microsoft.com/en-us/library/dd197560(WS.10).aspx

ok.. back to error 5781 - it appears that this only occurs when the DNS service starts, because the server's own network connection points at itself for DNS... and since the DNS service hasn't started yet, NetLogon can't register the records on that server... so I'm going to ignore that ... you can make the NetLogon service depend on the DNS service... http://support.microsoft.com/kb/259277 but it'll try to re-register 5 minutes later anyway.

After some searching around I've found a couple of root-server addresses in Root Hints that are out of date, including an IP address for l.root-servers.net. from 2 years ago... maybe the tech at Microsoft who put SBS 2008 together didn't know? Resolve all the Root servers to check...
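Resolving all thirteen root servers (a.root-servers.net through m.root-servers.net) by hand is tedious, so here is a rough sketch of how the check might be scripted. The function names are hypothetical, and the addresses configured in Root Hints have to be copied in by hand from the DNS console; none are filled in here. `socket.gethostbyname` only returns IPv4 addresses, which is all this check needs.

```python
import socket
import string


def root_server_names():
    """a.root-servers.net through m.root-servers.net (13 names)."""
    return [f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]]


def stale_hints(configured, resolve=socket.gethostbyname):
    """Return {name: (configured_ip, resolved_ip)} for every hint that differs.

    `configured` maps a root server name to the IPv4 address listed in Root Hints.
    `resolve` can be swapped for another lookup function (useful for testing).
    """
    stale = {}
    for name, configured_ip in configured.items():
        try:
            current_ip = resolve(name)
        except OSError:
            continue  # lookup failed; skip rather than guess
        if current_ip != configured_ip:
            stale[name] = (configured_ip, current_ip)
    return stale


print(root_server_names())
```

To use it, build a dictionary of name to configured address from the Root Hints tab and pass it to `stale_hints`; anything it returns is worth a second look. Run it from a machine whose own DNS lookups are known to be good, otherwise the comparison means little.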

Also I switched off DNS from listening on IPv6 interfaces for DNS requests... there's nothing IPv6 on the network... if there is, I'll find out tomorrow.

I also found a registry entry in HKLM\SYSTEM\CurrentControlSet\Services\DNS\Parameters... called MaxCacheTtl... the MS KB (http://support.microsoft.com/kb/968372) says it's MaxCacheTTL ... does it make a difference? (Registry value names aren't case-sensitive, so it shouldn't.) I could get ProcessSpy running to see if DNS Server on SBS2008 is making calls for MaxCacheTTL or Ttl... it's set to 172800 seconds = 2880 minutes... which is the timeout value I questioned earlier... (2900 minutes = 2 days and 20 minutes) ... what about the 20 mins? It's 2 days and 20 mins between restarts of DNS... so that's 2 days for the TTL and 20 minutes for the users to realise something is wrong and restart DNS again...
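The arithmetic is easy to sanity-check. A quick sketch using only the two figures quoted above (the registry value in seconds and the rough interval between restarts); the constant names are mine.

```python
from datetime import timedelta

MAX_CACHE_TTL_SECONDS = 172800    # registry value quoted in the post
OBSERVED_INTERVAL_MINUTES = 2900  # rough interval between DNS restarts, from the post

ttl = timedelta(seconds=MAX_CACHE_TTL_SECONDS)
ttl_minutes = ttl.total_seconds() / 60
leftover_minutes = OBSERVED_INTERVAL_MINUTES - ttl_minutes

print(ttl)               # 2 days, 0:00:00
print(ttl_minutes)       # 2880.0
print(leftover_minutes)  # 20.0
```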

Is this TTL value killing the DNS? Can it? We'll find out tomorrow... or in 2 days' time... Then we'll find out if it's a problem with AVG or not.. :) or perhaps it's the router...
