Friday 26 March 2010

iPhone and Exchange ActiveSync: not solved - reoccurred

So I got a call that the iPhone had stopped getting emails... I waited until they had gone home before checking, in case I needed to restart their server out of hours...

When I did connect I decided to run the Microsoft Exchange Best Practices Analyzer... (search for ExBPA).

I knew one of the best practices was an Application log size of 40 MB in Event Viewer, so I decided to change that myself first while I was waiting... I must have dozed off at that point... suddenly I was disconnected. RealVNC suggested I was connected, yet it wasn't asking for my username and password, and the server was refusing my connection: 'Read/state - Connection disconnected by peer (10054)'

Somewhere I read that a 10054 'read/state' error might mean that the Application log size was too small, so the server could not accept connections... was that close? Had I typed 40 KB? I couldn't connect via VNC, but I couldn't connect via VPN either... could the log size cause that too?

I had to wait until the morning and catch the first person into the office. When I called, they said they couldn't log in from their PC either, but they could still log in at the server itself. A quick look at the System log revealed srv errors related to non-paged pool memory... I got them to reboot so we could all log in.

When non-paged pool memory has only 20 MB left, Windows shuts connections down: IIS6, HTTP.SYS, logged-in users, VPN connections, RealVNC connections and so on. It does that to prevent resources running so low that the system crashes, so you are forced to log in at the server itself and sort it out.

David Wang's HowTo post helped point me to Poolmon.exe - a tool that monitors which components are using Paged and Non-Paged memory... Poolmon.exe is one of the Windows Support Tools and can be downloaded from Microsoft.

Once installed, open a cmd window and type poolmon -b to sort by bytes, which is the column to watch...

Because my client's server has only just booted, this just gives me the initial values, so I am saving a snapshot to a file: poolmon -b -n datetime.txt. Then I'll import the file into Excel and timestamp each row. I'll run poolmon every now and again and see which values are changing and which aren't.
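To save typing a new file name for every reading, the snapshot step could be scripted. This is only a sketch: the file-naming scheme and the function names are my own invention, it assumes poolmon.exe is on the PATH, and it uses the same -b and -n switches as the command above.

```python
import subprocess
from datetime import datetime


def snapshot_command(now, prefix="poolmon"):
    """Build the poolmon command line for one timestamped snapshot."""
    # Hypothetical naming scheme: poolmon-YYYYMMDD-HHMMSS.txt
    filename = f"{prefix}-{now:%Y%m%d-%H%M%S}.txt"
    # -b sorts by bytes, -n writes a snapshot to the named file.
    return ["poolmon", "-b", "-n", filename]


def take_snapshot():
    """Run poolmon once and return the name of the file it wrote.

    Assumes poolmon.exe can be found on the PATH.
    """
    cmd = snapshot_command(datetime.now())
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling take_snapshot() every hour or so (from Scheduled Tasks, for example) would leave one dated file per reading, so the rows would not need time stamping by hand.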

So far I've got several culprits:
  • Irp - Io, IRP packets
  • File - File objects
  • AvgU - an AVG component
  • Ntfr - ntfs.sys - ERESOURCE (not to be confused with NtfR)
  • MmCa - nt!mm - Mm control areas for mapped files
The largest riser is AvgU, followed by File. That's not surprising, considering everyone had just logged in and was now accessing their files...
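Instead of comparing the readings by eye in Excel, two snapshot files could be compared with a few lines of Python. This is a rough sketch and not a tested tool: it assumes each data row in the snapshot is laid out as Tag, Type, Allocs, Frees, Diff, Bytes, Per Alloc, separated by spaces, and that the Type column reads Nonp or Paged. Check that against a real file first. The function names are made up for the example.

```python
def parse_snapshot(text):
    """Return {(tag, pool_type): bytes} from the text of one poolmon snapshot."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: Tag Type Allocs Frees Diff Bytes PerAlloc
        if len(fields) < 6 or fields[1] not in ("Nonp", "Paged"):
            continue  # header, blank line, or a layout this sketch does not handle
        try:
            usage[(fields[0], fields[1])] = int(fields[5])
        except ValueError:
            continue  # Bytes column was not a plain number
    return usage


def growth(before, after):
    """List (tag, pool_type, bytes_gained), biggest riser first."""
    deltas = [
        (tag, pool, after[(tag, pool)] - before.get((tag, pool), 0))
        for (tag, pool) in after
    ]
    return sorted(deltas, key=lambda row: row[2], reverse=True)
```

Read an early file and a later one, pass each through parse_snapshot, and growth(early, late) puts the tags that gained the most bytes at the top of the list.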

An hour later:
  • FMsl - fltmgr.sys - STREAM_LIST_CTRL structure
  • File - File objects
  • AvgA - an AVG component
  • AvgU - an AVG component
  • Ntfr - ntfs.sys - ERESOURCE
  • MmCa - nt!mm - Mm control areas for mapped files
My bet is that AVG is going to be the culprit... I will stop working on this now. Later I will take one more reading, but I may just remove AVG and install the correct server version, as this version looks different from the one on the other servers I look after.
