Wednesday, November 26, 2014

Outlook Email Registration Fails, Certificate root error

Good day All,

Our Citrix team recently installed Outlook on a Citrix server, and when they started configuring the Outlook client, the client e-mail registration failed with the error below.



Well, they tried their best, and then it was brought to my attention. Seeing the error, I said it was a certificate root issue. They said yes, we know, but it is a Verizon root certificate and it should be picked up automatically.

I said hold on, let me show you what I was referring to. I browsed to the website on my laptop, clicked the certificate icon, opened the certificate, and went to the Certification Path tab (highlighted in yellow). Then I asked them to check on the server whether both the root and intermediate certificates were available in the Certificates MMC console.


After some searching they confirmed that they did not see the certificates, so I exported them from my laptop and installed them on the server. Then I asked them to test it.

While they were testing, the smile on their faces told me it had worked. Then they came back and asked why: this is an Internet-facing certificate authority, and they had never had to install its certificates before.

So I asked them one question: did you check whether that server has access to the Internet? All our servers are locked down, so the obvious answer was no. They still argued that root certificates should all be pre-installed on the server.
I said that is very true, but as the server runs Windows 2003, it looks like the root certificate changed at some point and the root certificates on the server were never updated. Finally they were convinced.

On a side note, please note that Outlook 2010 and above, when adding a client, will use the Exchange web client URL to authenticate.

Hope this helps someone!!!

Tuesday, November 25, 2014

SSL Certificate Private Key Generation

Good day All,

The team was working on a requirement to set up an SSL certificate for a website. As this website has to pass through an ISA load balancer, we needed the web server certificate to be exported along with its private key and imported on the ISA Server.

There are so many articles out there on how to request a certificate that I won't go over that. We received the certificate, and when we double-clicked it we saw that there was no private key attached.



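Usually, when we send the certificate request (CSR) to the Certificate Authority (CA), we add a comment asking them to send the certificate back with the private key enabled. Whether that works depends on the CA, and we do sometimes get certificates back that show our private key. Keep in mind that the private key itself is generated along with the CSR and stays on the requesting server; the CA only returns the signed certificate.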

So how do we enable the private key? There are a lot of articles out there saying we can use the "Serial Number" property, but that never worked for us, so we have always used the Thumbprint.

Select the Thumbprint value, press Ctrl+C to copy it, and execute the following command in a CMD window.

certutil -repairstore my "0e a9 88 d4 6d 04 38 fd dd 38 39 e0 2a d5 1a da 62 dd a1 39"






Now we see the private key enabled. Here are a couple of takeaways and points to remember:

  1. Always run the Certutil command on the server where the CSR was generated; if not, I have seen the command fail with an unsuccessful error.
  2. Always make sure, before you run the above command, that the certificate is healthy, with both root and intermediate certificates already installed on the server.
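
A thumbprint copied from the certificate dialog can carry spaces and sometimes an invisible formatting character, and a command pasted from a web page can carry a long dash or curly quotes instead of plain ASCII ones. Here is a minimal Python sketch, assuming you just want a clean command line to paste into CMD. The function names are my own, not part of certutil, and the sketch only builds the text; it does not run anything.

```python
import string


def normalize_thumbprint(raw):
    """Return the thumbprint as 40 lowercase hex characters.

    Drops spaces and any other non-hex characters, such as invisible
    formatting marks picked up when copying from a dialog.
    """
    hex_only = "".join(ch for ch in raw if ch in string.hexdigits).lower()
    if len(hex_only) != 40:  # a SHA-1 thumbprint is 20 bytes = 40 hex digits
        raise ValueError("expected 40 hex digits, got %d" % len(hex_only))
    return hex_only


def repairstore_command(raw_thumbprint, store="my"):
    """Build the certutil command text with a plain hyphen and plain quotes."""
    return 'certutil -repairstore %s "%s"' % (store, normalize_thumbprint(raw_thumbprint))


# The thumbprint from this post, with a hidden left-to-right mark in front.
copied = "\u200e0e a9 88 d4 6d 04 38 fd dd 38 39 e0 2a d5 1a da 62 dd a1 39"
print(repairstore_command(copied))
```

The length check is there so that a partially selected thumbprint fails loudly instead of producing a command that points at nothing.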


Hope this helps someone!!!

Monday, November 24, 2014

ERROR OPENING PERFMON.EXE, LOT OF ALERTS GENERATING......

Good day All,

We had a weird issue lately on a couple of servers, and a lot of alerts started to generate in our monitoring.

When we did a deep dive, we found that opening Perfmon gave errors like the one below, saying some counters were unable to load. After some searching we found that we had to reload the performance counters in order to fix it.

CMD # cd \windows\system32
CMD # lodctr /R




Well, it worked for some servers but not for others. After some more searching we stumbled across a cool little tool (exctrlst.exe) which helped us. Please find below the link to download it, with more information about the tool.

http://blogs.technet.com/b/askperf/archive/2010/03/05/two-minute-drill-disabled-performance-counters-and-exctrlst-exe.aspx

Hope this helps someone....




Bug Check 0x50, WINDOWS 2008 X86 Server

Good day All,

Welcome back. Today I will share an issue we encountered on a Windows 2008 x86 server.

For a client requirement we had to build a Windows 2008 x86 server. It was stable for a while, until recently it started doing an unexpected reboot every other week. As the server had 36 GB of memory, we only had mini dumps enabled, and the mini dump was pointing to Bug Check 0x50. When we looked up this bug check in WinDbg, this is what it showed:

Resolution

Resolving a faulty hardware problem: If hardware has been added to the system recently, remove it to see if the error recurs. If existing hardware has failed, remove or replace the faulty component. You should run hardware diagnostics supplied by the system manufacturer. For details on these procedures, see the owner's manual for your computer.
Resolving a faulty system service problem: Disable the service and confirm that this resolves the error. If so, contact the manufacturer of the system service about a possible update. If the error occurs during system startup, restart your computer, and press F8 at the character-mode menu that displays the operating system choices. At the resulting Windows Advanced Options menu, choose the Last Known Good Configuration option. This option is most effective when only one driver or service is added at a time.
Resolving an antivirus software problem: Disable the program and confirm that this resolves the error. If it does, contact the manufacturer of the program about a possible update.
Resolving a corrupted NTFS volume problem: Run Chkdsk /f /r to detect and repair disk errors. You must restart the system before the disk scan begins on a system partition. If the hard disk is SCSI, check for problems between the SCSI controller and the disk.

Finally, check the System Log in Event Viewer for additional error messages that might help pinpoint the device or driver that is causing the error. Disabling memory caching of the BIOS might also resolve it.

Well, the first thing we did was update the server with the latest HP Support Pack 2014. That didn't help, and the server had an unexpected reboot again in a week's time. The next thing we checked, together with the antivirus team, was any antivirus problem, and it was clean. Later we even ran chkdsk to look for disk errors, but we could find nothing. The server was rebooting every week, and the issue was heating up.

So we circled back and raised a ticket with the hardware vendor. The only thing they found was that the memory modules were not installed in order, so they asked us to try putting them in order. We requested downtime and tried that too, but this time after 10 days we had the same issue.

Now the issue was getting more attention: even though it was an internal server, we had had this issue for close to 2 months.
After all these options, the only option we had left was to enable a full memory dump, so we moved files around to accommodate at least 50 GB of dump, because as I said the server is stacked with 36 GB of memory. We waited a couple of days, and then we could capture a memory dump, and it was close to 46 GB.

So we started to analyze it using WinDbg, and this is what !analyze -v showed for the stack:

STACK_TEXT: 
9f9f7964 81c67de4 00000000 e3d64e18 00000000 nt!MmAccessFault+0x10b
9f9f7964 81d96782 00000000 e3d64e18 00000000 nt!KiTrap0E+0xdc
9f9f7a40 81d96258 f307da20 00000000 e3d64024 nt!CmpCheckValueList+0x83
9f9f7a8c 81d9c81a 01000001 009c4020 009c3f70 nt!CmpCheckKey+0x5b4
9f9f7abc 81d9ce48 f307da20 01000001 00000006 nt!CmpCheckRegistry2+0x8c
9f9f7b04 81d9786e 01000001 9f9f7c60 80005a74 nt!CmCheckRegistry+0xf5
9f9f7b60 81d99fdd 9f9f7bb4 00000005 00000000 nt!CmpInitializeHive+0x4c1
9f9f7bd8 81d9c27d 9f9f7c60 00000000 9f9f7c4c nt!CmpInitHiveFromFile+0x19e
9f9f7c18 81d924c5 9f9f7c60 00000000 9f9f7c7b nt!CmpCmdHiveOpen+0x36
9f9f7d14 81d926fa 00000002 81d125a0 00000002 nt!CmpFlushBackupHive+0x2fd
9f9f7d38 81e71cbd 81d1c13c 967612d8 81cbfd4a nt!CmpSyncBackupHives+0x90
9f9f7d44 81cbfd4a 00000000 00000000 967612d8 nt!CmpPeriodicBackupFlushWorker+0x32
9f9f7d7c 81df001c 00000000 c8084d5c 00000000 nt!ExpWorkerThread+0xfd
9f9f7dc0 81c58eee 81cbfc4d 00000001 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16


The stack was full of registry hive routines (CmpCheckRegistry2, CmpInitializeHive and so on), so we searched the Microsoft site for registry hive and Bug Check 0x50 and came across this article:
http://support.microsoft.com/kb/2709236/en-us

The article closely resembled our issue. Keeping our fingers crossed, we applied the hotfix, and guess what, that turned out to be the fix.

Hope this helps someone out there....


Friday, November 21, 2014

WINDOWS 2003, Insufficient System Resources error while taking TSM Backup

Good day All,


We have a Windows 2003 server with one disk of close to 1.5 TB and a couple of disks of around 600 GB. It was converted to a virtual machine, and after that we started to see TSM backups failing: the server would hang with "Insufficient System Resources" and become unresponsive.

We have seen this error in the past: TSM, when taking a backup, consumes all the paged pool memory and the backups fail. So we basically followed the steps in the article below, tweaked the registry settings, and maximized the paged and non-paged pools.
http://support.microsoft.com/kb/304101

After the tweak, backups went fine for a couple of weeks, and then we started to see the same error again. It got so frustrating: we started to have the issue every other day, and a lot of backup failures were being reported.

We escalated this to the client, saying the backup disks were too big and we should start migrating the data to a new Windows 2008 or 2012 server.

At this stage I got involved, and my first question was whether there had been any backup failures before this server was converted to a virtual machine. The answer was no.

So I was not really buying it: when it was physical it was working fine with no backup failures, so why not after migrating to a virtual machine? I started to dig more, enabled Poolmon, and started to observe the trend. After watching for some time I saw that the paged pool was not growing beyond 230 MB and the non-paged pool not beyond 100 MB. I said, what? We had tweaked the registry, but the pools were still not growing. So we rechecked the registry settings, but they all looked OK.


Well, I opened WinDbg, connected to the server in kernel debug mode, and when I ran the !vm command it clearly showed the maximum values of the paged and non-paged pools, and they were too low even after the registry changes.

Digging further, I stumbled across this article on the /3GB switch, which clearly says that with the switch only about half of the paged/non-paged pool is available to the kernel.

http://blogs.technet.com/b/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx

Guess what: when I checked my boot.ini file, yes, we had the /3GB switch. Having the /3GB switch on a server with 4 GB of memory running an IIS application didn't really make sense to me, so I went ahead and removed it.
After that, for close to 4 months now, we have not had a single backup failure. In fact, the article we used to tweak the registry settings to maximize the paged/non-paged pool clearly asks you to check whether boot.ini has the /3GB switch.
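
If you have a lot of Windows 2003 servers to check, the boot.ini test can be scripted. Here is a minimal Python sketch, assuming you have already collected the boot.ini text from each server. The sample text and the function name are made up for illustration; they are not taken from our server.

```python
# Made-up boot.ini content, used only to show the check.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB
"""


def entries_with_3gb(boot_ini_text):
    """Return the [operating systems] entries that carry the /3GB switch."""
    hits = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the section that lists boot entries.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "/3gb" in stripped.lower().split():
            hits.append(stripped)
    return hits


print(entries_with_3gb(SAMPLE_BOOT_INI))
```

The check compares whole whitespace-separated switches, so it assumes the switches are separated by spaces, which is how boot.ini entries are normally written.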

Well, on a happy note, the issue got resolved, but the client decided to postpone the new OS build for now.


Windows Server 2012 accessing Windows 2000 Server, ERROR: The account is not authorized to log in from this station

Good day All,

We recently built a Windows 2012 server, and as part of the application team's requirement they wanted to access printers on a Windows 2000 server. When we tried to browse to \\Server Name we kept getting the error "The account is not authorized to log in from this station".

So I hopped on to a Windows 2008 server, and when I tried to browse to the server I got the same error.

As usual I searched the Internet, and there were a couple of articles that matched the error:

http://support.microsoft.com/kb/982734

http://support.microsoft.com/kb/281648

After setting up a lab and trying all the combinations, we figured out that only changing the setting below worked:

Administrative Tools ---> Local Security Policy, Security Options: change the status of "Microsoft network client: Digitally sign communications (always)" to Disabled, then reboot.
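
To confirm the setting after the reboot, you can look at the registry value behind this policy, which is RequireSecuritySignature under the LanmanWorkstation parameters key. Here is a minimal Python sketch, assuming Python is available on the server. The function names are my own, and the registry read only runs on Windows.

```python
import sys

# Registry location behind "Microsoft network client: Digitally sign
# communications (always)".
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"
VALUE_NAME = "RequireSecuritySignature"


def describe_signing(value):
    """Translate the DWORD into the state shown in the policy editor."""
    return "Enabled" if value == 1 else "Disabled"


def read_signing_value():
    """Read the DWORD from the local registry (Windows only)."""
    import winreg  # standard library module, present only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if sys.platform == "win32":
    print(describe_signing(read_signing_value()))
```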


Hope this helps someone out there..



Monday, November 17, 2014

HP PROLIANT GEN8 AGENTLESS MANAGEMENT FLOODS ON ESXi 5.1 u2, we Proved VMWARE wrong!!!

Good day! All,

I know we are way behind on our ESXi upgrades, still around 5.1 U2. We recently added new HP Gen8 blades to our ESXi infrastructure and started the ESXi upgrade from 5.0 to 5.1 U2.
As all these are HP blades, we downloaded the HP custom ESXi image and remediated all the blades for firmware using Update Manager.

We did all this about 3 months ago. As this was a small site with about 60 VMs on 4 ESXi blades, we never had any performance issue, and so we never noticed that this site had a DRS issue and that VMs were not migrating, until one day we started to do a VMware Tools upgrade on the virtual machines.

We started to browse the Internet for the error and came across this article, which closely matched our issue. So we logged a case, VMware got involved, and after going through the logs they confirmed the same.

Now that we had identified the issue, getting downtime was the big challenge: VMware clearly stated that we needed to shut down all the VMs and either upgrade or disable AMS on the ESXi hosts to fix the DRS issue.

Getting downtime from the whole business was a daunting task, so we checked all the VMs and picked the ESXi host with the least number of virtual machines running on it. We took the risk, asked the business for downtime on 6 servers, and, keeping our fingers crossed, started to work on the issue.

After uninstalling AMS and reinstalling the updated AMS component, to our surprise, guess what, vMotion started to work on that ESXi host, and this just proved VMware wrong :)

I know this article may come too late; maybe someone has already tried something like this, or most of you have already upgraded to the latest ESXi version. But if someone out there is still on 5.1 U2, this article may help them plan properly and not go shutting down all the VMs.

After the upgrade we started to move VMs across to this ESXi host and did the same on the other ESXi hosts one by one, and by the end of the day we had achieved this with very minimal downtime.

So this just proves that we should always have a development environment. But let me guess: will the business spend on an ESXi development environment? I leave that for you guys to think about :)

Have a good day!!!!





WINDOWS 2008 R2 SP1 CLUSTER NAME FAILURE! DUPLICATE IP ADDRESS DETECTED

Good day! All,

I got some time today, so I started to write up a couple of issues we encountered where I got involved, troubleshot, and fixed the issue. Today I will cover how we fixed a cluster name failure.

We have a cluster with 2 active nodes, and we found that the cluster IP address and name did not come online after a failover to the other node.

I went to the other node and tried to bring the cluster online, but still no luck. So I started to check the event logs, and everything was clean except for the failure of the resources.
So I generated the cluster logs on both nodes using cluster.exe log /g and started to investigate further.
I looked around, but it was still not clear why the cluster resources didn't come online. So I hopped on to the other node and started to look for any ERR entries in the cluster logs. At some point I saw something about a duplicate IP address. I didn't pay much attention, because this is not a newly built cluster and it had been working for years, so I moved on to search for other errors.
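
Cluster logs get big, so filtering the ERR entries with a small script can save some scrolling. Here is a minimal Python sketch, assuming the log level appears as a separate ERR column on each line. The sample lines are made up for illustration and are not real cluster log output, and the function name is my own.

```python
import re


def find_errors(log_lines, keyword=None):
    """Yield (line_number, text) for lines whose level column is ERR.

    If keyword is given, keep only ERR lines that also contain it
    (case-insensitive).
    """
    level = re.compile(r"\sERR\s")
    for number, line in enumerate(log_lines, start=1):
        if not level.search(line):
            continue
        if keyword is None or keyword.lower() in line.lower():
            yield number, line.rstrip()


# Made-up sample lines, only to show how the filter behaves.
sample = [
    "<timestamp> INFO  resource going online",
    "<timestamp> ERR   duplicate IP address reported",
    "<timestamp> ERR   some other failure",
]
for number, text in find_errors(sample, keyword="duplicate"):
    print(number, text)
```

Looking back, filtering on a keyword like "duplicate" would have pointed me at the real cause much earlier than skimming past it did.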

After some more troubleshooting we still had no luck. Then I saw our monitoring alert popping up for this server saying duplicate IP address. Now that got me puzzled about why it would do that, and I started to check the IPs.

I checked NSLOOKUP, DNS, and PING tests, and all came back clean.

I said to myself, somewhere the server is seeing duplicate IPs, so I opened the TCP/IP properties.

To my surprise, I saw that the cluster IP address had been added as a secondary IP address for that server, and whenever we moved the server resource to the other node this IP address moved along with it, so it ended up as a secondary IP address on both nodes.

I have worked on and configured so many clusters, and it is still not clear to me why someone would add the cluster IP address to the TCP/IP properties. What exactly this would achieve, I have no idea, so if someone out there has a reason, please feel free to reply to me.

OK, then I removed the IP address from the TCP/IP properties, and everything started to work and the resources came back online.
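
The check I ended up doing by hand is simple: an IP address owned by a cluster resource should never also appear in a node's own static TCP/IP configuration. Here is a minimal Python sketch of that comparison, assuming you have already collected the static addresses per node. The node names and addresses are made-up documentation values, and the function name is my own.

```python
def misplaced_cluster_ips(static_ips_by_node, cluster_ips):
    """Map each node to the cluster-owned IPs found in its static config."""
    owned = set(cluster_ips)
    result = {}
    for node, ips in static_ips_by_node.items():
        overlap = sorted(owned.intersection(ips))
        if overlap:
            result[node] = overlap
    return result


# Hypothetical inventory: NODE1 has the cluster IP as a secondary address.
static_ips = {
    "NODE1": ["192.0.2.11", "192.0.2.50"],
    "NODE2": ["192.0.2.12"],
}
print(misplaced_cluster_ips(static_ips, cluster_ips=["192.0.2.50"]))
```

Any node that shows up in the result has a cluster address configured statically and is a candidate for the duplicate IP problem described above.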

For starters, in Windows 2008 there is no way in the GUI to move the cluster group along with the quorum disk to another node; you need to use PowerShell or CMD.
Note: In Windows 2012 this has been fixed, and now you can move even the quorum disk and cluster group using Failover Cluster Manager.

cluster group "Cluster Group" /move:<newnode>