Monday, November 17, 2014

WINDOWS 2008 R2 SP1 CLUSTER NAME FAILURE! DUPLICATE IP ADDRESS DETECTED

Good day, all,

I had some time today, so I have started writing up a couple of the issues we have encountered that I was involved in troubleshooting and fixing. Today I will cover how we fixed a Cluster Name failure.

We have a cluster with two active nodes, and we found that after a failover to the other node the Cluster IP Address and Cluster Name did not come online.

I logged on to the other node and tried to bring the cluster online, still with no luck. So I started checking the event logs, and everything was clean except for the resource failures.
I then generated the cluster logs on both nodes using cluster.exe log /g and started to investigate further.
At first it was still not clear why the cluster resources didn't come online, so I hopped on to the other node and started looking for any ERR entries in the cluster logs. At some point I saw something about a duplicate IP address. I didn't give it much attention, because this was not a newly built cluster and it had been working for years, so I moved on to search for other errors.
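The ERR search described above can be roughly sketched in Python. This is only an illustration and not the tool I used at the time: the function name find_suspect_lines and the sample lines are made up, and it assumes the log level appears in each line as a separate whitespace-delimited token such as ERR.

```python
from typing import Iterable, List, Sequence, Tuple


def find_suspect_lines(
    lines: Iterable[str],
    level: str = "ERR",
    keywords: Sequence[str] = ("duplicate",),
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines at `level` that mention a keyword.

    Assumption: the level is a standalone token in the line, so a word like
    "ERRATIC" does not count as an ERR entry.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if level not in line.split():
            continue  # not an entry at the requested level
        lowered = line.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Made-up lines for illustration only; real cluster log entries look different.
    sample = [
        "INFO made-up entry: resource online",
        "ERR made-up entry: Duplicate address reported",
        "ERR made-up entry: some other failure",
    ]
    for number, text in find_suspect_lines(sample):
        print(number, text)
```

To use it on a real log, pass an open file object for the generated cluster log instead of the sample list. The point is that a keyword filter on ERR entries would have put the duplicate IP message in front of me straight away, instead of letting me skim past it.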

After some more troubleshooting we still had no luck. Then I saw our monitoring alert pop up for this server, reporting a duplicate IP address. That got me puzzled about why it would do that, so I started checking the IPs.

I checked NSLOOKUP, DNS, and a PING test; all came back clean.

I said to myself that the server must be seeing a duplicate IP somewhere, so I opened the TCP/IP Properties.

To my surprise, I saw that the Cluster IP Address had been added as a secondary IP address for that server. Whenever we moved the resource to the other node, this IP address moved along with it and was added as a secondary IP address, on both nodes.
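The same check can be expressed as a small Python sketch: given the addresses configured in each node's TCP/IP Properties, report which nodes also hold the cluster IP. Everything here is hypothetical, including the function name, the node names NODE1 and NODE2, and the 192.0.2.x documentation addresses; the address lists are assumed to be collected by hand from each node.

```python
import ipaddress
from typing import Dict, Iterable, List


def nodes_holding_cluster_ip(
    cluster_ip: str,
    node_addresses: Dict[str, Iterable[str]],
) -> List[str]:
    """Return the nodes whose own adapter configuration lists the cluster IP.

    Addresses are parsed with the ipaddress module so that they are compared
    as addresses rather than as raw strings.
    """
    target = ipaddress.ip_address(cluster_ip)
    return sorted(
        node
        for node, addresses in node_addresses.items()
        if any(ipaddress.ip_address(address) == target for address in addresses)
    )


if __name__ == "__main__":
    # Hypothetical inventory: each node's primary and secondary addresses.
    inventory = {
        "NODE1": ["192.0.2.11", "192.0.2.50"],
        "NODE2": ["192.0.2.12", "192.0.2.50"],
    }
    print(nodes_holding_cluster_ip("192.0.2.50", inventory))  # ['NODE1', 'NODE2']
```

An empty list is what you want to see here, since the cluster IP should be managed by the cluster's IP Address resource and not set on the adapter itself.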

I have worked on and configured many clusters, and it is still not clear to me why someone would add the Cluster IP Address to the TCP/IP Properties; I have no idea what exactly this would achieve. If someone out there has a reason, please feel free to reply.

I then removed the IP address from the TCP/IP Properties, and everything started to work and the resources came back online.

A side note for starters: in Windows 2008 there is no way in the GUI to move the cluster group, along with the quorum disk, to another node; you need to use PowerShell or cmd.
Note: In Windows 2012 this has been fixed, and you can now move the quorum disk and the cluster group using Failover Cluster Manager.

cluster group "Cluster Group" /move:<newnode>







