Wednesday, November 22, 2017

CPU 95% spike alert - resolved!!!!

Good day All,

Welcome back! We had a strange incident where CPU alerts were being reported by the client. Our monitoring tool was not showing any high CPU usage, but the client has its own monitoring set up, and they were getting CPU spike alerts once or twice every couple of days.

We did a routine health check by reviewing the month's CPU usage log. CPU had hardly spiked to 30% during the entire month, but the client was still getting alerts, so the following steps were performed to troubleshoot further:

1. Opened Resource Monitor and kept an eye on the CPU spikes to see which service was causing them.
2. After watching closely for about an hour, I could tell that svchost.exe was using the CPU.
3. As svchost.exe is a shared process, I had to identify which service was responsible, so I ticked the check box next to the service host on the CPU tab of Resource Monitor and monitored the services under it.
4. From this I was able to identify that it was the Event Log service that was spiking the CPU.
5. So I opened Event Viewer and started checking the events. In the Security log, at least 10 to 15 events with Event ID 5156 (Filtering Platform Connection) were being generated every second, filling up the 1 GB Security log file in no time. Once the log file was full it tried to overwrite old events, but the number of events was so high that it could not keep up, and the CPU spiked.
6. Verified in Local Security Policy that the Audit Object Access policy was enabled for both Success and Failure, and that it was being enforced by GPO.
7. Starting with Windows 2008 the audit policy has changed and a lot of subcategories were added. You can list them by typing this in a command prompt: auditpol /get /category:*
8. Our security policy was to audit both Success and Failure for Filtering Platform Connection, so I had to raise an exception, and the policy was changed from Success and Failure to Failure only with the following command on the server:
auditpol /set /subcategory:"Filtering Platform Connection" /success:disable /failure:enable
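To check a change like this across several servers, the command can be generated and the result verified from a script. Below is a minimal Python sketch, not the method used in this post. The helper names are made up for illustration, and the sample text only imitates the two-column layout that auditpol /get prints, so treat the parsing rule as an assumption to verify against real output.

```python
import re

def build_auditpol_set(subcategory, success, failure):
    """Return the auditpol command line for one subcategory (hypothetical helper)."""
    def state(on):
        return "enable" if on else "disable"
    return (f'auditpol /set /subcategory:"{subcategory}" '
            f'/success:{state(success)} /failure:{state(failure)}')

def parse_auditpol_get(output):
    """Map subcategory name -> setting from `auditpol /get` style text.

    Assumes subcategory rows are indented and that the name and the
    setting are separated by two or more spaces.
    """
    settings = {}
    for line in output.splitlines():
        if not line.startswith("  "):
            continue  # skip the header and the category lines
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings

# Illustrative sample only, not captured from the server in this post.
SAMPLE_OUTPUT = """System audit policy
Category/Subcategory                      Setting
Object Access
  Filtering Platform Packet Drop          No Auditing
  Filtering Platform Connection           Success and Failure
"""

if __name__ == "__main__":
    print(build_auditpol_set("Filtering Platform Connection", False, True))
    print(parse_auditpol_get(SAMPLE_OUTPUT))
```

Running the parser before and after the change gives a quick way to confirm that Filtering Platform Connection went from Success and Failure to Failure only.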

Voila, the issue was resolved and we didn't see any more spikes.

Hopefully this helps someone. Until the next one, you all have a good day!!!

Preparing to configure windows - Do not turn off your computer

Good day All,

Welcome back! Today I will share an issue we encountered on a Windows 2008 server after patching.

After patching, when we logged in to the console, the server sat on the "Preparing to configure Windows. Do not turn off your computer" screen for a very long time.

Troubleshooting steps performed:

1. Rebooted the server into safe mode and tried to uninstall the installed patches, but it showed an error and the patches could not be uninstalled.
2. Tried Last Known Good Configuration; that didn't work.
3. dism.exe /online /cleanup-image /scanhealth didn't help.

So to fix the issue we rebooted the server and, while it was sitting on that screen, connected with an MMC console to check whether all services were running fine or anything was stuck in a stopping state.

On verification we found that the antivirus scan service was in a stopping state. We tried pskill to kill the service but had no luck, so we changed the service's startup type to Manual and then rebooted.
Voila, we were able to log back in. We then ran Windows Update, the patches got installed, and we started the service again.

I have seen people across the internet suggest removing the pending.xml file under C:\windows\winsxs, so I am just sharing that in case the above steps don't work for someone.

Hopefully this helps someone. Until the next one, you all have a good day!!!

Friday, November 17, 2017

Linux blade after firmware upgrade: NICs disconnected, or unable to reboot or power off

Good day All,

Welcome back! Recently the Unix team started doing firmware upgrades on the blades, and when the servers were rebooted after the upgrade a couple of issues were reported:

1. All the NICs would be in a disconnected state.
2. The blade would not respond, and could not be hard reset or powered off.

The following steps were performed to fix the issue:

1. Reset the blade:

  • Log in to the OA and make a note of the bay the blade is in.
  • Now open PuTTY and connect to the primary OA IP.
  • Type reset server <bay number>.

It will ask you to confirm, and then it will show that the reset completed successfully.

2. If resetting the blade fails, log in to Virtual Connect Manager. Go to Profiles, select the blade's profile and click Edit. Near the bottom you will see an option to un-assign the profile from the server; select it and click OK. Now go back to the server, assign the same profile again, click Apply, and power on the blade.

Hopefully this helps someone. Until the next one, you all have a good day!!!

Sunday, October 15, 2017

C7000 - decomm 2 C7000 Frames and creating new Domain

Good day All,

Welcome back! Recently we had a request to decommission 2 C7000 frames that were part of a set of 4 linked frames.
Kindly note that, as the decommission included the primary frame in the domain, we needed to create a new domain.

Pre-tasks to be done:

1. Back up the current VC domain in case we have to revert.
2. Take complete screenshots of all server profiles, showing which NIC is configured on each server.
3. Take screenshots of the shared uplink sets, showing the ports that are configured to the uplink switches.
4. Record all Ethernet network information associated with the shared uplink sets and VLANs.
5. Decide whether you want to use the old name or a new name for the domain.
6. Decide what the IP for the new domain will be.
7. Decide who will remove the existing cables from the 4 frames and how the new stacking links will be configured for only 2 frames.
8. If direct-attached SAN is present, capture all the information needed to recreate it.
9. All blade servers will need to be shut down, and we will have to recreate all the shared uplink sets, Ethernet networks and server profiles, so plan for at least 12-14 hrs of downtime for the whole activity. You may need more if you run into issues, so plan your downtime well.

Note: All our blades use the factory-default MAC addresses, so no additional information needed to be captured. If you use Virtual Connect assigned MAC addresses instead, you need to determine which address range will be used.


Stacking links we requested to be connected:

Frame 1 Interconnect Slot 1 port 1 <-> Frame 2 Interconnect Slot 1 port 1
Frame 1 Interconnect Slot 2 port 1 <-> Frame 2 Interconnect Slot 2 port 1


During the change Window:

1. Shut down all the server blades.
2. Logged in to VC and un-assigned all the Ethernet networks from each server profile, for all profiles in all 4 frames.
3. Deleted all the Ethernet networks on all the frames.
4. Deleted all the shared uplink sets on all the frames.
5. Clicked on the enclosure and, under Configuration, deleted the 2 frames that would be reused to create the new domain. (Note: deleting the domain can also be done at this point.)

Note: You will not be able to delete the domain or an enclosure unless the blades have been shut down, the Ethernet networks have been unassigned in the server profiles, and the shared uplink sets have been deleted.

6. Asked the cabling engineer to connect the stacking links, and confirmed he was done before proceeding.

7. Logged in to the OA for the primary frame, opened the enclosure IPv4 settings, and made sure all the VC IPs and the iLO IPs for the blades were intact.

Note: If this is a brand-new domain, you will first need to configure your OA with IPs for all the VC modules and the iLOs of the server blades.

8. Connected to the IP of the VC1 module. As soon as you authenticate, you will notice that the domain configuration wizard starts.
The first step is to configure the domain name. Clicking Next asks you to import the current enclosure's modules; give the password and hit Next, and the modules should be imported. At the end, before you click Finish, you will see an option to configure the network; un-check it, as we will do that later.

9. Clicked on Configuration, went to the IP Address tab, selected Use Virtual Connect Domain IP and provided the VC domain IP.

10. You will notice that the current session is logged out and you are redirected to the new VC domain IP.

11. Logged back in and started creating the shared uplink sets. Note that all the uplinks will be in standby mode at first. Don't worry: after you add Ethernet networks and assign them to a server profile, the uplinks will come online.

12. Created the Ethernet networks.

13. Created new server profiles and assigned the Ethernet networks to the NICs as they were before.

14. Powered on the blades. The servers came back online with no issues, and there was no need to reconfigure IPs: because we used the blades' factory-default MAC addresses, no MAC address changed even though a new server profile was attached.

15. Completed the creation for all the frames and handed them over for testing.

So this is how we successfully decommissioned 2 frames in the 4-frame domain and created a new domain.

How to make sure the stacking links are connected properly:

After logging in to VC, click on Stacking Links; you will see the connection status and redundancy status both showing OK.

I have also seen people find this difficult to understand. In our screenshot of that page, the links marked in yellow are the external cable links we asked the engineer to connect; the other connections are just internal wired connections between 2 VC modules that sit next to each other in a frame.
Example: the connection from the VC1 module to the VC2 module.

Note: When more than one frame is involved, always cable the VC modules in parallel (same slot to same slot); never cross-cable them.

So this is how we successfully completed the request, and I am sharing it in the hope that it helps someone.
Until the next one, you all have a good day!!!

Monday, September 4, 2017

Basic to GPT disk

Good day All,

Welcome back! Recently we had a request to add an additional 1 TB of space to an existing 1.5 TB basic disk on a virtual machine.
A lot of you may be thinking: what is the issue here, and why do we need a post for this?
Well, I still see a lot of admins make the mistake of simply extending the disk and then struggle to understand why it will not extend beyond 2 TB.
If you read my first sentence, the answer is there: I said this is a basic disk. A basic disk using the MBR partition style can't be extended beyond 2 TB, so we need to either convert it to a GPT disk, or add a new disk and make the disks dynamic.
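The 2 TB ceiling comes from the MBR partition table, which stores sector addresses and counts in 32-bit fields. The arithmetic can be sketched in a few lines of Python. The 512-byte sector size is an assumption (it is the common logical sector size), the function names are made up for illustration, and "TB" is treated here as 1024^4 bytes, the way Windows reports disk sizes.

```python
SECTOR_BYTES = 512      # assumed logical sector size
MAX_SECTORS = 2 ** 32   # MBR keeps sector counts in 32-bit fields
TB = 1024 ** 4          # "TB" as Windows displays it (binary terabyte)

def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """Largest size an MBR partition table can address."""
    return MAX_SECTORS * sector_bytes

def needs_gpt(target_tb, sector_bytes=SECTOR_BYTES):
    """True when the requested disk size no longer fits under the MBR limit."""
    return target_tb * TB > mbr_limit_bytes(sector_bytes)

if __name__ == "__main__":
    print(mbr_limit_bytes() / TB)  # size of the MBR limit in TB
    print(needs_gpt(1.5))          # the existing disk
    print(needs_gpt(2.5))          # the disk after adding 1 TB
```

With these assumptions the existing 1.5 TB disk still fits under the limit, while the requested 2.5 TB does not, which is why the extension stops at 2 TB.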

Dynamic disk: even MS doesn't recommend this on newer OSes like 2008/2012, and disk performance is not that great.

GPT disk: this is the way to go for performance and future growth, but we will have to format the drive and restore the data.

After some discussion we decided that, as this is a file share drive, citing performance and future growth, we would go with GPT, and we wanted to come up with a plan to keep the downtime as minimal as possible.


Pre-tasks we performed:

1. Took screenshots of the file share permissions.
2. Took a registry backup, as all the file share permissions are stored there, in case we had to revert or re-apply them.
3. Created a new 2.5 TB GPT disk and attached it to another server, let's call it Server B, in the same ESXi farm.
4. Started a Robocopy batch script with the details below, copying from the source server's disk to the GPT disk on destination Server B.

ROBOCOPY Source_Path Destination_Path /E /XJ /ZB /R:2 /W:5 /LOG+:"C:\Log.txt" /IT /PURGE /COPYALL

@Echo Copying Complete
Pause 

Syntax:
/E :: copy subdirectories, including Empty ones.
/XJ :: eXclude Junction points (normally included by default).
/ZB :: use restartable mode; if access denied use Backup mode.
/R:n :: number of Retries on failed copies: default 1 million.
/W:n :: Wait time between retries: default is 30 seconds.
/LOG+:file :: output status to LOG file (append to existing log).
/IT :: Include Tweaked files.
/PURGE :: delete destination files/dirs that no longer exist in source.
/COPYALL :: COPY ALL file info (equivalent to /COPY:DATSOU); includes all security permissions.


5. Copying the 1.5 TB of data took about 15 hrs.
6. A day before cutover we ran one more incremental Robocopy to sync all the new changes; it took about 30 minutes.
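Since the same copy is run three times (initial copy, incremental sync, final sync), it can help to wrap it so each run is checked the same way. The Python sketch below is only an illustration of that idea, not the batch script we used. The function names and the example paths are hypothetical. The one fact it relies on is that Robocopy's exit code is a bit mask in which values of 8 and above mean at least one failure.

```python
def robocopy_args(source, destination, log_path=r"C:\Log.txt"):
    """Build the argument list for the copy described above (hypothetical helper)."""
    return [
        "robocopy", source, destination,
        "/E", "/XJ", "/ZB", "/R:2", "/W:5",
        f"/LOG+:{log_path}", "/IT", "/PURGE", "/COPYALL",
    ]

def copy_succeeded(exit_code):
    """Robocopy exit codes below 8 mean no copy failures were reported."""
    return exit_code < 8

if __name__ == "__main__":
    # Example paths are placeholders, not the ones from this migration.
    args = robocopy_args(r"F:\Shares", r"\\ServerB\G$\Shares")
    print(" ".join(args))
    # On the source server the command could then be run with:
    #   import subprocess
    #   result = subprocess.run(args)
    #   print(copy_succeeded(result.returncode))
```

Because /PURGE deletes anything on the destination that is missing from the source, it is worth double-checking the two paths before every run.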

Steps performed during the cut-over:

1. Went to Shares and closed all the open shares for the drive.
2. Initiated a final sync so that we didn't miss any new changes; it took about 15 minutes.
3. Removed the 1.5 TB disk under Edit Settings in the properties of the source VM.
4. Removed the 2.5 TB disk from the destination VM and noted down its path.
5. On the source server, under Edit Settings, added the 2.5 TB disk using that path.
6. Went to Disk Management on the source server and rescanned for the new drive.
7. It was automatically assigned a new drive letter, E.
8. So we changed the drive letter from E to the original F, and all the share permissions were applied to the drive.

I have seen a lot of people get confused here, so please note: Robocopy only carries the NTFS security permissions. Share permissions are not copied, and if needed you have to assign them to the shares manually.

As the registry held all the sharing details, the share permissions were picked up automatically as soon as we changed the drive letter, and we didn't have to set anything.

The whole downtime for the cut-over steps was about 45 minutes, and then the server was back up.

If anyone has a better way of doing it, please share.
So we are at the end of this article. Hopefully this helps someone. Until the next one, you all have a good day!!!

Friday, September 1, 2017

Unable to change Audit settings in Local group policy even though the settings are not governed by Group policy

Good day!

Welcome back! As part of a non-compliance finding, our security team asked me to enable the Audit Policy settings below for Success/Failure in Local Group Policy.

Audit system events
Audit process tracking
Audit Policy change
Audit object access

When we tried to enable Success/Failure it seemed to work, but after closing the settings and going back to recheck, the boxes were unchecked again.

So the first thing we checked was whether the settings were bound by Group Policy, which they were not. Besides, if a setting is bound by Group Policy it will not even let you change it; you clearly get an error saying it can't be changed because it is enforced by Group Policy.

For starters, if you don't know: starting with 2008, MS introduced Advanced Audit settings, which you can enable using the auditpol command.
Below is the list of categories and subcategories for auditing.

Advanced audit policy subcategories:

Audit system events

Category: System

  Security System Extension            
  System Integrity                    
  IPsec Driver                        
  Other System Events                  
  Security State Change                

Audit process tracking

Category: Detailed Tracking

Sub-Category:

  Process Creation                      
  Process Termination                  
  DPAPI Activity                        
  RPC Events                            
  Plug and Play Events                  

Audit privilege use

Category: Privilege Use

  Non Sensitive Privilege Use          
  Other Privilege Use Events            
  Sensitive Privilege Use              

Audit Policy change

Category: Policy Change

  Authentication Policy Change          
  Authorization Policy Change          
  MPSSVC Rule-Level Policy Change      
  Filtering Platform Policy Change      
  Other Policy Change Events            
  Audit Policy Change                  

Audit object access

Category: Object Access

  File System                          
  Registry                              
  Kernel Object                        
  SAM                                  
  Certification Services                
  Application Generated                
  Handle Manipulation                  
  File Share                            
  Filtering Platform Packet Drop        
  Filtering Platform Connection        
  Other Object Access Events            
  Detailed File Share                  
  Removable Storage                    
  Central Policy Staging                

Audit Logon events

Category: Logon/Logoff

  Logon                                
  Logoff                                
  Account Lockout                      
  IPsec Main Mode                      
  IPsec Quick Mode                      
  IPsec Extended Mode                  
  Special Logon                        
  Other Logon/Logoff Events            
  Network Policy Server                
  User / Device Claims                  


Audit directory service access

Category: DS Access

  Directory Service Changes            
  Directory Service Replication        
  Detailed Directory Service Replication
  Directory Service Access              

Audit account management

Category: Account Management

  User Account Management              
  Computer Account Management          
  Security Group Management            
  Distribution Group Management        
  Application Group Management          
  Other Account Management Events      

Audit account logon events

Category: Account Logon

  Kerberos Service Ticket Operations    
  Other Account Logon Events            
  Kerberos Authentication Service      
  Credential Validation                

How to enable a category:

Example:

auditpol /set /category:"Account Logon" /success:enable /failure:enable
auditpol /set /category:"Logon/Logoff" /success:enable /failure:enable


If you don't want to enable all the audit settings in a category, you can enable just a subcategory.

Example:

auditpol /set /subcategory:"Credential Validation" /success:enable /failure:enable
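For the four policies the security team asked for, the legacy policy names map onto advanced categories as listed above, so the matching auditpol commands can be generated rather than typed by hand. This is a small Python sketch under that assumption. The dictionary and function names are made up for illustration, and only the four requested policies are included.

```python
# Legacy Audit Policy name -> advanced category, taken from the list above.
LEGACY_TO_CATEGORY = {
    "Audit system events": "System",
    "Audit process tracking": "Detailed Tracking",
    "Audit policy change": "Policy Change",
    "Audit object access": "Object Access",
}

def auditpol_commands(legacy_policies):
    """Return one auditpol /set command per requested legacy policy."""
    commands = []
    for policy in legacy_policies:
        category = LEGACY_TO_CATEGORY[policy]
        commands.append(
            f'auditpol /set /category:"{category}" '
            "/success:enable /failure:enable"
        )
    return commands

if __name__ == "__main__":
    for command in auditpol_commands(LEGACY_TO_CATEGORY):
        print(command)
```

Enabling a whole category turns on every subcategory under it, so for noisy categories such as Object Access it may be better to enable only the subcategories you need.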

So if you want to use audit policy subcategories, the "Force audit policy subcategory settings" security option has to be enabled.

Coming back to my issue: when I tried to change the settings under Audit Policy, it would not let me because the advanced audit policy was in force, and in that state any setting has to be enabled using the auditpol command only.

The problem was that our security team's tool only looked at the Audit Policy to see whether the settings were enabled, and had no clue about the advanced audit subcategories. Even though we had the settings enabled there, the tool did not see them.

So to fix the problem I had to disable the "Force audit policy subcategory settings" option, enable all the settings in Audit Policy, and then enable the option again.

Hopefully this helps someone. Until the next one, you all have a good day!!!

Tuesday, August 29, 2017

Blade movement from 1 Frame to another in a linked Frames

Good day All,

Welcome back! We had a recent requirement to move a blade from one frame to another within a set of 4 linked frames, and the following steps were performed.

Pre-plan:
1. Note the iLO IP details.
2. If you have to create a new profile, all the NICs' VLAN information needs to be noted.
3. Verify whether the blade is using VC-assigned MAC addresses and WWNs or the server defaults, and note the MAC addresses and WWNs.
4. Make sure the VLANs used in the current frame are already in place on the new frame, which means verifying that Ethernet networks for the same VLANs exist there.
5. If a SAN fabric is present, make sure the same is present in the new frame as well.
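The VLAN check in the pre-plan is easy to get wrong by eye when a profile has many NICs. If the VLAN IDs are written down during the pre-plan, a comparison like the Python sketch below can flag anything missing. Everything here is hypothetical: the function name, the network names and the VLAN IDs are placeholders, and the data is assumed to be typed in from your own notes rather than pulled from Virtual Connect.

```python
def missing_vlans(profile_vlans, new_frame_networks):
    """Return VLAN IDs used by the blade's profile that have no
    Ethernet network in the new frame."""
    available = set(new_frame_networks.values())
    return sorted(set(profile_vlans) - available)

if __name__ == "__main__":
    # Placeholder values noted during the pre-plan.
    profile_vlans = [10, 20, 30]
    new_frame_networks = {"Net_A": 10, "Net_B": 30}
    print(missing_vlans(profile_vlans, new_frame_networks))
```

An empty result means every VLAN the blade uses already has an Ethernet network in the new frame; anything listed needs a network created before the move.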

Steps performed:
Note:
Our frames are an old setup in which the SAN is connected through an MDS fiber switch to 2 VC modules in bays 3 and 4. For starters, this setup is like a physical server connected to external fabric switches, except that the connection is internal; that's it.
Also, the VC profile was set up to use the blade's own MAC addresses and WWNs.

1. The source server was powered down.
2. The existing profile was unassigned.
3. The iLO IP was unchecked in the old frame and assigned in the new frame in the OA.
4. After the move, as the frames are linked, the existing profile was assigned to the blade's new location in the new frame.
5. Before powering on, the NICs' VLAN networks were changed so that all the NICs use the new frame's uplinks for connectivity.
6. After the change, the server was powered on.
7. The NIC MAC addresses for the blade didn't change, and all the IPs etc. were intact.
8. The blade had a QLogic mezzanine card attached to the MDS fiber switch; the WWNs were intact when we moved to the new frame, and no re-zoning was required.
9. Post validation was done.

Hopefully this helps someone and until next one you all have a good day!!!