Saturday, 12 May 2018

Active Directory is unavailable after disaster recovery fail-over

Active Directory is unavailable after a fail-over to the recovery site.

The customer has two domain controllers that are replicated to a recovery site using Veeam Backup & Replication.

During a DR test fail-over, Active Directory on both DCs would be available for only a few minutes before it stopped working.

Tests like NETDOM QUERY FSMO and NLTEST state the domain is unavailable. NET SHARE shows the SYSVOL and NETLOGON shares are missing.

After a restore or replication, Active Directory detects that this has happened, attempts to protect itself and effectively goes into a 'safe mode' of sorts.

The steps below outline what needs to be done to recover from this. These steps apply to domain controllers using legacy NTFRS replication, not DCs using DFSR. You can use dfsrmig.exe /getglobalstate to check whether you are using NTFRS or DFSR.
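
If you want to check before you start, run this from an elevated CMD prompt on one of the DCs:

dfsrmig.exe /getglobalstate

A global state of 'Eliminated' means SYSVOL has been migrated to DFSR and the steps below don't apply; a state of 'Start' (or a message saying DFSR migration has not been initialised) means SYSVOL is still on NTFRS. The exact wording varies between domain states, so treat this as a rough guide.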


Step 1 - Power on both DCs and wait for the automatic reboot. Until then you can't log in and will see "No domain controllers available".
Step 2 - On DC1 (or whichever DC holds the FSMO roles), type NET SHARE and confirm that the SYSVOL and NETLOGON shares are missing. Also check that the domain is unavailable with NETDOM QUERY FSMO.
Step 3 - On DC1, from CMD run "start sysvol" and make a backup of C:\windows\sysvol\domain\policies and C:\windows\sysvol\domain\scripts.
Step 4 - NET STOP NTFRS on both DCs
Step 5 - On DC1, set the BurFlags value to D4 under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID (see the example after these steps).
Step 6 - On DC1 NET START NTFRS
Step 7 - On DC1, check Event Viewer for event ID 13516 in the File Replication Service log, stating that the service is no longer preventing the server from becoming a domain controller.
Step 8 - On DC1, from CMD run "start sysvol" (the folder should be empty).
Step 9 - Copy the backed-up Scripts and Policies folders to C:\windows\sysvol\domain on DC1.
Step 10 - On DC1, from CMD run "start sysvol" and check that Scripts and Policies exist with recent time stamps.
Step 11 - On DC0, check that NTFRS is still stopped.
Step 12 - On DC0, set the BurFlags value to D2 under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID.
Step 13 - On DC0, NET START NTFRS.
Step 14 - On DC0, check Event Viewer for event ID 13516 in the File Replication Service log.
Step 15 - Type NET SHARE on both DCs and check that the SYSVOL and NETLOGON shares exist. Restart the NETLOGON service if the NETLOGON share is missing.

Step 16 - Type NETDOM QUERY FSMO and make sure that both DCs report the same FSMO role holders.
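
If you prefer to set the BurFlags values from the command line rather than regedit, something like the following should work. This is a sketch: replace <replica-set-GUID> with the GUID sub-key that already exists under Cumulative Replica Sets on your DCs (there is normally just one, for the domain system volume), and run it while NTFRS is stopped.

REM On DC1 (authoritative restore):
reg add "HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\<replica-set-GUID>" /v BurFlags /t REG_DWORD /d 0xD4 /f

REM On DC0 (non-authoritative restore):
reg add "HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\<replica-set-GUID>" /v BurFlags /t REG_DWORD /d 0xD2 /f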

Note that these steps differ from the ones detailed in the Microsoft KB article below, which sets the BurFlags value under the Backup/Restore key; in my steps the BurFlags value sits under Cumulative Replica Sets.

https://support.microsoft.com/en-gb/help/290762/using-the-burflags-registry-key-to-reinitialize-file-replication-servi


Wednesday, 4 April 2018

Windows cannot find the Microsoft Software License Terms when installing Windows 2012 on a Dell server



When trying to deploy Windows Server onto a Dell server you see this error message.



It turns out that this is caused by iDRAC firmware version 2.52.52.52. You can work around it by selecting Core mode and adding the GUI features later, or by downgrading the iDRAC firmware.
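
If you go the Core-first route, the GUI can be added back afterwards from PowerShell. A sketch only - if the feature payload isn't on disk you will need to point -Source at the install media, and the WIM index shown here is just an example:

Install-WindowsFeature Server-Gui-Mgmt-Infra, Server-Gui-Shell -Restart
# If the payload has been removed, specify the source WIM, e.g.:
# Install-WindowsFeature Server-Gui-Mgmt-Infra, Server-Gui-Shell -Restart -Source wim:D:\sources\install.wim:2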

Lifecycle Controller Release Notes: https://downloads.dell.com/FOLDER04830652M/1/iDRAC_2_52_52_52_Release_Notes_A00.pdf
Deployment of Windows Server operating systems (OS) using LC may fail with one of the following messages:
  • Windows installation cannot continue because a required driver could not be installed.
  • Product key required.
  • Windows cannot find the software license terms.
This happens when the Windows setup copies the driver to the scratch space (X: drive) and the scratch space becomes full. To resolve this issue, do any of the following:
  • Remove all the installed add-on devices before starting the OS installation. After the OS installation is complete, connect the add-on devices and manually install the remaining drivers using Dell Update Packages (DUPs).
  • To avoid physically removing the hardware, disable the PCIe slots in the BIOS.
  • Increase the scratch space size beyond 32 MB using the DISM set-scratchspace command when creating a customized deployment (sketched below).
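
For that last option, here is a minimal sketch of increasing the WinPE scratch space when you are building your own customised boot image (the mount and WIM paths are examples only; this applies to your own deployment image, not to the WinPE embedded in the Lifecycle Controller):

REM Mount the WinPE boot image (example paths)
Dism /Mount-Image /ImageFile:C:\WinPE_amd64\media\sources\boot.wim /Index:1 /MountDir:C:\WinPE_amd64\mount
REM Raise the scratch space from the default 32 MB to 512 MB
Dism /Image:C:\WinPE_amd64\mount /Set-ScratchSpace:512
REM Commit the change and unmount
Dism /Unmount-Image /MountDir:C:\WinPE_amd64\mount /Commit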

Tuesday, 27 March 2018

VM slow after P2V conversion

A VM was running very slowly after a P2V conversion using Disk2vhd. I removed all the hidden devices and even tried removing the antivirus solution.

I found out that the original physical server had 2 CPUs, but when I built the VM and attached the VHDX files I had only added 1 vCPU.

Fix: Shut down the VM and add another vCPU.
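
If you'd rather do it from PowerShell than the Hyper-V console, something like this should do it (a sketch - the VM name is just an example):

# Shut down the VM, give it a second vCPU, then start it again
Stop-VM -Name "MigratedServer"
Set-VMProcessor -VMName "MigratedServer" -Count 2
Start-VM -Name "MigratedServer"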

Sunday, 4 February 2018

Virtual Exchange server becomes unresponsive after Hyper-V Integration Services update

I've seen this three times now, and this issue affects both Exchange 2013 and Exchange 2016 installations.



The long and short of it is that if you have a Windows 2012 R2 virtual machine with Exchange 2013 or 2016 installed and you upgrade the Hyper-V Integration Services, the VM gets stuck on the next reboot. After about 30 minutes the login window appears, but after logging in the VM runs like a dog.

The fix is simple but painfully slow.


  1. Shut down the VM; you might have to reset it.
  2. Attach a Windows 2012 R2 ISO to the VM
  3. Start the VM and press a key to boot from the ISO
  4. Select the Repair Windows option and get to the CMD prompt
  5. Next alter the boot configuration so you have the option to boot into safe mode.
  6. At the CMD prompt type:
    bcdedit /set {bootmgr} timeout 15
  7. Reboot the VM and remove the ISO
  8. You might find Windows wants to back out the failed updates from the previous boot.
  9. At the Windows Boot manager select Safe Mode.
  10. Once Windows boots you will need to log in and disable the Exchange services.
  11. Open a PowerShell prompt and enter this command (see the note after these steps for a variant that records the original startup types first).
    get-service -Name MSE* | ?{$_.Starttype -eq "Automatic"} | Set-Service -StartupType Disabled
  12. Now reboot the VM; you can ignore the boot menu.
  13. After Windows starts and you have logged in, you can insert the Integration Services disk again and apply the update. Note that you might find Windows plays up a bit as Exchange isn't running now.
  14. Reboot the server; this time it shouldn't get stuck at "Getting Windows Ready".
  15. After you have restarted and logged in you can enable the Exchange services once more by entering this command.
    get-service -Name MSE* | ?{$_.Starttype -eq "Disabled"} | Set-Service -StartupType Automatic
  16. One last reboot and everything should be back to normal.
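
A small variant on steps 11 and 15 that you might prefer: record which services were actually set to Automatic before you disable them, then re-enable only those, so anything that was deliberately left Manual or Disabled stays that way. A rough sketch along the same lines as the commands above (I've used the full MSExchange* prefix, and the temp file path is just an example):

# Step 11 variant: note which Exchange services are set to Automatic, then disable them
Get-Service -Name MSExchange* | Where-Object { $_.StartType -eq "Automatic" } |
    Select-Object -ExpandProperty Name | Set-Content "$env:TEMP\ExchangeAutoServices.txt"
Get-Content "$env:TEMP\ExchangeAutoServices.txt" | ForEach-Object { Set-Service -Name $_ -StartupType Disabled }

# Step 15 variant: re-enable only the services recorded earlier
Get-Content "$env:TEMP\ExchangeAutoServices.txt" | ForEach-Object { Set-Service -Name $_ -StartupType Automatic }
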
Microsoft has a lot to answer for here; the first time you see this it's really scary stuff.


Thursday, 7 December 2017

Using Synology as a Veeam Linux backup repository

I have a new customer that purchased Veeam Backup and Replication from me. The customer wanted to use their Synology RS815RP+ as a Veeam Linux Backup Repository. 

There is loads of information on Google about the tweaks you have to do to make Veeam see the Synology; however, most of it is not relevant to v6.x of the Synology software.

From the Synology Web GUI

  • Control Panel/Terminal & SNMP: Enable SSH
  • Click on "Advanced Settings" and change the cipher strength to "Low" (Veeam doesn't support medium or strong strength ciphers).
  • Control Panel/User/Advanced: Enable the user home service.
  • Package Centre: Search for Perl. It will be listed under "Third Party". Click install.

From here you can try to add the Synology as a Linux repository using the Synology admin account. If Veeam hangs then there is an issue; when it's working it normally finishes the add task in 15-30 seconds. You do NOT need to elevate to root.

If the above doesn't work you can try using the root account on the Synology.

There are mixed messages about whether you can use the 'root' password or whether you should use SSH keys. I have configured both as a belt-and-braces approach. To change the root password, use the instructions below.

  • SSH to the machine as the "admin" user.
  • Enter the command sudo -s and provide the admin password.
  • Enter the command "synouser -setpw root password", where password is the root password you want.
If that doesn't work, you can try using SSH keys to connect. Try this blog post.

http://karlcode.owtelse.com/blog/2015/06/27/passwordless-ssh-on-synology/

You then need to add the Linux repository in Veeam and specify the key.
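
If you go the key route, the rough shape of it is below. This is a sketch only - run it from any box with an OpenSSH client, the hostname is an example, and depending on the DSM version you may also need to fix permissions on the home directory and authorized_keys file as described in the blog post above.

# Generate a key pair (no passphrase here, purely for simplicity)
ssh-keygen -t rsa -b 4096 -f ~/.ssh/synology_veeam -N ""

# Copy the public key to the account you will use on the Synology
ssh-copy-id -i ~/.ssh/synology_veeam.pub admin@synology.example.local

# Test the key-based login before pointing Veeam at it
ssh -i ~/.ssh/synology_veeam admin@synology.example.local

In Veeam, add the Linux credentials using the private key option and point them at the synology_veeam private key file.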

Here is a list of things I found not to be of any relevance.

  • No need to alter the veeam_soap.tar file for the "-x VMFS" setting
  • No need to enable SFTP on the Synology.
  • No need to enable NFS
  • Not relevant http://blog.millard.org/2014/11/use-synology-as-veeam-b-linux-repository.html
  • Not relevant http://blog.millard.org/2014/11/repair-synology-dsm51-for-use-as-linux.html
  • No need to add admin ALL = NOPASSWD: ALL to /etc/sudoers

 

Monday, 16 October 2017

RAMdisk full after ESXi 6.5 upgrade with EqualLogic MEM 1.5

I recently performed a vSphere 6.5 upgrade for a good customer of mine. The upgrade went as planned. We checked the EqualLogic support website, saw that MEM 1.5 had been released, and decided to update this too using VMware Update Manager.

Over the weekend that followed the customer contacted me to say that VeeamOne was reporting issues on all four hosts.

Alarm: ESX(i) host storage warning
Time: 14/10/2017 09:50:11
Details: Fired by event: esx.problem.visorfs.ramdisk.full
Event description: The ramdisk 'root' is full. As a result, the file /scratch/log/vpxa.log could not be written.

I used this VMware KB article to troubleshoot.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2001550

I used vdf -h to see the partition utilization and could see straight away that the Root partition was indeed full.



If the root partition becomes full, vMotion can fail, the host can become slow and unresponsive, or even worse, it can PSOD!

Next I looked in the VMKernel log.

[root@esxi1:~] tail /var/log/vmkwarning.log
2017-10-14T08:51:20.082Z cpu31:65816)WARNING: VisorFSRam: 353: Cannot extend visorfs file /scratch/log/equallogic/ehcmd.log because its ramdisk (root) is full.


The VMkernel was telling me that the EqualLogic MEM log couldn't be written. Given the MEM module was the last thing to be updated, I suspected this was the issue.

[root@esxi1:/scratch/log/equallogic] ls -lah
total 24400
d-wxrw--wt 1 root root 512 Oct 13 10:43 .
drwxr-xr-x 1 root root 512 Oct 14 08:42 ..
-rw-rw-rw- 1 root root 23.8M Oct 14 10:06 ehcmd.log


An 'ls' showed that the log was almost 24 MB in size; given the root RAMdisk is only 32 MB, this is a problem.
I used WinSCP to copy the file off the ESXi host for further examination. To free up some space and get us out of the immediate issue, I removed the log using the command below.

rm /scratch/log/equallogic/ehcmd.log

I then used vdf -h again and root usage was back down to a normal value.


Sadly after 24hrs the problem was back!

There are two options here: the first is to limit the EqualLogic log size from the default of 50 MB to something more like 15 MB; the second is to move the /scratch area.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033696

I created a folder on VMFS6-1 called .locker-esxi1 and used an SSH session to configure the path using the commands below. Note that the hosts need a reboot for the change to take effect:

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/VMFS6-1/.locker-esxi1


vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation
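
If you'd rather do this from PowerCLI (handy with four hosts to change), the same advanced setting can be updated remotely. A sketch, assuming PowerCLI is installed, you are connected with Connect-VIServer, and the .locker folder already exists on the datastore for each host:

# Point one host's persistent scratch location at its datastore folder
Get-AdvancedSetting -Entity (Get-VMHost -Name "esxi1") -Name "ScratchConfig.ConfiguredScratchLocation" |
    Set-AdvancedSetting -Value "/vmfs/volumes/VMFS6-1/.locker-esxi1" -Confirm:$false
# Repeat for the other hosts (each needs its own folder), then reboot each host in turn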


Fortunately the customer has VeeamOne; otherwise I'm not sure how this problem would have been picked up, as by default vCenter doesn't alert on this.


Tuesday, 26 September 2017

Hyper-V host freezes until LiveMigration finishes

I recently had to rebuild a pair of Windows 2012 Hyper-V hosts that had been 'fiddled' with. The hosts were fully patched and the latest drivers and firmware had been applied.
During a LiveMigration from either host to the other, the RDP session would freeze up and typing would be almost impossible. Once the LiveMigration completed the host would return to normality; virtual machines were unaffected.

The network was made up of a pair of Broadcom 57810s teamed together using LBFO, configured as Switch Independent with Hyper-V Port as the load-balancing algorithm.

Although the hosts were laggy and frozen, you could see in Task Manager that the LiveMigration network was only using between 2-3 Gbps! However, the CPUs showed low utilisation of 6-10%.

I checked the Power Profile in the BIOS and made sure it was set to 'Performance' but this made no difference.

We configured the two BCM57810s with jumbo frames (9014 bytes), set the LiveMigration vEthernet adapter to 9014 bytes as well, and the migrations then hit 9-10 Gbps! Furthermore, the lagginess and freezing had gone!


Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
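
The command above covers the physical team members; the LiveMigration vEthernet adapter needs the same treatment, and it's worth verifying end to end. A sketch - the vEthernet adapter name and the target IP are examples from my setup:

# Set jumbo frames on the host's LiveMigration vEthernet adapter
Set-NetAdapterAdvancedProperty -Name "vEthernet (LiveMigration)" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

# Check what has actually been applied
Get-NetAdapterAdvancedProperty -Name "NIC1","NIC2","vEthernet (LiveMigration)" -RegistryKeyword "*JumboPacket"

# Confirm a 9000-byte frame passes without fragmenting (8972 = 9000 minus IP and ICMP headers)
ping -f -l 8972 10.0.0.2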