Monday, 16 October 2017

RAMdisk full after ESXi 6.5 upgrade with EqualLogic MEM 1.5

I recently performed a vSphere 6.5 upgrade for a good customer of mine. The upgrade went as planned. We checked the EqualLogic support website, saw that MEM 1.5 had been released, and decided to update this too using VMware Update Manager.

Over the weekend that followed, the customer contacted me to say that VeeamOne was reporting issues on all four hosts.

Alarm: ESX(i) host storage warning
Time: 14/10/2017 09:50:11
Details: Fired by event: esx.problem.visorfs.ramdisk.full
Event description: The ramdisk 'root' is full. As a result, the file /scratch/log/vpxa.log could not be written.

I used this VMware KB article to troubleshoot.

I used vdf -h to see the ramdisk utilization and could see straight away that the root ramdisk was indeed full.

A full root ramdisk can make vMotion fail, make the host slow and unresponsive or, even worse, cause a PSOD!
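If you want to check this across several hosts, the ramdisk section of vdf -h is easy to parse. The Python sketch below assumes the column layout shown in its sample (Ramdisk, Size, Used, Available, Use%, Mounted on); the sample rows are hypothetical rather than output captured from the affected hosts, and the 90% threshold is an arbitrary choice.

```python
"""Flag nearly-full ESXi ramdisks from `vdf -h` output (illustrative sketch).

Assumes the ramdisk section of `vdf -h` has the columns
    Ramdisk  Size  Used  Available  Use%  Mounted on
The sample below is hypothetical, not output from the affected hosts.
"""

SAMPLE_VDF = """\
-----
Ramdisk                   Size      Used Available Use% Mounted on
root                       32M       32M        0B 100% --
etc                        28M      240K       27M   0% --
tmp                       256M        4K      255M   0% --
"""


def full_ramdisks(vdf_output: str, threshold: int = 90) -> list[tuple[str, int]]:
    """Return (ramdisk name, Use% value) for every ramdisk at or above threshold."""
    flagged = []
    in_ramdisk_section = False
    for line in vdf_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Ramdisk":
            # Header row: the rows after it describe individual ramdisks.
            in_ramdisk_section = True
            continue
        # Column 5 (index 4) is Use%, e.g. "100%".
        if in_ramdisk_section and len(fields) >= 5 and fields[4].endswith("%"):
            name, use_pct = fields[0], int(fields[4].rstrip("%"))
            if use_pct >= threshold:
                flagged.append((name, use_pct))
    return flagged


if __name__ == "__main__":
    for name, pct in full_ramdisks(SAMPLE_VDF):
        print(f"WARNING: ramdisk '{name}' is {pct}% full")
```

The same function could be fed the text of vdf -h captured over SSH from each host.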

Next I looked in the VMkernel warning log.

[root@esxi1:~] tail /var/log/vmkwarning.log
2017-10-14T08:51:20.082Z cpu31:65816)WARNING: VisorFSRam: 353: Cannot extend visorfs file /scratch/log/equallogic/ehcmd.log because its ramdisk (root) is full.

The VMkernel was telling me that the EqualLogic MEM log couldn't be written. Given that the MEM module was the last thing to be updated, I suspected this was the issue.

[root@esxi1:/scratch/log/equallogic] ls -lah
total 24400
d-wxrw--wt 1 root root 512 Oct 13 10:43 .
drwxr-xr-x 1 root root 512 Oct 14 08:42 ..
-rw-rw-rw- 1 root root 23.8M Oct 14 10:06 ehcmd.log

An ls showed that the log was almost 24 MB in size. Given that the root ramdisk is only 32 MB, this is a problem.
I used WinSCP to copy the file off the ESXi host for further examination. To free up some space and get us out of the immediate issue, I removed the log using the command below.

rm /scratch/log/equallogic/ehcmd.log

I then used vdf -h to check the usage, which was back down to a normal level.
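Here a quick ls was enough to find the file eating the ramdisk, but on a busier directory tree a recursive size check helps. A minimal Python sketch, assuming you run it against a tree that Python can reach (for example a copy of /scratch/log pulled off with WinSCP); the 5 MiB default limit is an arbitrary example, not a VMware or EqualLogic figure.

```python
"""List the largest files under a directory tree (illustrative sketch).

Walks a log directory and reports files over a size limit, largest first.
The 5 MiB default is an arbitrary example threshold.
"""
import os


def large_files(root: str, limit_bytes: int = 5 * 1024 * 1024) -> list[tuple[str, int]]:
    """Return (path, size in bytes) for files larger than limit_bytes, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > limit_bytes:
                found.append((path, size))
    # Largest offenders first, like sorting an ls listing by size.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling large_files on the copied folder returns (path, size) pairs, so a runaway log such as ehcmd.log would appear at the top of the list.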

Sadly, after 24 hours the problem was back!

There are two options here: the first is to limit the EqualLogic log size from the default of 50 MB to something more like 15 MB; the second is to move the /scratch area.
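The numbers show why the default matters: a 50 MB cap can never take effect on a 32 MB ramdisk, because the ramdisk fills first. A small worked example in Python using the figures from this post, where other_usage_mb is a hypothetical allowance for everything else stored on the root ramdisk:

```python
"""Headroom check for a size-capped log on a fixed-size ramdisk (worked example).

Figures from this post: a 32 MB root ramdisk and an EqualLogic log capped at
either the 50 MB default or a reduced 15 MB. other_usage_mb is a hypothetical
allowance for everything else living on the root ramdisk.
"""


def headroom_mb(ramdisk_mb: float, log_cap_mb: float, other_usage_mb: float = 0.0) -> float:
    """Space left on the ramdisk once the log reaches its cap (negative = overflow)."""
    return ramdisk_mb - other_usage_mb - log_cap_mb


for cap in (50, 15):
    left = headroom_mb(32, cap)
    verdict = "fits" if left > 0 else "fills the ramdisk before the cap is reached"
    print(f"log cap {cap} MB -> headroom {left:+.0f} MB ({verdict})")
```

Anything else already using the root ramdisk eats into the 15 MB option's margin, which is one reason moving /scratch to a datastore is the more robust fix.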

I created a folder on VMFS6-1 called .locker-esxi1 and used an SSH session to configure the path using the commands below. Note that the hosts need a reboot for the change to take effect:

vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /vmfs/volumes/VMFS6-1/.locker-esxi1

vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation
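With four hosts, each one needs its own scratch folder rather than a shared one. This Python sketch simply prints the same commands for every host, following the .locker-&lt;host&gt; naming and the VMFS6-1 datastore used above; the names esxi2 to esxi4 are hypothetical, and the mkdir step is an assumption standing in for creating the folder by hand.

```python
"""Build the per-host scratch relocation commands (illustrative sketch).

Each ESXi host gets its own .locker-<host> folder on the datastore.
Host names other than esxi1 are hypothetical; run the printed commands
on the matching host over SSH, then reboot that host.
"""


def scratch_commands(host: str, datastore: str = "VMFS6-1") -> list[str]:
    """Return the shell commands that point one host's scratch at its own folder."""
    path = f"/vmfs/volumes/{datastore}/.locker-{host}"
    return [
        f"mkdir -p {path}",  # assumption: create the folder from the shell
        "vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation "
        f"string {path}",
        "vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation",
    ]


if __name__ == "__main__":
    for host in ("esxi1", "esxi2", "esxi3", "esxi4"):  # esxi2-4 are hypothetical names
        print(f"# on {host} (reboot afterwards for the change to take effect)")
        print("\n".join(scratch_commands(host)))
```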

Fortunately the customer has VeeamOne; otherwise I'm not sure how this problem would have been picked up, as by default vCenter doesn't alert on this.

Tuesday, 26 September 2017

Hyper-V host freezes until LiveMigration finishes

I recently had to rebuild a pair of Windows 2012 Hyper-V hosts that had been 'fiddled' with. The hosts were fully patched and the latest drivers and firmware had been applied.
During a LiveMigration from either host to the other, the RDP session would freeze up and typing would be almost impossible. Once the LiveMigration completed, the host would return to normal; virtual machines were unaffected.

The network was made up of a pair of Broadcom 57810s teamed together using LBFO, configured as Switch Independent with Hyper-V Port as the load-balancing algorithm.

Although the hosts lagged and froze up, Task Manager showed that the LiveMigration network was only using 2-3 Gbps! The CPUs, however, showed low utilization (6-10%).

I checked the Power Profile in the BIOS and made sure it was set to 'Performance', but this made no difference.

We configured both BCM57810s with 9014-byte jumbo frames, set the LiveMigration vEthernet adapter to 9014 bytes as well, and the migrations then hit 9-10 Gbps! Furthermore, the lagginess and freezing had gone!

Set-NetAdapterAdvancedProperty -Name "NIC1","NIC2" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
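A rough way to see why jumbo frames help: the Windows Jumbo Packet value of 9014 corresponds to a 9000-byte MTU plus the 14-byte Ethernet header, and at 10 Gbps a 9000-byte MTU needs roughly six times fewer frames per second than the standard 1500 bytes, so the host handles far fewer packets for the same throughput. The Python worked example below assumes standard Ethernet wire overhead (38 bytes per frame) and 40 bytes of IPv4 plus TCP headers; it only illustrates the arithmetic and does not prove that packet rate was the root cause of the freezing here.

```python
"""Frames per second and payload efficiency at 10 Gbps (worked example).

Assumes standard Ethernet framing on the wire (14 B header + 4 B FCS
+ 8 B preamble + 12 B inter-frame gap = 38 B per frame) and 40 B of
IPv4 + TCP headers without options.
"""

LINK_BPS = 10_000_000_000  # 10 Gbps, one BCM57810 port
WIRE_OVERHEAD = 14 + 4 + 8 + 12  # Ethernet header, FCS, preamble, inter-frame gap
IP_TCP_HEADERS = 20 + 20  # IPv4 + TCP, no options


def frames_per_second(mtu: int) -> float:
    """Full-size frames per second needed to saturate the link."""
    return LINK_BPS / 8 / (mtu + WIRE_OVERHEAD)


def payload_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that carry TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_per_second(mtu):>9,.0f} frames/s, "
          f"{payload_efficiency(mtu):.1%} payload")
```

The payload gain is only a few percent; the bigger difference is the per-frame work the host no longer has to do.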

Friday, 18 August 2017

Excellent iSCSI tuning guidelines

Thursday, 8 June 2017

Setting the locale to English (United Kingdom) using PowerShell


Set-Culture en-GB
Set-WinSystemLocale en-GB
Set-WinHomeLocation -GeoId 242

Set-WinUserLanguageList en-GB

Thursday, 20 April 2017

Resetting lost ESXi password

Works on ESXi 6.0

If you have access to vCenter:

Connect-VIServer -Server "<vcenter-server>"

# Populate $vmhosts so it contains one or more VMHost objects, e.g. a single host:
$vmhosts = Get-VMHost -Name "<esxi-host>"
# To reset all ESXi host passwords use
# $vmhosts = Get-VMHost

$NewCredential = Get-Credential -UserName "root" -Message "Enter an existing ESXi username (not vCenter), and what you want their password to be reset to."

Foreach ($vmhost in $vmhosts) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2 #Gain access to ESXCLI on the host.
    $esxcliargs = $esxcli.system.account.set.CreateArgs() #Get Parameter list (Arguments)
    $esxcliargs.id = $NewCredential.UserName #Specify the user to reset
    $esxcliargs.password = $NewCredential.GetNetworkCredential().Password #Specify the new password
    $esxcliargs.passwordconfirmation = $NewCredential.GetNetworkCredential().Password
    Write-Host ("Resetting password for: " + $vmhost) #Debug line so admin can see what's happening.
    $esxcli.system.account.set.Invoke($esxcliargs) #Run command, if returns "true" it was successful.
}

Friday, 7 April 2017

Removing all the crap from Windows 10

A great write-up on how to remove all the detritus Microsoft installs onto Windows 10 out of the box.

Wednesday, 5 April 2017

For the device-specific module (DSM) named Microsoft DSM, versions do not match between node

I was asked by a customer to introduce a new Hyper-V 2012 R2 Server Core node into an existing cluster.
As always, I checked the existing nodes for Windows updates and patched the new node accordingly. I used the Add Node option in Failover Cluster Manager with all validation tests selected, and the validation quickly failed with a Microsoft DSM version mismatch error.

Upon closer inspection we could see that the Microsoft DSM version didn't match! So Windows Update was used to check for missing updates, but both nodes showed no updates available.

I used PowerShell to compare the MSDSM.SYS files on both systems, and the FilePrivatePart was newer on one of them.
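Windows file versions have four parts (FileMajorPart, FileMinorPart, FileBuildPart and FilePrivatePart), so two copies of MSDSM.SYS can look identical at a glance and differ only in the last number. A small Python sketch of that part-by-part comparison; the version strings in the usage example are hypothetical, not the ones from this cluster.

```python
"""Compare Windows file versions part by part (illustrative sketch).

A Windows file version is major.minor.build.private, matching the
FileMajorPart, FileMinorPart, FileBuildPart and FilePrivatePart properties.
The example versions below are hypothetical.
"""
from typing import Optional

PARTS = ("FileMajorPart", "FileMinorPart", "FileBuildPart", "FilePrivatePart")


def parse_version(text: str) -> tuple[int, int, int, int]:
    """Turn 'major.minor.build.private' into a tuple of four ints."""
    major, minor, build, private = (int(p) for p in text.split("."))
    return major, minor, build, private


def first_difference(a: str, b: str) -> Optional[str]:
    """Name the first version part that differs, or None if the versions match."""
    for part, x, y in zip(PARTS, parse_version(a), parse_version(b)):
        if x != y:
            return part
    return None


if __name__ == "__main__":
    node_a = "6.3.9600.18000"  # hypothetical existing node
    node_b = "6.3.9600.18100"  # hypothetical new node
    print(first_difference(node_a, node_b))  # prints FilePrivatePart
```

Comparing the parsed tuples (rather than the raw strings) also tells you which node has the newer file, since Python compares tuples element by element.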

I Googled about to find out if there was a hotfix that might have updated the MSDSM.SYS file, but no joy.

In the end we narrowed it down to the fact that the March preview update had updated the file. After rolling this out to all cluster nodes, the validation passed and the new node was added.