Yesterday I had an issue with my NSX-T 3.0.1 deployment in my lab. I was not able to resolve it, so I had to recover NSX-T from backup. Luckily, I was making backups every day, so it would not be hard to recover, right? Well, think again if you are running NSX-T 3.x and you deployed your solution with custom certificates in an earlier NSX-T version.
NSX-T Version: 3.0.1.0.0.16404613
I started the recovery by deploying a brand new appliance with the same build, following the official VMware NSX-T restore documentation. Everything went well until the backup configuration was pushed to the NSX-T Manager and it started to complain about the certificate. The exact error was: “Restore process failed. Error while restoring certificates: Error while updating tomcat certificate <cert-id>. Certficate validation failed. Reason: Certificate is not compliant as certificate of type4 SERVER: Extended key usage field is not present in the certificate”
After some Google searches I came across this blogpost from VRACCOON, which pointed me in the right direction.
NSX-T 3.0 added Certificate Revocation List (CRL) checking when applying a certificate to a Manager node or cluster. If the CRL check cannot be performed, the certificate cannot be applied to a Manager node or cluster. VMware recommends using certificates with an HTTP-based CRL Distribution Point (CDP).
I had deployed my NSX-T solution back in the 2.4 release, so I used a custom certificate that did not meet this requirement. To be more precise, my certificate did not have the basic constraints extension enabled.
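To catch this before you ever need a restore, you can inspect your current certificate for the extensions involved here. Below is a minimal Python sketch, assuming the third-party `cryptography` package and a PEM-encoded copy of the certificate. The function name is hypothetical, and the list of checked extensions (Extended Key Usage with serverAuth, Basic Constraints and an HTTP CRL Distribution Point) is my own assumption based on the error above and the CRL requirement, not an official VMware compliance check.

```python
from cryptography import x509
from cryptography.x509.oid import ExtendedKeyUsageOID, ExtensionOID


def missing_nsx_extensions(pem_bytes: bytes) -> list[str]:
    """Return the names of expected extensions that are absent from a PEM certificate.

    The checked extensions are an assumption based on the restore error and
    the NSX-T 3.x CRL requirement, not an official VMware validation.
    """
    cert = x509.load_pem_x509_certificate(pem_bytes)
    missing = []

    # Extended Key Usage must be present and include serverAuth (TLS server).
    try:
        eku = cert.extensions.get_extension_for_oid(ExtensionOID.EXTENDED_KEY_USAGE).value
        if ExtendedKeyUsageOID.SERVER_AUTH not in eku:
            missing.append("extendedKeyUsage:serverAuth")
    except x509.ExtensionNotFound:
        missing.append("extendedKeyUsage")

    # Basic Constraints, the extension my old 2.4-era certificate lacked.
    try:
        cert.extensions.get_extension_for_oid(ExtensionOID.BASIC_CONSTRAINTS)
    except x509.ExtensionNotFound:
        missing.append("basicConstraints")

    # CRL Distribution Points with at least one HTTP(S) URI for the CRL check.
    try:
        cdp = cert.extensions.get_extension_for_oid(ExtensionOID.CRL_DISTRIBUTION_POINTS).value
        uris = [
            name.value
            for point in cdp
            for name in (point.full_name or [])
            if isinstance(name, x509.UniformResourceIdentifier)
        ]
        if not any(uri.lower().startswith("http") for uri in uris):
            missing.append("crlDistributionPoints:http")
    except x509.ExtensionNotFound:
        missing.append("crlDistributionPoints")

    return missing
```

For example, `missing_nsx_extensions(open("nsx-cert.pem", "rb").read())` (the file name is hypothetical) returns an empty list when all three extensions are present, and the names of the missing ones otherwise.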
So you would think it would be easy enough to disable the CRL check as described in this VMware KB, or to create a new compliant certificate and attach it to NSX-T, right? I thought the same! However, during a restore, NSX-T is in read-only mode (also via the API) until the restore process has been validated and has finished successfully. Therefore, I was not able to change the certificate or that security setting. I even tried to deploy a new appliance and change the certificates and the security setting upfront. Unfortunately, the restore process copies every piece of configuration and data to the NSX-T appliance, including that security setting and the certificates. So that did not solve my problem.
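If you decide to toggle the CRL check, it has to happen while the Manager is still writable, so before you take the backup, not during a restore. Below is a minimal Python sketch using only the standard library. It assumes the `/api/v1/global-configs/SecurityGlobalConfig` endpoint and `crl_checking_enabled` field that VMware documents for NSX-T 3.x, so verify both against the KB and the API guide for your build; the function names are hypothetical.

```python
import base64
import json
import urllib.request

# Endpoint as documented by VMware for NSX-T 3.x (assumption: verify for your build).
SECURITY_CONFIG_PATH = "/api/v1/global-configs/SecurityGlobalConfig"


def build_crl_payload(current: dict, enabled: bool) -> dict:
    """Return a PUT body with crl_checking_enabled set, keeping the current _revision.

    NSX-T uses _revision for optimistic locking, so the value from the GET is reused.
    """
    payload = dict(current)  # shallow copy so the GET response is left untouched
    payload["crl_checking_enabled"] = enabled
    payload.setdefault("resource_type", "SecurityGlobalConfig")
    return payload


def _request(url, user, password, method="GET", body=None, context=None):
    # Basic authentication against the NSX-T Manager API.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, context=context) as resp:
        return json.loads(resp.read().decode())


def set_crl_checking(manager, user, password, enabled, context=None):
    """Read the current SecurityGlobalConfig, then PUT it back with the new flag.

    Only works while the Manager is writable (before a backup, not during a restore).
    """
    url = f"https://{manager}{SECURITY_CONFIG_PATH}"
    current = _request(url, user, password, context=context)
    return _request(url, user, password, "PUT", build_crl_payload(current, enabled), context)
```

A call could look like `set_crl_checking("nsx-mgr.lab.local", "admin", "<password>", enabled=False)`, where the host name is hypothetical. In a lab with a self-signed Manager certificate you can pass an `ssl.SSLContext` through the `context` parameter, for example one built with `ssl.create_default_context(cafile=...)` pointing at your CA file.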
It does not matter whether you run an NSX-T lab or a production system: you do not want to end up in this situation. Therefore, you can do two things to stay compliant for a future restore in NSX-T 3.x:
#1 Disable the security setting for CRL checking (not recommended)
#2 Renew and replace your custom NSX-T certificate with one that meets the requirements. A good blogpost on how to do this can be found here. (recommended)
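For option #2, the new certificate needs the right extensions from the start. Below is a minimal sketch that generates a key and a certificate signing request (CSR) with the third-party Python `cryptography` package, assuming you sign it with your own CA. The function and host names are hypothetical. Note that the CRL Distribution Point is normally added by the issuing CA rather than requested in the CSR, so make sure your CA publishes an HTTP CDP.

```python
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import ExtendedKeyUsageOID, NameOID


def build_nsx_csr(common_name: str, san_names: list[str]):
    """Return (key_pem, csr_pem) for a TLS server certificate request.

    Requests Basic Constraints (CA:FALSE) and Extended Key Usage serverAuth;
    the CRL Distribution Point is expected to be added by the signing CA.
    """
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)]))
        # Subject Alternative Names, e.g. the cluster VIP FQDN and the node FQDNs.
        .add_extension(x509.SubjectAlternativeName([x509.DNSName(n) for n in san_names]), critical=False)
        .add_extension(x509.BasicConstraints(ca=False, path_length=None), critical=True)
        .add_extension(x509.ExtendedKeyUsage([ExtendedKeyUsageOID.SERVER_AUTH]), critical=False)
        .sign(key, hashes.SHA256())
    )
    key_pem = key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),  # unencrypted key: protect the file yourself
    )
    return key_pem, csr.public_bytes(serialization.Encoding.PEM)
```

For example, `build_nsx_csr("nsx.lab.local", ["nsx.lab.local", "nsx-01.lab.local"])` returns the private key and CSR as PEM bytes, ready to be signed by your CA and imported into NSX-T.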
If you have any additional questions or remarks about this blogpost please let me know!