Dude where’s my blog


It’s been a really long time since I blogged on this site, and a lot has happened in the meantime. I wanted to let my readers know that I have started using a new platform called postach.io, and its integration with Evernote makes it awesome! So starting now, all my notes will be at https://amitrathod.postach.io/


vRealize Upgrade Assist Tool

Hola! It’s been some time since I last wrote on my blog.

Recently I was working with one of my customers on a Private Cloud deployment based on vRA 6.2.3 and the associated vRealize Suite Advanced products. With vRA 7.3 going GA last month, I made the customer aware of the features and benefits of vRA 7.3 and he was awed. I was quickly tasked to help set up a PoC environment and showcase the benefits to the Business. We were lucky to get the blessings of the Business to take this forward and plan for the upgrades.

There are tons of improvements in the 7.3 release, especially when you are coming from 6.2.x in your environment. I am not going to brag about the features and benefits of vRA 7.3 (even though I would love to!), as there are a good number of blogs on this already. What I am here to talk about is the Upgrade Assist Tool, which helps you plan the upgrade.


Back in the olden days, when it was just vSphere, things were pretty simple and so were the upgrades. But now, with so many components talking to each other, upgrades require a good amount of planning.

This customer of mine had already been through upgrading their distributed vRA deployment from 6.2.1 to 6.2.3. They were worried because the 7.3 release would be a major change, and wanted us to make sure we did not miss any steps while planning it.

Thanks to the Cloud Management Business Unit team at VMware, who created the vRealize Automation Upgrade Assist Tool to make things easier and smoother. You can find it in the Upgrade Center, see https://www.vmware.com/in/products/vrealize-automation/upgrade-center.html

Ok, so let’s get to the meat.

The vRealize Automation Upgrade Assist Tool can be downloaded from the Upgrade Center and of course has a guide associated with it for you to follow. If you have used the vRealize Production Test Tool in the past, this will feel somewhat familiar (and you will feel good about it too, I will tell you why in a bit).

The idea of using this tool is to help analyze the current vRA deployment and bring forth those areas which could potentially cause some problems once upgraded to vRA 7.3. The good part is that it helps us check automatically if the current setup is ready for the vRA 7.x upgrade.

It doesn’t carry around very stringent requirements to get the product started but for best results (or so to say, complete results) here are some of the recommendations:

Ideally you should run this tool on the IaaS Windows Server that holds the Model Manager Data. In a distributed deployment, take a look at the IaaS nodes and search for the “C:\Program Files (x86)\VMware\vCAC\Server\Model Manager Data” folder. The machine with this folder is the place where you run the Upgrade Assist Tool.

Once extracted, run the vRPT-1.6.0.exe executable, which opens the EULA and then takes you straight to the page where you enter the details of the existing vRA deployment. It asks some fairly straightforward questions about your vRA deployment, such as the vRA portal name, the IaaS Web host name and the vRO node name, along with the respective credentials.

Note: In a distributed deployment, you are supposed to enter the Load Balanced IP/URL instead of the node name.

Now, the good part compared with the earlier versions (of the vRealize Production Test Tool) is that it allows you to test the connections before you run the tests. The Test Connection button tells you whether the credentials are correct or need to be modified.

Once everything is green, click Save and Run to start the tests.

To make sure you are good with the .NET versions, disk space and other details, it is recommended to run the tool on the additional IaaS machines as well, so that the prerequisites for the upgrade are verified on every node.

It also has an additional option to send the test results to VMware, which uploads the test output to the VMware FTP site. It is exactly the same way you upload logs when you are working with VMware Technical Support.

To give you an idea of what the test output looks like, here it is:

The output would state if there are any vRO Workflows or physical endpoints which may not work post the upgrade to 7.x.

I would highly recommend starting the upgrade journey with the Upgrade Assist Tool and then moving on to the checklists provided in the VMware Documentation at https://docs.vmware.com/en/vRealize-Automation/index.html

Oh yes, VMware Pubs has a cool new look. Check it out, if you haven’t.

PSC & vCenter 6.0 Design Topology Decision Tree

Welcome 2017! Hope all is well with my readers. It has been a fabulous 2016 with great learnings.

Lately a lot of customers have reached out to me for help in planning their vSphere 6.0 setup (although 6.5 is already out and has awesome features which make it a winner!).

These customers have a huge estate with a large number of vCenters and multiple ESXi hosts. They are either consolidating them or re-designing them to have some sort of a CMA on top. That is a separate discussion altogether.

However, in this post I want to point out a wonderful decision tree created by Emad Younis & Adam Eckerle: the vSphere Topology Decision Tree.

This has been my advice to most of my customers planning for the 6.0 setup. Hope this helps!



Replacing SSL Certificates for Private Cloud based on vRA 6.2.3

One of my customers was recently replacing the SSL certificates on a Private Cloud environment based on vRA 6.2.3. It was the first time they were doing this, so he had done his research by going through various blogs on the internet on how to go about the replacement. When he tried to lay down the steps based on that reading, he was even more confused, because the blogs differed from each other by at least a step or two in the overall certificate replacement process. That’s when he involved me and I started looking into it.

vRA (or I should say vCAC) 6.0, 6.1 and in fact 6.2.0 were released quite some time back, and hence a lot of blog posts talk about those versions (of course 6.2 was not released then). Mind you, a lot of blog posts use the term vCAC 6, but that doesn’t mean that ALL the steps are going to work for your 6.2.3 or 6.2.4 versions of vRA.

We then decided to fall back to the trusted pubs.vmware.com. Here it is really important to check which version of the documentation you are looking at, since there are separate 6.0, 6.1 and 6.2 documentation sets.

Ok so first lesson learnt: Make sure whatever you are reading on the Internet is right, and validate it against the versions that you are working on.

In a nutshell, the process which we followed to get the certificates replaced came from http://pubs.vmware.com/vra-62/index.jsp#com.vmware.vra.install.doc/GUID-F493819D-D4FB-4854-BEC4-295388BB6EF7.html which clearly shows you the flow of the certificate changes. I am not going through each and every step, since the VMware Documentation link above tells you exactly what you are supposed to do. I will try and fill in the other interesting details.


PRO TIP: Don’t skip steps in the pubs, and if you are referencing KB articles, don’t hop, skip and jump over lines in them. Another lesson learnt!

As it mentions, you start with the Identity Appliance and then you inform the vRA Appliance about the change in the certificate on the Identity Appliance (in other words, you re-establish trust between the Identity Appliance and the vRA Appliance).


So in essence there are two parts to the whole certificate replacement:

  1. Replacing the actual certificate using the key and the certificate itself.
  2. Re-establishing trust among the various other components.


You should update all the components of the same type in a distributed setup and then go for the trust re-establishment.


It was Point #2 which caused a lot of confusion (before we read the official VMware Documentation). The confusion was due to the fact that the steps are different for 6.1 and 6.2, so take note of the versions involved.

For versions 6.2.x, after replacing the certificates on the vRealize Automation Appliance and updating the SSO registration for the vRealize Automation Appliance, we update the IaaS servers with the vRealize Automation Appliance certificate (in other words, re-establish the trust between the vRA appliances and the IaaS components). This is done by running vcac-config.exe with the UpdateServerCertificates argument from the server running the Model Manager Data component.


The command for 6.2.x would be “vcac-config UpdateServerCertificates -d <name of the vRA Database> -s <FQDN of the SQL Server hosting the vRA DB> -v”, as compared to 6.1 where you use the “DownloadRootCertificates” argument with vcac-config.exe.

P.S: In vRA 6.2.3 you will not find the “DownloadRootCertificates” argument if you run the “vcac-config.exe help” command.
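
If you would rather not type the 6.2.x command by hand on every node, it can be put together with a few lines of script. Below is a minimal Python sketch, not something from the VMware docs: it only assembles the UpdateServerCertificates command line quoted above, the function names are my own, and the database name and SQL Server FQDN in the usage note are made-up placeholders.

```python
def build_update_certs_command(db_name, sql_fqdn, exe="vcac-config.exe"):
    """Return the 6.2.x trust re-establishment command as an argument list."""
    if not db_name or not sql_fqdn:
        raise ValueError("both the database name and the SQL Server FQDN are required")
    # Only the arguments quoted in this post are used:
    # UpdateServerCertificates, -d <database>, -s <SQL Server FQDN>, -v
    return [exe, "UpdateServerCertificates", "-d", db_name, "-s", sql_fqdn, "-v"]


def as_command_line(args):
    """Join the arguments for copy-paste, quoting any value that contains a space."""
    return " ".join('"%s"' % a if " " in a else a for a in args)
```

For example, as_command_line(build_update_certs_command("vCAC", "sql01.corp.local")) gives a line you can paste into the Command Prompt on the Model Manager Data server. Both values here are hypothetical, so substitute your own.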


At this point, my attention was drawn to VMware KB article http://kb.vmware.com/kb/2110207, which lists the steps to re-establish the trust very clearly.

I would highly recommend that you read the KB article fully at least twice, just to make sure you fully understand the sequence of steps, because all of them are really important.

Since it is a distributed environment, you will have to run these commands on all the nodes of the same type. Hence I would recommend (a wise guy told me to do this) that you copy those commands into Notepad and fill in the details as requested, so that you can just copy and paste them into the Command Prompt on those IaaS nodes.

P.S: You don’t need to run the last command in step 11, which comes with a Note saying that it is specifically for 6.0 and NOT for 6.2.3.

After the iisreset and the reboot of the appliances and IaaS boxes, I went back to our documentation at http://pubs.vmware.com/vra-62/index.jsp#com.vmware.vra.install.doc/GUID-F493819D-D4FB-4854-BEC4-295388BB6EF7.html to make sure I updated the other certificates, such as the VAMI certificates for the Identity and vRA Appliances.

Once done successfully, we are all set for another year with a clean set of SSL Certs across the board 🙂

I also ran into a couple of other issues, one of them while creating the certificate chain. It is important that you create the certificate chain in the right order, which is:

  1. Server Certificate signed by the Intermediate CA
  2. Intermediate Certificate
  3. Root Certificate.

Although this is mentioned clearly in our documentation, we missed out on this too, so I wanted to make sure you guys don’t miss this.
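
To make the ordering concrete, here is a small Python sketch that stitches three PEM files together in the order listed above (server, intermediate, root). This is just an illustration under my own assumptions: the function names are hypothetical, it only checks that each input looks like a PEM certificate, and it does NOT verify that the certificates actually sign each other.

```python
PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def build_chain(server_pem, intermediate_pem, root_pem):
    """Concatenate three PEM certificates in the order server, intermediate, root."""
    parts = [
        ("server", server_pem),
        ("intermediate", intermediate_pem),
        ("root", root_pem),
    ]
    chain = []
    for label, pem in parts:
        if PEM_HEADER not in pem:
            raise ValueError("%s certificate does not look like PEM" % label)
        # Normalise surrounding whitespace so each certificate ends with one newline.
        chain.append(pem.strip() + "\n")
    return "".join(chain)


def build_chain_from_files(server_path, intermediate_path, root_path):
    """Read the three PEM files from disk and return the combined chain text."""
    texts = []
    for path in (server_path, intermediate_path, root_path):
        with open(path, "r") as handle:
            texts.append(handle.read())
    return build_chain(*texts)
```

The result of build_chain_from_files() is the text you would paste as the certificate chain. The file paths you pass in are whatever your CA gave you, nothing here assumes a particular name.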


Hope this helps!






Some Design Considerations for virtualizing SAP on VMware vSphere

Recently I was working with a customer who was migrating his SAP environment from IBM AIX to Cisco’s UCS platform and was looking for some best practices from a VMware standpoint. I thought this would make a good blog post.

There’s a lot of information out there, but the following should help you get started.

So here they are:

  • The basis team should leverage Early Watch Reports to come up with sizing for the SAP virtual machines. Use requirements from the basis team for the application and DB servers for all the modules. The sizing should be done based on their existing environment and its utilization. 
  • Follow SAPS based workload Sizing recommendations. 
  • Reserve all memory based on SAP best practices for production. Definitely no overcommitment of memory.
  • Size VMs so that CPU and memory will fit within NUMA boundaries.
  • Reserve Memory for SAP Production Virtual Machines. 
  • Ensure database servers have separate LUNs for data, logs, and archive.
  • Place the Guest OS swap on a separate LUN that is not replicated.
  • The boot disk and all other disk partitions should be aligned for optimal performance at the VMFS and the OS level.
  • Leverage Cisco UCS service profile to create consistent server Hardware configurations.
  • Plan the number of vCPUs per CPU core based on application requirements. Hyper-Threading (HT) should be enabled to reduce the impact of overcommitment.
  • The ESX hosts for the SAP environment need to be in a dedicated cluster with its own networking and storage zoning.
  • Ensure virtual machines are configured to fit within NUMA nodes. 
  • Host profiles should be leveraged to ensure consistent configuration across all ESX hosts in the cluster.
  • Database storage design should be similar to physical. We also need to factor in the space used by the .vswp files of the virtual machines. The size of a .vswp file is equal to the VM’s total configured memory minus its memory reservation, if one is set. So if no reservation is set, the .vswp file is as large as the VM’s configured memory, and the storage sizing needs to account for that as well.
  • Use separate LUNs for OS, data and swap.
  • Dedicate LUNs provided for DB related files to guarantee the required performance. 
  • All Database and Application disks should be Eager-Zero Thick Provisioned. 
  • Guest page files should be stored on separate disks.
  • At the Storage level, use Thick Provisioning. 
  • Use the UCS capability to create vNICs on separate ports for management, vMotion and production traffic. Separate them by VLANs.
  • The standard SAP Rules apply for Databases. See SAP Note 592393. 
  • See SAP Note 1056052 for additional vSphere Guidelines – https://service.sap.com/sap/support/notes/1056052 
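
To put numbers on the .vswp point from the list above, here is a tiny Python sketch of the arithmetic (configured memory minus reservation). The function names and the VM sizes in the usage note are made up for illustration; plug in your own figures.

```python
def vswp_size_gb(configured_gb, reservation_gb=0.0):
    """Size of a VM's .vswp file: configured memory minus memory reservation."""
    if reservation_gb < 0 or reservation_gb > configured_gb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_gb - reservation_gb


def total_vswp_gb(vms):
    """Sum the .vswp space for a list of (configured_gb, reservation_gb) pairs."""
    return sum(vswp_size_gb(configured, reserved) for configured, reserved in vms)
```

A production VM with 64 GB configured and 64 GB reserved needs no .vswp space at all, while a 32 GB VM with no reservation needs the full 32 GB. So total_vswp_gb([(64, 64), (32, 0), (16, 8)]) comes to 40 GB of extra datastore space to plan for.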


Few good online resources:


Also note that there was a recent announcement at SAPPHIRE 2016 a few days back (17th May) that SAP supports SAP HANA on vSphere 6. Please see http://blogs.vmware.com/apps/2016/05/sap-hana-on-vsphere-deployment-options-and-best-practices.html

Happy Virtualizing!!


CentOS7 Customization needs vCenter Server 5.5 U3

Hola! It’s been some time since I wrote on my blog. I am happy to share my recent experience with one of my customers, who runs a private cloud built on the vCloud Suite.

The customer is running vRealize Automation 6.2 on top of vCenter Server 5.5 U2d and ESXi 5.5 U2. The R&D team urgently needed a couple of CentOS7 machines to run their builds, however the current catalog did not have it.

The Cloud Infrastructure and Management team built a CentOS7 machine as per their organizational standards, installed VMware Tools, created a template and finally added it as a catalog item in the vRA catalog.

Here comes the situation: VMs provisioned from the vRA portal were being allocated IP addresses from the Network Profiles. However, when we logged into those CentOS7 machines, we did not see any IPv4 address assigned (only an IPv6 address showed up). Neither did we see the IP address on the Summary page of the VM in the vSphere Web Client/vSphere Client.

We were already running CentOS6.6/RHEL6/RHEL7 as catalog items and they were just running perfectly fine! What could be wrong with CentOS7? After all we were using the same Customization Specification for all Linux based machines!!

Ok. So let’s do a recap: vRA provisioned the machine, allocated an IP from the Network Profile, initiated a Customize Machine workflow to get it customized, and powered it on. We verified this by looking at the Tasks and Events of that machine as well; all of them had completed successfully.

The customization tasks showed us a line stating that “For details, reference the log file /var/log/vmware-imc/toolsDeployPkg.log in the guest OS”

We looked into the file and saw that the IP Address field was blank in toolsDeployPkg.log, which hinted that there was probably something wrong with the Guest Customization Specification. But hey, how come we were able to successfully provision other VMs out of the same spec!

This made us turn to our favourite messiah, Google!! And we stumbled upon https://lonesysadmin.net/2015/01/06/centos-7-refusing-vmware-vsphere-guest-os-customizations/ by @plankers.

The trick was to fool the customization spec into believing that /etc/redhat-release stated “Red Hat Enterprise Linux Server release 7.0 (Maipo)”

Once this was done, the trick worked!!!

Only after this did we see this really important document, the Guest OS Customization Support Matrix at http://partnerweb.vmware.com/programs/guestOS/guest-os-customization-matrix.pdf, which stated the obvious: CentOS7 requires vCenter Server to be at 5.5 Update 3, which we were not.

The above is a hack (definitely not supported by VMware Support). I would highly recommend that you get the upgrade done.

However in my customer’s case, where the Upgrade had a lot of dependencies, this was a quick and dirty way to get CentOS7 machines to the R&D folks, which made them happy!!

Thanks to @plankers.

Lessons learned:

  1. Pre-requisites – Check compatibility guides, in this case the GOS Customization Support Matrix.
  2. Search better in Google! 😉




Design Considerations – Words of Wisdom

  • A good design is a balance of technical best practices and the organization’s goals, requirements and constraints. Since this is a balance, there has to be a trade-off somewhere. As a good designer, you need to ensure that your design meets the customer’s requirements (which is the End Goal) within the constraints, despite the trade-offs.
  • Never forget the End Goals; they guide and shape your design process.
  • The design decisions have to be documented clearly. Each one should state the reason why the decision was made, so that it helps answer if the decision is questioned.
  • The design document should be as detailed as possible. The best test: two different people reading the document and implementing the design should end up with the same result.
  • It is highly important to have the right stakeholders who know the Organization’s Business Requirements and Goals as this is the Objective which your design should meet.
  • Follow the KISS principle. Simple designs are easy to understand and deploy. Apart from being simple, the design needs to be repeatable too, because the decisions made in the design are supposed to be based on sound justification.
  • Always make sure that your design stays valid for at least the next 2-3 years.

    Okay, so with this information you are good to create a Conceptual Design, which should encompass all the customer requirements. This is the interim design to make sure you don’t miss out on anything the Organization is expecting. It is typically a diagram depicting the customer requirements along with the entities that come into play when the design is initiated.

    Develop and/or adopt a design framework that is repeatable, reasonable and justifiable. The objective of a framework is to help you get the information needed for the design. Make sure you discuss this framework with the key stakeholders at the customer’s organization as well as your peers. This will only help to make it better. After all, a good design is also something which is always evolving.

    From a VMware Datacenter Virtualization Standpoint, I would stick to the AMPRSSC framework which I have been using for a while now:

    A – Availability – Think SLAs.
    M – Manageability – Think about how simple it is and how easy it is to deploy
    P – Performance – Think about throughput, latency, I/O per second
    R – Recoverability – Think RTO and RPO
    S – Scalability – Think, if we need to grow, do we scale out or scale up?
    S – Security – Think how easy it is to secure the environment, Does it help in minimizing risks?
    C – Cost – Are you meeting the company requirements while staying within the budget?

    These are the areas I use as a scale to judge and take sound design decisions, and thereby make a Good Design.

    So we churn the Conceptual Design through the AMPRSSC framework and devise a Logical Design. Bear in mind that the Logical Design is there to help understand the design rather than to get into the nitty-gritty of how the various elements are connected or plugged in. It shows the customer how you will be able to meet the desired requirements.

    The Conceptual Design is then brainstormed along with the customer as well as your peers to make it better and then you head to creating a Physical Design in which you show precise implementation details such as the Path Selection Policy, the IP Addresses, port numbers,etc.

    So the requirements, constraints and assumptions are captured first, followed by a Conceptual Design, then a Logical Design and eventually a Physical Design, and all of these go into a Design Document along with the Design Decisions made.
    Apart from the Design Document, it is pertinent that we create an Installation and Configuration Document based on the Design Document, so that Implementation Engineers can just take the Install and Config Document and start building the required infrastructure, which should precisely meet the customer’s requirements.

    Happy Designing!


vSphere 6 Upgrade Prerequisites

Though there are multiple blogs talking about the new architecture and the changes in the way vSphere 6 operates, I wanted to make sure that we plan and prepare a pre-upgrade checklist to ensure 100% positive results and a smooth upgrade.

Needless to say, the vSphere Upgrade Guide on the VMware website is your go-to guide for this. Here, however, I am going to help you ensure that you have the necessary pre-requisites in place before you start your upgrade.

P.S: For the vCenter Server, I would be talking about the Windows-based one and not the Appliance (p.s: I love the vCenter Server Virtual Appliance, it just works great!).

  1. VMware vCenter Server 5.0 and later (5.1 and 5.5 and the associated Updates) is a 64-bit Windows application and hence needs a 64-bit machine. So if you are running anything below version 5.0, the process would be called a Migration rather than an Upgrade, i.e. you migrate to a 64-bit Windows platform.
  2. Look into the Upgrade Path tab in the VMware Product Interoperability Matrixes at https://www.vmware.com/resources/compatibility/sim/interop_matrix.php and choose VMware vCenter Server as the Solution. A Green Tick will affirm that the Upgrade Path is Supported and will work!
  3. On the same webpage, click on Solution/Database Interoperability, select VMware vCenter Server as the Solution, select 6.0 U1 (if you are going to 6.0 U1, and trust me, you should) and select the Database.
    IMPORTANT: From 6.0 onwards it is important to know the exact version, NOT just the major number but also the release and Service Pack levels. For instance, for SQL see https://support.microsoft.com/en-in/kb/321185 on how to find the update levels of SQL Server and its components.
  4. Now for the ESXi hosts, ensure that the hardware is supported for ESXi 6.0 U1. Please see https://www.vmware.com/resources/compatibility/search.php
  5. Now go to the Release Notes of vCenter Server 6.0 U1 and ESXi 6.0 U1 and READ THEM THOROUGHLY. Pay special attention to the Known Issues and Resolved Issues sections, and understand and assimilate the workarounds as well. Always helps 😉
  6. Ensure the certificates on the vCenter and the associated components have not expired.
  7. You should be using a 64 Bit ODBC DSN Connection.
  8. For SQL Server, ensure the system DSN is using SQL Native Client Driver.
  9. You cannot use Integrated Windows Authentication method if the vCenter Server service is running under the Windows built-in system account.
  10. Verify that the vCenter Database User has db_owner permissions before attempting the upgrade.
  11. Ensure that you know the password for administrator@vsphere.local (Single Sign-On) and that it works.
  12. Ensure that your vCenter Server System is not a Domain Controller, but is a part of the Active Directory Domain.
  13. The vCenter Domain User account should have the following permissions:
    1. Member of the Administrators group.
    2. Log on as a service.
    3. Act as part of the operating system.
  14. IMPORTANT: Take a Backup of the vCenter Server’s Database and make sure you test it.
  15. Microsoft SQL Server Express is no longer supported for vCenter Server 6.0. The earlier 5.x Express database is replaced with an embedded PostgreSQL database.
  16. Take a backup of the vCenter SSL Certificates.
  17. Ensure DNS is pristine from both a forward and a reverse lookup standpoint.
  18. Ensure JDK 1.6 is installed on the vCenter Server machine.
  19. If your end goal is to have the Platform Services Controller on the same machine as the vCenter Server, then your vCenter Server and Single Sign-On should be on the same machine before the upgrade. This is applicable for vCenter 5.1 and 5.5.
  20. If your end goal is to have an external Platform Services Controller, then the vCenter Server and Single Sign-On should be on separate machines.
  21. If you have vCenter Server and SSO on the same machine and you desire High Availability, it is only possible by putting them (PSCs and vCenter Servers) behind a Load Balancer. See http://blogs.vmware.com/vsphere/2015/03/vcenter-server-6-topology-ha.html for a very good explanation of this.
  22. Ensure time is synchronized among all the machines.
  23. During the upgrade you will be asked for the following information, so keep it handy:
    1. vCenter Single Sign-On username and password
    2. vCenter Server username and password (you can place a check mark on the box next to “Use the same credentials for vCenter Server”)
    3. Directory to which the 5.x data will be exported.
    4. Ports. I would leave them default if you ask me.
    5. Directory to install vCenter Server.
    6. Directory to store vCenter Server data.
  24. Before starting the ESXi host upgrades, it is important to look into host alerts, alarms and, to an extent, log files and resolve any issues, to make sure you are not carrying them forward to the 6.0 platform.
  25. Ensure you have the additional device drivers for the server hardware if necessary.
  26. Migrate or shut down the VMs on the ESXi host, as applicable, and put the host in Maintenance Mode.
  27. Remove the Host from the DRS/HA Cluster.
  28. Many would recommend disconnecting any Fibre Channel storage before the upgrade. If you do that, DO NOT DISABLE the HBA in the BIOS.


Okay! So that was a long list, but if you ensure that these are taken care of, you are in a much better position to proceed with the upgrade, provided that you enter the right details in the Upgrade Wizard.
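
The DNS item in the list is the one I see people skip most often, so here is a minimal Python sketch of what "pristine forward and reverse lookup" means in practice: the name resolves to an IP, and that IP resolves back to the same name. The function name is my own and the host name in the usage note is a placeholder. It uses the standard socket module by default, but the two lookups are passed in as parameters so you can try the logic without touching a real DNS server.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ok, message) after a forward lookup and a matching reverse lookup."""
    try:
        ip = forward(fqdn)
    except OSError as exc:
        return False, "forward lookup of %s failed: %s" % (fqdn, exc)
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        name, _aliases, _addresses = reverse(ip)
    except OSError as exc:
        return False, "reverse lookup of %s failed: %s" % (ip, exc)
    # Compare case-insensitively and ignore a trailing dot on either side.
    if name.lower().rstrip(".") != fqdn.lower().rstrip("."):
        return False, "reverse lookup of %s returned %s, expected %s" % (ip, name, fqdn)
    return True, "%s <-> %s" % (fqdn, ip)
```

Run check_dns("vc01.corp.local") (a hypothetical name) from the vCenter Server machine for the vCenter, the SQL Server and each ESXi host, and fix anything that comes back False before you start the upgrade.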


Missing Libraries or Packages from vRealize Orchestrator

Time and again, many of my customers who use vRO, or have just started using it, have spent many hours puzzled as to why they don’t see their libraries or packages even after many reboots. Just a word of advice: always try a “Reset Current Configuration” in the Troubleshooting section, followed by a restart of the Server and Configuration Server services in the Startup Options of the vRO Configuration page (log in with vmware as the username; the URL uses port 8283), before trying a reboot. It will save you a lot of time.


The curious case of localhost! 503 Service Temporarily Unavailable in vRA

One of my customers is running a distributed setup of vRealize Automation.

Following is the setup:
1 Identity appliance
2 vRA Appliances (6.2.0), also hosting Postgres
2 IaaS Web server components hosting the Web Tier (the Infrastructure tab)
2 IaaS App server components (IaaS App Tier along with the Manager Service)
2 DEMs (IaaS DEM Worker and Proxy Agents)
2 vRealize Orchestrators
vShield Manager which is the Load Balancer providing VIPs for the vRA Appliances, IaaS Web, IaaS App, vRO.

Everything was fine and dandy until one fine day we saw that the VAMI page of the vRA Appliances did not show any services as Registered. We also saw that the certificates had expired. Since we were using SAN certificates, this meant that the certificates had expired on multiple nodes and other products as well.
When we looked at /var/log/messages, we saw:
2015-11-05T12:26:13.000440+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Authenticated user: root successfully
2015-11-05T12:26:13.000666+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info locale=en-US, id=certificateReplace, action=submit, controller=<type ‘instance’>
2015-11-05T12:26:13.000714+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Executing shell command…
2015-11-05T12:26:13.023630+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Returned vCAC host: vcac.corp
2015-11-05T12:26:13.024209+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Removing passphrase from key…
2015-11-05T12:26:13.031701+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Successfully removed passphrase from key.
2015-11-05T12:26:13.031751+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Importing certificate in vCAC KeyStore…
2015-11-05T12:26:15.116852+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR —BEGIN—#012Command execution failed with unexpected error: null.#012—END—#012#012Use -e option to get more details.
2015-11-05T12:26:15.116925+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR Error processing request: Error importing certificate in vCAC.
2015-11-05T12:26:15.117001+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR {‘cafe.host’: {‘action’: None, ‘value’: ‘vcac.corp’}}

We managed to get past the certificate replacement by following http://kb.vmware.com/kb/2106583. We were expecting that this should help resolve the issue and get our Private Cloud back up and running, but to our surprise we saw that the vRA services did not come up as registered.
So we looked into the /var/log/vmware/vcac/catalina.out file and saw a bunch of “503 Service temporary unavailable” messages like
“<timestamp> vcac: [component=”cafe:component-registry” priority=”ERROR” thread=”registryServiceNotificationExecutor-768″ tenant=””] com.vmware.vcac.core.componentregistry.service.impl.StatusServiceImpl.retrieveCurrentStatus:280 – Exception during remote status retrieval for url: https://vcac.corp/catalog-service/api/status. Error Message 503 Service Temporarily Unavailable.”

This made us take another look at the /etc/hosts file on the vRA Appliances and at the vShield Edge Load Balancer.
When we ping the VIP of the vRA appliances from a vRA Appliance, it should resolve to the local machine’s IP, meaning that the traffic shouldn’t be going to the Load Balancer. This held good in our case as well, so the /etc/hosts file and the Load Balancer were doing their job.

We tried rebooting the vRA Appliances and then looked into the catalina.out log file to find out if the vRA Services were getting initialized and registered and saw that for some reason the component-registry service was not getting started.

Component-Registry is one of the Key services in vRA, it manages all application services as well as the 3rd party solution provider services. Other services interact with it to request data related to the service or the endpoint.
Looking into the log carefully we saw this Error –  No route to host

So we wanted to check if the Apache service also gets this error. Hence we ran this command on the vRA Appliance:
#wget https://vcac.corp/component-registry/services --no-check-certificate
We saw the following output:
Resolving vcac.corp…
Connecting to vcac.corp||:443… connected.
HTTP request sent, awaiting response… 503 Service Temporarily Unavailable
ERROR 503: Service Temporarily Unavailable

Since this resolves to the IP address of the local vRA appliance itself, we wanted to see if this error originated from Apache2 on the vRA Appliance or from the vcac server.
To check this, we looked at the access_log files at /var/log/vmware/vcac/access_log and /var/log/apache2/access_log.

When we looked into /var/log/vmware/vcac/access_log, we saw “GET /component-registry/api/status HTTP/1.1” 200 926. So the vcac access_log shows an HTTP 200, which is good. In Apache2’s access_log, however, we saw:
“GET /component-registry/endpoints/types/com.vmware.csp.core.plugin.service.api/default HTTP/1.1” 503 416
“POST /component-registry/services/HTTP/1.1” 503 416

Further looking at the /var/log/apache2/error_log we see
[error] (110) Connection timed out: proxy AJP: attempt to connect to (localhost) failed
[error] proxy AJP: disabled connection for (localhost)
[error] proxy: AJP: failed to make connection to backend: localhost

We restarted the apache2 service using /etc/init.d/apache2 restart and then looked at /var/log/apache2/error_log again. It stated:
[error] (110) Connection timed out: proxy AJP: attempt to connect to (localhost) failed

Wait a minute! We don’t know what that IP address is!
When we did an nslookup on that IP, we found out that there is a machine on the DNS server with the name localhost.corp which points to that IP address! Isn’t that strange!

The DNS in the customer’s infrastructure was managed and maintained by the parent company of the organization, and neither the cloud architect nor any of the other technical resources had access to the DNS server.
So to work around that, we first wanted to make sure that this IP was the problem child.

To do this, we modified the vcac.conf file at /etc/apache2/vhosts.d
We changed the line from “ProxyPass / ajp://localhost:8009/ nocanon” to “ProxyPass / ajp:// nocanon” under #Tomcat AJP Proxy and then restarted apache2 using /etc/init.d/apache2 restart

We then ran the command
#wget https://localhost:443/component-registry/services/status --no-check-certificate
The output was:
Resolving localhost…
Connecting to localhost||:443… connected
HTTP request sent, awaiting response… 200 OK

We got an HTTP 200, basically meaning it started working! What this meant was that apache2 had been resolving localhost to a machine called localhost.corp instead of to the vRA Appliance itself.

We now checked on the VAMI page for the list of services and Voila! the services were registered 🙂

Apache2 is a proxy in front of the Tomcat server. So instead of using localhost as the name of the Tomcat server, we used an IP address directly (since localhost was resolving to an IP from the DNS). Basically, by bypassing the DNS, we were able to start up the services.
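
If you want a quick sanity check for this kind of problem, here is a small Python sketch that reports any non-loopback address a name like localhost resolves to. It is an illustration under my own assumptions, not what we ran that night: the function names are mine, and by default it goes through the system resolver (which also consults /etc/hosts), so it may not reproduce exactly what Apache saw. An nslookup against the DNS server is still worth doing.

```python
import ipaddress
import socket


def system_resolve(hostname):
    """Resolve a name through the system resolver and return unique address strings."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def non_loopback_addresses(hostname="localhost", resolve=system_resolve):
    """Return the addresses the name resolves to that are NOT loopback addresses."""
    suspicious = []
    for address in resolve(hostname):
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(address.split("%")[0])
        if not ip.is_loopback:
            suspicious.append(address)
    return suspicious
```

An empty list from non_loopback_addresses("localhost") is what you want to see. Anything else in the list is an address that should not be answering to that name.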

After waking up the DNS admin in the middle of the night, removing the rogue entry and reverting the changes made to /etc/apache2/vhosts.d/vcac.conf, we were able to get the services back up and running, and eventually the Private Cloud was back in business. It was like finding a needle in a haystack!

Lesson learnt: Be pristine with your DNS Server