Missing Libraries or Packages from vRealize Orchestrator

Time and again, customers who use vRO, or have just started using it, spend hours puzzled as to why they don’t see their libraries or packages even after many reboots. A word of advice: before trying a reboot, always try a “Reset Current Configuration” in the Troubleshooting section of the vRO Configuration page (log in with vmware as the username; the URL uses port 8283), followed by a restart of the Server and Configuration Server services under Startup Options. It will save you a lot of time.

>vRevealed<

The curious case of localhost! 503 Service Temporarily Unavailable in vRA

One of my customers is running a distributed setup of vRealize Automation.

Following is the setup:
1 Identity appliance
2 vRA Appliances (6.2.0), also hosting the Postgres database
2 IaaS web server components hosting the Web Tier (the Infrastructure Tab)
2 IaaS App server components (IaaS App Tier along with the Manager Service)
2 DEMs (IaaS DEM Worker and Proxy Agents)
2 vRealize Orchestrators
vShield Manager which is the Load Balancer providing VIPs for the vRA Appliances, IaaS Web, IaaS App, vRO.

Everything was fine and dandy until one fine day we saw that the VAMI page of the vRA Appliances did not show any services as Registered. We also saw that the certificates had expired. Since we were using SAN certificates, this meant that the certificates had expired on multiple nodes and on other products as well.
When we looked at /var/log/messages, we saw:
2015-11-05T12:26:13.000440+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Authenticated user: root successfully
2015-11-05T12:26:13.000666+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info locale=en-US, id=certificateReplace, action=submit, controller=<type ‘instance’>
2015-11-05T12:26:13.000714+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Executing shell command…
2015-11-05T12:26:13.023630+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Returned vCAC host: vcac.corp
2015-11-05T12:26:13.024209+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Removing passphrase from key…
2015-11-05T12:26:13.031701+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Successfully removed passphrase from key.
2015-11-05T12:26:13.031751+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: info Importing certificate in vCAC KeyStore…
2015-11-05T12:26:15.116852+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR —BEGIN—#012Command execution failed with unexpected error: null.#012—END—#012#012Use -e option to get more details.
2015-11-05T12:26:15.116925+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR Error processing request: Error importing certificate in vCAC.
2015-11-05T12:26:15.117001+05:30 vcac01 vami /opt/vmware/share/htdocs/service/cafe/config-page.py: ERROR {‘cafe.host’: {‘action’: None, ‘value’: ‘vcac.corp’}}

We managed to get past the certificate replacement by following http://kb.vmware.com/kb/2106583. We expected this to resolve the issue and get our Private Cloud back up and running, but to our surprise the vRA services still did not come up as registered.
So we looked into the /var/log/vmware/vcac/catalina.out file and saw a bunch of “503 Service Temporarily Unavailable” messages like
“<timestamp> vcac: [component=”cafe:component-registry” priority=”ERROR” thread=”registryServiceNotificationExecutor-768″ tenant=””] com.vmware.vcac.core.componentregistry.service.impl.StatusServiceImpl.retrieveCurrentStatus:280 – Exception during remote status retrieval for url: https://vcac.corp/catalog-service/api/status. Error Message 503 Service Temporarily Unavailable.”

This made us take another look at the /etc/hosts file on the vRA Appliances and at the vShield Edge Load Balancer.
When we ping the VIP of the vRA appliances from a vRA Appliance, it should resolve to the local machine’s IP, meaning that the traffic shouldn’t go to the Load Balancer. That held good in our case as well, so the /etc/hosts file and the Load Balancer were doing their job.

We tried rebooting the vRA Appliances and then looked into the catalina.out log file to find out if the vRA Services were getting initialized and registered and saw that for some reason the component-registry service was not getting started.

Component-Registry is one of the key services in vRA. It manages all application services as well as the 3rd party solution provider services. Other services interact with it to request data related to a service or an endpoint.
Looking into the log carefully, we saw this error: No route to host

So we wanted to check whether the Apache service also got this error. Hence we ran this command on the vRA Appliance:
# wget https://vcac.corp/component-registry/services --no-check-certificate
and saw the following output:
Resolving vcac.corp… 10.110.68.28
Connecting to vcac.corp|10.110.68.28|:443… connected.
HTTP request sent, awaiting response… 503 Service Temporarily Unavailable
ERROR 503: Service Temporarily Unavailable

Since this is resolving to the IP address on the local vRA appliance itself, we wanted to see if this error originated from the Apache2 on the vRA Appliance or the vcac server.
To check this, we looked at the access logs at /var/log/vmware/vcac/access_log and /var/log/apache2/access_log.

In /var/log/vmware/vcac/access_log we saw “GET /component-registry/api/status HTTP/1.1” 200 926. In other words, the vcac access_log recorded an HTTP 200, which is good. In Apache2’s access_log, however, we saw:
“GET /component-registry/endpoints/types/com.vmware.csp.core.plugin.service.api/default HTTP/1.1” 503 416
“POST /component-registry/services/HTTP/1.1” 503 416
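
If you would rather not eyeball the two access logs, a small Python sketch like the one below can tally the status codes per log. It is only an illustration: the regular expression assumes each entry contains a quoted request followed by the status code, as in the excerpts above, and the function and variable names are made up for this example.

```python
import re
from collections import Counter

# Matches the request/status portion of an access-log entry, e.g.
#   "GET /component-registry/api/status HTTP/1.1" 200 926
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def status_counts(lines, path_prefix="/component-registry"):
    """Count HTTP status codes for requests whose path starts with path_prefix."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("path").startswith(path_prefix):
            counts[match.group("status")] += 1
    return counts


# Sample entries copied from the two logs quoted above (hypothetical variable names).
vcac_lines = ['"GET /component-registry/api/status HTTP/1.1" 200 926']
apache_lines = [
    '"GET /component-registry/endpoints/types/'
    'com.vmware.csp.core.plugin.service.api/default HTTP/1.1" 503 416',
]

print("vcac access_log:   ", dict(status_counts(vcac_lines)))
print("apache2 access_log:", dict(status_counts(apache_lines)))
```

A 200 on the vcac side combined with a 503 on the Apache side points at the hop between Apache and the backend, which is where the error_log comes in.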

Further looking at the /var/log/apache2/error_log we see
[error] (110) Connection timed out: proxy AJP: attempt to connect to 10.122.91.218:8009 (localhost) failed
[error] proxy AJP: disabled connection for (localhost)
[error] proxy: AJP: failed to make connection to backend: localhost

We restarted the apache2 service using /etc/init.d/apache2 restart and then looked at /var/log/apache2/error_log again. It stated:
[error] (110) Connection timed out: proxy AJP: attempt to connect to 10.122.91.218:8009 (localhost) failed

Wait a minute! We don’t know what that IP address 10.122.91.218 is!
When we did an nslookup on that IP, we found that there is a machine registered on the DNS server with the name localhost.corp, and it points to that IP address! Isn’t that strange!
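
This kind of mismatch can also be spotted mechanically. The Python sketch below parses the AJP failure line quoted above and flags the case where the backend is named localhost but the IP is not a loopback address. The regular expression is based only on the log line shown in this post, and the function names are hypothetical.

```python
import ipaddress
import re

# Pattern for the proxy AJP failure line quoted above:
#   proxy AJP: attempt to connect to 10.122.91.218:8009 (localhost) failed
AJP_FAIL_RE = re.compile(
    r"attempt to connect to (?P<ip>\d+(?:\.\d+){3}):(?P<port>\d+) "
    r"\((?P<host>[^)]+)\) failed"
)


def parse_ajp_failure(line):
    """Return (ip, port, host) from an AJP connection-failure line, or None."""
    match = AJP_FAIL_RE.search(line)
    if match is None:
        return None
    return match.group("ip"), int(match.group("port")), match.group("host")


def localhost_misresolved(line):
    """True when the backend is named 'localhost' but its IP is not loopback."""
    parsed = parse_ajp_failure(line)
    if parsed is None:
        return False
    ip, _port, host = parsed
    return host == "localhost" and not ipaddress.ip_address(ip).is_loopback


sample = (
    "[error] (110) Connection timed out: proxy AJP: attempt to connect to "
    "10.122.91.218:8009 (localhost) failed"
)
print(parse_ajp_failure(sample), localhost_misresolved(sample))
```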

The DNS in the customer’s infrastructure was managed and maintained by the parent company of the organization, and neither the cloud architect nor any of the other technical resources had access to the DNS Server.
So, to work around that, we first wanted to confirm that this IP was the problem child.

To do this, we modified the vcac.conf file at /etc/apache2/vhosts.d
We changed the line from “ProxyPass / ajp://localhost:8009/ nocanon” to “ProxyPass / ajp://127.0.0.1:8009/ nocanon” under #Tomcat AJP Proxy and then restarted apache2 using /etc/init.d/apache2 restart

We then ran the command
# wget https://localhost:443/component-registry/services/status --no-check-certificate
The output was:
Resolving localhost… 127.0.0.1
Connecting to localhost|127.0.0.1|:443… connected
HTTP request sent, awaiting response… 200 OK

We got an HTTP 200, meaning it started working! So apache2 had been resolving localhost to a machine called localhost.corp instead of to the local host, which is the vRA Appliance itself.

We now checked on the VAMI page for the list of services and Voila! the services were registered 🙂

Apache2 is a proxy in front of the Tomcat server. So instead of using localhost as the name of the Tomcat server, we used 127.0.0.1 (since localhost was resolving to the IP 10.122.91.218 in DNS). Basically, by bypassing DNS, we were able to start up the services.
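
A quick way to sanity-check name resolution on a box is to ask the system resolver what localhost maps to and complain if anything other than a loopback address comes back. The Python sketch below does that for IPv4. Treat it as a rough check under assumptions: the helper names are made up, and the resolver path Python takes may not be identical to the one Apache took on the appliance.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def non_loopback(addresses):
    """Return, sorted, the addresses that are not loopback (outside 127.0.0.0/8)."""
    return sorted(a for a in addresses if not ipaddress.ip_address(a).is_loopback)


if __name__ == "__main__":
    suspicious = non_loopback(resolve_ipv4("localhost"))
    if suspicious:
        print("localhost resolves to non-loopback address(es):", suspicious)
    else:
        print("localhost resolves to loopback only")
```

Writing 127.0.0.1 in the ProxyPass line sidesteps this lookup altogether, which is why the workaround held up while the rogue DNS record was still in place.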

After waking up the DNS Admin in the middle of the night, removing the rogue entry and reverting the changes made to /etc/apache2/vhosts.d/vcac.conf, we were able to get the services back up and running, and eventually the Private Cloud was back in business. It was like finding a needle in a haystack!

Lesson learnt: Be pristine with your DNS Server

>vRevealed<

Troubleshooting ESXi Hosts which are Not responding or in a Disconnected State

Introduction:

ESXi Hosts going into a Not Responding or Disconnected state is nothing new for vSphere Admins. Experienced users also know that there is no single solution or approach to the problem, as the issue can arise at multiple levels: right from the Hardware level, through the Networking and/or Storage stack, up to the vmkernel stack provided by VMware vSphere.

It is always recommended to have a thorough understanding of the problem at hand and to look into the logs before tackling it.

The VMware Knowledge Base has a lot of articles surrounding this issue, so I thought I would put up a document that guides you through a Fish-Bone approach to troubleshooting this problem. Most of this is taken from the KB Articles published by VMware.

Prerequisites:

These issues can happen for multiple reasons, starting from the Hardware, Network or Storage and going up to vSphere’s kernel modules themselves, so the steps below are more of a primer to help you understand what could have possibly gone wrong.

Start from a Hardware Level then Network & Storage level and then vSphere level.

Ask yourself the following questions:

  • How many times has the ESX host experienced this condition?
  • What were the exact times and dates that the host became unresponsive?
  • Have any other hosts experienced this issue?
  • What else was happening in your environment at the time of the events?
  • Is there a pattern to the times when the host becomes unresponsive?
  • Are there any regularly scheduled jobs running when the host becomes unresponsive?
  • Do I have the logs which would have captured the time frame when the issue would have occurred?

The answers to the above questions, coupled with logs for the host, should be provided to the VMware Technical Support Team when logging a Service Request. This will help avoid going back and forth for additional information.

When submitting logs, make sure they cover the timeframe in which the issue occurred, so that the Technical Support team can help find the exact problem.

To collect diagnostic information for ESX/ESXi hosts and vCenter Server using the vSphere Web Client, please see http://kb.vmware.com/kb/2032892. Alternatively, when using the vSphere Client, please see http://kb.vmware.com/kb/653

To upload diagnostic information to VMware Technical Support Team, please see http://kb.vmware.com/kb/1008525

NOTE: When the ESXi Server is in a Not Responding state, you will not be able to collect logs using the vSphere Client or the vSphere Web Client. Ideally you should troubleshoot to get the host back into a manageable state. However, if you still need the logs, you can collect them by running the vm-support command from an SSH session or the DCUI.

 

 

 

Understanding the difference between a Not Responding Host and a Disconnected Host

Not Responding: A host can become greyed out and shown as Not Responding due to an external factor that vCenter Server is unaware of. If a host is showing as Not Responding, vCenter Server no longer receives heartbeats from it. This can happen for several reasons, all of which prevent heartbeats being received from the host to vCenter. Some common reasons include:

  • A network connectivity issue between the host and vCenter Server, for example UDP port 902 not open, a routing issue, bad cable, firewall rule, etc.
  • hostd is not running successfully on the host.
  • vpxa is not running successfully on the host.
  • The host has crashed.

A host can go from Not Responding back to a normal state if the underlying issue which brought the host to the Not Responding state is resolved. However, a host that is in the Disconnected state ceases to be monitored by vCenter Server and will stay in that state regardless of the status of the underlying issue. Once the issue is resolved, the user must right-click on the host and select Connect to bring the host back to a normal state in vCenter Server.

Disconnected: Disconnected is a state initiated from the vCenter Server side and suspends vCenter Server’s management of the host, and thus all vCenter Server services ignore the host.

A disconnected host is one that has been explicitly disconnected by the user, or the license on the host has expired. Disconnected hosts also require the user to manually reconnect the host. Ultimately, a host that is disconnected can become that way for three reasons (2 of which require manual intervention):

  • A user right-clicks the host and selects Disconnect.
  • A user right-clicks a host that is listed as Not Responding and clicks Connect and that task fails.
  • The host license expires.

When a host becomes disconnected, it still exists in the vCenter Server inventory, but vCenter Server does not get any updates from that host, does not monitor it, and therefore has no knowledge of the health of that host.
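
To make the difference concrete, here is a toy state model in Python of the behaviour described above. It is not a vCenter Server API and the event names are invented for this sketch. It only encodes the rules stated in this section: Not Responding clears on its own once heartbeats return, whereas Disconnected only clears when a user reconnects the host.

```python
from enum import Enum


class HostState(Enum):
    CONNECTED = "Connected"
    NOT_RESPONDING = "Not Responding"
    DISCONNECTED = "Disconnected"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (HostState.CONNECTED, "heartbeat_lost"): HostState.NOT_RESPONDING,
    (HostState.NOT_RESPONDING, "heartbeat_restored"): HostState.CONNECTED,
    (HostState.NOT_RESPONDING, "connect_task_failed"): HostState.DISCONNECTED,
    (HostState.CONNECTED, "user_disconnect"): HostState.DISCONNECTED,
    (HostState.CONNECTED, "license_expired"): HostState.DISCONNECTED,
    (HostState.DISCONNECTED, "user_connect"): HostState.CONNECTED,
}


def next_state(state, event):
    """Return the state after event; events with no rule leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = HostState.CONNECTED
for event in ["heartbeat_lost", "connect_task_failed", "heartbeat_restored", "user_connect"]:
    state = next_state(state, event)
    print(f"{event:>20} -> {state.value}")
```

Note how heartbeat_restored does nothing once the host is Disconnected: fixing the underlying issue is not enough, someone still has to select Connect.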

vCenter Server takes a conservative approach when considering disconnected hosts. Virtual machines on a host that is not responding affect the admission control check for vSphere HA. vCenter Server does not include those virtual machines when computing the current failover level for HA, but assumes that any virtual machines running on a disconnected host will be failed over if the host fails. Because the status of the host is not known, and because vCenter Server is not communicating with that host, HA cannot use it as a guaranteed failover target. As part of disconnecting a host, vCenter Server disables HA on that host. The virtual machines on that host are therefore not failed over in the event of a host isolation. When the host becomes reconnected, the host becomes available for failover again.

Now that we know the difference, we recommend that you start troubleshooting at the Hardware layer and progress through the Network layer, the vSphere layer and the Storage layer.

Let’s discuss them one by one.

 

Hardware Layer

  1. Verify the current state of the ESXi host hardware and power. Physically go to the VMware ESXi host hardware, and make note of any lights on the face of the server hardware that may indicate the power or hardware status. For more information regarding the hardware lights, consult the hardware vendor documentation or support.
  2. Depending on the configuration of your physical environment, you may consider connecting to the physical host by using a remote hardware interface provided by your hardware vendor.
  3. If the hardware lights indicate that there is a hardware issue, consult the hardware vendor documentation or support to identify any existing hardware issues.
  4. Determine the state of the user interface of the ESXi host at the physical console, so that you know how responsive the local console is before taking any action:
    1. Press the NumLock key on your keyboard and observe if the NumLock light state changes. A successful light state change indicates that the BIOS is responsive.
    2. Check if there is any active disk or network traffic using status lights or other hardware monitoring on the disk drive array, network interface cards or upstream switches. Active egress traffic indicates that the ESXi host is still functioning.
    3. Attempt to interact with the server via a baseboard management controller (BMC) interface, such as ILO, DRAC or RSA. If aspects of this interface other than the console are also unresponsive, it indicates that the issue is hardware related.
    4. Reboot the ESXi host. Collect the Diagnostic Information using http://kb.vmware.com/kb/1008524
  5. Verify the BIOS version of the hardware against the VMware Compatibility Guide. Go to the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility/search.php, select Systems/Servers in “What are you looking for” and choose the ESXi version installed. Choose the Partner Name, type the Server Model in the Keyword field and click Update and View Results.
  6. If things are fine on the Hardware side, let’s proceed to the Network layer.

Network Layer

  1. Verify ping connectivity using both the IP address and the DNS name, from the ESXi Host to the vCenter Server and from the vCenter Server to the ESXi Host.
    1. If the ping succeeds using the IP but not using the hostname, then there is a problem in DNS. Fix this issue first.
    2. If your investigation confirms that the host in question is using the old address while the DNS server is resolving the name to its new address, clear the DNS cache on the ESXi Servers by running the “/etc/init.d/nscd restart” command from an SSH session.
  2. Verify the state of the Network Cards on the ESXi Server
    1. Connect to the ESXi Host using PuTTY or if PuTTY is not working use the Remote Console to run the command “esxcli network nic list”. Make sure the Link Status shows “Up”
    2. Verify ESXi Host Networking using the command “esxcfg-vswitch -l”
  3. If you can ping the ESXi Host from the vCenter Server, can you open a direct vSphere Client Connection to the ESXi Host? If this works then remove the disconnected ESXi Host from the vCenter Server’s inventory and re-add – This should work.
  4. If Step 3 still doesn’t work then probably an incorrect Managed IP Address has been set.
    1. Log into the vSphere Web Client. The default URL is https://vCenter_Server_FQDN:9443/vsphere-client
    2. Navigate to vCenter > Inventory Lists > vCenter Servers.
    3. Click the vCenter Server needing to be verified.
    4. Select Manage > Settings > General
    5. Expand the Runtime settings field by clicking on the arrow next to it. Make a note of the vCenter Server managed address.
    6. To modify the Runtime Settings:
      1. Click Edit in the right-hand corner of the panel
      2. Click Runtime settings on the left-hand side of the window
      3. Modify the vCenter Server managed address and vCenter Server name accordingly.
      4. Click OK
    7. Make sure that the IP seen in Step 4.5 is same as the IP reported by the command “grep -i serverIp /etc/vmware/vpxa/vpxa.cfg”
    8. If the IP address is correct, it could be that the vCenter Server and the ESXi Host are not exchanging heartbeat packets, which usually flow through the default port 902. These packets are important for the Host to stay connected.
    9. Check the port through which the heartbeat traffic flows by running “grep -i serverport /etc/vmware/vpxa/vpxa.cfg” on the ESXi host. This returns the serverport number.
    10. On the Windows based vCenter Server use the telnet command, for example, “telnet <esxi_ip> 902”
    11. On the ESXi Host use the netcat (nc) command as “nc -z <vcenter_ip> 902”
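
If telnet is not installed on the machine you are testing from, the same kind of TCP reachability check as in the last two steps can be sketched in a few lines of Python. Like telnet and nc -z, it only tells you whether a TCP connection to the port can be opened. It says nothing about UDP traffic on port 902. The function name and the host name in the usage comment are placeholders.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example usage (placeholder host name, replace with your own):
#   tcp_port_open("esxi01.example.com", 902)
```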

 

vSphere Layer

  1. Is the ESXi Host Licensed correctly?
  2. Even after verifying and making sure things are good from a Hardware and a Network perspective, if you are unable to establish a vSphere Client Connection to the ESXi Host directly, then it could be possible that the Management Agent of the ESXi Host is not functioning correctly.
  3. Issuing ./services.sh restart will restart all the management-related services. Please be patient for at least 4-5 minutes to allow all the services to restart after issuing the command. To verify that the Management Agent started successfully on ESXi 6.0.x Hosts, grep for “BEGIN SERVICES” in the /var/log/hostd.log file. You should see a line like this: “2015-12-22T09:25:50.314Z info hostd[FF8BAA70] [Originator@6876 sub=Default] BEGIN SERVICES”
  4. IMPORTANT: IF LACP IS CONFIGURED, DO NOT RESTART MANAGEMENT SERVICES USING services.sh restart. Instead use the commands in Step 5.
  5. Alternatively, issue just the following commands:
    1. /etc/init.d/hostd restart – This will restart the Management agent of the ESXi Host
    2. /etc/init.d/vpxa restart – This will restart the vCenter Agent on the ESXi Host
  6. Note that if Storage is not visible and the Host is experiencing APD or PDL, the above commands might get stuck. Only try them if you are sure that there are no storage-level issues. To make sure the Storage is accessible, run the following commands:
    1. vim-cmd vmsvc/getallvms – This will list all the VMs running on the host with the Datastore details.
    2. ls /vmfs/volumes/[Datastore_Name]/ – This will list all the VMs in that datastore
    3. cd /vmfs/volumes/[Datastore_Name] – To get into the datastore directory
    4. touch filename – This is to check if you are able to create a file named as filename
    5. rm filename – This is to delete the filename file.
    6. If the above two commands show results then you are able to access the Storage.
  7. Are you running the right Network/Storage driver and/or firmware version based on the version of ESXi Hosts?
    1. Did you know you can run the command “/usr/lib/vmware/vm-support/bin/swfw.sh” to display the firmware and driver version of the hardware connected to the ESXi Hosts. This is the versioning information from the CIM Providers. Look for the VersionString for each of the InstanceID.
    2. If you think that the command in Step 7.1 gives you a lot of information and gets you confused (like me), you can run individual commands to find out the versions for HBAs and NICs.
    3. Run the command esxcfg-scsidevs -a . This will give you the HBA devices with the modules associated (the second column lists the HBA module names).
    4. Run the command vmkload_mod -s <HBA_drivername_forexample_qlnativefc> | grep -i version . This command will give you the version.
    5. Run the command esxcli network nic list . This will list all the network cards.
    6. Run the command esxcli network nic get -n vmnic# | grep -i version , where # is the NIC number. This will give you the version.
    7. Run the command vmware -v . This will give you the version of ESXi installed on the ESXi Host.
    8. Go to the VMware Compatibility Guide at http://www.vmware.com/resources/compatibility/search.php and select IO devices in “What are you looking for” and choose the ESXi Version installed. Choose the Partner Name and type in the Name of the IO Card in the Keyword and click Update and View Results.
    9. You SHOULD be on the exact versions as stated by the VMware Compatibility Guide.
  8. Are you managing remote ESXi Hosts using vCenter Server? Are your ESXi Servers disconnecting after approximately 30 to 60 seconds? Due to network or ISP limitations, it may be necessary to use NAT to connect to the vCenter Server. Please note that USING NAT BETWEEN VCENTER SERVER AND ESXi HOSTS IS AN UNSUPPORTED CONFIGURATION.
    1. As discussed in the Network layer, port 902 is critical for heartbeat functionality. Even if the port is open, the host may still disconnect if the firewall on Windows Server 2008 blocks Edge Traversal.
    2. To enable Edge Traversal, Go to Start-> Run and type wf.msc and hit Enter on the vCenter Server and locate VMware vCenter Server – Host Heartbeat Rule and select the Advanced tab and choose Allow Edge Traversal.
  9. For ESXi versions older than 5.5, it is possible that the hardware monitoring service (sfcbd) fills up the /var/run/sfcb directory. For this issue, stop the sfcbd-watchdog service using the command /etc/init.d/sfcbd-watchdog stop, then cd /var/run/sfcb and remove the files in the directory using the rm command. Once they are deleted, start the service using the command /etc/init.d/sfcbd-watchdog start
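
The touch and rm check from the storage-access step of this list can also be wrapped in a small Python helper, sketched below. It assumes a Python interpreter is available wherever the datastore path is mounted, and the function name is made up for this example. As with touch, the call itself can hang if the storage is in an APD condition, so it is no substitute for checking the storage side first.

```python
import os
import tempfile


def datastore_writable(path):
    """Create and delete a small probe file under path; return True on success."""
    try:
        fd, probe = tempfile.mkstemp(prefix="probe_", dir=path)
        os.close(fd)
        os.remove(probe)
        return True
    except OSError:
        # Missing directory, read-only or inaccessible storage, no permission, etc.
        return False


# Example usage, with the same placeholder as in the steps above:
#   datastore_writable("/vmfs/volumes/[Datastore_Name]")
```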

Storage layer: Have you tried the storage checks listed in Step 6 of the vSphere layer? If yes and you are still facing issues, then based on the type of Storage and the scope of the problem, please select the right hyperlink in http://kb.vmware.com/kb/1003659

I hope the above helps you narrow down why the Host went into a Not Responding or Disconnected state. Having this information ready when filing Support Requests will help you reduce the total troubleshooting time and get you to a faster resolution.

Happy troubleshooting!

>vRevealed<

IT Transformation and Organizational Process Maturity

By John Worthington Regardless of what process framework you use, and especially if you’ve done some ‘adaptation’ of processes, building process capability over the long haul goes hand-in-hand with building the organization’s process maturity. Having thought about my comment, ‘so go ahead, adopt and adapt’, in one of my previous posts I thought further discussion […] The post IT Transformation and Organizational Process Maturity appeared first on VMware Operations Transformation Services.


Shutdown and Startup order when using the VMware vCloud Suite

One of my customers had to move their Datacenter from Wing A to Wing B in the same office campus, thanks to the Real Estate and Workspace Services department, which provided a larger and better space for us.

The customer recently had Professional Services design their Private Cloud and was already close to 150 VMs, so this was a huge task for the Cloud Management team, which consisted of a lone soldier wearing multiple hats and a Head of the Department. They are an R&D shop serving Virtual Machines to users who use them across the complete SDLC of their software products.

After many alerts and reminders, the time had come to start the shutdown and startup of our gear.

This is a distributed environment of vRA along with the other components of the vCloud Suite. vRA Appliances, IaaS Web Servers, IaaS App Servers and the vRealize Orchestrators were load balanced using 2 vCNS Edge Devices which act as the Load Balancers using the IP_HASH Load Balancing Algorithm.

Ok, apart from the people moving the Datacenter gear, we made sure our Networking and Storage were set up in the new Wing B DC just as they were in the Wing A DC. Thanks to the Networking and Storage Admins.

These are the steps which the Cloud Admin followed:
0 – ASK YOURSELF ARE YOU SATISFIED WITH THE BACKUPS TAKEN FOR THIS.
1. Shutdown all Workload VMs(Ask users to Shut down their VMs from the vRA Portal OR SHUTDOWN VM’s FROM the Managed Machines TAB in case Users forget to power off). DO NOT UNREGISTER THEM.
2. Run Data Collection in vRA for all attributes(state,performance,inventory) – MAKE SURE IT IS SUCCESSFUL. This is so that the state of the VMs in vRA DB and the state of the VMs in vCenter DB is same – THIS IS IMPORTANT
3. Shut Down vRealize Operations.
4. Shut Down vRealize Hyperic.
5. Shut Down Infrastructure Navigator.
6. Shut Down VMware vRealize Appliance Node 01(They take care of the vRA Portal and the Postgres DB)
7. Shut Down VMware vRealize Appliance Node 02(They take care of the vRA Portal and the Postgres DB)
8. Shut Down vRA Identity Appliance
9. Shut Down IaaS Web Server VM Node 01(This is the IaaS Web Tier, corresponding to the Infrastructure Tab)
10. Shut Down IaaS Web Server VM Node 02(This is the IaaS Web Tier, corresponding to the Infrastructure Tab)
11. Shut Down IaaS App Server VM Node 01(This is IaaS App Tier and the Manager Service)
12. Shut Down IaaS App Server VM Node 02(This is IaaS App Tier and the Manager Service)
13. Shut Down DEM Orchestrator Node 01(This contains the IaaS DEM Worker and the Proxy Agents)
14. Shut Down DEM Orchestrator Node 02(This contains the IaaS DEM Worker and the Proxy Agents)
15. Shut Down vRO Appliance Node 01(This is configured as an Endpoint in the Customer’s setup)
16. Shut Down vRO Appliance Node 02(This is configured as an Endpoint in the Customer’s setup)
17. Shut Down vCNS Edge Load Balancer 1
18. Shut Down vCNS Edge Load Balancer 2
19. Shut Down vCNS Manager
20. Shut Down vRealize Business Standard
21. Shut Down vCenter Server
22. Shut Down Database Server hosting all the Databases ( I know this is one SPOF, the customer is going to have SQL Clustering done for it)

Ok, so that was easy!

Now let’s go to the Startup Order.

1. Have a positive frame of mind( You can do this!)
2. Startup the Database Server hosting all the Databases. Make sure the Database is up and running successfully.
3. Start the vCenter Server
4. Start the vCNS Manager
5. Start the vCNS Edge Load Balancer 1
6. Start the vCNS Edge Load Balancer 2
7. Start the vRA Identity Appliance
8. Provide a delay of 10 Minutes.
9. Start the VMware vRealize Appliance Node 01
10. Start the VMware vRealize Appliance Node 02
11. Give a delay of 10 minutes and check that all the Services on the vRA Admin Portal show as REGISTERED (of course the IaaS Service will show as failed, since we have not started it yet)
12. Start the IaaS App Server VM Node 01(This is IaaS App Tier and the Manager Service)
13. Start the IaaS App Server VM Node 02(This is IaaS App Tier and the Manager Service)
14. Start the IaaS Web Server VM Node 01(This is the IaaS Web Tier, corresponding to the Infrastructure Tab)
15. Start the IaaS Web Server VM Node 02(This is the IaaS Web Tier, corresponding to the Infrastructure Tab)
16. Start the DEM Orchestrator Node 01(This contains the IaaS DEM Worker and the Proxy Agents)
17. Start the DEM Orchestrator Node 02(This contains the IaaS DEM Worker and the Proxy Agents)
18. Start the vRO Appliance Node 01
19. Start the vRO Appliance Node 02
20. Start vRealize Business Standard
21. Give a delay of 10 minutes, check if the portal is up and running and you can see all the tabs functioning
22. Start vRealize Operations Manager
23. Start vRealize Hyperic
24. Start Infrastructure Navigator
25. POWER ON WORKLOAD VMs from the Managed Machines TAB in vRA & NOT FROM the vCENTER Server. Take a wave approach of starting 10 VMs at a time.
26. Do a Data Collection and make sure it is Successful for all the attributes( State, Performance & Inventory)
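
If you keep a runbook like this in a script or a wiki, it can help to write the order down as data and check it against the dependencies you care about. The Python sketch below does that for the startup list above, with the node pairs collapsed into tiers. The dependency pairs are only my reading of that list, not an official VMware ordering, and all names are illustrative.

```python
# Startup tiers in the order used above (Node 01/02 pairs collapsed into one tier).
STARTUP_ORDER = [
    "Database Server",
    "vCenter Server",
    "vCNS Manager",
    "vCNS Edge Load Balancers",
    "vRA Identity Appliance",
    "vRA Appliances",
    "IaaS App Servers",
    "IaaS Web Servers",
    "DEM Orchestrators",
    "vRO Appliances",
    "vRealize Business Standard",
    "vRealize Operations Manager",
    "vRealize Hyperic",
    "Infrastructure Navigator",
    "Workload VMs",
]

# (earlier, later) pairs that should hold. Assumed for illustration only.
MUST_PRECEDE = [
    ("Database Server", "vCenter Server"),
    ("vCNS Edge Load Balancers", "vRA Appliances"),
    ("vRA Identity Appliance", "vRA Appliances"),
    ("vRA Appliances", "IaaS App Servers"),
    ("vRO Appliances", "Workload VMs"),
]


def order_violations(order, must_precede):
    """Return the (earlier, later) pairs that the given order does not respect."""
    position = {name: index for index, name in enumerate(order)}
    return [(a, b) for a, b in must_precede if position[a] >= position[b]]


print("violations:", order_violations(STARTUP_ORDER, MUST_PRECEDE))
```

An empty list means the written order respects every pair. If you reshuffle the runbook later, the same check tells you which dependency you just broke.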

There you go. You deserve a Chilled one!

Cheers,
>vRevealed<

VMworld 2015 Content Catalog is now live!

What sessions will you attend? The #VMworld 2015 Content Catalog is now live!

The Content Catalog allows prospective VMworld attendees access to the VMworld agenda, with the ability to peruse breakouts and note sessions of interest. You can search and filter to your heart’s content—by track, category, session format, industry, role, technical level, speaker name, location (US or Europe), and keyword search. You cannot schedule sessions in the catalog.

