Tag Archives: Helpdesk Alert

Key Survey / WorldApp Update: Message from the CEO

Here is the message sent by the CEO of WorldApp, Inc. concerning last Friday’s Key Survey downtime.  (Key Survey is a software program used to create and distribute surveys, as well as collect and analyze responses.)

From: Oleg Matsko
Sent: Monday, May 18, 2015 9:36 AM
Subject: An update on Friday’s disruption – a message from our CEO

Last Friday’s issues have been some of the most severe issues to affect WorldAPP since we launched Key Survey in 2002. As CEO, I take immense pride in serving organizations across the world in fulfilling their requirements and I feel immensely sorry and hurt that we let those customers down. As such, I feel it is only right that we be completely open, honest and transparent about what happened, and what we are doing to make sure it doesn’t happen again.

A few weeks ago we noticed that one of the storage components of our production environment had started to fail. This in itself doesn’t cause an immediate issue: our production environment is built with multiple layers of redundancy, and despite one of the critical elements of this environment not functioning, our applications continued to work as they should, without any impact on availability. It is important, though, that when these issues occur, we rectify them as quickly as we can, so that should other components of our environment fail, there isn’t any impact on service.

So for the past few weeks we have been preparing our secondary storage components to take over, allowing us to complete the necessary works on the primary components. Our applications collect a lot of data, in fact the equivalent of 11,000 pages of paper an hour, and this amount of data takes a lot of time to transfer. In an absolute emergency we can complete this transfer in about 12 hours, but as our primary setup was still stable, and the risks of transferring such a huge amount of data in a relatively short amount of time are quite high, we took our time and completed this transfer over a period of a few weeks.

This transfer was completed on Thursday evening, our secondary storage components went live without issue, and our primary storage components were taken offline to allow the required maintenance to be completed. For a few hours, everything worked fine, and then at around 08:00 EDT on Friday morning, without notice, our secondary storage components failed. At the moment, the reason why they failed is still unclear; there doesn’t appear to be an obvious cause. We will work hard with our infrastructure partners to find out why this happened – but the most important thing for us to do on Friday was to get our applications back online.

Key Survey and Form.com are incredibly large and complex applications, and restarting them isn’t a simple operation. The applications are made up of many separate modules, each relating to an area of their functionality, such as reporting, voting or our API. The effort required to restart them is large, so much so that they cannot all be restarted at once. As such, modules were restarted individually, in order of priority. Our main Key Survey and Form.com environments were operational by 15:00 EDT, with all of our reporting modules online by 21:30 EDT and specific instances of our applications for individual customers back online by 00:30 EDT on Saturday morning.

As a result of Friday’s disruption, I have instructed our teams to rebuild our storage infrastructure to include additional layers of redundancy with built-in instant failover capabilities. This is no easy challenge: implementing this infrastructure and migrating all our applications will take about a week, but we should be able to complete this without additional disruption. Once these changes are implemented, we will be able to recover our systems in a matter of minutes. This is in addition to the construction of the remote disaster recovery infrastructure which is already underway and estimated to be completed early next year.

Unfortunately, until these changes have been completed, our secondary storage components could fail again, and this leaves us in a precarious position. Whilst the probability of such a failure is low, and we have taken all possible precautions to ensure it doesn’t reoccur, our teams are prepared to restore services as quickly as possible in the event of a second failure. As the amount of data that is migrated to the new infrastructure increases throughout the week, the amount of time to restore services in the event of an issue reduces. This does mean, though, that should a similar issue occur early this week, we could experience an outage similar to Friday’s.

As mentioned, I want to be transparent about the challenges we face, and honest about what could happen while we take steps to improve our services. We will let you know as soon as this new environment is fully functional and we can be sure that such issues do not cause as much disruption as they have. In the meantime our team are working diligently to monitor and manage our applications to avoid such issues, and are prepared to restore services as quickly as possible in the event of a reoccurrence of Friday’s troubles. I can also assure you that we will investigate thoroughly what caused these components to fail, but for the time being I want to concentrate all our resources on implementing these changes and improving our service to you.

We will support you as much as we can as a result of this disruption – if there is anything WorldAPP can do to assist you with work you weren’t able to complete last week, such as building surveys, forms or reports, please let your account manager know. We’ll endeavour to accommodate as many requests as we can.

Once again I would like to reiterate my thanks for your patience and understanding, and my genuine sorrow that we have let you down. WorldAPP have been a trusted provider of survey, forms and inspection solutions for over 12 years now, and I hope my explanation of what happened, and assurances of the actions we’re taking to ensure it doesn’t happen again, go some way to rebuilding that trust.

Sincerely,
Oleg Matsko
CEO
WorldAPP, Inc.
161 Forbes Rd Ste 300, Braintree, MA, 02184, US

Key Survey / WorldAPP Service Interruption – Update

The login and survey access issues with Key Survey have not yet been resolved.  Here is the latest information received from their support team:

From: WorldAPP Support [mailto:support@worldapp.com]
Sent: Friday, May 15, 2015 1:30 PM
Subject: WorldAPP System Interruptions

Today, WorldAPP services, including Key Survey, Form.com and associated applications, have been subject to a service disruption. Below is a brief overview of what caused the issue and the actions we’re taking to restore services as quickly as possible.

Recently, a CPU on one of the servers that our applications use to access our database started failing. Whilst the failure of one CPU doesn’t cause disruption to our services, it does require maintenance so that should the others fail, our applications aren’t impacted. Yesterday evening, our team migrated services to our disaster recovery environment to enable the required maintenance to take place. This is common practice during periods of maintenance to enable continuation of service and has been regularly implemented without adverse effect.

After a few hours of operating on the disaster recovery environment, for reasons yet unknown, the disaster recovery environment failed. Our team took immediate steps to bring the environment back online and are working very hard on restoring services in order of priority, with the most critical services being the first to be restored. As this process continues, we’ll provide further updates on our community pages here.

As we continue to experience service disruption, our applications will remain unavailable and respondents attempting to complete a survey or form will be directed to an error page. We are incredibly sorry for the frustration that this disruption is causing you, and assure you we’re working as hard as we can to restore full service as quickly as possible.

Yours sincerely,
Teresa Crisci
Director of Client Services

By: WorldAPP, Inc.
161 Forbes Rd Ste 300, Braintree, MA, 02184, US

Key Survey Issues — Login and Survey Access Unavailable

We are currently experiencing issues with Key Survey (hosted by WorldApp).  Users who try to log on will not be presented with the usual login screen; the page simply does not load.  Survey recipients will not be able to access surveys and respond at this time.

WorldApp has been notified of these problems.  Updates will be shared here as soon as they are available.

Remember that go/techalerts can be used for quick access to system up/down information and posts concerning outages.

[As of 9:15 am – WorldApp currently estimates that services will be restored in about 30 minutes.  All modules are affected; surveys and reports are also inaccessible.]

Moodle Maintenance on Friday, April 17th

From Remote-Learner, our Moodle host:
Maintenance Window: 12:00am to 4:00am ET on April 17, 2015
In order to increase the resilience and reliability of our cloud platform, we will be conducting network maintenance during the window above. During this time, your site will remain operational, but you may notice a slight decrease in performance.
Sincerely,
Remote-Learner Technical Support

Wireless Update

You may have noticed that no changes to our wireless networks took place yesterday (3/16).  We are giving people more time to switch from Midd-standard to the new MiddleburyCollege network.  If you haven’t already done so, please take this opportunity to connect your wireless devices to MiddleburyCollege.  If you encounter any difficulties, contact the Technology HelpDesk for assistance.

The following wireless networks are currently available:
* MiddleburyCollege is the new, fast, and secure wireless network.  A Middlebury username and password or guest account is required.
* MCPSK is the new wireless network for the limited devices that cannot connect to the MiddleburyCollege network, including the following:  PlayStation, Xbox, Nintendo DS, Kindle, and Nook.
* Midd-standard continues to be available to provide wireless access to campus visitors.
* The eduroam network is available for guests from participating institutions who don’t have Middlebury credentials.