A problem occurred on June 25th that caused Windows computers to temporarily lose their network drive mappings when logging in. We have researched what may have caused this issue and believe we have fixed it. Please call or email the Helpdesk (firstname.lastname@example.org) if you see any problem with your network drives the next time you log in to your computer.
From Remote-Learner, our Moodle Host:
http://moodle.middlebury.edu has been scheduled for Production Upgrade at 01:00 EDT on 5-JUN-2015.
Your site may be unavailable for up to 120 minutes while this action occurs.
The following information is relevant to anyone who uses Key Survey to create and distribute surveys, as well as survey respondents.
From: WorldAPP Customer Care
Sent: Thursday, May 28, 2015 1:02 PM
Subject: WorldAPP Maintenance Notification
Key Survey & Form.com will be undergoing maintenance between 1am and 3am EDT on Sunday, 31st May.
As part of our commitment to enhance our services and improve reliability, we need to schedule a short maintenance period this weekend to replace some elements of our production environment.
During the maintenance period both the Form.com and Key Survey applications will be unavailable, with respondents directed to a maintenance page.
Further information and updates will be posted to our community pages.
WorldAPP Customer Care Team
Here is the message sent by the CEO of WorldApp, Inc. concerning last Friday’s Key Survey downtime. (Key Survey is a software program used to create and distribute surveys, as well as collect and analyze responses.)
From: Oleg Matsko
Sent: Monday, May 18, 2015 9:36 AM
Subject: An update on Friday’s disruption – a message from our CEO
Last Friday’s issues have been some of the most severe issues to affect WorldAPP since we launched Key Survey in 2002. As CEO, I take immense pride in serving organizations across the world in fulfilling their requirements and I feel immensely sorry and hurt that we let those customers down. As such, I feel it is only right that we be completely open, honest and transparent about what happened, and what we are doing to make sure it doesn’t happen again.
A few weeks ago we noticed that one of the storage components of our production environment had started to fail. This in itself doesn’t cause an immediate issue: our production environment is built with multiple layers of redundancy, and despite one of the critical elements of this environment not functioning, our applications continued to work in the manner they should, without any impact on availability. It is important, though, that when these issues occur we rectify them as quickly as we can, so that should other components of our environment fail, there isn’t any impact on service.
So for the past few weeks we have been preparing our secondary storage components to take over, allowing us to complete the necessary works on the primary components. Our applications collect a lot of data, in fact the equivalent of 11,000 pages of paper an hour, and this amount of data takes a lot of time to transfer. In an absolute emergency we can complete this transfer in about 12 hours, but as our primary setup was still stable, and the risks of transferring such a huge amount of data in a relatively short amount of time were quite high, we took our time and completed this transfer over a period of a few weeks.
This transfer was completed on Thursday evening, our secondary storage components went live without issue, and our primary storage components were taken offline to allow the required maintenance to be completed. For a few hours, everything worked fine, and then at around 08:00 EDT on Friday morning, without notice, our secondary storage components failed. At the moment, the reason why they failed is still unclear; there doesn’t appear to be an obvious cause. We will work hard with our infrastructure partners to find out why this happened, but the most important thing for us to do on Friday was to get our applications back online.
Key Survey and Form.com are incredibly large and complex applications, and restarting them isn’t a simple operation. The applications are made up of many separate modules, each relating to an area of their functionality, such as reporting, voting or our API. The effort required to restart them is large, so much so that they cannot all be restarted at once. As such, modules were restarted individually, in order of priority. Our main Key Survey and Form.com environments were operational by 15:00 EDT, with all of our reporting modules online by 21:30 EDT and specific instances of our applications for individual customers back online by 00:30 EDT on Saturday morning.
As a result of Friday’s disruption, I have instructed our teams to rebuild our storage infrastructure to include additional layers of redundancy with built-in instant failover capabilities. This is no easy challenge: implementing this infrastructure and migrating all our applications will take about a week, but we should be able to complete this without additional disruption. Once these changes are implemented, we will be able to recover our systems in a matter of minutes. This is in addition to the construction of the remote disaster recovery infrastructure which is already underway and estimated to be completed early next year.
Unfortunately, until these changes have been completed, our secondary storage components could fail again, and this leaves us in a precarious position. Whilst the probability of such a failure is low, and we have taken all possible precautions to ensure it doesn’t reoccur, our teams are prepared to restore services as quickly as possible in the event of a second failure. As the amount of data that is migrated to the new infrastructure increases throughout the week, the amount of time to restore services in the event of an issue reduces. This does mean, though, that should a similar issue occur early this week, we could experience an outage similar to the one on Friday.
As mentioned, I want to be transparent about the challenges we face, and honest about what could happen while we take steps to improve our services. We will let you know as soon as this new environment is fully functional and we can be sure that such issues do not cause as much disruption as they have. In the meantime our team are working diligently to monitor and manage our applications to avoid such issues, and are prepared to restore services as quickly as possible in the event of a reoccurrence of Friday’s troubles. I can also assure you that we will investigate thoroughly what caused these components to fail, but for the time being I want to concentrate all our resources on implementing these changes and improving our service to you.
We will support you as much as we can as a result of this disruption. If there is anything WorldAPP can do to assist you with work you weren’t able to complete last week, such as building surveys, forms or reports, please let your account manager know. We’ll endeavour to accommodate as many requests as we can.
Once again I would like to reiterate my thanks for your patience and understanding, and my genuine sorrow that we have let you down. WorldAPP have been a trusted provider of survey, forms and inspection solutions for over 12 years now, and I hope my explanation of what happened, and assurances of the actions we’re taking to ensure it doesn’t happen again, go some way to rebuilding that trust.
161 Forbes Rd Ste 300, Braintree, MA, 02184, US
As of 8:15 pm today (Fri, 5/15/15), Key Survey functionality has been restored. WorldApp is conducting a thorough investigation and will be sharing full details with us as soon as they are available.
The login and survey access issues with Key Survey have not yet been resolved. Here is the latest information received from their support team:
From: WorldAPP Support [mailto:email@example.com]
Sent: Friday, May 15, 2015 1:30 PM
Subject: WorldAPP System Interruptions
Today, WorldAPP services, including Key Survey, Form.com and associated applications, have been subject to a service disruption. Below is a brief overview of what caused the issue and the actions we’re taking to restore services as quickly as possible.
Recently, a CPU on one of the servers that our applications use to access our database started failing. Whilst the failure of one CPU doesn’t cause disruption to our services, it does require maintenance so that should the others fail, our applications aren’t impacted. Yesterday evening, our team migrated services to our disaster recovery environment to enable the required maintenance to take place. This is common practice during periods of maintenance to enable continuation of service and has been regularly implemented without adverse effect.
After a few hours of operating on the disaster recovery environment, for reasons yet unknown, the disaster recovery environment failed. Our team took immediate steps to bring the environment back online and are working very hard on restoring services in order of priority, with the most critical services being the first to be restored. As this process continues, we’ll provide further updates on our community pages.
As we continue to experience service disruption, our applications will remain unavailable and respondents attempting to complete a survey or form will be directed to an error page. We are incredibly sorry for the frustration that this disruption is causing you, and assure you we’re working as hard as we can to restore full service as quickly as possible.
Director of Client Services
By: WorldAPP, Inc.
We are currently experiencing issues with Key Survey (hosted by WorldApp). Users who try to log on will not be presented with the usual login screen; the page simply does not load. Survey recipients will not be able to access surveys and respond at this time.
WorldApp has been notified of these problems. Updates will be shared here as soon as they are available.
Remember that go/techalerts can be used for quick access to system up/down information and posts concerning outages.
[As of 9:15 am – WorldApp currently estimates that services will be restored in about 30 minutes. All modules are affected; surveys and reports are also inaccessible.]
Wednesday, April 15th 5:35
The issue with Outlook has been resolved. We apologize for any inconvenience.