IT Outages Communication Plan

This page outlines norms for how Library IT Managers communicate with all staff during service outages.

  • What constitutes an outage that warrants a message being sent to staff?
    • An outage is defined here as a disruption that prevents users from using the service for its basic functions. This includes planned maintenance work that could realistically result in such downtime.
    • A message is warranted for an outage that lasts 10 minutes or longer, measured from the point when the outage becomes known.
  • When should a message be sent out?
    • Once the service has been down for 10 minutes
    • While an announced outage is ongoing and the service is still unavailable, whenever relevant new information becomes available (for example, progress is made or additional issues arise)
    • When the outage has been resolved
  • How should we send the message?
    • Send an email to PULUpdates@princeton.edu.
    • Post an update to the #incident_reports channel.
    • If appropriate, also post an update to the service's own Slack channel, i.e., the one non-IT staff are most likely to check for updates (for example, #dspace for Dataspace or OAR outages).
    • If appropriate, and if the outage affects users other than Library staff, have it listed on the OIT Outages & Alerts page:
      • Contact the OIT Service Desk to have the outage posted.
      • Call 609-258-HELP (4357): Monday to Friday, 8 am to 8 pm; Saturday and Sunday, 8 am to 4 pm. All times are Eastern.
      • Identify yourself as a Library IT staff member and ask the agent to post an outage for the Library on the OIT Outages & Alerts page.
  • Who should send the message?
    • One of the following IT Managers:
      • Stephanie Ayers
      • Esmé Cowles
      • Kate Lynch
      • Trey Pendragon
      • Kevin Reiss
      • Jon Stroop
    • Whoever initiates contact with staff serves as the point of contact (the Incident Communicator) for the rest of the outage.
      • If the Incident Communicator needs to step away from the situation before it is resolved, they should hand off communication to a designated backup person.
    • To claim the Incident Communicator role: once an incident is reported in the #incident_reports Slack channel, one of the IT Managers (possibly the one who reported it) should reply explicitly on Slack that they are taking responsibility for communication about the incident.
      • Anyone working to resolve the incident should not also serve as the Incident Communicator.
  • What about planned outages/maintenance?
    • Planned outages that meet the criteria above (scheduled between 8:00 am and 6:00 pm Eastern on any day of the week, and expected to cause more than 10 minutes of downtime for a production service) should be communicated to staff before the work takes place, with a reasonable lead time decided case by case.
    • See the Service Catalog Data for acceptable outage windows (work in progress).