We hold a regular maintenance window, usually on the second Tuesday of the month. Maintenance begins at 8 AM Eastern Time (EST/EDT) on the scheduled day and is typically complete by 5 PM the same day, although we occasionally have multi-day maintenance windows. Plan for all Research Computing services to be unavailable for the entire maintenance window: the HPC environment will not be accessible, and cluster jobs will not run.
If we have to perform unscheduled maintenance because of a security issue or service failure, we will update the “message of the day” (motd) and/or email users, as appropriate. You can see a list of upcoming maintenance windows by running the time-until-maintenance command on the cluster.
Job Scheduling & Maintenance
If your job cannot both start and finish before a maintenance window begins, it will remain in the queue until after maintenance is complete.
For example, let’s say a maintenance window begins in 2 days. You submit a job that will run for 3 days (#SBATCH -t 3-00:00:00). Because the job cannot finish within 2 days, it will not start until after the maintenance window.
As another example, let’s say a maintenance window begins in 2 days (48 hours). You submit a job that will run for 1.5 days (#SBATCH -t 1-12:00:00). Due to other running and queued jobs, your job would not start until 13 hours from now. Since 13 hours + 36 hours (1.5 days) = 49 hours, which is more than 48 hours, your job will not start until after the maintenance window.
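The arithmetic in the two examples above can be checked with a short Python sketch. This only illustrates the rule as described on this page and is not the scheduler's actual implementation. The function names are hypothetical, the parser handles only the D-HH:MM:SS and HH:MM:SS time formats, and treating a job that ends exactly when maintenance begins as fitting is an assumption.

```python
from datetime import timedelta


def parse_slurm_time(limit: str) -> timedelta:
    """Parse a time limit written as D-HH:MM:SS or HH:MM:SS (other formats not handled)."""
    days = 0
    if "-" in limit:
        day_part, limit = limit.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in limit.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def fits_before_maintenance(time_limit: str,
                            wait: timedelta,
                            until_maintenance: timedelta) -> bool:
    """Return True if queue wait plus requested run time ends before maintenance begins.

    Assumption: a job ending exactly at the start of maintenance counts as fitting.
    """
    return wait + parse_slurm_time(time_limit) <= until_maintenance


until_maintenance = timedelta(days=2)  # maintenance begins in 48 hours

# First example: a 3-day job cannot finish in 2 days, even with no queue wait.
print(fits_before_maintenance("3-00:00:00", timedelta(0), until_maintenance))

# Second example: 13 hours of waiting + 36 hours of run time = 49 hours > 48 hours.
print(fits_before_maintenance("1-12:00:00", timedelta(hours=13), until_maintenance))

# Hypothetical shorter request: 13 hours + 24 hours = 37 hours, which fits in 48 hours.
print(fits_before_maintenance("1-00:00:00", timedelta(hours=13), until_maintenance))
```

The first two calls correspond to the examples above and report that the job does not fit. The third shows that requesting a shorter time limit, when your job can actually finish within it, may allow the job to run before the maintenance window.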