Basics

Q: Who can use the Research Computing cluster?

Faculty and Student researchers associated with RIT’s Main Campus are eligible to use the cluster if their research will result in published outputs (e.g. conference/journal papers, Ph.D. dissertations, M.S. theses). Researchers from RIT Dubai or other global campuses can only use the cluster if they are collaborating with a Main Campus Faculty researcher.

External collaborators (not affiliated with RIT) must have an affiliate account created in order to access the cluster. PIs can follow this process to request an affiliate account.

Q: How do I get started using Research Computing?

The first step is to submit a project questionnaire. We will review your questionnaire and ask for any clarifications before we set up your cluster access.

Q: How do I submit jobs to the cluster?

We have a Slurm tutorial to help you get started.

The link for our office hours is pinned in our #rc-general Slack channel. Information on accessing our Slack workspace is here.

Q: Where can I find information about the cluster for grant proposals?

Check out our grant information document.

Q: When are the next Research Computing events?

Check out our events calendar.

Login

Q: How do I log into the cluster?

There are two main ways to access the cluster:

  • Command Line: ssh <username>@sporcsubmit.rc.rit.edu (e.g. ssh abc1234@sporcsubmit.rc.rit.edu)
    • If you are logging in with your RIT username and password, you will need Duo set up for multi-factor authentication.
  • Web Portal: You can access the cluster through our OnDemand web portal.

Q: Can I share my password with someone else?

NO. Sharing passwords (or other credentials) is a violation of our Acceptable Use Policy.

Q: Can I run code directly on sporcsubmit.rc.rit.edu?

NO. The submit node (sporcsubmit.rc.rit.edu) is a shared resource. Control groups govern access to the submit node resources. If you run computationally intensive code on the submit node (you shouldn’t), it will be automatically killed by control groups. The submit node’s primary purpose is to prepare jobs for submission to the scheduler. See our Slurm Tutorial for details on how to submit jobs.
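As a rough sketch of what a submitted job looks like, the batch script below requests a small amount of resources and runs a single command. The job name, account name, and resource amounts are placeholders rather than recommendations, and the script assumes that debug (listed with the runtime limits in the Slurm section) is a partition name. The Slurm Tutorial remains the authoritative guide.

```shell
#!/bin/bash
#SBATCH --job-name=example       # hypothetical job name
#SBATCH --account=my-project     # hypothetical Slurm account; use your own
#SBATCH --partition=debug        # assumed to be a partition name
#SBATCH --time=0-00:10:00        # 10 minutes, written as D-HH:MM:SS
#SBATCH --ntasks=1               # one task
#SBATCH --cpus-per-task=1        # one CPU core for that task
#SBATCH --mem=1G                 # 1 GB of memory
#SBATCH --output=%x_%j.out       # %x = job name, %j = job ID

# Everything below runs on the compute node Slurm assigns,
# not on the submit node.
node=$(uname -n)
echo "Running on ${node}"
```

Saved as example.sh (a hypothetical filename), this would be submitted from the submit node with sbatch example.sh. The submit node only hands the script to the scheduler; the commands run wherever Slurm places the job.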

Q: What are the supported methods for accessing the cluster?

We support cluster access via ssh, scp, our Web Portal, and Globus.

Other methods (e.g. WinSCP, FileZilla, VSCode) are not supported for technical reasons, such as being unable to handle Duo prompts. You may still be able to use these methods to access the cluster, but we will not be able to help you troubleshoot problems with unsupported access methods.

Q: Can I use VSCode to connect to the cluster?

If you want to use VSCode on the cluster, please launch it from our Web Portal (Interactive Apps –> OnDemand VS Code Server).

If you launch VSCode from your local machine instead of from our Web Portal, you run the risk of connections getting interrupted, which could prevent you from being able to log in.

Note: By default, VSCode is constantly scanning your home directory (recursively) for changes. This may result in slow connections. Please put the following in your VSCode settings to disable this behavior:

"files.watcherExclude": {
    "**": true,
    "**/**": true
}

Q: When I try to log in, it says my account has been disabled. What do I do?

If you try to log in using your password, you should receive a Duo prompt to confirm your login. If you see the following error message:

“Your account is disabled and cannot access this application. Please contact your administrator.”

You have had too many failed login attempts. Click here for more details.

ColdFront

Q: Who can use ColdFront?

Any RIT researcher can log in to ColdFront and view their projects, but only Principal Investigators (PIs) can manage their projects via ColdFront.

Q: What is a ColdFront Project?

A ColdFront Project reflects a research project with distinct methodologies, goals, and results. ColdFront Projects control access to compute, storage, and other resources.

Elements of a Research Project

Q: What can PIs do in ColdFront?

From ColdFront, PIs can update details about RC Projects (e.g. descriptions, grants, publications), add/remove research collaborators to/from their projects, and request new or changed resource allocations (e.g. Slurm Accounts, Project Storage).

Q: How can I create a ColdFront Project?

Please submit our Project Questionnaire to provide the RC Team with details about your research. Once we receive and approve your Questionnaire, your ColdFront Project will be set up for you.

Q: Why is ColdFront asking me to review a project?

Once per year, PIs will be required to review their RC Projects in ColdFront. The review process includes verifying that your project description is accurate, adding publications and/or grants, and verifying that the list of collaborators on your projects is accurate. See Section 3.3 of our ColdFront Tutorial for more information.

ColdFront is the definitive source of RC Project information for both PIs and the RC Team. Keeping Project information, artifacts (e.g. publications, grants), and collaborators up-to-date helps everyone in the RC community by ensuring shared understanding of Projects for PIs, their collaborators, and the RC Team.

An additional key benefit of ColdFront is that it simplifies the process of sharing your publications with the RC Team. Every publication that leverages RC resources gets showcased on our Publications Site! Sharing (and citing us in) your publications helps us secure funding and improve our services.

Q: How do I update who has access to a ColdFront Project?

See Section 3.5 of our ColdFront Tutorial for details.

Q: How do I do X, Y, or Z with ColdFront?

Check out our ColdFront Tutorial!

Q: How does ColdFront relate to Novelution/Rapid?

Novelution and Rapid are systems used by Sponsored Research Services to manage Sponsored Research Projects.

Novelution/Rapid have no concept of RC resources, so Research Computing uses ColdFront to allocate and manage RC resources (e.g. Slurm Accounts, Storage) to research projects (which may or may not be sponsored).

Security

Q: Can I share my password with another researcher so they can log into my account?

NO. Sharing your password with anyone is a violation of RIT’s Code of Conduct for Computer Use (C08.2 Section IV.A.1) and RIT’s Password Standard.

If you need to share files with another researcher, please request a shared directory.

Q: How do I set up an SSH key on the cluster?

Check out our SSH tutorial.

Q: Can I add additional SSH keys to my authorized_keys file?

Yes, as long as the SSH keys you add belong to you. Adding SSH keys that belong to another researcher is also a violation of RIT’s Code of Conduct for Computer Use.

When in doubt, please ask!

Linux

Q: I’m new to Linux. What are some resources to help me get started?

We have a Linux and Bash tutorial that also includes some external resources.

Q: Can I modify my .bashrc file?

We will not prevent you from modifying your .bashrc file, but we do not recommend it. Modified .bashrc files can lead to problems, such as software not loading/loading incorrectly, or being unable to login. If you choose to modify your .bashrc file, we can reset it to the default for you, but we will not debug errors resulting from a modified .bashrc.

Q: How can I use sudo?

You cannot. To maintain stability and security of the cluster for everyone, sudo (or “root”) access is reserved for systems administrators. You do not need root permissions to use spack or install software in your home directory.

Maintenance

Q: When is the next maintenance window?

You can see a maintenance window schedule by running time-until-maintenance on the cluster. Maintenance windows are also posted on this calendar.

Q: How long are maintenance windows?

Maintenance windows are scheduled for 8am to 5pm. We will post in our #rc-announcements Slack channel when maintenance is complete.

Q: How do maintenance windows impact my jobs?

You can find details about maintenance windows here.

Citation

Q: How do I cite Research Computing?

You can find citation information here.

Q: When should I cite Research Computing?

You should cite Research Computing if you used any of the following:

  • The cluster (via sporcsubmit or OnDemand)
  • Our storage (Ceph or File Shares)
  • A virtual machine (oVirt) provisioned for your research
  • REDCap (https://redcap.rc.rit.edu/)
  • Mirrors (https://mirrors.rit.edu/)
  • GitLab (https://git.rc.rit.edu/ – formerly https://kgcoe-git.rit.edu)

Slurm

Q: When will my job start?

You can run squeue --me --start to see when your jobs are expected to start. This is the worst-case start time, assuming no jobs finish early or get canceled.

Q: My job is taking too long to schedule, how can I speed that up?

Slurm schedules your job based on the following:

  • The resources you request
  • The frequency that you submit jobs
  • The other jobs in the queue
  • Maintenance windows

Selecting the right amount of resources for your jobs will help your jobs schedule faster. You can find more details here.

If you are already selecting the right resources, you may need to scale up using MPI and/or GPUs. More details here.

Q: How can I tell what resources my job is actually using?

We have a guide to resource utilization in our Slurm tutorial. You can also view some graphs of your resource utilization on Grafana.
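For a quick look from the command line, Slurm's sacct command can report what a finished job actually consumed. The sketch below wraps one such query in a small helper; the function name job_usage is made up for illustration, and it assumes job accounting is enabled on the cluster. The tutorial's guide may recommend a different approach.

```shell
# Print accounting data for one job and its steps.
# Usage: job_usage <job_id>
job_usage() {
    # Elapsed = wall-clock time, AllocCPUS = cores allocated,
    # TotalCPU = CPU time actually used, ReqMem = memory requested,
    # MaxRSS = peak memory used by a job step.
    sacct -j "$1" \
        --format=JobID,JobName,State,Elapsed,AllocCPUS,TotalCPU,ReqMem,MaxRSS
}

# Example (on the cluster): job_usage 1234567
```

If MaxRSS is far below ReqMem, or TotalCPU is far below Elapsed multiplied by AllocCPUS, the job likely requested more memory or cores than it used.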

Q: What is the maximum runtime for a job?

The maximum runtime on tier3 is 20 days (the default is 5 days). The maximum runtime on debug is 1 day, and the maximum runtime for interactive jobs is 12 hours.
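Slurm's --time option accepts minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds, so the 20-day limit would be written as 20-00:00:00. The helper below is an illustrative sketch, not an RC-provided tool: it converts a time string to seconds so a request can be compared against a limit before submitting. Both function names are hypothetical.

```shell
# Convert a Slurm time specification to seconds.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '
    {
        days = 0; rest = $0
        if (index(rest, "-") > 0) {          # D-HH, D-HH:MM, D-HH:MM:SS
            split(rest, dp, "-"); days = dp[1]; rest = dp[2]
            n = split(rest, t, ":")
            h = t[1]; m = (n >= 2 ? t[2] : 0); s = (n >= 3 ? t[3] : 0)
        } else {                             # MM, MM:SS, HH:MM:SS
            n = split(rest, t, ":")
            if (n == 1)      { h = 0;    m = t[1]; s = 0 }
            else if (n == 2) { h = 0;    m = t[1]; s = t[2] }
            else             { h = t[1]; m = t[2]; s = t[3] }
        }
        print days * 86400 + h * 3600 + m * 60 + s
    }'
}

# Succeed if the requested time does not exceed the limit.
# Usage: fits_limit <requested> <limit>
fits_limit() {
    [ "$(slurm_time_to_seconds "$1")" -le "$(slurm_time_to_seconds "$2")" ]
}

# Limits stated in this FAQ: tier3 20 days, debug 1 day, interactive 12 hours.
fits_limit 5-00:00:00 20-00:00:00 && echo "5 days fits within the tier3 maximum"
fits_limit 2-00:00:00 1-00:00:00  || echo "2 days exceeds the debug maximum"
```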

Q: How do I tell Slurm to do X?

Check out our Slurm reference page.

Q: I have a deadline approaching, but my jobs are waiting due to priority. How can I request a priority bump?

When Slurm schedules a job, it takes into consideration your prior usage. If you often use a lot of resources, Slurm will assign your jobs a lower priority than someone who uses fewer resources than you, to ensure fair access to cluster resources for all researchers. You can read about the Fair Tree Fairshare Algorithm for details on how Slurm does this. Asking us to raise the priority of your jobs is unfair to other researchers, as it impacts when their jobs will run. We will only consider raising priority if the following conditions are met:

  • Your research project is sponsored and has an associated SRS number.
  • Your Advisor/PI provided notice of upcoming deadlines by submitting a ticket at least 30 days in advance.

Q: I’m worried my job will not finish before the time runs out. How can I request an extension?

Once a job is submitted, we will not change any of the resources you requested for the job. Extending job times is unfair to other researchers, as it impacts when their jobs will run.

Software

Software Tutorial

Q: What makes installing software on the cluster harder than installing it on a laptop?

  • Dependencies: Traditional software installation typically involves a right-click install (e.g. Windows, Mac) of an executable file that has all of the dependencies the software needs bundled with it. Most research software is open source (e.g. pytorch, numpy, scikit-learn), meaning many different people contribute to the code base, and dependencies are often other open source projects managed by different teams. If you want to install two software libraries that share a dependency, you need to find (or write) versions of those libraries that rely on the same version of their shared dependencies. The more libraries you need to install, the more dependency conflicts arise, and the more time you need to spend debugging and identifying a set of software versions and dependencies that will build and install together. If you’re comfortable managing that complexity, you can install and manage your own software on the cluster; if you want to focus on your research instead of your software stack, the Research Computing team can build a software environment for you.

  • Differing Requirements: On your laptop, you get to decide what version of a software library you want installed; you don’t have to negotiate with anyone to identify a version that works for everyone. On an HPC cluster, if we installed only one version of a software library (e.g. pytorch), that would not meet everyone’s needs. Researchers need their own specific versions of software (e.g. pytorch v1 vs. pytorch v2) with different build options and configurations (e.g. with or without GPU support). Most software is not designed to have multiple versions of itself installed on the same machine. This is a common issue in the HPC community, and the standard solution is Spack, an open source package manager that allows multiple versions of the same software to be installed.

  • New Libraries/Versions: Spack maintains a set of recipes for installing a software library and all of its dependencies. There are far too many software libraries and version updates out there for a small team of software developers to keep every library and version available in Spack. When you ask us to install a library using spack, sometimes the recipe already exists for the version you need. Most of the time, that recipe doesn’t exist and we have to write that recipe, which often includes writing the recipes for dependencies that aren’t already in Spack. Need a new version of an existing library? We have to update the recipe and test it.

  • Environment Complexity: If you only need one library, that’s pretty straightforward. But modern research workflows often depend on many different libraries. That’s where environments (collections of libraries) come into play. Every environment has a unique set of dependencies and challenges that arise. Building environments is a complex process of troubleshooting errors iteratively until there are no more. For every error that arises, we have to determine the root cause and find a sustainable solution. This is not always a straightforward process, especially when dealing with large environments that have hundreds of dependencies.

  • Ad Hoc Software: Many software libraries were originally written with a single research purpose in mind, without concern for ease of use or portability. This means that documentation is often incomplete or non-existent, support for parallel computation and/or GPUs is often limited, and build errors often arise on different hardware architectures. By definition, research explores spaces that have never been explored before, and research software often ends up being used for unanticipated purposes; this makes research software more fragile and constantly evolving. As a result, we often have to work through build and import errors iteratively to identify hidden dependencies and edge cases that were not considered or tested. Sometimes, we even have to write our own code to make these libraries compatible with modern hardware and software architectures.

You can learn more about software complexity and Spack in this slide deck.

Q: Can I install my own software on the cluster?

Due to the non-standard and highly-customized nature of home directory installs, we are not resourced to support them. You are welcome to install software in your home directory (and you don’t need sudo to do it), but you are responsible for supporting any software that you install yourself. We will not help you debug software that you installed yourself. Spack is our supported package manager on the cluster because it is a standard for HPC systems and prevents many of the challenges typically encountered with conda, pip, etc. If you would like us to build and support a spack environment for you, please fill out this form. Please be aware that spack environments typically take at least 2 weeks to build, install, and test.

Please read through our Software Tutorial for more details.

Q: I installed pytorch/tensorflow, but it’s not using GPUs. What can I do?

PyTorch and TensorFlow need to have cuda installed/loaded in order to use GPUs. Check your installation to make sure you built pytorch/tensorflow with cuda enabled. Alternatively, you can use one of our default spack environments: spack env activate default-ml-x86_64-24071101.
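One way to check a PyTorch install is to ask it directly which CUDA version it was built with and whether it can see a GPU. The sketch below assumes python3 on your PATH is the interpreter from the environment you want to test; the function name check_torch_gpu is made up for illustration. It only gives a meaningful answer inside a job that has been allocated a GPU.

```shell
# Report the PyTorch version, the CUDA version it was built against,
# and whether a GPU is visible to this process.
check_torch_gpu() {
    python3 -c '
import torch
print("torch version:  ", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None means a CPU-only build
print("GPU available:  ", torch.cuda.is_available())
'
}

# Example (inside a GPU job, with your environment active): check_torch_gpu
```

If "built with CUDA" prints None, the installed build has no GPU support and needs to be replaced with a cuda-enabled one. If a CUDA version is shown but no GPU is available, the job may not have requested a GPU.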

Q: I am trying to install software on the cluster; what does this error mean?

You are welcome to install software in your home directory, but if you choose to do so, you are supporting that software. If you would like us to package and install software for you with spack, please request an environment.

Q: Should I use spack or conda for my software?

We have a Software Tutorial that goes over the differences between spack and conda.

Q: Where is cuda?

You can try to install cuda on your own (not recommended), or you can load a version of cuda that we have installed with spack. Run spack find -l cuda to find versions of cuda that we have installed. Then run spack load cuda@<version> /<build_hash> to load the version of cuda you need, e.g. spack load cuda@12.0.0 /4a5j4ca.

Storage

Storage Tutorial

Q: What kinds of data can I store on RC resources?

You can store any code, datasets, and/or results related to your research on RC resources. RC RESOURCES DO NOT MEET COMPLIANCE STANDARDS FOR THE FOLLOWING TYPES OF DATA. DO NOT STORE THESE ON RC RESOURCES:

  • Social Security Numbers (SSNs), Individual Taxpayer Identification Numbers (ITINs), and/or other national identification numbers
  • University Identification Numbers (UIDs)
  • Driver’s license numbers
  • Financial account information
  • Educational records governed by FERPA
  • Personal health information (as defined by HIPAA)
  • Employee personnel information

Q: Where should I store my data on the cluster?

Your Home Directory is for data that is for you alone, such as:

  • SSH keys
  • Custom software environments (e.g. conda)
  • Configuration files (e.g. .bashrc)
  • Folders that ondemand.rc.rit.edu creates

Your Project Directory is for data that your collaborators need access to, such as:

  • Datasets for your experiments
  • Shared software environments
  • Code for running your experiments
  • Results from your experiments

Q: Is my data on the cluster backed up?

No. We provide resilient storage (i.e. we can handle hard drive failures without losing data), but you are responsible for backing up your own data.

Q: I’m running out of storage, what do I do?

See Section 1.3 of our Storage Tutorial for suggestions to free up space. If you still need more storage after freeing up space, your Project PI/Advisor can request more storage in ColdFront. See Section 4.4 of our ColdFront Tutorial for more details.

Q: How do I transfer files to the cluster?

You can transfer small files/datasets to the cluster using OnDemand. For large amounts of data, you can use one of these methods.

Q: How can I share files with other researchers on the cluster?

You can ask us to set up a shared directory for projects by filling out this form.

GitLab

Q: What is GitLab?

GitLab is a web-based platform consisting of a variety of tools to assist teams with version control, continuous integration/continuous deployment (CI/CD), and code review/collaboration.

Q: Is there a Service Level Agreement (SLA) for git.rc.rit.edu?

There is no SLA for RC’s GitLab. RC’s GitLab is subject to our monthly maintenance window, but may occasionally be patched outside of our typical maintenance cycle. We do our best to minimize downtime.

Q: Who can use git.rc.rit.edu?

Research Computing’s GitLab Instance (git.rc.rit.edu) should only be used for research projects. If you need GitLab for academic use cases (e.g. homework, labs, course projects), please reach out to your college/department IT staff.

Q: Can I use git.rc.rit.edu for academic purposes?

While there is nothing preventing the use of RC’s GitLab instance for academic purposes, we don’t recommend it as there is no SLA and upgrades/maintenance may bring the service down and impact academic work.

Q: Can I create and manage my own GitLab runners?

No.

Q: Why can’t I set my repositories to internal or public on git.rc.rit.edu?

  • Internal: Setting a repository to internal grants everyone who logs into GitLab read access to the repository. This is not obvious from the word “internal”, so we disabled this. If you understand those risks, and have a compelling reason why a project needs to be internal, please let us know and one of the admins can set internal for your project.
  • Public: Public projects are not allowed on git.rc.rit.edu. There are public resources available for free where public projects can be hosted. Some examples include (does not constitute endorsement of the services listed):

Q: Can I set up GitLab Pages for a repository on git.rc.rit.edu?

Yes. Here’s how. This is for static content only. Do not expect 24/7/365 uptime.