Differences

This shows you the differences between two versions of the page.

hardware:cluster:openlab [2023/04/24 14:55]
hans [Changes April, 2023]
hardware:cluster:openlab [2024/04/19 10:34] (current)
hans [Splash Screen]
Line 1: Line 1:
 ====== ICS Instructional Openlab Linux Cluster ======
  
 +===== Summary =====
  
-===== Changes April, 2023 =====
 +The ICS Instruction Openlab Linux Server cluster is for general purpose computing at the School of ICS and accessible to any user with an ICS shell account.  Cluster priority is given to instruction, then undergraduate students, then graduate students, and then ICS researchers.
  
-In order to ensure that the Openlab cluster remains available to instruction and researchers fairly, we will be making the following changes to the administration of the cluster:
 +==== Guidelines ====
 +  
 +In order to ensure that the Openlab cluster remains fairly available to instruction and researchers, users may expect the following:
  
-  * Files in the /tmp directory will be cleared after two weeks
-  * Processes running for longer than 2 hours will be reniced 19.
-  * Processes running for longer than 48 hours will be suspended.
 +  * Long running jobs must be submitted through our [[:services:slurm|SLURM]] job queueing system in order to balance resources.
 +  * Jobs executed via the SLURM queueing system will be given priority over jobs running on the command line.
 +    * Processes running for longer than 2 hours will be reniced to 19.
 +  * Processes running for longer than 5 days will be suspended.
     * [[hardware:cluster:openlab#Exceptions|Exceptions]]
   * Processes for users with more than 100 processes will be suspended.
     * [[hardware:cluster:openlab#exceptions|Exceptions]]
 +  * Files in the /tmp directory will be cleared after two weeks.
  
 Please use [[services:slurm|slurm]] for long running or serial projects requiring more than one openlab node.  Processes running through slurm are not subject to the above guidelines.
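
As an illustration of the advice above, a long running job can be wrapped in a small batch script and handed to the queue. This is a minimal sketch under stated assumptions, not the cluster's documented configuration: the job name, output file name, and time limit are hypothetical, no partition or account is given because the source does not name one, and `run_job` is a placeholder for the real workload. See the [[services:slurm|slurm]] page for the settings that apply here.

```shell
#!/bin/bash
# Hypothetical job name and output file; %j expands to the SLURM job ID.
#SBATCH --job-name=example
#SBATCH --output=example-%j.out
# One task, with an assumed one hour wall-clock limit (HH:MM:SS).
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Placeholder workload: replace the body with the real command.
run_job() {
    echo "Running on $(hostname)"
}

run_job
```

Saved as, say, `example.sh`, the script would be submitted with `sbatch example.sh` and its state checked with `squeue -u "$USER"`.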
Line 20: Line 25:
  
  
-===== Summary ===== 
  
-The ICS Instruction Openlab Linux Server cluster is for general purpose computing at the School of ICS and accessible to any user with an ICS shell account. ​ Cluster priority is given to instruction,​ then undergraduate students, then graduate students, and then ICS researchers. 
- 
-==== Restrictions ==== 
- 
-  * Long running jobs must be submitted through our [[:​services:​slurm|SLURM]] job queueing system in order to properly balance resources.  ​ 
-  * Jobs executed via the SLURM queueing system will be given priority over jobs running on the command line.
-  * Long running jobs submitted via command line may be removed after 48 hours (proposed). 
-  * Files in /tmp and /scratch are cleared out on reboot. 
-  * Hosts are rebooted quarterly after finals. 
- 
-=== Resuming Suspended Processes === 
- 
-If you find your process has been suspended, you may resume it by using the kill command to send the CONT signal.
- 
-<​code>​ 
-kill -CONT <pid> 
-</​code>​ 
- 
-You may list your processes using the command `ps ux`.  Suspended processes will have a "T" in the 8th column.
-<​code>​ 
-USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND 
-hans     ​2458438 ​ 0.0  0.0   ​6476 ​ 2244 pts/​18 ​  ​S+ ​  ​14:​08 ​  0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox T 
-hank     ​3554779 ​ 0.0  0.0  20480  3164 ?        Ts    2022  24:43 SCREEN -xxR 
-</​code>​ 
- 
-In this case you would resume the SCREEN process, pid 3554779, by running:
-<​code>​ 
-kill -CONT 3554779 
-</​code>​ 
  
 ===== Home Directory =====
Line 64: Line 39:
 ^ Domain Name ^ Operating System ^
 | openlab.ics.uci.edu | Ubuntu 22.04 hosts only |
-| openlab22.ics.uci.edu | Ubuntu 22.04 hosts only| 
 | opengpu.ics.uci.edu | Ubuntu 22.04 hosts only |
  
  
 **NOTE**:  If you are running a long running process, please take note of the hostname you are running on.  The name "openlab.ics.uci.edu" is a load balancer and may not point you to the same host twice.  You can determine the hostname by running the command `hostname` from the shell.  When you come back to check on your process, ssh into that hostname instead of openlab.ics.uci.edu.
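
One way to follow the note above is to save the hostname before starting the long running process. This is only a sketch: `record_host` and the default file name `.openlab_last_host` are hypothetical, and reading the file back from another node assumes your home directory is shared across the cluster.

```shell
#!/bin/sh
# Hypothetical helper: write the name of the current host to a file.
# The default path is an assumption; pass another path as the first argument.
record_host() {
    hostname > "${1:-$HOME/.openlab_last_host}"
}
```

Run `record_host` on the node where the process starts. Later, `cat ~/.openlab_last_host` prints the name to give to `ssh` in place of openlab.ics.uci.edu.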
 +==== MOD/Splash Screen ====
 +
 +When you log in to the Openlab Linux cluster, you will see a block of information that includes sections about fair use, system load, and messages from the individual [[commands:modules|ICS software modules]] you add to your environment.  This information may also include a message similar to the following:
 +
 +<​code>​
 +Expanded Security Maintenance for Applications is not enabled.
 +  ​
 +19 updates can be applied immediately.
 +11 of these updates are standard security updates.
 +To see these additional updates run: apt list --upgradable
 +
 +24 additional security updates can be applied with ESM Apps.
 +Learn more about enabling ESM Apps service at https://​ubuntu.com/​esm
 +</​code>​
 +
 +You may safely ignore this message. It is managed automatically by [[https://ubuntu.com/landscape/features|Ubuntu Landscape]] and refers to the host you have logged into, not your local system.
 +It indicates that patches are available, or that applied patches require a system restart to finish installing.  Our systems administrators will take care of that, typically between quarters when systems are not heavily used.
  
-=== Off Campus Access ===
 +==== Off Campus Access ====
  
 **ADVISORY**:  Off campus access requires [[accounts:ssh_keys|ssh keys]].  Users connecting from off campus must use keys and cannot use passwords to log in to the openlab cluster.
Line 118: Line 109:
 ==== Availability ====
  
-The openlab cluster is available on and off campus via SSH and [[services:slurme|SGE]].
 +The openlab cluster is available on and off campus via SSH and [[services:slurm|SLURM]].
  
 There are limitations regarding authentication and how users can connect to the openlab cluster.  Please refer to the following matrix for specific information:
  
-^Location^[[accounts:ssh_keys|SSH with Keys]]^SSH with Password^[[group:support:software:sge|SGE QRSH]]^
 +^Location^[[accounts:ssh_keys|SSH with Keys]]^SSH with Password^[[services:slurm|SLURM]]^
 ^On campus|  Yes  |  Yes  |  Yes  |
 ^Off Campus<sup>*</sup>|  Yes  |  No  |  No  |
Line 164: Line 155:
 ===== Details =====
  
-  * Operating System:  CentOS 7 (Redhat Enterprise Linux 7 Equivalent) or Ubuntu 20.04.
-  * Provisioning:  Puppet 3.x
 +  * Operating System:  Ubuntu 22.04.
 +  * Provisioning:  Puppet 6.x
   * Primary Investigator:  Computing Support
   * Contact Email:  helpdesk@ics.uci.edu
Line 231: Line 222:
 </code>
  
-=====Troubleshooting======
 +===== Long Running Processes =====
 +==== Resuming Suspended Processes ​====
  
-Also see [[courses:openlab_faq|]]
 +If you find your process has been suspended, you may resume it by using the kill command to send the CONT signal.
  
-=== Exceptions ===
 +<code>
 +kill -CONT <​pid>​ 
 +</​code>​
  
-The following long running processes will not be suspended on the Openlab Linux Cluster:
 +You may list your processes using the command `ps ux`.  Suspended processes will have a "T" in the 8th column.
 +<​code>​ 
 +USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND 
 +hans     ​2458438 ​ 0.0  0.0   ​6476 ​ 2244 pts/​18 ​  ​S+ ​  ​14:​08 ​  0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox T 
 +hank     ​3554779 ​ 0.0  0.0  20480  3164 ?        Ts    2022  24:43 SCREEN -xxR 
 +</​code>​ 
 + 
 +In this case you would resume the SCREEN process, pid 3554779, by running:
 +<​code>​ 
 +kill -CONT 3554779 
 +</​code>​ 
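
When several processes have been suspended, the listing and resuming steps above can be combined. The following is a minimal sketch rather than an ICS-provided tool: the function names `list_stopped` and `resume_stopped` are hypothetical, and it assumes the procps `ps` shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Openlab environment.

# list_stopped: read "PID STAT COMMAND" lines on stdin and print the PID of
# every process whose state starts with "T" (stopped).
list_stopped() {
    awk '$2 ~ /^T/ { print $1 }'
}

# resume_stopped: send CONT to each of the current user's stopped processes.
resume_stopped() {
    ps -u "$(id -un)" -o pid=,stat=,comm= | list_stopped |
    while read -r pid; do
        kill -CONT "$pid"
    done
}
```

Running `ps -u "$(id -un)" -o pid=,stat=,comm= | list_stopped` first shows which PIDs `resume_stopped` would act on without resuming anything.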
 + 
 + 
 +==== Exceptions ==== 
 + 
 +The following long running processes will not be automatically ​suspended on the Openlab Linux Cluster:
  
   * bash
Line 271: Line 280:
   * dbus-send
  
 +===== Troubleshooting =====
 +
 +Also see [[courses:​openlab_faq|]]
 +
 +==== RSA host key for <​hostname>​ has changed ====
 +
 +If you receive the error "Host key verification failed" when logging in, and you are confident you are logging into the correct host, you can clear the old host key and resolve the error by issuing the following command, which works on most platforms:
 +
 +  ssh-keygen -R <​hostname>​
 +
 +If this does not work on your platform, please send a message to helpdesk@ics.uci.edu along with the output from the command so we can extend this information.
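
To check whether a key is stored for a host before removing it, the command above can be wrapped as follows. This is a sketch, assuming OpenSSH's `ssh-keygen`: the function name `forget_host` is hypothetical, and the default known_hosts path is the usual OpenSSH location, which may differ on your platform.

```shell
#!/bin/sh
# Hypothetical helper: remove the stored key for a host only if one exists.
# $1 is the hostname; $2 optionally names the known_hosts file to edit.
forget_host() {
    host=$1
    known_hosts=${2:-$HOME/.ssh/known_hosts}
    # ssh-keygen -F looks the host up and fails when no key is stored.
    if ssh-keygen -F "$host" -f "$known_hosts" >/dev/null 2>&1; then
        ssh-keygen -R "$host" -f "$known_hosts"
    else
        echo "no stored key for $host in $known_hosts"
    fi
}
```

For example, `forget_host openlab.ics.uci.edu` would clear the entry for the load balancer name, and the next login prompts you to accept the new key.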
 +
 +==== RC Files ====
  
-===RC Files===
 +ICS users may occasionally damage their dot files, which results in unusual-looking SSH prompts and/or being unable to use the “module” command.
  
-ICS users will occasionally damage their dot files, which results in unusual-looking SSH prompts and/or being unable to use the “module” command.  If this is the case, click here for directions to [[https://swiki.ics.uci.edu/doku.php/accounts:restore_unix_dot_files|]]  If the above fails, contact [[https://swiki.ics.uci.edu/doku.php/start?s[]=helpdesk#contacting_ics_computing_support_helpdesk|helpdesk]].
 +If this is the case, follow the directions at [[https://swiki.ics.uci.edu/doku.php/accounts:restore_unix_dot_files|]].  If that fails, contact [[https://swiki.ics.uci.edu/doku.php/start?s[]=helpdesk#contacting_ics_computing_support_helpdesk|helpdesk]].
  
  
hardware/cluster/openlab.1682373324.txt.gz · Last modified: 2023/04/24 14:55 by hans
CC Attribution-Noncommercial-Share Alike 4.0 International