Connecting to Narval

Introduction to Unix and Bash

Connecting to Narval or Other Compute Canada Servers and Using SLURM

This notebook provides instructions on how to connect to Narval or other Compute Canada servers using SSH and how to submit a script using SLURM.

Connecting via SSH

To connect to Narval or other Compute Canada servers via SSH, you can use the following command:

ssh -o ServerAliveInterval=1000 user_name@narval.computecanada.ca

Replace user_name with your username on the server. The -o ServerAliveInterval=1000 option tells your SSH client to send a keep-alive message every 1000 seconds so the connection is not dropped during long periods of inactivity.

After entering the command, you will be prompted for your password. You will then be asked to confirm your identity with your two-factor authentication (2FA) app.

Once authenticated, you will be logged into the server.
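
If you connect regularly, you can put these options in your SSH client configuration instead of typing them every time. The sketch below is one possible setup, not an official recipe: narval is an alias you choose yourself, and user_name is a placeholder for your username.

# ~/.ssh/config
Host narval
    HostName narval.computecanada.ca
    User user_name
    ServerAliveInterval 1000

With this entry in place, ssh narval is equivalent to the full command above.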

Open your terminal, replace the username, and connect to Narval:

ssh -o ServerAliveInterval=1000 user_name@narval.computecanada.ca

Go to your scratch folder

cd scratch

Make a new directory

mkdir bioinformatics_workshop
cd bioinformatics_workshop
pwd

On the server you may need the real path of the directory, which can differ from the path shown by pwd (scratch, for example, is a symbolic link in your home directory):

realpath .
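
For example, because scratch is a symbolic link to the scratch filesystem, pwd and realpath can print different paths. The comments below are only illustrative; the exact physical path depends on the cluster's filesystem layout.

pwd          # the path through the symbolic link, e.g. /home/user_name/scratch/bioinformatics_workshop
realpath .   # the physical location of the same directory on the scratch filesystem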

Transferring Files with scp and rsync

scp (Secure Copy)

scp (secure copy) is a command-line utility for securely copying files between local and remote hosts. It uses SSH for data transfer and authentication, providing a secure way to transfer files.

Syntax

scp [options] source destination

To copy a file from your local machine to a remote server:
scp file.txt user_name@remote_host:/path/to/destination
To copy a file from a remote server to your local machine:
scp user_name@remote_host:/path/to/file.txt /local/destination
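
scp can also copy whole directories with the -r option. A minimal sketch, where the directory name and destination path are placeholders:

# Recursively copy a local directory to the remote server
scp -r /local/directory user_name@remote_host:/path/to/destination/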

rsync

rsync is a powerful file synchronization and transfer tool that is often used for copying files between local and remote systems. It is capable of efficiently transferring large amounts of data and provides many options for customization.

Syntax

rsync [options] source destination

To synchronize the contents of a directory on your local machine with a directory on a remote server:
rsync -avz /local/directory/ user_name@remote_host:/path/to/destination/
To synchronize the contents of a directory on a remote server with a directory on your local machine:
rsync -avz user_name@remote_host:/path/to/directory/ /local/destination/

Key Options

  • -a, --archive: Archive mode; preserves permissions, timestamps, etc.
  • -v, --verbose: Verbose output; provides detailed information about the transfer process.
  • -z, --compress: Compress file data during transfer to reduce bandwidth usage.
  • -r, --recursive: Recursively copy entire directories.
  • -u, --update: Skip files that are newer on the destination.
  • -n, --dry-run: Perform a trial run without making any changes.
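
These options can be combined. A common pattern is to preview a transfer with a dry run before doing it for real; the paths below are placeholders:

# Preview what would be transferred, without copying anything
rsync -avzn /local/directory/ user_name@remote_host:/path/to/destination/

# Run the actual transfer once the preview looks right
rsync -avz /local/directory/ user_name@remote_host:/path/to/destination/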

Security Considerations

When using scp or rsync to transfer files between systems, ensure that SSH is properly configured and that you trust the remote host. Always use secure authentication methods and avoid transferring sensitive information over untrusted networks.

By using scp and rsync, you can easily transfer files between local and remote systems, making it convenient to work with data on Compute Canada servers from your local machine.

Open a new terminal and copy the BED files to the server using scp or rsync (MODIFY YOUR USERNAME):

scp bioinformatics_files/*.bed user_name@narval.computecanada.ca:/home/user_name/scratch/bioinformatics_workshop/
# OR
rsync -av --progress bioinformatics_files/*.bed user_name@narval.computecanada.ca:/home/user_name/scratch/bioinformatics_workshop/
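
The same commands work in the opposite direction when you want to bring results back from the server. A hedged sketch, assuming your output files sit in the workshop directory created above:

# Copy result files from Narval back to the current local directory
scp "user_name@narval.computecanada.ca:/home/user_name/scratch/bioinformatics_workshop/*.bed" .
# OR
rsync -av --progress user_name@narval.computecanada.ca:/home/user_name/scratch/bioinformatics_workshop/ local_results/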

Using SLURM

SLURM (Simple Linux Utility for Resource Management) is a workload manager used on many high-performance computing (HPC) clusters, including Compute Canada clusters like Narval. SLURM enables users to submit and manage jobs on the cluster.
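
The SLURM commands used most often in this workshop are summarized below; my_script.sh, user_name, and job_id are placeholders.

sbatch my_script.sh    # submit a job script to the scheduler
squeue -u user_name    # list your pending and running jobs
scancel job_id         # cancel a submitted or running job
seff job_id            # report the resource usage of a finished job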

Submitting a Script

To submit a script to the cluster using SLURM, you need to create a script file containing your job commands and then use the sbatch command to submit it.

Here's an example SLURM script (bedtools_script.sh):

    #!/bin/bash
    #SBATCH --job-name=my_job
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=1G
    #SBATCH --time=00:10:00
    #SBATCH --output=my_job.out
    #SBATCH --error=my_job.err

    echo "Great workshop!"

This script requests one node with one task and one CPU per task, 1 GB of memory, and a maximum runtime of 10 minutes. Output will be redirected to my_job.out, and errors will be redirected to my_job.err. Replace the commands inside the script with your own job commands.
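
Once the script is saved, submitting it and inspecting its output looks like this (the output files match the --output and --error options above):

    sbatch bedtools_script.sh   # the scheduler replies with "Submitted batch job <job_id>"
    cat my_job.out              # standard output of the job, once it has run
    cat my_job.err              # standard error of the job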

Exercise

  1. Write your own script using vim: bedtools_script.sh

For the account, you need your allocation name (if you don't know it, run ls ~/projects and use the one starting with def-).

    #!/bin/bash

    #SBATCH --account=def-ACCOUNT
    #SBATCH --job-name=bedtools_intersect
    #SBATCH --output=%x_%j.out
    #SBATCH --time=0:10:00
    #SBATCH --mem=1000m

    echo "My second bedtools script"

    # Input files
    FILE1=file_a.bed
    FILE2=file_b.bed

    # Load module
    module load bedtools

    # Commands to execute
    bedtools intersect -a ${FILE1} -b ${FILE2} > file_intersect.bed
    bedtools intersect -wa -a ${FILE1} -b ${FILE2} > file_intersect_wa.bed

    # Count the lines of the resulting files
    wc -l file_intersect*.bed
  2. Make your script executable with chmod:
    chmod +x bedtools_script.sh

  3. Run your script:
    sbatch bedtools_script.sh

  4. Check your job (using YOUR username):
    squeue -u user_name

  5. When a job is done you can check its efficiency (using YOUR job_id):
    seff job_id
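
If the job has already left the squeue listing, you can still look it up through SLURM's accounting database with sacct; the format fields below are just one reasonable selection, and job_id is a placeholder:

    # Summary of a completed (or failed) job
    sacct -j job_id --format=JobID,JobName,Elapsed,MaxRSS,State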