
rsRNAfinder Toolkit


User Manual

1. Download

    Download the rsRNAfinder Toolkit package from the link: rsRNAfinder.tar.gz



2. Prerequisites

    • python3
    • conda
    • snakemake (v7.20.0)

3. Deploy the workflow

    Snakemake handles all additional dependencies of the repository, provided that snakemake is fully installed and available. Clone the toolkit with the following command:
    git clone https://github.com/rebminso/rsRNAfinder.git

    The important entries in the main directory are data and config. The config directory holds the configuration file (config/config.yaml), which you modify to adapt the workflow to your needs, whereas the data directory is where you place your input data.


      Figure 1: The files inside the main directory


    Install Snakemake
    If conda and mamba are already installed, snakemake can be installed with the following commands:
    conda activate base
    conda env create -f config/environment.yml
    conda activate rsRNA


    OR

    Install the dependencies manually with the following commands:
    conda activate base
    conda create -n rsRNA python=3.7 --no-default-packages
    conda activate rsRNA
    pip install snakemake==7.20.0
    conda install -c bioconda bedtools==2.30.0
    pip install seaborn
    conda install -c conda-forge matplotlib
    conda install -c bioconda segemehl==0.2.0
    conda install -c bioconda samtools
    conda install -c conda-forge biopython
    conda install -c bioconda viennarna
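
    After either installation route, it can help to confirm that the command-line tools are visible in the activated rsRNA environment. The following is a minimal sketch and not part of the toolkit: the helper name missing_tools is hypothetical, and the executable names (for example segemehl.x for segemehl and RNAfold for viennarna) are assumptions about what the conda packages place on PATH.

```python
import shutil

# Executable names are assumptions about what each package puts on PATH;
# adjust them if your installation differs.
ASSUMED_EXECUTABLES = ["snakemake", "bedtools", "samtools", "segemehl.x", "RNAfold"]


def missing_tools(executables):
    """Return the names from `executables` that cannot be found on PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed executables were found.")
```

    Run the script inside the rsRNA environment; an empty result suggests that the tools are reachable, although it does not check their versions.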


4. Configure the workflow

     The configuration of the workflow can be altered by modifying the config/config.yaml file.


    • For Arabidopsis thaliana

        If Arabidopsis thaliana is used as the reference genome, no changes to the config file are required.

        Add a folder containing the trimmed FASTQ files to the data/trimmed/ directory. Each input FASTQ filename must follow the format {xyz}_trimmed.fq, for example SRR2354321_trimmed.fq. The Search strategy in the config file must be set to Default.



    • For a different genome

        Add a folder containing the trimmed FASTQ files to the data/trimmed/ directory.

        Add the FASTA sequence of the desired reference genome to data/Genome/ and the genome feature table, in .txt format, to data/Feature_table/.

        Adjust the rRF parameters in the config file, and ensure that the genome file headers follow the required format: >chr[Num] for genomic sequences, and chrMt and chrPt for the mitochondrial and plastid sequences, respectively.
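
    The naming rules above can be checked before running the workflow. The sketch below is illustrative only and is not part of the toolkit: the function names are hypothetical, and it assumes that [Num] is an integer, that the sequence ID is the first whitespace-delimited token of each header line, and that the {xyz}_trimmed.fq naming applies to every FASTQ file found under data/trimmed/.

```python
import re
from pathlib import Path

# Assumption: [Num] is an integer, so valid IDs look like chr1, chr12, chrMt, chrPt.
HEADER_ID = re.compile(r"chr\d+|chrMt|chrPt")
# Assumption: {xyz} is any non-empty prefix.
FASTQ_NAME = re.compile(r".+_trimmed\.fq")
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def misnamed_fastq(trimmed_dir):
    """Return FASTQ-like files under trimmed_dir not named {xyz}_trimmed.fq."""
    root = Path(trimmed_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
        and p.name.endswith(FASTQ_SUFFIXES)
        and not FASTQ_NAME.fullmatch(p.name)
    )


def bad_fasta_headers(fasta_path):
    """Return header lines whose sequence ID is not chr[Num], chrMt or chrPt."""
    bad = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                if not HEADER_ID.fullmatch(seq_id):
                    bad.append(line.rstrip("\n"))
    return bad
```

    For example, misnamed_fastq("data/trimmed") and bad_fasta_headers on the FASTA file placed in data/Genome/ should both return empty lists for a correctly prepared dataset.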


5. Run the workflow

    Once the workflow has been deployed and configured, set the current working directory to the rsRNAfinder/ directory and execute the following command:
    snakemake --cores 8 -q
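
    When the run is launched from a script, a basic layout check beforehand can catch a wrong working directory. The sketch below is an assumption-laden illustration, not part of the toolkit: missing_paths and run_workflow are hypothetical helpers, and only the paths named in this manual (config/config.yaml and data/trimmed) are checked.

```python
import subprocess
from pathlib import Path

# Only the paths named in this manual are checked; the workflow may need more.
EXPECTED_PATHS = ["config/config.yaml", "data/trimmed"]


def missing_paths(workdir):
    """Return the expected relative paths that do not exist under workdir."""
    root = Path(workdir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


def run_workflow(workdir, cores=8):
    """Run `snakemake --cores <cores> -q` in workdir after the layout check."""
    missing = missing_paths(workdir)
    if missing:
        raise FileNotFoundError(f"Missing under {workdir}: {', '.join(missing)}")
    # Same command as in the manual, executed with workdir as the working directory.
    subprocess.run(["snakemake", "--cores", str(cores), "-q"], cwd=workdir, check=True)
```

    Calling run_workflow("rsRNAfinder") from the parent directory of the clone is then equivalent to changing into rsRNAfinder/ and running the command by hand.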



6. Results

    The workflow generates the intermediate/ and result/ directories. The intermediate/ directory contains the intermediate files produced during processing, and the result/ directory contains the output files.

    The result directory will contain several files, including:

          • csv file
              A .csv file containing rRF information, including rRNA details such as length, sequence, counts, gene, and genomic positions. The columns include the following details:

                Figure 2: The csv file format



          • html file
              An HTML file created from the above csv file, with an additional column of dot-bracket notation. For this file, the abundant rRFs from individual genomic locations are considered and the count of all mapped reads is kept.

                Figure 3: The html file format



          • tsv file
              A tsv file with two columns, showing the number of occurrences of each rRF class in the sample.

               Figure 4: The tsv file format



          • Graphical Representation
              The plots show the distribution of the results and include a pie plot, a bar plot, and two box plots.

                Figure 5: The graphical representation of the rRFs
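
    The two-column tsv file described above is straightforward to load for downstream analysis. The following is a minimal sketch under stated assumptions: read_class_counts is a hypothetical helper, the file is assumed to hold the class name in the first column and an integer count in the second, and a row with a non-numeric second column is treated as a header and skipped.

```python
import csv


def read_class_counts(tsv_path):
    """Return {rRF class: count} from a two-column, tab-separated file."""
    counts = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            name, value = row
            try:
                counts[name] = int(value)
            except ValueError:
                continue  # assumed header row (non-numeric second column)
    return counts
```

    The returned dictionary can then be sorted or summed as needed, for example with sorted(counts.items(), key=lambda kv: kv[1], reverse=True) to list the classes from most to least frequent.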