PtRFdb - A database of Plant tRFs

Method

Data Procurement
Construction of tRNA BLAST database
Data Analysis and Refinement
Elucidation of tRFs

Data Procurement

Small RNA high-throughput sequencing data and degradome sequencing data were retrieved from the NCBI GEO and NCBI SRA databases. Each small RNA sequencing dataset was processed to obtain unique sequences with their respective cloning frequencies, retaining only sequences with a cloning frequency greater than 9 and a length of 14-100 nt.
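The collapsing and filtering step for small RNA reads can be sketched as follows. This is a minimal illustration, not the scripts used for PtRFdb: the function name, the example reads, and the treatment of the 14-100 nt bounds as inclusive are assumptions.

```python
from collections import Counter

MIN_COUNT = 10              # cloning frequency must be greater than 9
MIN_LEN, MAX_LEN = 14, 100  # length bounds in nt (treated as inclusive: an assumption)


def collapse_reads(reads, min_count=MIN_COUNT, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return {sequence: cloning frequency} for sequences passing both filters."""
    # Count identical reads; blank lines are skipped and case is normalised.
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_count and min_len <= len(seq) <= max_len
    }


# Hypothetical reads: one abundant 18-nt sequence, one rare 16-nt sequence,
# and one abundant sequence that is too short.
reads = ["GCATTGGTGGTTCAGTGG"] * 12 + ["ACGTACGTACGTACGT"] * 3 + ["ACGT"] * 50
unique = collapse_reads(reads)
```

Only the first sequence survives here: the second fails the frequency threshold and the third fails the length threshold.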
Because processed data were scarce, degradome sequencing data were taken as raw reads. Adapter sequences were removed from the raw reads using Cutadapt v1.14, and only sequences with a length of 14-100 nt, a frequency greater than 100, and a Phred score greater than 28 were retained.
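The post-trimming filter for degradome reads might look like the sketch below. The text does not say whether the Phred threshold applies to the mean read quality or to every base, nor which quality encoding was used; this example assumes the mean quality of a read and Phred+33 encoding, and all function names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_degradome_filters(seq, quality, count,
                             min_len=14, max_len=100,
                             min_count=100, min_phred=28):
    """True if an adapter-trimmed read meets the length, frequency and quality criteria."""
    if len(seq) != len(quality) or not (min_len <= len(seq) <= max_len):
        return False
    if count <= min_count:                   # frequency must be greater than 100
        return False
    return mean_phred(quality) > min_phred   # Phred score must be greater than 28


# 'I' encodes Phred 40 and '=' encodes Phred 28 under Phred+33.
good = passes_degradome_filters("A" * 20, "I" * 20, count=150)
borderline = passes_degradome_filters("A" * 20, "=" * 20, count=150)
```

In this example `good` is true, while `borderline` is false because a mean score of exactly 28 is not greater than 28.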

Construction of tRNA BLAST database

tRNA information for the specified plant species was generated using tRNAscan-SE 2.0. Using the tRNAscan-SE results and the reference genome, precursor tRNAs were extracted together with 40 bp of flanking sequence upstream and downstream, for both the negative (-) and positive (+) strands, using in-house scripts. Negative (-) strand sequences were converted to the positive (+) strand. Since tRNA nucleotidyltransferases (CCA-adding enzymes) are responsible for maturation of the functional 3' end of tRNAs, 'CCA' was added to the 3' end of the tRNA sequences obtained from tRNAscan-SE to produce mature tRNAs. A BLAST database was created from the combined mature tRNA and precursor tRNA sequences for each species individually using NCBI BLAST 2.6.0.
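The in-house extraction scripts are not shown, so the following is only a sketch of the idea. It assumes 1-based inclusive gene coordinates with start <= end on the positive strand (tRNAscan-SE reports minus-strand genes with begin greater than end, so the caller would need to swap them), and it assumes that strand conversion means taking the reverse complement. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_precursor(chrom_seq, start, end, strand, flank=40):
    """Return the tRNA gene plus `flank` bp on each side, in gene orientation.

    start, end: 1-based inclusive coordinates on the + strand, start <= end.
    Flanks are truncated if the gene lies near a chromosome end.
    """
    left = max(0, start - 1 - flank)
    right = min(len(chrom_seq), end + flank)
    region = chrom_seq[left:right]
    return reverse_complement(region) if strand == "-" else region


def mature_trna(trna_seq):
    """Append the 3' CCA tail to a predicted tRNA sequence."""
    return trna_seq + "CCA"


# Toy chromosome: a 6-nt "gene" at positions 51-56 flanked by 50 nt on each side.
chrom = "A" * 50 + "GGGTTT" + "C" * 50
plus_precursor = extract_precursor(chrom, 51, 56, "+")
minus_precursor = extract_precursor(chrom, 51, 56, "-")
```

Both precursors are 86 nt long (40 + 6 + 40); the minus-strand version is the reverse complement of the plus-strand version.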

Data Analysis and Refinement

Each sequence from both the small RNA sequencing and the degradome sequencing data was used as a unique query against the custom-built tRNA BLAST database of the corresponding species.

BLASTN was run against the plus (+) strand with the default E-value threshold of 10 and a word size of 10.
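For orientation, the database construction and search could be expressed with the BLAST+ command-line tools roughly as below. The file and database names are hypothetical, tabular output (`-outfmt 6`) is an assumption made for easier downstream parsing, and the BLASTN task is left at its default because the text does not specify one.

```python
def makeblastdb_cmd(fasta, db_name):
    """Argument list for building a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db, out, evalue=10, word_size=10):
    """Argument list for a plus-strand BLASTN search with tabular output."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-strand", "plus",            # search the plus (+) strand only
        "-evalue", str(evalue),       # default E-value threshold of 10
        "-word_size", str(word_size),
        "-outfmt", "6",               # tabular output (assumption)
        "-out", out,
    ]


# Hypothetical per-species file names.
db_cmd = makeblastdb_cmd("species_tRNA_and_precursors.fa", "species_tRNA_db")
search_cmd = blastn_cmd("species_fragments.fa", "species_tRNA_db", "species_hits.tsv")
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a system where BLAST+ is installed.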

Only hits with 100% identity and no gaps were retained from the BLAST results. In addition, for small RNA sequencing data, only hits that mapped to the tRNA database along 100% of the sequence length were considered.
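A filter of this kind could be written as in the sketch below, assuming the hits are in the standard 12-column BLAST tabular format and that query lengths are supplied separately (they are not part of the default columns). The function name and example rows are hypothetical.

```python
import csv

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def perfect_hits(tsv_lines, query_lengths, require_full_length=True):
    """Keep hits with 100% identity and no gaps.

    If require_full_length is True (small RNA data), the alignment must also
    span the entire query sequence.
    """
    kept = []
    for row in csv.DictReader(tsv_lines, fieldnames=FIELDS, delimiter="\t"):
        if float(row["pident"]) != 100.0 or int(row["gapopen"]) != 0:
            continue
        if require_full_length and int(row["length"]) != query_lengths[row["qseqid"]]:
            continue
        kept.append(row)
    return kept


lines = [
    "q1\ttRNA-Ala\t100.000\t20\t0\t0\t1\t20\t1\t20\t1e-05\t40.1",   # perfect, full length
    "q2\ttRNA-Ala\t95.000\t20\t1\t0\t1\t20\t5\t24\t0.01\t30.0",     # one mismatch
    "q3\ttRNA-Gly\t100.000\t18\t0\t0\t3\t20\t1\t18\t1e-04\t36.2",   # perfect, partial
]
query_lengths = {"q1": 20, "q2": 20, "q3": 20}
small_rna_hits = perfect_hits(lines, query_lengths)
degradome_hits = perfect_hits(lines, query_lengths, require_full_length=False)
```

With the full-length requirement only `q1` is kept; without it, `q1` and `q3` are kept.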

If a fragment aligned to both a mature and a precursor tRNA, only its hits to the mature tRNA were retained, to remove false positives.
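The mature-over-precursor rule can be illustrated as follows. How mature and precursor entries are distinguished in the database is not described, so the `_pre` suffix on precursor identifiers is purely a hypothetical naming convention, as are the function name and the example hits.

```python
def prefer_mature(hits, is_mature):
    """Drop precursor hits for any fragment that also hits a mature tRNA.

    hits: list of (fragment_id, subject_id) pairs.
    is_mature: callable returning True for mature tRNA subject IDs.
    """
    by_fragment = {}
    for frag, subj in hits:
        by_fragment.setdefault(frag, []).append(subj)

    selected = []
    for frag, subjects in by_fragment.items():
        mature = [s for s in subjects if is_mature(s)]
        # Fragments with no mature hit keep their precursor hits.
        for s in (mature or subjects):
            selected.append((frag, s))
    return selected


hits = [
    ("f1", "tRNA-Ala-1"),
    ("f1", "tRNA-Ala-1_pre"),
    ("f2", "tRNA-Gly-2_pre"),
]
selected = prefer_mature(hits, is_mature=lambda s: not s.endswith("_pre"))
```

Fragment `f1` keeps only its mature hit, while `f2`, which aligns to a precursor alone, is left unchanged.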

Furthermore, for every tRNA, the fragments with the highest frequency among those aligning to the database were extracted to eliminate random hits.

A length filter was then applied to retain significant fragments of 15-28 nt.
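The frequency and length filters can be sketched as below. The exact selection rule is not spelled out, so this example assumes that, for each tRNA, only the fragment(s) with the maximum frequency are kept (ties retained) and that the 15-28 nt bounds are inclusive. Function names and data are hypothetical.

```python
def top_fragments_per_trna(hits):
    """Return {trna_id: (max_frequency, [fragment sequences at that frequency])}.

    hits: iterable of (trna_id, fragment_seq, frequency) tuples.
    """
    best = {}
    for trna, seq, freq in hits:
        current = best.get(trna)
        if current is None or freq > current[0]:
            best[trna] = (freq, [seq])
        elif freq == current[0] and seq not in current[1]:
            current[1].append(seq)
    return best


def length_filter(seqs, min_len=15, max_len=28):
    """Keep fragments of 15-28 nt (inclusive bounds assumed)."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


hits = [
    ("tRNA-Ala", "A" * 20, 500),
    ("tRNA-Ala", "C" * 22, 120),
    ("tRNA-Gly", "G" * 30, 90),
    ("tRNA-Gly", "T" * 16, 90),
]
top = top_fragments_per_trna(hits)
gly_kept = length_filter(top["tRNA-Gly"][1])
```

For `tRNA-Gly` two fragments tie at the highest frequency, and the length filter then removes the 30-nt one.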

Fragments aligning to the 3' trailer region were considered potential tRFs only if they were located at the first base pair of the 3' trailer stem. Fragments aligning at the exact 5' end or the exact 3' end of the tRNA were also considered potential tRFs.

Elucidation of tRFs

Fragments aligning to the 5' end of the tRNA were annotated as tRF-5.

Fragments aligning to the 3' end of the tRNA were annotated as tRF-3.

Fragments aligning to the 3' trailer region of the tRNA were annotated as tRF-1.
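The annotation rules above can be summarised in a small classifier. This is a sketch under several assumptions: subject coordinates are 1-based as in BLAST output, a mature subject is the tRNA with its added CCA, a precursor subject carries a full 40-nt flank on each side, and a tRF-1 is taken to start at the first nucleotide of the 3' trailer. The function and argument names are hypothetical.

```python
def classify_trf(sstart, send, subject_kind, subject_len, flank=40):
    """Return 'tRF-5', 'tRF-3', 'tRF-1', or None for a perfect alignment.

    sstart, send: 1-based alignment coordinates on the subject sequence.
    subject_kind: 'mature' (tRNA + CCA) or 'precursor' (flank + tRNA + flank).
    """
    if subject_kind == "mature":
        if sstart == 1:
            return "tRF-5"           # starts at the exact 5' end
        if send == subject_len:
            return "tRF-3"           # ends at the exact 3' end
        return None
    if subject_kind == "precursor":
        trailer_start = subject_len - flank + 1  # first base of the 3' trailer
        return "tRF-1" if sstart == trailer_start else None
    raise ValueError(f"unknown subject kind: {subject_kind!r}")


# Example: a 76-nt mature tRNA and a 153-nt precursor (40 + 73 + 40).
five_prime = classify_trf(1, 20, "mature", 76)
three_prime = classify_trf(55, 76, "mature", 76)
trailer = classify_trf(114, 133, "precursor", 153)
internal = classify_trf(30, 50, "mature", 76)
```

Internal fragments, such as the last example, match none of the three classes and return `None`.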

Each tRF was given a unique PtRFdb ID with prefix "PT" (e.g. PT-1000).
