Bioinformatics Tutlets: 19) How to remove duplicate sequences from a fasta formatted input file

Wednesday, May 13, 2009

19) How to remove duplicate sequences from a fasta formatted input file

===Update: 21 March 2011===

For more comprehensive and up-to-date information on this, please see Post 32.
Read all the way to the end of that post.

======================

You can use Jalview to check a fasta file for duplicate sequences and remove any that it finds.

1. Download and install Jalview on your computer from

http://www.jalview.org/download.html

2. Run Jalview and close all example windows

3. Load your fasta file into Jalview

4. Remove duplicates:

- Select all sequences

- In the ‘Edit’ menu, uncheck the ‘Pad Gaps’ option.

- In the ‘Edit’ menu, select ‘Remove all gaps’

- Then select ‘Remove redundancy’, also in the ‘Edit’ menu

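- In the “redundancy threshold selection” dialog box, set the threshold value to 100 and click ‘Remove’. The threshold is a percent identity, so a value of 100 removes only sequences that are identical to another sequence in the file.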

5. Save the remaining unique sequences as a fasta file and you are done!
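If you would rather script the same clean-up than use a GUI, the steps above (remove gaps, then drop sequences that are 100% identical) can be approximated in a few lines of Python. This is a minimal sketch, not part of the original tutorial and not Jalview's own algorithm. It assumes gap characters are '-' and '.', compares sequences case-insensitively (also an assumption), and keeps the first record of each distinct sequence. The script name dedupe_fasta.py is hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def remove_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep the first record of each distinct ungapped sequence.

    Roughly like 'Remove all gaps' followed by a 100% redundancy threshold,
    but only exact matches are dropped. Gap characters '-' and '.' (an
    assumption) are stripped before comparing.
    """
    seen = set()
    for header, seq in records:
        ungapped = seq.replace("-", "").replace(".", "")
        key = ungapped.upper()  # assumption: treat upper/lower case as equal
        if key in seen:
            continue  # an earlier record already has this exact sequence
        seen.add(key)
        yield header, ungapped


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records as FASTA text, wrapping sequences at `width` columns."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


def main(in_path: str, out_path: str) -> None:
    with open(in_path) as src:
        unique = list(remove_duplicates(parse_fasta(src)))
    with open(out_path, "w") as dst:
        dst.write(format_fasta(unique))
    print(f"kept {len(unique)} unique sequences", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python dedupe_fasta.py input.fasta output.fasta", file=sys.stderr)
    else:
        main(sys.argv[1], sys.argv[2])
```

Run it as `python dedupe_fasta.py input.fasta output.fasta`. Because it only removes exact matches after gap removal, its output may differ from Jalview's in edge cases, so treat it as a convenience rather than a drop-in replacement.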

Content by: Asif M. Khan & Sye Bee
Posted by: Sye Bee
Edited by: Asif M. Khan

3 comments:

Unknown, May 8, 2010 at 5:03 AM
great, I like it!
Anonymous, October 8, 2011 at 12:47 AM
Thank ya for the help
Eslam Samir, February 18, 2017 at 4:01 PM
Here is my free program on Github **Sequence database curator**
(https://github.com/Eslam-Samir-Ragab/Sequence-database-curator)

It is a very fast program and it can deal with:

1. Nucleotide sequences
2. Protein sequences

It can work under Operating systems:

1. Windows
2. Mac
3. Linux

It also works for:

1. Fasta format
2. Fastq format

Best Regards
