Wednesday, May 13, 2009

18) How to shorten the description line in FASTA format to include just the GI number

Some programs, such as Phylip or Clustal (version 1.6), truncate the description line and keep only its first 10 characters. This can produce duplicate description lines in your input sequence file. To avoid the problem, you can reduce the description line to just the GI number, which has a maximum of only 9 characters. Follow the steps below to achieve this:

1. Open the FASTA-formatted input file in MS Excel





2. Select column A and press “Ctrl+H” to open the replace function

3. Replace “gi|” with nothing by following the steps below:

a. Under “Find what”, type “gi|”, and leave “Replace with” blank

b. Click on “Replace All”



You will see this upon successful replacement:



4. Next, replace the “|” symbol and everything after it with nothing (the “*” acts as a wildcard):

a. Under “Find what”, type “|*”, and leave “Replace with” blank

b. Click on “Replace All”



Output upon successful replacement. Now you only have the GI numbers left:



5. Save the file by clicking on the Save icon or “File” → “Save”

6. Click on “Yes” when the warning prompt pops up



7. Close Excel

8. Dismiss the prompt that pops up by clicking on “No”



9. Open your FASTA file in Notepad to check whether only the GIs are left on the description line. If so, you are done!
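
If you prefer a script to Excel, the same two replacements can be sketched in Python. This is only a minimal sketch, not part of the original procedure: it assumes every description line starts with “>gi|”, and the function names and example sequences are made up for illustration.

```python
def gi_only_header(line):
    """Reduce a FASTA description line to '>' followed by the GI number."""
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    ending = "\n" if line.endswith("\n") else ""
    header = line.rstrip("\r\n")
    header = header.replace("gi|", "", 1)  # step 3: remove the first "gi|"
    header = header.split("|", 1)[0]       # step 4: drop the next "|" and everything after it
    return header + ending


def shorten_fasta(lines):
    """Apply gi_only_header to every line of a FASTA file."""
    return [gi_only_header(line) for line in lines]


# Hypothetical records, for illustration only
example = [
    ">gi|12345678|ref|NP_000001.1| hypothetical protein A\n",
    "MKTAYIAKQR\n",
    ">gi|87654321|gb|AAA00002.1| hypothetical protein B\n",
    "MSLLTEVETY\n",
]

for line in shorten_fasta(example):
    print(line, end="")
```

To use it on a real file, read the lines with open(), pass them to shorten_fasta, and write the result to a new file so that the original stays unchanged.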




Content by: Asif M. Khan & Sye Bee
Posted by: Sye Bee
Edited by: Asif M. Khan
