For example, it can be used to extract only the subject and the identifier (NCBI GI number) information from a BLAST result. Below is an illustration of how this can be done using LAPIS:
Before
>gi|126385999|gb|CP000521.1| Acinetobacter baumannii ATCC 17978, complete genome
Length = 3976747
Score = 570 bits (1470), Expect = e-163
Identities = 284/284 (100%), Positives = 284/284 (100%)
Frame = -2
Query: 1 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 60
LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT
Sbjct: 1766322 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 1766143
After
>gi|126385999
LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT
Methodology:
1. Select the FASTA description line together with the lines containing the subject sequence: “line containing > or line containing sbjct” -> Tools -> Extract
2. To remove the position numbers from the subject lines: “digits in line containing sbjct” -> Tools -> Omit
3. To remove the “Sbjct:” label: “sbjct:” -> Tools -> Omit
4. To remove gap dashes: “-” -> Tools -> Omit
5. To remove the extra spaces in the sequence lines: “spaces not in line containing >” -> Tools -> Omit
6. Optionally, to trim the description line down to the GI identifier only: “from second | in line containing > to start of linebreak” -> Tools -> Omit
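For readers who want to check the result outside LAPIS, the same steps can be sketched in Python. This is only a rough equivalent, not how LAPIS itself works: the function name `blast_hit_to_fasta` is made up for this illustration, and it assumes that description lines start with ">" and subject lines start with "Sbjct" (the LAPIS patterns above match lines that merely contain those strings).

```python
import re


def blast_hit_to_fasta(blast_text: str) -> str:
    """Reduce BLAST text output to '>gi|<number>' headers plus subject sequences."""
    kept = []
    for line in blast_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            # Steps 1 and 6: keep the description line, cut at the second '|'.
            fields = stripped.split("|")
            kept.append("|".join(fields[:2]))
        elif stripped.lower().startswith("sbjct"):
            # Step 3: drop the "Sbjct:" label.
            seq = re.sub(r"^sbjct:?", "", stripped, flags=re.IGNORECASE)
            # Steps 2, 4 and 5: drop digits, gap dashes and whitespace.
            seq = re.sub(r"[\d\s-]", "", seq)
            kept.append(seq)
        # Every other line (Length, Score, Query, match line) is discarded.
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        ">gi|126385999|gb|CP000521.1| Acinetobacter baumannii ATCC 17978, complete genome\n"
        "Length = 3976747\n"
        "Sbjct: 1766322 LNFKFNFISLMNIKALLLITSAIFISACSPYIVTANPNHSASKSDEKAEKIKNLFNEAHT 1766143\n"
    )
    print(blast_hit_to_fasta(sample))
```

Unlike the LAPIS steps, this sketch writes each subject segment on its own line, so a hit that spans several alignment blocks yields several sequence lines under one header.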
Screen shots: The following screen shots show the input and the output at each step.
data:image/s3,"s3://crabby-images/72f7c/72f7c4dcb61b0fd585c931a7b6f139f39dfb7fe1" alt=""
For extracting specific information, the user needs to find a pattern and type it in the pattern box as shown below:
data:image/s3,"s3://crabby-images/35aff/35affdcde00f49e21716958e1d6616b76be59031" alt=""
The pattern above extracts the two kinds of lines needed for further analysis: the description line and the subject lines.
data:image/s3,"s3://crabby-images/ed020/ed0206fccee86ed3f211412dd5989a8073c47dce" alt=""
Next, the user should remove the position numbers (digits) from the subject line.
data:image/s3,"s3://crabby-images/b09c1/b09c1061877ed1ddda5e7ad73a214488bb2b237e" alt=""
The screen shot below shows the highlighted digits to be omitted.
data:image/s3,"s3://crabby-images/0dc5f/0dc5f2bd55007e2414e29ebd7add3b8e998f1cac" alt=""
The screen shot below shows the information after omitting the numbers from the subject line.
data:image/s3,"s3://crabby-images/94c0c/94c0c6e142786638d1598007902b0dfdd37ef611" alt=""
Next, the label “Sbjct:” should also be omitted.
data:image/s3,"s3://crabby-images/72fac/72faccf1bb7d4cb1ea827400c892ddc0fa36e6f0" alt=""
data:image/s3,"s3://crabby-images/d2c99/d2c996c073258bff3c2f05806627f9a5b3b0c8e1" alt=""
The screen shot below shows the pattern to remove any extra spaces in the subject line. In case of any gaps (-), they should also be removed.
data:image/s3,"s3://crabby-images/8da89/8da892fe08bb1aafd09c7800b0e49837822e4746" alt=""
data:image/s3,"s3://crabby-images/51654/51654c26c12a72cd9b1c091ae14f65027941b51b" alt=""
The screen shot below shows the pattern to remove extra information from the header line.
data:image/s3,"s3://crabby-images/42e70/42e70c4fd6da3c2f19c988411edf5387182dab57" alt=""
The screen shot below shows the required output.
data:image/s3,"s3://crabby-images/70006/700061673ac01ffea5330eb14ebfbd0622cf53d2" alt=""