Assume that I have a text file named “text.txt” whose contents look like this:
>NZ_FNBK01000055.1 Halorientalis regularis strain IBRC-M10760, whole genome shotgun sequence
CCCTCCTCCAGGGCGGCCATGCCCCAGCCGTCGATCTCGTGGCCGTCGTCTCCGGTGACGACCTCGCGCAGCGTGGCGAC
GGCGTCCTCGATCCGGTCGCGGCTCTCTGCACCCGACCGGCCGAACGGATAGGTGGTCACACTGCCGACCGCGTGCTGAT
CCAGCGCTTCCTCAGCGCGCTCGACGTCGGTCGGTTCGTCGGGTTCGAAGCCGCGGAGGCCCCAGTCGTCGGGCATGTTG
>NZ_FNBK01000053.1 Halorientalis regularis strain IBRC-M 10760,whole genome shotgun sequence
GCGGTGCGGTTCGGGAAGCCTCGCCGTCGTCGGGCTACGCCCGACTGCTTGAGGGAGCTTCGCTCCCTCTCCGTTCACGG
CGAGGAGGAGGTCACGCCGTCACCAAGCGCGGCCGCCGGGAAATCGAGGCCCGGCGCGAGTGGGAACAGCAATACTTCGA
CTGGTAGGCACCCGCCCAGCCAGCCGCACCCTGGCCGCGATCCGACCGCTGTCGTGGAAACAGGCGTCCACACGCGACGA
So, how can I read the data from the above file so that I can:
1. Extract the genome name from a line in the file that begins with a greater-than
sign; everything following the greater-than sign (and excluding the newline at the
end of the line) is the name, so for the line
>NZ_FOJA01000002.1 Halobacterium jilantaiense strain CGMCC1.5337, whole genome shotgun sequence
the name would be
NZ_FOJA01000002.1 Halobacterium jilantaiense strain CGMCC1.5337, whole genome shotgun sequence
2. Extract the sequence of DNA bases
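
To make the two requirements concrete, here is a minimal Python sketch of what I have in mind. The function name read_fasta is only a placeholder of mine, and it assumes that the bases for one genome are all the lines between one ">" line and the next, joined into a single string:

```python
def read_fasta(path):
    """Return a list of (name, sequence) pairs read from a FASTA-style file.

    Assumption: a record starts at a line beginning with '>' and its bases
    are all following lines up to the next '>' line, joined together.
    """
    records = []
    name = None      # header of the record currently being read
    chunks = []      # sequence lines collected for that record
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")   # drop the trailing newline
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name = line[1:]          # everything after the '>' sign
                chunks = []
            elif name is not None and line.strip():
                chunks.append(line.strip())
    # The last record has no following header, so store it here.
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Calling read_fasta("text.txt") on the file above should give two (name, sequence) pairs, one for each line that starts with a greater-than sign.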