
US10011849B1 - Nucleic acid-guided nucleases - Google Patents

Info

Publication number
US10011849B1
Authority
US
United States
Prior art keywords
nucleic acid
sequence
seq
nuclease
editing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/631,989
Inventor
Ryan T. Gill
Andrew Garst
Tanya Elizabeth Warnecke LIPSCOMB
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inscripta Inc
Original Assignee
Inscripta Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inscripta Inc
Priority to US15/631,989 priority Critical patent/US10011849B1/en
Assigned to MUSE BIOTECHNOLOGY, INC. reassignment MUSE BIOTECHNOLOGY, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GILL, RYAN T., LIPSCOMB, TANYA ELIZABETH WARNECKE, GARST, Andrew
Assigned to INSCRIPTA, INC. reassignment INSCRIPTA, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: MUSE BIOTECHNOLOGY, INC.
Priority to US15/896,433 priority patent/US10435714B2/en
Priority to EP21167880.0A priority patent/EP3916086A1/en
Priority to MX2019015047A priority patent/MX2019015047A/en
Priority to RU2022103603A priority patent/RU2022103603A/en
Priority to PCT/US2018/034779 priority patent/WO2018236548A1/en
Priority to EP18821213.8A priority patent/EP3642334B1/en
Priority to NZ760730A priority patent/NZ760730A/en
Priority to KR1020217035078A priority patent/KR102558931B1/en
Priority to RU2020102451A priority patent/RU2769475C2/en
Priority to KR1020207002319A priority patent/KR102321388B1/en
Priority to JP2019571011A priority patent/JP7136816B2/en
Priority to AU2018289077A priority patent/AU2018289077B2/en
Priority to CA3067951A priority patent/CA3067951A1/en
Priority to CN201880054732.5A priority patent/CN111511906A/en
Priority to HUE18821213A priority patent/HUE066467T2/en
Priority to ES18821213T priority patent/ES2971549T3/en
Publication of US10011849B1
Application granted
Priority to US16/548,631 priority patent/US10626416B2/en
Priority to IL271342A priority patent/IL271342A/en
Priority to US16/819,896 priority patent/US20200231987A1/en
Priority to US17/179,193 priority patent/US11130970B2/en
Priority to US17/387,860 priority patent/US11220697B2/en
Priority to US17/554,736 priority patent/US11306327B1/en
Priority to US17/692,069 priority patent/US20220195464A1/en
Priority to AU2022202248A priority patent/AU2022202248B2/en
Priority to JP2022138875A priority patent/JP2022169775A/en

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/111General methods applicable to biologically active non-coding nucleic acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70Vectors or expression systems specially adapted for E. coli
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/80Vectors or expression systems specially adapted for eukaryotic hosts for fungi
    • C12N15/81Vectors or expression systems specially adapted for eukaryotic hosts for fungi for yeasts
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/905Stable introduction of foreign DNA into chromosome using homologous recombination in yeast
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14Hydrolases (3)
    • C12N9/16Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22Ribonucleases RNAses, DNAses
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C12N15/8509Vectors or expression systems specially adapted for eukaryotic hosts for animal cells for producing genetically modified animals, e.g. transgenic
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00Nucleic acids vectors
    • C12N2800/22Vectors comprising a coding region that has been codon optimised for expression in a respective host

Definitions

  • a method of modifying a target region in the genome of a cell comprising: (a) contacting a cell with: a non-naturally occurring nucleic-acid-guided nuclease encoded by a nucleic acid having at least 80% identity to SEQ ID NO: 22; an engineered guide nucleic acid capable of complexing with the nucleic acid-guided nuclease; and an editing sequence encoding a nucleic acid complementary to said target region having a change in sequence relative to the target region; and (b) allowing the nuclease, guide nucleic acid, and editing sequence to create a genome edit in a target region of the genome of the cell.
  • the engineered guide nucleic acid and the editing sequence are provided as a single nucleic acid.
  • the single nucleic acid further comprises a mutation in a protospacer adjacent motif (PAM) site.
  • PAM: protospacer adjacent motif
  • the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 42.
  • the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 128.
  • nucleic acid-guided nuclease systems comprising: (a) a non-naturally occurring nuclease encoded by a nucleic acid having at least 80% identity to SEQ ID NO: 22; (b) an engineered guide nucleic acid capable of complexing with the nucleic acid-guided nuclease, and (c) an editing sequence having a change in sequence relative to the sequence of a target region in a genome of a cell; wherein the system results in a genome edit in the target region in the genome of the cell facilitated by the nuclease, the engineered guide nucleic acid, and the editing sequence.
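The relationship described above between a target region, an editing sequence carrying a change relative to that region, and an accompanying PAM mutation can be sketched in Python. This is a minimal illustration under stated assumptions: the two sequences, the PAM coordinates, and the function name `classify_changes` are hypothetical and are not taken from the disclosure; the caller supplies the PAM position, so no particular PAM motif is assumed.

```python
def classify_changes(target, editing, pam_start, pam_length):
    """Split the differences between a target region and an editing sequence
    into those inside the PAM site and those outside it.

    Assumptions (hypothetical, for illustration only): both sequences have the
    same length and are compared position by position (substitutions only);
    the PAM occupies [pam_start, pam_start + pam_length) in 0-based coordinates.
    """
    if len(target) != len(editing):
        raise ValueError("target and editing sequence must have equal length")
    pam_positions = range(pam_start, pam_start + pam_length)
    in_pam, outside_pam = [], []
    for index, (t, e) in enumerate(zip(target.upper(), editing.upper())):
        if t == e:
            continue
        change = (index, t, e)
        if index in pam_positions:
            in_pam.append(change)
        else:
            outside_pam.append(change)
    return in_pam, outside_pam


# Hypothetical 18-nt target region with a 4-nt PAM at positions 2-5.
target_region = "GATTTACCGGTACGATCA"
# Hypothetical editing sequence: one change in the PAM, one desired edit.
editing_sequence = "GATCTACCGGTAAGATCA"

pam_changes, edit_changes = classify_changes(target_region, editing_sequence, 2, 4)
print("changes inside PAM:", pam_changes)
print("changes outside PAM:", edit_changes)
```

A cassette design check of this kind would confirm that an editing sequence carries at least one change outside the PAM (the intended edit) and at least one inside it (the PAM mutation).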
  • FIG. 1A depicts a partial sequence alignment of MAD1-8 (SEQ ID NO: 1-8) and MAD10-12 (SEQ ID NO: 10-12).
  • FIG. 1A discloses residues 703-707 of SEQ ID NO: 1, residues 625-629 of SEQ ID NO: 2, residues 587-591 of SEQ ID NO: 3, residues 654-658 of SEQ ID NO: 4, residues 581-585 of SEQ ID NO: 5, residues 637-641 of SEQ ID NO: 6, residues 590-594 of SEQ ID NO: 7, residues 645-649 of SEQ ID NO: 8, SEQ ID NO: 205, residues 619-623 of SEQ ID NO: 10 and residues 603-607 of SEQ ID NO: 12, all disclosed respectively, in order of appearance.
  • FIG. 2 depicts an example protein expression construct.
  • FIG. 2 discloses “6×-His” as SEQ ID NO: 182.
  • FIG. 3 depicts an example editing cassette.
  • FIG. 3 discloses SEQ ID NOS 183-185, respectively, in order of appearance.
  • FIG. 4 depicts an example screening or selection experiment workflow.
  • FIG. 5B depicts an example editing cassette.
  • FIG. 5C depicts an example screening or selection experiment workflow.
  • FIG. 6A depicts an example protein expression construct.
  • FIG. 6B depicts an example editing cassette.
  • FIGS. 7A-7B depict example data from a functional nuclease complex screening or selection experiment.
  • FIG. 8 depicts example data from a targetable nuclease complex-based editing experiment.
  • FIGS. 14A-14B depict example data from a primer validation experiment.
  • FIG. 16 depicts example validation data comparing results from two different assays.
  • FIGS. 17A-17C depict an example trackable genetic engineering workflow, including a plasmid comprising an editing cassette and a recording cassette, and downstream sequencing of barcodes in order to identify the incorporated edit or mutation.
  • FIG. 17B discloses SEQ ID NOS 203 and 204, respectively, in order of appearance.
  • the present disclosure provides nucleic acid-guided nucleases and methods of use.
  • the subject nucleic-acid guided nucleases are part of a targetable nuclease system comprising a nucleic acid-guided nuclease and a guide nucleic acid.
  • a subject targetable nuclease system can be used to cleave, modify, and/or edit a target polynucleotide sequence, often referred to as a target sequence.
  • a subject targetable nuclease system refers collectively to transcripts and other elements involved in the expression of or directing the activity of genes, which may include sequences encoding a subject nucleic acid-guided nuclease protein and a guide nucleic acid as disclosed herein.
  • Methods, systems, vectors, polynucleotides, and compositions described herein may be used in various applications including altering or modifying synthesis of a gene product, such as a protein, polynucleotide cleavage, polynucleotide editing, polynucleotide splicing; trafficking of target polynucleotide, tracing of target polynucleotide, isolation of target polynucleotide, visualization of target polynucleotide, etc.
  • aspects of the invention also encompass methods and uses of the compositions and systems described herein in genome engineering, e.g. for altering or manipulating the expression of one or more genes or the one or more gene products, in prokaryotic, archaeal, or eukaryotic cells, in vitro, in vivo or ex vivo.
  • Non-naturally occurring targetable nucleases and non-naturally occurring targetable nuclease systems can address many of these challenges and limitations.
  • Non-naturally occurring targetable nuclease systems are engineered to address one or more of the challenges described above and can be referred to as engineered nuclease systems.
  • Engineered nuclease systems can comprise one or more of an engineered nuclease, such as an engineered nucleic acid-guided nuclease, an engineered guide nucleic acid, an engineered polynucleotide encoding said nuclease, or an engineered polynucleotide encoding said guide nucleic acid.
  • Engineered nucleases, engineered guide nucleic acids, and engineered polynucleotides encoding the engineered nuclease or engineered guide nucleic acid are not naturally occurring and are not found in nature. It follows that engineered nuclease systems including one or more of these elements are non-naturally occurring.
  • non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 41-60), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 127-146), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 147-166), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 61-80), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 21-40) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 1-20), or engineered guide nucleic acids comprising any one of SEQ ID NO
  • Disclosed herein are nucleic acid-guided nucleases.
  • Subject nucleases are functional in vitro, or in prokaryotic, archaeal, or eukaryotic cells for in vitro, in vivo, or ex vivo applications.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus which includes but is not limited to Thiomicrospira, Succinivibrio, Candidatus, Porphyromonas, Acidaminococcus, Acidomonococcus, Prevotella, Smithella, Moraxella, Synergistes, Francisella, Leptospira, Catenibacterium, Kandleria, Clostridium, Dorea, Coprococcus, Enterococcus, Fructobacillus, Weissella, Pediococcus, Corynebacter, Sutterella, Legionella, Treponema, Roseburia, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphy
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a kingdom which includes but is not limited to Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Spirochaetes, and Tenericutes.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a phylum which includes but is not limited to Erysipelotrichia, Clostridia, Bacilli, Actinobacteria, Bacteroidetes, Flavobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Spirochaetes, and Mollicutes.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within an order which includes but is not limited to Clostridiales, Lactobacillales, Actinomycetales, Bacteroidales, Flavobacteriales, Rhizobiales, Rhodospirillales, Burkholderiales, Neisseriales, Legionellales, Nautiliales, Campylobacterales, Spirochaetales, Mycoplasmatales, and Thiotrichales.
  • Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a family which includes but is not limited to Lachnospiraceae, Enterococcaceae, Leuconostocaceae, Lactobacillaceae, Streptococcaceae, Peptostreptococcaceae, Staphylococcaceae, Eubacteriaceae, Corynebacterineae, Bacteroidaceae, Flavobacterium, Cryomorphaceae, Rhodobiaceae, Rhodospirillaceae, Acetobacteraceae, Sutterellaceae, Neisseriaceae, Legionellaceae, Nautiliaceae, Campylobacteraceae, Spirochaetaceae, Mycoplasmataceae, Piscirickettsiaceae, and Francisellaceae.
  • nucleic acid-guided nucleases have been described in US Patent Application Publication No. US20160208243 filed Dec. 18, 2015, US Application Publication No. US20140068797 filed Mar. 15, 2013, U.S. Pat. No. 8,697,359 filed Oct. 15, 2013, and Zetsche et al., Cell 2015 Oct. 22; 163(3):759-71, each of which is incorporated herein by reference in its entirety.
  • nucleic acid-guided nucleases suitable for use in the methods, systems, and compositions of the present disclosure include those derived from an organism such as, but not limited to, Thiomicrospira sp. XS5, Eubacterium rectale, Succinivibrio dextrinosolvens, Candidatus Methanoplasma termitum, Candidatus Methanomethylophilus alvus, Porphyromonas crevioricanis, Flavobacterium branchiophilum, Acidaminococcus sp., Acidomonococcus sp., Lachnospiraceae bacterium COE1, Prevotella brevis ATCC 19188, Smithella sp.
  • SCADC Moraxella bovoculi, Synergistes jonesii, Bacteroidetes oral taxon 274, Francisella tularensis, Leptospira inadai serovar Lyme str. 10, Acidomonococcus sp. crystal structure (5B43) S. mutans, S. agalactiae, S. equisimilis, S. sanguinis, S. pneumoniae; C. jejuni, C. coli; N. salsuginis, N. tergarcus; S. auricularis, S. carnosus; N. meningitidis, N. gonorrhoeae; L. monocytogenes, L.
  • Lachnospiraceae bacterium MA2020 Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, Porphyromonas macacae, Catenibacterium sp.
  • EFB-N1 Weissella halotolerans, Pediococcus acidilactici, Lactobacillus curvatus, Streptococcus pyogenes, Lactobacillus versmoldensis, Filifactor alocis ATCC 35896, Alicyclobacillus acidoterrestris, Alicyclobacillus acidoterrestris ATCC 49025, Desulfovibrio inopinatus, Desulfovibrio inopinatus DSM 10711, Oleiphilus sp. Oleiphilus sp.
  • a nucleic acid-guided nuclease disclosed herein comprises an amino acid sequence comprising at least 50% amino acid identity to any one of SEQ ID NO: 1-20. In some instances, a nuclease comprises an amino acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% amino acid identity to any one of SEQ ID NO: 1-20. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-20.
  • the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-8 or 10-12. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-8 or 10-11. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to SEQ ID NO: 2. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to SEQ ID NO: 7.
  • the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-20. In some cases, the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-8 or 10-12. In some cases, the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-8 or 10-11. In some cases, the nucleic acid-guided nuclease comprises SEQ ID NO: 2. In some cases, the nucleic acid-guided nuclease comprises SEQ ID NO: 7.
  • a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 50% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110.
  • a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 45% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 40% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 35% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 30% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110.
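The "at least" and "at most" identity thresholds above can be made concrete with a short Python sketch. It is offered as one possible reading only: the passages above do not state how identity is computed, so the definition used here (matching columns divided by the aligned columns of a pairwise alignment) and the names `percent_identity`, `meets_at_least`, and `meets_at_most` are assumptions. The sequences in the example are hypothetical and are not any SEQ ID NO.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Assumption: both inputs are rows of the same alignment (equal length,
    '-' marks a gap). Columns where both rows are gaps are ignored; a column
    with a gap in only one row counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


def meets_at_least(aligned_a, aligned_b, threshold):
    """True if identity is greater than or equal to the threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


def meets_at_most(aligned_a, aligned_b, threshold):
    """True if identity is less than or equal to the threshold (in percent)."""
    return percent_identity(aligned_a, aligned_b) <= threshold


# Hypothetical aligned fragments: 5 matching columns out of 6.
query = "ACGT-A"
reference = "ACGTTA"
print(round(percent_identity(query, reference), 1))
print(meets_at_least(query, reference, 80))
print(meets_at_most(query, reference, 50))
```

Other conventions exist (for example, dividing by the length of the shorter sequence or excluding gapped columns), and they can give different percentages for the same pair, so the convention should be fixed before a threshold is applied.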
  • a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 21-40. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 21-40.
  • a nuclease is encoded by a nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to any one of SEQ ID NO: 21-40.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to any one of SEQ ID NO: 21-40.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to any one of SEQ ID NO: 21-28 or 30-32. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to any one of SEQ ID NO: 21-28 or 30-31.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to SEQ ID NO: 22. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, sequence identity to SEQ ID NO: 27.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-40. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-28 or 30-32. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-28 or 30-31. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 22. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 27.
  • a nucleic acid-guided nuclease disclosed herein is encoded on a nucleic acid sequence.
  • a nucleic acid can be codon optimized for expression in a desired host cell.
  • Suitable host cells can include, as non-limiting examples, prokaryotic cells such as E. coli, P. aeruginosa, B. subtilis, and V. natriegens, and eukaryotic cells such as S. cerevisiae, plant cells, insect cells, nematode cells, amphibian cells, fish cells, or mammalian cells, including human cells.
  • a nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in gram positive bacteria, e.g., Bacillus subtilis, or gram negative bacteria, e.g., E. coli.
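Codon optimization changes the nucleotide sequence while leaving the encoded protein unchanged, which is why two codon-optimized versions of the same nuclease can differ considerably at the nucleic acid level. The Python sketch below illustrates this with the standard genetic code. The two short coding sequences are hypothetical examples chosen for illustration, not sequences from the disclosure, and the function names are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a DNA coding sequence (length divisible by 3) into protein."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def encode_same_protein(seq_a, seq_b):
    """True if both coding sequences translate to the same amino acid sequence."""
    return translate(seq_a) == translate(seq_b)


def nucleotide_identity(seq_a, seq_b):
    """Percent identity of two equal-length, ungapped nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    same = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * same / len(seq_a)


# Two hypothetical synonymous encodings of the peptide M-K-L-E.
variant_a = "ATGAAACTGGAA"
variant_b = "ATGAAGTTGGAG"

print(translate(variant_a), translate(variant_b))
print(encode_same_protein(variant_a, variant_b))
print(nucleotide_identity(variant_a, variant_b))
```

In this toy case the two variants encode an identical peptide while sharing only 9 of 12 nucleotide positions, which mirrors how host-specific codon-optimized sequences can encode the same protein.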
  • a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 41-60.
  • a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 41-60.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-60.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-48 or 50-52. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-48 or 50-51.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 42. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 47.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-60. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-48 or 50-52. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-48 or 50-51. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 42. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 47.
  • a nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in a species of yeast, e.g., S. cerevisiae.
  • a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 127-146.
  • a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 127-146.
  • a nuclease is encoded by a nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-146.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-146.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-134 or 136-138. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-134 or 136-137.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 128. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 133.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-146. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-134 or 136-138. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-134 or 136-137. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 128. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 133.
  • a nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in mammalian cells.
  • a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 147-166.
  • a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 147-166.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-166. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-154 or 156-158.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-154 or 156-157. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 148.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 153.
  • the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-166. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-154 or 156-158. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-154 or 156-157. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 148. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 153.
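  • As an illustration of how a percent sequence identity threshold such as those recited above might be checked, the following is a minimal sketch. It assumes a simple Needleman-Wunsch global alignment with hypothetical scoring values (match +1, mismatch -1, gap -1) and takes identity as identical columns divided by alignment length. The disclosure does not prescribe an alignment algorithm, scoring scheme, or denominator, so the function name and all parameters here are illustrative assumptions.

```python
def percent_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a global (Needleman-Wunsch) alignment of a and b.

    Assumption: identity = identical aligned columns / total alignment columns.
    Other definitions (e.g. dividing by the shorter sequence) give other values.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back one optimal alignment, counting columns and identical columns.
    i, j = n, m
    identical = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                identical += a[i - 1] == b[j - 1]
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * identical / columns if columns else 100.0


def meets_threshold(candidate: str, reference: str, threshold_pct: float) -> bool:
    """True if the candidate has at least threshold_pct identity to the reference."""
    return percent_identity(candidate, reference) >= threshold_pct
```

  • For example, under these assumptions `percent_identity("ACGT", "ACGA")` evaluates to 75.0, so the pair would satisfy a 75% threshold but not an 80% threshold.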
  • a nucleic acid sequence encoding a nucleic acid-guided nuclease can be operably linked to a promoter.
  • Such nucleic acid sequences can be linear or circular.
  • the nucleic acid sequences can be comprised on a larger linear or circular nucleic acid sequence that comprises additional elements such as an origin of replication, selectable or screenable marker, terminator, other components of a targetable nuclease system, such as a guide nucleic acid, or an editing or recorder cassette as disclosed herein.
  • These larger nucleic acid sequences can be recombinant expression vectors, as are described in more detail later.
  • a guide nucleic acid can complex with a compatible nucleic acid-guided nuclease and can hybridize with a target sequence, thereby directing the nuclease to the target sequence.
  • a subject nucleic acid-guided nuclease capable of complexing with a guide nucleic acid can be referred to as a nucleic acid-guided nuclease that is compatible with the guide nucleic acid.
  • a guide nucleic acid capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide nucleic acid that is compatible with the nucleic acid-guided nuclease.
  • a guide nucleic acid can be DNA.
  • a guide nucleic acid can be RNA.
  • a guide nucleic acid can comprise both DNA and RNA.
  • a guide nucleic acid can comprise modified or non-naturally occurring nucleotides.
  • the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
  • a guide nucleic acid can comprise a guide sequence.
  • a guide sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence.
  • the degree of complementarity between a guide sequence and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences.
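  • The degree of complementarity described above can be illustrated with a minimal sketch. It assumes DNA sequences of equal length compared without gaps, which is a simplification of "optimally aligned"; the function names and the example target are hypothetical and are not taken from the sequence listing.

```python
# Watson-Crick pairs for DNA. Assumption: both sequences are DNA, written 5'->3'.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that can base-pair with the target strand.

    The guide is compared position by position with the reverse complement
    of the target, so a fully complementary guide scores 100.0.
    """
    if len(guide) != len(target):
        raise ValueError("this sketch only handles ungapped, equal-length sequences")
    expected = reverse_complement(target)
    matches = sum(g == e for g, e in zip(guide.upper(), expected))
    return 100.0 * matches / len(guide)


target = "ATGCCGTAAGCTTGACCTGA"        # hypothetical 20-nucleotide target strand
guide = reverse_complement(target)     # fully complementary guide sequence
one_mismatch = "A" + guide[1:]         # first base changed from T to A

full = percent_complementarity(guide, target)
partial = percent_complementarity(one_mismatch, target)
```

  • In this example a single mismatch in a 20-nucleotide guide sequence lowers the computed complementarity from 100% to 95%.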
  • a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
  • a guide nucleic acid can comprise a scaffold sequence.
  • a “scaffold sequence” includes any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid comprising a scaffold sequence and a guide sequence.
  • Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are comprised or encoded on the same polynucleotide.
  • the one or two sequence regions are comprised or encoded on separate polynucleotides.
  • Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions.
  • the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
  • at least one of the two sequence regions is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • a scaffold sequence of a subject guide nucleic acid can comprise a secondary structure.
  • a secondary structure can comprise a pseudoknot region.
  • binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence.
  • binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.
  • a scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-107.
  • a scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-103.
  • a scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-91 or 93-95.
  • a scaffold sequence can comprise the sequence of any one of SEQ ID NO: 88, 93, 94, or 95.
  • a scaffold sequence can comprise the sequence of SEQ ID NO: 88.
  • a scaffold sequence can comprise the sequence of SEQ ID NO: 93.
  • a scaffold sequence can comprise the sequence of SEQ ID NO: 94.
  • a scaffold sequence can comprise the sequence of SEQ ID NO: 95.
  • the invention provides a nuclease that binds to a guide nucleic acid comprising a conserved scaffold sequence.
  • the nucleic acid-guided nucleases for use in the present disclosure can bind to a conserved pseudoknot region as shown in FIG. 13A .
  • the nucleic acid-guided nucleases for use in the present disclosure can bind to a guide nucleic acid comprising a conserved pseudoknot region as shown in FIG. 13A .
  • nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-1 (SEQ ID NO: 172).
  • Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-3 (SEQ ID NO: 173).
  • nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-4 (SEQ ID NO: 174).
  • Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-5 (SEQ ID NO: 175).
  • nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-6 (SEQ ID NO: 176). Still other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-7 (SEQ ID NO: 177).
  • nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-11 (SEQ ID NO: 180).
  • Certain nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-12 (SEQ ID NO: 181).
  • a guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-107.
  • a guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-103.
  • a guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-91 or 93-95.
  • a guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 88, 93, 94, or 95.
  • a guide nucleic acid can comprise the sequence of SEQ ID NO: 88.
  • a guide nucleic acid can comprise the sequence of SEQ ID NO: 93.
  • a guide nucleic acid can comprise the sequence of SEQ ID NO: 94.
  • a guide nucleic acid can comprise the sequence of SEQ ID NO: 95.
  • guide nucleic acid refers to one or more polynucleotides comprising 1) a guide sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein.
  • a guide nucleic acid may be provided as one or more nucleic acids.
  • the guide sequence and the scaffold sequence are provided as a single polynucleotide.
  • a guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence.
  • a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus.
  • native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
  • Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
  • Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features.
  • Common features can include sequence outside a pseudoknot region.
  • Common features can include a pseudoknot region.
  • Common features can include a primary sequence or secondary structure.
  • a guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence.
  • a guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid.
  • Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
  • a targetable nuclease system can comprise a nucleic acid-guided nuclease and a compatible guide nucleic acid.
  • a targetable nuclease system can comprise a nucleic acid-guided nuclease or a polynucleotide sequence encoding the nucleic acid-guided nuclease.
  • a targetable nuclease system can comprise a guide nucleic acid or a polynucleotide sequence encoding the guide nucleic acid.
  • a targetable nuclease system as disclosed herein is characterized by elements that promote the formation of a targetable nuclease complex at the site of a target sequence, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid.
  • a guide nucleic acid together with a nucleic acid-guided nuclease forms a targetable nuclease complex which is capable of binding to a target sequence within a target polynucleotide, as determined by the guide sequence of the guide nucleic acid.
  • a targetable nuclease complex binds to a target sequence as determined by the guide nucleic acid, and the nuclease has to recognize a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid.
  • the guide nucleic acid can comprise a scaffold sequence compatible with the nucleic acid-guided nuclease.
  • the guide nucleic acid can further comprise a guide sequence.
  • the guide sequence can be engineered to target any desired target sequence.
  • the guide sequence can be engineered to be complementary to any desired target sequence.
  • the guide sequence can be engineered to hybridize to any desired target sequence.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-20 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-107.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-12 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-95.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-11 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-91 or 93-95.
  • the guide nucleic acid can further comprise a guide sequence.
  • the guide sequence can be engineered to target any desired target sequence.
  • the guide sequence can be engineered to be complementary to any desired target sequence.
  • the guide sequence can be engineered to hybridize to any desired target sequence.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 88, 93, 94, or 95.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 88.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 93.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 94.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 95.
  • the guide nucleic acid can further comprise a guide sequence.
  • the guide sequence can be engineered to target any desired target sequence.
  • the guide sequence can be engineered to be complementary to any desired target sequence.
  • the guide sequence can be engineered to hybridize to any desired target sequence.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 88, 93, 94, or 95.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 88.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 93.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 94.
  • a targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 95.
  • the guide nucleic acid can further comprise a guide sequence.
  • the guide sequence can be engineered to target any desired target sequence.
  • the guide sequence can be engineered to be complementary to any desired target sequence.
  • the guide sequence can be engineered to hybridize to any desired target sequence.
  • a target sequence of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro.
  • the target sequence can be a polynucleotide residing in the nucleus of the eukaryotic cell.
  • a target sequence can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or a junk DNA).
  • PAMs are typically 2-5 base pair sequences adjacent to the target sequence. Examples of PAM sequences are given in the examples section below, and the skilled person will be able to identify further PAM sequences for use with a given nucleic acid-guided nuclease. Further, engineering of the PAM Interacting (PI) domain may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of a nucleic acid-guided nuclease genome engineering platform. Nucleic acid-guided nucleases may be engineered to alter their PAM specificity, for example as described in Kleinstiver B P et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul. 23; 523 (7561): 481-5. doi: 10.1038/nature14592.
  • a PAM site is a nucleotide sequence in proximity to a target sequence. In most cases, a nucleic acid-guided nuclease can only cleave a target sequence if an appropriate PAM is present. PAMs are nucleic acid-guided nuclease-specific and can be different between two different nucleic acid-guided nucleases. A PAM can be 5′ or 3′ of a target sequence. A PAM can be upstream or downstream of a target sequence. A PAM can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides in length. Often, a PAM is between 2-6 nucleotides in length.
  • a PAM can be provided on a separate oligonucleotide.
  • providing a PAM on a separate oligonucleotide allows cleavage of a target sequence that otherwise could not be cleaved because no adjacent PAM is present on the same polynucleotide as the target sequence.
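  • The relationship between a PAM and its adjacent target sequence can be sketched as a simple motif scan. The following minimal example assumes a single DNA strand, a PAM written with IUPAC degenerate codes, and a fixed guide length; the PAM motifs used ("TTN" on the 5′ side, "NGG" on the 3′ side) are hypothetical placeholders and are not asserted to be the PAM of any nuclease disclosed herein.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "K": "[GT]",
    "M": "[AC]", "V": "[ACG]", "H": "[ACT]", "D": "[AGT]", "B": "[CGT]",
}


def find_targets(dna: str, pam: str, guide_len: int = 20, pam_side: str = "5prime"):
    """Return (pam_start, pam_sequence, target_sequence) tuples on one strand.

    pam_side="5prime": the PAM immediately precedes the target.
    pam_side="3prime": the PAM immediately follows the target.
    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pam_re = "".join(IUPAC[code] for code in pam.upper())
    target_re = f"[ACGT]{{{guide_len}}}"
    dna = dna.upper()
    hits = []
    if pam_side == "5prime":
        for m in re.finditer(f"(?=({pam_re})({target_re}))", dna):
            hits.append((m.start(), m.group(1), m.group(2)))
    elif pam_side == "3prime":
        for m in re.finditer(f"(?=({target_re})({pam_re}))", dna):
            hits.append((m.start() + guide_len, m.group(2), m.group(1)))
    else:
        raise ValueError("pam_side must be '5prime' or '3prime'")
    return hits


# Hypothetical inputs for illustration only.
five_prime_hits = find_targets("GGCATTGACGTACGTAGC", "TTN", guide_len=5, pam_side="5prime")
three_prime_hits = find_targets("ACGTAAGGTT", "NGG", guide_len=5, pam_side="3prime")
```

  • A complete search would also scan the reverse complement strand; that step is omitted here to keep the sketch short.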
  • Polynucleotide sequences encoding a component of a targetable nuclease system can comprise one or more vectors.
  • the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked.
  • Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
  • one type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques.
  • another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses).
  • Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors).
  • Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome.
  • certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors.”
  • Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. Further discussion of vectors is provided herein.
  • Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed.
  • “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).
  • a regulatory element is operably linked to one or more elements of a targetable nuclease system so as to drive expression of the one or more components of the targetable nuclease system.
  • a vector comprises a regulatory element operably linked to a polynucleotide sequence encoding a nucleic acid-guided nuclease.
  • the polynucleotide sequence encoding the nucleic acid-guided nuclease can be codon optimized for expression in particular cells, such as prokaryotic or eukaryotic cells.
  • Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells.
  • Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammal including non-human primate.
  • codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence.
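  • codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which in turn is believed to depend on, among other things, the availability of particular transfer RNA (tRNA) molecules.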
  • Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ (visited Jul. 9, 2002), and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000).
  • Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available.
  • one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding an engineered nuclease correspond to the most frequently used codon for a particular amino acid.
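  • The codon optimization process described above, in which codons are replaced with the most frequently used synonymous codon while the amino acid sequence is maintained, can be sketched as follows. The usage frequencies in `CODON_USAGE` are made-up placeholders covering only four residues; they are not taken from the Codon Usage Database or any other real table, and a practical implementation would load a full host-specific table.

```python
# Codon -> amino acid assignments follow the standard genetic code.
# The frequencies are ILLUSTRATIVE PLACEHOLDERS, not real host data.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.58, "AAG": 0.42},
    "L": {"TTG": 0.29, "TTA": 0.28, "CTA": 0.14, "CTT": 0.13, "CTG": 0.11, "CTC": 0.05},
    "*": {"TAA": 0.47, "TGA": 0.30, "TAG": 0.23},
}

# Reverse lookup: codon -> amino acid (single-letter code, "*" for stop).
CODON_TO_AA = {codon: aa for aa, table in CODON_USAGE.items() for codon in table}


def split_codons(cds: str):
    """Split a coding sequence into codons, checking the reading frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence using the small table above."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the most frequent synonymous codon."""
    optimized = []
    for codon in split_codons(cds):
        synonyms = CODON_USAGE[CODON_TO_AA[codon]]
        optimized.append(max(synonyms, key=synonyms.get))
    return "".join(optimized)


native = "ATGAAGCTCTAG"              # hypothetical sequence encoding M-K-L-stop
optimized = codon_optimize(native)
changed = sum(a != b for a, b in zip(split_codons(native), split_codons(optimized)))
```

  • In this toy case three of the four codons are replaced, and translating either sequence yields the same amino acid sequence, which is the invariant that codon optimization must preserve.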
  • a vector encodes a nucleic acid-guided nuclease comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs.
  • the engineered nuclease comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. one or more NLS at the amino-terminus and one or more NLS at the carboxy terminus).
  • the engineered nuclease comprises at most 6 NLSs.
  • an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus.
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 111); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 112)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 113) or RQRRNELKRSP (SEQ ID NO: 114); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 115); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 116) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 117) and PPKKARED (SEQ ID NO: 118) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 119) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 120) of mouse c-abl IV; the
  • the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell.
  • strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors.
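  • The criterion that an NLS is "near" a terminus when its nearest amino acid lies within a given number of residues of the N- or C-terminus can be sketched as a simple position check. The function name, the default window of 50 residues, and the poly-alanine filler proteins below are hypothetical; only the SV40 NLS sequence PKKKRKV (SEQ ID NO: 111) comes from the text.

```python
def nls_near_terminus(protein: str, nls: str, window: int = 50) -> dict:
    """Report whether any occurrence of the NLS lies near each terminus.

    An occurrence counts as near the N-terminus when at most `window`
    residues precede its first residue, and as near the C-terminus when
    at most `window` residues follow its last residue.
    """
    near = {"N": False, "C": False}
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)               # index just past the NLS
        if start <= window:
            near["N"] = True
        if len(protein) - end <= window:
            near["C"] = True
        start = protein.find(nls, start + 1)  # look for further occurrences
    return near


SV40_NLS = "PKKKRKV"                          # SEQ ID NO: 111

# Hypothetical test proteins: a start methionine, filler residues, and the NLS.
n_terminal_fusion = "M" + SV40_NLS + "A" * 100
c_terminal_fusion = "M" + "A" * 100 + SV40_NLS

n_result = nls_near_terminus(n_terminal_fusion, SV40_NLS)
c_result = nls_near_terminus(c_terminal_fusion, SV40_NLS)
```

  • Proximity to a terminus is only a positional check; whether a given NLS arrangement actually drives detectable nuclear accumulation still has to be determined experimentally.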
  • Detection of accumulation in the nucleus may be performed by any suitable technique.
  • a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI).
  • Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
  • a nucleic acid-guided nuclease and one or more guide nucleic acids can be delivered either as DNA or RNA. Delivery of a nucleic acid-guided nuclease and guide nucleic acid both as RNA (unmodified or containing base or backbone modifications) molecules can be used to reduce the amount of time that the nucleic acid-guided nuclease persists in the cell. This may reduce the level of off-target cleavage activity in the target cell.
  • because a nucleic acid-guided nuclease delivered as mRNA takes time to be translated into protein, it might be advantageous to deliver the guide nucleic acid several hours following the delivery of the nucleic acid-guided nuclease mRNA, to maximize the level of guide nucleic acid available for interaction with the nucleic acid-guided nuclease protein.
  • the nucleic acid-guided nuclease mRNA and guide nucleic acid are delivered concomitantly.
  • the guide nucleic acid is delivered sequentially, such as 0.5, 1, 2, 3, 4, or more hours after the nucleic acid-guided nuclease mRNA.
  • a nucleic acid-guided nuclease can be delivered as mRNA and the guide nucleic acid in the form of a DNA expression cassette with a promoter driving the expression of the guide nucleic acid. This way the amount of guide nucleic acid available will be amplified via transcription.
  • Guide nucleic acid in the form of RNA or encoded on a DNA expression cassette can be introduced into a host cell comprising a nucleic acid-guided nuclease encoded on a vector or chromosome.
  • the guide nucleic acid may be provided in the cassette as one or more polynucleotides, which may be contiguous or non-contiguous in the cassette. In specific embodiments, the guide nucleic acid is provided in the cassette as a single contiguous polynucleotide.
  • a variety of delivery systems can be used to introduce a nucleic acid-guided nuclease (DNA or RNA) and guide nucleic acid (DNA or RNA) into a host cell.
  • these include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires (Shalek et al., Nano Letters, 2012), and exosomes.
  • Molecular Trojan horse liposomes may be used to deliver an engineered nuclease and guide nucleic acid across the blood brain barrier.
  • an editing template is also provided.
  • an editing template may be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide.
  • an editing template is on the same polynucleotide as a guide nucleic acid.
  • an editing template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-guided nuclease as a part of a complex as disclosed herein.
  • an editing template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length.
  • the editing template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence.
  • an editing template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g. about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides).
  • the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
  • an editing template comprises at least one mutation compared to the target sequence.
  • An editing template can comprise an insertion, deletion, modification, or any combination thereof compared to the target sequence. Examples of some editing templates are described in more detail in a later section.
  • the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors or linear polynucleotides as described herein, one or more transcripts thereof, and/or one or more proteins expressed therefrom, to a host cell.
  • the invention further provides cells produced by such methods, and organisms comprising or produced from such cells.
  • an engineered nuclease in combination with (and optionally complexed with) a guide nucleic acid is delivered to a cell.
  • Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome.
  • Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
  • Methods of non-viral delivery of nucleic acids include lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
  • Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386, 4,946,787, and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™).
  • Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).
  • the preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
  • RNA or DNA viral based systems for the delivery of nucleic acids take advantage of highly evolved processes for targeting a virus to specific cells in culture or in the host and trafficking the viral payload to the nucleus or host cell genome.
  • Viral vectors can be administered directly to cells in culture, patients (in vivo), or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo).
  • Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
  • Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression.
  • Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700).
  • adenoviral based systems may be used.
  • Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system.
  • Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No.
  • a host cell is transiently or non-transiently transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein.
  • a cell is transfected as it naturally occurs in a subject.
  • a cell that is transfected is taken from a subject.
  • the cell is derived from cells taken from a subject, such as a cell line.
  • a cell transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein is used to establish a new cell line comprising one or more transfection-derived sequences.
  • a cell transiently transfected with the components of an engineered nucleic acid-guided nuclease system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of an engineered nuclease complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
  • one or more vectors described herein are used to produce a non-human transgenic cell, organism, animal, or plant.
  • the transgenic animal is a mammal, such as a mouse, rat, or rabbit.
  • Methods for producing transgenic cells, organisms, plants, and animals are known in the art, and generally begin with a method of cell transformation or transfection, such as described herein.
  • target sequence refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of an engineered nuclease complex.
  • a target sequence may comprise any polynucleotide, such as DNA, RNA, or a DNA-RNA hybrid.
  • a target sequence can be located in the nucleus or cytoplasm of a cell.
  • a target sequence can be located in vitro or in a cell-free environment.
  • one or more vectors driving expression of one or more components of a targetable nuclease system are introduced into a host cell or in vitro such that a targetable nuclease complex forms at one or more target sites.
  • a nucleic acid-guided nuclease and a guide nucleic acid could each be operably linked to separate regulatory elements on separate vectors.
  • two or more of the elements expressed from the same or different regulatory elements may be combined in a single vector, with one or more additional vectors providing any components of the targetable nuclease system not included in the first vector.
  • Targetable nuclease system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element.
  • the coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction.
  • a single promoter drives expression of a transcript encoding a nucleic acid-guided nuclease and one or more guide nucleic acids.
  • a nucleic acid-guided nuclease and one or more guide nucleic acids are operably linked to and expressed from the same promoter.
  • one or more guide nucleic acids or polynucleotides encoding the one or more guide nucleic acids are introduced into a cell or in vitro environment already comprising a nucleic acid-guided nuclease or polynucleotide sequence encoding the nucleic acid-guided nuclease.
  • a single expression construct may be used to target nuclease activity to multiple different, corresponding target sequences within a cell or in vitro.
  • a single vector may comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors may be provided, and optionally delivered to a cell or in vitro.
  • Methods and compositions disclosed herein may comprise more than one guide nucleic acid, wherein each guide nucleic acid has a different guide sequence, thereby targeting a different target sequence.
  • multiple guide nucleic acids can be used in multiplexing, wherein multiple targets are targeted simultaneously.
  • the multiple guide nucleic acids are introduced into a population of cells, such that each cell in the population receives a different or random guide nucleic acid, thereby targeting multiple different target sequences across a population of cells.
  • the collection of subsequently altered cells can be referred to as a library.
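As a rough illustration of the library idea described above, the following Python sketch assigns one randomly chosen guide sequence to each cell in a simulated population and tallies the result. The guide sequences, the `assign_guides` helper, the population size, and the random seed are all hypothetical placeholders and are not taken from this disclosure.

```python
import random
from collections import Counter

# Hypothetical guide sequences (placeholders, not from the disclosure).
GUIDES = [
    "ACGTACGTACGTACGTACGT",
    "TTGACCGGTAACCGTTAGGC",
    "GGCATCGATCGGATTACCGA",
]

def assign_guides(n_cells, guides, seed=0):
    """Give each simulated cell one randomly chosen guide sequence."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choice(guides) for _ in range(n_cells)]

# The list of per-cell guides stands in for the resulting "library".
library = assign_guides(1000, GUIDES)
counts = Counter(library)  # number of simulated cells carrying each guide
```

Each entry of `library` corresponds to one cell, so `counts` gives a simple view of how evenly the different target sequences are represented across the population.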
  • Methods and compositions disclosed herein may comprise multiple different nucleic acid-guided nucleases, each with one or more different corresponding guide nucleic acids, thereby allowing targeting of different target sequences by different nucleic acid-guided nucleases.
  • each nucleic acid-guided nuclease can correspond to a distinct plurality of guide nucleic acids, allowing two or more non-overlapping, partially overlapping, or completely overlapping multiplexing events.
  • the nucleic acid-guided nuclease has DNA cleavage activity or RNA cleavage activity. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
  • a nucleic acid-guided nuclease may form a component of an inducible system.
  • the inducible nature of the system would allow for spatiotemporal control of gene editing or gene expression using a form of energy.
  • the form of energy may include but is not limited to electromagnetic radiation, sound energy, chemical energy, light energy, temperature, and thermal energy.
  • Examples of inducible systems include tetracycline inducible promoters (Tet-On or Tet-Off), small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), or light inducible systems (phytochrome, LOV domains, or cryptochrome).
  • the nucleic acid-guided nuclease may be a part of a Light Inducible Transcriptional Effector (LITE) to direct changes in transcriptional activity in a sequence-specific manner.
  • the components of a light inducible system may include a nucleic acid-guided nuclease, a light-responsive cytochrome heterodimer (e.g. from Arabidopsis thaliana), and a transcriptional activation/repression domain.
  • the invention provides for methods of modifying a target sequence in vitro, or in a prokaryotic or eukaryotic cell, which may be in vivo, ex vivo, or in vitro.
  • the method comprises sampling a cell or population of cells such as prokaryotic cells, or those from a human or non-human animal or plant (including micro-algae), and modifying the cell or cells. Culturing may occur at any stage in vitro or ex vivo.
  • the cell or cells may even be re-introduced into the host, such as a non-human animal or plant (including micro-algae). For re-introduced cells it is particularly preferred that the cells are stem cells.
  • the method comprises allowing a targetable nuclease complex to bind to the target sequence to effect cleavage of said target sequence, thereby modifying the target sequence, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease complexed with a guide nucleic acid wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within a target polynucleotide.
  • the invention provides a method of modifying expression of a target polynucleotide in vitro or in a prokaryotic or eukaryotic cell.
  • the method comprises allowing a targetable nuclease complex to bind to a target sequence within the target polynucleotide such that said binding results in increased or decreased expression of said target polynucleotide; wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease complexed with a guide nucleic acid, and wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within said target polynucleotide.
  • Similar considerations apply as above for methods of modifying a target polynucleotide. In fact, these sampling, culturing and re-introduction options apply across the aspects of the present invention.
  • kits containing any one or more of the elements disclosed in the above methods and compositions. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. In some embodiments, the kit includes instructions in one or more languages, for example in more than one language.
  • a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein.
  • Reagents may be provided in any suitable container.
  • a kit may provide one or more reaction or storage buffers.
  • Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form).
  • a buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof.
  • the buffer is alkaline.
  • the invention provides methods for using one or more elements of an engineered targetable nuclease system.
  • a targetable nuclease complex of the disclosure provides an effective means for modifying a target sequence within a target polynucleotide.
  • a targetable nuclease complex of the disclosure has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target sequence in a multiplicity of cell types.
  • a targetable nuclease complex of the invention has a broad spectrum of applications in, e.g., biochemical pathway optimization, genome-wide studies, genome engineering, gene therapy, drug screening, disease diagnosis, and prognosis.
  • the method comprises cleaving a target polynucleotide using a targetable nuclease complex that binds to a target sequence within a target polynucleotide and effects cleavage of said target polynucleotide.
  • the targetable nuclease complex of the invention, when introduced into a cell, creates a break (e.g., a single or a double strand break) in the target sequence.
  • the method can be used to cleave a target gene in a cell, or to replace a wildtype sequence with a modified sequence.
  • the break created by the targetable nuclease complex can be repaired by a repair process such as the error prone non-homologous end joining (NHEJ) pathway, the high fidelity homology-directed repair (HDR), or by recombination pathways.
  • an editing template can be introduced into the genome sequence.
  • the HDR or recombination process is used to modify a target sequence.
  • an editing template comprising a sequence to be integrated flanked by an upstream sequence and a downstream sequence is introduced into a cell.
  • the upstream and downstream sequences share sequence similarity with either side of the site of integration in the chromosome, target vector, or target polynucleotide.
  • An editing template can be DNA or RNA, e.g., a DNA plasmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), a viral vector, a linear piece of DNA, a PCR fragment, oligonucleotide, synthetic polynucleotide, a naked nucleic acid, or a nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer.
  • An editing template polynucleotide can comprise a sequence to be integrated (e.g., a mutated gene).
  • a sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The sequence to be integrated may be a mutated form or variant of an endogenous wildtype sequence. Alternatively, the sequence to be integrated may be a wildtype version of an endogenous mutated sequence. Additionally or alternatively, the sequence to be integrated may be a variant or mutated form of an endogenous mutated or variant sequence.
  • Upstream and downstream sequences in an editing template polynucleotide can be selected to promote recombination between the target polynucleotide of interest and the editing template polynucleotide.
  • the upstream sequence can be a nucleic acid sequence having sequence similarity with the sequence upstream of the targeted site for integration.
  • the downstream sequence can be a nucleic acid sequence having similarity with the sequence downstream of the targeted site of integration.
  • the upstream and downstream sequences in an editing template can have 75%, 80%, 85%, 90%, 95%, or 100% sequence identity with the targeted polynucleotide.
  • the upstream and downstream sequences in the editing template polynucleotide have about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the targeted polynucleotide. In some methods, the upstream and downstream sequences in the editing template polynucleotide have about 99% or 100% sequence identity with the targeted polynucleotide.
  • An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp.
  • the exemplary upstream or downstream sequence has about 15 bp to about 50 bp, about 30 bp to about 100 bp, about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
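The arrangement described above, a sequence to be integrated flanked by upstream and downstream sequences that share identity with the target, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `build_editing_template` and `percent_identity` helpers, the 200-nucleotide placeholder target, the 6-nucleotide insert, and the 50-nucleotide arm length are all hypothetical, and identity is computed position by position without any alignment.

```python
def build_editing_template(target, site, insert, arm_len=50):
    """Return upstream arm + insert + downstream arm around an integration site."""
    if not (arm_len <= site <= len(target) - arm_len):
        raise ValueError("integration site too close to an end of the target")
    upstream = target[site - arm_len:site]      # copied from 5' of the site
    downstream = target[site:site + arm_len]    # copied from 3' of the site
    return upstream + insert + downstream

def percent_identity(a, b):
    """Percent of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Placeholder target and insert (hypothetical values).
target = "ACGT" * 50
template = build_editing_template(target, site=100, insert="GGATCC", arm_len=50)

# Identity of the template's upstream arm with the sequence upstream of the site.
upstream_identity = percent_identity(template[:50], target[50:100])
```

Because the arms here are copied directly from the target, their identity is 100%; arms with lower identity, within the ranges listed above, could be modeled by introducing mismatches before the comparison.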
  • the editing template polynucleotide may further comprise a marker.
  • a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers.
  • the exogenous polynucleotide template of the invention can be constructed using recombinant techniques (see, for example, Green and Sambrook et al., 2014 and Ausubel et al., 2017).
  • When a double-stranded break is introduced into the genome sequence by an engineered nuclease complex, the break can be repaired via homologous recombination using an editing template such that the template is integrated into the target polynucleotide.
  • the presence of a double-stranded break can increase the efficiency of integration of the editing template.
  • Some methods comprise increasing or decreasing expression of a target polynucleotide by using a targetable nuclease complex that binds to the target polynucleotide.
  • a target polynucleotide can be inactivated to effect the modification of the expression in a cell. For example, upon the binding of a targetable nuclease complex to a target sequence in a cell, the target polynucleotide is inactivated such that the sequence is not transcribed, the coded protein is not produced, or the sequence does not function as the wild-type sequence does. For example, a protein or microRNA coding sequence may be inactivated such that the protein is not produced.
  • a control sequence can be inactivated such that it no longer functions as a regulatory sequence.
  • regulatory sequence can refer to any nucleic acid sequence that affects the transcription, translation, or accessibility of a nucleic acid sequence. Examples of regulatory sequences include a promoter, a transcription terminator, and an enhancer.
  • An inactivated target sequence may include a deletion mutation (i.e., deletion of one or more nucleotides), an insertion mutation (i.e., insertion of one or more nucleotides), or a nonsense mutation (i.e., substitution of a single nucleotide for another nucleotide such that a stop codon is introduced).
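The three kinds of inactivating mutation listed above can be illustrated on a short coding sequence. The following Python sketch is hypothetical throughout: the 18-nucleotide `wild_type` sequence and the helper functions are placeholders chosen for illustration, and only the standard stop codons TAA, TAG, and TGA are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def delete(seq, pos, n=1):
    """Deletion mutation: remove n nucleotides starting at pos."""
    return seq[:pos] + seq[pos + n:]

def insert(seq, pos, bases):
    """Insertion mutation: add bases before position pos."""
    return seq[:pos] + bases + seq[pos:]

def substitute(seq, pos, base):
    """Point substitution: replace the nucleotide at pos."""
    return seq[:pos] + base + seq[pos + 1:]

# Placeholder coding sequence: ATG CAG GCT TGG AAA TAA
wild_type = "ATGCAGGCTTGGAAATAA"

nonsense = substitute(wild_type, 3, "T")   # CAG -> TAG introduces a stop codon
deletion = delete(wild_type, 3)            # one-nucleotide deletion shifts the frame
insertion = insert(wild_type, 3, "T")      # one-nucleotide insertion shifts the frame
```

In this toy case the nonsense substitution moves the first in-frame stop from the sixth codon to the second, while the single-nucleotide deletion and insertion shift the reading frame so that the original stop codon is no longer read in frame.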
  • An altered expression of one or more target polynucleotides associated with a signaling biochemical pathway can be determined by assaying for a difference in the mRNA levels of the corresponding genes between the test model cell and a control cell, when they are contacted with a candidate agent.
  • the differential expression of the sequences associated with a signaling biochemical pathway is determined by detecting a difference in the level of the encoded polypeptide or gene product.
  • nucleic acid contained in a sample is first extracted according to standard methods in the art.
  • mRNA can be isolated using various lytic enzymes or chemical solutions according to the procedures set forth in Green and Sambrook (2014), or extracted by nucleic-acid-binding resins following the accompanying instructions provided by the manufacturers.
  • the mRNA contained in the extracted nucleic acid sample is then detected by amplification procedures or conventional hybridization assays (e.g. Northern blot analysis) according to methods widely known in the art or based on the methods exemplified herein.
  • amplification means any method employing a primer and a polymerase capable of replicating a target sequence with reasonable fidelity.
  • Amplification may be carried out by natural or recombinant DNA polymerases such as TaqGold™, T7 DNA polymerase, Klenow fragment of E. coli DNA polymerase, and reverse transcriptase.
  • a preferred amplification method is PCR.
  • the isolated RNA can be subjected to a reverse transcription assay that is coupled with a quantitative polymerase chain reaction (RT-PCR) in order to quantify the expression level of a sequence associated with a signaling biochemical pathway.
  • Detection of the gene expression level can be conducted in real time in an amplification assay.
  • the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules is typically proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art.
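The proportionality between dye fluorescence and amount of amplified product can be sketched as a simple calibration. The Python example below is an idealized illustration only: it assumes background-subtracted signals that are strictly proportional to DNA amount, and the standard amounts, fluorescence readings, and helper names are hypothetical values invented for the sketch.

```python
def fit_proportionality(amounts, signals):
    """Least-squares slope k for signal = k * amount (line through the origin)."""
    num = sum(a * s for a, s in zip(amounts, signals))
    den = sum(a * a for a in amounts)
    return num / den

def estimate_amount(signal, k):
    """Invert the proportionality to estimate an amount from a signal."""
    return signal / k

# Hypothetical standards: known DNA amounts (ng) and measured fluorescence (RFU).
standards_ng = [1.0, 2.0, 4.0, 8.0]
standards_rfu = [50.0, 100.0, 200.0, 400.0]

k = fit_proportionality(standards_ng, standards_rfu)
unknown_ng = estimate_amount(150.0, k)  # amount estimated for a 150 RFU sample
```

With these made-up standards the fitted constant is 50 RFU per ng, so a reading of 150 RFU corresponds to about 3 ng under the stated assumptions.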
  • DNA-binding dyes suitable for this application include SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and the like.
  • probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan™ probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art and are taught in U.S. Pat. No. 5,210,015.
  • probes are allowed to form stable complexes with the sequences associated with a signaling biochemical pathway contained within the biological sample derived from the test subject in a hybridization reaction.
  • Where antisense nucleic acid is used as the probe, the target polynucleotides provided in the sample are chosen to be complementary to sequences of the antisense nucleic acids.
  • the target polynucleotide is selected to be complementary to sequences of the sense nucleic acid.
  • Hybridization can be performed under conditions of various stringency, for instance as described herein. Suitable hybridization conditions for the practice of the present invention are such that the recognition interaction between the probe and sequences associated with a signaling biochemical pathway is both sufficiently specific and sufficiently stable. Conditions that increase the stringency of a hybridization reaction are widely known and published in the art. See, for example, Green and Sambrook (2014); Nonradioactive In Situ Hybridization Application Manual, Boehringer Mannheim, second edition.
  • the hybridization assay can be performed using probes immobilized on any solid support, including but not limited to nitrocellulose, glass, silicon, and a variety of gene arrays. A preferred hybridization assay is conducted on high-density gene chips as described in U.S. Pat. No. 5,445,934.
  • the nucleotide probes are conjugated to a detectable label.
  • Detectable labels suitable for use in the present invention include any composition detectable by photochemical, biochemical, spectroscopic, immunochemical, electrical, optical or chemical means.
  • a wide variety of appropriate detectable labels are known in the art, which include fluorescent or chemiluminescent labels, radioactive isotope labels, enzymatic or other ligands.
  • a fluorescent label or an enzyme tag such as digoxigenin, β-galactosidase, urease, alkaline phosphatase or peroxidase, avidin/biotin complex.
  • Detection methods used to detect or quantify the hybridization intensity will typically depend upon the label selected above.
  • radiolabels may be detected using photographic film or a phosphoimager.
  • Fluorescent markers may be detected and quantified using a photodetector to detect emitted light.
  • Enzymatic labels are typically detected by providing the enzyme with a substrate and measuring the reaction product produced by the action of the enzyme on the substrate; and finally colorimetric labels are detected by simply visualizing the colored label.
  • An agent-induced change in expression of sequences associated with a signaling biochemical pathway can also be determined by examining the corresponding gene products. Determining the protein level typically involves (a) contacting the protein contained in a biological sample with an agent that specifically binds to a protein associated with a signaling biochemical pathway; and (b) identifying any agent:protein complex so formed.
  • the agent that specifically binds a protein associated with a signaling biochemical pathway is an antibody, preferably a monoclonal antibody.
  • the reaction can be performed by contacting the agent with a sample of the proteins associated with a signaling biochemical pathway derived from the test samples under conditions that will allow a complex to form between the agent and the proteins associated with a signaling biochemical pathway.
  • the formation of the complex can be detected directly or indirectly according to standard procedures in the art.
  • the agents are supplied with a detectable label and unreacted agents may be removed from the complex; the amount of remaining label thereby indicating the amount of complex formed.
  • an indirect detection procedure may use an agent that contains a label introduced either chemically or enzymatically.
  • a desirable label generally does not interfere with binding or the stability of the resulting agent:polypeptide complex.
  • the label is typically designed to be accessible to an antibody for an effective binding and hence generating a detectable signal.
  • labels suitable for detecting protein levels are known in the art.
  • Non-limiting examples include radioisotopes, enzymes, colloidal metals, fluorescent compounds, bioluminescent compounds, and chemiluminescent compounds.
  • agent:polypeptide complexes formed during the binding reaction can be quantified by standard quantitative assays. As illustrated above, the formation of agent:polypeptide complex can be measured directly by the amount of label remaining at the site of binding.
  • the protein associated with a signaling biochemical pathway is tested for its ability to compete with a labeled analog for binding sites on the specific agent. In this competitive assay, the amount of label captured is inversely proportional to the amount of protein sequences associated with a signaling biochemical pathway present in a test sample.
  • a number of techniques for protein analysis based on the general principles outlined above are available in the art. They include but are not limited to radioimmunoassays, ELISA (enzyme-linked immunosorbent assays), “sandwich” immunoassays, immunoradiometric assays, in situ immunoassays (using e.g., colloidal gold, enzyme or radioisotope labels), western blot analysis, immunoprecipitation assays, immunofluorescent assays, and SDS-PAGE.
  • Antibodies that specifically recognize or bind to proteins associated with a signaling biochemical pathway are preferable for conducting the aforementioned protein analyses.
  • antibodies that recognize a specific type of post-translational modification (e.g., signaling biochemical pathway inducible modifications)
  • Post-translational modifications include but are not limited to glycosylation, lipidation, acetylation, and phosphorylation. These antibodies may be purchased from commercial vendors.
  • anti-phosphotyrosine antibodies that specifically recognize tyrosine-phosphorylated proteins are available from a number of vendors including Invitrogen and Perkin Elmer.
  • Anti-phosphotyrosine antibodies are particularly useful in detecting proteins that are differentially phosphorylated on their tyrosine residues in response to an ER stress.
  • proteins include but are not limited to eukaryotic translation initiation factor 2 alpha (eIF-2α).
  • these antibodies can be generated using conventional polyclonal or monoclonal antibody technologies by immunizing a host animal or an antibody-producing cell with a target protein that exhibits the desired post-translational modification.
  • tissue-specific, cell-specific or subcellular structure specific antibodies capable of binding to protein markers that are preferentially expressed in certain tissues, cell types, or subcellular structures.
  • An altered expression of a gene associated with a signaling biochemical pathway can also be determined by examining a change in activity of the gene product relative to a control cell.
  • the assay for an agent-induced change in the activity of a protein associated with a signaling biochemical pathway will depend on the biological activity and/or the signal transduction pathway that is under investigation.
  • a change in its ability to phosphorylate the downstream substrate(s) can be determined by a variety of assays known in the art. Representative assays include but are not limited to immunoblotting and immunoprecipitation with antibodies such as anti-phosphotyrosine antibodies that recognize phosphorylated proteins.
  • kinase activity can be detected by high throughput chemiluminescent assays such as AlphaScreen™ (available from Perkin Elmer) and eTag™ assay (Chan-Hui, et al. (2003) Clinical Immunology 111: 162-174).
  • pH sensitive molecules such as fluorescent pH dyes can be used as the reporter molecules.
  • the protein associated with a signaling biochemical pathway is an ion channel
  • fluctuations in membrane potential and/or intracellular ion concentration can be monitored.
  • Representative instruments include FLIPR™ (Molecular Devices, Inc.) and VIPR (Aurora Biosciences). These instruments are capable of detecting reactions in over 1000 sample wells of a microplate simultaneously, and providing real-time measurement and functional data within a second or even a millisecond.
  • a suitable vector can be introduced to a cell, tissue, organism, or an embryo via one or more methods known in the art, including without limitation, microinjection, electroporation, sonoporation, biolistics, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, nucleofection transfection, magnetofection, lipofection, impalefection, optical transfection, proprietary agent-enhanced uptake of nucleic acids, and delivery via liposomes, immunoliposomes, virosomes, or artificial virions.
  • the vector is introduced into an embryo by microinjection.
  • the vector or vectors may be microinjected into the nucleus or the cytoplasm of the embryo.
  • the vector or vectors may be introduced into a cell by nucleofection.
  • a target polynucleotide of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to the host cell.
  • the target polynucleotide can be a polynucleotide residing in the nucleus of the eukaryotic cell, the genome of a prokaryotic cell, or an extrachromosomal vector of a host cell.
  • the target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
  • target polynucleotides include a sequence associated with a signaling biochemical pathway, e.g., a signaling biochemical pathway-associated gene or polynucleotide.
  • target polynucleotides include a disease associated gene or polynucleotide.
  • a “disease-associated” gene or polynucleotide refers to any gene or polynucleotide which is yielding transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues compared with tissues or cells of a non-disease control.
  • a disease-associated gene also refers to a gene possessing mutation(s) or genetic variation that is directly responsible for, or is in linkage disequilibrium with a gene(s) that is responsible for, the etiology of a disease.
  • the transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.
  • Embodiments of the invention also relate to methods and compositions related to knocking out genes, editing genes, altering genes, amplifying genes, and repairing particular mutations.
  • Altering genes may also mean the epigenetic manipulation of a target sequence. This may be the chromatin state of a target sequence, such as by modification of the methylation state of the target sequence (i.e. addition or removal of methylation or methylation patterns or CpG islands), histone modification, increasing or reducing accessibility to the target sequence, or by promoting 3D folding.
  • a targetable nuclease complex can be assessed by any suitable assay.
  • the components of a targetable nuclease system sufficient to form a targetable nuclease complex can be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the engineered nuclease system, followed by an assessment of preferential cleavage within the target sequence.
  • cleavage of a target sequence may be evaluated in a test tube by providing the target sequence and components of a targetable nuclease complex.
  • Other assays are possible, and will occur to those skilled in the art.
  • a guide sequence can be selected to target any target sequence.
  • the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome.
  • compositions and methods for editing a target polynucleotide sequence include polynucleotides containing one or more components of targetable nuclease system.
  • Polynucleotide sequences for use in these methods can be referred to as editing cassettes.
  • An editing cassette can comprise one or more primer sites.
  • Primer sites can be used to amplify an editing cassette by using oligonucleotide primers comprising reverse complementary sequences that can hybridize to the one or more primer sites.
  • An editing cassette can comprise two or more primer sites. Sometimes, an editing cassette comprises a primer site on each end of the editing cassette, said primer sites flanking one or more of the other components of the editing cassette. Primer sites can be approximately 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or more nucleotides in length.
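As a minimal sketch of how primer sites on each end of a cassette relate to the amplification primers, the Python below takes a forward primer from the 5′ primer site and a reverse primer as the reverse complement of the 3′ primer site. The function names, the default 20-nucleotide site length, and any cassette sequence passed in are illustrative assumptions, not sequences or parameters from this disclosure.

```python
# Illustrative sketch only: names and the 20-nt default are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_for_cassette(cassette, site_len=20):
    """Derive a primer pair from the primer sites at each end of a cassette.

    The forward primer matches the 5' primer site of the given strand; the
    reverse primer is the reverse complement of the 3' primer site, so that
    it can hybridize to the given strand.
    """
    if len(cassette) < 2 * site_len:
        raise ValueError("cassette is shorter than two primer sites")
    forward = cassette[:site_len]
    reverse = reverse_complement(cassette[-site_len:])
    return forward, reverse
```

The site length could be set to any of the lengths listed above (for example 18 or 25 nucleotides) through the `site_len` argument.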
  • An editing cassette can comprise an editing template as disclosed herein.
  • An editing cassette can comprise an editing sequence.
  • An editing sequence can be homologous to a target sequence.
  • An editing sequence can comprise at least one mutation relative to a target sequence.
  • An editing sequence often comprises homology regions (or homology arms) flanking at least one mutation relative to a target sequence, such that the flanking homology regions facilitate homologous recombination of the editing sequence into a target sequence.
  • An editing sequence can comprise an editing template as disclosed herein.
  • the editing sequence can comprise at least one mutation relative to a target sequence including one or more PAM mutations that mutate or delete a PAM site.
  • An editing sequence can comprise one or more mutations in a codon or non-coding sequence relative to a non-editing target site.
  • a PAM mutation can be a silent mutation.
  • a silent mutation can be a change to at least one nucleotide of a codon relative to the original codon that does not change the amino acid encoded by the original codon.
  • a silent mutation can be a change to a nucleotide within a non-coding region, such as an intron, 5′ untranslated region, 3′ untranslated region, or other non-coding region.
  • a PAM mutation can be a non-silent mutation.
  • Non-silent mutations can include a missense mutation.
  • a missense mutation can be a change to at least one nucleotide of a codon relative to the original codon that changes the amino acid encoded by the original codon. Missense mutations can occur within an exon, open reading frame, or other coding region.
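The distinction between silent and missense codon changes can be illustrated with a short Python sketch that looks up both codons in the standard genetic code. The function name and the extra "nonsense" category (a change to a stop codon) are assumptions added for illustration; the text above names only silent and missense mutations.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(original, edited):
    """Classify an edited codon relative to the original codon."""
    original, edited = original.upper(), edited.upper()
    if original == edited:
        return "unchanged"
    if CODON_TABLE[original] == CODON_TABLE[edited]:
        return "silent"      # same amino acid encoded
    if CODON_TABLE[edited] == "*":
        return "nonsense"    # hypothetical extra category: new stop codon
    return "missense"        # different amino acid encoded
```

For example, changing CTG to CTA keeps leucine and is classified as silent, whereas changing GAA to GTA replaces glutamate with valine and is classified as missense.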
  • An editing sequence can comprise at least one mutation relative to a target sequence.
  • a mutation can be a silent mutation or non-silent mutation, such as a missense mutation.
  • a mutation can include an insertion of one or more nucleotides or base pairs.
  • a mutation can include a deletion of one or more nucleotides or base pairs.
  • a mutation can include a substitution of one or more nucleotides or base pairs for a different one or more nucleotides or base pairs. Inserted or substituted sequences can include exogenous or heterologous sequences.
  • An editing cassette can comprise a polynucleotide encoding a guide nucleic acid sequence.
  • the guide nucleic acid sequence is optionally operably linked to a promoter.
  • a guide nucleic acid sequence can comprise a scaffold sequence and a guide sequence as described herein.
  • An editing cassette can comprise a barcode.
  • a barcode can be a unique DNA sequence that corresponds to the editing sequence such that the barcode can identify the one or more mutations of the corresponding editing sequence.
  • the barcode is 15 nucleotides.
  • the barcode can comprise less than 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 88, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more than 200 nucleotides.
  • a barcode can be a non-naturally occurring sequence.
  • An editing cassette comprising a barcode can be a non-naturally occurring sequence.
  • An editing cassette can comprise one or more of an editing sequence and a polynucleotide encoding a guide nucleic acid optionally operably linked to a promoter, wherein the editing cassette and guide nucleic acid sequence are flanked by primer sites.
  • An editing cassette can further comprise a barcode.
  • Each editing cassette can be designed to edit a site in a target sequence.
  • Sites to be targeted can be coding regions, non-coding regions, functionally neutral sites, or they can be a screenable or selectable marker gene.
  • Homology regions within the editing sequence flank the one or more mutations of the editing cassette and can be inserted into the target sequence by recombination.
  • Recombination can comprise DNA cleavage, such as by a nucleic acid-guided nuclease, and repair via homologous recombination.
  • Editing cassettes can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
  • Trackable sequences, such as barcodes or recorder sequences, can be designed in silico via standard code with a degenerate mutation at the target codon.
  • the degenerate mutation can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleic acid residues.
  • the degenerate mutations can comprise 15 nucleic acid residues (N15).
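One way to picture in silico design of N15 trackable sequences is to draw unique random 15-nucleotide strings, as in the hedged Python sketch below. The function name, the use of a seeded pseudo-random generator, and the uniqueness check are illustrative assumptions and not the design procedure of this disclosure.

```python
import random


def make_barcodes(count, length=15, seed=0):
    """Return `count` unique random DNA barcodes of the given length.

    Hypothetical helper: a seeded generator is used so the same call
    always yields the same barcode set.
    """
    if count > 4 ** length:
        raise ValueError("more barcodes requested than sequences available")
    rng = random.Random(seed)
    barcodes = set()
    while len(barcodes) < count:
        barcodes.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(barcodes)
```

A fully degenerate N15 position set spans 4^15 (about 1.07 × 10^9) possible sequences, so collisions among a few thousand randomly drawn barcodes are rare, but the set-based check above removes them regardless.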
  • Homology arms can be added to an editing sequence to allow incorporation of the editing sequence into the desired location via homologous recombination or homology-driven repair.
  • Homology arms can be added by synthesis, in vitro assembly, PCR, or other known methods in the art. For example, chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
  • a homology arm can be added to both ends of a barcode, recorder sequence, and/or editing sequence, thereby flanking the sequence with two distinct homology arms, for example, a 5′ homology arm and a 3′ homology arm.
  • a homology arm can comprise sequence homologous to a target sequence.
  • a homology arm can comprise sequence homologous to sequence adjacent to a target sequence.
  • a homology arm can comprise sequence homologous to sequence upstream or downstream of a target sequence.
  • a homology arm can comprise sequence homologous to sequence within the same gene or open reading frame as a target sequence.
  • a homology arm can comprise sequence homologous to sequence upstream or downstream of a gene or open reading frame the target sequence is within.
  • a homology arm can comprise sequence homologous to a 5′ UTR or 3′ UTR of a gene or open reading frame within which is a target sequence.
  • a homology arm can comprise sequence homologous to a different gene, open reading frame, promoter, terminator, or nucleic acid sequence than that which the target sequence is within.
  • the same 5′ and 3′ homology arms can be added to a plurality of distinct editing sequences, thereby generating a library of unique editing sequences that each have the same targeted insertion site.
  • the same 5′ and 3′ homology arms can be added to a plurality of distinct editing templates, thereby generating a library of unique editing templates that each have the same targeted insertion site.
  • different or a variety of 5′ or 3′ homology arms can be added to a plurality of editing sequences or editing templates.
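The idea of adding the same 5′ and 3′ homology arms to many distinct editing sequences can be sketched as simple string assembly. The Python below is a minimal illustration; the function names and any sequences passed in are hypothetical placeholders rather than sequences from this disclosure.

```python
def add_homology_arms(editing_sequence, arm_5prime, arm_3prime):
    """Flank an editing sequence with a 5' and a 3' homology arm."""
    return arm_5prime + editing_sequence + arm_3prime


def build_arm_library(editing_sequences, arm_5prime, arm_3prime):
    """Give every distinct editing sequence the same pair of homology arms,
    so each library member shares one targeted insertion site."""
    unique_edits = dict.fromkeys(editing_sequences)  # de-duplicate, keep order
    return [
        add_homology_arms(edit, arm_5prime, arm_3prime)
        for edit in unique_edits
    ]
```

To model the case of varied arms, `add_homology_arms` could instead be called once per editing sequence with a different arm pair each time.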
  • a barcode library or recorder sequence library comprising flanking homology arms can be cloned into a vector backbone.
  • the barcode comprising flanking homology arms is cloned into an editing cassette.
  • Cloning can occur by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
  • An editing sequence library comprising flanking homology arms can be cloned into a vector backbone.
  • the editing sequence and homology arms are cloned into an editing cassette.
  • Editing cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of editing sequence insertion, e.g. the target sequence.
  • Editing cassettes can, in some cases, further comprise a barcode or recorder sequence.
  • Cloning can occur by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
  • a guide nucleic acid or sequence encoding the same can be assembled or inserted into a vector backbone first, followed by insertion of an editing sequence and/or cassette.
  • an editing sequence and/or cassette can be inserted or assembled into a vector backbone first, followed by insertion of a guide nucleic acid or sequence encoding the same.
  • guide nucleic acid or sequence encoding the same and an editing sequence and/or cassette are simultaneously inserted or assembled into a vector.
  • a recorder sequence or barcode can be inserted before or after any of these steps.
  • the vector can be linear or circular and can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
  • a nucleic acid molecule can be synthesized which comprises one or more elements disclosed herein.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette.
  • a nucleic acid molecule can be synthesized that comprises a guide nucleic acid.
  • a nucleic acid molecule can be synthesized that comprises a recorder cassette.
  • a nucleic acid molecule can be synthesized that comprises a barcode.
  • a nucleic acid molecule can be synthesized that comprises a homology arm.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette and a guide nucleic acid.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette and a barcode.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette, a guide nucleic acid, and a recorder cassette.
  • a nucleic acid molecule can be synthesized that comprises an editing cassette, a recorder cassette, and two guide nucleic acids.
  • a nucleic acid molecule can be synthesized that comprises a recorder cassette and a guide nucleic acid.
  • the guide nucleic acid can optionally be operably linked to a promoter.
  • the nucleic acid molecule can further include one or more barcodes.
  • Synthesis can occur by any nucleic acid synthesis method known in the art. Synthesis can occur by enzymatic nucleic acid synthesis. Synthesis can occur by chemical synthesis. Synthesis can occur by array-based synthesis. Synthesis can occur by solid-phase synthesis or phosphoramidite methods. Synthesis can occur by column or multi-well methods. Synthesized nucleic acid molecules can be non-naturally occurring nucleic acid molecules.
  • Software and automation methods can be used for multiplex synthesis and generation. For example, software and automation can be used to create 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more synthesized polynucleotides, cassettes, or plasmids.
  • An automation method can generate desired sequences and libraries in rapid fashion that can be processed through a workflow with minimal steps to produce precisely defined libraries, such as gene-wide or genome-wide editing libraries.
  • Polynucleotides or libraries can be generated which comprise two or more nucleic acid molecules or plasmids comprising any combination disclosed herein of recorder sequence, editing sequence, guide nucleic acid, and optional barcode, including combinations of one or more of any of the previously mentioned elements.
  • such a library can comprise at least 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more nucleic acid molecules or plasmids of the present disclosure. It should be understood that such a library can include any number of nucleic acid molecules or plasmids, even if the specific number is not explicitly listed above.
  • Trackable plasmid libraries or nucleic acid molecule libraries can be sequenced in order to determine the recorder sequence and editing sequence pair that is comprised on each trackable plasmid.
  • a known recorder sequence is paired with a known editing sequence during the library generation process.
  • Other methods of determining the association between a recorder sequence and editing sequence comprised on a common nucleic acid molecule or plasmid are envisioned such that the editing sequence can be identified by identification or sequencing of the recorder sequence.
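The recorder/edit association described above amounts to a lookup table built when the library is sequenced and then queried with recorder reads. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, reads are assumed to be already trimmed to the recorder sequence, and each recorder is assumed to pair with exactly one editing sequence.

```python
from collections import Counter


def build_recorder_map(library_pairs):
    """Build a recorder -> editing-sequence map from (recorder, edit) pairs
    observed when sequencing the trackable library."""
    mapping = {}
    for recorder, edit in library_pairs:
        if mapping.get(recorder, edit) != edit:
            raise ValueError(f"recorder {recorder!r} maps to two edits")
        mapping[recorder] = edit
    return mapping


def tally_edits(recorder_reads, mapping):
    """Count inferred edits from recorder reads alone; reads whose recorder
    is not in the map are counted under None."""
    return Counter(mapping.get(read) for read in recorder_reads)
```

With such a map, only the recorder site needs to be sequenced in the edited population to infer which editing sequences are present.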
  • the libraries can be comprised on plasmids, bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs), synthetic chromosomes, or viral or phage genomes. These methods and compositions can be used to generate portable barcoded libraries in host organisms, such as E. coli. Library generation in such organisms can offer the advantage of established techniques for performing homologous recombination. Barcoded plasmid libraries can be deep-sequenced at one site to track mutational diversity targeted across the remaining portions of the plasmid, allowing dramatic improvements in the depth of library coverage.
  • nucleic acid molecule disclosed herein can be an isolated nucleic acid.
  • isolated nucleic acids may be made by any method known in the art, for example using standard recombinant methods, assembly methods, synthesis techniques, or combinations thereof.
  • the nucleic acids may be cloned, amplified, assembled, or otherwise constructed.
  • Isolated nucleic acids may be obtained from cellular, bacterial, or other sources using any number of cloning methodologies known in the art.
  • oligonucleotide probes which selectively hybridize, under stringent conditions, to other oligonucleotides or to the nucleic acids of an organism or cell can be used to isolate or identify an isolated nucleic acid.
  • Cellular genomic DNA, RNA, or cDNA may be screened for the presence of an identified genetic element of interest using a probe based upon one or more sequences. Various degrees of stringency of hybridization may be employed in the assay.
  • High stringency conditions for nucleic acid hybridization are well known in the art.
  • conditions may comprise low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about 50° C. to about 70° C. It is understood that the temperature and ionic strength of a desired stringency are determined in part by the length of the particular nucleic acid(s), the length and nucleotide content of the target sequence(s), the charge composition of the nucleic acid(s), and by the presence or concentration of formamide, tetramethylammonium chloride or other solvent(s) in a hybridization mixture. Nucleic acids may be completely complementary to a target sequence or may exhibit one or more mismatches.
  • Nucleic acids of interest may also be amplified using a variety of known amplification techniques. For instance, polymerase chain reaction (PCR) technology may be used to amplify target sequences directly from DNA, RNA, or cDNA. PCR and other in vitro amplification methods may also be useful, for example, to clone nucleic acid sequences, to make nucleic acids to use as probes for detecting the presence of a target nucleic acid in samples, for nucleic acid sequencing, or for other purposes.
  • Isolated nucleic acids may be prepared by direct chemical synthesis by methods such as the phosphotriester method, or using an automated synthesizer. Chemical synthesis generally produces a single stranded oligonucleotide. This may be converted into double stranded DNA by hybridization with a complementary sequence or by polymerization with a DNA polymerase using the single strand as a template.
  • two editing cassettes can be used together to track a genetic engineering step.
  • one editing cassette can comprise an editing template and an encoded guide nucleic acid
  • a second editing cassette referred to as a recorder cassette
  • an editing template comprising a recorder sequence and an encoded nucleic acid which has a distinct guide sequence compared to that of the first editing cassette.
  • the editing sequence and the recorder sequence can be inserted into separate target sequences and determined by their corresponding guide nucleic acids.
  • a recorder sequence can comprise a barcode, trackable or traceable sequence, and/or a regulatory element operable with a screenable or selectable marker.
  • the recorder cassette can be covalently coupled to at least one editing cassette in a plasmid (e.g., FIG. 17A, green cassette) to generate plasmid libraries that have a unique recorder and editing cassette combination.
  • This library can be sequenced to generate the recorder/edit mapping and used to track editing libraries across large segments of the target DNA (e.g., FIG. 17C).
  • Recorder and editing sequences can be comprised on the same cassette, in which case they are both incorporated into the target nucleic acid sequence, such as a genome or plasmid, by the same recombination event.
  • the recorder and editing sequences can be comprised on separate cassettes within the same plasmid, in which case the recorder and editing sequences are incorporated into the target nucleic acid sequence by separate recombination events, either simultaneously or sequentially.
  • Methods are provided herein for combining multiplex oligonucleotide synthesis with recombineering, to create libraries of specifically designed and trackable mutations. Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of mutations leading to a phenotype of interest.
  • Methods and compositions disclosed herein can be used to simultaneously engineer and track engineering events in a target nucleic acid sequence.
  • Such plasmids can be generated using in vitro assembly or cloning techniques.
  • the plasmids can be generated using chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
  • Such plasmids can comprise at least one recording sequence, such as a barcode, and at least one editing sequence. In most cases, the recording sequence is used to record and track engineering events. Each editing sequence can be used to incorporate a desired edit into a target nucleic acid sequence. The desired edit can include insertion, deletion, substitution, or alteration of the target nucleic acid sequence.
  • the one or more recording sequence and editing sequences are comprised on a single cassette comprised within the plasmid such that they are incorporated into the target nucleic acid sequence by the same engineering event.
  • the recording and editing sequences are comprised on separate cassettes within the plasmid such that they are each incorporated into the target nucleic acid by distinct engineering events.
  • the plasmid comprises two or more editing sequences. For example, one editing sequence can be used to alter or silence a PAM sequence while a second editing sequence can be used to incorporate a mutation into a distinct sequence.
  • Recorder sequences can be inserted into a site separated from the editing sequence insertion site.
  • the inserted recorder sequence can be separated from the editing sequence by 1 bp to 1 Mbp.
  • the separation distance can be about 1 bp, 10 bp, 50 bp, 100 bp, 500 bp, 1 kb, 2 kb, 5 kb, 10 kb, or greater.
  • the separation distance can be any discrete integer between 1 bp and 10 Mbp. In some examples, the maximum distance of separation depends on the size of the target nucleic acid or genome.
  • Recorder sequences can be inserted adjacent to editing sequences, or within proximity to the editing sequence.
  • the recorder sequence can be inserted outside of the open reading frame within which the editing sequence is inserted.
  • A recorder sequence can be inserted into an untranslated region adjacent to an open reading frame within which an editing sequence has been inserted.
  • the recorder sequence can be inserted into a functionally neutral or non-functional site.
  • the recorder sequence can be inserted into a screenable or selectable marker gene.
  • the target nucleic acid sequence is comprised within a genome, artificial chromosome, synthetic chromosome, or episomal plasmid.
  • the target nucleic acid sequence can be in vitro or in vivo.
  • the plasmid can be introduced into the host organisms by transformation, transfection, conjugation, biolistics, nanoparticles, cell-permeable technologies, or other known methods for DNA delivery, or any combination thereof.
  • the host organism can be a eukaryote, prokaryote, bacterium, archaea, yeast, or other fungi.
  • the engineering event can comprise recombineering, non-homologous end joining, homologous recombination, or homology-driven repair.
  • the engineering event is performed in vitro or in vivo.
  • the methods described herein can be carried out in any type of cell in which a targetable nuclease system can function (e.g., target and cleave DNA), including prokaryotic and eukaryotic cells.
  • the cell is a bacterial cell, such as Escherichia spp. (e.g., E. coli).
  • the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp.
  • the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
  • the cell is a recombinant organism.
  • the cell can comprise a non-native targetable nuclease system.
  • the cell can comprise recombination system machinery.
  • recombination systems can include lambda red recombination system, Cre/Lox, attB/attP, or other integrase systems.
  • the plasmid can have the complementary components or machinery required for the selected recombination system to work correctly and efficiently.
  • A method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing cassette; (c) obtaining viable cells; and (d) sequencing the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • a method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette comprising a PAM mutation as disclosed herein and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells; and (d) sequencing the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • A method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, at least one recorder cassette, and at least two guide nucleic acids into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing and recorder cassettes; (c) obtaining viable cells; and (d) sequencing the recorder sequence of the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
  • a method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, a recorder cassette, and at least two guide nucleic acids into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing and recorder cassettes, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells; and (d) sequencing the recorder sequence of the target DNA in at least one cell of the second population of cells
  • transformation efficiency is determined by using a non-targeting control guide nucleic acid, which allows for validation of the recombineering procedure and CFU/ng calculations.
  • absolute efficiency is obtained by counting the total number of colonies on each transformation plate, for example, by counting both red and white colonies from a galK control.
  • relative efficiency is calculated by the total number of successful transformants (for example, white colonies) out of all colonies from a control (for example, galK control).
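The colony-count bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the disclosure: the function names, the `fraction_plated` parameter, and the example counts are all assumptions introduced here.

```python
def cfu_per_ng(colonies, dna_ng, fraction_plated=1.0):
    """Transformation efficiency as colony-forming units per ng of DNA.

    fraction_plated is a hypothetical parameter: the share of the recovery
    culture that was actually spread on the plate.
    """
    if dna_ng <= 0 or not 0 < fraction_plated <= 1:
        raise ValueError("dna_ng must be > 0 and fraction_plated in (0, 1]")
    return colonies / (dna_ng * fraction_plated)


def absolute_efficiency(red, white):
    """Total colonies on a plate (red plus white for a galK control)."""
    return red + white


def relative_efficiency(red, white):
    """Fraction of all colonies that are successful transformants (white)."""
    total = absolute_efficiency(red, white)
    if total == 0:
        raise ValueError("no colonies counted")
    return white / total


# Hypothetical plate: 30 red and 70 white colonies from 50 ng of DNA,
# with half of the recovery culture plated.
print(absolute_efficiency(30, 70))
print(relative_efficiency(30, 70))
print(cfu_per_ng(100, 50, fraction_plated=0.5))
```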
  • the methods of the disclosure can provide, for example, greater than 1000× improvements in the efficiency, scale, cost of generating a combinatorial library, and/or precision of such library generation.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the efficiency of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the scale of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater decrease in the cost of generating genomic or combinatorial libraries.
  • the methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the precision of genomic or combinatorial library generation.
  • Disclosed herein are methods and compositions for iterative rounds of engineering. Disclosed herein are recursive engineering strategies that allow implementation of CREATE recording at the single cell level through several serial engineering cycles (e.g., FIG. 18 and FIG. 19 ). These disclosed methods and compositions can enable search-based technologies that can effectively construct and explore complex genotypic space. The terms recursive and iterative can be used interchangeably.
  • Combinatorial engineering methods can comprise multiple rounds of engineering.
  • Methods disclosed herein can comprise 2 or more rounds of engineering.
  • a method can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, or more than 30 rounds of engineering.
  • a new recorder sequence such as a barcode
  • a new recorder sequence is incorporated at the same locus in nearby sites (e.g., FIG. 18 , green bars or FIG. 19 , black bars) such that following multiple engineering cycles to construct combinatorial diversity throughout the genome (e.g., FIG. 18 , green bars or FIG. 19 , grey bars)
  • a simple PCR of the recording locus can be used to reconstruct each combinatorial genotype or to confirm that the engineered edit from each round has been incorporated into the target site.
  • Selection can occur by a PAM mutation incorporated by an editing cassette.
  • Selection can occur by a PAM mutation incorporated by a recorder cassette.
  • Selection can occur using a screenable, selectable, or counter-selectable marker.
  • Selection can occur by targeting a site for editing or recording that was incorporated by a prior round of engineering, thereby selecting for variants that successfully incorporated edits and recorder sequences from both rounds or all prior rounds of engineering.
  • Quantitation of these genotypes can be used for understanding combinatorial mutational effects on large populations and investigation of important biological phenomena such as epistasis.
  • Serial editing and combinatorial tracking can be implemented using recursive vector systems as disclosed herein.
  • These recursive vector systems can be used to move rapidly through the transformation procedure.
  • these systems consist of two or more plasmids containing orthogonal replication origins, antibiotic markers, and an encoded guide nucleic acid.
  • the encoded guide nucleic acid in each vector can be designed to target one of the other resistance markers for destruction by nucleic acid-guided nuclease-mediated cleavage.
  • These systems can be used, in some examples, to perform transformations in which the antibiotic selection pressure is switched to remove the previous plasmid and drive enrichment of the next round of engineered genomes.
  • Two or more passages through the transformation loop can be performed, or in other words, multiple rounds of engineering can be performed.
  • Introducing the requisite recording cassettes and editing cassettes into recursive vectors as disclosed herein can be used for simultaneous genome editing and plasmid curing in each transformation step with high efficiencies.
  • the recursive vector system disclosed herein comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique plasmids.
  • the recursive vector system can use a particular plasmid more than once as long as a distinct plasmid is used in the previous round and in the subsequent round.
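The reuse rule for recursive vectors can be illustrated with a small schedule check. The sketch below assumes each round is identified by a plasmid name and that the guide nucleic acid on the incoming plasmid targets the resistance marker of the plasmid from the previous round; the plasmid names, marker names, and function names are hypothetical.

```python
def valid_plasmid_schedule(rounds):
    """True if no plasmid is used in two consecutive rounds.

    A plasmid may reappear later, as long as the rounds immediately
    before and after it use a different plasmid.
    """
    return all(prev != nxt for prev, nxt in zip(rounds, rounds[1:]))


def curing_targets(rounds, marker_of):
    """Marker targeted in each round after the first.

    Assumes the incoming plasmid's guide is designed against the marker
    carried by the plasmid used in the preceding round.
    """
    return [marker_of[prev] for prev in rounds[:-1]]


# Hypothetical two-plasmid loop cycled through three rounds.
markers = {"pA": "kan", "pB": "cm"}
schedule = ["pA", "pB", "pA"]
print(valid_plasmid_schedule(schedule))
print(curing_targets(schedule, markers))
```

With two plasmids the schedule simply alternates; with more unique plasmids the same check applies to any ordering.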
  • Recursive methods and compositions disclosed herein can be used to restore function to a selectable or screenable element in a targeted genome or plasmid.
  • the selectable or screenable element can include an antibiotic resistance gene, a fluorescent gene, a unique DNA sequence or watermark, or other known reporter, screenable, or selectable gene.
  • each successive round of engineering can incorporate a fragment of the selectable or screenable element, such that at the end of the engineering rounds, the entire selectable or screenable element has been incorporated into the target genome or plasmid.
  • only those genomes or plasmids which have successfully incorporated all of the fragments, and therefore all of the desired corresponding mutations, can be selected or screened for. In this way, the selected or screened cells will be enriched for those that have incorporated the edits from each and every iterative round of engineering.
  • Recursive methods can be used to switch a selectable or screenable marker between an on and an off position, or between an off and an on position, with each successive round of engineering.
  • Using such a method allows conservation of available selectable or screenable markers by requiring, for example, the use of only one screenable or selectable marker.
  • short regulatory sequence or start codon or non-start codons can be used to turn the screenable or selectable marker on and off. Such short sequences can easily fit within a synthesized cassette or polynucleotide.
  • each round of engineering is used to incorporate an edit unique from that of previous rounds.
  • Each round of engineering can incorporate a unique recording sequence.
  • Each round of engineering can result in removal or curing of the plasmid used in the previous round of engineering.
  • successful incorporation of the recording sequence of each round of engineering results in a complete and functional screenable or selectable marker or unique sequence combination.
  • Unique recorder cassettes comprising recording sequences such as barcodes or screenable or selectable markers can be inserted with each round of engineering, thereby generating a recorder sequence that is indicative of the combination of edits or engineering steps performed.
  • Successive recording sequences can be inserted adjacent to one another.
  • Successive recording sequences can be inserted within proximity to one another.
  • Successive sequences can be inserted at a distance from one another.
  • successive recorder sequences can be inserted and separated by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or greater than 100 bp.
  • successive recorder sequences are separated by about 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, or greater than 1500 bp.
  • Successive recorder sequences can be separated by any desired number of base pairs, and the separation can depend on and be limited by the number of successive recorder sequences to be inserted, the size of the target nucleic acid or target genomes, and/or the design of the desired final recorder sequence. For example, if the compiled recorder sequence is a functional screenable or selectable marker, then the successive recording sequences can be inserted within proximity and within the same reading frame from one another. If the compiled recorder sequence is a unique set of barcodes to be identified by sequencing and has no coding sequence element, then the successive recorder sequences can be inserted with any desired number of base pairs separating them. In these cases, the separation distance can be dependent on the sequencing technology to be used and the read length limit.
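The two design constraints mentioned above (read length for barcode arrays, reading frame for a compiled marker) reduce to simple arithmetic. The following sketch assumes equal-length recorder sequences with uniform spacing; the function names and the example numbers are illustrative and do not come from the disclosure.

```python
def recorder_span(n_recorders, recorder_len, spacing):
    """Total length in bp covered by n recorder sequences and their spacers."""
    if n_recorders < 1:
        raise ValueError("need at least one recorder sequence")
    return n_recorders * recorder_len + (n_recorders - 1) * spacing


def fits_in_read(n_recorders, recorder_len, spacing, read_length):
    """True if the whole recorder array can be covered by a single read."""
    return recorder_span(n_recorders, recorder_len, spacing) <= read_length


def spacer_keeps_frame(spacing):
    """True if a spacer of this length leaves the downstream reading frame
    unchanged, i.e. its length is a multiple of three."""
    return spacing % 3 == 0


# Hypothetical design: three 15 bp barcodes separated by 10 bp.
print(recorder_span(3, 15, 10))
print(fits_in_read(3, 15, 10, read_length=150))
print(spacer_keeps_frame(10), spacer_keeps_frame(9))
```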
  • wild type is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
  • variable should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature.
  • orthologue also referred to as “ortholog” herein
  • homologue also referred to as “homolog” herein
  • a “homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related.
  • An “orthologue” of a protein as used herein is a protein of a different species which performs the same or a similar function as the protein it is an orthologue of. Orthologous proteins may but need not be structurally related, or are only partially structurally related.
  • Homologs and orthologs may be identified by homology modelling (see, e.g., Greer, Science vol. 228 (1985) 1055, and Blundell et al. Eur J Biochem vol 172 (1988), 513) or “structural BLAST” (Dey F, Cliff Zhang Q, Petrey D, Honig B. Toward a “structural BLAST”: using structural relationships to infer function. Protein Sci. 2013 April; 22(4):359-66. doi: 10.1002/pro.2225.).
  • polynucleotide refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
  • Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown.
  • examples of polynucleotides include coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
  • a polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer.
  • the sequence of nucleotides may be interrupted by non-nucleotide components.
  • a polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
  • “Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types.
  • a percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary).
  • Perfectly complementary means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence.
  • “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
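Percent complementarity as defined above can be computed directly for two equal-length strands. This is a minimal sketch restricted to Watson-Crick DNA pairs; it assumes both strands are written 5' to 3' and pair antiparallel, and the function name is introduced here for illustration.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(strand_a, strand_b):
    """Percent of positions in strand_a that can pair with strand_b.

    Both strands are given 5'->3'; strand_b is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    reversed_b = strand_b.upper()[::-1]
    paired = sum(
        WATSON_CRICK.get(base_a) == base_b
        for base_a, base_b in zip(strand_a.upper(), reversed_b)
    )
    return 100.0 * paired / len(strand_a)


# 5 of 10 positions can pair, matching the "5 out of 10 being 50%" example.
print(percent_complementarity("AAAAAAAAAA", "TTTTTAAAAA"))
# A strand against its exact reverse complement.
print(percent_complementarity("AAGGCCTTAC", "GTAAGGCCTT"))
```

Non-traditional pairings and RNA bases are outside the scope of this sketch.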
  • complementary or partially complementary sequences are also envisaged. These are preferably capable of hybridising to the reference sequence under highly stringent conditions.
  • relatively low-stringency hybridization conditions are selected: about 20 to 25 degrees Celsius lower than the thermal melting point (Tm).
  • Tm is the temperature at which 50% of specific target sequence hybridizes to a perfectly complementary probe in solution at a defined ionic strength and pH.
  • highly stringent washing conditions are selected to be about 5 to 15 degrees Celsius lower than the Tm.
  • moderately-stringent washing conditions are selected to be about 15 to 30 degrees Celsius lower than the Tm. Highly permissive (very low stringency) washing conditions may be as low as 50 degrees Celsius below the Tm, allowing a high level of mis-matching between hybridized sequences.
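The temperature offsets quoted above translate into concrete windows once a Tm is known. The sketch below only encodes the ranges stated in the text; the dictionary keys, function names, and the example Tm of 70 degrees Celsius are assumptions made for illustration.

```python
# Offsets below Tm in degrees Celsius, as (nearest to Tm, farthest from Tm).
OFFSETS_BELOW_TM = {
    "low_stringency_hybridization": (20, 25),
    "high_stringency_wash": (5, 15),
    "moderate_stringency_wash": (15, 30),
}


def temperature_window(tm_celsius, condition):
    """Return (coldest, warmest) temperature for the named condition."""
    near, far = OFFSETS_BELOW_TM[condition]
    return (tm_celsius - far, tm_celsius - near)


def very_low_stringency_floor(tm_celsius):
    """Lowest wash temperature mentioned: 50 degrees Celsius below Tm."""
    return tm_celsius - 50


# Hypothetical probe with a Tm of 70 degrees Celsius.
for name in OFFSETS_BELOW_TM:
    print(name, temperature_window(70, name))
print("floor", very_low_stringency_floor(70))
```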
  • Those skilled in the art will recognize that other physical and chemical parameters in the hybridization and wash stages can also be altered to affect the outcome of a detectable hybridization signal from a specific level of homology between target and probe sequences.
  • Hybridization refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues.
  • the hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence specific manner.
  • the complex may comprise two strands forming a duplex structure, three or more strands forming a multi stranded complex, a single self-hybridizing strand, or any combination of these.
  • a hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme.
  • a sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.
  • genomic locus or “locus” (plural loci) is the specific location of a gene or DNA sequence on a chromosome.
  • a “gene” refers to stretches of DNA or RNA that encode a polypeptide or an RNA chain that has a functional role to play in an organism and hence is the molecular unit of heredity in living organisms.
  • genes include regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences.
  • a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.
  • expression also refers to the process by which a polynucleotide is transcribed from a DNA template (such as into an mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins.
  • Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
  • polypeptide refers to polymers of amino acids of any length.
  • the polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non amino acids.
  • the terms also encompass an amino acid polymer that has been modified; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.
  • amino acid includes natural and/or unnatural or synthetic amino acids, including glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics.
  • domain refers to a part of a protein sequence that may exist and function independently of the rest of the protein chain.
  • sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. Sequence homologies may be generated by any of a number of computer programs known in the art, for example BLAST or FASTA, etc. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A.; Devereux et al., 1984, Nucleic Acids Research 12:387).
  • Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid, Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). However it is preferred to use the GCG Bestfit program.
  • Percent homology may be calculated over contiguous sequences, i.e., one sequence is aligned with the other sequence and each amino acid or nucleotide in one sequence is directly compared with the corresponding amino acid or nucleotide in the other sequence, one residue at a time. This is called an “ungapped” alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
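An ungapped comparison of this kind is a position-by-position count. The sketch below assumes two sequences of equal length that are already in register; the function name and the example peptides are hypothetical.

```python
def percent_identity_ungapped(seq_a, seq_b):
    """Percent of positions carrying the same residue in both sequences.

    No gaps are introduced: residue i of seq_a is compared only with
    residue i of seq_b.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * identical / len(seq_a)


# Hypothetical four-residue peptides differing at the last position.
print(percent_identity_ungapped("ACDE", "ACDF"))
```

A single insertion or deletion shifts every downstream residue out of register, which is why ungapped comparison is usually limited to short stretches.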
  • gapped alignment methods assign “gap penalties” to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible (reflecting higher relatedness between the two compared sequences) may achieve a higher score than one with many gaps.
  • “Affine gap costs” are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties may, of course, produce optimized alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. However, it is preferred to use the default values when using such software for sequence comparisons. For example, when using the GCG Wisconsin Bestfit package the default gap penalty for amino acid sequences is −12 for a gap and −4 for each extension.
  • Calculation of maximum % homology therefore first requires the production of an optimal alignment, taking into consideration gap penalties.
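The effect of such a scoring scheme can be shown with the default values quoted above. This sketch assumes one common convention, in which every residue in a gap (including the first) is charged the extension penalty on top of a single opening penalty; alignment programs differ on this detail, and the function names are illustrative.

```python
def affine_gap_penalty(gap_length, gap_open=-12, gap_extend=-4):
    """Penalty for one gap of the given length under an affine scheme.

    Assumed convention: penalty = gap_open + gap_extend * gap_length.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0
    return gap_open + gap_extend * gap_length


def total_gap_penalty(gap_lengths, gap_open=-12, gap_extend=-4):
    """Sum of penalties over all gaps in an alignment."""
    return sum(affine_gap_penalty(n, gap_open, gap_extend) for n in gap_lengths)


# One gap of three residues versus three separate one-residue gaps:
# the same number of gapped positions, but the single gap costs less.
print(total_gap_penalty([3]))
print(total_gap_penalty([1, 1, 1]))
```

Under these assumptions an alignment that concentrates its gapped positions into fewer gaps scores higher, which matches the preference for alignments with as few gaps as possible.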
  • a suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (Devereux et al., 1984 Nuc. Acids Research 12 p 387).
  • Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 Short Protocols in Molecular Biology, 4th Ed.—Chapter 18), FASTA (Altschul et al., 1990 J. Mol. Biol. 403-410) and the GENEWORKS suite of comparison tools.
  • BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999, Short Protocols in Molecular Biology, pages 7-58 to 7-60). However, for some applications, it is preferred to use the GCG Bestfit program.
  • a new tool, called BLAST 2 Sequences, is also available for comparing protein and nucleotide sequences (see FEMS Microbiol Lett. 1999 174(2): 247-50; FEMS Microbiol Lett. 1999 177(1): 187-8 and the website of the National Center for Biotechnology Information at the website of the National Institutes of Health).
  • percentage homologies may be calculated using the multiple alignment feature in DNASIS™ (Hitachi Software), based on an algorithm analogous to CLUSTAL (Higgins D G & Sharp P M (1988), Gene 73(1), 237-244).
  • Sequences may also have deletions, insertions or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent substance.
  • Deliberate amino acid substitutions may be made on the basis of similarity in amino acid properties (such as polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues) and it is therefore useful to group amino acids together in functional groups.
  • Amino acids may be grouped together based on the properties of their side chains alone. However, it is more useful to include mutation data as well. The sets of amino acids thus derived are likely to be conserved for structural reasons. These sets may be described in the form of a Venn diagram (Livingstone C. D. and Barton G. J.
  • Embodiments of the invention include sequences (both polynucleotide or polypeptide) which may comprise homologous substitution (substitution and replacement are both used herein to mean the interchange of an existing amino acid residue or nucleotide, with an alternative residue or nucleotide) that may occur i.e., like-for-like substitution in the case of amino acids such as basic for basic, acidic for acidic, polar for polar, etc.
  • Non-homologous substitution may also occur i.e., from one class of residue to another or alternatively involving the inclusion of unnatural amino acids such as ornithine (hereinafter referred to as Z), diaminobutyric acid ornithine (hereinafter referred to as B), norleucine ornithine (hereinafter referred to as O), pyridylalanine, thienylalanine, naphthylalanine and phenylglycine.
  • Variant amino acid sequences may include suitable spacer groups that may be inserted between any two amino acid residues of the sequence including alkyl groups such as methyl, ethyl or propyl groups in addition to amino acid spacers such as glycine or β-alanine residues.
  • a further form of variation which involves the presence of one or more amino acid residues in peptoid form, may be well understood by those skilled in the art.
  • the peptoid form is used to refer to variant amino acid residues wherein the α-carbon substituent group is on the residue's nitrogen atom rather than the α-carbon.
  • Sequence alignments were built using PSI-BLAST to search for MAD nuclease homologs in the NCBI non-redundant databases. Multiple sequence alignments were further refined using the MUSCLE alignment algorithm with default settings as implemented in Geneious 10. The percent identity of each homolog to SpCas9 and AsCpf1 reference sequences was computed based on the pairwise alignment matching from these global alignments.
  • Genomic source sequences were identified using Uniprot linkage information or TBLASTN searches of NCBI using the default parameters and searching all possible frames for translational matches.
  • Wild-type nucleic acid sequences for MAD1-MAD20 include SEQ ID NOs 21-40, respectively. These MAD nucleases were codon optimized for expression in E. coli and the codon optimized sequences are listed as SEQ ID NO: 41-60, respectively (summarized in Table 2).
  • Codon optimized MAD1-MAD20 were cloned into an expression construct comprising a constitutive or inducible promoter (e.g., proB promoter SEQ ID NO: 83, or pBAD promoter SEQ ID NO: 81 or SEQ ID NO: 82) and an optional 6×-His tag (SEQ ID NO: 182) (e.g., FIG. 2 ).
  • the generated MAD1-MAD20 expression constructs are provided as SEQ ID NOs: 61-80, respectively.
  • the expression constructs as depicted in FIG. 2 were generated either by restriction/ligation-based cloning or homology-based cloning.
  • a nucleic acid-guided nuclease and a compatible guide nucleic acid are needed.
  • multiple approaches were taken. First, scaffold sequences were looked for near the endogenous loci of each MAD nuclease. In some cases, such as with MAD2, no endogenous scaffold sequence was found. Therefore, we tested the compatibility of MAD2 with scaffold sequences found near the endogenous loci of the other MAD nucleases. A list of the MAD nucleases and corresponding endogenous scaffold sequences that were tested is listed in Table 2.
  • Editing cassettes as depicted in FIG. 3 were generated to assess the functionality of the MAD nucleases and corresponding guide nucleic acids.
  • Each editing cassette comprises an editing sequence and a promoter operably linked to an encoded guide nucleic acid.
  • the editing cassettes further comprise primer sites (P1 and P2) on flanking ends.
  • the guide nucleic acids comprised various scaffold sequences to be tested, as well as a guide sequence to guide the MAD nuclease to the target sequence for editing.
  • the editing sequences comprised a PAM mutation and/or codon mutation relative to the target sequence.
  • the mutations were flanked by regions of homology (homology arms or HA) which would allow recombination into the cleaved target sequence.
  • FIG. 4 depicts an experimental design to test different MAD nuclease and guide nucleic acid combinations.
  • An expression cassette encoding the MAD nuclease or the MAD nuclease protein was added to host cells along with various editing cassettes as described above.
  • the guide nucleic acids were engineered to target the galK gene in the host cell, and the editing sequence was designed to mutate the targeted galK gene in order to turn the gene off, thereby allowing for screening of successfully edited cells.
  • This design was used for identification of functional or compatible MAD nuclease and guide nucleic acid combinations. Editing efficiency was determined by qPCR to measure the editing plasmid in the recovered cells in a high-throughput manner. Validation of MAD11 and Cas9 primers is shown in FIGS. 14A and 14B . These results show that the selected primer pairs are orthogonal and allow quantitative measurement of input plasmid DNA.
  • FIGS. 5A-5B depict a similar experimental design.
  • the editing cassette ( FIG. 5B ) further comprises a selectable marker, in this case kanamycin resistance (kan) and the MAD nuclease expression vector ( FIG. 5A ) further comprises a selectable marker, in this case chloramphenicol resistance (Cm), and the lambda RED recombination system to aid homologous recombination (HR) of the editing sequence into the target sequence.
  • a compatible MAD nuclease and guide nucleic acid combination will cause a double strand break in the target sequence if a PAM sequence is present. Since the editing sequence (eg. FIG. FIG.
  • the editing sequence further comprises a mutation in the galK gene that allows for screening of edited cells, while the MAD nuclease expression vector and editing cassette contain drug selection markers, allowing for selection of edited cells.
  • compatible guide nucleic acids for MAD1-MAD20 were tested. Twenty scaffold sequences were tested. The guide nucleic acids used in the experiments contained one of the twenty scaffold sequences, referred to as scaffold-1, scaffold-2, etc., and a guide sequence that targets the galK gene. Sequences for Scaffold-1 through Scaffold-20 are listed as SEQ ID NO: 84-103, respectively. It should be understood that the guide sequence of the guide nucleic acid is variable and can be engineered or designed to target any desired target sequence.
  • This workflow could also be used to identify or test PAM sequences compatible with a given MAD nuclease. Another method for identifying a PAM site is described in the next example.
  • transformations were carried out as follows. E. coli strains expressing the codon optimized MAD nucleases were grown overnight. Saturated cultures were diluted 1/100, grown to an OD600 of 0.6, and induced by adding arabinose at a final concentration of 0.4% and (if a temperature sensitive plasmid is used) shifting the culture to 42 degrees Celsius in a shaking water bath. Following induction, cells were chilled on ice for 15 min prior to washing thrice with ¼ the initial culture volume of 10% glycerol (for example, 50 mL washes for a 200 mL culture).
  • Cells were resuspended in 1/100 the initial volume (for example, 2 mL for a 200 mL culture) and stored at −90 degrees Celsius until ready to use.
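The volume ratios in this preparation scale with the starting culture. The helper below is a small sketch that reproduces the worked example in the text (a 200 mL culture); the function name and dictionary keys are assumptions made for illustration.

```python
def competent_cell_volumes(culture_ml):
    """Glycerol wash and resuspension volumes for a given culture volume.

    Each wash uses one quarter of the initial culture volume, and the final
    resuspension uses one hundredth of it.
    """
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    return {
        "wash_ml_each": culture_ml / 4,
        "number_of_washes": 3,
        "resuspension_ml": culture_ml / 100,
    }


# The 200 mL example from the text: 50 mL per wash, 2 mL final volume.
print(competent_cell_volumes(200))
```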
  • 50 ng of editing cassette was transformed into cell aliquots by electroporation. Following electroporation, the cells were recovered in LB for 3 hours and 100 µL of cells were plated on MacConkey plates containing 1% galactose.
  • Editing efficiencies were determined by dividing the number of white colonies (edited cells) by the total number of white and red colonies (edited and non-edited cells).
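The colony-based calculation can be applied across several plates at once. In the sketch below, the nuclease and scaffold labels and all colony counts are hypothetical placeholders, not data from the disclosure.

```python
def editing_efficiency(white, red):
    """Edited (white) colonies divided by all colonies (white plus red)."""
    total = white + red
    if total == 0:
        raise ValueError("no colonies counted")
    return white / total


# Hypothetical MacConkey plate counts as (white, red) for two combinations.
plates = {
    "nuclease_A + scaffold_1": (45, 5),
    "nuclease_B + scaffold_1": (2, 98),
}

for label, (white, red) in plates.items():
    print(f"{label}: {editing_efficiency(white, red):.0%}")
```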
  • a guide nucleic acid In order to generate a double strand break in a target sequence, a guide nucleic acid must hybridize to a target sequence, and the MAD nuclease must recognize a PAM sequence adjacent to the target sequence. If the guide nucleic acid hybridizes to the target sequence, but the MAD nuclease does not recognize a PAM site, then cleavage does not occur.
  • a PAM is MAD nuclease-specific and not all MAD nucleases necessarily recognize the same PAM.
  • an assay as depicted in FIGS. 6A-6C was performed.
  • FIG. 6A depicts a MAD nuclease expression vector as described elsewhere, which also contains a chloramphenicol resistance gene and the lambda RED recombination system.
  • FIG. 6B depicts a self-targeting editing cassette.
  • the guide nucleic acid is designed to target the target sequence which is contained on the same nucleic acid molecule.
  • the target sequence is flanked by random nucleotides, depicted by N4, meaning four random nucleotides on either end of the target sequence. It should be understood that any number of random nucleotides could also be used (for example, 3, 5, 6, 7, 8, etc).
  • the random nucleotides serve as a library of potential PAMs.
  • FIG. 6C depicts the experimental design. The MAD nuclease expression vector and the editing cassette comprising the random PAM sites were transformed into a host cell. If a functional targetable nuclease complex was formed and the MAD nuclease recognized a PAM site, then the editing cassette vector was cleaved, which led to cell death. If a functional targetable complex was not formed or if the MAD nuclease did not recognize the PAM, then the target sequence was not cleaved and the cell survived. Next generation sequencing (NGS) was then used to sequence the starting and final cell populations in order to determine which PAM sites were recognized by a given MAD nuclease. These recognized PAM sites were then used to determine a consensus or non-consensus PAM for a given MAD nuclease.
  • the consensus PAM for MAD1-MAD8, and MAD10-MAD12 was determined to be TTTN.
  • the consensus PAM for MAD9 was determined to be NNG.
  • the consensus PAM for MAD13-MAD15 was determined to be TTN.
  • the consensus PAM for MAD16-MAD18 was determined to be TA.
  • the consensus PAM for MAD19-MAD20 was determined to be TTCN.
  • Editing efficiencies were tested for MAD1, MAD2, MAD4, and MAD7 and are depicted in FIG. 7A and FIG. 7B . Experiment details and editing efficiencies are summarized in Table 3. Editing efficiency was determined by dividing the number of edited cells by the total number of recovered cells.
  • Various editing cassettes targeting the galK gene were used to allow screening of edited cells.
  • the guide nucleic acids encoded on the editing cassette contained a guide sequence targeting the galK gene and one of various scaffold sequences in order to test the compatibility of the indicated MAD nuclease with the indicated scaffold sequence, as summarized in Table 3.
  • No.  Nuclease  Scaffold sequence            Mutation  Gene
    1    MAD2      Scaffold-12; SEQ ID NO: 95   N89KpnI   galK
    2    MAD2      Scaffold-10; SEQ ID NO: 93   L80**     galK
    3    MAD2      Scaffold-5; SEQ ID NO: 88    L80**     galK
    4    MAD2      Scaffold-12; SEQ ID NO: 95   D70KpnI   galK
    5    MAD2      Scaffold-12; SEQ ID NO: 95   Y145**    galK
    6    MAD2      Scaffold-11; SEQ ID NO: 94   Y145**    galK
    7    MAD2      Scaffold-10; SEQ ID NO: 93   Y145**    galK
    8    MAD2      Scaffold-12; SEQ ID NO: 95   L10KpnI   galK
    9    MAD2      Scaffold-11; SEQ ID NO: 94   L80**     galK
    10   SpCas9    S.
  • No.  Nuclease  Scaffold sequence            Mutation  Gene
    1    MAD7      Scaffold-1; SEQ ID NO: 84    L80**     galK
    2    MAD7      Scaffold-2; SEQ ID NO: 85    Y145**    galK
    3    MAD7      Scaffold-4; SEQ ID NO: 87    Y145**    galK
    4    MAD7      Scaffold-10; SEQ ID NO: 93   Y145**    galK
    5    MAD7      Scaffold-11; SEQ ID NO: 94   L80**     galK
  • transformation efficiencies were determined by calculating the total number of recovered cells compared to the start number of cells.
  • An example plate image is depicted in FIG. 10C .
  • Editing efficiencies were determined by calculating the ratio of edited colonies (white colonies, edited galK gene) to total colonies.
  • cells expressing galK were transformed with expression constructs expressing either MAD2 or MAD7 and a corresponding editing cassette comprising a guide nucleic acid targeting the galK gene.
  • the guide nucleic acid was comprised of a guide sequence targeting the galK gene and the scaffold-12 sequence (SEQ ID NO: 95).
  • MAD2 and MAD7 had a lower transformation efficiency than S. pyogenes Cas9, though the editing efficiency of MAD2 and MAD7 was slightly higher than that of S. pyogenes Cas9.
  • FIG. 11 depicts the sequencing results from select colonies recovered from the assay described above.
  • the target sequence was in the galK coding sequence (CDS).
  • the TTTN PAM is shown as the reverse complement (wild-type NAAA, mutated NGAA).
  • the mutations targeted by the editing sequence are labeled as target codons. Changes compared to the wild-type sequence are highlighted.
  • the scaffold-12 sequence SEQ ID NO: 95 was used.
  • the guide sequence of the guide nucleic acid targeted the galK gene.
  • Two of the four depicted sequences from the MAD7 experiment contained the designed PAM mutation and mutated target codons.
  • One colony comprised a wild-type sequence, while another contained a deletion of eight nucleotides upstream of the target sequence.
  • FIG. 12 depicts results from another experiment testing the ability to recover edited cells.
  • the MAD2 nuclease was used with a guide nucleic acid comprising scaffold-11 sequence and a guide sequence targeting galK.
  • the editing cassette comprised an editing sequence designed to incorporate an L80** mutation into galK, thereby allowing screening of the edited cells.
  • the MAD2 nuclease was used with a guide nucleic acid comprising scaffold-12 sequence and a guide sequence targeting galK.
  • the editing cassette comprised an editing sequence designed to incorporate an L10KpnI mutation into galK.
  • A negative control plasmid containing a guide nucleic acid that is not compatible with MAD2 was included in the transformations.
  • the ratio of the compatible editing cassette (those containing scaffold-11 or scaffold-12 guide nucleic acids) to the non-compatible editing cassette (negative control) was measured.
  • the experiments were done in the presence or absence of selection. The results show that more cells containing a compatible editing cassette were recovered than cells containing the non-compatible editing cassette, and this difference is magnified when selection is used.
  • the sequences of scaffolds 1-8 and 10-12 (SEQ ID NO: 84-91 and 93-95) were aligned and are depicted in FIG. 13A. Nucleotides that match the consensus sequence are faded, while those diverging from the consensus sequence are visible. The predicted pseudoknot region is indicated. Without being bound by theory, the region 5′ of the pseudoknot may influence binding and/or kinetics of the nucleic acid-guided nuclease. As is shown in FIG. 13A, in general, there appears to be less variability in the pseudoknot region (e.g., SEQ ID NO: 172-181) as compared to the sequence outside of the pseudoknot region.
  • FIG. 13B shows a preliminary model of MAD2 and MAD12 complexed with a guide nucleic acid (in this example, a guide RNA) and target sequence (DNA).
  • a plate-based editing efficiency assay and a molecular editing efficiency assay were used to test editing efficiency of various MAD nuclease and guide nucleic acid combinations.
  • FIG. 15 depicts quantification of the data obtained using the molecular editing efficiency assay using MAD2 nuclease with a guide nucleic acid comprising scaffold-12 and a guide sequence targeting galK. The indicated mutations were incorporated into galK using corresponding editing cassettes containing the mutation.
  • FIG. 16 shows the comparison of the editing efficiencies determined by the plate-based assay using white and red colonies as described previously, and the molecular editing efficiency assay. As shown in FIG. 16 , the editing efficiencies as determined by the two separate assays are consistent.
  • a barcode can be incorporated into or near the edit site as described in the present specification.
  • a cell expressing a MAD nuclease is transformed with a plasmid containing an editing cassette and a recording cassette.
  • the editing cassette contains a PAM mutation and a gene edit.
  • the recorder cassette comprises a barcode, in this case 15N. The editing cassette and the recording cassette each comprise a guide nucleic acid to a distinct target sequence.
  • the recorder cassette for each round can contain the same guide nucleic acid, such that the first round barcode is inserted into the same location across all variants, regardless of what editing cassette and corresponding gene edit is used.
  • the correlation between the barcode and the editing cassette is determined beforehand, such that the edit can be identified by sequencing the barcode.
  • FIG. 17B shows an example of a recording cassette designed to delete a PAM site while incorporating a 15N barcode.
  • the deleted PAM is used to enrich for edited cells since mutated PAM cells escape cell death while cells containing a wild-type PAM sequence are killed.
  • FIG. 17C depicts how sequencing the barcode region can be used to identify which edit is comprised within each cell.
  • A similar approach is depicted in FIG. 18.
  • the recorder cassette from each round is designed to target a sequence adjacent to the previous round, and each time, a new PAM site is deleted by the recorder cassette.
  • the result is a barcode array with the barcodes from each round that can be sequenced to confirm each round of engineering took place and to determine which combination of mutations are contained in the cell, and in which order the mutations were made.
  • Each successive recorder cassette can be designed to be homologous on one end to the region comprising the mutated PAM from the previous round, which could increase the efficiency of getting fully edited cells at the end of the experiment.
  • the recorder cassette is designed to target a unique landing site that was incorporated by the previous recorder cassette. This increases the efficiency of recovering cells containing all of the desired mutations since the subsequent recorder cassette and barcode can only target a cell that has successfully completed the previous round of engineering.
  • FIG. 19 depicts another approach that allows the recycling of selectable markers or otherwise cures the cell of the plasmid from the previous round of engineering.
  • the transformed plasmid contains a guide nucleic acid designed to target a selectable marker or other unique sequence in the plasmid from the previous round of engineering.
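
The PAM depletion screen described above for FIGS. 6A-6C lends itself to a short computational summary. The sketch below is an illustration only, not the analysis used in the experiments: it assumes the random N4 flanking sequences have already been extracted from the NGS reads of the starting and final cell populations, and it reports candidate PAMs whose relative frequency dropped after selection. The function names, the pseudocount, the fold-depletion threshold, and the consensus rule (a position that varies among depleted PAMs becomes N) are all hypothetical choices.

```python
from collections import Counter


def depleted_pams(start_reads, final_reads, min_fold_depletion=10.0, pseudocount=1.0):
    """Return {pam: fold_depletion} for candidate PAMs depleted after selection.

    start_reads / final_reads: iterables of PAM-library sequences (e.g. 4-mers)
    from the starting and surviving populations. A PAM recognized by the
    nuclease leads to cleavage and cell death, so it should be
    under-represented in the final population.
    """
    start = Counter(start_reads)
    final = Counter(final_reads)
    n_pams = len(set(start) | set(final))
    # The pseudocount (an assumption) avoids division by zero for PAMs that vanish.
    start_total = sum(start.values()) + pseudocount * n_pams
    final_total = sum(final.values()) + pseudocount * n_pams

    result = {}
    for pam in start:
        start_freq = (start[pam] + pseudocount) / start_total
        final_freq = (final[pam] + pseudocount) / final_total
        fold = start_freq / final_freq
        if fold >= min_fold_depletion:
            result[pam] = fold
    return result


def consensus(pams):
    """Collapse equal-length PAMs into one string; variable positions become N."""
    pams = list(pams)
    return "".join(col[0] if len(set(col)) == 1 else "N" for col in zip(*pams))
```

With toy input in which TTTA and TTTC disappear from the final population while other 4-mers persist, `consensus(depleted_pams(start, final))` collapses the depleted set to a single pattern with N at the variable position. Real data would need read-depth normalization and a statistically justified threshold.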

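The plate-based editing efficiency described in the examples (white colonies divided by the total of white and red colonies on MacConkey plates) is simple arithmetic. A minimal sketch follows; the function name and the colony counts in the usage note are hypothetical and are not data from the experiments.

```python
def editing_efficiency(white_colonies, red_colonies):
    """Fraction of edited (white) colonies among all colonies counted on a plate."""
    if white_colonies < 0 or red_colonies < 0:
        raise ValueError("colony counts must be non-negative")
    total = white_colonies + red_colonies
    if total == 0:
        raise ValueError("no colonies counted")
    return white_colonies / total
```

For example, a plate with 45 white and 5 red colonies would give `editing_efficiency(45, 5)`, i.e. 45 of 50 colonies edited.
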

Abstract

Disclosed herein are nucleic acid-guided nucleases, guide nucleic acids, and targetable nuclease systems, and methods of use. Disclosed herein are engineered non-naturally occurring nucleic acid-guided nucleases, guide nucleic acids, and targetable nuclease systems, and methods of use. Targetable nuclease systems can be used to edit genetic targets, including recursive genetic engineering and trackable genetic engineering methods.

Description

BACKGROUND OF THE DISCLOSURE
Nucleic acid-guided nucleases have become important tools for research and genome engineering. The applicability of these tools can be limited by the sequence specificity requirements, expression, or delivery issues.
SEQUENCE LISTING
The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 14, 2017, is named 49022-716_201_SL.txt and is 809,751 bytes in size. This application contains a partial sequence list in Table 6.
SUMMARY OF THE DISCLOSURE
Disclosed herein are methods of modifying a target region in the genome of a cell, the method comprising: (a) contacting a cell with: a non-naturally occurring nucleic-acid-guided nuclease encoded by a nucleic acid having at least 80% identity to SEQ ID NO: 22; an engineered guide nucleic acid capable of complexing with the nucleic acid-guided nuclease; and an editing sequence encoding a nucleic acid complementary to said target region having a change in sequence relative to the target region; and (b) allowing the nuclease, guide nucleic acid, and editing sequence to create a genome edit in a target region of the genome of the cell. In some aspects, the engineered guide nucleic acid and the editing sequence are provided as a single nucleic acid. In some aspects, the single nucleic acid further comprises a mutation in a protospacer adjacent motif (PAM) site. In some aspects, the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 42. In some aspects, the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 128.
Disclosed herein are nucleic acid-guided nuclease systems comprising: (a) a non-naturally occurring nuclease encoded by a nucleic acid having at least 80% identity to SEQ ID NO: 22; (b) an engineered guide nucleic acid capable of complexing with the nucleic acid-guided nuclease; and (c) an editing sequence having a change in sequence relative to the sequence of a target region in a genome of a cell; wherein the system results in a genome edit in the target region in the genome of the cell facilitated by the nuclease, the engineered guide nucleic acid, and the editing sequence. In some aspects, the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 42. In some aspects, the nucleic acid-guided nuclease is encoded by a nucleic acid with at least 85% identity to SEQ ID NO: 128. In some aspects, the nucleic acid-guided nuclease is codon optimized for the cell to be edited. In some aspects, the engineered guide nucleic acid and the editing sequence are provided as a single nucleic acid. In some aspects, the single nucleic acid further comprises a mutation in a protospacer adjacent motif (PAM) site.
Disclosed herein are compositions for use in genome editing comprising a non-naturally occurring nuclease encoded by a nucleic acid having at least 75% identity to SEQ ID NO: 22. In some aspects, the nucleic acid has at least 80% identity to SEQ ID NO: 22. In some aspects, the nucleic acid has at least 90% identity to SEQ ID NO: 22. In some aspects, the nuclease is further codon optimized for use in cells from a particular organism. In some aspects, the nuclease is codon optimized for E. coli. In some aspects, the nuclease is codon optimized for S. cerevisiae. In some aspects, the nuclease is codon optimized for mammalian cells. In some aspects, the nucleic acid-guided nuclease has less than 40% protein identity to SEQ ID NO: 12. In some aspects, the nucleic acid-guided nuclease has less than 40% protein identity to SEQ ID NO: 108.
INCORPORATION BY REFERENCE
All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
FIG. 1A depicts a partial sequence alignment of MAD1-8 (SEQ ID NO: 1-8) and MAD10-12 (SEQ ID NO: 10-12). FIG. 1A discloses residues 703-707 of SEQ ID NO: 1, residues 625-629 of SEQ ID NO: 2, residues 587-591 of SEQ ID NO: 3, residues 654-658 of SEQ ID NO: 4, residues 581-585 of SEQ ID NO: 5, residues 637-641 of SEQ ID NO: 6, residues 590-594 of SEQ ID NO: 7, residues 645-649 of SEQ ID NO: 8, SEQ ID NO: 205, residues 619-623 of SEQ ID NO: 10 and residues 603-607 of SEQ ID NO: 12, all disclosed respectively, in order of appearance.
FIG. 1B depicts a phylogenetic tree of nucleases including MAD1-8.
FIG. 2 depicts an example protein expression construct. FIG. 2 discloses “6×-His” as SEQ ID NO: 182.
FIG. 3 depicts an example editing cassette. FIG. 3 discloses SEQ ID NOS 183-185, respectively, in order of appearance.
FIG. 4 depicts an example screening or selection experiment workflow.
FIG. 5A depicts an example protein expression construct.
FIG. 5B depicts an example editing cassette.
FIG. 5C depicts an example screening or selection experiment workflow.
FIG. 6A depicts an example protein expression construct.
FIG. 6B depicts an example editing cassette.
FIG. 6C depicts an example screening or selection experiment workflow.
FIGS. 7A-7B depict example data from a functional nuclease complex screening or selection experiment.
FIG. 8 depicts example data from a targetable nuclease complex-based editing experiment.
FIG. 9 depicts example data from a targetable nuclease complex-based editing experiment.
FIGS. 10A-10C depict example data from a targetable nuclease complex-based editing experiment.
FIG. 11 depicts an example sequence alignment of select sequences from an editing experiment. FIG. 11 discloses SEQ ID NOS 186-188, 187, 187, 187, 187, 187, 189, 186, 187 and 187, respectively, in order of appearance.
FIG. 12 depicts example data from a targetable nuclease complex-based editing experiment.
FIG. 13A depicts an example alignment of scaffold sequences (SEQ ID NOS 190-202, respectively, in order of appearance).
FIG. 13B depicts an example model of a nucleic acid-guided nuclease complexed with a guide nucleic acid and a target sequence.
FIGS. 14A-14B depict example data from a primer validation experiment.
FIG. 15 depicts example data from a targetable nuclease complex-based editing experiment.
FIG. 16 depicts example validation data comparing results from two different assays.
FIGS. 17A-17C depict an example trackable genetic engineering workflow, including a plasmid comprising an editing cassette and a recording cassette, and downstream sequencing of barcodes in order to identify the incorporated edit or mutation. FIG. 17B discloses SEQ ID NOS 203 and 204, respectively, in order of appearance.
FIG. 18 depicts an example trackable genetic engineering workflow, including iterative rounds of engineering with a different editing cassette and recorder cassette with unique barcode (BC) at each round, which can be followed by selection and tracking to confirm the successful engineering step at each round.
FIG. 19 depicts an example recursive engineering workflow.
DETAILED DESCRIPTION OF THE DISCLOSURE
The present disclosure provides nucleic acid-guided nucleases and methods of use. Often, the subject nucleic acid-guided nucleases are part of a targetable nuclease system comprising a nucleic acid-guided nuclease and a guide nucleic acid. A subject targetable nuclease system can be used to cleave, modify, and/or edit a target polynucleotide sequence, often referred to as a target sequence. A subject targetable nuclease system refers collectively to transcripts and other elements involved in the expression of or directing the activity of genes, which may include sequences encoding a subject nucleic acid-guided nuclease protein and a guide nucleic acid as disclosed herein.
Methods, systems, vectors, polynucleotides, and compositions described herein may be used in various applications including altering or modifying synthesis of a gene product, such as a protein, polynucleotide cleavage, polynucleotide editing, polynucleotide splicing; trafficking of target polynucleotide, tracing of target polynucleotide, isolation of target polynucleotide, visualization of target polynucleotide, etc. Aspects of the invention also encompass methods and uses of the compositions and systems described herein in genome engineering, e.g. for altering or manipulating the expression of one or more genes or the one or more gene products, in prokaryotic, archaeal, or eukaryotic cells, in vitro, in vivo or ex vivo.
Nucleic Acid-Guided Nucleases
Bacterial and archaeal targetable nuclease systems have emerged as powerful tools for precision genome editing. However, naturally occurring nucleases have some limitations, including expression and delivery challenges due to the nucleic acid sequence and protein size. Targetable nucleases that require PAM recognition are also limited in the sequences they can target throughout a genetic sequence. Other challenges include processivity, target recognition specificity and efficiency, and nuclease activity efficiency, which often affect genetic editing efficiency.
Non-naturally occurring targetable nucleases and non-naturally occurring targetable nuclease systems can address many of these challenges and limitations.
Disclosed herein are non-naturally occurring targetable nuclease systems. Such targetable nuclease systems are engineered to address one or more of the challenges described above and can be referred to as engineered nuclease systems. Engineered nuclease systems can comprise one or more of an engineered nuclease, such as an engineered nucleic acid-guided nuclease, an engineered guide nucleic acid, an engineered polynucleotide encoding said nuclease, or an engineered polynucleotide encoding said guide nucleic acid. Engineered nucleases, engineered guide nucleic acids, and engineered polynucleotides encoding the engineered nuclease or engineered guide nucleic acid are not naturally occurring and are not found in nature. It follows that engineered nuclease systems including one or more of these elements are non-naturally occurring.
Non-limiting examples of types of engineering that can be done to obtain a non-naturally occurring nuclease system are as follows. Engineering can include codon optimization to facilitate expression or improve expression in a host cell, such as a heterologous host cell. Engineering can reduce the size or molecular weight of the nuclease in order to facilitate expression or delivery. Engineering can alter PAM selection in order to change PAM specificity or to broaden the range of recognized PAMs. Engineering can alter, increase, or decrease stability, processivity, specificity, or efficiency of a targetable nuclease system. Engineering can alter, increase, or decrease protein stability. Engineering can alter, increase, or decrease processivity of nucleic acid scanning. Engineering can alter, increase, or decrease target sequence specificity. Engineering can alter, increase, or decrease nuclease activity. Engineering can alter, increase, or decrease editing efficiency. Engineering can alter, increase, or decrease transformation efficiency. Engineering can alter, increase, or decrease nuclease or guide nucleic acid expression.
Examples of non-naturally occurring nucleic acid sequences which are disclosed herein include sequences codon optimized for expression in bacteria, such as E. coli (e.g., SEQ ID NO: 41-60), sequences codon optimized for expression in single cell eukaryotes, such as yeast (e.g., SEQ ID NO: 127-146), sequences codon optimized for expression in multi cell eukaryotes, such as human cells (e.g., SEQ ID NO: 147-166), polynucleotides used for cloning or expression of any sequences disclosed herein (e.g., SEQ ID NO: 61-80), plasmids comprising nucleic acid sequences (e.g., SEQ ID NO: 21-40) operably linked to a heterologous promoter or nuclear localization signal or other heterologous element, proteins generated from engineered or codon optimized nucleic acid sequences (e.g., SEQ ID NO: 1-20), or engineered guide nucleic acids comprising any one of SEQ ID NO: 84-107. Such non-naturally occurring nucleic acid sequences can be amplified, cloned, assembled, synthesized, generated from synthesized oligonucleotides or dNTPs, or otherwise obtained using methods known by those skilled in the art.
Disclosed herein are nucleic acid-guided nucleases. Subject nucleases are functional in vitro, or in prokaryotic, archaeal, or eukaryotic cells for in vitro, in vivo, or ex vivo applications. Suitable nucleic acid-guided nucleases can be from an organism from a genus which includes but is not limited to Thiomicrospira, Succinivibrio, Candidatus, Porphyromonas, Acidaminococcus, Acidomonococcus, Prevotella, Smithella, Moraxella, Synergistes, Francisella, Leptospira, Catenibacterium, Kandleria, Clostridium, Dorea, Coprococcus, Enterococcus, Fructobacillus, Weissella, Pediococcus, Corynebacter, Sutterella, Legionella, Treponema, Roseburia, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphylococcus, Nitratifractor, Mycoplasma, Alicyclobacillus, Brevibacilus, Bacillus, Bacteroidetes, Brevibacilus, Carnobacterium, Clostridiaridium, Clostridium, Desulfonatronum, Desulfovibrio, Helcococcus, Leptotrichia, Listeria, Methanomethyophilus, Methylobacterium, Opitutaceae, Paludibacter, Rhodobacter, Sphaerochaeta, Tuberibacillus, Oleiphilus, Omnitrophica, Parcubacteria, and Campylobacter. Species of organism of such a genus can be as otherwise herein discussed. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a phylum which includes but is not limited to Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Spirochaetes, and Tenericutes. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a class which includes but is not limited to Erysipelotrichia, Clostridia, Bacilli, Actinobacteria, Bacteroidetes, Flavobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Spirochaetes, and Mollicutes.
Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within an order which includes but is not limited to Clostridiales, Lactobacillales, Actinomycetales, Bacteroidales, Flavobacteriales, Rhizobiales, Rhodospirillales, Burkholderiales, Neisseriales, Legionellales, Nautiliales, Campylobacterales, Spirochaetales, Mycoplasmatales, and Thiotrichales. Suitable nucleic acid-guided nucleases can be from an organism from a genus or unclassified genus within a family which includes but is not limited to Lachnospiraceae, Enterococcaceae, Leuconostocaceae, Lactobacillaceae, Streptococcaceae, Peptostreptococcaceae, Staphylococcaceae, Eubacteriaceae, Corynebacterineae, Bacteroidaceae, Flavobacterium, Cryomoorphaceae, Rhodobiaceae, Rhodospirillaceae, Acetobacteraceae, Sutterellaceae, Neisseriaceae, Legionellaceae, Nautiliaceae, Campylobacteraceae, Spirochaetaceae, Mycoplasmataceae, Pisciririckettsiaceae, and Francisellaceae. Other nucleic acid-guided nucleases have been described in US Patent Application Publication No. US20160208243 filed Dec. 18, 2015, US Application Publication No. US20140068797 filed Mar. 15, 2013, U.S. Pat. No. 8,697,359 filed Oct. 15, 2013, and Zetsche et al., Cell 2015 Oct. 22; 163(3):759-71, each of which is incorporated herein by reference in its entirety.
Some nucleic acid-guided nucleases suitable for use in the methods, systems, and compositions of the present disclosure include those derived from an organism such as, but not limited to, Thiomicrospira sp. XS5, Eubacterium rectale, Succinivibrio dextrinosolvens, Candidatus Methanoplasma termitum, Candidatus Methanomethylophilus alvus, Porphyromonas crevioricanis, Flavobacterium branchiophilum, Acidaminococcus Sp., Acidomonococcus sp., Lachnospiraceae bacterium COE1, Prevotella brevis ATCC 19188, Smithella sp. SCADC, Moraxella bovoculi, Synergistes jonesii, Bacteroidetes oral taxon 274, Francisella tularensis, Leptospira inadai serovar Lyme str. 10, Acidomonococcus sp. crystal structure (5B43) S. mutans, S. agalactiae, S. equisimilis, S. sanguinis, S. pneumonia; C. jejuni, C. coli; N. salsuginis, N. tergarcus; S. auricularis, S. carnosus; N. meningitides, N. gonorrhoeae; L. monocytogenes, L. ivanovii; C. botulinum, C. difficile, C. tetani, C. sordellii; Francisella tularensis 1, Prevotella albensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrio proteoclasticus, Butyrivibrio proteoclasticus B316, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Acidaminococcus sp. BV3L6, Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, Porphyromonas macacae, Catenibacterium sp. CAG:290, Kandleria vitulina, Clostridiales bacterium KA00274, Lachnospiraceae bacterium 3-2, Dorea longicatena, Coprococcus catus GD/7, Enterococcus columbae DSM 7374, Fructobacillus sp. 
EFB-N1, Weissella halotolerans, Pediococcus acidilactici, Lactobacillus curvatus, Streptococcus pyogenes, Lactobacillus versmoldensis, Filifactor alocis ATCC 35896, Alicyclobacillus acidoterrestris, Alicyclobacillus acidoterrestris ATCC 49025, Desulfovibrio inopinatus, Desulfovibrio inopinatus DSM 10711, Oleiphilus sp. Oleiphilus sp. HI0009, Candidatus kefeldibacteria, Parcubacteria CasY.4, Omnitrophica WOR 2 bacterium GWF2, Bacillus sp. NSP2.1, and Bacillus thermoamylovorans.
In some instances, a nucleic acid-guided nuclease disclosed herein comprises an amino acid sequence comprising at least 50% amino acid identity to any one of SEQ ID NO: 1-20. In some instances, a nuclease comprises an amino acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% amino acid identity to any one of SEQ ID NO: 1-20. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-20. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-8 or 10-12. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to any one of SEQ ID NO: 1-8 or 10-11. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to SEQ ID NO: 2. In some cases, the nucleic acid-guided nuclease comprises at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, amino acid identity to SEQ ID NO: 7.
In some cases, the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-20. In some cases, the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-8 or 10-12. In some cases, the nucleic acid-guided nuclease comprises any one of SEQ ID NO: 1-8 or 10-11. In some cases, the nucleic acid-guided nuclease comprises SEQ ID NO: 2. In some cases, the nucleic acid-guided nuclease comprises SEQ ID NO: 7.
In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 50% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 45% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 40% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 35% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110. In some instances, a nucleic acid-guided nuclease comprises an amino acid sequence comprising at most 30% amino acid identity to any one of SEQ ID NO: 12 or SEQ ID NO: 108-110.
In some instances, a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 21-40. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 21-40. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 21-40. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 21-40. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 21-28 or 30-32. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 21-28 or 30-31. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 22. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 27.
In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-40. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-28 or 30-32. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 21-28 or 30-31. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 22. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 27.
In some instances, a nucleic acid-guided nuclease disclosed herein is encoded on a nucleic acid sequence. Such a nucleic acid can be codon optimized for expression in a desired host cell. Suitable host cells can include, as non-limiting examples, prokaryotic cells such as E. coli, P. aeruginosa, B. subtilis, and V. natriegens, and eukaryotic cells such as S. cerevisiae, plant cells, insect cells, nematode cells, amphibian cells, fish cells, or mammalian cells, including human cells.
A nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in gram positive bacteria, e.g., Bacillus subtilis, or gram negative bacteria, e.g., E. coli. In some instances, a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 41-60. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 41-60. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-60. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-48 or 50-52. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 41-48 or 50-51. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 42. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 47.
In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-60. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-48 or 50-52. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 41-48 or 50-51. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 42. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 47.
A nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in a species of yeast, e.g., S. cerevisiae. In some instances, a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 127-146. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 127-146. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-146. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-146. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-134 or 136-138. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 127-134 or 136-137. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 128. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 133.
In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-146. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-134 or 136-138. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 127-134 or 136-137. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 128. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 133.
A nucleic acid sequence encoding a nucleic acid-guided nuclease can be codon optimized for expression in mammalian cells. In some instances, a nucleic acid-guided nuclease disclosed herein is encoded by a nucleic acid sequence comprising at least 50% sequence identity to any one of SEQ ID NO: 147-166. In some instances, a nuclease is encoded by a nucleic acid sequence comprising at least about 10%, 20%, 30%, 40%, 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95%, or 100% sequence identity to any one of SEQ ID NO: 147-166. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-166. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-154 or 156-158. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to any one of SEQ ID NO: 147-154 or 156-157. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 148. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence comprising at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, greater than 95% sequence identity to SEQ ID NO: 153.
In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-166. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-154 or 156-158. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of any one of SEQ ID NO: 147-154 or 156-157. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 148. In some cases, the nucleic acid-guided nuclease is encoded by the nucleic acid sequence of SEQ ID NO: 153.
A nucleic acid sequence encoding a nucleic acid-guided nuclease can be operably linked to a promoter. Such nucleic acid sequences can be linear or circular. The nucleic acid sequences can be comprised on a larger linear or circular nucleic acid sequence that comprises additional elements such as an origin of replication, selectable or screenable marker, terminator, other components of a targetable nuclease system, such as a guide nucleic acid, or an editing or recorder cassette as disclosed herein. These larger nucleic acid sequences can be recombinant expression vectors, as are described in more detail later.
Guide Nucleic Acid
In general, a guide nucleic acid can complex with a compatible nucleic acid-guided nuclease and can hybridize with a target sequence, thereby directing the nuclease to the target sequence. A subject nucleic acid-guided nuclease capable of complexing with a guide nucleic acid can be referred to as a nucleic acid-guided nuclease that is compatible with the guide nucleic acid. Likewise, a guide nucleic acid capable of complexing with a nucleic acid-guided nuclease can be referred to as a guide nucleic acid that is compatible with the nucleic acid-guided nuclease.
A guide nucleic acid can be DNA. A guide nucleic acid can be RNA. A guide nucleic acid can comprise both DNA and RNA. A guide nucleic acid can comprise modified or non-naturally occurring nucleotides. In cases where the guide nucleic acid comprises RNA, the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid, linear construct, or editing cassette as disclosed herein.
A guide nucleic acid can comprise a guide sequence. A guide sequence is a polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a complexed nucleic acid-guided nuclease to the target sequence. The degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. Preferably the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
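The degree of complementarity between a guide sequence and a target sequence can be illustrated with a short calculation. The sketch below makes several simplifying assumptions that are not part of the disclosure: the guide and the target strand are the same length, they are aligned end to end without gaps (rather than by a general alignment algorithm), only Watson-Crick pairs are counted, and both sequences are written 5′ to 3′. The function names and the example target are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement in the DNA alphabet; RNA input (U) is read as T."""
    dna = seq.upper().replace("U", "T")
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that pair with the target strand.

    Assumptions: equal lengths, ungapped antiparallel pairing, Watson-Crick
    pairs only (G-U wobble pairs are not counted).
    """
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal in length")
    guide_dna = guide.upper().replace("U", "T")
    ideal_guide = reverse_complement(target)  # a perfectly pairing guide
    paired = sum(1 for g, i in zip(guide_dna, ideal_guide) if g == i)
    return 100.0 * paired / len(guide_dna)


# Hypothetical 20-nucleotide target strand, for illustration only.
target = "ATGCCGTTAGCATCGGATCA"
perfect_guide = reverse_complement(target).replace("T", "U")  # RNA guide
print(percent_complementarity(perfect_guide, target))
```

A guide carrying a single mismatch against this 20-nucleotide target would score 95% under these assumptions.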
A guide nucleic acid can comprise a scaffold sequence. In general, a “scaffold sequence” includes any sequence that has sufficient sequence to promote formation of a targetable nuclease complex, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid comprising a scaffold sequence and a guide sequence. Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are comprised or encoded on the same polynucleotide. In some cases, the one or two sequence regions are comprised or encoded on separate polynucleotides. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions. In some embodiments, the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, at least one of the two sequence regions is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
A scaffold sequence of a subject guide nucleic acid can comprise a secondary structure. A secondary structure can comprise a pseudoknot region. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by secondary structures within the scaffold sequence. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid-guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.
A scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-107. A scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-103. A scaffold sequence can comprise the sequence of any one of SEQ ID NO: 84-91 or 93-95. A scaffold sequence can comprise the sequence of any one of SEQ ID NO: 88, 93, 94, or 95. A scaffold sequence can comprise the sequence of SEQ ID NO: 88. A scaffold sequence can comprise the sequence of SEQ ID NO: 93. A scaffold sequence can comprise the sequence of SEQ ID NO: 94. A scaffold sequence can comprise the sequence of SEQ ID NO: 95.
In some aspects, the invention provides a nuclease that binds to a guide nucleic acid comprising a conserved scaffold sequence. For example, the nucleic acid-guided nucleases for use in the present disclosure can bind to a conserved pseudoknot region as shown in FIG. 13A. Specifically, the nucleic acid-guided nucleases for use in the present disclosure can bind to a guide nucleic acid comprising a conserved pseudoknot region as shown in FIG. 13A. Certain nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-1 (SEQ ID NO: 172). Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-3 (SEQ ID NO: 173). Still other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-4 (SEQ ID NO: 174). Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-5 (SEQ ID NO: 175). Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-6 (SEQ ID NO: 176). Still other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-7 (SEQ ID NO: 177). 
Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-8 (SEQ ID NO: 178). Other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-10 (SEQ ID NO: 179). Still other nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-11 (SEQ ID NO: 180). Certain nucleic acid-guided nucleases for use in the present disclosure can bind to a pseudoknot region having at least 75%, 80%, 85%, 90%, 95%, or 100% sequence identity to the pseudoknot region of Scaffold-12 (SEQ ID NO: 181).
A guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-107. A guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-103. A guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 84-91 or 93-95. A guide nucleic acid can comprise the sequence of any one of SEQ ID NO: 88, 93, 94, or 95. A guide nucleic acid can comprise the sequence of SEQ ID NO: 88. A guide nucleic acid can comprise the sequence of SEQ ID NO: 93. A guide nucleic acid can comprise the sequence of SEQ ID NO: 94. A guide nucleic acid can comprise the sequence of SEQ ID NO: 95.
In aspects of the invention the term “guide nucleic acid” refers to one or more polynucleotides comprising 1) a guide sequence capable of hybridizing to a target sequence and 2) a scaffold sequence capable of interacting with or complexing with a nucleic acid-guided nuclease as described herein. A guide nucleic acid may be provided as one or more nucleic acids. In specific embodiments, the guide sequence and the scaffold sequence are provided as a single polynucleotide.
A guide nucleic acid can be compatible with a nucleic acid-guided nuclease when the two elements can form a functional targetable nuclease complex capable of cleaving a target sequence. Often, a compatible scaffold sequence for a compatible guide nucleic acid can be found by scanning sequences adjacent to a native nucleic acid-guided nuclease locus. In other words, native nucleic acid-guided nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.
Nucleic acid-guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring.
Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features. Common features can include sequence outside a pseudoknot region. Common features can include a pseudoknot region. Common features can include a primary sequence or secondary structure.
A guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence. A guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid. Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.
Targetable Nuclease System
Disclosed herein are targetable nuclease systems. A targetable nuclease system can comprise a nucleic acid-guided nuclease and a compatible guide nucleic acid. A targetable nuclease system can comprise a nucleic acid-guided nuclease or a polynucleotide sequence encoding the nucleic acid-guided nuclease. A targetable nuclease system can comprise a guide nucleic acid or a polynucleotide sequence encoding the guide nucleic acid.
In general, a targetable nuclease system as disclosed herein is characterized by elements that promote the formation of a targetable nuclease complex at the site of a target sequence, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease and a guide nucleic acid.
A guide nucleic acid together with a nucleic acid-guided nuclease forms a targetable nuclease complex which is capable of binding to a target sequence within a target polynucleotide, as determined by the guide sequence of the guide nucleic acid.
In most cases, to generate a double-stranded break, a targetable nuclease complex binds to a target sequence as determined by the guide nucleic acid, and the nuclease must recognize a protospacer adjacent motif (PAM) sequence adjacent to the target sequence.
A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-20 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-12 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-11 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid. In any of these cases, the guide nucleic acid can comprise a scaffold sequence compatible with the nucleic acid-guided nuclease. In any of these cases, the guide nucleic acid can further comprise a guide sequence. The guide sequence can be engineered to target any desired target sequence. The guide sequence can be engineered to be complementary to any desired target sequence. The guide sequence can be engineered to hybridize to any desired target sequence.
A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-20 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-107. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-12 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-95. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of any one of SEQ ID NO: 1-8 or 10-11 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 84-91 or 93-95. In any of these cases, the guide nucleic acid can further comprise a guide sequence. The guide sequence can be engineered to target any desired target sequence. The guide sequence can be engineered to be complementary to any desired target sequence. The guide sequence can be engineered to hybridize to any desired target sequence.
A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 88, 93, 94, or 95. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 88. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 93. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 94. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 2 and a compatible guide nucleic acid comprising SEQ ID NO: 95. In any of these cases, the guide nucleic acid can further comprise a guide sequence. The guide sequence can be engineered to target any desired target sequence. The guide sequence can be engineered to be complementary to any desired target sequence. The guide sequence can be engineered to hybridize to any desired target sequence.
A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising any one of SEQ ID NO: 88, 93, 94, or 95. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 88. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 93. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 94. A targetable nuclease complex can comprise a nucleic acid-guided nuclease of SEQ ID NO: 7 and a compatible guide nucleic acid comprising SEQ ID NO: 95. In any of these cases, the guide nucleic acid can further comprise a guide sequence. The guide sequence can be engineered to target any desired target sequence. The guide sequence can be engineered to be complementary to any desired target sequence. The guide sequence can be engineered to hybridize to any desired target sequence.
A target sequence of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to a prokaryotic or eukaryotic cell, or in vitro. For example, the target sequence can be a polynucleotide residing in the nucleus of the eukaryotic cell. A target sequence can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA). Without wishing to be bound by theory, it is believed that the target sequence should be associated with a PAM; that is, a short sequence recognized by a targetable nuclease complex. The precise sequence and length requirements for a PAM differ depending on the nucleic acid-guided nuclease used, but PAMs are typically 2-5 base pair sequences adjacent to the target sequence. Examples of PAM sequences are given in the examples section below, and the skilled person will be able to identify further PAM sequences for use with a given nucleic acid-guided nuclease. Further, engineering of the PAM Interacting (PI) domain may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of a nucleic acid-guided nuclease genome engineering platform. Nucleic acid-guided nucleases may be engineered to alter their PAM specificity, for example as described in Kleinstiver B P et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul. 23; 523 (7561): 481-5. doi: 10.1038/nature14592.
A PAM site is a nucleotide sequence in proximity to a target sequence. In most cases, a nucleic acid-guided nuclease can only cleave a target sequence if an appropriate PAM is present. PAMs are nucleic acid-guided nuclease-specific and can be different between two different nucleic acid-guided nucleases. A PAM can be 5′ or 3′ of a target sequence. A PAM can be upstream or downstream of a target sequence. A PAM can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides in length. Often, a PAM is between 2-6 nucleotides in length.
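Because a PAM can lie 5′ or 3′ of a target sequence and its requirements differ between nucleases, candidate target sites can be enumerated by scanning a polynucleotide for PAM matches and reading off the flanking sequence. The following is a minimal sketch under stated assumptions: only the strand supplied is scanned, the PAM is written with a small subset of IUPAC ambiguity codes, and the PAM motifs, target length, and function name used here are placeholders chosen for illustration, not the PAMs reported in the examples section.

```python
import re

# Subset of IUPAC nucleotide codes, sufficient for this illustration.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def find_pam_sites(seq: str, pam: str, target_len: int, pam_side: str = "5prime"):
    """Return (pam_start, target_sequence) for each PAM match on this strand.

    pam_side="5prime": the target immediately follows the PAM.
    pam_side="3prime": the target immediately precedes the PAM.
    Matches whose target would run off either end of seq are skipped.
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    seq = seq.upper()
    pattern = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping PAM matches are all reported.
    regex = re.compile(f"(?=({pattern}))")
    sites = []
    for match in regex.finditer(seq):
        pam_start = match.start()
        pam_end = pam_start + len(pam)
        if pam_side == "5prime":
            t_start, t_end = pam_end, pam_end + target_len
        else:
            t_start, t_end = pam_start - target_len, pam_start
        if t_start < 0 or t_end > len(seq):
            continue
        sites.append((pam_start, seq[t_start:t_end]))
    return sites


# Placeholder inputs, for illustration only.
print(find_pam_sites("GGTTTACGTACGTAAGG", "TTN", 5, "5prime"))
print(find_pam_sites("ACGTACGG", "NGG", 4, "3prime"))
```

To cover both strands, the same scan could be repeated on the reverse complement of the input sequence.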
In some examples, a PAM can be provided on a separate oligonucleotide. In such cases, providing the PAM on an oligonucleotide allows cleavage of a target sequence that otherwise could not be cleaved because no adjacent PAM is present on the same polynucleotide as the target sequence.
Polynucleotide sequences encoding a component of a targetable nuclease system can comprise one or more vectors. In general, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g. circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art. One type of vector is a “plasmid,” which refers to a circular double stranded DNA loop into which additional DNA segments can be inserted, such as by standard molecular cloning techniques. Another type of vector is a viral vector, wherein virally-derived DNA or RNA sequences are present in the vector for packaging into a virus (e.g. retroviruses, replication defective retroviruses, adenoviruses, replication defective adenoviruses, and adeno-associated viruses). Viral vectors also include polynucleotides carried by a virus for transfection into a host cell. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g. bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively-linked. Such vectors are referred to herein as “expression vectors.” Common expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. Further discussion of vectors is provided herein.
Recombinant expression vectors can comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory elements, which may be selected on the basis of the host cells to be used for expression, that are operably linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, “operably linked” is intended to mean that the nucleotide sequence of interest is linked to the regulatory element(s) in a manner that allows for expression of the nucleotide sequence (e.g. in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell). With regards to recombination and cloning methods, mention is made of U.S. patent application Ser. No. 10/815,730, published Sep. 2, 2004 as US 2004-0171156 A1, the contents of which are herein incorporated by reference in their entirety.
In some embodiments, a regulatory element is operably linked to one or more elements of a targetable nuclease system so as to drive expression of the one or more components of the targetable nuclease system.
In some embodiments, a vector comprises a regulatory element operably linked to a polynucleotide sequence encoding a nucleic acid-guided nuclease. The polynucleotide sequence encoding the nucleic acid-guided nuclease can be codon optimized for expression in particular cells, such as prokaryotic or eukaryotic cells. Eukaryotic cells can be yeast, fungi, algae, plant, animal, or human cells. Eukaryotic cells may be those of or derived from a particular organism, such as a mammal, including but not limited to human, mouse, rat, rabbit, dog, or non-human mammal including non-human primate.
In general, codon optimization refers to a process of modifying a nucleic acid sequence for enhanced expression in the host cells of interest by replacing at least one codon (e.g. about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more codons) of the native sequence with codons that are more frequently or most frequently used in the genes of that host cell while maintaining the native amino acid sequence. Various species exhibit particular bias for certain codons of a particular amino acid. Codon bias (differences in codon usage between organisms) often correlates with the efficiency of translation of messenger RNA (mRNA), which is in turn believed to be dependent on, among other things, the properties of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell is generally a reflection of the codons used most frequently in peptide synthesis. Accordingly, genes can be tailored for optimal gene expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, at the “Codon Usage Database” available at www.kazusa.or.jp/codon/ (visited Jul. 9, 2002), and these tables can be adapted in a number of ways. See Nakamura, Y., et al. “Codon usage tabulated from the international DNA sequence databases: status for the year 2000” Nucl. Acids Res. 28:292 (2000). Computer algorithms for codon optimizing a particular sequence for expression in a particular host cell, such as Gene Forge (Aptagen; Jacobus, Pa.), are also available. In some embodiments, one or more codons (e.g. 1, 2, 3, 4, 5, 10, 15, 20, 25, 50, or more, or all codons) in a sequence encoding an engineered nuclease correspond to the most frequently used codon for a particular amino acid.
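By way of illustration only, the codon replacement described above can be sketched in a few lines of Python. The sketch assumes the standard genetic code and a hypothetical table of preferred codons (`HYPOTHETICAL_PREFERRED`); the preferences and the short input sequence are placeholders chosen for the example, not data from a codon usage table or a sequence of the disclosure. Codons for amino acids without a listed preference are left unchanged, and the encoded amino acid sequence is maintained.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(dna):
    """Split an in-frame DNA coding sequence into upper-case codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(GENETIC_CODE[codon] for codon in split_codons(dna))


def codon_optimize(dna, preferred):
    """Replace each codon with the host-preferred synonymous codon, if any."""
    # Reject a preference table that would change the encoded protein.
    for aa, codon in preferred.items():
        if GENETIC_CODE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    return "".join(
        preferred.get(GENETIC_CODE[codon], codon) for codon in split_codons(dna)
    )


# Hypothetical host preferences (illustrative placeholders only).
HYPOTHETICAL_PREFERRED = {"L": "CTG", "K": "AAA"}

native = "ATGTTAAAGCTTTAA"
optimized = codon_optimize(native, HYPOTHETICAL_PREFERRED)
print(optimized)
print(translate(native) == translate(optimized))
```

In practice, the preference table would be derived from a codon usage table for the host cell of interest, and additional constraints (e.g. avoiding particular restriction sites) might be applied.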
In some embodiments, a vector encodes a nucleic acid-guided nuclease comprising one or more nuclear localization sequences (NLSs), such as about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the engineered nuclease comprises about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g. one or more NLS at the amino-terminus and one or more NLS at the carboxy terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In a preferred embodiment of the invention, the engineered nuclease comprises at most 6 NLSs. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 111); the NLS from nucleoplasmin (e.g. the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 112)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 113) or RQRRNELKRSP (SEQ ID NO: 114); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 115); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 116) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 117) and PPKKARED (SEQ ID NO: 118) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 119) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 120) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 121) and PKQKKRK (SEQ ID NO: 122) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 123) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 124) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 125) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 126) of the steroid hormone receptors (human) glucocorticoid.
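As a non-limiting illustration of the proximity criterion described above, the following Python sketch reports whether any copy of an NLS lies within a chosen number of amino acids of either terminus. The function names, the distance convention (the count of residues between the terminus and the nearest residue of the NLS), and the toy polypeptide are assumptions made for the example; only the SV40 NLS sequence is taken from the text.

```python
SV40_NLS = "PKKKRKV"  # SEQ ID NO: 111, as recited above


def find_all(protein, motif):
    """Return the 0-based start index of every occurrence of motif."""
    starts = []
    index = protein.find(motif)
    while index != -1:
        starts.append(index)
        index = protein.find(motif, index + 1)
    return starts


def nls_near_termini(protein, nls, window):
    """Report whether any NLS copy lies within `window` residues of a terminus.

    Assumed convention: the distance is the number of residues between the
    terminus and the nearest amino acid of the NLS.
    """
    starts = find_all(protein, nls)
    n_distances = [start for start in starts]
    c_distances = [len(protein) - (start + len(nls)) for start in starts]
    return {
        "N": any(d <= window for d in n_distances),
        "C": any(d <= window for d in c_distances),
    }


# Toy polypeptide (hypothetical): one NLS near each terminus.
toy_nuclease = "M" + SV40_NLS + "A" * 100 + SV40_NLS + "GS"
print(nls_near_termini(toy_nuclease, SV40_NLS, window=5))
```

The same check could be repeated for each NLS of interest when a polypeptide carries more than one kind of NLS.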
In general, the one or more NLSs are of sufficient strength to drive accumulation of the nucleic acid-guided nuclease in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-guided nuclease, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g. a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of the nucleic acid-guided nuclease complex formation (e.g. assay for DNA cleavage or mutation at the target sequence, or assay for altered gene expression activity affected by targetable nuclease complex formation and/or nucleic acid-guided nuclease activity), as compared to a control not exposed to the nucleic acid-guided nuclease or targetable nuclease complex, or exposed to a nucleic acid-guided nuclease lacking the one or more NLSs.
A nucleic acid-guided nuclease and one or more guide nucleic acids can be delivered either as DNA or RNA. Delivery of a nucleic acid-guided nuclease and guide nucleic acid both as RNA (unmodified or containing base or backbone modifications) molecules can be used to reduce the amount of time that the nucleic acid-guided nuclease persists in the cell. This may reduce the level of off-target cleavage activity in the target cell. Since a nucleic acid-guided nuclease delivered as mRNA takes time to be translated into protein, it might be advantageous to deliver the guide nucleic acid several hours following the delivery of the nucleic acid-guided nuclease mRNA, to maximize the level of guide nucleic acid available for interaction with the nucleic acid-guided nuclease protein. In other cases, the nucleic acid-guided nuclease mRNA and guide nucleic acid are delivered concomitantly. In other examples, the guide nucleic acid is delivered sequentially, such as 0.5, 1, 2, 3, 4, or more hours after the nucleic acid-guided nuclease mRNA.
In situations where guide nucleic acid amount is limiting, it may be desirable to introduce a nucleic acid-guided nuclease as mRNA and guide nucleic acid in the form of a DNA expression cassette with a promoter driving the expression of the guide nucleic acid. This way the amount of guide nucleic acid available will be amplified via transcription.
Guide nucleic acid in the form of RNA or encoded on a DNA expression cassette can be introduced into a host cell comprising a nucleic acid-guided nuclease encoded on a vector or chromosome. The guide nucleic acid may be provided in the cassette as one or more polynucleotides, which may be contiguous or non-contiguous in the cassette. In specific embodiments, the guide nucleic acid is provided in the cassette as a single contiguous polynucleotide.
A variety of delivery systems can be used to introduce a nucleic acid-guided nuclease (DNA or RNA) and guide nucleic acid (DNA or RNA) into a host cell. These include the use of yeast systems, lipofection systems, microinjection systems, biolistic systems, virosomes, liposomes, immunoliposomes, polycations, lipid:nucleic acid conjugates, virions, artificial virions, viral vectors, electroporation, cell permeable peptides, nanoparticles, nanowires (Shalek et al., Nano Letters, 2012), and exosomes. Molecular Trojan horse liposomes (Pardridge et al., Cold Spring Harb Protoc; 2010; doi:10.1101/pdb.prot5407) may be used to deliver an engineered nuclease and guide nucleic acid across the blood brain barrier.
In some embodiments, an editing template is also provided. An editing template may be a component of a vector as described herein, contained in a separate vector, or provided as a separate polynucleotide, such as an oligonucleotide, linear polynucleotide, or synthetic polynucleotide. In some cases, an editing template is on the same polynucleotide as a guide nucleic acid. In some embodiments, an editing template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-guided nuclease as a part of a complex as disclosed herein. An editing template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In some embodiments, the editing template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, an editing template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g. about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, or more nucleotides). In some embodiments, when an editing template sequence and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
In many examples, an editing template comprises at least one mutation compared to the target sequence. An editing template can comprise an insertion, deletion, modification, or any combination thereof compared to the target sequence. Examples of some editing templates are described in more detail in a later section.
In some aspects, the invention provides methods comprising delivering one or more polynucleotides, such as one or more vectors or linear polynucleotides as described herein, one or more transcripts thereof, and/or one or more proteins transcribed therefrom, to a host cell. In some aspects, the invention further provides cells produced by such methods, and organisms comprising or produced from such cells. In some embodiments, an engineered nuclease in combination with (and optionally complexed with) a guide nucleic acid is delivered to a cell.
Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids in cells, such as prokaryotic cells, eukaryotic cells, mammalian cells, or target tissues. Such methods can be used to administer nucleic acids encoding components of an engineered nucleic acid-guided nuclease system to cells in culture, or in a host organism. Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of gene therapy procedures, see Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology Doerfler and Bohm (eds) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).
Methods of non-viral delivery of nucleic acids include lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424; WO 91/16024. Delivery can be to cells (e.g. in vitro or ex vivo administration) or target tissues (e.g. in vivo administration).
The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).
The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in culture or in the host and trafficking the viral payload to the nucleus or host cell genome. Viral vectors can be administered directly to cells in culture or to patients (in vivo), or they can be used to treat cells in vitro, and the modified cells may optionally be administered to patients (ex vivo). Conventional viral based systems could include retroviral, lentivirus, adenoviral, adeno-associated and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system would therefore depend on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats (LTRs) with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), simian immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700).
In applications where transient expression is preferred, adenoviral based systems may be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system.
Adeno-associated virus (“AAV”) vectors may also be used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).
In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein. In some embodiments, a cell is transfected in vitro, in culture, or ex vivo. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line.
In some embodiments, a cell transfected with one or more vectors, linear polynucleotides, polypeptides, nucleic acid-protein complexes, or any combination thereof as described herein is used to establish a new cell line comprising one or more transfection-derived sequences. In some embodiments, a cell transiently transfected with the components of an engineered nucleic acid-guided nuclease system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of an engineered nuclease complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
In some embodiments, one or more vectors described herein are used to produce a non-human transgenic cell, organism, animal, or plant. In some embodiments, the transgenic animal is a mammal, such as a mouse, rat, or rabbit. Methods for producing transgenic cells, organisms, plants, and animals are known in the art, and generally begin with a method of cell transformation or transfection, such as described herein.
Methods of Use
In the context of formation of an engineered nuclease complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of an engineered nuclease complex. A target sequence may comprise any polynucleotide, such as DNA, RNA, or a DNA-RNA hybrid. A target sequence can be located in the nucleus or cytoplasm of a cell. A target sequence can be located in vitro or in a cell-free environment.
Typically, formation of an engineered nuclease complex comprising a guide nucleic acid hybridized to a target sequence and complexed with one or more engineered nucleases as disclosed herein results in cleavage of one or both strands in or near (e.g. within 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, or more base pairs from) the target sequence. Cleavage can occur within a target sequence, 5′ of the target sequence, upstream of a target sequence, 3′ of the target sequence, or downstream of a target sequence.
In some embodiments, one or more vectors driving expression of one or more components of a targetable nuclease system are introduced into a host cell or in vitro so as to allow formation of a targetable nuclease complex at one or more target sites. For example, a nucleic acid-guided nuclease and a guide nucleic acid could each be operably linked to separate regulatory elements on separate vectors. Alternatively, two or more of the elements, expressed from the same or different regulatory elements, may be combined in a single vector, with one or more additional vectors providing any components of the targetable nuclease system not included in the first vector. Targetable nuclease system elements that are combined in a single vector may be arranged in any suitable orientation, such as one element located 5′ with respect to (“upstream” of) or 3′ with respect to (“downstream” of) a second element. The coding sequence of one element may be located on the same or opposite strand of the coding sequence of a second element, and oriented in the same or opposite direction. In some embodiments, a single promoter drives expression of a transcript encoding a nucleic acid-guided nuclease and one or more guide nucleic acids. In some embodiments, a nucleic acid-guided nuclease and one or more guide nucleic acids are operably linked to and expressed from the same promoter. In other embodiments, one or more guide nucleic acids or polynucleotides encoding the one or more guide nucleic acids are introduced into a cell or in vitro environment already comprising a nucleic acid-guided nuclease or polynucleotide sequence encoding the nucleic acid-guided nuclease.
When multiple different guide sequences are used, a single expression construct may be used to target nuclease activity to multiple different, corresponding target sequences within a cell or in vitro. For example, a single vector may comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, or more guide sequences. In some embodiments, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more such guide-sequence-containing vectors may be provided, and optionally delivered to a cell or in vitro.
Methods and compositions disclosed herein may comprise more than one guide nucleic acid, wherein each guide nucleic acid has a different guide sequence, thereby targeting a different target sequence. In such cases, multiple guide nucleic acids can be used in multiplexing, wherein multiple targets are targeted simultaneously. Additionally or alternatively, the multiple guide nucleic acids are introduced into a population of cells, such that each cell in the population receives a different or random guide nucleic acid, thereby targeting multiple different target sequences across a population of cells. In such cases, the collection of subsequently altered cells can be referred to as a library.
Methods and compositions disclosed herein may comprise multiple different nucleic acid-guided nucleases, each with one or more different corresponding guide nucleic acids, thereby allowing targeting of different target sequences by different nucleic acid-guided nucleases. In some such cases, each nucleic acid-guided nuclease can correspond to a distinct plurality of guide nucleic acids, allowing two or more non overlapping, partially overlapping, or completely overlapping multiplexing events.
In some embodiments, the nucleic acid-guided nuclease has DNA cleavage activity or RNA cleavage activity. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands at the location of a target sequence, such as within the target sequence and/or within the complement of the target sequence. In some embodiments, the nucleic acid-guided nuclease directs cleavage of one or both strands within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, 200, 500, or more base pairs from the first or last nucleotide of a target sequence.
In some embodiments, a nucleic acid-guided nuclease may form a component of an inducible system. The inducible nature of the system would allow for spatiotemporal control of gene editing or gene expression using a form of energy. The form of energy may include but is not limited to electromagnetic radiation, sound energy, chemical energy, light energy, temperature, and thermal energy. Examples of inducible systems include tetracycline inducible promoters (Tet-On or Tet-Off), small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), or light inducible systems (Phytochrome, LOV domains, or cryptochrome). In one embodiment, the nucleic acid-guided nuclease may be a part of a Light Inducible Transcriptional Effector (LITE) to direct changes in transcriptional activity in a sequence-specific manner. The components of a light inducible system may include a nucleic acid-guided nuclease, a light-responsive cytochrome heterodimer (e.g. from Arabidopsis thaliana), and a transcriptional activation/repression domain. Further examples of inducible DNA binding proteins and methods for their use are provided in U.S. 61/736,465 and U.S. 61/721,283, which are hereby incorporated by reference in their entirety. An inducible system can be temperature inducible such that the system is turned on or off by increasing or decreasing the temperature. In some temperature inducible systems, increasing the temperature turns the system on. In some temperature inducible systems, increasing the temperature turns the system off.
In some aspects, the invention provides for methods of modifying a target sequence in vitro, or in a prokaryotic or eukaryotic cell, which may be in vivo, ex vivo, or in vitro. In some embodiments, the method comprises sampling a cell or population of cells such as prokaryotic cells, or those from a human or non-human animal or plant (including micro-algae), and modifying the cell or cells. Culturing may occur at any stage in vitro or ex vivo. The cell or cells may even be re-introduced into the host, such as a non-human animal or plant (including micro-algae). For re-introduced cells it is particularly preferred that the cells are stem cells.
In some embodiments, the method comprises allowing a targetable nuclease complex to bind to the target sequence to effect cleavage of said target sequence, thereby modifying the target sequence, wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease complexed with a guide nucleic acid wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within a target polynucleotide.
In some aspects, the invention provides a method of modifying expression of a target polynucleotide in vitro or in a prokaryotic or eukaryotic cell. In some embodiments, the method comprises allowing a targetable nuclease complex to bind to a target sequence within the target polynucleotide such that said binding results in increased or decreased expression of said target polynucleotide; wherein the targetable nuclease complex comprises a nucleic acid-guided nuclease complexed with a guide nucleic acid, and wherein the guide sequence of the guide nucleic acid is hybridized to a target sequence within said target polynucleotide. Similar considerations apply as above for methods of modifying a target polynucleotide. In fact, these sampling, culturing and re-introduction options apply across the aspects of the present invention.
In some aspects, the invention provides kits containing any one or more of the elements disclosed in the above methods and compositions. Elements may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. In some embodiments, the kit includes instructions in one or more languages, for example in more than one language.
In some embodiments, a kit comprises one or more reagents for use in a process utilizing one or more of the elements described herein. Reagents may be provided in any suitable container. For example, a kit may provide one or more reaction or storage buffers. Reagents may be provided in a form that is usable in a particular assay, or in a form that requires addition of one or more other components before use (e.g. in concentrate or lyophilized form). A buffer can be any buffer, including but not limited to a sodium carbonate buffer, a sodium bicarbonate buffer, a borate buffer, a Tris buffer, a MOPS buffer, a HEPES buffer, and combinations thereof. In some embodiments, the buffer is alkaline. In some embodiments, the buffer has a pH from about 7 to about 10. In some embodiments, the kit comprises one or more oligonucleotides corresponding to a guide sequence for insertion into a vector so as to operably link the guide sequence and a regulatory element. In some embodiments, the kit comprises an editing template.
In some aspects, the invention provides methods for using one or more elements of an engineered targetable nuclease system. A targetable nuclease complex of the disclosure provides an effective means for modifying a target sequence within a target polynucleotide. A targetable nuclease complex of the disclosure has a wide variety of utility including modifying (e.g., deleting, inserting, translocating, inactivating, activating) a target sequence in a multiplicity of cell types. As such, a targetable nuclease complex of the invention has a broad spectrum of applications in, e.g., biochemical pathway optimization, genome-wide studies, genome engineering, gene therapy, drug screening, disease diagnosis, and prognosis. An exemplary targetable nuclease complex comprises a nucleic acid-guided nuclease as disclosed herein complexed with a guide nucleic acid, wherein the guide sequence of the guide nucleic acid can hybridize to a target sequence within the target polynucleotide. A guide nucleic acid can comprise a guide sequence linked to a scaffold sequence. A scaffold sequence can comprise one or more sequence regions with a degree of complementarity such that together they form a secondary structure. In some cases, the one or more sequence regions are comprised or encoded on the same polynucleotide. In some cases, the one or more sequence regions are comprised or encoded on separate polynucleotides.
Provided herein are methods of cleaving a target polynucleotide. The method comprises cleaving a target polynucleotide using a targetable nuclease complex that binds to a target sequence within a target polynucleotide and effects cleavage of said target polynucleotide. Typically, the targetable nuclease complex of the invention, when introduced into a cell, creates a break (e.g., a single or a double strand break) in the target sequence. For example, the method can be used to cleave a target gene in a cell, or to replace a wildtype sequence with a modified sequence.
The break created by the targetable nuclease complex can be repaired by a repair process such as the error prone non-homologous end joining (NHEJ) pathway, the high fidelity homology-directed repair (HDR) pathway, or by recombination pathways. During these repair processes, an editing template can be introduced into the genome sequence. In some methods, the HDR or recombination process is used to modify a target sequence. For example, an editing template comprising a sequence to be integrated flanked by an upstream sequence and a downstream sequence is introduced into a cell. The upstream and downstream sequences share sequence similarity with either side of the site of integration in the chromosome, target vector, or target polynucleotide.
An editing template can be DNA or RNA, e.g., a DNA plasmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), a viral vector, a linear piece of DNA, a PCR fragment, oligonucleotide, synthetic polynucleotide, a naked nucleic acid, or a nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer.
An editing template polynucleotide can comprise a sequence to be integrated (e.g., a mutated gene). A sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function. The sequence to be integrated may be a mutated form or variant of an endogenous wildtype sequence. Alternatively, the sequence to be integrated may be a wildtype version of an endogenous mutated sequence. Additionally or alternatively, the sequence to be integrated may be a variant or mutated form of an endogenous mutated or variant sequence.
Upstream and downstream sequences in an editing template polynucleotide can be selected to promote recombination between the target polynucleotide of interest and the editing template polynucleotide. The upstream sequence can be a nucleic acid sequence having sequence similarity with the sequence upstream of the targeted site for integration. Similarly, the downstream sequence can be a nucleic acid sequence having similarity with the sequence downstream of the targeted site of integration. The upstream and downstream sequences in an editing template can have 75%, 80%, 85%, 90%, 95%, or 100% sequence identity with the targeted polynucleotide. Preferably, the upstream and downstream sequences in the editing template polynucleotide have about 95%, 96%, 97%, 98%, 99%, or 100% sequence identity with the targeted polynucleotide. In some methods, the upstream and downstream sequences in the editing template polynucleotide have about 99% or 100% sequence identity with the targeted polynucleotide.
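As a rough illustration of the identity thresholds described above, the following sketch computes ungapped percent identity between a homology arm and the corresponding target flank. The function name and both example sequences are hypothetical, and a practical comparison would typically rely on a gapped alignment rather than this position-by-position count.

```python
def percent_identity(arm: str, target_flank: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(arm) != len(target_flank):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    if not arm:
        raise ValueError("sequences must be non-empty")
    # Count positions where the two sequences carry the same base.
    matches = sum(a == b for a, b in zip(arm.upper(), target_flank.upper()))
    return 100.0 * matches / len(arm)


# Hypothetical 20 bp flank and an arm that differs at the final position.
target_flank = "ATGGCTAGCTAGGATCCGAT"
arm = "ATGGCTAGCTAGGATCCGAA"
identity = percent_identity(arm, target_flank)  # 19 of 20 positions match
```

Under these assumptions, an arm with one mismatch in 20 positions sits at the 95% level named in the preferred range.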
An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequence has about 15 bp to about 50 bp, about 30 bp to about 100 bp, about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
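The layout described above, a sequence to be integrated flanked by upstream and downstream sequences taken from either side of the integration site, can be sketched as follows. This is a minimal illustration only: the function name, the coordinate convention (the insert lands between positions `site - 1` and `site`), and the default length bounds are assumptions chosen to mirror the broadest range stated in the text.

```python
def build_editing_template(target: str, site: int, insert: str, arm_len: int,
                           min_arm: int = 20, max_arm: int = 2500) -> str:
    """Return upstream arm + insert + downstream arm for an integration site.

    min_arm and max_arm are illustrative bounds, not requirements.
    """
    if not min_arm <= arm_len <= max_arm:
        raise ValueError("arm length outside the illustrative range")
    if site - arm_len < 0 or site + arm_len > len(target):
        raise ValueError("target is too short for arms of this length at this site")
    upstream = target[site - arm_len:site]      # matches sequence 5' of the site
    downstream = target[site:site + arm_len]    # matches sequence 3' of the site
    return upstream + insert + downstream


# Hypothetical 60 bp target with the integration site at position 30.
target = "A" * 30 + "C" * 30
template = build_editing_template(target, site=30, insert="GGG", arm_len=20)
```

Because the arms are copied directly from the target, they have 100% identity with it; lower-identity arms would be introduced by editing the slices before assembly.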
In some methods, the editing template polynucleotide may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the invention can be constructed using recombinant techniques (see, for example, Green and Sambrook et al., 2014 and Ausubel et al., 2017).
In an exemplary method for modifying a target polynucleotide by integrating an editing template polynucleotide, a double-stranded break is introduced into the genome sequence by an engineered nuclease complex, and the break can be repaired via homologous recombination using an editing template such that the template is integrated into the target polynucleotide. The presence of a double-stranded break can increase the efficiency of integration of the editing template.
Disclosed herein are methods for modifying expression of a polynucleotide in a cell. Some methods comprise increasing or decreasing expression of a target polynucleotide by using a targetable nuclease complex that binds to the target polynucleotide.
In some methods, a target polynucleotide can be inactivated to effect the modification of the expression in a cell. For example, upon the binding of a targetable nuclease complex to a target sequence in a cell, the target polynucleotide is inactivated such that the sequence is not transcribed, the coded protein is not produced, or the sequence does not function as the wild-type sequence does. For example, a protein or microRNA coding sequence may be inactivated such that the protein is not produced.
In some methods, a control sequence can be inactivated such that it no longer functions as a regulatory sequence. As used herein, “regulatory sequence” can refer to any nucleic acid sequence that affects the transcription, translation, or accessibility of a nucleic acid sequence. Examples of regulatory sequences include a promoter, a transcription terminator, and an enhancer.
An inactivated target sequence may include a deletion mutation (i.e., deletion of one or more nucleotides), an insertion mutation (i.e., insertion of one or more nucleotides), or a nonsense mutation (i.e., substitution of a single nucleotide for another nucleotide such that a stop codon is introduced). In some methods, the inactivation of a target sequence results in “knockout” of the target sequence.
An altered expression of one or more target polynucleotides associated with a signaling biochemical pathway can be determined by assaying for a difference in the mRNA levels of the corresponding genes between the test model cell and a control cell, when they are contacted with a candidate agent. Alternatively, the differential expression of the sequences associated with a signaling biochemical pathway is determined by detecting a difference in the level of the encoded polypeptide or gene product.
To assay for an agent-induced alteration in the level of mRNA transcripts or corresponding polynucleotides, nucleic acid contained in a sample is first extracted according to standard methods in the art. For instance, mRNA can be isolated using various lytic enzymes or chemical solutions according to the procedures set forth in Green and Sambrook (2014), or extracted by nucleic-acid-binding resins following the accompanying instructions provided by the manufacturers. The mRNA contained in the extracted nucleic acid sample is then detected by amplification procedures or conventional hybridization assays (e.g. Northern blot analysis) according to methods widely known in the art or based on the methods exemplified herein.
For purposes of this invention, amplification means any method employing a primer and a polymerase capable of replicating a target sequence with reasonable fidelity. Amplification may be carried out by natural or recombinant DNA polymerases such as TaqGold™, T7 DNA polymerase, Klenow fragment of E. coli DNA polymerase, and reverse transcriptase. A preferred amplification method is PCR. In particular, the isolated RNA can be subjected to a reverse transcription assay that is coupled with a quantitative polymerase chain reaction (RT-PCR) in order to quantify the expression level of a sequence associated with a signaling biochemical pathway.
Detection of the gene expression level can be conducted in real time in an amplification assay. In one aspect, the amplified products can be directly visualized with fluorescent DNA-binding agents including but not limited to DNA intercalators and DNA groove binders. Because the amount of the intercalators incorporated into the double-stranded DNA molecules is typically proportional to the amount of the amplified DNA products, one can conveniently determine the amount of the amplified products by quantifying the fluorescence of the intercalated dye using conventional optical systems in the art. DNA-binding dyes suitable for this application include SYBR green, SYBR blue, DAPI, propidium iodide, Hoechst, SYBR gold, ethidium bromide, acridines, proflavine, acridine orange, acriflavine, fluorcoumanin, ellipticine, daunomycin, chloroquine, distamycin D, chromomycin, homidium, mithramycin, ruthenium polypyridyls, anthramycin, and the like.
In another aspect, other fluorescent labels such as sequence specific probes can be employed in the amplification reaction to facilitate the detection and quantification of the amplified products. Probe-based quantitative amplification relies on the sequence-specific detection of a desired amplified product. It utilizes fluorescent, target-specific probes (e.g., TaqMan™ probes) resulting in increased specificity and sensitivity. Methods for performing probe-based quantitative amplification are well established in the art and are taught in U.S. Pat. No. 5,210,015.
In yet another aspect, conventional hybridization assays using hybridization probes that share sequence homology with sequences associated with a signaling biochemical pathway can be performed. Typically, probes are allowed to form stable complexes with the sequences associated with a signaling biochemical pathway contained within the biological sample derived from the test subject in a hybridization reaction. It will be appreciated by one of skill in the art that where antisense is used as the probe nucleic acid, the target polynucleotides provided in the sample are chosen to be complementary to sequences of the antisense nucleic acids. Conversely, where the nucleotide probe is a sense nucleic acid, the target polynucleotide is selected to be complementary to sequences of the sense nucleic acid.
Hybridization can be performed under conditions of various stringency, for instance as described herein. Suitable hybridization conditions for the practice of the present invention are such that the recognition interaction between the probe and sequences associated with a signaling biochemical pathway is both sufficiently specific and sufficiently stable. Conditions that increase the stringency of a hybridization reaction are widely known and published in the art. See, for example, Green and Sambrook (2014) and Nonradioactive in Situ Hybridization Application Manual, Boehringer Mannheim, second edition. The hybridization assay can be performed using probes immobilized on any solid support, including but not limited to nitrocellulose, glass, silicon, and a variety of gene arrays. A preferred hybridization assay is conducted on high-density gene chips as described in U.S. Pat. No. 5,445,934.
For a convenient detection of the probe-target complexes formed during the hybridization assay, the nucleotide probes are conjugated to a detectable label. Detectable labels suitable for use in the present invention include any composition detectable by photochemical, biochemical, spectroscopic, immunochemical, electrical, optical or chemical means. A wide variety of appropriate detectable labels are known in the art, which include fluorescent or chemiluminescent labels, radioactive isotope labels, enzymatic or other ligands. In preferred embodiments, one will likely desire to employ a fluorescent label or an enzyme tag, such as digoxigenin, β-galactosidase, urease, alkaline phosphatase or peroxidase, avidin/biotin complex.
Detection methods used to detect or quantify the hybridization intensity will typically depend upon the label selected above. For example, radiolabels may be detected using photographic film or a phosphoimager. Fluorescent markers may be detected and quantified using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and measuring the reaction product produced by the action of the enzyme on the substrate; and finally colorimetric labels are detected by simply visualizing the colored label.
An agent-induced change in expression of sequences associated with a signaling biochemical pathway can also be determined by examining the corresponding gene products. Determining the protein level typically involves (a) contacting the protein contained in a biological sample with an agent that specifically binds to a protein associated with a signaling biochemical pathway; and (b) identifying any agent:protein complex so formed. In one aspect of this embodiment, the agent that specifically binds a protein associated with a signaling biochemical pathway is an antibody, preferably a monoclonal antibody.
The reaction can be performed by contacting the agent with a sample of the proteins associated with a signaling biochemical pathway derived from the test samples under conditions that will allow a complex to form between the agent and the proteins associated with a signaling biochemical pathway. The formation of the complex can be detected directly or indirectly according to standard procedures in the art. In the direct detection method, the agents are supplied with a detectable label and unreacted agents may be removed from the complex; the amount of remaining label thereby indicating the amount of complex formed. For such method, it is preferable to select labels that remain attached to the agents even during stringent washing conditions. It is preferable that the label does not interfere with the binding reaction. In the alternative, an indirect detection procedure may use an agent that contains a label introduced either chemically or enzymatically. A desirable label generally does not interfere with binding or the stability of the resulting agent:polypeptide complex. However, the label is typically designed to be accessible to an antibody for an effective binding and hence generating a detectable signal.
A wide variety of labels suitable for detecting protein levels are known in the art. Non-limiting examples include radioisotopes, enzymes, colloidal metals, fluorescent compounds, bioluminescent compounds, and chemiluminescent compounds.
The amount of agent:polypeptide complexes formed during the binding reaction can be quantified by standard quantitative assays. As illustrated above, the formation of agent:polypeptide complex can be measured directly by the amount of label remaining at the site of binding. In an alternative, the protein associated with a signaling biochemical pathway is tested for its ability to compete with a labeled analog for binding sites on the specific agent. In this competitive assay, the amount of label captured is inversely proportional to the amount of protein sequences associated with a signaling biochemical pathway present in a test sample.
A number of techniques for protein analysis based on the general principles outlined above are available in the art. They include but are not limited to radioimmunoassays, ELISA (enzyme-linked immunosorbent assays), “sandwich” immunoassays, immunoradiometric assays, in situ immunoassays (using e.g., colloidal gold, enzyme or radioisotope labels), western blot analysis, immunoprecipitation assays, immunofluorescent assays, and SDS-PAGE.
Antibodies that specifically recognize or bind to proteins associated with a signaling biochemical pathway are preferable for conducting the aforementioned protein analyses. Where desired, antibodies that recognize a specific type of post-translational modification (e.g., signaling biochemical pathway inducible modifications) can be used. Post-translational modifications include but are not limited to glycosylation, lipidation, acetylation, and phosphorylation. These antibodies may be purchased from commercial vendors. For example, anti-phosphotyrosine antibodies that specifically recognize tyrosine-phosphorylated proteins are available from a number of vendors including Invitrogen and Perkin Elmer. Anti-phosphotyrosine antibodies are particularly useful in detecting proteins that are differentially phosphorylated on their tyrosine residues in response to ER stress. Such proteins include but are not limited to eukaryotic translation initiation factor 2 alpha (eIF-2α). Alternatively, these antibodies can be generated using conventional polyclonal or monoclonal antibody technologies by immunizing a host animal or an antibody-producing cell with a target protein that exhibits the desired post-translational modification.
In practicing a subject method, it may be desirable to discern the expression pattern of a protein associated with a signaling biochemical pathway in different bodily tissues, in different cell types, and/or in different subcellular structures. These studies can be performed with the use of tissue-specific, cell-specific or subcellular structure specific antibodies capable of binding to protein markers that are preferentially expressed in certain tissues, cell types, or subcellular structures.
An altered expression of a gene associated with a signaling biochemical pathway can also be determined by examining a change in activity of the gene product relative to a control cell. The assay for an agent-induced change in the activity of a protein associated with a signaling biochemical pathway will depend on the biological activity and/or the signal transduction pathway that is under investigation. For example, where the protein is a kinase, a change in its ability to phosphorylate the downstream substrate(s) can be determined by a variety of assays known in the art. Representative assays include but are not limited to immunoblotting and immunoprecipitation with antibodies such as anti-phosphotyrosine antibodies that recognize phosphorylated proteins. In addition, kinase activity can be detected by high throughput chemiluminescent assays such as AlphaScreen™ (available from Perkin Elmer) and eTag™ assay (Chan-Hui, et al. (2003) Clinical Immunology 111: 162-174).
Where the protein associated with a signaling biochemical pathway is part of a signaling cascade leading to a fluctuation of intracellular pH condition, pH sensitive molecules such as fluorescent pH dyes can be used as the reporter molecules. In another example where the protein associated with a signaling biochemical pathway is an ion channel, fluctuations in membrane potential and/or intracellular ion concentration can be monitored. A number of commercial kits and high-throughput devices are particularly suited for a rapid and robust screening for modulators of ion channels. Representative instruments include FLIPR™ (Molecular Devices, Inc.) and VIPR (Aurora Biosciences). These instruments are capable of detecting reactions in over 1000 sample wells of a microplate simultaneously, and providing real-time measurement and functional data within a second or even a millisecond.
In practicing any of the methods disclosed herein, a suitable vector can be introduced to a cell, tissue, organism, or an embryo via one or more methods known in the art, including without limitation, microinjection, electroporation, sonoporation, biolistics, calcium phosphate-mediated transfection, cationic transfection, liposome transfection, dendrimer transfection, heat shock transfection, nucleofection transfection, magnetofection, lipofection, impalefection, optical transfection, proprietary agent-enhanced uptake of nucleic acids, and delivery via liposomes, immunoliposomes, virosomes, or artificial virions. In some methods, the vector is introduced into an embryo by microinjection. The vector or vectors may be microinjected into the nucleus or the cytoplasm of the embryo. In some methods, the vector or vectors may be introduced into a cell by nucleofection.
A target polynucleotide of a targetable nuclease complex can be any polynucleotide endogenous or exogenous to the host cell. For example, the target polynucleotide can be a polynucleotide residing in the nucleus of a eukaryotic cell, the genome of a prokaryotic cell, or an extrachromosomal vector of a host cell. The target polynucleotide can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or junk DNA).
Examples of target polynucleotides include a sequence associated with a signaling biochemical pathway, e.g., a signaling biochemical pathway-associated gene or polynucleotide. Examples of target polynucleotides also include a disease-associated gene or polynucleotide. A “disease-associated” gene or polynucleotide refers to any gene or polynucleotide which is yielding transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues compared with tissues or cells of a non-disease control. It may be a gene that becomes expressed at an abnormally high level; it may be a gene that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene possessing mutation(s) or genetic variation that is directly responsible or is in linkage disequilibrium with a gene(s) that is responsible for the etiology of a disease. The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.
Embodiments of the invention also relate to methods and compositions related to knocking out genes, editing genes, altering genes, amplifying genes, and repairing particular mutations. Altering genes may also mean the epigenetic manipulation of a target sequence. This may be the chromatin state of a target sequence, such as by modification of the methylation state of the target sequence (i.e. addition or removal of methylation or methylation patterns or CpG islands), histone modification, increasing or reducing accessibility to the target sequence, or by promoting 3D folding. It will be appreciated that where reference is made to a method of modifying a cell, organism, or mammal including human or a non-human mammal or organism by manipulation of a target sequence in a genomic locus of interest, this may apply to the organism (or mammal) as a whole or just a single cell or population of cells from that organism (if the organism is multicellular). In the case of humans, for instance, Applicants envisage, inter alia, a single cell or a population of cells and these may preferably be modified ex vivo and then re-introduced. In this case, a biopsy or other tissue or biological fluid sample may be necessary. Stem cells are also particularly preferred in this regard. But, of course, in vivo embodiments are also envisaged. And the invention is especially advantageous as to HSCs.
The functionality of a targetable nuclease complex can be assessed by any suitable assay. For example, the components of a targetable nuclease system sufficient to form a targetable nuclease complex, including a guide nucleic acid and nucleic acid-guided nuclease, can be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of the engineered nuclease system, followed by an assessment of preferential cleavage within the target sequence. Similarly, cleavage of a target sequence may be evaluated in a test tube by providing the target sequence and components of a targetable nuclease complex. Other assays are possible, and will occur to those skilled in the art. A guide sequence can be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome.
Editing Cassette
Disclosed herein are compositions and methods for editing a target polynucleotide sequence. Such compositions include polynucleotides containing one or more components of a targetable nuclease system. Polynucleotide sequences for use in these methods can be referred to as editing cassettes.
An editing cassette can comprise one or more primer sites. Primer sites can be used to amplify an editing cassette by using oligonucleotide primers comprising reverse complementary sequences that can hybridize to the one or more primer sites. An editing cassette can comprise two or more primer sites. Sometimes, an editing cassette comprises a primer site on each end of the editing cassette, said primer sites flanking one or more of the other components of the editing cassette. Primer sites can be approximately 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or more nucleotides in length.
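The relationship between flanking primer sites and the primers that hybridize to them can be sketched as follows. This is an illustration under stated assumptions: the cassette is given as its top strand in 5′ to 3′ orientation, the primer sites occupy the first and last `site_len` bases, and the function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplification_primers(cassette: str, site_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers for a cassette flanked by primer sites."""
    if len(cassette) < 2 * site_len:
        raise ValueError("cassette is shorter than its two primer sites")
    # The forward primer matches the 5' site on the top strand, so it anneals
    # to the bottom strand; the reverse primer is the reverse complement of
    # the 3' site, so it anneals to the top strand.
    forward = cassette[:site_len]
    reverse = reverse_complement(cassette[-site_len:])
    return forward, reverse


# Hypothetical cassette: 10 nt site + payload + 10 nt site.
cassette = "ACGTACGTAC" + "TTTT" + "GGATCCGGAA"
forward, reverse = amplification_primers(cassette, site_len=10)
```

Both primers are written 5′ to 3′, which is the usual convention when ordering oligonucleotides.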
An editing cassette can comprise an editing template as disclosed herein. An editing cassette can comprise an editing sequence. An editing sequence can be homologous to a target sequence. An editing sequence can comprise at least one mutation relative to a target sequence. An editing sequence often comprises homology regions (or homology arms) flanking at least one mutation relative to a target sequence, such that the flanking homology regions facilitate homologous recombination of the editing sequence into a target sequence. An editing sequence can comprise an editing template as disclosed herein. For example, the editing sequence can comprise at least one mutation relative to a target sequence including one or more PAM mutations that mutate or delete a PAM site. An editing sequence can comprise one or more mutations in a codon or non-coding sequence relative to a non-editing target site.
A PAM mutation can be a silent mutation. A silent mutation can be a change to at least one nucleotide of a codon relative to the original codon that does not change the amino acid encoded by the original codon. A silent mutation can be a change to a nucleotide within a non-coding region, such as an intron, 5′ untranslated region, 3′ untranslated region, or other non-coding region.
A PAM mutation can be a non-silent mutation. Non-silent mutations can include a missense mutation. A missense mutation can be a change to at least one nucleotide of a codon relative to the original codon that changes the amino acid encoded by the original codon. Missense mutations can occur within an exon, open reading frame, or other coding region.
An editing sequence can comprise at least one mutation relative to a target sequence. A mutation can be a silent mutation or non-silent mutation, such as a missense mutation. A mutation can include an insertion of one or more nucleotides or base pairs. A mutation can include a deletion of one or more nucleotides or base pairs. A mutation can include a substitution of one or more nucleotides or base pairs for a different one or more nucleotides or base pairs. Inserted or substituted sequences can include exogenous or heterologous sequences.
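The distinction drawn above between silent and missense codon changes, together with the nonsense mutations described earlier for inactivated target sequences, can be made concrete with a short classifier. The sketch assumes the standard genetic code and in-frame single-codon substitutions; the function name is hypothetical, and a change from a stop codon to a sense codon is simply reported as missense here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def classify_codon_change(original: str, mutated: str) -> str:
    """Classify a codon substitution as 'silent', 'nonsense', or 'missense'."""
    original, mutated = original.upper(), mutated.upper()
    before = CODON_TABLE[original]   # KeyError signals an invalid codon
    after = CODON_TABLE[mutated]
    if before == after:
        return "silent"      # encoded amino acid is unchanged
    if after == "*":
        return "nonsense"    # a stop codon is introduced
    return "missense"        # a different amino acid is encoded


silent = classify_codon_change("CTG", "CTA")    # Leu -> Leu
nonsense = classify_codon_change("TGG", "TGA")  # Trp -> stop
missense = classify_codon_change("GAA", "GTA")  # Glu -> Val
```

A check of this kind could be used, for example, to confirm that a PAM mutation placed inside a coding region leaves the encoded amino acid unchanged.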
An editing cassette can comprise a polynucleotide encoding a guide nucleic acid sequence. In some cases, the guide nucleic acid sequence is optionally operably linked to a promoter. A guide nucleic acid sequence can comprise a scaffold sequence and a guide sequence as described herein.
An editing cassette can comprise a barcode. A barcode can be a unique DNA sequence that corresponds to the editing sequence such that the barcode can identify the one or more mutations of the corresponding editing sequence. In some examples, the barcode is 15 nucleotides. The barcode can comprise less than 10, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 88, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more than 200 nucleotides. A barcode can be a non-naturally occurring sequence. An editing cassette comprising a barcode can be a non-naturally occurring sequence.
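One way to picture the barcode-to-edit correspondence is as a lookup table in which every editing sequence receives its own unique barcode. The sketch below assumes random 15-nucleotide barcodes and hypothetical edit labels; the seeded generator only makes the example reproducible and is not part of the described method.

```python
import random


def assign_barcodes(edits: list[str], length: int = 15, seed: int = 0) -> dict[str, str]:
    """Map a unique random barcode to each editing sequence label."""
    rng = random.Random(seed)
    mapping: dict[str, str] = {}
    for edit in edits:
        # Redraw until the barcode is not already assigned to another edit.
        while True:
            barcode = "".join(rng.choice("ACGT") for _ in range(length))
            if barcode not in mapping:
                break
        mapping[barcode] = edit
    return mapping


# Hypothetical labels standing in for three distinct editing sequences.
barcode_to_edit = assign_barcodes(["edit_1", "edit_2", "edit_3"])
```

Reading a barcode back from sequencing data and looking it up in `barcode_to_edit` then identifies which mutation the corresponding cassette carried.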
An editing cassette can comprise one or more of an editing sequence and a polynucleotide encoding a guide nucleic acid optionally operably linked to a promoter, wherein the editing cassette and guide nucleic acid sequence are flanked by primer sites. An editing cassette can further comprise a barcode.
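The components listed above can be represented as a simple record that concatenates into a single cassette sequence. The ordering of elements between the primer sites is an assumption made for illustration (the text requires only that the primer sites flank the other components), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingCassette:
    primer_site_5: str      # primer site at the 5' end
    editing_sequence: str   # homology regions flanking the mutation(s)
    guide_encoding: str     # polynucleotide encoding the guide nucleic acid
    primer_site_3: str      # primer site at the 3' end
    promoter: str = ""      # optional promoter operably linked to the guide
    barcode: str = ""       # optional barcode

    def sequence(self) -> str:
        """Concatenate the components in an assumed, illustrative order."""
        return "".join([
            self.primer_site_5,
            self.editing_sequence,
            self.promoter,
            self.guide_encoding,
            self.barcode,
            self.primer_site_3,
        ])


# Toy component sequences, far shorter than real ones.
example = EditingCassette("AAAA", "CCCC", "GGGG", "TTTT", barcode="ACGT")
full_sequence = example.sequence()
```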
An example of an editing cassette is depicted in FIG. 3. Each editing cassette can be designed to edit a site in a target sequence. Sites to be targeted can be coding regions, non-coding regions, functionally neutral sites, or they can be a screenable or selectable marker gene. Homology regions within the editing sequence flank the one or more mutations of the editing cassette and can be inserted into the target sequence by recombination. Recombination can comprise DNA cleavage, such as by a nucleic acid-guided nuclease, and repair via homologous recombination.
Editing cassettes can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
Trackable sequences, such as barcodes or recorder sequences, can be designed in silico via standard code with a degenerate mutation at the target codon. The degenerate mutation can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleic acid residues. In some examples, the degenerate mutations can comprise 15 nucleic acid residues (N15).
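To give a sense of scale for a fully degenerate stretch such as N15, the number of distinct sequences is 4 raised to the number of degenerate positions. The sketch below computes that count and enumerates the possibilities for a short stretch; the function names are hypothetical, and enumeration is only practical for small lengths.

```python
from itertools import product


def degenerate_diversity(n_positions: int) -> int:
    """Number of distinct sequences encoded by n fully degenerate (N) positions."""
    if n_positions < 0:
        raise ValueError("number of positions must be non-negative")
    return 4 ** n_positions


def enumerate_degenerate(n_positions: int) -> list[str]:
    """List every sequence for a short degenerate stretch (small n only)."""
    return ["".join(p) for p in product("ACGT", repeat=n_positions)]


n15_diversity = degenerate_diversity(15)   # 4**15 possible N15 sequences
dinucleotides = enumerate_degenerate(2)    # all 16 two-base sequences
```

For N15 the count is 1,073,741,824, which is far larger than a library of 10^6 designs.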
Homology arms can be added to an editing sequence to allow incorporation of the editing sequence into the desired location via homologous recombination or homology-directed repair. Homology arms can be added by synthesis, in vitro assembly, PCR, or other known methods in the art. For example, homology arms can be added by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof. A homology arm can be added to both ends of a barcode, recorder sequence, and/or editing sequence, thereby flanking the sequence with two distinct homology arms, for example, a 5′ homology arm and a 3′ homology arm.
A homology arm can comprise sequence homologous to a target sequence. A homology arm can comprise sequence homologous to sequence adjacent to a target sequence. A homology arm can comprise sequence homologous to sequence upstream or downstream of a target sequence. A homology arm can comprise sequence homologous to sequence within the same gene or open reading frame as a target sequence. A homology arm can comprise sequence homologous to sequence upstream or downstream of a gene or open reading frame the target sequence is within. A homology arm can comprise sequence homologous to a 5′ UTR or 3′ UTR of a gene or open reading frame within which is a target sequence. A homology arm can comprise sequence homologous to a different gene, open reading frame, promoter, terminator, or nucleic acid sequence than that which the target sequence is within.
The same 5′ and 3′ homology arms can be added to a plurality of distinct editing sequences, thereby generating a library of unique editing sequences that each have the same targeted insertion site. The same 5′ and 3′ homology arms can be added to a plurality of distinct editing templates, thereby generating a library of unique editing templates that each have the same targeted insertion site. In alternative examples, different or a variety of 5′ or 3′ homology arms can be added to a plurality of editing sequences or editing templates.
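Attaching one shared pair of homology arms to many distinct editing sequences, so that every library member targets the same insertion site, can be sketched as below. The function name and the toy sequences are hypothetical, and duplicates are dropped so that each library member is unique.

```python
def build_library(arm_5: str, arm_3: str, editing_sequences: list[str]) -> list[str]:
    """Flank each distinct editing sequence with the same 5' and 3' arms."""
    seen: set[str] = set()
    library: list[str] = []
    for seq in editing_sequences:
        if seq in seen:
            continue  # keep only one copy of each editing sequence
        seen.add(seq)
        library.append(arm_5 + seq + arm_3)
    return library


# Toy arms and editing sequences; real arms would be far longer.
library = build_library("AA", "TT", ["C", "G", "C"])
```

Supplying a different arm pair per editing sequence instead would correspond to the alternative in which a variety of homology arms is used.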
A barcode library or recorder sequence library comprising flanking homology arms can be cloned into a vector backbone. In some examples, the barcode comprising flanking homology arms is cloned into an editing cassette. Cloning can occur by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
An editing sequence library comprising flanking homology arms can be cloned into a vector backbone. In some examples, the editing sequence and homology arms are cloned into an editing cassette. Editing cassettes can, in some cases, further comprise a nucleic acid sequence encoding a guide nucleic acid or gRNA engineered to target the desired site of editing sequence insertion, e.g. the target sequence. Editing cassettes can, in some cases, further comprise a barcode or recorder sequence. Cloning can occur by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
Gene-wide or genome-wide editing libraries can be cloned into a vector backbone. A barcode or recorder sequence library can be inserted or assembled into a second site to generate competent trackable plasmids that can embed the recording barcode at a fixed locus while integrating the editing libraries at a wide variety of user defined sites. Cloning can occur by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
A guide nucleic acid or sequence encoding the same can be assembled or inserted into a vector backbone first, followed by insertion of an editing sequence and/or cassette. In other cases, an editing sequence and/or cassette can be inserted or assembled into a vector backbone first, followed by insertion of a guide nucleic acid or sequence encoding the same. In other cases, a guide nucleic acid or sequence encoding the same and an editing sequence and/or cassette are simultaneously inserted or assembled into a vector. A recorder sequence or barcode can be inserted before or after any of these steps. In other words, it should be understood that there are many possible permutations to the order in which elements of the disclosure are assembled. The vector can be linear or circular and can be generated by chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, overlapping oligo extension, in vitro assembly, in vitro oligo assembly, PCR, traditional ligation-based cloning, other known methods in the art, or any combination thereof.
A nucleic acid molecule can be synthesized which comprises one or more elements disclosed herein. A nucleic acid molecule can be synthesized that comprises an editing cassette. A nucleic acid molecule can be synthesized that comprises a guide nucleic acid. A nucleic acid molecule can be synthesized that comprises a recorder cassette. A nucleic acid molecule can be synthesized that comprises a barcode. A nucleic acid molecule can be synthesized that comprises a homology arm. A nucleic acid molecule can be synthesized that comprises an editing cassette and a guide nucleic acid. A nucleic acid molecule can be synthesized that comprises an editing cassette and a barcode. A nucleic acid molecule can be synthesized that comprises an editing cassette, a guide nucleic acid, and a recorder cassette. A nucleic acid molecule can be synthesized that comprises an editing cassette, a recorder cassette, and two guide nucleic acids. A nucleic acid molecule can be synthesized that comprises a recorder cassette and a guide nucleic acid. In any of these cases, the guide nucleic acid can optionally be operably linked to a promoter. In any of these cases, the nucleic acid molecule can further include one or more barcodes.
Synthesis can occur by any nucleic acid synthesis method known in the art. Synthesis can occur by enzymatic nucleic acid synthesis. Synthesis can occur by chemical synthesis. Synthesis can occur by array-based synthesis. Synthesis can occur by solid-phase synthesis or phosphoramidite methods. Synthesis can occur by column or multi-well methods. Synthesized nucleic acid molecules can be non-naturally occurring nucleic acid molecules.
Software and automation methods can be used for multiplex synthesis and generation. For example, software and automation can be used to create 10, 10², 10³, 10⁴, 10⁵, 10⁶, or more synthesized polynucleotides, cassettes, or plasmids. An automation method can generate desired sequences and libraries in rapid fashion that can be processed through a workflow with minimal steps to produce precisely defined libraries, such as gene-wide or genome-wide editing libraries.
Polynucleotides or libraries can be generated which comprise two or more nucleic acid molecules or plasmids comprising any combination disclosed herein of recorder sequence, editing sequence, guide nucleic acid, and optional barcode, including combinations of one or more of any of the previously mentioned elements. For example, such a library can comprise at least 2, 3, 4, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, or more nucleic acid molecules or plasmids of the present disclosure. It should be understood that such a library can include any number of nucleic acid molecules or plasmids, even if the specific number is not explicitly listed above.
Trackable plasmid libraries or nucleic acid molecule libraries can be sequenced in order to determine the recorder sequence and editing sequence pair that is comprised on each trackable plasmid. In other cases, a known recorder sequence is paired with a known editing sequence during the library generation process. Other methods of determining the association between a recorder sequence and editing sequence comprised on a common nucleic acid molecule or plasmid are envisioned such that the editing sequence can be identified by identification or sequencing of the recorder sequence.
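As an illustration of the recorder/edit association described above, the following minimal Python sketch builds a lookup table from sequencing results of a trackable plasmid library. The function name, the read-pair input format, and the example sequences and edit labels are hypothetical and are not part of the disclosure; the sketch only assumes that each sequenced plasmid yields one recorder sequence and one editing sequence.

```python
from collections import defaultdict


def build_recorder_map(plasmid_reads):
    """Map each recorder (barcode) sequence to its paired editing sequence.

    plasmid_reads: iterable of (recorder_sequence, editing_sequence) tuples,
    one tuple per sequenced plasmid (hypothetical input format).
    Returns (mapping, ambiguous), where ambiguous lists recorders that were
    observed with more than one editing sequence.
    """
    seen = defaultdict(set)
    for recorder, edit in plasmid_reads:
        seen[recorder].add(edit)
    # Keep only recorders that identify exactly one editing sequence.
    mapping = {r: next(iter(e)) for r, e in seen.items() if len(e) == 1}
    ambiguous = sorted(r for r, e in seen.items() if len(e) > 1)
    return mapping, ambiguous


# Hypothetical example data: the last recorder is paired with two edits.
reads = [
    ("ACGTAC", "edit_A"),
    ("TTGCAA", "edit_B"),
    ("ACGTAC", "edit_A"),
    ("GGCCTT", "edit_C"),
    ("GGCCTT", "edit_D"),
]
mapping, ambiguous = build_recorder_map(reads)
```

With such a table, sequencing the recorder alone is enough to look up the associated edit, provided the recorder is unique within the library.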
Methods and compositions for tracking edited episomal libraries that are shuttled between E. coli and other organisms/cell lines are provided herein. The libraries can be comprised on plasmids, bacterial artificial chromosomes (BACs), yeast artificial chromosomes (YACs), synthetic chromosomes, or viral or phage genomes. These methods and compositions can be used to generate portable barcoded libraries in host organisms, such as E. coli. Library generation in such organisms can offer the advantage of established techniques for performing homologous recombination. Barcoded plasmid libraries can be deep-sequenced at one site to track mutational diversity targeted across the remaining portions of the plasmid, allowing dramatic improvements in the depth of library coverage.
Any nucleic acid molecule disclosed herein can be an isolated nucleic acid. Isolated nucleic acids may be made by any method known in the art, for example using standard recombinant methods, assembly methods, synthesis techniques, or combinations thereof. In some embodiments, the nucleic acids may be cloned, amplified, assembled, or otherwise constructed.
Isolated nucleic acids may be obtained from cellular, bacterial, or other sources using any number of cloning methodologies known in the art. In some embodiments, oligonucleotide probes which selectively hybridize, under stringent conditions, to other oligonucleotides or to the nucleic acids of an organism or cell can be used to isolate or identify an isolated nucleic acid.
Cellular genomic DNA, RNA, or cDNA may be screened for the presence of an identified genetic element of interest using a probe based upon one or more sequences. Various degrees of stringency of hybridization may be employed in the assay.
High stringency conditions for nucleic acid hybridization are well known in the art. For example, conditions may comprise low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.15 M NaCl at temperatures of about 50° C. to about 70° C. It is understood that the temperature and ionic strength of a desired stringency are determined in part by the length of the particular nucleic acid(s), the length and nucleotide content of the target sequence(s), the charge composition of the nucleic acid(s), and by the presence or concentration of formamide, tetramethylammonium chloride or other solvent(s) in a hybridization mixture. Nucleic acids may be completely complementary to a target sequence or may exhibit one or more mismatches.
Nucleic acids of interest may also be amplified using a variety of known amplification techniques. For instance, polymerase chain reaction (PCR) technology may be used to amplify target sequences directly from DNA, RNA, or cDNA. PCR and other in vitro amplification methods may also be useful, for example, to clone nucleic acid sequences, to make nucleic acids to use as probes for detecting the presence of a target nucleic acid in samples, for nucleic acid sequencing, or for other purposes.
Isolated nucleic acids may be prepared by direct chemical synthesis by methods such as the phosphotriester method, or using an automated synthesizer. Chemical synthesis generally produces a single stranded oligonucleotide. This may be converted into double stranded DNA by hybridization with a complementary sequence or by polymerization with a DNA polymerase using the single strand as a template.
Recorder
In some examples, two editing cassettes can be used together to track a genetic engineering step. For example, one editing cassette can comprise an editing template and an encoded guide nucleic acid, and a second editing cassette, referred to as a recorder cassette, can comprise an editing template comprising a recorder sequence and an encoded nucleic acid which has a distinct guide sequence compared to that of the first editing cassette. In such cases, the editing sequence and the recorder sequence can be inserted into separate target sequences, as determined by their corresponding guide nucleic acids. A recorder sequence can comprise a barcode, trackable or traceable sequence, and/or a regulatory element operable with a screenable or selectable marker.
Through a multiplexed cloning approach, the recorder cassette can be covalently coupled to at least one editing cassette in a plasmid (e.g., FIG. 17A, green cassette) to generate plasmid libraries that have a unique recorder and editing cassette combination. This library can be sequenced to generate the recorder/edit mapping and used to track editing libraries across large segments of the target DNA (e.g., FIG. 17C). Recorder and editing sequences can be comprised on the same cassette, in which case they are both incorporated into the target nucleic acid sequence, such as a genome or plasmid, by the same recombination event. In other examples, the recorder and editing sequences can be comprised on separate cassettes within the same plasmid, in which case the recorder and editing sequences are incorporated into the target nucleic acid sequence by separate recombination events, either simultaneously or sequentially.
Methods are provided herein for combining multiplex oligonucleotide synthesis with recombineering, to create libraries of specifically designed and trackable mutations. Screens and/or selections followed by high-throughput sequencing and/or barcode microarray methods can allow for rapid mapping of mutations leading to a phenotype of interest.
Methods and compositions disclosed herein can be used to simultaneously engineer and track engineering events in a target nucleic acid sequence.
Such plasmids can be generated using in vitro assembly or cloning techniques. For example, the plasmids can be generated using chemical synthesis, Gibson assembly, SLIC, CPEC, PCA, ligation-free cloning, other in vitro oligo assembly techniques, traditional ligation-based cloning, or any combination thereof.
Such plasmids can comprise at least one recording sequence, such as a barcode, and at least one editing sequence. In most cases, the recording sequence is used to record and track engineering events. Each editing sequence can be used to incorporate a desired edit into a target nucleic acid sequence. The desired edit can include insertion, deletion, substitution, or alteration of the target nucleic acid sequence. In some examples, the one or more recording sequence and editing sequences are comprised on a single cassette comprised within the plasmid such that they are incorporated into the target nucleic acid sequence by the same engineering event. In other examples, the recording and editing sequences are comprised on separate cassettes within the plasmid such that they are each incorporated into the target nucleic acid by distinct engineering events. In some examples, the plasmid comprises two or more editing sequences. For example, one editing sequence can be used to alter or silence a PAM sequence while a second editing sequence can be used to incorporate a mutation into a distinct sequence.
Recorder sequences can be inserted into a site separated from the editing sequence insertion site. The inserted recorder sequence can be separated from the editing sequence by 1 bp to 1 Mbp. For example, the separation distance can be about 1 bp, 10 bp, 50 bp, 100 bp, 500 bp, 1 kb, 2 kb, 5 kb, 10 kb, or greater. The separation distance can be any discrete integer between 1 bp and 10 Mbp. In some examples, the maximum distance of separation depends on the size of the target nucleic acid or genome.
Recorder sequences can be inserted adjacent to editing sequences, or within proximity to the editing sequence. For example, the recorder sequence can be inserted outside of the open reading frame within which the editing sequence is inserted. Recorder sequence can be inserted into an untranslated region adjacent to an open reading frame within which an editing sequence has been inserted. The recorder sequence can be inserted into a functionally neutral or non-functional site. The recorder sequence can be inserted into a screenable or selectable marker gene.
In some examples, the target nucleic acid sequence is comprised within a genome, artificial chromosome, synthetic chromosome, or episomal plasmid. In various examples, the target nucleic acid sequence can be in vitro or in vivo. When the target nucleic acid sequence is in vivo, the plasmid can be introduced into the host organisms by transformation, transfection, conjugation, biolistics, nanoparticles, cell-permeable technologies, or other known methods for DNA delivery, or any combination thereof. In such examples, the host organism can be a eukaryote, prokaryote, bacterium, archaea, yeast, or other fungi.
The engineering event can comprise recombineering, non-homologous end joining, homologous recombination, or homology-driven repair. In some examples, the engineering event is performed in vitro or in vivo.
The methods described herein can be carried out in any type of cell in which a targetable nuclease system can function (e.g., target and cleave DNA), including prokaryotic and eukaryotic cells. In some embodiments the cell is a bacterial cell, such as Escherichia spp. (e.g., E. coli). In other embodiments, the cell is a fungal cell, such as a yeast cell, e.g., Saccharomyces spp. In other embodiments, the cell is an algal cell, a plant cell, an insect cell, or a mammalian cell, including a human cell.
In some examples, the cell is a recombinant organism. For example, the cell can comprise a non-native targetable nuclease system. Additionally or alternatively, the cell can comprise recombination system machinery. Such recombination systems can include lambda red recombination system, Cre/Lox, attB/attP, or other integrase systems. Where appropriate, the plasmid can have the complementary components or machinery required for the selected recombination system to work correctly and efficiently.
A method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing cassette; (c) obtaining viable cells; and (d) sequencing the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
A method for genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette comprising a PAM mutation as disclosed herein and at least one guide nucleic acid into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing cassette, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells; and (d) sequencing the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
A method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, at least one recorder cassette, and at least two guide nucleic acids into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage and incorporation of the editing and recorder cassettes; (c) obtaining viable cells; and (d) sequencing the recorder sequence of the target DNA molecule in at least one cell of the second population of cells to identify the mutation of at least one codon.
In some examples where the plasmid comprises a second editing sequence designed to silence a PAM, a method for trackable genome editing can comprise: (a) introducing a vector that encodes at least one editing cassette, a recorder cassette, and at least two guide nucleic acids into a first population of cells, thereby producing a second population of cells comprising the vector; (b) maintaining the second population of cells under conditions in which a nucleic acid-guided nuclease is expressed or maintained, wherein the nucleic acid-guided nuclease is encoded on the vector, a second vector, on the genome of cells of the second population of cells, or otherwise introduced into the cell, resulting in DNA cleavage, incorporation of the editing and recorder cassettes, and death of cells of the second population of cells that do not comprise the PAM mutation, whereas cells of the second population of cells that comprise the PAM mutation are viable; (c) obtaining viable cells; and (d) sequencing the recorder sequence of the target DNA in at least one cell of the second population of cells to identify the mutation of at least one codon.
In some examples, transformation efficiency is determined by using a non-targeting control guide nucleic acid, which allows for validation of the recombineering procedure and CFU/ng calculations. In some cases, absolute efficiency is obtained by counting the total number of colonies on each transformation plate, for example, by counting both red and white colonies from a galK control. In some examples, relative efficiency is calculated as the total number of successful transformants (for example, white colonies) out of all colonies from a control (for example, galK control).
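The efficiency calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the plated-fraction parameter, and the example colony counts are assumptions introduced here and do not come from the disclosure.

```python
def cfu_per_ng(colony_count, dna_ng, plated_fraction=1.0):
    """Colony-forming units per nanogram of transformed DNA.

    plated_fraction is the fraction of the transformation that was plated
    (an assumed parameter; use 1.0 if the whole recovery was plated).
    """
    if dna_ng <= 0 or not 0 < plated_fraction <= 1:
        raise ValueError("dna_ng must be > 0 and plated_fraction in (0, 1]")
    return colony_count / (dna_ng * plated_fraction)


def absolute_efficiency(white_colonies, red_colonies):
    """Total colonies on the plate (both red and white, e.g. a galK control)."""
    return white_colonies + red_colonies


def relative_efficiency(white_colonies, red_colonies):
    """Fraction of successful transformants (white) among all colonies."""
    total = absolute_efficiency(white_colonies, red_colonies)
    if total == 0:
        raise ValueError("no colonies counted")
    return white_colonies / total


# Hypothetical counts from a single control plate.
total = absolute_efficiency(90, 10)
fraction_edited = relative_efficiency(90, 10)
yield_per_ng = cfu_per_ng(total, dna_ng=10)
```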
The methods of the disclosure can provide, for example, greater than 1000× improvements in the efficiency, scale, cost of generating a combinatorial library, and/or precision of such library generation.
The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the efficiency of generating genomic or combinatorial libraries.
The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the scale of generating genomic or combinatorial libraries.
The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater decrease in the cost of generating genomic or combinatorial libraries.
The methods of the disclosure can provide, for example, greater than: 10×, 50×, 100×, 200×, 300×, 400×, 500×, 600×, 700×, 800×, 900×, 1000×, 1100×, 1200×, 1300×, 1400×, 1500×, 1600×, 1700×, 1800×, 1900×, 2000×, or greater improvements in the precision of genomic or combinatorial library generation.
Recursive Tracking for Combinatorial Engineering
Disclosed herein are methods and compositions for iterative rounds of engineering. Disclosed herein are recursive engineering strategies that allow implementation of CREATE recording at the single cell level through several serial engineering cycles (e.g., FIG. 18 and FIG. 19). These disclosed methods and compositions can enable search-based technologies that can effectively construct and explore complex genotypic space. The terms recursive and iterative can be used interchangeably.
Combinatorial engineering methods can comprise multiple rounds of engineering. Methods disclosed herein can comprise 2 or more rounds of engineering. For example, a method can comprise 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, or more than 30 rounds of engineering.
In some examples, during each round of engineering a new recorder sequence, such as a barcode, is incorporated at the same locus in nearby sites (e.g., FIG. 18, green bars or FIG. 19, black bars) such that following multiple engineering cycles to construct combinatorial diversity throughout the genome (e.g., FIG. 18, green bars or FIG. 19, grey bars) a simple PCR of the recording locus can be used to reconstruct each combinatorial genotype or to confirm that the engineered edit from each round has been incorporated into the target site.
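The reconstruction of a combinatorial genotype from the recording locus can be illustrated with a short Python sketch. It assumes that the barcodes have already been extracted, in round order, from a sequenced amplicon of the recording locus, and that a barcode-to-edit table exists for each round; the function name, barcodes, and edit labels are hypothetical.

```python
def reconstruct_genotype(recorded_barcodes, round_maps):
    """Translate the barcodes read from the recording locus into edits.

    recorded_barcodes: list of barcode strings, one per engineering round,
        in the order the rounds were performed (assumed already extracted).
    round_maps: list of dicts, one per round, mapping barcode -> edit label.
    """
    if len(recorded_barcodes) != len(round_maps):
        raise ValueError("expected one barcode per engineering round")
    genotype = []
    for round_number, (barcode, lookup) in enumerate(
        zip(recorded_barcodes, round_maps), start=1
    ):
        if barcode not in lookup:
            raise KeyError(f"round {round_number}: unknown barcode {barcode}")
        genotype.append(lookup[barcode])
    return genotype


# Hypothetical barcode tables for two rounds of engineering.
round_maps = [
    {"AAGT": "geneX_L45P", "CCTA": "geneX_D12N"},
    {"GGAC": "geneY_promoter_swap", "TTCG": "geneY_deletion"},
]
genotype = reconstruct_genotype(["CCTA", "GGAC"], round_maps)
```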
Disclosed herein are methods for selecting for successive rounds of engineering. Selection can occur by a PAM mutation incorporated by an editing cassette. Selection can occur by a PAM mutation incorporated by a recorder cassette. Selection can occur using a screenable, selectable, or counter-selectable marker. Selection can occur by targeting a site for editing or recording that was incorporated by a prior round of engineering, thereby selecting for variants that successfully incorporated edits and recorder sequences from both rounds or all prior rounds of engineering.
Quantitation of these genotypes can be used for understanding combinatorial mutational effects on large populations and investigation of important biological phenomena such as epistasis.
Serial editing and combinatorial tracking can be implemented using recursive vector systems as disclosed herein. These recursive vector systems can be used to move rapidly through the transformation procedure. In some examples, these systems consist of two or more plasmids containing orthogonal replication origins, antibiotic markers, and an encoded guide nucleic acid. The encoded guide nucleic acid in each vector can be designed to target one of the other resistance markers for destruction by nucleic acid-guided nuclease-mediated cleavage. These systems can be used, in some examples, to perform transformations in which the antibiotic selection pressure is switched to remove the previous plasmid and drive enrichment of the next round of engineered genomes. Two or more passages through the transformation loop can be performed, or in other words, multiple rounds of engineering can be performed. Introducing the requisite recording cassettes and editing cassettes into recursive vectors as disclosed herein can be used for simultaneous genome editing and plasmid curing in each transformation step with high efficiencies.
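The logic of such a recursive vector system can be sketched as a small Python model. This is a bookkeeping illustration only, not a description of any specific plasmid set: the class, the plasmid names, and the marker names are hypothetical, and the model simply assumes that each incoming plasmid's guide targets the resistance marker of the plasmid used in the previous round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecursiveVector:
    name: str
    marker: str        # antibiotic resistance marker carried by this plasmid
    guide_target: str  # marker that this plasmid's encoded guide targets


def run_rounds(vectors, n_rounds):
    """Cycle through the vectors, curing the previous plasmid each round.

    Returns a list of (round, incoming plasmid, selection marker, cured plasmid).
    """
    resident = None
    history = []
    for i in range(n_rounds):
        incoming = vectors[i % len(vectors)]
        if resident is not None and incoming.guide_target != resident.marker:
            raise ValueError(
                f"{incoming.name} does not target the marker of {resident.name}"
            )
        cured = resident.name if resident is not None else None
        # Selection is switched to the incoming plasmid's marker.
        history.append((i + 1, incoming.name, incoming.marker, cured))
        resident = incoming
    return history


# Hypothetical three-plasmid loop: each guide targets the previous marker.
vectors = [
    RecursiveVector("pA", marker="carb", guide_target="kan"),
    RecursiveVector("pB", marker="cam", guide_target="carb"),
    RecursiveVector("pC", marker="kan", guide_target="cam"),
]
history = run_rounds(vectors, n_rounds=4)
```

In this sketch the fourth round reuses pA, which is consistent with reusing a plasmid as long as a distinct plasmid was used in the adjacent rounds.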
In some examples, the recursive vector system disclosed herein comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, or more than 10 unique plasmids. In some examples, the recursive vector system can use a particular plasmid more than once as long as a distinct plasmid is used in the previous round and in the subsequent round.
Recursive methods and compositions disclosed herein can be used to restore function to a selectable or screenable element in a targeted genome or plasmid. The selectable or screenable element can include an antibiotic resistance gene, a fluorescent gene, a unique DNA sequence or watermark, or other known reporter, screenable, or selectable gene. In some examples, each successive round of engineering can incorporate a fragment of the selectable or screenable element, such that at the end of the engineering rounds, the entire selectable or screenable element has been incorporated into the target genome or plasmid. In such examples, only those genome or plasmids which have successfully incorporated all of the fragments, and therefore all of the desired corresponding mutations, can be selected or screened for. In this way, the selected or screened cells will be enriched for those that have incorporated the edits from each and every iterative round of engineering.
Recursive methods can be used to switch a selectable or screenable marker between an on and an off position, or between an off and an on position, with each successive round of engineering. Using such a method allows conservation of available selectable or screenable markers by requiring, for example, the use of only one screenable or selectable marker. Furthermore, short regulatory sequences, start codons, or non-start codons can be used to turn the screenable or selectable marker on and off. Such short sequences can easily fit within a synthesized cassette or polynucleotide.
One or more rounds of engineering can be performed using the methods and compositions disclosed herein. In some examples, each round of engineering is used to incorporate an edit unique from that of previous rounds. Each round of engineering can incorporate a unique recording sequence. Each round of engineering can result in removal or curing of the plasmid used in the previous round of engineering. In some examples, successful incorporation of the recording sequence of each round of engineering results in a complete and functional screenable or selectable marker or unique sequence combination.
Unique recorder cassettes comprising recording sequences such as barcodes or screenable or selectable markers can be inserted with each round of engineering, thereby generating a recorder sequence that is indicative of the combination of edits or engineering steps performed. Successive recording sequences can be inserted adjacent to one another. Successive recording sequences can be inserted within proximity to one another.
Successive sequences can be inserted at a distance from one another. For example, successive recorder sequences can be inserted and separated by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or greater than 100 bp. In some examples, successive recorder sequences are separated by about 10, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, or greater than 1500 bp.
Successive recorder sequences can be separated by any desired number of base pairs, and the separation can be dependent on and limited by the number of successive recorder sequences to be inserted, the size of the target nucleic acid or target genomes, and/or the design of the desired final recorder sequence. For example, if the compiled recorder sequence is a functional screenable or selectable marker, then the successive recording sequences can be inserted within proximity and within the same reading frame from one another. If the compiled recorder sequence is a unique set of barcodes to be identified by sequencing and has no coding sequence element, then the successive recorder sequences can be inserted with any desired number of base pairs separating them. In these cases, the separation distance can be dependent on the sequencing technology to be used and the read length limit.
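The dependence of separation distance on read length can be made concrete with a short Python sketch. It assumes equal-length recorder sequences with uniform spacing and ignores primer-binding or flanking sequence; the function names and the example numbers are hypothetical.

```python
def recorder_array_span(n_recorders, recorder_len_bp, spacing_bp):
    """Total length in bp spanned by evenly spaced recorder sequences."""
    if n_recorders < 1 or recorder_len_bp < 1 or spacing_bp < 0:
        raise ValueError("invalid recorder array parameters")
    return n_recorders * recorder_len_bp + (n_recorders - 1) * spacing_bp


def fits_in_single_read(n_recorders, recorder_len_bp, spacing_bp, read_length_bp):
    """True if the whole recorder array can be covered by one sequencing read."""
    span = recorder_array_span(n_recorders, recorder_len_bp, spacing_bp)
    return span <= read_length_bp


# Three 20 bp barcodes separated by 10 bp span 80 bp.
span = recorder_array_span(3, 20, 10)
short_array_ok = fits_in_single_read(3, 20, 10, read_length_bp=150)
long_array_ok = fits_in_single_read(10, 20, 50, read_length_bp=300)
```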
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Some Definitions
As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene or characteristic as it occurs in nature as distinguished from mutant or variant forms.
As used herein the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature.
The terms “orthologue” (also referred to as “ortholog” herein) and “homologue” (also referred to as “homolog” herein) are well known in the art. By means of further guidance, a “homologue” of a protein as used herein is a protein of the same species which performs the same or a similar function as the protein it is a homologue of. Homologous proteins may but need not be structurally related, or are only partially structurally related. An “orthologue” of a protein as used herein is a protein of a different species which performs the same or a similar function as the protein it is an orthologue of. Orthologous proteins may but need not be structurally related, or are only partially structurally related. Homologs and orthologs may be identified by homology modelling (see, e.g., Greer, Science vol. 228 (1985) 1055, and Blundell et al. Eur J Biochem vol 172 (1988), 513) or “structural BLAST” (Dey F, Cliff Zhang Q, Petrey D, Honig B. Toward a “structural BLAST”: using structural relationships to infer function. Protein Sci. 2013 April; 22(4):359-66. doi: 10.1002/pro.2225.).
The terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. The term also encompasses nucleic-acid-like structures with synthetic backbones, see, e.g., Eckstein, 1991; Baserga et al., 1992; Milligan, 1993; WO 97/03211; WO 96/39154; Mata, 1997; Strauss-Soukup, 1997; and Samstag, 1996. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
“Complementarity” refers to the ability of a nucleic acid to form hydrogen bond(s) with another nucleic acid sequence by either traditional Watson-Crick base pairing or other non-traditional types. A percent complementarity indicates the percentage of residues in a nucleic acid molecule which can form hydrogen bonds (e.g., Watson-Crick base pairing) with a second nucleic acid sequence (e.g., 5, 6, 7, 8, 9, 10 out of 10 being 50%, 60%, 70%, 80%, 90%, and 100% complementary). “Perfectly complementary” means that all the contiguous residues of a nucleic acid sequence will hydrogen bond with the same number of contiguous residues in a second nucleic acid sequence. “Substantially complementary” as used herein refers to a degree of complementarity that is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99%, or 100% over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, or more nucleotides, or refers to two nucleic acids that hybridize under stringent conditions.
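The percent complementarity defined above can be illustrated with a minimal Python sketch. It makes simplifying assumptions that are not in the definition: both strands are DNA of equal length, are written 5' to 3', pair in antiparallel orientation without gaps, and only Watson-Crick pairs are counted. The function name and example sequences are hypothetical.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(strand, other):
    """Percentage of residues in `strand` that Watson-Crick pair with `other`.

    Both sequences are given 5'->3'; `other` is reversed so that the two
    strands are compared in antiparallel orientation.
    """
    if not strand or len(strand) != len(other):
        raise ValueError("sequences must be non-empty and of equal length")
    other_reversed = other.upper()[::-1]
    paired = sum(
        1 for a, b in zip(strand.upper(), other_reversed)
        if WATSON_CRICK.get(a) == b
    )
    return 100.0 * paired / len(strand)


# A 10-nt strand against its exact reverse complement, and against a
# partner carrying two mismatches (8 out of 10 residues pair).
perfect = percent_complementarity("ACGTACGTAC", "GTACGTACGT")
partial = percent_complementarity("ACGTACGTAC", "GTACGTACAA")
```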
As used herein, “stringent conditions” for hybridization refer to conditions under which a nucleic acid having complementarity to a target sequence predominantly hybridizes with the target sequence, and substantially does not hybridize to non-target sequences. Stringent conditions are generally sequence-dependent, and vary depending on a number of factors. In general, the longer the sequence, the higher the temperature at which the sequence specifically hybridizes to its target sequence. Non-limiting examples of stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Second Chapter, “Overview of principles of hybridization and the strategy of nucleic acid probe assay”, Elsevier, N.Y. Where reference is made to a polynucleotide sequence, then complementary or partially complementary sequences are also envisaged. These are preferably capable of hybridising to the reference sequence under highly stringent conditions. Generally, in order to maximize the hybridization rate, relatively low-stringency hybridization conditions are selected: about 20 to 25 degrees Celsius lower than the thermal melting point (Tm). The Tm is the temperature at which 50% of specific target sequence hybridizes to a perfectly complementary probe in solution at a defined ionic strength and pH. Generally, in order to require at least about 85% nucleotide complementarity of hybridized sequences, highly stringent washing conditions are selected to be about 5 to 15 degrees Celsius lower than the Tm. In order to require at least about 70% nucleotide complementarity of hybridized sequences, moderately-stringent washing conditions are selected to be about 15 to 30 degrees Celsius lower than the Tm. Highly permissive (very low stringency) washing conditions may be as low as 50 degrees Celsius below the Tm, allowing a high level of mis-matching between hybridized sequences.
Those skilled in the art will recognize that other physical and chemical parameters in the hybridization and wash stages can also be altered to affect the outcome of a detectable hybridization signal from a specific level of homology between target and probe sequences.
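The washing windows described above are simple offsets from the Tm, which can be sketched as follows. The offsets are copied from the text; the function name and the example Tm of 70 degrees Celsius are hypothetical, and the sketch does not attempt to compute the Tm itself, which depends on sequence, ionic strength and pH.

```python
# Offsets below the Tm, in degrees Celsius, copied from the ranges in the text.
WASH_OFFSETS_BELOW_TM = {
    "high": (5.0, 15.0),       # requires at least about 85% complementarity
    "moderate": (15.0, 30.0),  # requires at least about 70% complementarity
}


def wash_temperature_range(tm_celsius: float, stringency: str) -> tuple[float, float]:
    """Return the (coldest, warmest) wash temperature for a stringency level."""
    smallest_offset, largest_offset = WASH_OFFSETS_BELOW_TM[stringency]
    return (tm_celsius - largest_offset, tm_celsius - smallest_offset)


# Hypothetical probe with a Tm of 70 degrees Celsius.
high_window = wash_temperature_range(70.0, "high")
moderate_window = wash_temperature_range(70.0, "moderate")
```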
“Hybridization” refers to a reaction in which one or more polynucleotides react to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. A hybridization reaction may constitute a step in a more extensive process, such as the initiation of PCR, or the cleavage of a polynucleotide by an enzyme. A sequence capable of hybridizing with a given sequence is referred to as the “complement” of the given sequence.
As used herein, the term “genomic locus” or “locus” (plural loci) is the specific location of a gene or DNA sequence on a chromosome. A “gene” refers to stretches of DNA or RNA that encode a polypeptide or an RNA chain that has a functional role to play in an organism and hence is the molecular unit of heredity in living organisms. For the purpose of this invention it may be considered that genes include regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.
As used herein, “expression of a genomic locus” or “gene expression” is the process by which information from a gene is used in the synthesis of a functional gene product. The products of gene expression are often proteins, but in non-protein coding genes such as rRNA genes or tRNA genes, the product is functional RNA. The process of gene expression is used by all known life, including eukaryotes (including multicellular organisms), prokaryotes (bacteria and archaea) and viruses, to generate functional products to survive. As used herein “expression” of a gene or nucleic acid encompasses not only cellular gene expression, but also the transcription and translation of nucleic acid(s) in cloning systems and in any other context. As used herein, “expression” also refers to the process by which a polynucleotide is transcribed from a DNA template (such as into an mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.
The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified, for example by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component. As used herein the term “amino acid” includes natural and/or unnatural or synthetic amino acids, including glycine and both the D and L optical isomers, and amino acid analogs and peptidomimetics.
As used herein, the term “domain” or “protein domain” refers to a part of a protein sequence that may exist and function independently of the rest of the protein chain.
As described in aspects of the invention, sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. Sequence homologies may be generated by any of a number of computer programs known in the art, for example BLAST or FASTA, etc. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A.; Devereux et al., 1984, Nucleic Acids Research 12:387). Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid, Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). However, it is preferred to use the GCG Bestfit program.
Percent homology may be calculated over contiguous sequences, i.e., one sequence is aligned with the other sequence and each amino acid or nucleotide in one sequence is directly compared with the corresponding amino acid or nucleotide in the other sequence, one residue at a time. This is called an “ungapped” alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues.
Although this is a very simple and consistent method, it fails to take into consideration that, for example, in an otherwise identical pair of sequences, one insertion or deletion may cause the following amino acid residues to be put out of alignment, thus potentially resulting in a large reduction in % homology when a global alignment is performed. Consequently, most sequence comparison methods are designed to produce optimal alignments that take into consideration possible insertions and deletions without unduly penalizing the overall homology or identity score. This is achieved by inserting “gaps” in the sequence alignment to try to maximize local homology or identity.
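The effect of a single insertion on an ungapped comparison can be illustrated with a short sketch. The two peptide strings below are hypothetical, and the denominator (the length of the shorter sequence) is an assumption chosen for illustration; alignment programs may normalize differently.

```python
def ungapped_percent_identity(seq_a: str, seq_b: str) -> float:
    """Residue-by-residue identity with no gaps, over the shorter sequence."""
    compared = min(len(seq_a), len(seq_b))
    if compared == 0:
        raise ValueError("both sequences must be non-empty")
    # zip stops at the end of the shorter sequence, so exactly `compared`
    # positions are examined.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / compared


# Hypothetical peptide and a copy with one residue (G) inserted after the
# first position. Every residue after the insertion is shifted out of register.
reference = "MKTAYIAKQR"
with_insertion = "MGKTAYIAKQR"

identical_score = ungapped_percent_identity(reference, reference)
shifted_score = ungapped_percent_identity(reference, with_insertion)
```

In this example only the first residue still lines up after the insertion, which is the large drop in percent homology that gapped alignment methods are designed to avoid.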
However, these more complex methods assign “gap penalties” to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible, reflecting higher relatedness between the two compared sequences, may achieve a higher score than one with many gaps. “Affine gap costs” are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties may, of course, produce optimized alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. However, it is preferred to use the default values when using such software for sequence comparisons. For example, when using the GCG Wisconsin Bestfit package the default gap penalty for amino acid sequences is −12 for a gap and −4 for each extension.
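The gap scoring scheme described above can be sketched as follows, using the −12 and −4 values quoted for amino acid sequences. The sketch follows the wording of the text (one cost for the existence of the gap, then a smaller cost for each subsequent residue); this is an assumption, since individual programs differ on whether the first gapped residue also incurs the extension cost. The function name is hypothetical.

```python
def gap_penalty(gap_length: int, gap_open: int = -12, gap_extend: int = -4) -> int:
    """Penalty for one gap: an opening cost plus a cost per subsequent residue."""
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    return gap_open + (gap_length - 1) * gap_extend


# One gap spanning three residues is penalized less than three separate
# single-residue gaps, which is why alignments with fewer gaps score higher.
one_long_gap = gap_penalty(3)
three_short_gaps = 3 * gap_penalty(1)
```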
Calculation of maximum % homology therefore first requires the production of an optimal alignment, taking into consideration gap penalties. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (Devereux et al., 1984 Nuc. Acids Research 12 p 387). Examples of other software that may perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 Short Protocols in Molecular Biology, 4th Ed., Chapter 18), FASTA (Altschul et al., 1990 J. Mol. Biol. 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999, Short Protocols in Molecular Biology, pages 7-58 to 7-60). However, for some applications, it is preferred to use the GCG Bestfit program. A new tool, called BLAST 2 Sequences, is also available for comparing protein and nucleotide sequences (see FEMS Microbiol Lett. 1999 174(2): 247-50; FEMS Microbiol Lett. 1999 177(1): 187-8 and the website of the National Center for Biotechnology Information at the website of the National Institutes of Health).
Although the final % homology may be measured in terms of identity, the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pair-wise comparison based on chemical similarity or evolutionary distance. An example of such a matrix commonly used is the BLOSUM62 matrix—the default matrix for the BLAST suite of programs. GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table, if supplied (see user manual for further details). For some applications, it is preferred to use the public default values for the GCG package, or in the case of other software, the default matrix, such as BLOSUM62.
Alternatively, percentage homologies may be calculated using the multiple alignment feature in DNASIS™ (Hitachi Software), based on an algorithm, analogous to CLUSTAL (Higgins D G & Sharp P M (1988), Gene 73(1), 237-244). Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
Sequences may also have deletions, insertions or substitutions of amino acid residues which produce a silent change and result in a functionally equivalent substance. Deliberate amino acid substitutions may be made on the basis of similarity in amino acid properties (such as polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues) and it is therefore useful to group amino acids together in functional groups. Amino acids may be grouped together based on the properties of their side chains alone. However, it is more useful to include mutation data as well. The sets of amino acids thus derived are likely to be conserved for structural reasons. These sets may be described in the form of a Venn diagram (Livingstone C. D. and Barton G. J. (1993) “Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation” Comput. Appl. Biosci. 9: 745-756) (Taylor W. R. (1986) “The classification of amino acid conservation” J. Theor. Biol. 119; 205-218). Conservative substitutions may be made, for example according to the table below which describes a generally accepted Venn diagram grouping of amino acids.
Embodiments of the invention include sequences (both polynucleotide and polypeptide) which may comprise homologous substitution (substitution and replacement are both used herein to mean the interchange of an existing amino acid residue or nucleotide, with an alternative residue or nucleotide), i.e., like-for-like substitution in the case of amino acids such as basic for basic, acidic for acidic, polar for polar, etc. Non-homologous substitution may also occur, i.e., from one class of residue to another, or alternatively involving the inclusion of unnatural amino acids such as ornithine (hereinafter referred to as Z), diaminobutyric acid (hereinafter referred to as B), norleucine (hereinafter referred to as O), pyridylalanine, thienylalanine, naphthylalanine and phenylglycine.
Variant amino acid sequences may include suitable spacer groups that may be inserted between any two amino acid residues of the sequence including alkyl groups such as methyl, ethyl or propyl groups in addition to amino acid spacers such as glycine or β-alanine residues. A further form of variation, which involves the presence of one or more amino acid residues in peptoid form, may be well understood by those skilled in the art. For the avoidance of doubt, “the peptoid form” is used to refer to variant amino acid residues wherein the α-carbon substituent group is on the residue's nitrogen atom rather than the α-carbon. Processes for preparing peptides in the peptoid form are known in the art, for example Simon R J et al., PNAS (1992) 89(20), 9367-9371 and Horwell D C, Trends Biotechnol. (1995) 13(4), 132-134.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA, which are within the skill of the art. See Green and Sambrook (Molecular Cloning: A Laboratory Manual, 4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2014); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds. (2017)); Short Protocols in Molecular Biology (Ausubel et al., 1999); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M. J. MacPherson, B. D. Hames and G. R. Taylor eds. (1995)); ANTIBODIES, A LABORATORY MANUAL, SECOND EDITION (Harlow and Lane, eds. (2014)); and CULTURE OF ANIMAL CELLS: A MANUAL OF BASIC TECHNIQUE, 7TH EDITION (R. I. Freshney, ed. (2016)).
EXAMPLES
The following examples are given for the purpose of illustrating various embodiments of the invention and are not meant to limit the present invention in any fashion. The present examples, along with the methods described herein are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the invention. Changes therein and other uses which are encompassed within the spirit of the invention as defined by the scope of the claims will occur to those skilled in the art.
Example 1. Nucleic Acid-Guided Nucleases
Sequences for twenty nucleic acid-guided nucleases, termed MAD1-MAD20 (SEQ ID NOs 1-20), were aligned and compared to other nucleic acid-guided nucleases. A partial alignment and phylogenetic tree are depicted in FIG. 1A and FIG. 1B respectively. Key residues that may be involved in the recognition of a PAM site are shown in FIG. 1A. These include amino acids at positions 167, 539, 548, 599, 603, 604, 605, 606, and 607.
Sequence alignments were built using PSI-BLAST to search for MAD nuclease homologs in the NCBI non-redundant databases. Multiple sequence alignments were further refined using the MUSCLE alignment algorithm with default settings as implemented in Geneious 10. The percent identities of each homolog to the SpCas9 and AsCpf1 reference sequences were computed based on the pairwise alignment matching from these global alignments.
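A percent identity of this kind can be computed from two rows of a global alignment roughly as follows. This is only a sketch under stated assumptions: the specification does not say which denominator the alignment software used, so the choice here (all aligned columns except those that are gaps in both rows) is an assumption, and the function name and example rows are hypothetical.

```python
def aligned_percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two equal-length rows of a global alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    # Columns that are gaps in both rows carry no information for this pair;
    # they arise when the pair is extracted from a multiple alignment.
    columns = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical rows: five identical columns out of six aligned columns.
identity = aligned_percent_identity("MK-TAY", "MKGTAY")
```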
Genomic source sequences were identified using Uniprot linkage information or TBLASTN searches of NCBI using the default parameters and searching all possible frames for translational matches.
Percent identities of MAD1-8 and MAD10-12 to various other nucleases are summarized in Table 1. These percent identities represent the shared amino acid sequence identity between the indicated proteins.
TABLE 1
Protein identifier or accession number  MAD1  MAD2  MAD3  MAD4  MAD5  MAD6  MAD7  MAD8  MAD10  MAD11  MAD12
gi|1025734861|pdb|5B43|A  6.4  32.8  33.2  29.7  29.4  31.1  30.3  31.7  26.7  27.9  98.8
gi|1052245173|pdb|5KK5|A  6.4  32.7  33.1  29.7  29.3  31  30.3  31.7  26.7  27.8  98.7
gi|1086216683|emb|SDC16215.1|  6.1  33  34.4  29.6  30.1  33.5  32.3  32.1  26.2  27.2  46.8
gi|1120175333|ref|WP_073043853.1|  5.9  30.9  37.2  32.8  33.6  34.4  35.7  35.1  26.3  28.3  34.9
Cpf1.Sj|WP_081839471  6.6  33.6  41.7  37.2  33.4  37.6  40.1  37.7  29.1  30.3  34.1
Cpf1.Ss|KFO67989  6.9  32.3  35.7  43  33.7  45.9  34.8  48  33.2  33.4  33.8
MAD3  5.8  31  100  32.9  35.9  35  35.6  34.3  28  27.6  33.1
gi|1082474576|gb|OFY19591.1|  7  31.4  35.9  43.2  31.4  45  33.6  48.6  30.8  33.5  33
MAD2  6.1  100  31  30.7  30.2  31  31.2  31.2  25.8  27.7  32.6
Cpf1.Lb5|WP_016301126  7.8  32.8  36.5  38.2  34.2  45.5  35.8  43.6  30.7  35.7  32.5
gi|1088286736|gb|OHB41002.1|  6.7  30.6  35.3  42.4  33.2  44.7  32.1  46.8  30.7  32.6  32.4
gi|1094423310|emb|SER03894.1|  6.8  30.8  36.1  40.4  31.8  50.4  35.2  46.6  30.4  36.8  32.3
gi|493326531|ref|WP_006283774.1|  6.8  30.8  36.1  40.3  31.8  50.3  35.1  46.6  30.4  36.8  32.3
MAD8  7.6  31.2  34.3  40.4  32  41.6  32.8  100  30.1  32.1  31.7
Cpf1.Bot|WP_009217842  6.9  30.1  36.6  41.5  32.5  50.2  35.4  45.5  29.8  34.1  31.6
Cpf1.Li|WP_020988726  7.3  30.2  34.6  39.3  30.3  40.7  31.8  39.4  32.1  31.3  31.5
Cpf1.Pb|WP_044110123  6.3  31.4  31.8  36.1  30.8  45.7  30.4  39.4  27.7  33.5  31.5
gi|817911372|gb|AKG08867.1|  7.3  29.8  35  40.7  32.1  40.3  32.6  41.7  29.1  31  31.4
gi|1052838533|emb|SCH45297.1|  6.6  30.8  35.5  32  31.5  34.4  51.9  33.4  26.1  29  31.3
gi|1053713332|ref|WP_066040075.1|  7.2  29.6  33.2  39.6  29.8  49.1  32.2  41.4  30.1  32.4  31.3
gi|817909002|gb|AKG06878.1|  7.3  29.8  35  40.7  32  40.3  32.5  41.6  29.1  30.9  31.3
gi|1042201477|ref|WP_065256572.1|  7.2  29.5  35.2  40.6  31.9  40.1  32.7  41.6  29  30.8  31.2
MAD6  7.5  31  35  38.9  33.1  100  34.3  41.6  30.5  33.6  31
gi|490468773|ref|WP_004339290.1|  6.8  31.8  31.7  36.2  28.6  36.5  31.4  38.4  28.5  31.4  31
gi|565853704|ref|WP_023936172.1|  7.5  30.8  34.9  38.9  33.1  99.7  34.1  41.6  30.4  33.6  31
gi|739005707|ref|WP_036887416.1|  7.5  30.9  35  38.9  33  99.9  34.2  41.5  30.4  33.5  31
gi|739008549|ref|WP_036890108.1|  7.5  31  35  38.8  33  99.8  34.2  41.5  30.4  33.5  31
Cpf1.Ft|WP_014550095  7.1  31.9  33.8  40.3  29.7  39.4  34.1  41  29.8  32.5  30.8
gi|504362993|ref|WP_014550095.1|  7.2  32.4  33.8  40.3  29.6  39.4  33.8  40.9  30.1  32.5  30.8
gi|640557447|ref|WP_024988992.1|  6.6  31.4  34.8  40.7  31.2  48  34.1  45.1  28.8  35.2  30.8
gi|1098944113|ref|WP_071304624.1|  7.1  32.3  33.5  40.3  29.6  39.2  33.8  40.9  30.1  32.5  30.6
gi|489124848|ref|WP_003034647.1|  7.1  32.3  33.9  40.9  29.9  39.2  33.9  40.9  29.9  32.2  30.6
gi|738967776|ref|WP_036851563.1|  6.8  29.4  33.1  35.5  28.9  40.3  30.7  35.9  28.7  31.3  30.5
MAD7  5.9  31.2  35.6  30.8  33.9  34.3  100  32.8  24.2  28.9  30.5
Cpf1.Lb6|WP_044910713  6.7  29.8  33.7  36.6  30.9  43  34  39.8  29.1  32.1  30.4
gi|1052961977|emb|SCH47915.1|  5.5  30.5  35.8  32.3  34  35  53.8  33.4  26.2  27.4  30.4
gi|817918353|gb|AKG14689.1|  7  29.1  34.4  39.8  31.7  40  32.4  41.1  28.4  30.1  30.3
gi|917059416|ref|WP_051666128.1|  6.9  29.9  31.5  35.7  31.6  41.8  32.9  39.1  30.1  34  30.2
gi|1011649201|ref|WP_062499108.1|  6.8  29  34.7  40.3  31.4  40.1  33.1  41.6  28.5  30.4  30.1
Cpf1.Pm|WP_018359861  6.3  29.2  32.3  34.2  27.4  38.7  29.4  35  27.2  30.1  30
gi|817922537|gb|AKG18099.1|  6.8  29.1  34.5  39.6  31.5  39.9  32.7  40.7  28.3  29.8  30
gi|769142322|ref|WP_044919442.1|  6.7  31  34.6  37.8  31.5  41.4  33.3  39.2  28  31.9  29.9
gi|1023176441|pdb|5ID6|A  6.7  29.7  31.3  35.5  31.3  41  32.6  38.5  29.7  33.3  29.8
gi|491540987|ref|WP_005398606.1|  5.9  28.3  30.4  29.7  28.5  29  30.7  29.8  25.8  27.8  29.8
gi|652820612|ref|WP_027109509.1|  6.4  31.1  34  35.3  31.7  40.3  33.4  37.5  28.5  33.3  29.8
gi|502240446|ref|WP_012739647.1|  5.9  31.6  36.1  31.2  33  35.4  49.4  34  26.6  29.4  29.7
gi|524278046|emb|CDA41776.1|  5.8  31.6  36  31  33  35.4  50  34  26.6  29.5  29.7
gi|737831580|ref|WP_035798880.1|  6.2  31.3  34.8  38.1  31.5  42.1  33  39.6  28.4  32.4  29.7
gi|909652572|ref|WP_049895985.1|  6.9  30.7  34.2  37.2  30.8  41.5  34.2  38.7  28  32  29.7
MAD4  6.7  30.7  32.9  100  30.7  38.9  30.8  40.4  28.8  29.4  29.7
gi|942073049|ref|WP_055286279.1|  5.9  31.6  36.1  31.1  32.7  35  49.7  33.9  27.1  29.5  29.6
gi|654794505|ref|WP_028248456.1|  7.4  30.5  35.9  37.4  31.3  42.8  34.2  40.2  27.9  33.5  29.5
gi|933014786|emb|CUO47728.1|  5.6  31.3  34.9  31.2  31.5  32.4  46.7  30.6  25.4  27.7  29.4
gi|941887450|ref|WP_055224182.1|  5.6  31.4  35  31.3  31.6  32.5  46.6  30.7  25.3  27.8  29.4
gi|920071674|ref|WP_052943011.1|  6.3  31  31.8  38.8  31.8  41.3  33.8  42.6  29.8  34.7  29
MAD5  5.1  30.2  35.9  30.7  100  33.1  33.9  32  24.3  28.7  29
gi|1081462674|emb|SCZ76797.1|  6.9  30.4  33.5  34.7  29.7  40.1  30.5  37.4  27.3  32.5  28.9
gi|918722523|ref|WP_052585281.1|  7.4  27.5  30.5  35.7  28.3  35.2  28.5  36  26  27.1  28.8
gi|524816323|emb|CDF09621.1|  6.2  30  34.1  29.3  31.2  32.7  47.6  32.2  25.5  25.9  28.4
gi|941782328|ref|WP_055176369.1|  6.2  30.2  33.1  28.9  30.9  32  46.9  32.1  26  27.1  28.4
gi|942113296|ref|WP_055306762.1|  6.4  29.8  33.8  29.7  31.3  33.1  48  32.5  25.8  26.2  28.4
MAD11  6.4  27.7  27.6  29.4  28.7  33.6  28.9  32.1  26.2  100  27.8
gi|653158548|ref|WP_027407524.1|  5.9  26.4  28.1  33.5  27.4  32.5  27.8  32  27  26.8  27.6
gi|652963004|ref|WP_027216152.1|  6.6  30.3  32.5  33.2  30.4  38.2  29.6  34.6  25.9  30.5  27.2
gi|1083069650|gb|OGD68774.1|  6.2  25  24.3  26.6  23.1  28.1  23.2  26.4  45  24.9  27.1
gi|302483275|gb|EFL46285.1|  5.6  24.7  26.8  30.3  24.9  34.8  26  30.4  24.4  27.5  27.1
gi|915400855|ref|WP_050786240.1|  5.6  24.7  26.8  30.3  24.9  34.8  26  30.4  24.4  27.5  27.1
MAD10  5.6  25.8  28  28.8  24.3  30.5  24.2  30.1  100  26.2  26.6
gi|1101117967|gb|OIO75780.1|  6.1  26.8  26  27.3  24.3  28.1  24.4  28.2  44.1  25.4  26.1
gi|1088204458|gb|OHA63117.1|  6.5  25.2  23.5  25.8  22.9  27  22  26.1  36.5  24.2  24.7
gi|809198071|ref|WP_046328599.1|  4.9  25.6  26.5  22.2  23.9  23.8  25.8  23.9  20.3  25.1  24
gi|1088079929|gb|OGZ45678.1|  5.6  21.9  23.8  26.9  23.4  27.8  23.3  26.7  28.8  24.7  23.5
gi|1101053499|gb|OIO15737.1|  5.9  23.1  26.2  25.2  23  26.4  25.1  26.5  29.2  23.2  23.4
gi|1101058058|gb|OIO19978.1|  5.4  21.2  22.8  23.6  20.6  25  20.7  25  25.9  22.2  23
gi|1088000848|gb|OGY73485.1|  5.7  23.5  25.2  25.5  23.9  27  25.1  25.6  31.6  23.6  22.9
gi|407014433|gb|EKE28449.1|  5.2  23.5  25.9  26.7  24.3  25.8  23  27.8  29.9  25.3  22.9
gi|818249855|gb|KKP36646.1|  6  21  20.7  23.5  20  24.2  21  24  24.6  21.8  22.6
gi|818703647|gb|KKT48220.1|  5.8  23.3  25  25.1  23.5  26.5  24.7  25.3  31.2  23.3  22.6
gi|818705786|gb|KKT50231.1|  5.8  23.1  24.6  24.7  22.9  26.2  24.2  24.8  30.8  22.9  22.2
gi|1083950632|gb|OGJ66851.1|  4.5  20  22.1  23.5  20.6  24.6  20  24  23.5  20.7  22.1
gi|1083932199|gb|OGJ49885.1|  6  20.4  20.2  22.6  19.3  23.3  20.6  23.2  23.9  21  21.8
gi|1083410735|gb|OGF20863.1|  5  21.7  23.3  25.5  23  25  22.7  25.9  27.2  22.4  21.5
gi|1011480927|ref|WP_062376669.1|  4.7  20.1  20.1  21.4  19.3  23.3  21.4  22  20.2  19.7  20.9
gi|818539593|gb|KKR91555.1|  5.1  19.8  21.6  22.1  20.5  22.9  21.2  22.8  24  20.5  19.9
gi|503048015|ref|WP_013282991.1|  5.1  18.8  20.7  15.3  19.7  18.9  19.3  17.7  15.9  19  19.2
gi|1096232746|ref|WP_071177645.1|  5  19.1  20.5  17.4  20.1  19.7  20.4  20.4  17.5  18.5  18.9
gi|769130404|ref|WP_044910712.1|  4.6  19.4  18.2  16.1  18.1  17.1  18.7  17.9  14.5  16.8  17.5
gi|1085569500|gb|OGX23684.1|  2.6  11.6  12.1  12.7  10.2  12.1  12.7  11.6  10.9  11.1  10.5
gi|818357062|gb|KKQ38176.1|  3.3  10  11.1  10.6  11.1  11.8  12.1  11.5  12.2  10.8  9.8
gi|745626763|gb|KIE18642.1|  3.7  9.4  11.7  11.1  11.1  12.5  11.9  11.9  10.2  10.6  8.8
MAD1  100  6.1  5.8  6.7  5.1  7.5  5.9  7.6  5.6  6.4  6.4
SpCas9  4  6.3  6.5  8.3  5.6  8.1  6.9  7.7  6.9  6.3  6.3
MAD12  6.4  32.6  33.1  29.7  29  31  30.5  31.7  26.6  27.8  100
Example 2: Expression of MAD Nucleases
Wild-type nucleic acid sequences for MAD1-MAD20 include SEQ ID NOs: 21-40, respectively. These MAD nucleases were codon optimized for expression in E. coli and the codon optimized sequences are listed as SEQ ID NOs: 41-60, respectively (summarized in Table 2).
Codon optimized MAD1-MAD20 were cloned into an expression construct comprising a constitutive or inducible promoter (e.g., proB promoter SEQ ID NO: 83, or pBAD promoter SEQ ID NO: 81 or SEQ ID NO: 82) and an optional 6×-His tag (SEQ ID NO: 182) (e.g., FIG. 2). The generated MAD1-MAD20 expression constructs are provided as SEQ ID NOs: 61-80, respectively. The expression constructs as depicted in FIG. 2 were generated either by restriction/ligation-based cloning or homology-based cloning.
Example 3. Testing Guide Nucleic Acid Sequences Compatible with MAD Nucleases
In order to have a functioning targetable nuclease complex, a nucleic acid-guided nuclease and a compatible guide nucleic acid are needed. To determine the compatible guide nucleic acid sequence, specifically the scaffold sequence portion of the guide nucleic acid, multiple approaches were taken. First, scaffold sequences were sought near the endogenous loci of each MAD nuclease. In some cases, such as with MAD2, no endogenous scaffold sequence was found. Therefore, we tested the compatibility of MAD2 with scaffold sequences found near the endogenous loci of the other MAD nucleases. The MAD nucleases and corresponding endogenous scaffold sequences that were tested are listed in Table 2.
TABLE 2
MAD nuclease  WT nucleic acid sequence  Codon optimized nucleic acid sequence  Amino acid sequence  Endogenous scaffold sequence for guide nucleic acid
MAD1  SEQ ID NO: 21  SEQ ID NO: 41  SEQ ID NO: 1  SEQ ID NO: 84
MAD2  SEQ ID NO: 22  SEQ ID NO: 42  SEQ ID NO: 2  None identified
MAD3  SEQ ID NO: 23  SEQ ID NO: 43  SEQ ID NO: 3  SEQ ID NO: 86
MAD4  SEQ ID NO: 24  SEQ ID NO: 44  SEQ ID NO: 4  SEQ ID NO: 87
MAD5  SEQ ID NO: 25  SEQ ID NO: 45  SEQ ID NO: 5  SEQ ID NO: 88
MAD6  SEQ ID NO: 26  SEQ ID NO: 46  SEQ ID NO: 6  SEQ ID NO: 89
MAD7  SEQ ID NO: 27  SEQ ID NO: 47  SEQ ID NO: 7  SEQ ID NO: 90
MAD8  SEQ ID NO: 28  SEQ ID NO: 48  SEQ ID NO: 8  SEQ ID NO: 91
MAD9  SEQ ID NO: 29  SEQ ID NO: 49  SEQ ID NO: 9  SEQ ID NO: 92; SEQ ID NO: 103; SEQ ID NO: 106
MAD10  SEQ ID NO: 30  SEQ ID NO: 50  SEQ ID NO: 10  SEQ ID NO: 93
MAD11  SEQ ID NO: 31  SEQ ID NO: 51  SEQ ID NO: 11  SEQ ID NO: 94
MAD12  SEQ ID NO: 32  SEQ ID NO: 52  SEQ ID NO: 12  SEQ ID NO: 95
MAD13  SEQ ID NO: 33  SEQ ID NO: 53  SEQ ID NO: 13  SEQ ID NO: 96; SEQ ID NO: 105; SEQ ID NO: 107
MAD14  SEQ ID NO: 34  SEQ ID NO: 54  SEQ ID NO: 14  SEQ ID NO: 97
MAD15  SEQ ID NO: 35  SEQ ID NO: 55  SEQ ID NO: 15  SEQ ID NO: 98
MAD16  SEQ ID NO: 36  SEQ ID NO: 56  SEQ ID NO: 16  SEQ ID NO: 99
MAD17  SEQ ID NO: 37  SEQ ID NO: 57  SEQ ID NO: 17  SEQ ID NO: 100
MAD18  SEQ ID NO: 38  SEQ ID NO: 58  SEQ ID NO: 18  SEQ ID NO: 101
MAD19  SEQ ID NO: 39  SEQ ID NO: 59  SEQ ID NO: 19  SEQ ID NO: 102
MAD20  SEQ ID NO: 40  SEQ ID NO: 60  SEQ ID NO: 20  SEQ ID NO: 103
Editing cassettes as depicted in FIG. 3 were generated to assess the functionality of the MAD nucleases and corresponding guide nucleic acids. Each editing cassette comprises an editing sequence and a promoter operably linked to an encoded guide nucleic acid. Each editing cassette further comprises primer sites (P1 and P2) on flanking ends. The guide nucleic acids comprised various scaffold sequences to be tested, as well as a guide sequence to guide the MAD nuclease to the target sequence for editing. The editing sequences comprised a PAM mutation and/or codon mutation relative to the target sequence. The mutations were flanked by regions of homology (homology arms or HA) which would allow recombination into the cleaved target sequence.
FIG. 4 depicts an experimental design to test different MAD nuclease and guide nucleic acid combinations. An expression cassette encoding the MAD nuclease, or the MAD nuclease protein itself, was added to host cells along with various editing cassettes as described above. In this example, the guide nucleic acids were engineered to target the galK gene in the host cell, and the editing sequence was designed to mutate the targeted galK gene in order to turn the gene off, thereby allowing for screening of successfully edited cells. This design was used for identification of functional or compatible MAD nuclease and guide nucleic acid combinations. Editing efficiency was determined by qPCR to measure the editing plasmid in the recovered cells in a high-throughput manner. Validation of MAD11 and Cas9 primers is shown in FIGS. 14A and 14B. These results show that the selected primer pairs are orthogonal and allow quantitative measurement of input plasmid DNA.
FIGS. 5A-5B depict a similar experimental design. In this case, the editing cassette (FIG. 5B) further comprises a selectable marker, in this case kanamycin resistance (kan), and the MAD nuclease expression vector (FIG. 5A) further comprises a selectable marker, in this case chloramphenicol resistance (Cm), and the lambda RED recombination system to aid homologous recombination (HR) of the editing sequence into the target sequence. A compatible MAD nuclease and guide nucleic acid combination will cause a double strand break in the target sequence if a PAM sequence is present. Since the editing sequence (e.g., FIG. 3) contains a PAM mutation that is not recognized by the MAD nuclease, edited cells that contain the PAM mutation survive cleavage by the MAD nuclease, while wild-type non-edited cells die (FIG. 5C). The editing sequence further comprises a mutation in the galK gene that allows for screening of edited cells, while the MAD nuclease expression vector and editing cassette contain drug selection markers, allowing for selection of edited cells.
Using these methods, compatible guide nucleic acids for MAD1-MAD20 were tested. Twenty scaffold sequences were tested. The guide nucleic acids used in the experiments contained one of the twenty scaffold sequences, referred to as scaffold-1, scaffold-2, etc., and a guide sequence that targets the galK gene. Sequences for Scaffold-1 through Scaffold-20 are listed as SEQ ID NOs: 84-103, respectively. It should be understood that the guide sequence of the guide nucleic acid is variable and can be engineered or designed to target any desired target sequence. Since MAD2 does not have an endogenous scaffold sequence to test, a scaffold sequence from a close homolog (scaffold-2, SEQ ID NO: 85) was tested and found to be a non-functional pair, meaning MAD2 and scaffold-2 were not compatible. Therefore, MAD2 was tested with the other nineteen scaffold sequences, despite the low sequence homology between MAD2 and the other MAD nucleases.
This workflow could also be used to identify or test PAM sequences compatible with a given MAD nuclease. Another method for identifying a PAM site is described in the next example.
In general, for the assays described, transformations were carried out as follows. E. coli strains expressing the codon optimized MAD nucleases were grown overnight. Saturated cultures were diluted 1/100, grown to an OD600 of 0.6, and induced by adding arabinose at a final concentration of 0.4% and (if a temperature sensitive plasmid was used) shifting the culture to 42 degrees Celsius in a shaking water bath. Following induction, cells were chilled on ice for 15 min prior to washing thrice with ¼ the initial culture volume of 10% glycerol (for example, 50 mL washes for a 200 mL culture). Cells were resuspended in 1/100 the initial volume (for example, 2 mL for a 200 mL culture) and stored at −90 degrees Celsius until ready to use. To perform the compatibility and editing efficiency screens described here, 50 ng of editing cassette was transformed into cell aliquots by electroporation. Following electroporation, the cells were recovered in LB for 3 hours and 100 μL of cells were plated on MacConkey plates containing 1% galactose.
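The wash and resuspension volumes in this protocol scale with the initial culture volume, which can be sketched as below. The fractions (one quarter for each wash, one hundredth for resuspension) are taken from the text; the function name and dictionary keys are hypothetical.

```python
def prep_volumes(culture_ml: float) -> dict[str, float]:
    """Volumes derived from the initial culture volume, in milliliters."""
    return {
        "wash_ml": culture_ml / 4,            # each of the three 10% glycerol washes
        "resuspension_ml": culture_ml / 100,  # final resuspension volume
    }


# The 200 mL culture used as the worked example in the text.
volumes = prep_volumes(200)
```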
Editing efficiencies were determined by dividing the number of white colonies (edited cells) by the total number of white and red colonies (edited and non-edited cells).
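The colony-count calculation can be sketched as follows. The function name and the example counts are hypothetical; only the ratio itself (white colonies divided by white plus red colonies) comes from the text.

```python
def editing_efficiency(white_colonies: int, red_colonies: int) -> float:
    """Fraction of recovered colonies that are edited (white on MacConkey/galactose)."""
    total = white_colonies + red_colonies
    if total == 0:
        raise ValueError("no colonies were counted")
    return white_colonies / total


# Hypothetical plate with 75 white and 25 red colonies.
efficiency = editing_efficiency(75, 25)
```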
Example 4. PAM Selection Assay
In order to generate a double strand break in a target sequence, a guide nucleic acid must hybridize to a target sequence, and the MAD nuclease must recognize a PAM sequence adjacent to the target sequence. If the guide nucleic acid hybridizes to the target sequence, but the MAD nuclease does not recognize a PAM site, then cleavage does not occur.
A PAM is MAD nuclease-specific and not all MAD nucleases necessarily recognize the same PAM. In order to assess the PAM site requirements for the MAD nucleases, an assay as depicted in FIGS. 6A-6C was performed.
FIG. 6A depicts a MAD nuclease expression vector as described elsewhere, which also contains a chloramphenicol resistance gene and the lambda RED recombination system.
FIG. 6B depicts a self-targeting editing cassette. The guided nucleic acid is designed to target the target sequence which is contained on the same nucleic acid molecule. The target sequence is flanked by random nucleotides, depicted by N4, meaning four random nucleotides on either end of the target sequence. It should be understood that any number of random nucleotides could also be used (for example, 3, 5, 6, 7, 8, etc). The random nucleotides serve as a library of potential PAMs.
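The size of such a random-nucleotide library grows as four to the power of the number of random positions, so an N4 flank gives 256 candidate sequences. A minimal sketch, with a hypothetical function name, that enumerates every possible flank of a given length:

```python
from itertools import product


def random_flank_library(n_positions: int) -> list[str]:
    """Every possible DNA sequence of the given length, in alphabetical order."""
    return ["".join(bases) for bases in product("ACGT", repeat=n_positions)]


# N4: four random positions on one side of the target sequence.
n4_library = random_flank_library(4)
library_size = len(n4_library)
```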
FIG. 6C depicts the experimental design. Basically, the MAD nuclease expression vector and editing cassette comprising the random PAM sites were transformed into a host cell. If a functional targetable nuclease complex was formed and the MAD nuclease recognized a PAM site, then the editing cassette vector was cleaved and which leads to cell death. If a functional targetable complex was not formed or if the MAD nuclease did not recognize the PAM, then the target sequence was not cleaved and the cell survived. Next generation sequence (NGS) was then used to sequence the starting and final cell populations in order to determine what PAM sites were recognized by a given MAD nuclease. These recognized PAM sites were then used to determine a consensus or non-consensus PAM for a given MAD nuclease.
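The comparison of starting and final populations can be sketched as a depletion analysis: a PAM that is recognized leads to cleavage and cell death, so it should be under-represented in the final population. The Python sketch below is one possible way to score this and is not the analysis used in the experiments; the pseudocount, the log2 threshold, the function names, and the read counts are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def pam_depletion(start_reads, final_reads, pseudocount=1.0):
    """Return log2(final frequency / starting frequency) for each PAM.

    start_reads and final_reads are lists of PAM sequences extracted
    from sequencing reads. The pseudocount (an assumption) avoids
    division by zero for PAMs absent from one population.
    """
    start = Counter(start_reads)
    final = Counter(final_reads)
    pams = set(start) | set(final)
    n_start = sum(start.values()) + pseudocount * len(pams)
    n_final = sum(final.values()) + pseudocount * len(pams)
    scores = {}
    for pam in pams:
        f_start = (start[pam] + pseudocount) / n_start
        f_final = (final[pam] + pseudocount) / n_final
        scores[pam] = math.log2(f_final / f_start)
    return scores


def recognized_pams(scores, threshold=-1.0):
    """PAMs depleted at least 2-fold (an assumed cutoff) in the final population."""
    return sorted(pam for pam, score in scores.items() if score <= threshold)


# Hypothetical counts: TTTA is depleted after selection, GGCA is not.
start = ["TTTA"] * 100 + ["GGCA"] * 100
final = ["TTTA"] * 5 + ["GGCA"] * 195
print(recognized_pams(pam_depletion(start, final)))  # ['TTTA']
```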
The consensus PAM for MAD1-MAD8, and MAD10-MAD12 was determined to be TTTN. The consensus PAM for MAD9 was determined to be NNG. The consensus PAM for MAD13-MAD15 was determined to be TTN. The consensus PAM for MAD16-MAD18 was determined to be TA. The consensus PAM for MAD19-MAD20 was determined to be TTCN.
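For illustration, the consensus PAMs listed above can be written as a lookup table and compared against a candidate PAM, treating N as any nucleotide. This is a minimal Python sketch; the names `CONSENSUS_PAM`, `matches_consensus`, and `nuclease_recognizes` are hypothetical, and requiring the candidate to have the same length as the consensus is a simplifying assumption.

```python
# Nucleotide codes used by the consensus PAMs in the text (N = any base).
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# Consensus PAMs as reported in the text.
CONSENSUS_PAM = {}
for i in list(range(1, 9)) + [10, 11, 12]:
    CONSENSUS_PAM[f"MAD{i}"] = "TTTN"
CONSENSUS_PAM["MAD9"] = "NNG"
for i in (13, 14, 15):
    CONSENSUS_PAM[f"MAD{i}"] = "TTN"
for i in (16, 17, 18):
    CONSENSUS_PAM[f"MAD{i}"] = "TA"
for i in (19, 20):
    CONSENSUS_PAM[f"MAD{i}"] = "TTCN"


def matches_consensus(pam, consensus):
    """True if every base of pam is allowed by the consensus at that position."""
    if len(pam) != len(consensus):
        return False
    return all(base in CODES[code] for base, code in zip(pam.upper(), consensus))


def nuclease_recognizes(nuclease, pam):
    """Check a candidate PAM against the consensus listed for a nuclease."""
    return matches_consensus(pam, CONSENSUS_PAM[nuclease])


print(nuclease_recognizes("MAD7", "TTTG"))  # True
print(nuclease_recognizes("MAD7", "TTCG"))  # False
```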
Example 5. Testing Heterologous Guide Nucleic Acids
Editing efficiencies were tested for MAD1, MAD2, MAD4, and MAD7 and are depicted in FIG. 7A and FIG. 7B. Experiment details and editing efficiencies are summarized in Table 3. Editing efficiency was determined by dividing the number of edited cells by the total number of recovered cells. Various editing cassettes targeting the galK gene were used to allow screening of edited cells. The guide nucleic acids encoded on the editing cassette contained a guide sequence targeting the galK gene and one of various scaffold sequences in order to test the compatibility of the indicated MAD nuclease with the indicated scaffold sequence, as summarized in Table 3.
Compatible MAD nuclease and guide nucleic acid pairs (comprising the indicated scaffold sequences) were observed to have editing efficiencies of 75-100%. MAD2 had an editing efficiency of 75-100% and MAD7 had an editing efficiency of 97-100%.
MAD2 combined with scaffold-1, scaffold-2, scaffold-4, or scaffold-13 in these experiments resulted in 0% editing efficiency. These data imply that MAD2 did not form a functional complex with these tested guide nucleic acids and that MAD2 is not compatible with these scaffold sequences.
MAD7 combined with scaffold-1, scaffold-2, scaffold-4, or scaffold-13 in these experiments resulted in 0% editing efficiency. These data imply that MAD7 did not form a functional complex with these tested guide nucleic acids and that MAD7 is not compatible with these scaffold sequences.
For MAD1 and MAD4, all tested guide nucleic acid combinations resulted in 0% editing efficiency, implying that MAD1 and MAD4 did not form a functional complex with any of the tested guide nucleic acids. These data also imply that MAD1 and MAD4 are not compatible with the tested scaffold sequences.
Combined, these data highlight the unpredictability of finding a compatible MAD nuclease and scaffold sequence pair in order to form a functional targetable nuclease complex. Some tested MAD nucleases did not function with any tested scaffold sequence. Some tested MAD nucleases only functioned with some tested scaffold sequences and not with others.
TABLE 3
# | Nucleic acid-guided nuclease | Guide nucleic acid scaffold sequence | Editing sequence mutation | Target gene | Editing efficiency
1 MAD1 Scaffold-1; SEQ ID NO: 84 L80** galK 0%
2 MAD1 Scaffold-2; SEQ ID NO: 85 Y145** galK 0%
3 MAD1 Scaffold-4; SEQ ID NO: 87 Y145** galK 0%
4 MAD1 Scaffold-10; SEQ ID NO: 93 Y145** galK 0%
5 MAD1 Scaffold-11; SEQ ID NO: 94 L80** galK 0%
6 MAD1 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
7 MAD1 Scaffold-13; SEQ ID NO: 96 Y145** galK 0%
8 MAD1 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
9 MAD2 Scaffold-10; SEQ ID NO: 93 L80** galK 0%
10 MAD2 Scaffold-10; SEQ ID NO: 93 Y145** galK 100% 
11 MAD2 Scaffold-11; SEQ ID NO: 94 L80** galK 98% 
12 MAD2 Scaffold-11; SEQ ID NO: 94 Y145** galK 99% 
13 MAD2 Scaffold-12; SEQ ID NO: 95 Y145** galK 98% 
14 MAD2 Scaffold-12; SEQ ID NO: 95 Y145** galK 0%
15 MAD2 Scaffold-13; SEQ ID NO: 96 Y145** galK 0%
16 MAD2 Scaffold-1; SEQ ID NO: 84 L80** galK 0%
17 MAD2 Scaffold-2; SEQ ID NO: 85 Y145** galK 0%
18 MAD2 Scaffold-2; SEQ ID NO: 85 Y145** galK 0%
19 MAD2 Scaffold-4; SEQ ID NO: 87 Y145** galK 0%
20 MAD2 Scaffold-5; SEQ ID NO: 88 L80** galK 99% 
21 MAD2 Scaffold-12; SEQ ID NO: 95 89** galK 0%
22 MAD2 Scaffold-12; SEQ ID NO: 95 70** galK 75% 
23 MAD2 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 79% 
24 MAD4 Scaffold-1; SEQ ID NO: 84 L80** galK 0%
25 MAD4 Scaffold-2; SEQ ID NO: 85 Y145** galK 0%
26 MAD4 Scaffold-4; SEQ ID NO: 87 Y145** galK 0%
27 MAD4 Scaffold-10; SEQ ID NO: 93 Y145** galK 0%
28 MAD4 Scaffold-11; SEQ ID NO: 94 L80** galK 0%
29 MAD4 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
30 MAD4 Scaffold-13; SEQ ID NO: 96 Y145** galK 0%
31 MAD4 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
32 MAD7 Scaffold-1; SEQ ID NO: 84 L80** galK 0%
33 MAD7 Scaffold-2; SEQ ID NO: 85 Y145** galK 0%
34 MAD7 Scaffold-4; SEQ ID NO: 87 Y145** galK 0%
35 MAD7 Scaffold-10; SEQ ID NO: 93 Y145** galK 100% 
36 MAD7 Scaffold-11; SEQ ID NO: 94 L80** galK 97% 
37 MAD7 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
38 MAD7 Scaffold-13; SEQ ID NO: 96 Y145** galK 0%
39 MAD7 Scaffold-12; SEQ ID NO: 95 L10KpnI galK 0%
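The compatibility conclusions drawn from Table 3 can be reproduced by grouping rows by nuclease and scaffold. The Python sketch below uses a small excerpt of rows transcribed from Table 3; the function name and the rule that any non-zero editing efficiency counts as compatible are assumptions made for illustration.

```python
from collections import defaultdict

# Excerpt of Table 3: (nuclease, scaffold, editing mutation, efficiency in %).
ROWS = [
    ("MAD1", "Scaffold-10", "Y145**", 0),
    ("MAD2", "Scaffold-10", "L80**", 0),
    ("MAD2", "Scaffold-10", "Y145**", 100),
    ("MAD2", "Scaffold-11", "L80**", 98),
    ("MAD2", "Scaffold-1", "L80**", 0),
    ("MAD2", "Scaffold-5", "L80**", 99),
    ("MAD4", "Scaffold-11", "L80**", 0),
    ("MAD7", "Scaffold-10", "Y145**", 100),
    ("MAD7", "Scaffold-11", "L80**", 97),
    ("MAD7", "Scaffold-13", "Y145**", 0),
]


def compatible_scaffolds(rows):
    """Map each nuclease to the scaffolds that gave any non-zero editing."""
    best = defaultdict(int)
    for nuclease, scaffold, _mutation, efficiency in rows:
        key = (nuclease, scaffold)
        best[key] = max(best[key], efficiency)
    result = defaultdict(list)
    for (nuclease, scaffold), efficiency in best.items():
        if efficiency > 0:
            result[nuclease].append(scaffold)
    return {nuclease: sorted(scaffolds) for nuclease, scaffolds in result.items()}


for nuclease, scaffolds in sorted(compatible_scaffolds(ROWS).items()):
    print(nuclease, scaffolds)
```

In this excerpt, MAD1 and MAD4 have no compatible scaffold, which matches the observation that some nucleases did not function with any tested scaffold sequence.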
Example 6. Assessment of MAD2 and MAD7
The ability of MAD2 and MAD7 to function with heterologous guide nucleic acids was tested using a similar experimental design as described above.
The compatibility of MAD2 with other scaffold sequences was tested and the results of an experiment are depicted in FIG. 8. The MAD nucleases, guide nucleic acid scaffold sequences, and editing sequences used in this experiment are summarized in Table 4.
The compatibility of MAD7 with other scaffold sequences was tested and the results of an experiment are depicted in FIG. 9. The MAD nucleases, guide nucleic acid scaffold sequences, and editing sequences used in this experiment are summarized in Table 5.
TABLE 4
# | Nucleic acid-guided nuclease | Guide nucleic acid scaffold sequence | Editing sequence mutation | Target gene
1 MAD2 Scaffold-12; SEQ ID NO: 95 N89KpnI galK
2 MAD2 Scaffold-10; SEQ ID NO: 93 L80** galK
3 MAD2 Scaffold-5; SEQ ID NO: 88 L80** galK
4 MAD2 Scaffold-12; SEQ ID NO: 95 D70KpnI galK
5 MAD2 Scaffold-12; SEQ ID NO: 95 Y145** galK
6 MAD2 Scaffold-11; SEQ ID NO: 94 Y145** galK
7 MAD2 Scaffold-10; SEQ ID NO: 93 Y145** galK
8 MAD2 Scaffold-12; SEQ ID NO: 95 L10KpnI galK
9 MAD2 Scaffold-11; SEQ ID NO: 94 L80** galK
10  SpCas9 S. pyogenes gRNA Y145** galK
11  MAD2 Scaffold-2; SEQ ID NO: 85 Y145** galK
12  MAD2 Scaffold-4; SEQ ID NO: 87 Y145** galK
13  MAD2 Scaffold-1; SEQ ID NO: 84 L80** galK
14  MAD2 Scaffold-13; SEQ ID NO: 96 Y145** galK
TABLE 5
# | Nucleic acid-guided nuclease | Guide nucleic acid scaffold sequence | Editing sequence mutation | Target gene
1 MAD7 Scaffold-1; SEQ ID NO: 84 L80** galK
2 MAD7 Scaffold-2; SEQ ID NO: 85 Y145** galK
3 MAD7 Scaffold-4; SEQ ID NO: 87 Y145** galK
4 MAD7 Scaffold-10; SEQ ID NO: 93 Y145** galK
5 MAD7 Scaffold-11; SEQ ID NO: 94 L80** galK
In another experiment, transformation efficiencies (FIG. 10B) were determined by calculating the total number of recovered cells compared to the starting number of cells. An example plate image is depicted in FIG. 10C. Editing efficiencies (FIG. 10A) were determined by calculating the ratio of edited colonies (white colonies, edited galK gene) versus total colonies.
In this example (FIG. 10A-10C), cells expressing galK were transformed with expression constructs expressing either MAD2 or MAD7 and a corresponding editing cassette comprising a guide nucleic acid targeting the galK gene. The guide nucleic acid was comprised of a guide sequence targeting the galK gene and the scaffold-12 sequence (SEQ ID NO: 95).
In the depicted example, MAD2 and MAD7 had a lower transformation efficiency compared to S. pyogenes Cas9, though the editing efficiency of MAD2 and MAD7 was slightly higher than that of S. pyogenes Cas9.
FIG. 11 depicts the sequencing results from select colonies recovered from the assay described above. The target sequence was in the galK coding sequence (CDS). The TTTN PAM is shown as the reverse complement (wild-type NAAA, mutated NGAA). The mutations targeted by the editing sequence are labeled as target codons. Changes compared to the wild-type sequence are highlighted. In these experiments, the scaffold-12 sequence (SEQ ID NO: 95) was used. The guide sequence of the guide nucleic acid targeted the galK gene.
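The reverse-complement relationship between the PAM and its depiction in FIG. 11 can be checked with a short Python sketch. The function name is hypothetical; N is treated as its own complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA sequence that may contain N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("TTTN"))  # NAAA, the wild-type PAM as shown
print(reverse_complement("NGAA"))  # TTCN, the mutated PAM on the PAM strand
```

Read on the PAM strand, the mutated NGAA therefore corresponds to TTCN, which no longer matches the TTTN consensus.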
Six of the seven depicted sequences from the MAD2 experiment contained the designed PAM mutation and designed mutations in the target codons of galK, while one sequenced colony maintained the wild-type PAM and wild-type target codons while also containing an unintended mutation upstream of the target site.
Two of the four depicted sequences from the MAD7 experiment contained the designed PAM mutation and mutated target codons. One colony comprised a wild-type sequence, while another contained a deletion of eight nucleotides upstream of the target sequence.
FIG. 12 depicts results from another experiment testing the ability to recover edited cells. In Experiment 0, the MAD2 nuclease was used with a guide nucleic acid comprising scaffold-11 sequence and a guide sequence targeting galK. The editing cassette comprised an editing sequence designed to incorporate an L80** mutation into galK, thereby allowing screening of the edited cells. In Experiment 1, the MAD2 nuclease was used with a guide nucleic acid comprising scaffold-12 sequence and a guide sequence targeting galK. The editing cassette comprised an editing sequence designed to incorporate an L10KpnI mutation into galK. In both experiments, a negative control plasmid containing a guide nucleic acid that is not compatible with MAD2 was included in the transformations. Following transformation, the ratio of the compatible editing cassette (those containing scaffold-11 or scaffold-12 guide nucleic acids) to the non-compatible editing cassette (negative control) was measured. The experiments were done in the presence or absence of selection. The results show that more cells containing the compatible editing cassette were recovered than cells containing the non-compatible editing cassette, and this result was magnified when selection was used.
Example 7. Guide Nucleic Acid Characterization
The sequences of scaffolds 1-8, and 10-12 (SEQ ID NO: 84-91, and 93-95) were aligned and are depicted in FIG. 13A. Nucleotides that match the consensus sequence are faded, while those diverging from the consensus sequence are visible. The predicted pseudoknot region is indicated. Without being bound by theory, the region 5′ of the pseudoknot may influence binding and/or kinetics of the nucleic acid-guided nuclease. As is shown in FIG. 13A, in general, there appears to be less variability in the pseudoknot region (e.g., SEQ ID NO: 172-181) as compared to the sequence outside of the pseudoknot region.
FIG. 13B shows a preliminary model of MAD2 and MAD12 complexed with a guide nucleic acid (in this example, a guide RNA) and target sequence (DNA).
Example 8. Editing Efficiency of the MAD Nucleases
A plate-based editing efficiency assay and a molecular editing efficiency assay were used to test editing efficiency of various MAD nuclease and guide nucleic acid combinations.
FIG. 15 depicts quantification of the data obtained using the molecular editing efficiency assay using MAD2 nuclease with a guide nucleic acid comprising scaffold-12 and a guide sequence targeting galK. The indicated mutations were incorporated into galK using corresponding editing cassettes containing the mutation. FIG. 16 shows the comparison of the editing efficiencies determined by the plate-based assay using white and red colonies as described previously, and the molecular editing efficiency assay. As shown in FIG. 16, the editing efficiencies as determined by the two separate assays are consistent.
Example 9. Trackable Editing
Genetic edits can be tracked by the use of a barcode. A barcode can be incorporated into or near the edit site as described in the present specification. When multiple rounds of engineering are being performed, with a different edit being made in each round, it may be beneficial to insert a barcode in a common region during each round of engineering. This way, one could sequence a single site and get the sequences of all of the barcodes from each round without the need to sequence each edited site individually. FIGS. 17A-17C, 18, and 19 depict examples of such trackable engineering workflows.
As depicted in FIG. 17A, a cell expressing a MAD nuclease is transformed with a plasmid containing an editing cassette and a recording cassette. The editing cassette contains a PAM mutation and a gene edit. The recorder cassette comprises a barcode, in this case 15N. The editing cassette and recording cassette each comprise a guide nucleic acid to a distinct target sequence. Within a library of such plasmids, the recorder cassette for each round can contain the same guide nucleic acid, such that the first round barcode is inserted into the same location across all variants, regardless of what editing cassette and corresponding gene edit is used. The correlation between the barcode and editing cassette is determined beforehand, such that the edit can be identified by sequencing the barcode. FIG. 17B shows an example of a recording cassette designed to delete a PAM site while incorporating a 15N barcode. The deleted PAM is used to enrich for edited cells since mutated PAM cells escape cell death while cells containing a wild-type PAM sequence are killed. FIG. 17C depicts how sequencing the barcode region can be used to identify which edit is comprised within each cell.
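Identifying an edit from its barcode amounts to extracting the 15-nucleotide barcode from a sequencing read and looking it up in the predetermined barcode-to-edit table. The Python sketch below illustrates this under stated assumptions: the flanking sequences, the barcodes, and the barcode-to-edit assignments are hypothetical, and exact flank matching is a simplification that ignores sequencing errors.

```python
def identify_edit(read, left_flank, right_flank, barcode_to_edit, barcode_len=15):
    """Return the edit linked to the barcode found in a read, or None.

    The barcode is taken as the barcode_len bases between left_flank and
    right_flank. The flank sequences are assumed constant sequences
    surrounding the barcode insertion site.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = start + barcode_len
    barcode = read[start:end]
    if len(barcode) != barcode_len or not read.startswith(right_flank, end):
        return None
    return barcode_to_edit.get(barcode)


# Hypothetical lookup table, determined before the experiment.
BARCODE_TO_EDIT = {
    "AACCGGTTAACCGGT": "L80**",
    "TTGGCCAATTGGCCA": "Y145**",
}

read = "CC" + "GATTACA" + "AACCGGTTAACCGGT" + "CCTAGGT" + "TT"
print(identify_edit(read, "GATTACA", "CCTAGGT", BARCODE_TO_EDIT))  # L80**
```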
A similar approach is depicted in FIG. 18. In this case, the recorder cassette from each round is designed to target a sequence adjacent to the previous round, and each time, a new PAM site is deleted by the recorder cassette. The result is a barcode array with the barcodes from each round that can be sequenced to confirm each round of engineering took place and to determine which combination of mutations are contained in the cell, and in which order the mutations were made. Each successive recorder cassette can be designed to be homologous on one end to the region comprising the mutated PAM from the previous round, which could increase the efficiency of getting fully edited cells at the end of the experiment. In other examples, the recorder cassette is designed to target a unique landing site that was incorporated by the previous recorder cassette. This increases the efficiency of recovering cells containing all of the desired mutations since the subsequent recorder cassette and barcode can only target a cell that has successfully completed the previous round of engineering.
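Reading out a barcode array can be sketched as decoding one barcode per round, in order, against that round's lookup table. In the Python sketch below, the function name, the placeholder barcodes, and the per-round tables are hypothetical. Stopping at the first round that fails to decode is an assumption that mirrors the landing-site design, in which a later round can only occur in a cell that completed the previous round.

```python
def decode_barcode_array(barcodes, round_tables):
    """Return the edits, in order, for consecutive rounds that decode.

    barcodes: barcodes read from the array, ordered by round.
    round_tables: one barcode-to-edit dict per round of engineering.
    Decoding stops at the first missing or unknown barcode.
    """
    edits = []
    for barcode, table in zip(barcodes, round_tables):
        edit = table.get(barcode)
        if edit is None:
            break
        edits.append(edit)
    return edits


# Hypothetical two-round experiment with placeholder 15-nt barcodes.
ROUND_TABLES = [
    {"A" * 15: "L80**"},
    {"C" * 15: "Y145**"},
]

print(decode_barcode_array(["A" * 15, "C" * 15], ROUND_TABLES))  # ['L80**', 'Y145**']
print(decode_barcode_array(["A" * 15, "G" * 15], ROUND_TABLES))  # ['L80**']
```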
FIG. 19 depicts another approach that allows the recycling of selectable markers or otherwise cures the cell of the plasmid from the previous round of engineering. In this case, the transformed plasmid contains a guide nucleic acid designed to target a selectable marker or other unique sequence in the plasmid from the previous round of engineering.
TABLE 6
SEQUENCE LISTING
SEQ ID NO: | Sequence
SEQ MGKMYYLGLDIGTNSVGYAVTDPSYHLLKFKGEPMWGAHVFAAGNQSAERRSFRTSRRRLDRRQQRVKLV
ID QEIFAPVISPIDPRFFIRLHESALWRDDVAETDKHIFFNDPTYTDKEYYSDYPTIHHLIVDLMESSEKHDPRLVY
NO: LAVAWLVAHRGHFLNEVDKDNIGDVLSFDAFYPEFLAFLSDNGVSPWVCESKALQATLLSRNSVNDKYKAL
1 KSLIFGSQKPEDNFDANISEDGLIQLLAGKKVKVNKLFPQESNDASFTLNDKEDAIEEILGTLTPDECEWIAHIR
RLFDWAIMKHALKDGRTISESKVKLYEQHHHDLTQLKYFVKTYLAKEYDDIFRNVDSETTKNYVAYSYHVK
EVKGTLPKNKATQEEFCKYVLGKVKNIECSEADKVDFDEMIQRLTDNSFMPKQVSGENRVIPYQLYYYELKT
ILNKAASYLPFLTQCGKDAISNQDKLLSIMTFRIPYFVGPLRKDNSEHAWLERKAGKIYPWNFNDKVDLDKSE
EAFIRRMTNTCTYYPGEDVLPLDSLIYEKFMILNEINNIRIDGYPISVDVKQQVFGLFEKKRRVTVKDIQNLLLS
LGALDKHGKLTGIDTTIHSNYNTYHHFKSLMERGVLTRDDVERIVERMTYSDDTKRVRLWLNNNYGTLTAD
DVKHISRLRKHDFGRLSKMFLTGLKGVHKETGERASILDFMWNTNDNLMQLLSECYTFSDEITKLQEAYYA
KAQLSLNDFLDSMYISNAVKRPIYRTLAVVNDIRKACGTAPKRIFIEMARDGESKKKRSVTRREQIKNLYRSIR
KDFQQEVDFLEKILENKSDGQLQSDALYLYFAQLGRDMYTGDPIKLEHIKDQSFYNIDHIYPQSMVKDDSLD
NKVLVQSEINGEKSSRYPLDAAIRNKMKPLWDAYYNHGLISLKKYQRLTRSTPFTDDEKWDFINRQLVETRQ
STKALAILLKRKFPDTEIVYSKAGLSSDFRHEFGLVKSRNINDLHHAKDAFLAIVTGNVYHERFNRRWFMVN
QPYSVKTKTLFTHSIKNGNFVAWNGEEDLGRIVKMLKQNKNTIHFTRFSFDRKEGLFDIQPLKASTGLVPRKA
GLDVVKYGGYDKSTAAYYLLVRFTLEDKKTQHKLMMIPVEGLYKARIDHDKEFLTDYAQTTISEILQKDKQ
KVINIMFPMGTRHIKLNSMISIDGFYLSIGGKSSKGKSVLCHAMVPLIVPHKIECYIKAMESFARKFKENNKLRI
VEKFDKITVEDNLNLYELFLQKLQHNPYNKFFSTQFDVLTNGRSTFTKLSPEEQVQTLLNILSIFKTCRSSGCD
LKSINGSAQAARIMISADLTGLSKKYSDIRLVEQSASGLFVSKSQNLLEYL*
SEQ MSSLTKFTNKYSKQLTIKNELIPVGKTLENIKENGLIDGDEQLNENYQKAKIIVDDFLRDFINKALNNTQIGNW
ID RELADALNKEDEDNIEKLQDKIRGIIVSKFETFDLFSSYSIKKDEKIIDDDNDVEEEELDLGKKTSSFKYIFKKN
NO: LFKLVLPSYLKTTNQDKLKIISSFDNFSTYFRGFFENRKNIFTKKPISTSIAYRIVHDNFPKFLDNIRCFNVWQTE
2 CPQLIVKADNYLKSKNVIAKDKSLANYFTVGAYDYFLSQNGIDFYNNIIGGLPAFAGHEKIQGLNEFINQECQ
KDSELKSKLKNRHAFKMAVLFKQILSDREKSFVIDEFESDAQVIDAVKNFYAEQCKDNNVIFNLLNLIKNIAF
LSDDELDGIFIEGKYLSSVSQKLYSDWSKLRNDIEDSANSKQGNKELAKKIKTNKGDVEKAISKYEFSLSELNS
IVHDNTKFSDLLSCTLHKVASEKLVKVNEGDWPKHLKNNEEKQKIKEPLDALLEIYNTLLIFNCKSFNKNGNF
YVDYDRCINELSSVVYLYNKTRNYCTKKPYNTDKFKLNFNSPQLGEGFSKSKENDCLTLLFKKDDNYYVGII
RKGAKINFDDTQAIADNTDNCIFKMNYFLLKDAKKFIPKCSIQLKEVKAHFKKSEDDYILSDKEKFASPLVIKK
STFLLATAHVKGKKGNIKKFQKEYSKENPTEYRNSLNEWIAFCKEFLKTYKAATIFDITTLKKAEEYADIVEF
YKDVDNLCYKLEFCPIKTSFIENLIDNGDLYLFRINNKDFSSKSTGTKNLHTLYLQAIFDERNLNNPTIMLNGG
AELFYRKESIEQKNRITHKAGSILVNKVCKDGTSLDDKIRNEIYQYENKFIDTLSDEAKKVLPNVIKKEATHDI
TKDKRFTSDKFFFHCPLTINYKEGDTKQFNNEVLSFLRGNPDINIIGIDRGERNLIYVTVINQKGEILDSVSFNT
VTNKSSKIEQTVDYEEKLAVREKERIEAKRSWDSISKIATLKEGYLSAIVHEICLLMIKHNAIVVLENLNAGFK
RIRGGLSEKSVYQKFEKMLINKLNYFVSKKESDWNKPSGLLNGLQLSDQFESFEKLGIQSGFIFYVPAAYTSKI
DPTTGFANVLNLSKVRNVDAIKSFFSNFNEISYSKKEALFKFSFDLDSLSKKGFSSFVKFSKSKWNVYTFGERII
KPKNKQGYREDKRINLTFEMKKLLNEYKVSFDLENNLIPNLTSANLKDTFWKELFFIFKTTLQLRNSVTNGKE
DVLISPVKNAKGEFFVSGTHNKTLPQDCDANGAYHIALKGLMILERNNLVREEKDTKKIMAISNVDWFEYVQ
KRRGVL*
SEQ MNNYDEFTKLYPIQKTIRFELKPQGRTMEHLETFNFFEEDRDRAEKYKILKEAIDEYHKKFIDEHLTNMSLDW
ID NSLKQISEKYYKSREEKDKKVFLSEQKRMRQEIVSEFKKDDRFKDLFSKKLFSELLKEEIYKKGNHQEIDALK
NO: SFDKFSGYFIGLHENRKNMYSDGDEITAISNRIVNENFPKFLDNLQKYQEARKKYPEWIIKAESALVAHNIKM
3 DEVFSLEYFNKVLNQEGIQRYNLALGGYVTKSGEKMMGLNDALNLAHQSEKSSKGRIHMTPLFKQILSEKES
FSYIPDVFTEDSQLLPSIGGFFAQIENDKDGNIFDRALELISSYAEYDTERIYIRQADINRVSNVIFGEWGTLGGL
MREYKADSINDINLERTCKKVDKWLDSKEFALSDVLEAIKRTGNNDAFNEYISKMRTAREKIDAARKEMKFI
SEKISGDEESIHIIKTLLDSVQQFLHFFNLFKARQDIPLDGAFYAEFDEVHSKLFAIVPLYNKVRNYLTKNNLNT
KKIKLNFKNPTLANGWDQNKVYDYASLIFLRDGNYYLGIINPKRKKNIKFEQGSGNGPFYRKMVYKQIPGPN
KNLPRVFLTSTKGKKEYKPSKEIIEGYEADKHIRGDKFDLDFCHKLIDFFKESIEKHKDWSKFNFYFSPTESYG
DISEFYLDVEKQGYRMHFENISAETIDEYVEKGDLFLFQIYNKDFVKAATGKKDMHTIYWNAAFSPENLQDV
VVKLNGEAELFYRDKSDIKEIVHREGEILVNRTYNGRTPVPDKIHKKLTDYHNGRTKDLGEAKEYLDKVRYF
KAHYDITKDRRYLNDKIYFHVPLTLNFKANGKKNLNKMVIEKFLSDEKAHIIGIDRGERNLLYYSIIDRSGKII
DQQSLNVIDGFDYREKLNQREIEMKDARQSWNAIGKIKDLKEGYLSKAVHEITKMAIQYNAIVVMEELNYGF
KRGRFKVEKQIYQKFENMLIDKMNYLVFKDAPDESPGGVLNAYQLTNPLESFAKLGKQTGILFYVPAAYTSK
IDPTTGFVNLFNTSSKTNAQERKEFLQKFESISYSAKDGGIFAFAFDYRKFGTSKTDHKNVWTAYTNGERMR
YIKEKKRNELFDPSKEIKEALTSSGIKYDGGQNILPDILRSNNNGLIYTMYSSFIAAIQMRVYDGKEDYIISPIKN
SKGEFFRTDPKRRELPIDADANGAYNIALRGELTMRAIAEKFDPDSEKMAKLELKHKDWFEFMQTRGD*
SEQ MTKTFDSEFFNLYSLQKTVRFELKPVGETASFVEDFKNEGLKRVVSEDERRAVDYQKVKEIIDDYHRDFIEES
ID LNYFPEQVSKDALEQAFHLYQKLKAAKVEEREKALKEWEALQKKLREKVVKCFSDSNKARFSRIDKKELIK
NO: EDLINWLVAQNREDDIPTVETFNNFTTYFTGFHENRKNIYSKDDHATAISFRLIHENLPKFFDNVISFNKLKEG
4 FPELKFDKVKEDLEVDYDLKHAFEIEYFVNFVTQAGIDQYNYLLGGKTLEDGTKKQGMNEQINLFKQQQTR
DKARQIPKLIPLFKQILSERTESQSFIPKQFESDQELFDSLQKLHNNCQDKFTVLQQAILGLAEADLKKVFIKTS
DLNALSNTIFGNYSVFSDALNLYKESLKTKKAQEAFEKLPAHSIHDLIQYLEQFNSSLDAEKQQSTDTVLNYFI
KTDELYSRFIKSTSEAFTQVQPLFELEALSSKRRPPESEDEGAKGQEGFEQIKRIKAYLDTLMEAVHFAKPLYL
VKGRKMIEGLDKDQSFYEAFEMAYQELESLIIPIYNKARSYLSRKPFKADKFKINFDNNTLLSGWDANKETAN
ASILFKKDGLYYLGIMPKGKTFLFDYFVSSEDSEKLKQRRQKTAEEALAQDGESYFEKIRYKLLPGASKMLPK
VFFSNKNIGFYNPSDDILRIRNTASHTKNGTPQKGHSKVEFNLNDCHKMIDFFKSSIQKHPEWGSFGFTFSDTS
DFEDMSAFYREVENQGYVISFDKIKETYIQSQVEQGNLYLFQIYNKDFSPYSKGKPNLHTLYWKALFEEANL
NNVVAKLNGEAEIFFRRHSIKASDKVVHPANQAIDNKNPHTEKTQSTFEYDLVKDKRYTQDKFFFHVPISLNF
KAQGVSKFNDKVNGFLKGNPDVNIIGIDRGERHLLYFTVVNQKGEILVQESLNTLMSDKGHVNDYQQKLDK
KEQERDAARKSWTTVENIKELKEGYLSHVVHKLAHLIIKYNAIVCLEDLNFGFKRGRFKVEKQVYQKFEKAL
IDKLNYLVFKEKELGEVGHYLTAYQLTAPFESFKKLGKQSGILFYVPADYTSKIDPTTGFVNFLDLRYQSVEK
AKQLLSDFNAIRFNSVQNYFEFEIDYKKLTPKRKVGTQSKWVICTYGDVRYQNRRNQKGHWETEEVNVTEK
LKALFASDSKTTTVIDYANDDNLIDVILEQDKASFFKELLWLLKLTMTLRHSKIKSEDDFILSPVKNEQGEFYD
SRKAGEVWPKDADANGAYHIALKGLWNLQQINQWEKGKTLNLAIKNQDWFSFIQEKPYQE*
SEQ MHTGGLLSMDAKEFTGQYPLSKTLRFELRPIGRTWDNLEASGYLAEDRHRAECYPRAKELLDDNHRAFLNR
ID VLPQIDMDWHPIAEAFCKVHKNPGNKELAQDYNLQLSKRRKEISAYLQDADGYKGLFAKPALDEAMKIAKE
NO: NGNESDIEVLEAFNGFSVYFTGYHESRENIYSDEDMVSVAYRITEDNFPRFVSNALIFDKLNESHPDIISEVSGN
5 LGVDDIGKYFDVSNYNNFLSQAGIDDYNHIIGGHTTEDGLIQAFNVVLNLRHQKDPGFEKIQFKQLYKQILSV
RTSKSYIPKQFDNSKEMVDCICDYVSKIEKSETVERALKLVRNISSFDLRGIFVNKKNLRILSNKLIGDWDAIET
ALMHSSSSENDKKSVYDSAEAFTLDDIFSSVKKFSDASAEDIGNRAEDICRVISETAPFINDLRAVDLDSLNDD
GYEAAVSKIRESLEPYMDLFHELEIFSVGDEFPKCAAFYSELEEVSEQLIEIIPLFNKARSFCTRKRYSTDKIKVN
LKFPTLADGWDLNKERDNKAAILRKDGKYYLAILDMKKDLSSIRTSDEDESSFEKMEYKLLPSPVKMLPKIF
VKSKAAKEKYGLTDRMLECYDKGMHKSGSAFDLGFCHELIDYYKRCIAEYPGWDVFDFKFRETSDYGSMK
EFNEDVAGAGYYMSLRKIPCSEVYRLLDEKSIYLFQIYNKDYSENAHGNKNMHTMYWEGLFSPQNLESPVF
KLSGGAELFFRKSSIPNDAKTVHPKGSVLVPRNDVNGRRIPDSIYRELTRYFNRGDCRISDEAKSYLDKVKTK
KADHDIVKDRRFTVDKMMFHVPIAMNFKAISKPNLNKKVIDGIIDDQDLKIIGIDRGERNLIYVTMVDRKGNI
LYQDSLNILNGYDYRKALDVREYDNKEARRNWTKVEGIRKMKEGYLSLAVSKLADMIIENNAIIVMEDLNH
GFKAGRSKIEKQVYQKFESMLINKLGYMVLKDKSIDQSGGALHGYQLANHVTTLASVGKQCGVIFYIPAAFT
SKIDPTTGFADLFALSNVKNVASMREFFSKMKSVIYDKAEGKFAFTFDYLDYNVKSECGRTLWTVYTVGERF
TYSRVNREYVRKVPTDIIYDALQKAGISVEGDLRDRIAESDGDTLKSIFYAFKYALDMRVENREEDYIQSPVK
NASGEFFCSKNAGKSLPQDSDANGAYNIALKGILQLRMLSEQYDPNAESIRLPLITNKAWLTFMQSGMKTWK
N*
SEQ MDSLKDFTNLYPVSKTLRFELKPVGKTLENIEKAGILKEDEHRAESYRRVKKIIDTYHKVFIDSSLENMAKMG
ID IENEIKAMLQSFCELYKKDHRTEGEDKALDKIRAVLRGLIVGAFTGVCGRRENTVQNEKYESLFKEKLIKEILP
NO: DFVLSTEAESLPFSVEEATRSLKEFDSFTSYFAGFYENRKNIYSTKPQSTAIAYRLIHENLPKFIDNILVFQKIKE
6 PIAKELEHIRADFSAGGYIKKDERLEDIFSLNYYIHVLSQAGIEKYNALIGKIVTEGDGEMKGLNEHINLYNQQ
RGREDRLPLFRPLYKQILSDREQLSYLPESFEKDEELLRALKEFYDHIAEDILGRTQQLMTSISEYDLSRIYVRN
DSQLTDISKKMLGDWNAIYMARERAYDHEQAPKRITAKYERDRIKALKGEESISLANLNSCIAFLDNVRDCR
VDTYLSTLGQKEGPHGLSNLVENVFASYHEAEQLLSFPYPEENNLIQDKDNVVLIKNLLDNISDLQRFLKPLW
GMGDEPDKDERFYGEYNYIRGALDQVIPLYNKVRNYLTRKPYSTRKVKLNFGNSQLLSGWDRNKEKDNSC
VILRKGQNFYLAIMNNRHKRSFENKVLPEYKEGEPYFEKMDYKFLPDPNKMLPKVFLSKKGIEIYKPSPKLLE
QYGHGTHKKGDTFSMDDLHELIDFFKHSIEAHEDWKQFGFKFSDTATYENVSSFYREVEDQGYKLSFRKVSE
SYVYSLIDQGKLYLFQIYNKDFSPCSKGTPNLHTLYWRMLFDERNLADVIYKLDGKAEIFFREKSLKNDHPTH
PAGKPIKKKSRQKKGEESLFEYDLVKDRHYTMDKFQFHVPITMNFKCSAGSKVNDMVNAHIREAKDMHVIG
IDRGERNLLYICVIDSRGTILDQISLNTINDIDYHDLLESRDKDRQQERRNWQTIEGIKELKQGYLSQAVHRIAE
LMVAYKAVVALEDLNMGFKRGRQKVESSVYQQFEKQLIDKLNYLVDKKKRPEDIGGLLRAYQFTAPFKSFK
EMGKQNGFLFYIPAWNTSNIDPTTGFVNLFHAQYENVDKAKSFFQKFDSISYNPKKDWFEFAFDYKNFTKKA
EGSRSMWILCTHGSRIKNFRNSQKNGQWDSEEFALTEAFKSLFVRYEIDYTADLKTAIVDEKQKDFFVDLLKL
FKLTVQMRNSWKEKDLDYLISPVAGADGRFFDTREGNKSLPKDADANGAYNIALKGLWALRQIRQTSEGGK
LKLAISNKEWLQFVQERSYEKD*
SEQ MNNGTNNFQNFIGISSLQKTLRNALIPTETTQQFIVKNGIIKEDELRGENRQILKDIMDDYYRGFISETLSSIDDI
ID DWTSLFEKMEIQLKNGDNKDTLIKEQTEYRKAIHKKFANDDRFKNMFSAKLISDILPEFVIHNNNYSASEKEE
NO: KTQVIKLFSRFATSFKDYFKNRANCFSADDISSSSCHRIVNDNAEIFFSNALVYRRIVKSLSNDDINKISGDMKD
7 SLKEMSLEEIYSYEKYGEFITQEGISFYNDICGKVNSFMNLYCQKNKENKNLYKLQKLHKQILCIADTSYEVP
YKFESDEEVYQSVNGFLDNISSKHIVERLRKIGDNYNGYNLDKIYIVSKFYESVSQKTYRDWETINTALEIHYN
NILPGNGKSKADKVKKAVKNDLQKSITEINELVSNYKLCSDDNIKAETYIHEISHILNNFEAQELKYNPEIHLV
ESELKASELKNVLDVIMNAFHWCSVFMTEELVDKDNNFYAELEEIYDEIYPVISLYNLVRNYVTQKPYSTKKI
KLNFGIPTLADGWSKSKEYSNNAIILMRDNLYYLGIFNAKNKPDKKIIEGNTSENKGDYKKMIYNLLPGPNK
MIPKVFLSSKTGVETYKPSAYILEGYKQNKHIKSSKDFDITFCHDLIDYFKNCIAIHPEWKNFGFDFSDTSTYED
ISGFYREVELQGYKIDWTYISEKDIDLLQEKGQLYLFQIYNKDFSKKSTGNDNLHTMYLKNLFSEENLKDIVL
KLNGEAEIFFRKSSIKNPIIHKKGSILVNRTYEAEEKDQFGNIQIVRKNIPENIYQELYKYFNDKSDKELSDEAA
KLKNVVGHHEAATNIVKDYRYTYDKYFLHMPITINFKANKTGFINDRILQYIAKEKDLHVIGIDRGERNLIYV
SVIDTCGNIVEQKSFNIVNGYDYQIKLKQQEGARQIARKEWKEIGKIKEIKEGYLSLVIHEISKMVIKYNAIIAM
EDLSYGFKKGRFKVERQVYQKFETMLINKLNYLVFKDISITENGGLLKGYQLTYIPDKLKNVGHQCGCIFYVP
AAYTSKIDPTTGFVNIFKFKDLTVDAKREFIKKFDSIRYDSEKNLFCFTFDYNNFITQNTVMSKSSWSVYTYGV
RIKRRFVNGRFSNESDTIDITKDMEKTLEMTDINWRDGHDLRQDIIDYEIVQHIFEIFRLTVQMRNSLSELEDR
DYDRLISPVLNENNIFYDSAKAGDALPKDADANGAYCIALKGLYEIKQITENWKEDGKFSRDKLKISNKDWF
DFIQNKRYL*
SEQ MTNKFTNQYSLSKTLRFELIPQGKTLEFIQEKGLLSQDKQRAESYQEMKKTIDKFHKYFIDLALSNAKLTHLE
ID TYLELYNKSAETKKEQKFKDDLKKVQDNLRKEIVKSFSDGDAKSIFAILDKKELITVELEKWFENNEQKDIYF
NO: DEKFKTFTTYFTGFHQNRKNMYSVEPNSTAIAYRLIHENLPKFLENAKAFEKIKQVESLQVNFRELMGEFGDE
8 GLIFVNELEEMFQINYYNDVLSQNGITIYNSIISGFTKNDIKYKGLNEYINNYNQTKDKKDRLPKLKQLYKQIL
SDRISLSFLPDAFTDGKQVLKAIFDFYKINLLSYTIEGQEESQNLLLLIRQTIENLSSFDTQKIYLKNDTHLTTISQ
QVFGDFSVFSTALNYWYETKVNPKFETEYSKANEKKREILDKAKAVFTKQDYFSIAFLQEVLSEYILTLDHTS
DIVKKHSSNCIADYFKNHFVAKKENETDKTFDFIANITAKYQCIQGILENADQYEDELKQDQKLIDNLKFFLD
AILELLHFIKPLHLKSESITEKDTAFYDVFENYYEALSLLTPLYNMVRNYVTQKPYSTEKIKLNFENAQLLNG
WDANKEGDYLTTILKKDGNYFLAINIDKKHNKAFQKFPEGKENYEKMVYKLLPGVNKMLPKVFFSNKNIAY
FNPSKELLENYKKETHKKGDTFNLEHCHTLIDFFKDSLNKHEDWKYFDFQFSETKSYQDLSGFYREVEHQGY
KINFKNIDSEYIDGLVNEGKLFLFQIYSKDFSPFSKGKPNMHTLYWKALFEEQNLQNVIYKLNGQAEIFFRKAS
IKPKNIILHKKKIKIAKKHFIDKKTKTSEIVPVQTIKNLNMYYQGKISEKELTQDDLRYIDNFSIFNEKNKTIDIIK
DKRFTVDKFQFHVPITMNFKATGGSYINQTVLEYLQNNPEVKIIGLDRGERHLVYLTLIDQQGNILKQESLNTI
TDSKISTPYHKLLDNKENERDLARKNWGTVENIKELKEGYISQVVHKIATLMLEENAIVVMEDLNFGFKRGR
FKVEKQIYQKLEKMLIDKLNYLVLKDKQPQELGGLYNALQLTNKFESFQKMGKQSGFLFYVPAWNTSKIDP
TTGFVNYFYTKYENVDKAKAFFEKFEAIRFNAEKKYFEFEVKKYSDFNPKAEGTQQAWTICTYGERIETKRQ
KDQNNKFVSTPINLTEKIEDFLGKNQIVYGDGNCIKSQIASKDDKAFFETLLYWFKMTLQNIRNSETRTDIDYL
ISPVMNDNGTFYNSRDYEKLENPTLPKDADANGAYHIAKKGLMLLNKIDQADLTKKVDLSISNRDWLQFVQ
KNK*
SEQ MEQEYYLGLDMGTGSVGWAVTDSEYHVLRKHGKALWGVRLFESASTAEERRMFRTSRRRLDRRNWRIEIL
ID QEIFAEEISKKDPGFFLRMKESKYYPEDKRDINGNCPELPYALFVDDDFTDKDYHKKFPTIYHLRKMLMNTEE
NO: TPDIRLVYLAIHHMNIKHRGHFLLSGDINEIKEFGTTFSKLLENIKNEELDWNLELGKEEYAVVESILKDNMLN
9 RSTKKTRLIKALKAKSICEKAVLNLLAGGTVKLSDIFGLEELNETERPKISFADNGYDDYIGEVENELGEQFYII
ETAKAVYDWAVLVEILGKYTSISEAKVATYEKHKSDLQFLKKIVRKYLTKEEYKDIFVSTSDKLKNYSAYIG
MTKINGKKVDLQSKRCSKEEFYDFIKKNVLKKLEGQPEYEYLKEELERETFLPKQVNRDNGVIPYQIHLYELK
KILGNLRDKIDLIKENEDKLVQLFEFRIPYYVGPLNKIDDGKEGKFTWAVRKSNEKIYPWNFENVVDIEASAE
KFIRRMTNKCTYLMGEDVLPKDSLLYSKYMVLNELNNVKLDGEKLSVELKQRLYTDVFCKYRKVTVKKIK
NYLKCEGHSGNVEITGIDGDFKASLTAYHDFKEILTGTELAKKDKENHTNIVLFGDDKKLLKKRLNRLYPQIT
PNQLKKICALSYTGWGRFSKKFLEEITAPDPETGEVWNIITALWESNNNLMQLLSNEYRFNIEEVETYNMGKQ
TKTLSYETVENMYVSPSVKRQIWQTLKIVKELEKVMKESPKRVFIEMAREKQESKRTESRKKQLIDLYKACK
NEEKDWVKELGDQEEQKLRSDKLYLYYTQKGRCMYSGEVIELKDLWDNTKYDIDHIYPQSKTNIDDSLNNR
VLVKKKYNATKSDKYPLNENIRHERKGFWKSLLDGGFISKEKYERLIRNTELSPEELAGFIERQIVETRQSTKA
VAEILKQVFPESEIVYVKAGTVSRFRKDFELLKVREVNDLHHAKDAYLNIVVGNSYYVKFTKNASWFIKENP
GRTYNLKKMFTSGWNIERNGEVAWEVGKKGTIVTVKQIMNKNNILVTRQVHEAKGGLFDQQIMKKGKGQI
AIKETDERLASIEKYGGYNKAAGAYFMLVESKDKKGKTIRTIEFIPLYLKNKIESDESIALNFLEKGRGLKEPKI
LLKKIKIDTLFDVDGFKMWLSGRTGDRLLFKCANQLILDEKIIVTMKKIVKFIQRRQENRELKLSDKDGIDNE
VLMEIYNTFVDKLENTVYRIRLSEQAKTLIDKQKEFERLSLEDKSSTLFEILHIFQCQSSAANLKMIGGPGKAGI
LVMNNNISKCNKISIINQSPTGIFENEIDLLK
SEQ MNKFENFTGLYPISKTLRFELIPQGKTLEYIEKSEILENDNYRAEKYEEVKDIIDGYHKWFINETLHDLHINWSE
ID LKVALENNRIEKSDASKKELQRVQKIKREEIYNAFIEHEAFQYLFKENLLSDLLPIQIEQSEDLDAEKKKQAVE
NO: TFNRFSTYFTGFHENRKNIYSKEGISTSVTYRIVHDNFPKFLENMKVFEILRNECPEVISDTANELAPFIDGVRIE
10 DIFLIDFFNSTFSQNGIDYYNRILGGVTTETGEKYRGINEFTNLYRQQHPEFGKSKKATKMVVLFKQILSDRDT
LSFIPEMFGNDKQVQNSIQLFYNREISQFENEGVKTDVCTALATLTSKIAEFDTEKIYIQQPELPNVSQRLFGSW
NELNACLFKYAELKFGTAEKVANRKKIDKWLKSDLFSFTELNKALEFSGKDERIENYFSETGIFAQLVKTGFD
EAQSILETEYTSEVHLKDQQTDIEKIKTFLDALQNLMHLLKSLCVSEEADRDAAFYNEFDMLYNQLKLVVPL
YNKVRNYITQKLFRSDKIKIYFENKGQFLGGWVDSQTENSDNGTQAGGYIFRKENVINEYDYYLGICSDPKLF
RRTTIVSENDRSSFERLDYYQLKTASVYGNSYCGKHPYTEDKNELVNSIDRFVHLSGNNILIEKIAKDKVKSNP
TTNTPSGYLNFIHREAPNTYECLLQDENFVSLNQRVVSALKATLATLVRVPKALVYAKKDYHLFSEIINDIDE
LSYEKAFSYFPVSQTEFENSSNRTIKPLLLFKISNKDLSFAENFEKGNRQKIGKKNLHTLYFEALMKGNQDTIDI
GTGMVFHRVKSLNYNEKTLKYGHHSTQLNEKFSYPIIKDKRFASDKFLFHLSTEINYKEKRKPLNNSHEFLTN
NPDINIIGLDRGERHLIYLTLINQKGEILRQKTFNIVGNTNYHEKLNQREKERDNARKSWATIGKIKELKEGFLS
LVIHEIAKIMVENNAIVVLEDLNFGFKRGRFKVEKQIYQKFEKMLIDKLNYLVFKDKKANEAGGVLKGYQLA
EKFESFQKMGKQSGFLFYVPAAYTSKIDPTTGFVNMLNLNYTNMKDAQTLLSGNIDKISFNADANYFEFELD
YEKFKTNQTDHTNKWTICTVGEKRFTYNSATKETTTVNVTEDLKKLLDKFEVKYSNGDNIKDEICRQTDAKF
FEIILWLLKLTMQNIRNSNTKTEEDFILSPVKNSNGEFFRSNDDANGIWPADADANGAYHIALKGLYLVKECF
NKNEKSLKIEHKNWFKFAQTRFNGSLTKNG*
SEQ MENFKNLYPINKTLRFELRPYGKTLENFKKSGLLEKDAFKANSRRSMQAIIDEKFKETIEERLKYTEFSECDLG
ID NMTSKDKKITDKAATNLKKQVILSFDDEIFNNYLKPDKNIDALFKNDPSNPVISTFKGFTTYFVNFFEIRKHIFK
NO: GESSGSMAYRIIDENLTTYLNNIEKIKKLPEELKSQLEGIDQIDKLNNYNEFITQSGITHYNEIIGGISKSENVKIQ
11 GINEGINLYCQKNKVKLPRLTPLYKMILSDRVSNSFVLDTIENDTELIEMISDLINKTEISQDVIMSDIQNIFIKY
KQLGNLPGISYSSIVNAICSDYDNNFGDGKRKKSYENDRKKHLETNVYSINYISELLTDTDVSSNIKMRYKEL
EQNYQVCKENFNATNWMNIKNIKQSEKTNLIKDLLDILKSIQRFYDLFDIVDEDKNPSAEFYTWLSKNAEKLD
FEFNSVYNKSRNYLTRKQYSDKKIKLNFDSPTLAKGWDANKEIDNSTIIMRKFNNDRGDYDYFLGIWNKSTP
ANEKIIPLEDNGLFEKMQYKLYPDPSKMLPKQFLSKIWKAKHPTTPEFDKKYKEGRHKKGPDFEKEFLHELID
CFKHGLVNHDEKYQDVFGFNLRNTEDYNSYTEFLEDVERCNYNLSFNKIADTSNLINDGKLYVFQIWSKDFSI
DSKGTKNLNTIYFESLFSEENMIEKMFKLSGEAEIFYRPASLNYCEDIIKKGHHHAELKDKFDYPIIKDKRYSQ
DKFFFHVPMVINYKSEKLNSKSLNNRTNENLGQFTHIIGIDRGERHLIYLTVVDVSTGEIVEQKHLDEIINTDTK
GVEHKTHYLNKLEEKSKTRDNERKSWEAIETIKELKEGYISHVINEIQKLQEKYNALIVMENLNYGFKNSRIK
VEKQVYQKFETALIKKFNYIIDKKDPETYIHGYQLTNPITTLDKIGNQSGIVLYIPAWNTSKIDPVTGFVNLLYA
DDLKYKNQEQAKSFIQKIDNIYFENGEFKFDIDFSKWNNRYSISKTKWTLTSYGTRIQTFRNPQKNNKWDSAE
YDLTEEFKLILNIDGTLKSQDVETYKKFMSLFKLMLQLRNSVTGTDIDYMISPVTDKTGTHFDSRENIKNLPA
DADANGAYNIARKGIMAIENIMNGISDPLKISNEDYLKYIQNQQE
SEQ MTQFEGFTNLYQVSKTLRFELIPQGKTLKHIQEQGFIEEDKARNDHYKELKPIIDRIYKTYADQCLQLVQLDW
ID ENLSAAIDSYRKEKTEETRNALIEEQATYRNAIHDYFIGRTDNLTDAINKRHAEIYKGLFKAELFNGKVLKQL
NO: GTVTTTEHENALLRSFDKFTTYFSGFYENRKNVFSAEDISTAIPHRIVQDNFPKFKENCHIFTRLITAVPSLREH
12 FENVKKAIGIFVSTSIEEVFSFPFYNQLLTQTQIDLYNQLLGGISREAGTEKIKGLNEVLNLAIQKNDETAHIIAS
LPHRFIPLFKQILSDRNTLSFILEEFKSDEEVIQSFCKYKTLLRNENVLETAEALFNELNSIDLTHIFISHKKLETIS
SALCDHWDTLRNALYERRISELTGKITKSAKEKVQRSLKHEDINLQEIISAAGKELSEAFKQKTSEILSHAHAA
LDQPLPTTLKKQEEKEILKSQLDSLLGLYHLLDWFAVDESNEVDPEFSARLTGIKLEMEPSLSFYNKARNYAT
KKPYSVEKFKLNFQMPTLASGWDVNKEKNNGAILFVKNGLYYLGIMPKQKGRYKALSFEPTEKTSEGFDKM
YYDYFPDAAKMIPKCSTQLKAVTAHFQTHTTPILLSNNFIEPLEITKEIYDLNNPEKEPKKFQTAYAKKTGDQK
GYREALCKWIDFTRDFLSKYTKTTSIDLSSLRPSSQYKDLGEYYAELNPLLYHISFQRIAEKEIMDAVETGKLY
LFQIYNKDFAKGHHGKPNLHTLYWTGLFSPENLAKTSIKLNGQAELFYRPKSRMKRMAHRLGEKMLNKKLK
DQKTPIPDTLYQELYDYVNHRLSHDLSDEARALLPNVITKEVSHEIIKDRRFTSDKFFFHVPITLNYQAANSPS
KFNQRVNAYLKEHPETPIIGIDRGERNLIYITVIDSTGKILEQRSLNTIQQFDYQKKLDNREKERVAARQAWSV
VGTIKDLKQGYLSQVIHEIVDLMIHYQAVVVLENLNFGFKSKRTGIAEKAVYQQFEKMLIDKLNCLVLKDYP
AEKVGGVLNPYQLTDQFTSFAKMGTQSGFLFYVPAPYTSKIDPLTGFVDPFVWKTIKNHESRKHFLEGFDFL
HYDVKTGDFILHFKMNRNLSFQRGLPGFMPAWDIVFEKNETQFDAKGTPFIAGKRIVPVIENHRFTGRYRDLY
PANELIALLEEKGIVFRDGSNILPKLLENDDSHAIDTMVALIRSVLQMRNSNAATGEDYINSPVRDLNGVCFDS
RFQNPEWPMDADANGAYHIALKGQLLLNHLKESKDLKLQNGISNQDWLAYIQELRN*
SEQ MAVKSIKVKLRLDDMPEIRAGLWKLHKEVNAGVRYYTEWLSLLRQENLYRRSPNGDGEQECDKTAEECKA
ID ELLERLRARQVENGHRGPAGSDDELLQLARQLYELLVPQAIGAKGDAQQIARKFLSPLADKDAVGGLGIAKA
NO: GNKPRWVRMREAGEPGWEEEKEKAETRKSADRTADVLRALADFGLKPLMRVYTDSEMSSVEWKPLRKGQ
13 AVRTWDRDMFQQAIERMMSWESWNQRVGQEYAKLVEQKNRFEQKNFVGQEHLVHLVNQLQQDMKEASP
GLESKEQTAHYVTGRALRGSDKVFEKWGKLAPDAPFDLYDAEIKNVQRRNTRRFGSHDLFAKLAEPEYQAL
WREDASFLTRYAVYNSILRKLNHAKMFATFTLPDATAHPIWTRFDKLGGNLHQYTFLFNEFGERRHAIRFHK
LLKVENGVAREVDDVTVPISMSEQLDNLLPRDPNEPIALYFRDYGAEQHFTGEFGGAKIQCRRDQLAHMHRR
RGARDVYLNVSVRVQSQSEARGERRPPYAAVFRLVGDNHRAFVHFDKLSDYLAEHPDDGKLGSEGLLSGLR
VMSVDLGLRTSASISVFRVARKDELKPNSKGRVPFFFPIKGNDNLVAVHERSQLLKLPGETESKDLRAIREER
QRTLRQLRTQLAYLRLLVRCGSEDVGRRERSWAKLIEQPVDAANHMTPDWREAFENELQKLKSLHGICSDK
EWMDAVYESVRRVWRHMGKQVRDWRKDVRSGERPKIRGYAKDVVGGNSIEQIEYLERQYKFLKSWSFFG
KVSGQVIRAEKGSRFAITLREHIDHAKEDRLKKLADRIIMEALGYVYALDERGKGKWVAKYPPCQLILLEELS
EYQFNNDRPPSENNQLMQWSHRGVFQELINQAQVHDLLVGTMYAAFSSRFDARTGAPGIRCRRVPARCTQE
HNPEPFPWWLNKFVVEHTLDACPLRADDLIPTGEGEIFVSPFSAEEGDFHQIHADLNAAQNLQQRLWSDFDIS
QIRLRCDWGEVDGELVLIPRLTGKRTADSYSNKVFYTNTGVTYYERERGKKRRKVFAQEKLSEEEAELLVEA
DEAREKSVVLMRDPSGIINRGNWTRQKEFWSMVNQRIEGYLVKQIRSRVPLQDSACENTGDI*
SEQ ID NO: 14
MATRSFILKIEPNEEVKKGLWKTHEVLNHGIAYYMNILKLIRQEAIYEHHEQDPKNPKKVSKAEIQAELWDFV
LKMQKCNSFTHEVDKDVVFNILRELYEELVPSSVEKKGEANQLSNKFLYPLVDPNSQSGKGTASSGRKPRWY
NLKIAGDPSWEEEKKKWEEDKKKDPLAKILGKLAEYGLIPLFIPFTDSNEPIVKEIKWMEKSRNQSVRRLDKD
MFIQALERFLSWESWNLKVKEEYEKVEKEHKTLEERIKEDIQAFKSLEQYEKERQEQLLRDTLNTNEYRLSK
RGLRGWREIIQKWLKMDENEPSEKYLEVFKDYQRKHPREAGDYSVYEFLSKKENHFIWRNHPEYPYLYATF
CEIDKKKKDAKQQATFTLADPINHPLWVRFEERSGSNLNKYRILTEQLHTEKLKKKLTVQLDRLIYPTESGG
WEEKGKVDIVLLPSRQFYNQIFLDIEEKGKHAFTYKDESIKFPLKGTLGGARVQFDRDHLRRYPHKVESGNV
GRIYFNMTVNIEPTESPVSKSLKIHRDDFPKFVNFKPKELTEWIKDSKGKKLKSGIESLEIGLRVMSIDLGQRQA
AAASIFEVVDQKPDIEGKLFFPIKGTELYAVHRASFNIKLPGETLVKSREVLRKAREDNLKLMNQKLNFLRNV
LHFQQFEDITEREKRVTKWISRQENSDVPLVYQDELIQIRELMYKPYKDWVAFLKQLHKRLEVEIGKEVKHW
RKSLSDGRKGLYGISLKNIDEIDRTRKFLLRWSLRPTEPGEVRRLEPGQRFAIDQLNHLNALKEDRLKKMANT
IIMHALGYCYDVRKKKWQAKNPACQIILFEDLSNYNPYEERSRFENSKLMKWSRREIPRQVALQGEIYGLQV
GEVGAQFSSRFHAKTGSPGIRCSVVTKEKLQDNRFFKNLQREGRLTLDKIAVLKEGDLYPDKGGEKFISLSKD
RKLVTTHADINAAQNLQKRFWTRTHGFYKVYCKAYQVDGQTVYIPESKDQKQKIIEEFGEGYFILKDGVYE
WGNAGKLKIKKGSSKQSSSELVDSDILKDSFDLASELKGEKLMLYRDPSGNVFPSDKWMAAGVFFGKLERIL
ISKLTNQYSISTIEDDSSKQSM*
SEQ ID NO: 15
MPTRTINLKLVLGKNPENATLRRALFSTHRLVNQATKRIEEFLLLCRGEAYRTVDNEGKEAEIPRHAVQEEAL
AFAKAAQRHNGCISTYEDQEILDVLRQLYERLVPSVNENNEAGDAQAANAWVSPLMSAESEGGLSVYDKVL
DPPPVWMKLKEEKAPGWEAASQIWIQSDEGQSLLNKPGSPPRWIRKLRSGQPWQDDFVSDQKKKQDELTKG
NAPLIKQLKEMGLLPLVNPFFRHLLDPEGKGVSPWDRLAVRAAVAHFISWESWNHRTRAEYNSLKLRRDEFE
AASDEFKDDFTLLRQYEAKRHSTLKSIALADDSNPYRIGVRSLRAWNRVREEWIDKGATEEQRVTILSKLQT
QLRGKFGDPDLFNWLAQDRHVHLWSPRDSVTPLVRINAVDKVLRRRKPYALMTFAHPRFHPRWILYEAPGG
SNLRQYALDCTENALHITLPLLVDDAHGTWIEKKIRVPLAPSGQIQDLTLEKLEKKKNRLYYRSGFQQFAGLA
GGAEVLFHRPYNIEHDERSEESLLERPGAVWFKLTLDVATQAPPNWLDGKGRVRTPPEVHHFKTALSNKSKH
TRTLQPGLRVLSVDLGMRTFASCSVFELIEGKPETGRAFPVADERSMDSPNKLWAKHERSFKLTLPGETPSRK
EEEERSIARAEIYALKRDIQRLKSLLRLGEEDNDNRRDALLEQFFKGWGEEDVVPGQAFPRSLFQGLGAAPFR
STPELWRQHCQTYYDKAEACLAKHISDWRKRTRPRPTSREMWYKTRSYHGGKSIWMLEYLDAVRKLLLSW
SLRGRTYGAINRQDTARFGSLASRLLHHINSLKEDRIKTGADSIVQAARGYIPLPHGKGWEQRYEPCQLILFED
LARYRFRVDRPRRENSQLMQWNHRAIVAETTMQAELYGQIVENTAAGFSSRFHAATGAPGVRCRFLLERDF
DNDLPKPYLLRELSWMLGNTKVESEEEKLRLLSEKIRPGSLVPWDGGEQFATLHPKRQTLCVIHADMNAAQ
NLQRRFFGRCGEAFRLVCQPHGDDVLRLASTPGARLLGALQQLENGQGAFELVRDMGSTSQMNRFVMKSL
GKKKIKPLQDNNGDDELEDVLSVLPEEDDTGRITVFRDSSGIFFPCNVWIPAKQFWPAVRAMIWKVMASHSL
G*
SEQ ID NO: 16
MTKLRHRQKKLTHDWAGSKKREVLGSNGKLQNPLLMPVKKGQVTEFRKAFSAYARATKGEMTDGRKNMF
THSFEPFKTKPSLHQCELADKAYQSLHSYLPGSLAHFLLSAHALGFRIFSKSGEATAFQASSKIEAYESKLASE
LACVDLSIQNLTISTLFNALTTSVRGKGEETSADPLIARFYTLLTGKPLSRDTQGPERDLAEVISRKIASSFGTW
KEMTANPLQSLQFFEEELHALDANVSLSPAFDVLIKMNDLQGDLKNRTIVFDPDAPVFEYNAEDPADIIIKLTA
RYAKEAVIKNQNVGNYVKNAITTTNANGLGWLLNKGLSLLPVSTDDELLEFIGVERSHPSCHALIELIAQLEA
PELFEKNVFSDTRSEVQGMIDSAVSNHIARLSSSRNSLSMDSEELERLIKSFQIHTPHCSLFIGAQSLSQQLESLP
EALQSGVNSADILLGSTQYMLTNSLVEESIATYQRTLNRINYLSGVAGQINGAIKRKAIDGEKIHLPAAWSELI
SLPFIGQPVIDVESDLAHLKNQYQTLSNEFDTLISALQKNFDLNFNKALLNRTQHFEAMCRSTKKNALSKPEIV
SYRDLLARLTSCLYRGSLVLRRAGIEVLKKHKIFESNSELREHVHERKHFVFVSPLDRKAKKLLRLTDSRPDL
LHVIDEILQHDNLENKDRESLWLVRSGYLLAGLPDQLSSSFINLPIITQKGDRRLIDLIQYDQINRDAFVMLVTS
AFKSNLSGLQYRANKQSFVVTRTLSPYLGSKLVYVPKDKDWLVPSQMFEGRFADILQSDYMVWKDAGRLC
VIDTAKHLSNIKKSVFSSEEVLAFLRELPHRTFIQTEVRGLGVNVDGIAFNNGDIPSLKTFSNCVQVKVSRTNT
SLVQTLNRWFEGGKVSPPSIQFERAYYKKDDQIHEDAAKRKIRFQMPATELVHASDDAGWTPSYLLGIDPGE
YGMGLSLVSINNGEVLDSGFIHINSLINFASKKSNHQTKVVPRQQYKSPYANYLEQSKDSAAGDIAHILDRLIY
KLNALPVFEALSGNSQSAADQVWTKVLSFYTWGDNDAQNSIRKQHWFGASHWDIKGMLRQPPTEKKPKPYI
AFPGSQVSSYGNSQRCSCCGRNPIEQLREMAKDTSIKELKIRNSEIQLFDGTIKLFNPDPSTVIERRRHNLGPSRI
PVADRTFKNISPSSLEFKELITIVSRSIRHSPEFIAKKRGIGSEYFCAYSDCNSSLNSEANAAANVAQKFQKQLFF
EL*
SEQ ID NO: 17
MKRILNSLKVAALRLLFRGKGSELVKTVKYPLVSPVQGAVEELAEAIRHDNLHLFGQKEIVDLMEKDEGTQV
YSVVDFWLDTLRLGMFFSPSANALKITLGKFNSDQVSPFRKVLEQSPFFLAGRLKVEPAERILSVEIRKIGKRE
NRVENYAADVETCFIGQLSSDEKQSIQKLANDIWDSKDHEEQRMLKADFFAIPLIKDPKAVTEEDPENETAGK
QKPLELCVCLVPELYTRGFGSIADFLVQRLTLLRDKMSTDTAEDCLEYVGIEEEKGNGMNSLLGTFLKNLQG
DGFEQIFQFMLGSYVGWQGKEDVLRERLDLLAEKVKRLPKPKFAGEWSGHRMFLHGQLKSWSSNFFRLFNE
TRELLESIKSDIQHATMLISYVEEKGGYHPQLLSQYRKLMEQLPALRTKVLDPEIEMTHMSEAVRSYIMIHKS
VAGFLPDLLESLDRDKDREFLLSIFPRIPKIDKKTKEIVAWELPGEPEEGYLFTANNLFRNFLENPKHVPRFMA
ERIPEDWTRLRSAPVWFDGMVKQWQKVVNQLVESPGALYQFNESFLRQRLQAMLTVYKRDLQTEKFLKLL
ADVCRPLVDFFGLGGNDIIFKSCQDPRKQWQTVIPLSVPADVYTACEGLAIRLRETLGFEWKNLKGHEREDFL
RLHQLLGNLLFWIRDAKLVVKLEDWMNNPCVQEYVEARKAIDLPLEIFGFEVPIFLNGYLFSELRQLELLLRR
KSVMTSYSVKTTGSPNRLFQLVYLPLNPSDPEKKNSNNFQERLDTPTGLSRRFLDLTLDAFAGKLLTDPVTQE
LKTMAGFYDHLFGFKLPCKLAAMSNHPGSSSKMVVLAKPKKGVASNIGFEPIPDPAHPVFRVRSSWPELKYL
EGLLYLPEDTPLTIELAETSVSCQSVSSVAFDLKNLTTILGRVGEFRVTADQPFKLTPIIPEKEESFIGKTYLGLD
AGERSGVGFAIVTVDGDGYEVQRLGVHEDTQLMALQQVASKSLKEPVFQPLRKGTFRQQERIRKSLRGCYW
NFYHALMIKYRAKVVHEESVGSSGLVGQWLRAFQKDLKKADVLPKKGGKNGVDKKKRESSAQDTLWGGA
FSKKEEQQIAFEVQAAGSSQFCLKCGWWFQLGMREVNRVQESGVVLDWNRSIVTFLIESSGEKVYGFSPQQL
EKGFRPDIETFKKMVRDFMRPPMFDRKGRPAAAYERFVLGRRHRRYRFDKVFEERFGRSALFICPRVGCGNF
DHSSEQSAVVLALIGYIADKEGMSGKKLVYVRLAELMAEWKLKKLERSRVEEQSSAQ*
SEQ ID NO: 18
MAESKQMQCRKCGASMKYEVIGLGKKSCRYMCPDCGNHTSARKIQNKKKRDKKYGSASKAQSQRIAVAG
ALYPDKKVQTIKTYKYPADLNGEVHDSGVAEKIAQAIQEDEIGLLGPSSEYACWIASQKQSEPYSVVDFWFD
AVCAGGVFAYSGARLLSTVLQLSGEESVLRAALASSPFVDDINLAQAEKFLAVSRRTGQDKLGKRIGECFAE
GRLEALGIKDRNIREFVQAIDVAQTAGQRFAAKLKIFGISQNIPEAKQWNNDSGLTVCILPDYYVPEENRADQL
VVLLRRLREIAYCMGIEDEAGFEHLGIDPGALSNFSNGNPKRGFLGRLLNNDIIALANNMSAMTPYWEGRKG
ELIERLAWLKHRAEGLYLKEPHFGNSWADHRSRIFSRIAGWLSGCAGKLKIAKDQISGVRTDLFLLKRLLDAV
PQSAPSPDFIASISALDRFLEAAESSQDPAEQVRALYAFHLNAPAVRSIANKAVQRSDSQEWLIKELDAVDHL
EFNKAFPFFSDTGKKKKKGANSNGAPSEEEYTETESIQQPEDAEQEVNGQEGNGASKNQKKFQRIPRFFGEGS
RSEYRILTEAPQYFDMFCNNMRAIFMQLESQPRKAPRDFKCFLQNRLQKLYKQTFLNARSNKCRALLESVLIS
WGEFYTYGANEKKFRLRHEASERSSDPDYVVQQALEIARRLFLFGFEWRDCSAGERVDLVEIHKKAISFLLAI
TQAEVSVGSYNWLGNSTVSRYLSVAGTDTLYGTQLEEFLNATVLSQNIRGLAIRLSSQELKDGFDVQLESSCQ
DNLQHLLVYRASRDLAACKRATCPAELDPKILVLPVGAFIASVMKMIERGDEPLAGAYLRHRPHSFGWQIRV
RGVAEVGNIDQGTALAFQKPTESEPFKIKPFSAQYGPVLWLNSSSYSQSQYLDGFLSQPKNWSNIRVLPQAGS
VRVEQRVALIWNLQAGKNIRLERSGARAFFNIPVPFSFRPSGSGDEAVLAPNRYLGLFPHSGGIEYAVVDVLDS
AGFKILERGTIAVNGFSQKRGERQEEAHREKQRRGISDIGRKKPVQAEVDAANELHRKYTDVATRLGCRIVV
QWAPQPKPGTAPTAQTVYARAVRTEAPRSGNQEDHARMKSSWGYTWGTYWEKRKPEDILGISTQVYWTG
GIGESCPAVAVALLGHIRATSTQTEWEKEEVVFGRLKKFFPS*
SEQ ID NO: 19
MEKRINKIRKKLSADNATKPVSRSGPMKTLLVRVMTDDLKKRLEKRRKKPEVNIPQVISNNAANNLRMLLDD
YTKMKEAILQVYWQEFKDDHVGLMCKFAQPASKKIDQNKLKPENIDEKGNLTTAGFACSQCGQPLFVYKLE
QVSEKGKAYTNYFGRCNVAEHEKLILLAQLKPEKDSDEAVTYSLGKFGQRALDFYSIHVTKESTHPVKPLAQ
IAGNRYASGPVGKALSDACMGTIASFLSKYQDIIIEHQKVVKGNQKRLESLRELAGKENLEYPSVTLPPQPHT
KEGVDAYNEVIARVRMWVNLNLWQKLKLSRDDAKPLLRLKGFPSFPVVERRENEVDWWNTINEVKKLIDA
KRDMGRVFWSGVTAEKRNTILEGYNYLPNENDHKKREGSLENPKKPAKRQFGDLLLYLEKKYAGDWGKVF
DEAWERIDKKIAGLTSHIEREEARNAEDAQSKAVLTDWLRAKASFVLERLKEMDEKEFYACEIQLQKWYGD
LRGNPFAVEAENRVVDISGFSIGSDGHSIQYRNLLAWKYLENGKREFYLLMNYGKKGRIRFTDGTDIKKSGK
WQGLLYGGGKAKVIDLTFDPDDEQLIILPLAFGTRQGREFIWNDLLSLETGLIKLANGRVIEKTIYNKKIGRDE
PALFVALTFERREVVDPSNIKPVNLIGVDRGENIPAVIALTDPEGCPLPEFKDSSGGPTDILRIGEGYKEKQRAI
QAAKEVEQRRAGGYSRKFASKSRNLADDMVRNSARDLFYHAVTHDAVLVFENLSRGFGRQGKRTFMTERQ
YTKMEDWLTAKLAYEGLTSKTYLSKTLAQYTSKTCSNCGFTITTADYDGMLVRLKKTSDGWATTLNNKEL
KAEGQITYYNRYKRQTVEKELSAELDRLSEESGNNDISKWTKGRRDEALFLLKKRFSHRPVQEQFVCLDCGH
EVHADEQAALNIARSWLFLNSNSTEFKSYKSGKQPFVGAWQAFYKRRLKEVWKPNA
SEQ ID NO: 20
MKRINKIRRRLVKDSNTKKAGKTGPMKTLLVRVMTPDLRERLENLRKKPENIPQPISNTSRANLNKLLTDYTE
MKKAILHVYWEEFQKDPVGLMSRVAQPAPKNIDQRKLIPVKDGNERLTSSGFACSQCCQPLYVYKLEQVND
KGKPHTNYFGRCNVSEHERLILLSPHKPEANDELVTYSLGKFGQRALDFYSIHVTRESNHPVKPLEQIGGNSC
ASGPVGKALSDACMGAVASFLTKYQDIILEHQKVIKKNEKRLANLKDIASANGLAFPKITLPPQPHTKEGIEA
YNNVVAQIVIWVNLNLWQKLKIGRDEAKPLQRLKGFPSFPLVERQANEVDWWDMVCNVKKLINEKKEDGK
VFWQNLAGYKRQEALLPYLSSEEDRKKGKKFARYQFGDLLLHLEKKHGEDWGKVYDEAWERIDKKVEGLS
KHIKLEEERRSEDAQSKAALTDWLRAKASFVIEGLKEADKDEFCRCELKLQKWYGDLRGKPFAIEAENSILDI
SGFSKQYNCAFIWQKDGVKKLNLYLIINYFKGGKLRFKKIKPEAFEANRFYTVINKKSGEIVPNIEVNFNFDDP
NLIILPLAFGKRQGREFIWNDLLSLETGSLKLANGRVIEKTLYNRRTRQDEPALFVALTFERREVLDSSNIKPM
NLIGIDRGENIPAVIALTDPEGCPLSRFKDSLGNPTHILRIGESYKEKQRTIQAAKEVEQRRAGGYSRKYASKA
KNLADDMVRNTARDLLYYAVTQDAMLIFENLSRGFGRQGKRTFMAERQYTRMEDWLTAKLAYEGLPSKT
YLSKTLAQYTSKTCSNCGFTITSADYDRVLEKLKKTATGWMTTINGKELKVEGQITYYNRYKRQNVVKDLS
VELDRLSEESVNNDISSWTKGRSGEALSLLKKRFSHRPVQEKFVCLNCGFETHADEQAALNIARSWLFLRSQE
YKKYQTNKTTGNTDKRAFVETWQSFYRKKLKEVWKP
SEQ ID NO: 21
atgGGAAAAATGTATTATCTTGGTCTGGATATAGGAACAAATTCTGTTGGATATGCCGTAACCGACCCATC
GTACCATTTGCTCAAATTTAAAGGCGAACCGATGTGGGGTGCCCACGTGTTTGCTGCGGGGAATCAATC
AGCTGAACGGAGAAGCTTTCGTACGAGCCGCAGACGCCTTGACCGCAGGCAACAGCGTGTCAAACTGGT
TCAAGAAATCTTTGCTCCCGTGATTAGTCCCATTGATCCACGTTTTTTTATCAGACTTCATGAGAGCGCTT
TATGGCGGGATGATGTGGCTGAAACGGATAAACATATTTTCTTTAATGACCCGACCTATACGGATAAGG
AATATTATTCTGACTATCCAACCATCCATCATCTCATTGTGGACCTTATGGAAAGCAGTGAAAAGCATGA
CCCGCGGCTTGTTTATTTGGCTGTTGCCTGGCTGGTTGCTCATCGTGGTCATTTCCTCAATGAAGTGGATA
AGGATAATATTGGGGATGTCCTGAGTTTTGACGCCTTTTATCCTGAGTTTCTGGCATTTCTTTCCGATAAT
GGGGTGTCACCTTGGGTATGTGAGTCAAAAGCACTCCAAGCGACCCTGCTTTCACGAAACTCCGTCAAC
GATAAGTATAAAGCCTTGAAGTCTCTGATCTTTGGCAGCCAAAAGCCGGAGGATAATTTTGATGCCAAT
ATCAGTGAAGATGGACTTATCCAACTTTTAGCAGGAAAAAAGGTCAAGGTCAATAAACTTTTTCCTCAA
GAAAGTAATGATGCTTCCTTTACACTCAATGATAAGGAAGATGCAATTGAGGAAATCTTAGGAACGCTT
ACACCGGATGAGTGTGAATGGATTGCGCATATTAGGAGGCTGTTTGATTGGGCCATCATGAAACATGCT
CTCAAAGATGGCAGAACAATCTCCGAATCGAAAGTAAAGCTCTATGAACAGCATCACCATGACTTGACA
CAGCTCAAGTATTTTGTGAAGACCTATCTAGCAAAGGAATATGATGACATTTTTCGAAACGTAGATAGT
GAAACAACCAAAAACTATGTCGCATATTCCTATCATGTAAAAGAAGTCAAGGGTACATTGCCCAAAAAT
AAGGCAACCCAAGAAGAATTTTGCAAGTATGTCCTTGGAAAGGTAAAGAACATCGAATGCAGTGAAGC
TGATAAGGTTGATTTTGATGAAATGATTCAGCGTCTTACAGACAATTCCTTTATGCCGAAACAAGTATCA
GGTGAAAACAGGGTTATCCCTTACCAGCTTTACTATTATGAACTAAAGACTATTTTGAATAAAGCCGCTT
CTTATCTGCCTTTTTTGACCCAATGCGGAAAAGATGCCATCTCCAATCAAGATAAGCTCCTTTCCATCAT
GACCTTTCGGATTCCGTATTTCGTTGGGCCCTTGCGCAAGGACAATTCAGAGCATGCCTGGCTGGAACGA
AAAGCAGGGAAAATCTATCCGTGGAATTTTAACGACAAAGTTGACCTTGATAAAAGTGAAGAAGCGTTC
ATTCGGAGAATGACGAATACCTGCACTTATTATCCCGGTGAAGATGTTTTGCCACTTGACTCCCTTATTT
ATGAAAAATTCATGATCCTCAATGAAATCAATAATATCCGAATTGATGGTTATCCTATTTCTGTAGATGT
AAAACAGCAGGTTTTTGGCCTCTTTGAAAAGAAGAGAAGAGTGACCGTAAAGGATATCCAGAATCTCCT
GCTTTCCTTGGGTGCCTTGGATAAGCATGGTAAATTGACGGGAATCGATACTACCATCCATAGCAATTAC
AATACATACCATCATTTTAAATCGCTCATGGAGCGTGGCGTTCTTACTCGTGATGATGTGGAACGCATTG
TGGAGCGTATGACCTATAGTGATGATACAAAACGCGTCCGTCTTTGGCTGAACAATAATTATGGAACGC
TCACTGCTGACGACGTAAAGCATATTTCAAGGCTCCGAAAGCATGATTTTGGCCGGCTTTCCAAAATGTT
CCTCACAGGCCTAAAGGGAGTTCATAAGGAAACGGGGGAACGAGCTTCCATTTTGGATTTTATGTGGAA
TACCAATGATAACTTGATGCAGCTTTTATCTGAATGTTATACTTTTTCGGATGAAATTACCAAGCTGCAG
GAAGCATACTATGCCAAGGCGCAGCTTTCCCTGAATGATTTTCTGGACTCCATGTATATTTCAAATGCTG
TCAAACGTCCTATCTATCGAACTCTTGCCGTTGTAAATGACATACGCAAAGCCTGTGGGACGGCGCCAA
AACGCATTTTTATCGAAATGGCAAGAGATGGGGAAAGCAAAAAGAAAAGGAGCGTAACAAGAAGAGA
ACAAATCAAGAATCTTTATAGGTCCATCCGCAAGGATTTTCAGCAGGAGGTAGATTTCCTTGAAAAAAT
CCTTGAAAACAAAAGCGATGGACAGCTGCAAAGCGATGCGCTCTATCTATACTTTGCGCAGCTTGGAAG
GGATATGTATACCGGGGACCCTATCAAGTTGGAGCATATCAAGGACCAGTCCTTCTATAATATTGATCAT
ATCTATCCCCAAAGCATGGTCAAGGACGATAGTCTTGATAACAAGGTGTTGGTTCAATCGGAAATTAAT
GGAGAGAAGAGCAGTCGATATCCTCTTGATGCTGCTATCCGTAATAAAATGAAGCCTCTTTGGGATGCTT
ATTATAACCATGGCCTGATTTCCCTCAAGAAGTATCAGCGTTTGACGCGGAGCACTCCCTTTACAGATGA
TGAAAAGTGGGATTTCATCAATCGGCAGCTTGTTGAGACAAGACAATCCACGAAGGCCTTGGCAATCTT
ACTAAAAAGGAAGTTCCCTGATACGGAGATTGTCTACTCCAAGGCAGGGCTTTCTTCTGATTTTCGGCAT
GAGTTTGGTCTCGTAAAATCGAGGAATATCAATGACCTGCACCATGCAAAGGACGCATTTCTTGCGATT
GTAACAGGAAATGTCTATCATGAACGCTTTAATCGCCGGTGGTTTATGGTGAACCAGCCCTATTCCGTCA
AGACCAAGACGTTGTTTACGCATTCTATTAAAAATGGTAATTTTGTAGCTTGGAATGGAGAAGAGGATC
TTGGCCGCATTGTTAAAATGTTAAAGCAAAATAAGAACACTATTCATTTCACGCGGTTCTCTTTTGATCG
AAAGGAAGGCCTGTTTGATATTCAGCCACTAAAAGCGTCAACCGGTCTTGTACCAAGAAAAGCCGGACT
AGACGTGGTAAAATATGGTGGCTATGACAAATCGACAGCAGCTTATTATCTCCTTGTTCGATTTACACTA
GAAGATAAAAAGACTCAACATAAATTGATGATGATTCCTGTAGAAGGCTTGTATAAAGCTCGAATTGAC
CATGATAAGGAATTCTTAACGGACTATGCACAAACTACAATCAGTGAAATCCTACAAAAAGATAAACAA
AAGGTGATAAATATAATGTTTCCAATGGGAACAAGGCACATTAAACTGAATTCCATGATTTCAATCGAT
GGTTTTTATCTTTCCATTGGAGGAAAGTCTAGTAAGGGAAAATCGGTGTTGTGTCATGCTATGGTACCTC
TTATTGTACCTCATAAGATAGAATGTTATATTAAGGCGATGGAGTCTTTTGCACGTAAATTTAAAGAAAA
TAATAAATTAAGGATTGTGGAAAAGTTTGATAAGATTACGGTGGAAGATAACTTGAACCTATACGAACT
ATTTTTACAAAAACTTCAACATAACCCATATAATAAGTTCTTCTCCACACAATTTGATGTGCTGACTAAT
GGAAGAAGTACATTTACTAAATTATCTCCAGAGGAACAAGTTCAAACGTTATTGAATATCTTATCAATTT
TTAAAACTTGTCGGAGCTCTGGCTGCGATTTAAAATCCATTAACGGTTCTGCTCAAGCTGCCAGAATTAT
GATCAGCGCAGATTTAACTGGACTCTCAAAAAAATATTCCGATATTCGGCTTGTTGAGCAATCAGCATCT
GGACTTTTTGTTAGTAAATCACAAAATCTTTTGGAGTATTTAtga
SEQ ID NO: 22
atgtcttcattaacaaaatttacaaataaatacagtaagcagctaaccataaaaaatgaactcatcccagtaggaaagactctcgagaacatta
aggaaaacggtctcatagatggagatgaacagctaaacgagaattatcaaaaagcaaagataatcgttgatgattttctacgagatttcataaa
taaagctttaaataatacccaaataggaaattggagagaattagcagatgctttaaataaagaagatgaagataacatagaaaagctccaagac
aaaatcagaggaataattgtaagtaaattcgagacatttgatttgttttcttcttactcgataaagaaagacgaaaagataatagatgatgata
atgatgttgaagaagaggagctagatctaggaaaaaaaacttcctcatttaaatatatttttaagaaaaacctttttaaattagtacttccttc
ttatttaaagacaacaaatcaggataaactgaaaataatctcttcttttgataatttttctacctatttcagaggattctttgagaacagaaaa
aatattttcactaagaagcctatatctacgtcaattgcctacagaattgtccatgataactttccaaagtttctagataacatcagatgtttta
atgtgtggcaaacagaatgcccacagttaattgtaaaggctgataattatttaaaatcaaagaacgtcatagctaaagataaatctttagcaaa
ctattttactgtaggagcatatgattacttcttatcccagaatggcattgatttctacaacaacattatcggcggtctaccagcatttgctggt
catgagaaaatccaaggacttaatgaatttataaatcaagaatgccaaaaggacagcgaactaaaatctaaactgaaaaacagacatgctttca
aaatggctgttctatttaagcaaattctttcagatagagaaaaaagttttgttatagacgagttcgaatctgatgctcaggtcatagatgcggt
taagaacttctatgcagaacaatgtaaggataataatgttatttttaaccttctaaatcttatcaagaatatagcgttcttatctgatgatgaa
ttagatggaatttttatagaaggcaagtatttaagctctgtttcccaaaagctatattcagattggtcgaagcttcgaaatgatattgaagata
gtgcaaacagtaaacaaggaaataaagagttagcaaagaaaattaaaacaaataaaggcgatgttgaaaaggccataagtaaatatgagttttc
tttatcagaacttaactcaattgtacatgataatacaaaattcagtgaccttctttcttgtacgttacataaagtggctagcgaaaaactagtg
aaagttaatgaaggggactggccaaaacacctgaaaaataatgaagaaaaacaaaagataaaagagcctttagatgcattgttagaaatttata
atacattgctgatattcaactgcaagtcatttaataagaacggtaatttctatgttgattatgacagatgcataaatgagctttctagtgttgt
ttatttatataacaaaacaagaaattactgtacaaagaaaccttataacacagacaaattcaaattaaactttaacagtcctcaattaggagag
ggctttagtaagtcgaaagaaaatgactgtctgacattattatttaaaaaagacgacaattactatgttggaattatcagaaaaggggcaaaaa
ttaactttgatgatacacaagccattgcagacaatacagataactgtatatttaagatgaattatttcctattaaaagatgctaaaaagtttat
tcctaaatgttcaattcagttaaaagaagtaaaagcacattttaaaaaatcagaggatgattatatcctgagtgacaaagaaaaatttgcctct
ccccttgttattaagaaatcaacatttttattagcaacagcacatgtaaaaggaaagaaaggaaacataaaaaaattccaaaaggaatattcta
aggaaaatccaacagaatatagaaattctctgaatgaatggattgcattttgtaaagaatttctaaaaacatataaggcggcaacaatctttga
cattacaacgttaaaaaaagctgaagaatatgctgatattgttgagttttataaggatgtagataatctttgttataaactagagttttgccct
attaaaacatctttcattgagaatcttattgataatggggacttatatttattcagaatcaataataaagatttcagttcaaaatctactggta
caaagaatcttcatacgctctatcttcaggcaatctttgatgaaagaaacctcaataatcctactattatgttaaatggcggagcagagttatt
ttatcgaaaagaaagcattgaacagaaaaataggataactcataaggcaggatcaattcttgtaaacaaggtttgtaaggatggaacaagtcta
gatgacaaaatcagaaacgaaatatatcaatatgaaaacaagtttattgatacattgtctgatgaagctaaaaaagttttacctaatgtaataa
aaaaagaagcaactcacgacataacaaaagataagcgatttacatcagataagttctttttccattgcccattaacaattaactataaggaagg
agatacaaaacaatttaacaatgaggttttatctttccttagaggtaatccagacattaatatcatcggaattgacagaggagaaagaaacctt
atatacgtaactgttattaatcagaaaggcgaaatacttgacagcgtttcgtttaacacagtaacaaacaagtcgagcaaaattgaacaaactg
ttgattatgaggaaaagcttgctgttagggaaaaagaaagaatagaagcaaaaagatcctgggattcaatatcaaagatagcaaccttaaaaga
aggttatctatcagctattgttcatgagatatgcctactgatgatcaaacacaacgcaatcgttgtacttgagaatctaaatgcaggatttaag
agaattagaggaggattatcagaaaagtctgtttatcagaaattcgagaagatgcttattaacaaactaaattactttgtatctaaaaaagaat
cagactggaataaacctagtggacttttaaatggtttacaactttcagaccagttcgagtcatttgagaaattaggaattcaatctgggttcat
cttctatgttcctgcagcatatacatctaagattgatcctacaacaggatttgcaaatgttcttaacttatccaaggtaagaaatgttgatgca
ataaagagttttttcagtaatttcaatgaaatttcatatagcaaaaaagaagctctctttaaattctcttttgatttagattccttatcaaaga
agggcttcagctcatttgtaaaattcagtaaatctaaatggaatgtatatacatttggagagagaataataaaaccaaagaataagcaagggta
tcgtgaagataagagaattaatttaacatttgaaatgaaaaaacttctgaatgaatataaagtaagttttgatcttgaaaacaacttaattcca
aatctaacctctgcaaatctgaaagataccttctggaaagaactattctttatttttaaaacaactctgcagcttagaaacagtgtaacaaatg
gcaaagaagatgtactgatttctccagtaaagaacgctaaaggagagttctttgtatcaggaactcataacaagacattacctcaagactgtga
tgcaaatggagcatatcatatcgccctaaaaggtctgatgattcttgaacgtaacaatcttgttagagaagaaaaagacacaaagaagataatg
gcaatttctaatgttgactggtttgagtatgttcaaaaaaggagaggtgtcctgtaa
SEQ ID NO: 23
ATGAACAACTATGATGAGTTTACCAAACTGTACCCAATACAGAAAACGATAAGGTTCGAATTGAAGCCG
CAGGGAAGAACGATGGAACACCTCGAAACATTCAACTTTTTCGAAGAGGACAGGGATAGAGCGGAGAA
ATATAAGATTTTAAAGGAAGCAATCGACGAGTATCATAAGAAGTTTATAGACGAACATCTAACAAATAT
GTCTCTTGACTGGAATTCTTTAAAACAGATTTCAGAGAAATACTATAAGAGTAGAGAGGAAAAAGACAA
GAAAGTTTTTCTGTCAGAACAGAAACGCATGAGGCAAGAGATAGTTTCTGAGTTCAAAAAAGACGATCG
GTTTAAAGATCTTTTTTCAAAAAAATTGTTTTCTGAACTTCTCAAGGAAGAGATTTACAAAAAAGGAAAC
CATCAGGAAATTGACGCATTGAAAAGTTTTGATAAATTCTCAGGCTATTTTATTGGGTTGCATGAGAACC
GAAAAAATATGTATTCTGACGGAGACGAGATCACGGCTATCTCTAACCGTATTGTAAATGAGAATTTCC
CGAAGTTCCTCGACAACCTTCAGAAATATCAGGAAGCTCGTAAAAAATATCCAGAGTGGATCATTAAGG
CAGAATCTGCTTTAGTTGCACATAATATCAAGATGGATGAAGTCTTTTCCTTAGAGTATTTCAACAAAGT
CCTGAATCAAGAAGGAATACAGAGATACAATCTCGCCCTAGGTGGCTATGTGACCAAAAGTGGTGAGA
AAATGATGGGGCTTAATGATGCACTTAATCTTGCCCATCAAAGTGAAAAAAGCAGCAAGGGAAGGATA
CACATGACTCCACTCTTCAAACAGATTCTGAGTGAAAAAGAGTCCTTTTCTTATATACCAGATGTTTTTA
CAGAAGACTCTCAACTTTTACCATCCATTGGTGGGTTCTTTGCACAAATAGAAAATGATAAGGACGGGA
ATATTTTTGACAGAGCATTAGAATTGATATCTTCTTATGCAGAATACGATACAGAAAGGATATATATCAG
GCAAGCGGACATAAACAGAGTTTCTAATGTTATTTTCGGGGAGTGGGGAACACTGGGGGGGTTAATGAG
GGAATACAAAGCAGACTCTATCAACGACATCAATTTGGAGAGAACATGCAAGAAGGTAGACAAGTGGC
TCGACTCAAAGGAGTTTGCGTTATCAGATGTATTAGAGGCAATAAAAAGAACCGGCAATAATGATGCTT
TTAATGAATATATCTCAAAGATGCGCACTGCCAGGGAAAAGATTGACGCTGCAAGAAAGGAAATGAAA
TTCATTTCGGAAAAAATATCTGGAGACGAAGAATCGATCCATATTATCAAAACCTTATTGGACTCGGTGC
AACAGTTTTTACATTTTTTCAATTTATTCAAAGCGCGTCAGGACATTCCTCTTGATGGAGCATTCTATGCG
GAGTTCGATGAAGTCCATAGCAAACTGTTTGCTATTGTTCCGTTGTATAATAAGGTTAGGAACTATCTTA
CGAAAAATAACCTTAACACGAAAAAGATAAAGCTAAACTTCAAGAATCCAACTCTGGCAAACGGATGG
GATCAAAACAAGGTATATGACTACGCCTCCTTAATCTTTCTCCGCGATGGTAATTATTATCTCGGAATAA
TAAATCCAAAAAGGAAAAAGAATATTAAATTCGAACAAGGGTCTGGAAATGGCCCATTCTACCGGAAG
ATGGTGTACAAACAAATTCCAGGGCCGAACAAGAACTTACCAAGAGTCTTCCTCACATCTACGAAAGGC
AAAAAAGAGTACAAGCCGTCAAAGGAGATAATAGAAGGATATGAAGCGGACAAACACATAAGAGGAG
ATAAATTCGATCTGGATTTCTGTCATAAGCTGATAGACTTCTTCAAGGAATCCATCGAGAAGCACAAGG
ACTGGAGTAAGTTCAACTTCTATTTCTCTCCAACTGAATCATATGGAGACATCAGCGAATTCTATCTGGA
TGTAGAAAAACAGGGATACCGGATGCATTTTGAGAATATTTCTGCCGAGACGATTGATGAGTATGTCGA
AAAGGGGGACTTATTCCTCTTCCAGATATACAACAAAGACTTTGTGAAAGCGGCAACCGGAAAAAAAG
ATATGCACACCATTTATTGGAACGCGGCATTCTCGCCCGAGAACCTTCAGGATGTGGTAGTGAAACTGA
ACGGTGAAGCAGAACTTTTCTACAGAGACAAGAGCGACATCAAGGAGATAGTTCACAGGGAGGGAGAG
ATACTGGTCAATCGTACCTACAACGGCAGGACACCTGTGCCTGACAAGATCCACAAAAAATTAACAGAT
TATCATAATGGCCGTACCAAAGATCTCGGAGAAGCAAAAGAATACCTCGATAAGGTCAGATATTTCAAA
GCGCACTACGACATCACAAAGGATCGCAGATACCTGAATGATAAAATATACTTCCATGTGCCTCTGACA
TTGAATTTCAAAGCAAACGGGAAGAAGAATCTCAATAAGATGGTAATTGAAAAGTTCCTCTCGGACGAA
AAAGCGCATATTATTGGGATTGATCGCGGGGAAAGGAATCTTCTTTACTATTCTATCATTGACAGGTCAG
GTAAAATAATCGATCAACAGAGCCTCAACGTCATCGATGGATTCGATTACCGAGAGAAACTGAATCAGA
GGGAGATCGAGATGAAGGATGCCAGACAAAGCTGGAATGCTATCGGGAAGATAAAGGACCTCAAGGAA
GGGTATCTTTCAAAAGCGGTCCACGAAATTACCAAGATGGCGATACAATACAATGCCATTGTTGTCATG
GAGGAACTCAATTATGGGTTCAAACGCGGACGTTTCAAAGTTGAGAAGCAGATATATCAGAAATTCGAG
AATATGCTGATTGACAAGATGAATTATCTGGTATTCAAGGATGCTCCGGATGAAAGTCCGGGAGGAGTC
CTCAATGCATATCAGCTTACTAATCCGCTTGAAAGTTTCGCTAAACTTGGGAAACAGACAGGAATTCTTT
TCTATGTTCCGGCAGCCTATACTTCGAAGATAGATCCGACGACCGGGTTTGTCAATCTTTTCAATACTTC
AAGTAAAACGAACGCACAGGAAAGAAAAGAATTCTTGCAAAAATTCGAGTCGATCTCCTATTCCGCTAA
AGACGGAGGAATATTCGCATTCGCGTTCGATTATCGGAAGTTCGGAACGTCAAAAACAGACCACAAAAA
TGTATGGACCGCATACACGAACGGGGAAAGGATGAGGTACATAAAAGAGAAAAAACGCAACGAACTGT
TCGACCCCTCGAAGGAGATCAAAGAGGCTCTCACTTCATCAGGAATCAAATATGACGGCGGACAGAACA
TATTGCCAGATATCCTGAGGAGCAACAATAACGGTCTGATCTACACAATGTATTCCTCTTTCATAGCGGC
CATTCAAATGAGGGTCTATGACGGGAAAGAAGACTATATCATCTCGCCGATAAAGAACAGCAAGGGAG
AGTTCTTCAGGACCGATCCGAAAAGAAGGGAACTTCCGATAGACGCGGATGCGAACGGCGCGTATAAC
ATTGCTCTCAGGGGCGAATTGACGATGCGTGCGATAGCGGAGAAGTTCGATCCGGACTCGGAAAAGATG
GCGAAGCTAGAACTGAAACATAAGGACTGGTTCGAATTCATGCAGACAAGGGGGGATTGA
SEQ ID NO: 24
ATGACAAAAACATTTGATTCAGAATTTTTTAATTTATATTCTCTTCAAAAAACAGTTCGTTTTGAACTCAA
GCCGGTTGGTGAAACAGCCTCGTTTGTTGAAGATTTTAAAAACGAAGGTTTGAAACGAGTTGTTTCAGA
GGATGAACGGCGTGCGGTTGATTACCAAAAAGTGAAAGAAATTATTGATGACTACCACCGAGATTTTAT
TGAAGAATCGCTGAACTATTTTCCTGAGCAGGTCTCAAAAGACGCTTTGGAACAAGCTTTTCACCTTTAT
CAAAAACTAAAAGCCGCTAAGGTTGAAGAGCGTGAAAAAGCATTGAAAGAATGGGAAGCCCTTCAGAA
AAAACTGCGCGAAAAAGTTGTTAAATGTTTTTCAGATTCAAACAAAGCACGCTTTTCCCGCATTGATAAA
AAAGAACTGATTAAAGAAGATTTAATTAACTGGTTGGTTGCACAAAATCGCGAAGATGACATTCCAACC
GTTGAAACCTTTAACAACTTTACGACTTATTTTACGGGGTTTCATGAAAACCGAAAAAACATTTATTCAA
AAGACGATCATGCCACAGCCATTTCATTTCGACTCATTCATGAAAACCTGCCTAAGTTTTTTGATAATGT
GATCAGCTTTAATAAATTGAAGGAAGGATTTCCAGAGCTGAAATTTGATAAGGTTAAGGAAGATTTAGA
AGTTGATTATGACTTGAAACATGCCTTTGAAATCGAATACTTTGTCAATTTTGTTACCCAAGCCGGAATT
GACCAATATAACTATCTTTTGGGGGGTAAAACCTTAGAAGACGGCACCAAAAAGCAAGGCATGAATGA
ACAAATCAATCTGTTCAAGCAACAGCAAACCCGAGACAAAGCCCGACAAATTCCCAAACTCATACCATT
GTTTAAACAAATTCTAAGCGAACGAACGGAAAGCCAATCGTTTATTCCAAAACAATTTGAATCAGACCA
AGAGCTATTTGACTCACTGCAAAAACTGCATAACAACTGCCAAGATAAATTTACCGTACTGCAACAAGC
CATTTTAGGCTTAGCCGAAGCAGATCTGAAAAAAGTATTCATTAAAACATCTGATCTTAATGCGCTATCA
AATACCATTTTTGGAAATTACAGTGTGTTTTCGGATGCGTTGAATTTATACAAAGAATCGCTCAAAACAA
AAAAGGCGCAAGAAGCGTTTGAAAAACTACCCGCTCACAGCATTCATGACTTGATTCAATATTTGGAGC
AATTTAATAGCTCTTTGGATGCAGAAAAACAGCAATCAACTGACACCGTACTGAATTACTTTATTAAAAC
AGACGAGCTGTATTCTCGGTTCATAAAATCAACGAGCGAAGCCTTCACACAAGTACAACCACTCTTTGA
ATTGGAAGCATTAAGCTCAAAACGTCGTCCACCGGAAAGTGAAGACGAAGGCGCAAAAGGTCAGGAAG
GGTTTGAGCAAATTAAACGCATAAAAGCCTATTTGGATACCTTGATGGAGGCGGTGCATTTTGCAAAAC
CACTTTATCTGGTGAAGGGGCGCAAAATGATTGAAGGTCTGGACAAAGACCAAAGTTTCTATGAAGCCT
TTGAAATGGCTTACCAAGAACTAGAAAGTCTGATTATTCCAATCTACAACAAAGCTCGTAGTTATTTAAG
TCGTAAACCGTTTAAAGCGGACAAATTCAAAATTAATTTTGATAATAATACATTGCTTTCCGGTTGGGAT
GCTAATAAAGAAACGGCTAACGCTTCAATTTTGTTTAAGAAGGATGGTTTGTATTATTTAGGAATCATGC
CTAAAGGAAAAACGTTTTTGTTCGATTACTTCGTTTCATCGGAAGATTCTGAAAAGTTAAAACAAAGAA
GACAAAAAACCGCCGAAGAAGCGCTTGCGCAAGATGGCGAAAGCTACTTTGAAAAAATTCGTTACAAG
CTGTTACCTGGCGCCAGCAAAATGTTGCCGAAAGTATTTTTTTCCAACAAAAACATAGGGTTTTACAACC
CAAGTGATGACATACTTCGTATCAGGAATACAGCCTCTCACACTAAAAACGGAACACCGCAAAAAGGGC
ACTCTAAAGTAGAGTTTAATTTGAATGATTGTCATAAGATGATTGATTTCTTTAAATCAAGCATTCAAAA
GCATCCAGAGTGGGGAAGTTTTGGATTCACCTTTTCAGATACATCAGATTTTGAAGATATGAGCGCCTTT
TATCGAGAAGTCGAAAACCAAGGTTATGTCATTAGTTTCGATAAAATAAAAGAAACTTACATTCAGAGT
CAAGTTGAACAGGGGAACCTATATTTATTCCAAATCTACAATAAAGACTTCTCGCCCTACAGCAAAGGC
AAACCAAATTTACACACGCTTTACTGGAAAGCGTTGTTTGAGGAAGCCAACCTAAATAATGTGGTGGCA
AAACTCAATGGTGAAGCTGAAATTTTCTTTAGGCGACACTCAATCAAAGCATCTGATAAAGTGGTGCAC
CCAGCGAATCAAGCCATTGACAATAAAAACCCGCATACCGAAAAAACGCAAAGCACCTTTGAATATGAT
CTTGTAAAAGACAAGCGCTATACCCAAGACAAATTCTTCTTCCATGTACCGATTTCATTGAACTTTAAGG
CACAAGGTGTTTCAAAATTTAACGATAAAGTGAATGGATTTTTAAAGGGTAACCCAGATGTCAATATTA
TTGGCATTGACCGAGGCGAACGACACCTTCTGTATTTCACTGTGGTGAATCAGAAAGGTGAAATTTTGGT
TCAAGAGTCGCTTAATACCCTAATGAGTGATAAAGGGCATGTGAATGACTACCAGCAAAAACTCGACAA
AAAAGAACAAGAACGCGATGCCGCTCGCAAAAGCTGGACGACGGTTGAAAATATCAAAGAATTAAAAG
AAGGCTATTTATCTCATGTTGTTCATAAGTTGGCACACCTGATTATTAAATACAATGCCATTGTTTGCTTG
GAAGACCTGAATTTTGGTTTCAAACGCGGGCGTTTTAAAGTGGAAAAACAAGTTTATCAGAAATTTGAA
AAAGCGCTTATTGATAAGCTTAACTACTTGGTATTTAAAGAAAAAGAGTTAGGCGAGGTGGGCCATTAT
CTAACCGCCTATCAGTTGACCGCACCGTTTGAAAGTTTCAAGAAGTTAGGCAAGCAAAGTGGCATATTG
TTTTATGTTCCGGCGGATTACACCTCCAAAATTGACCCAACCACCGGGTTTGTCAACTTTCTTGATCTGCG
TTATCAGAGTGTCGAAAAAGCCAAACAGCTCTTAAGCGACTTTAATGCCATTCGTTTTAATTCAGTACAA
AACTATTTTGAGTTCGAAATAGATTACAAAAAACTCACACCCAAACGTAAAGTTGGTACTCAGAGTAAA
TGGGTGATTTGTACCTATGGAGATGTCCGCTATCAAAATCGGCGTAATCAAAAAGGTCACTGGGAAACG
GAAGAAGTCAATGTGACTGAAAAACTAAAAGCCCTTTTCGCCAGTGATTCCAAAACTACAACCGTAATC
GATTACGCCAATGACGACAACCTAATTGACGTCATTCTGGAACAGGACAAAGCCAGCTTCTTCAAAGAA
CTGTTATGGTTATTAAAACTCACCATGACGCTCCGCCACAGCAAAATCAAAAGTGAAGACGACTTTATTC
TTTCACCCGTTAAAAACGAACAAGGCGAGTTTTACGATAGTCGAAAAGCGGGCGAGGTGTGGCCTAAAG
ATGCAGACGCCAATGGCGCTTATCACATAGCGTTGAAAGGCTTGTGGAATCTGCAACAGATCAATCAGT
GGGAAAAGGGTAAAACACTTAATCTGGCGATTAAAAACCAGGATTGGTTCAGTTTTATTCAAGAAAAGC
CCTATCAAGAATAA
SEQ ID NO: 25
ATGCACACAGGCGGATTACTTAGCATGGATGCCAAGGAGTTTACCGGACAGTACCCCCTTTCGAAGACT
CTGCGTTTTGAACTGAGACCGATAGGCAGAACGTGGGACAATCTCGAAGCATCGGGGTATCTTGCGGAG
GACAGACACCGTGCAGAATGCTATCCCAGGGCAAAAGAGCTCTTGGACGACAACCATCGTGCATTCCTC
AACCGTGTCCTGCCTCAGATCGATATGGATTGGCACCCGATCGCAGAGGCATTCTGCAAAGTCCACAAG
AATCCGGGAAACAAGGAATTGGCTCAGGATTACAATCTTCAGCTGTCCAAACGCAGAAAGGAGATTTCG
GCCTATCTGCAGGATGCGGACGGCTATAAAGGTCTGTTTGCCAAACCTGCATTGGATGAAGCAATGAAG
ATCGCGAAAGAAAACGGAAATGAATCGGACATAGAGGTTCTTGAGGCATTCAACGGTTTCTCCGTATAC
TTCACCGGATATCATGAGAGCAGGGAGAACATCTATTCGGACGAGGATATGGTGTCGGTAGCTTATCGC
ATCACCGAAGACAATTTCCCGAGATTCGTTTCCAATGCGCTTATATTCGATAAGCTGAATGAGTCGCACC
CCGATATAATCTCGGAAGTATCCGGAAATCTGGGCGTAGACGACATCGGAAAATATTTTGATGTGTCTA
ACTACAATAATTTCCTGTCGCAGGCCGGTATAGATGACTACAATCACATCATCGGCGGCCATACGACGG
AGGACGGTCTGATCCAGGCATTCAATGTTGTTCTGAATCTCAGGCATCAGAAAGACCCCGGATTCGAAA
AAATCCAATTCAAACAGCTGTACAAACAGATACTCAGCGTCCGTACATCCAAATCCTATATCCCGAAAC
AGTTCGATAATTCGAAGGAGATGGTGGACTGCATCTGCGACTATGTGTCCAAGATCGAAAAATCCGAAA
CGGTCGAGAGAGCATTGAAGCTGGTAAGGAACATATCTTCTTTTGATTTGCGCGGAATATTCGTAAACA
AGAAGAATCTCCGCATTCTTTCCAACAAACTGATTGGTGATTGGGACGCGATCGAAACCGCGCTGATGC
ACTCCTCCTCTTCGGAAAATGATAAGAAATCCGTCTACGACAGCGCCGAGGCATTTACGCTGGATGATA
TCTTTTCGTCCGTTAAAAAATTCTCAGATGCATCTGCAGAGGATATCGGAAACCGGGCGGAGGACATAT
GCAGAGTCATATCTGAGACCGCTCCGTTCATAAACGATCTGAGGGCTGTCGATTTGGACAGTTTGAATG
ACGACGGTTACGAGGCGGCGGTTTCCAAGATAAGGGAATCTCTGGAACCATATATGGATCTGTTTCATG
AACTGGAGATATTCTCCGTAGGCGATGAATTCCCGAAATGTGCAGCTTTCTACAGTGAACTTGAAGAAG
TCTCCGAACAGCTAATCGAGATTATACCGTTATTCAACAAGGCCCGTTCGTTCTGTACGCGCAAGAGATA
CAGTACGGACAAGATAAAGGTCAATTTGAAATTCCCGACACTCGCCGACGGATGGGATCTCAACAAAGA
ACGCGACAACAAAGCCGCAATACTCAGGAAAGACGGAAAGTACTACCTGGCCATACTGGATATGAAGA
AAGATCTTTCTTCGATCAGAACTTCGGATGAAGACGAATCCAGTTTTGAGAAAATGGAGTACAAGCTTC
TTCCGAGTCCGGTAAAGATGCTGCCAAAGATCTTCGTAAAATCGAAGGCGGCCAAGGAGAAGTACGGTC
TGACCGACCGTATGCTGGAGTGCTACGATAAAGGGATGCACAAGAGCGGCAGTGCATTCGATCTCGGAT
TTTGTCACGAATTGATCGATTACTACAAGAGGTGCATCGCAGAATATCCCGGCTGGGACGTCTTCGATTT
CAAGTTCAGGGAAACATCGGATTATGGCAGCATGAAGGAGTTCAATGAGGATGTTGCAGGGGCCGGAT
ACTATATGTCCCTCAGAAAGATCCCTTGTTCGGAGGTCTACAGGCTTCTTGATGAGAAATCGATATATCT
TTTCCAGATCTACAACAAAGATTATTCGGAAAACGCTCATGGGAATAAGAACATGCATACCATGTATTG
GGAAGGGCTCTTTTCCCCCCAGAATCTGGAATCCCCTGTGTTTAAACTCAGCGGCGGTGCGGAGCTTTTC
TTCCGTAAATCCTCCATACCCAATGACGCCAAAACGGTCCATCCGAAGGGAAGCGTCCTGGTTCCGCGC
AATGATGTAAACGGCCGCAGGATACCTGACAGCATATATCGGGAGCTCACCAGATATTTCAACCGCGGA
GATTGCCGCATAAGCGACGAGGCAAAGAGTTATCTGGACAAGGTGAAAACCAAGAAAGCTGACCACGA
TATCGTGAAAGACAGGAGGTTCACGGTGGACAAGATGATGTTCCACGTCCCTATCGCCATGAATTTCAA
AGCGATTTCGAAGCCGAATCTCAATAAAAAGGTGATTGACGGCATAATCGACGACCAAGATCTGAAGAT
CATCGGCATAGACCGCGGAGAGCGCAACCTCATCTACGTAACCATGGTGGATCGCAAAGGGAACATCCT
CTATCAGGATAGCCTCAATATTCTGAACGGATACGATTACCGTAAGGCCCTCGACGTCCGCGAATATGA
CAATAAAGAGGCTCGGAGGAACTGGACGAAGGTCGAAGGCATCCGTAAGATGAAAGAGGGGTATCTGT
CGCTTGCAGTCAGCAAATTGGCAGATATGATCATAGAGAACAATGCGATTATCGTCATGGAGGATCTCA
ATCACGGATTCAAGGCAGGGCGTTCGAAGATAGAGAAACAGGTCTATCAGAAGTTCGAATCCATGCTCA
TAAACAAACTCGGTTACATGGTCCTCAAGGATAAGTCTATCGATCAGAGCGGCGGAGCTCTCCACGGAT
ACCAGCTTGCCAACCATGTGACAACATTGGCATCTGTAGGTAAACAATGTGGAGTGATATTCTACATCCC
TGCTGCATTTACATCCAAGATAGATCCGACAACAGGATTTGCAGATCTGTTCGCCCTCAGCAATGTTAAA
AACGTGGCATCTATGAGAGAATTTTTCTCCAAGATGAAGTCTGTAATCTATGATAAGGCGGAGGGAAAA
TTCGCATTTACCTTCGACTATCTTGATTATAATGTGAAATCCGAGTGCGGAAGGACCCTTTGGACCGTGT
ATACGGTCGGAGAGAGATTCACATACAGCAGGGTCAATAGAGAATATGTCAGAAAAGTTCCGACAGAC
ATAATCTACGACGCATTGCAAAAGGCAGGAATATCTGTTGAAGGGGATCTCAGGGACAGGATTGCTGAA
TCGGATGGCGACACTCTGAAGAGCATATTCTATGCATTCAAGTATGCATTGGATATGAGAGTAGAGAAC
CGCGAAGAGGATTACATACAGTCTCCTGTCAAAAATGCCTCCGGAGAATTCTTCTGTTCCAAGAACGCA
GGCAAATCGCTCCCTCAGGATTCCGATGCGAACGGTGCATACAATATCGCACTCAAGGGGATCCTGCAG
CTACGTATGCTTTCCGAGCAGTATGATCCGAATGCAGAGAGCATACGGTTGCCACTGATAACCAACAAG
GCCTGGCTGACCTTTATGCAGTCCGGTATGAAGACATGGAAGAACTGA
SEQ ID NO: 26
atgGATAGTTTGAAAGATTTCACCAATCTGTACCCTGTCAGTAAGACATTGAGATTTGAATTAAAGCCCGT
TGGAAAGACTTTAGAAAATATCGAGAAAGCAGGTATTTTGAAAGAGGATGAGCATCGTGCAGAAAGTT
ATCGGAGGGTGAAGAAAATAATTGATACTTATCATAAGGTATTTATCGATTCTTCTCTTGAAAATATGGC
TAAAATGGGTATTGAGAATGAAATAAAAGCAATGCTCCAAAGTTTCTGCGAATTGTATAAAAAAGATCA
TCGCACTGAGGGTGAAGACAAGGCATTAGATAAAATTCGAGCAGTACTTCGTGGCCTGATTGTTGGGGC
TTTCACTGGTGTTTGCGGAAGACGGGAAAATACAGTCCAAAACGAGAAGTACGAGAGTTTGTTCAAAGA
AAAGTTGATAAAAGAAATTTTACCTGATTTTGTGCTCTCTACTGAGGCTGAAAGCTTGCCTTTCTCTGTTG
AAGAAGCTACGAGGTCACTGAAGGAGTTTGATAGCTTTACATCCTACTTTGCTGGTTTTTACGAGAATAG
AAAGAATATATACTCGACGAAACCTCAATCCACTGCCATTGCTTATCGTCTTATTCATGAGAACTTGCCG
AAGTTCATTGATAATATTCTTGTTTTTCAGAAGATCAAAGAGCCTATAGCCAAAGAGCTGGAACATATTC
GTGCGGACTTTTCTGCCGGGGGGTACATAAAAAAGGATGAGAGATTGGAGGATATTTTTTCGTTGAACT
ATTATATCCACGTGTTATCTCAGGCTGGGATCGAAAAATATAACGCATTGATTGGGAAGATTGTGACAG
AAGGAGATGGAGAGATGAAAGGGCTCAATGAACACATCAACCTTTACAACCAACAAAGAGGCAGAGAG
GATCGGCTCCCTCTTTTTAGGCCTCTTTATAAACAGATATTGAGTGACAGAGAGCAATTATCATACTTGC
CTGAGAGTTTTGAAAAAGATGAGGAGCTCCTCAGGGCTCTAAAAGAGTTCTATGATCATATCGCAGAAG
ACATTCTCGGACGTACTCAACAGTTGATGACTTCTATTTCAGAATATGATTTATCTCGGATATACGTAAG
GAACGATAGCCAATTGACTGATATATCAAAAAAAATGTTGGGAGATTGGAATGCTATCTACATGGCTAG
AGAACGAGCATATGACCACGAGCAGGCTCCCAAAAGAATCACGGCGAAATACGAGAGGGACAGGATTA
AAGCTCTTAAAGGAGAAGAGAGTATAAGTCTGGCAAATCTTAATAGTTGTATTGCCTTTCTGGACAATGT
TAGAGATTGCCGTGTAGATACTTATCTTTCCACACTGGGCCAGAAGGAAGGACCACATGGTCTATCTAAT
CTCGTTGAGAACGTTTTTGCCTCATACCATGAAGCAGAGCAATTGTTGAGCTTTCCATACCCCGAAGAGA
ATAATCTGATTCAGGACAAGGACAATGTGGTGTTAATTAAGAATCTTCTCGACAATATCAGTGATCTGCA
GAGGTTCTTGAAACCTCTTTGGGGTATGGGAGACGAACCCGATAAAGATGAAAGATTTTATGGAGAGTA
TAATTATATCCGAGGAGCTCTAGATCAGGTGATCCCTCTGTACAATAAGGTAAGGAACTACCTCACTCG
GAAGCCTTATTCGACCAGAAAAGTAAAACTCAATTTTGGGAATTCTCAATTGCTTAGTGGTTGGGATAG
AAATAAGGAAAAGGATAATAGCTGTGTGATTTTGCGTAAGGGGCAGAACTTCTATTTGGCTATTATGAA
CAATAGGCACAAAAGAAGTTTCGAAAACAAGGTGTTGCCCGAGTATAAGGAGGGAGAACCTTACTTCG
AAAAGATGGATTATAAATTTTTGCCTGATCCTAATAAAATGCTTCCTAAGGTTTTTCTTTCGAAAAAAGG
AATAGAGATATACAAACCAAGTCCGAAGCTTTTAGAACAATATGGACATGGAACTCACAAAAAGGGAG
ATACCTTTAGTATGGATGATTTGCACGAACTGATCGATTTCTTCAAACACTCAATCGAGGCTCATGAAGA
TTGGAAGCAATTCGGATTCAAATTTTCTGATACGGCTACTTATGAGAATGTATCTAGTTTCTATAGAGAA
GTTGAGGATCAGGGGTATAAGCTCTCTTTCCGAAAAGTTTCGGAATCTTATGTCTATTCATTAATAGATC
AAGGCAAGTTGTATTTATTTCAGATATACAACAAGGACTTTTCTCCCTGCAGCAAAGGGACACCTAATCT
GCATACCTTGTATTGGAGAATGCTTTTTGACGAGCGCAATTTGGCAGATGTCATATACAAACTGGATGGG
AAGGCTGAAATCTTTTTCCGAGAGAAGAGTTTGAAAAATGATCATCCCACGCATCCGGCTGGTAAGCCT
ATCAAAAAGAAAAGTCGACAAAAAAAAGGAGAGGAGAGTCTGTTTGAGTATGATTTAGTCAAGGATAG
GCACTATACGATGGATAAGTTCCAGTTTCATGTGCCTATTACTATGAATTTTAAATGTTCTGCAGGAAGC
AAAGTCAATGATATGGTTAATGCTCATATTCGAGAGGCAAAGGATATGCATGTCATTGGAATTGATCGT
GGAGAACGCAATCTGCTGTATATATGCGTGATAGATAGTCGAGGGACGATTTTGGATCAAATTTCTCTG
AATACGATTAACGATATAGACTATCATGATTTATTGGAGAGTCGAGACAAAGACCGTCAGCAGGAGCGC
CGAAACTGGCAAACTATCGAAGGGATCAAGGAGCTAAAACAAGGCTACCTTAGTCAGGCGGTTCATCG
GATAGCCGAACTGATGGTGGCTTATAAGGCTGTAGTTGCTTTGGAGGATTTGAATATGGGGTTCAAACG
TGGGCGGCAGAAAGTAGAAAGTTCTGTTTATCAGCAGTTTGAGAAACAGCTGATAGATAAGCTCAACTA
TCTTGTGGACAAGAAGAAAAGGCCTGAAGATATTGGAGGATTGTTGAGAGCCTATCAATTTACGGCCCC
ATTTAAGAGTTTTAAGGAAATGGGAAAGCAAAACGGCTTCTTGTTTTATATCCCGGCTTGGAACACGAG
CAACATAGATCCGACTACTGGATTTGTTAATTTATTTCATGCCCAGTATGAAAATGTAGATAAAGCGAAG
AGCTTCTTTCAAAAGTTTGATTCAATTAGTTACAACCCGAAGAAAGACTGGTTTGAGTTTGCATTCGATT
ATAAAAACTTTACTAAAAAGGCTGAAGGAAGTCGTTCTATGTGGATATTATGCACACATGGTTCCCGAA
TAAAGAATTTTAGAAATTCCCAGAAGAATGGTCAATGGGATTCCGAAGAATTCGCCTTGACGGAGGCTT
TTAAGTCTCTTTTTGTGCGATATGAGATAGATTATACCGCTGATTTGAAAACAGCTATTGTGGACGAAAA
GCAAAAAGACTTCTTCGTGGATCTTCTGAAGCTATTCAAATTGACAGTACAGATGCGCAACAGCTGGAA
AGAGAAGGATTTGGATTATCTAATCTCTCCTGTAGCAGGGGCTGATGGCCGTTTCTTCGATACAAGAGA
GGGAAATAAAAGTCTGCCTAAGGATGCAGATGCCAATGGAGCTTATAATATTGCCCTAAAAGGACTTTG
GGCTCTACGCCAGATTCGGCAAACTTCAGAAGGCGGTAAACTCAAATTGGCGATTTCCAATAAGGAATG
GCTACAGTTTGTGCAAGAGAGATCTTACGAGAAAGACtga
SEQ ID NO: 27
atgaataatggaacaaataactttcagaattttatcggaatttcttctttgcagaagactcttaggaatgctctcattccaaccgaaacaacac
agcaatttattgttaaaaacggaataattaaagaagatgagctaagaggagaaaatcgtcagatacttaaagatatcatggatgattattacag
aggtttcatttcagaaactttatcgtcaattgatgatattgactggacttctttatttgagaaaatggaaattcagttaaaaaatggagataac
aaagacactcttataaaagaacagactgaataccgtaaggcaattcataaaaaatttgcaaatgatgatagatttaaaaatatgttcagtgcaa
aattaatctcagatattcttcctgaatttgtcattcataacaataattattctgcatcagaaaaggaagaaaaaacacaggtaattaaattatt
ttccagatttgcaacgtcattcaaggactattttaaaaacagggctaattgtttttcggctgatgatatatcttcatcttcttgtcatagaata
gttaatgataatgcagagatattttttagtaatgcattggtgtataggagaattgtaaaaagtctttcaaatgatgatataaataaaatatccg
gagatatgaaggattcattaaaggaaatgtctctggaagaaatttattcttatgaaaaatatggggaatttattacacaggaaggtatatcttt
ttataatgatatatgtggtaaagtaaattcatttatgaatttatattgccagaaaaataaagaaaacaaaaatctctataagctgcaaaagctt
cataaacagatactgtgcatagcagatacttcttatgaggtgccgtataaatttgaatcagatgaagaggtttatcaatcagtgaatggatttt
tggacaatattagttcgaaacatatcgttgaaagattgcgtaagattggagacaactataacggctacaatcttgataagatttatattgttag
taaattctatgaatcagtttcacaaaagacatatagagattgggaaacaataaatactgcattagaaattcattacaacaatatattacccgga
aatggtaaatctaaagctgacaaggtaaaaaaagcggtaaagaatgatctgcaaaaaagcattactgaaatcaatgagcttgttagcaattata
aattatgttcggatgataatattaaagctgagacatatatacatgaaatatcacatattttgaataattttgaagcacaggagcttaagtataa
tcctgaaattcatctggtggaaagtgaattgaaagcatctgaattaaaaaatgttctcgatgtaataatgaatgcttttcattggtgttcggtt
ttcatgacagaggagctggtagataaagataataatttttatgccgagttagaagagatatatgacgaaatatatccggtaatttcattgtata
atcttgtgcgtaattatgtaacgcagaagccatatagtacaaaaaaaattaaattgaattttggtattcctacactagcggatggatggagtaa
aagtaaagaatatagtaataatgcaattattctcatgcgtgataatttgtactatttaggaatatttaatgcaaaaaataagcctgacaaaaag
ataattgaaggtaatacatcagaaaataaaggggattataagaagatgatttataatcttctgccaggaccaaataaaatgatccccaaggtat
tcctctcttcaaaaaccggagtggaaacatataagccgtctgcctatatattggagggctataaacaaaacaagcatattaaatcctctaagga
ttttgatataacattttgtcacgatttgattgattattttaagaactgtatagcaatacatcctgaatggaagaattttggctttgatttttct
gacacctccacatatgaagatatcagcggattttacagagaagtcgaattacaaggttataaaatcgactggacatatatcagcgaaaaggata
ttgatttgttgcaggaaaaaggacagttatatttattccaaatatataacaaagatttttccaagaaaagtaccggaaatgataatcttcatac
tatgtatttgaagaatttgtttagtgaagagaatttaaaggatattgtactgaaattaaacggtgaggcggaaatcttctttagaaaatcaagc
ataaagaatccaataattcataaaaaaggctctattcttgttaatagaacatatgaagcagaggaaaaagatcaatttggaaatatccagatag
tcagaaaaaacataccggaaaatatatatcaggagctttataaatatttcaatgataaaagtgataaagaactttcggatgaagcagctaagct
taagaatgtagtaggtcatcatgaggctgctacaaacatagtaaaagattatagatatacatatgataaatattttcttcatatgcctattaca
atcaattttaaagccaataagacaggctttattaatgacagaatattacaatatattgctaaagaaaaggatttgcatgtaataggcattgatc
gtggtgaaagaaacctgatatatgtttcagtaattgatacttgtggaaatattgttgaacaaaaatcgtttaacattgttaatggatatgatta
tcagattaagctcaagcagcaggagggggcgcgacaaatcgcacgaaaagaatggaaagaaatcggcaaaataaaagaaattaaagaaggctat
ttatctcttgtaattcatgaaatttcaaagatggttattaaatataatgccataattgcaatggaggatttaagctacggatttaaaaaaggtc
gtttcaaggttgagcgacaggtttaccagaagtttgagacaatgcttatcaacaaactcaactatctggtatttaaagatatatccataacgga
aaacggtggtcttctaaagggataccagcttacatatattccagataaactgaaaaatgtgggtcatcaatgtggctgtatattttatgtacct
gctgcctatacatcaaaaatagatcctacaaccggatttgtaaatatattcaaatttaaagatttaacagttgatgcgaagagagaatttataa
aaaaatttgacagtatcagatatgattcagaaaaaaatctgttttgttttacattcgattataataactttattacgcaaaatactgttatgtc
aaagtcaagctggagtgtatatacgtacggagttaggataaaaagaagatttgtcaatggcaggttctcaaatgaatcggatacaattgatata
acaaaagatatggaaaaaacactcgaaatgacagatataaattggagagatggtcatgatctgaggcaggatattattgattatgaaatcgtac
aacacatatttgagatttttagattgactgtacaaatgagaaacagtttaagtgaattagaagacagggattatgaccgtttgatttctccggt
gctcaatgaaaataatatattttatgattcagctaaagcaggagatgcgttacctaaagacgcagatgctaatggtgcatattgtatagctcta
aaaggcttgtatgaaatcaaacaaattacagagaattggaaagaagacggtaagttttcaagagataaacttaaaatttccaataaggactggt
ttgactttattcaaaataaaaggtatttataa
SEQ ID NO: 28
atgacaaacaaatttacaaaccagtactcgctttccaaaacacttcgatttgagttgattccacaaggaaaaacattggaatttattcaagaaa
aaggattgctctctcaagataaacaacgagcggagagttatcaagaaatgaaaaaaactattgataaatttcataaatactttatcgatttagc
tttaagcaatgctaaactaactcatttagaaacttacttggaattatacaataaaagtgctgaaacaaaaaaagaacaaaaatttaaagacgat
ttaaagaaagtacaagacaatttacgaaaagaaatcgttaaatctttttcagatggtgatgcaaaatcaatttttgcaattttggataaaaaag
aactgattaccgtagaacttgaaaaatggtttgaaaacaacgaacaaaaagacatttattttgacgaaaaattcaaaacgtttactacttattt
tactggttttcatcaaaacagaaaaaacatgtattcggttgaacccaattctacagcaattgcttatcgattgattcatgaaaatttacctaaa
tttttagaaaatgctaaagcatttgaaaaaataaaacaagtagaaagtttgcaagttaattttagagaattaatgggggaatttggagatgaag
ggctaattttcgtaaatgaattagaagaaatgtttcaaatcaattattataatgatgtgctttcacaaaatggaattacaatttataatagtat
aatttcaggatttaccaaaaatgatataaaatataaaggtctaaatgaatacataaataattacaatcaaaccaaagacaaaaaagaccgtttg
ccaaaattaaaacaattgtataaacagattttgagtgataggatttcactttcgtttttgcccgatgcttttacggatgggaaacaagttttga
aagccatatttgacttttataaaatcaacttactttcttataccattgaaggacaggaagaaagccaaaatcttttactattaattcgtcagac
aattgaaaacctttctagttttgatacccaaaaaatttatctaaaaaatgatacccatttaaccactatttcacaacaagtatttggcgatttt
tcggtgttttcaactgctttaaattattggtatgaaactaaagtaaatccaaaatttgaaacggaatatagcaaagccaacgaaaaaaaacgag
aaattttagataaagccaaagcggtatttacaaaacaagattatttttcaattgcttttttacaagaagtactttcggaatacattcttacctt
agatcacacttctgatattgtaaaaaagcattcctccaactgtattgcggattattttaaaaatcattttgtagccaaaaaagaaaatgaaacc
gacaaaacctttgattttattgctaatattactgcaaaataccaatgtattcaaggtattttagaaaatgcagaccaatacgaagacgaactca
aacaagaccaaaaattaattgataatttgaaattctttttagatgctattttagaattgttgcattttattaaacctttgcatttaaaatcaga
aagcattaccgaaaaagacactgctttttatgatgtgtttgaaaattattacgaagcattgagtttgttgaccccattatataatatggtgcga
aactatgtaacgcaaaagccgtacagcaccgaaaaaataaaattaaattttgaaaatgcacaattattgaatggttgggatgccaataaagaag
gtgattacctaactaccattttgaaaaaagacggtaattattttttagccataatggataaaaagcataacaaagcgtttcaaaagtttccaga
aggaaaagaaaattatgaaaaaatggtgtataaactattgcctggagtaaataagatgttgccaaaagtatttttttccaataaaaatattgct
tacttcaacccatcaaaagagttattagaaaactataaaaaagagacgcacaaaaaaggagacacattcaatttagaacattgtcatacgttga
tcgattttttcaaggactctttaaacaaacatgaagactggaaatactttgattttcaattttctgaaacaaaatcgtatcaagatttgagtgg
tttttatagagaagtagaacatcaaggctacaaaatcaattttaaaaatatcgattcagaatatattgatggtttggtgaacgaaggtaaattg
tttctatttcaaatttacagcaaagatttttcgcctttttccaaagggaaaccgaacatgcacactttgtattggaaagccttatttgaagaac
aaaatttgcaaaatgtaatctataaattgaatggacaagccgaaatattttttagaaaagcctctataaaacctaaaaatataatattgcacaa
aaagaaaattaaaattgccaaaaagcattttattgataaaaaaacaaaaacatctgaaattgttcctgttcaaacaataaaaaacctcaatatg
tactaccaaggaaaaataagtgaaaaagaattaacacaagatgatttaaggtatattgataattttagcattttcaatgaaaaaaataaaacaa
ttgatattataaaagacaaacgatttacggttgataaatttcagtttcatgtgccgattaccatgaactttaaagcaacgggcggaagttatat
caatcaaaccgtattagaatatttgcaaaacaatcccgaagttaagattattggattggatagaggcgaacgccatttggtatatctgacactg
atagaccagcaaggaaacatcttgaaacaagaaagtttgaatacaatcaccgattctaaaatctcgacaccttatcataagttgttggataaca
aggaaaacgagcgtgacttggctcgaaaaaattggggaacggtggaaaacatcaaagaactcaaagaaggctacatcagtcaagtggtgcataa
aattgctacgttgatgctggaagaaaatgccattgtggtaatggaagatttgaattttggatttaaacgtggacgttttaaagtggaaaaacaa
atttatcaaaagctggaaaaaatgttgattgacaaattgaattatttggttttaaaagacaaacaacctcaggaattaggcggattgtacaacg
cattacaactcaccaataaatttgaaagtttccaaaaaatgggtaaacaatcgggctttttgttttatgtacccgcttggaacacctccaaaat
agacccaaccacagggtttgtcaattatttttataccaaatatgaaaatgttgacaaagccaaagccttttttgaaaaatttgaggcgattcgt
ttcaatgcagaaaagaagtattttgaatttgaagtaaaaaaatatagcgattttaacccaaaagccgaaggcactcaacaagcctggaccattt
gcacgtatggcgaacgaatagaaaccaaacgacaaaaagaccaaaacaacaaatttgtaagcactccaattaatctaaccgaaaagatagaaga
ctttttgggtaaaaaccaaattgtttatggtgatggtaattgcatcaaatctcaaattgctagcaaagacgacaaggctttttttgaaacctta
ttgtattggttcaaaatgactttacaaatgcgaaacagcgaaacaagaacagatatagattatctaatttcgcccgtgatgaatgacaacggaa
cattttacaacagccgagattatgaaaaattagaaaatccaactttgcccaaagatgccgatgccaacggagcgtatcatattgccaaaaaagg
attgatgcttttgaataaaatagaccaagccgacttgacaaaaaaagtggatttatctattagtaacagagattggttgcaatttgtacaaaaa
aataaataa
SEQ ID NO: 29
atggaacaggagtactatttaggactggatatgggaaccggatctgtaggatgggctgttacagattcggaatatcatgtcttgcgtaaacatg
gaaaagcactatggggagtccgattatttgaaagtgcatcgacagcagaagaacgaagaatgttccgaacatcaagaagaagactagatcgaag
aaactggagaattgaaattttacaggaaatttttgcagaggaaataagtaagaaagatccaggatttttcttgcgaatgaaagaaagcaaatat
tatccagaagataagcgagatatcaatggaaattgtccggaactgccatatgcattatttgttgatgacgattttacagataaagattatcata
aaaaatttccgacaatttatcatctcaggaaaatgttgatgaatacagaggagacaccggatatccggttggtgtatctggcaattcatcatat
gatgaagcataggggccatttcttgttatctggtgacattaatgagattaaggagttcggaacgacattttcaaaattgttggagaatatcaaa
aatgaggaattggattggaatcttgaactgggaaaagaagaatatgctgttgtagaaagtattttaaaagataacatgttaaaccgatccacaa
agaaaaccagattaataaaagcattaaaagcaaaatcaatatgtgaaaaggctgtactgaatttattggctggtggaacggtgaaattgagtga
tatatttggtcttgaagaattaaatgagacagaaagaccgaagatttcctttgctgataatggatacgatgattatatcggagaagttgaaaat
gagctgggagaacaattctatattatagagacggcaaaagcagtgtatgactgggcggtattagttgaaatattgggaaaatatacgtcaattt
cagaagcgaaagtagcaacgtatgaaaaacataaatcggatttacaatttttgaaaaagatagttcggaaatatctgacaaaggaggaatataa
agatatttttgtaagtacgagtgacaaattgaaaaattactctgcttatataggaatgacgaaaataaatggaaaaaaggttgatttgcagagc
aaacggtgcagtaaagaagaattctatgattttattaagaaaaacgtacttaaaaagctagaaggacaacctgaatatgaatatttgaaagaag
agctagaaagagaaacatttctaccaaaacaggtgaacagggataatggtgtaataccgtatcagattcatttgtacgagttgaaaaagatatt
aggaaatttacgggataaaatagacctcattaaagagaacgaagataaactggttcaattatttgaattcagaattccgtattatgttggtccg
ctgaataagatagatgacggaaaagagggaaaatttacatgggctgtacggaaaagtaatgaaaagatatatccatggaattttgaaaatgtag
ttgatatagaagcaagtgcagaaaaatttatccggagaatgacaaataagtgtacatatctgatgggcgaagatgtattgccgaaggattcatt
gctttacagtaaatatatggttttaaatgaattaaataatgtaaagttggatggcgaaaaattatctgtagaattgaaacaacggttgtataca
gatgtattttgtaagtatcggaaagtaactgtaaagaagataaaaaattacttgaaatgtgaaggtatcatatccggcaatgtcgaaataactg
gaattgatggtgattttaaggcatcgttaacggcatatcatgattttaaagaaatcttgacaggaacagaattggctaaaaaggacaaagaaaa
tattattaccaatatagtattgtttggagatgataaaaagctgctgaaaaagagactgaatcgattatatcctcagattacgccgaatcagttg
aagaaaatatgtgcgctatcctatacaggctggggaagattttctaaaaagttcttagaagaaataacagctccagatccggaaacgggagag
gtatggaatatcattacggcattgtgggaatcgaataataatctgatgcaattattaagtaatgaatatcggtttatggaagaagtcgaaacat
acaatatgggaaaacagactaaaacattgtcgtacgaaacagtagagaatatgtatgtttctccatctgtgaaaagacagatatggcagacgct
gaaaatcgtgaaagaattagaaaaagtaatgaaagaatctccgaaacgtgtatttattgagatggcgagagaaaagcaagaaagtaagagaacc
gaatcgcgtaaaaaacaactaatagatttgtataaggcttgtaaaaatgaagaaaaagattgggtaaaagaactgggagatcaggaagaacaga
aattacgaagcgataagttgtacctatattatacgcaaaagggtcgttgtatgtattctggcgaggtaatagaactgaaagacttatgggataa
tacaaaatatgatattgatcatatatatccacaatctaaaacgatggatgacagtcttaataatcgcgtattggtaaaaaagaaatataatgca
acaaaatcagataagtatccattaaatgaaaatatacgacatgagagaaaaggcttttggaagtcactgttagatggagggtttataagtaaag
aaaaatatgaacgcttaataagaaatacagaattgagtccggaagaattagcaggatttattgaaaggcagattgttgaaacgaggcagagtac
aaaagctgtagcggaaatattaaagcaagtgtttccggaaagtgaaattgtatatgtcaaagcaggtacggtttcaagattcagaaaagatttt
gaattactgaaagttcgagaagtgaatgatttgcatcacgcaaaggatgcgtatttaaatattgtagttggtaatagttattatgtgaaattta
ctaagaatgcatcatggtttataaaagaaaatccgggacgtacttacaacttaaaaaagatgtttacatcaggttggaatattgaacgaaatgg
agaagttgcatgggaagtcgggaaaaaaggaacaattgtaacggtaaaacaaataatgaataaaaataatatattggtgacaagacaggttcat
gaagcgaaaggtgggctgtttgatcagcagattatgaaaaaaggaaaaggtcagattgctataaaggaaactgatgaacgtcttgcatcaatag
aaaagtatggaggctataataaagctgccggggcatattttatgctggtagaatctaaagataaaaaaggaaaaacaattcgaacgatagaatt
tataccattatatttaaagaataaaatcgagtcggatgaatcaatagcattgaactttttagaaaaaggcagaggtttgaaagaaccaaagata
ctattgaaaaaaattaagattgatacattatttgatgtggacggattcaaaatgtggttgtctggaagaacaggggacagactactatttaaat
gtgcaaatcaattgattttggatgagaaaataattgtaacaatgaaaaaaattgtaaagtttattcaaaggagacaagaaaatagagaattaaa
attatctgataaagatggaattgataatgaagtacttatggaaatatataacacttttgtggataagttagaaaacacagtgtatagaatacga
ttatccgaacaggcaaaaacgcttatagataaacaaaaagaatttgaaaggttatcactagaggataaaagtagtactttgtttgaaattttac
atatttttcagtgtcaaagtagtgcggccaatttaaaaatgataggcggacctggaaaagcaggaatattagttatgaataataatataagtaa
gtgtaacaaaatttctattataaatcagtctccaacaggaattttcgaaaatgagattgatttgttaaagat
SEQ ID NO: 30
ATGAAATCTTTCGATTCATTCACAAATCTTTATTCTCTTTCAAAAACCTTGAAATTTGAGATGAGACCTGT
CGGAAATACCCAAAAAATGCTCGACAATGCAGGAGTATTTGAAAAAGACAAACTAATTCAAAAAAAGT
ACGGAAAAACAAAGCCGTATTTCGACAGACTCCACAGAGAATTTATAGAAGAAGCGCTCACGGGGGTA
GAGCTAATAGGACTAGATGAGAACTTTAGGACACTTGTTGACTGGCAAAAAGATAAGAAAAATAATGTC
GCAATGAAAGCGTATGAAAATAGTTTGCAGCGGCTGAGAACGGAAATAGGTAAAATATTTAACCTAAA
GGCTGAGGATTGGGTAAAGAACAAATATCCAATATTAGGGCTGAAAAATAAAAATACCGATATTTTATT
CGAAGAGGCTGTATTCGGGATATTGAAAGCCCGATATGGAGAAGAAAAAGATACTTTTATAGAAGTAG
AGGAAATAGATAAAACCGGCAAATCAAAGATCAATCAAATATCAATTTTCGATAGTTGGAAAGGATTTA
CAGGATATTTCAAAAAATTTTTTGAAACCAGAAAGAATTTTTACAAAAACGACGGAACTTCTACAGCAA
TTGCTACAAGGATCATTGATCAAAATCTGAAAAGATTCATAGATAATCTGTCAATAGTTGAAAGTGTGA
GACAAAAGGTTGATCTCGCCGAGACAGAAAAATCTTTCAGCATATCTCTATCGCAATTCTTCTCAATAGA
CTTTTATAACAAGTGTCTCCTTCAAGATGGTATTGATTACTACAACAAGATAATCGGTGGAGAAACTCTC
AAAAATGGCGAAAAACTAATAGGTCTCAATGAACTAATAAATCAATATAGGCAGAATAATAAGGATCA
GAAAATCCCATTTTTCAAACTTCTTGATAAACAAATTTTGAGTGAAAAGATATTATTTTTGGATGAAATA
AAAAATGACACAGAACTGATCGAGGCGCTGAGTCAGTTCGCAAAAACAGCCGAAGAAAAAACAAAAAT
TGTCAAAAAGCTTTTTGCCGATTTTGTAGAAAATAATTCCAAATACGATCTTGCACAGATTTATATTTCC
CAAGAAGCATTCAATACTATATCAAACAAGTGGACAAGCGAAACTGAGACGTTCGCTAAATATCTATTC
GAAGCAATGAAGAGTGGAAAACTTGCAAAGTATGAGAAAAAAGATAATAGCTATAAATTTCCTGATTTT
ATTGCCCTTTCACAGATGAAGAGTGCTTTATTAAGTATCAGCCTTGAGGGACATTTTTGGAAAGAGAAAT
ACTACAAAATTTCAAAATTCCAAGAGAAGACCAATTGGGAGCAGTTTCTTGCAATTTTTCTATACGAGTT
TAACTCTCTTTTCAGCGACAAAATAAATACAAAAGATGGAGAAACAAAGCAAGTTGGATACTATCTATT
TGCCAAAGACCTGCATAATCTTATCTTAAGTGAGCAGATTGATATTCCAAAAGATTCAAAAGTCACAAT
AAAAGATTTTGCCGATTCTGTACTCACAATCTACCAAATGGCAAAATATTTTGCGGTAGAAAAAAAACG
AGCGTGGCTTGCCGAGTATGAACTAGATTCATTTTATACCCAGCCAGACACAGGCTATTTACAGTTTTAT
GATAACGCCTACGAGGATATTGTGCAGGTATACAACAAGCTTCGAAACTATCTGACCAAAAAGCCATAT
AGCGAGGAGAAATGGAAGTTGAATTTTGAAAATTCTACGCTGGCAAATGGATGGGATAAGAACAAAGA
ATCTGATAATTCAGCAGTTATTCTACAAAAAGGTGGAAAATATTATTTGGGACTGATTACTAAAGGACA
CAACAAAATTTTTGATGACCGTTTTCAAGAAAAATTTATTGTGGGAATTGAAGGTGGAAAATATGAAAA
AATAGTCTATAAATTTTTCCCCGACCAGGCAAAAATGTTTCCCAAAGTGTGCTTTTCTGCAAAAGGACTC
GAATTTTTTAGACCGTCTGAAGAAATTTTAAGAATTTATAACAATGCAGAGTTTAAAAAAGGAGAAACT
TATTCAATAGATAGTATGCAGAAGTTGATTGATTTTTATAAAGATTGCTTGACTAAATATGAAGGCTGGG
CATGTTATACCTTTCGGCATCTAAAACCCACAGAAGAATACCAAAACAATATTGGAGAGTTTTTTCGAG
ATGTTGCAGAGGACGGATACAGGATTGATTTTCAAGGCATTTCAGATCAATATATTCATGAAAAAAACG
AGAAAGGCGAACTTCACCTTTTTGAAATCCACAATAAAGATTGGAATTTGGATAAGGCACGAGACGGAA
AGTCAAAAACAACACAAAAAAACCTTCATACACTCTATTTCGAATCGCTCTTTTCAAACGATAATGTTGT
TCAAAACTTTCCAATAAAACTCAATGGTCAAGCTGAAATTTTTTATAGACCGAAAACGGAAAAAGACAA
ATTAGAATCAAAAAAAGATAAGAAAGGGAATAAAGTGATTGACCATAAACGCTATAGTGAGAATAAGA
TTTTTTTTCATGTTCCTCTCACACTAAACCGCACTAAAAATGACTCATATCGCTTTAATGCTCAAATCAAC
AACTTTCTCGCAAATAATAAAGATATCAACATCATCGGTGTAGATAGGGGAGAAAAGCATTTAGTCTAT
TATTCGGTGATTACACAAGCTAGTGACATCTTAGAAAGTGGCTCACTAAATGAGCTAAATGGCGTGAAT
TATGCTGAAAAACTGGGAAAAAAGGCAGAAAATCGAGAACAAGCACGCAGAGACTGGCAAGACGTAC
AAGGGATCAAAGACCTCAAGAAAGGATATATTTCACAGGTGGTGCGAAAGCTTGCTGATTTAGCAATTA
AACACAATGCCATTATCATTCTTGAAGATTTGAATATGAGATTTAAACAAGTTCGGGGCGGTATCGAAA
AATCCATTTATCAACAGTTAGAAAAAGCACTGATAGATAAATTAAGCTTTCTTGTAGACAAAGGTGAAA
AAAATCCCGAGCAAGCAGGACATCTTCTGAAAGCATATCAGCTTTCGGCCCCATTTGAGACATTTCAAA
AAATGGGCAAACAGACGGGTATAATCTTTTATACACAAGCTTCGTATACCTCAAAAAGTGACCCTGTAA
CAGGTTGGCGACCACACCTGTATCTCAAATATTTCAGTGCCAAAAAAGCAAAAGACGATATTGCAAAGT
TTACAAAAATAGAATTTGTAAACGATAGGTTTGAGCTTACCTATGATATAAAGGACTTTCAGCAAGCAA
AAGAATATCCAAATAAAACTGTTTGGAAAGTTTGCTCAAATGTAGAGAGATTCAGGTGGGACAAAAACC
TCAATCAAAACAAAGGCGGATATACTCACTACACAAATATAACTGAGAATATCCAAGAGCTTTTTACAA
AATATGGAATTGATATCACAAAAGATTTGCTCACACAGATTTCTACAATTGATGAAAAACAAAATACCT
CATTTTTTAGAGATTTTATTTTTTATTTCAACCTTATTTGCCAAATCAGAAATACCGATGATTCTGAGATT
GCTAAAAAGAATGGGAAAGATGATTTTATACTGTCACCTGTTGAGCCGTTTTTCGATAGCCGAAAAGAC
AATGGAAATAAACTTCCTGAGAATGGAGATGATAACGGCGCGTATAACATAGCAAGAAAAGGGATTGT
CATACTCAACAAAATCTCACAATATTCAGAGAAAAACGAAAATTGCGAGAAAATGAAATGGGGGGATT
TGTATGTATCAAACATTGACTGGGACAATTTTGTAACCCAAGCTAATGCACGGCATTAA
SEQ ID NO: 31
ATGATTATCTTATATATTAGTACCTCGAATATGAACATGGAAGGAGTATTTATGGAAAATTTTAAAAACT
TGTATCCAATAAACAAAACACTTCGATTTGAATTAAGACCCTATGGAAAAACATTGGAAAATTTTAAAA
AATCCGGACTTTTAGAAAAAGATGCCTTTAAGGCAAATAGTAGACGAAGTATGCAAGCTATAATCGATG
AAAAATTCAAAGAGACTATCGAAGAACGCTTAAAGTACACTGAATTCAGTGAATGTGATCTTGGAAACA
TGACATCAAAAGATAAAAAAATAACTGATAAAGCAGCTACAAATTTAAAAAAGCAAGTTATCTTATCTT
TTGACGATGAAATATTTAATAATTACCTAAAACCTGATAAAAATATTGACGCATTATTTAAAAATGATCC
TTCAAATCCTGTAATCTCTACATTTAAAGGTTTTACGACATATTTTGTGAATTTTTTTGAAATTCGAAAAC
ATATTTTCAAGGGAGAATCATCAGGCTCAATGGCATACCGAATTATAGATGAAAACCTGACAACATACT
TGAATAATATTGAAAAAATAAAAAAACTGCCAGAAGAATTAAAATCACAGCTAGAAGGCATTGATCAG
ATTGATAAACTTAATAATTATAATGAGTTCATTACACAGTCAGGTATAACACACTATAATGAAATCATCG
GCGGTATATCAAAATCAGAGAATGTCAAAATACAGGGAATTAATGAAGGAATTAATCTATACTGTCAGA
AGAACAAAGTTAAACTTCCTCGACTGACTCCGCTATACAAAATGATATTATCAGACAGAGTTTCCAACTC
TTTTGTATTAGACACTATTGAAAATGACACAGAATTAATTGAAATGATAAGTGATTTGATTAATAAGACT
GAGATTTCGCAAGATGTTATAATGTCAGATATTCAAAATATTTTCATAAAATACAAACAACTTGGTAATT
TGCCGGGTATCTCATATTCTTCAATAGTTAATGCTATTTGCTCGGATTATGACAACAATTTCGGAGATGG
GAAGCGAAAAAAATCTTACGAAAATGATCGCAAAAAGCATTTGGAGACTAATGTATACTCCATAAATTA
TATTTCTGAATTGCTTACAGATACCGATGTTTCATCAAATATCAAGATGAGATATAAAGAGCTTGAGCAA
AATTATCAGGTTTGCAAAGAAAATTTTAATGCCACAAACTGGATGAATATTAAAAATATAAAACAATCT
GAAAAAACAAACCTTATTAAAGATTTGTTAGATATACTTAAATCGATTCAACGTTTCTATGATTTGTTTG
ATATTGTTGACGAAGATAAAAATCCAAGTGCTGAATTTTATACCTGGTTATCAAAAAATGCTGAAAAGC
TTGACTTTGAATTCAATTCTGTATATAACAAGTCACGAAACTATCTCACCAGGAAACAATACTCTGATAA
AAAAATCAAGCTGAATTTTGATTCTCCAACATTGGCCAAAGGGTGGGATGCTAACAAAGAAATAGATAA
CTCCACGATTATAATGCGTAAATTTAATAATGACAGAGGCGATTATGATTACTTCCTTGGCATATGGAAT
AAATCCACACCTGCAAATGAAAAAATAATCCCACTGGAGGATAATGGATTATTCGAAAAAATGCAATAT
AAGCTGTATCCAGATCCTAGTAAGATGTTACCGAAACAATTTCTATCAAAAATATGGAAGGCAAAGCAT
CCTACGACACCTGAATTTGATAAAAAATATAAAGAGGGAAGACATAAAAAAGGTCCTGATTTCGAAAA
AGAATTCCTGCATGAATTGATTGATTGCTTCAAACATGGTCTTGTTAATCACGATGAAAAATATCAGGAT
GTTTTTGGCTTCAATCTCCGTAACACTGAAGATTATAATTCATATACAGAGTTTCTCGAAGATGTGGAAA
GATGCAATTACAATCTTTCATTTAACAAAATTGCTGATACTTCAAACCTTATTAATGATGGGAAATTGTA
TGTATTTCAGATATGGTCAAAAGACTTTTCTATTGATTCAAAAGGTACTAAAAACTTGAATACAATCTAT
TTTGAATCACTATTTTCAGAAGAAAACATGATAGAAAAAATGTTCAAGCTTTCTGGAGAGGCTGAGATA
TTCTATCGACCAGCATCGTTGAATTATTGTGAAGATATCATAAAAAAAGGTCATCACCATGCAGAATTA
AAAGATAAGTTTGACTATCCTATAATAAAAGATAAGCGATATTCACAAGATAAGTTTTTCTTTCATGTGC
CAATGGTTATAAATTATAAATCTGAGAAACTGAATTCCAAAAGCCTTAACAACCGAACAAATGAAAACC
TGGGACAGTTTACACATATTATAGGTATAGACAGGGGCGAGCGGCACTTGATTTATTTAACTGTTGTTGA
TGTTTCCACTGGTGAAATCGTTGAACAGAAACATCTGGACGAAATTATCAATACTGATACCAAGGGAGT
TGAACACAAAACCCATTATTTGAATAAATTGGAAGAAAAATCTAAAACAAGAGATAACGAGCGTAAAT
CATGGGAAGCTATTGAAACTATCAAAGAATTAAAAGAAGGCTATATTTCTCATGTAATTAATGAAATAC
AAAAGCTGCAAGAAAAATATAATGCCTTAATCGTAATGGAAAATCTTAACTATGGGTTCAAAAACTCAC
GAATCAAAGTTGAAAAACAGGTTTATCAAAAATTCGAGACAGCATTGATTAAAAAGTTCAATTATATTA
TTGATAAAAAAGATCCAGAAACCTATATACATGGTTACCAGCTTACAAATCCTATTACCACTCTGGATAA
GATTGGAAATCAATCTGGAATAGTGCTGTATATTCCTGCGTGGAATACTTCTAAGATAGATCCCGTCACA
GGATTTGTAAACCTTCTGTACGCAGATGATTTGAAGTATAAAAATCAGGAGCAGGCCAAATCATTCATT
CAGAAAATAGACAACATATATTTTGAAAATGGAGAGTTTAAATTTGATATTGATTTTTCCAAATGGAATA
ATCGCTACTCAATAAGTAAAACTAAATGGACGTTAACAAGTTATGGGACTCGCATCCAGACATTTAGAA
ATCCCCAGAAAAACAATAAGTGGGATTCTGCTGAATATGATTTGACAGAAGAGTTTAAATTAATTTTAA
ATATAGACGGAACGTTAAAGTCACAGGACGTAGAAACATACAAAAAATTCATGTCTTTATTTAAACTAA
TGCTACAGCTTCGAAACTCTGTTACAGGAACCGACATTGATTATATGATCTCTCCTGTCACTGATAAAAC
AGGAACACATTTCGATTCAAGAGAAAATATTAAAAATCTTCCTGCCGATGCAGATGCCAATGGTGCCTA
CAACATTGCGCGCAAAGGAATAATGGCTATTGAAAATATAATGAACGGTATAAGCGATCCACTAAAAAT
AAGCAACGAAGACTATTTAAAGTATATTCAGAATCAACAGGAATAA
SEQ ID NO: 32
ATGACCCAATTTGAAGGTTTTACCAATTTATACCAAGTTTCGAAGACCCTTCGTTTTGAACTGATTCCCC
AAGGAAAAACACTCAAACATATCCAGGAGCAAGGGTTCATTGAGGAGGATAAAGCTCGCAATGACCAT
TACAAAGAGTTAAAACCAATCATTGACCGCATCTATAAGACTTATGCTGATCAATGTCTCCAACTGGTAC
AGCTTGACTGGGAGAATCTATCTGCAGCCATAGACTCCTATCGTAAGGAAAAAACCGAAGAAACACGA
AATGCGCTGATTGAGGAGCAAGCAACATATAGAAATGCGATTCATGACTACTTTATAGGTCGGACGGAT
AATCTGACAGATGCCATAAATAAGCGCCATGCTGAAATCTATAAAGGACTTTTTAAAGCTGAACTTTTCA
ATGGAAAAGTTTTAAAGCAATTAGGGACCGTAACCACGACAGAACATGAAAATGCTCTACTCCGTTCGT
TTGACAAATTTACGACCTATTTTTCCGGCTTTTATGAAAACCGAAAAAATGTCTTTAGCGCTGAAGATAT
CAGCACGGCAATTCCCCATCGAATCGTCCAGGACAATTTCCCTAAATTTAAGGAAAACTGCCATATTTTT
ACAAGATTGATAACCGCAGTTCCTTCTTTGCGGGAGCATTTTGAAAATGTCAAAAAGGCCATTGGAATCT
TTGTTAGTACGTCTATTGAAGAAGTCTTTTCCTTTCCCTTTTATAATCAACTTCTAACCCAAACGCAAATT
GATCTTTATAATCAACTTCTCGGCGGCATATCTAGGGAAGCAGGCACAGAAAAAATCAAGGGACTTAAT
GAAGTTCTCAATCTGGCTATCCAAAAAAATGATGAAACAGCCCATATAATCGCGTCCCTGCCGCATCGTT
TTATTCCTCTTTTTAAACAAATTCTTTCCGATCGAAATACGTTATCCTTTATTTTGGAAGAATTCAAAAGC
GATGAGGAAGTCATCCAATCCTTCTGCAAATATAAAACCCTCTTGAGAAACGAAAATGTACTGGAGACT
GCAGAAGCCCTTTTCAATGAATTAAATTCCATTGATTTGACTCATATCTTTATTTCCCATAAAAAGTTAG
AAACCATCTCTTCAGCGCTTTGTGACCATTGGGATACCTTGCGCAATGCACTTTACGAAAGACGGATTTC
TGAACTCACTGGCAAAATAACAAAAAGTGCCAAAGAAAAAGTTCAAAGGTCATTAAAACATGAGGATA
TAAATCTCCAAGAAATTATTTCTGCTGCAGGAAAAGAACTATCAGAAGCATTCAAACAAAAAACAAGTG
AAATTCTTTCCCATGCCCATGCTGCACTTGACCAGCCTCTTCCCACAACATTAAAAAAACAGGAAGAAA
AAGAAATCCTCAAATCACAGCTCGATTCGCTTTTAGGCCTTTATCATCTTCTTGATTGGTTTGCTGTCGAT
GAAAGCAATGAAGTCGACCCAGAATTCTCAGCACGGCTGACAGGCATTAAACTAGAAATGGAACCAAG
CCTTTCGTTTTATAATAAAGCAAGAAATTATGCGACAAAAAAGCCCTATTCGGTGGAAAAATTTAAATT
GAATTTTCAAATGCCAACCCTTGCCTCTGGTTGGGATGTCAATAAAGAAAAAAATAATGGAGCTATTTTA
TTCGTAAAAAATGGTCTCTATTACCTTGGTATCATGCCTAAACAGAAGGGGCGCTATAAAGCCCTGTCTT
TTGAGCCGACAGAAAAAACATCAGAAGGATTCGATAAGATGTACTATGACTACTTCCCAGATGCCGCAA
AAATGATTCCTAAGTGTTCCACTCAGCTAAAGGCTGTAACCGCTCATTTTCAAACTCATACCACCCCCAT
TCTTCTCTCAAATAATTTCATTGAACCTCTTGAAATCACAAAAGAAATTTATGACCTGAACAATCCTGAA
AAGGAGCCTAAAAAGTTTCAAACGGCTTATGCAAAGAAGACAGGCGATCAAAAAGGCTATAGAGAAGC
GCTTTGCAAATGGATTGACTTTACGCGGGATTTTCTCTCTAAATATACGAAAACAACTTCAATCGATTTA
TCTTCACTCCGCCCTTCTTCGCAATATAAAGATTTAGGGGAATATTACGCCGAACTGAATCCGCTTCTCT
ATCATATCTCCTTCCAACGAATTGCTGAAAAGGAAATCATGGATGCTGTAGAAACGGGAAAATTGTATC
TGTTCCAAATCTACAATAAGGATTTTGCGAAGGGCCATCACGGGAAACCAAATCTCCACACCCTGTATT
GGACAGGTCTCTTCAGTCCTGAAAACCTTGCGAAAACCAGCATCAAACTTAATGGTCAAGCAGAATTGT
TCTATCGACCTAAAAGCCGCATGAAGCGGATGGCCCATCGTCTTGGGGAAAAAATGCTGAACAAAAAAC
TAAAGGACCAGAAGACACCGATTCCAGATACCCTCTACCAAGAACTGTACGATTATGTCAACCACCGGC
TAAGCCATGATCTTTCCGATGAAGCAAGGGCCCTGCTTCCAAATGTTATCACCAAAGAAGTCTCCCATGA
AATTATAAAGGATCGGCGGTTTACTTCCGATAAATTTTTCTTCCATGTTCCCATTACACTGAATTATCAAG
CAGCCAATAGTCCCAGTAAATTCAACCAGCGTGTCAATGCCTACCTTAAGGAGCATCCGGAAACGCCCA
TCATTGGTATCGATCGTGGAGAACGCAATCTAATCTATATTACCGTCATTGACAGTACTGGGAAAATTTT
GGAGCAGCGTTCCCTGAATACCATCCAGCAATTTGACTACCAAAAAAAATTGGACAACAGGGAAAAAG
AGCGTGTTGCCGCCCGTCAAGCCTGGTCCGTCGTCGGAACGATCAAAGACCTTAAACAAGGCTACTTGT
CACAGGTCATCCATGAAATTGTAGACCTGATGATTCATTACCAAGCTGTTGTCGTCCTTGAAAACCTCAA
CTTCGGATTTAAATCAAAACGGACAGGCATTGCCGAAAAAGCAGTCTACCAACAATTTGAAAAGATGCT
AATAGATAAACTCAACTGTTTGGTTCTCAAAGATTATCCTGCTGAGAAAGTGGGAGGCGTCTTAAACCC
GTATCAACTTACAGATCAGTTCACGAGCTTTGCAAAAATGGGCACGCAAAGCGGCTTCCTTTTCTATGTA
CCGGCCCCTTATACCTCAAAGATTGATCCCCTGACTGGTTTTGTCGATCCCTTTGTATGGAAGACCATTA
AAAATCATGAAAGTCGGAAGCATTTCCTAGAAGGATTTGATTTCCTGCATTATGATGTCAAAACAGGTG
ATTTTATCCTCCATTTTAAAATGAATCGGAATCTCTCTTTCCAGAGAGGGCTTCCTGGCTTCATGCCAGCT
TGGGATATTGTTTTCGAAAAGAATGAAACCCAATTTGATGCAAAAGGGACGCCCTTCATTGCAGGAAAA
CGAATTGTTCCTGTAATCGAAAATCATCGTTTTACGGGTCGTTACAGAGACCTCTATCCCGCTAATGAAC
TCATTGCCCTTCTGGAAGAAAAAGGCATTGTCTTTAGAGACGGAAGTAATATATTACCCAAACTTTTAGA
AAATGATGATTCTCATGCAATTGATACGATGGTCGCCTTGATTCGCAGTGTACTCCAAATGAGAAACAG
CAATGCCGCAACGGGGGAAGACTACATCAACTCTCCCGTTAGGGATCTGAACGGGGTGTGTTTCGACAG
TCGATTCCAAAATCCAGAATGGCCAATGGATGCGGATGCCAACGGAGCTTATCATATTGCCTTAAAAGG
GCAGCTTCTTCTGAACCACCTCAAAGAAAGCAAAGATCTGAAATTACAAAACGGCATCAGCAACCAAGA
TTGGCTGGCCTACATTCAGGAACTGAGAAACTGA
SEQ ID NO: 33
ATGGCCGTCAAATCCATCAAAGTGAAACTTCGTCTCGACGATATGCCGGAGATTCGGGCCGGTCTATGG
AAACTTCATAAGGAAGTCAATGCGGGGGTTCGATATTACACGGAATGGCTCAGTCTTCTCCGTCAAGAG
AACTTGTATCGAAGAAGTCCGAATGGGGACGGAGAGCAAGAATGTGATAAGACTGCAGAAGAATGCAA
AGCCGAATTGTTGGAGCGGCTGCGCGCGCGTCAAGTGGAGAATGGACACCGTGGTCCGGCGGGATCGG
ACGATGAATTGCTGCAGTTGGCGCGTCAACTCTATGAGTTGTTGGTTCCGCAGGCGATAGGTGCGAAAG
GCGACGCGCAGCAAATTGCCCGCAAATTTTTGAGCCCCTTGGCCGACAAGGACGCAGTTGGTGGGCTTG
GAATCGCGAAGGCGGGGAACAAACCGCGGTGGGTTCGCATGCGCGAAGCGGGGGAACCAGGCTGGGAA
GAGGAGAAGGAGAAGGCTGAGACGAGGAAATCTGCGGATCGGACTGCGGATGTTTTGCGCGCGCTCGC
GGATTTTGGGTTAAAGCCACTGATGCGCGTATACACCGATTCTGAGATGTCATCGGTGGAGTGGAAACC
GCTTCGGAAGGGACAAGCCGTTCGGACGTGGGATAGGGACATGTTCCAACAAGCTATCGAACGGATGAT
GTCGTGGGAGTCGTGGAATCAGCGCGTTGGGCAAGAGTACGCGAAACTCGTAGAACAAAAAAATCGAT
TTGAGCAGAAGAATTTCGTCGGCCAGGAACATCTGGTCCATCTCGTCAATCAGTTGCAACAAGATATGA
AAGAAGCATCGCCCGGACTCGAATCGAAAGAGCAAACCGCGCACTATGTGACGGGACGGGCATTGCGC
GGATCGGACAAGGTATTTGAGAAGTGGGGGAAACTCGCCCCCGATGCACCTTTCGATTTGTACGACGCC
GAAATCAAGAATGTGCAGAGACGTAACACGAGACGATTCGGATCACATGACTTGTTCGCAAAATTGGCA
GAGCCAGAGTATCAGGCCCTGTGGCGCGAAGATGCTTCGTTTCTCACGCGTTACGCGGTGTACAACAGC
ATCCTTCGCAAACTGAATCACGCCAAAATGTTCGCGACGTTTACTTTGCCGGATGCAACGGCGCACCCG
ATTTGGACTCGCTTCGATAAATTGGGTGGGAATTTGCACCAGTACACCTTTTTGTTCAACGAATTTGGAG
AACGCAGGCACGCGATTCGTTTTCACAAGCTATTGAAAGTCGAGAATGGTGTCGCAAGAGAAGTTGATG
ATGTCACCGTGCCCATTTCAATGTCAGAGCAATTGGATAATCTGCTTCCCAGAGATCCCAATGAACCGAT
TGCGCTATATTTTCGAGATTACGGAGCCGAACAGCATTTCACAGGTGAATTTGGTGGCGCGAAGATCCA
GTGCCGCCGGGATCAGCTGGCTCATATGCACCGACGCAGAGGGGCGAGGGATGTTTATCTCAATGTCAG
CGTACGTGTGCAGAGTCAGTCTGAGGCGCGGGGAGAACGTCGCCCGCCGTATGCGGCAGTATTTCGTCT
GGTCGGGGACAACCATCGCGCGTTTGTCCATTTCGATAAACTATCGGATTATCTTGCGGAACATCCGGAT
GATGGGAAGCTCGGGTCGGAGGGGTTGCTTTCCGGGCTGCGGGTGATGAGTGTCGATCTCGGCCTTCGC
ACATCTGCATCGATTTCCGTTTTTCGCGTTGCCCGGAAGGACGAGTTGAAGCCGAACTCAAAAGGTCGTG
TACCGTTTTTCTTTCCGATAAAAGGGAATGACAATCTCGTCGCGGTTCATGAGCGATCACAACTCTTGAA
GCTGCCTGGCGAAACGGAGTCGAAGGACCTGCGTGCTATCCGAGAAGAACGCCAACGGACATTGCGGC
AGTTGCGGACGCAACTGGCGTATTTGCGGCTGCTCGTGCGGTGTGGGTCGGAAGATGTGGGGCGGCGTG
AACGGAGTTGGGCAAAGCTTATCGAGCAGCCGGTGGATGCGGCCAATCACATGACACCGGATTGGCGC
GAGGCTTTTGAAAACGAACTTCAGAAGCTTAAGTCACTCCATGGTATCTGTAGCGACAAGGAATGGATG
GATGCTGTCTACGAGAGCGTTCGCCGCGTGTGGCGTCACATGGGCAAACAGGTTCGCGATTGGCGAAAG
GACGTACGAAGCGGAGAGCGGCCCAAGATTCGCGGCTATGCGAAAGACGTGGTCGGTGGAAACTCGAT
TGAGCAAATCGAGTATCTGGAACGTCAGTACAAGTTCCTCAAGAGTTGGAGCTTCTTTGGTAAGGTGTC
GGGACAAGTGATTCGTGCGGAGAAGGGATCTCGTTTTGCGATCACGCTGCGCGAACACATTGATCACGC
GAAGGAAGATCGGCTGAAGAAATTGGCGGATCGCATCATTATGGAGGCTCTCGGCTATGTGTACGCGTT
GGATGAGCGCGGCAAAGGAAAGTGGGTTGCGAAGTATCCGCCGTGCCAGCTCATCCTGCTGGAGGAATT
GAGCGAGTACCAGTTCAATAACGACAGGCCTCCGAGCGAAAACAACCAGTTGATGCAATGGAGTCATC
GCGGCGTGTTCCAGGAGTTGATAAATCAGGCCCAAGTCCATGATTTACTCGTTGGGACGATGTATGCAG
CGTTCTCGTCGCGATTCGACGCGCGAACTGGGGCACCGGGTATCCGCTGTCGCCGGGTTCCGGCGCGTTG
CACCCAGGAGCACAATCCAGAACCATTTCCTTGGTGGCTGAACAAGTTTGTGGTGGAACATACGTTGGA
TGCTTGTCCCCTACGCGCAGACGACCTCATCCCAACGGGTGAAGGAGAGATTTTTGTCTCGCCGTTCAGC
GCGGAGGAGGGGGACTTTCATCAGATTCACGCCGACCTGAATGCGGCGCAAAATCTGCAGCAGCGACTC
TGGTCTGATTTTGATATCAGTCAAATTCGGTTGCGGTGTGATTGGGGTGAAGTGGACGGTGAACTCGTTC
TGATCCCAAGGCTTACAGGAAAACGAACGGCGGATTCATATAGCAACAAGGTGTTTTATACCAATACAG
GTGTCACCTATTATGAGCGAGAGCGGGGGAAGAAGCGGAGAAAGGTTTTCGCGCAAGAGAAATTGTCG
GAGGAAGAGGCGGAGTTGCTCGTGGAAGCAGACGAGGCGAGGGAGAAATCGGTCGTTTTGATGCGTGA
TCCGTCTGGCATCATCAATCGGGGAAATTGGACCAGGCAAAAGGAATTTTGGTCGATGGTGAACCAGCG
GATCGAAGGATACTTGGTCAAGCAGATTCGCTCGCGCGTTCCATTACAAGATAGTGCGTGTGAAAACAC
GGGGGATATTTAA
SEQ ID NO: 34
ATGGCGACACGCAGTTTTATTTTAAAAATTGAACCAAATGAAGAAGTTAAAAAGGGATTATGGAAGACG
CATGAGGTATTGAATCATGGAATTGCCTACTACATGAATATTCTGAAACTAATTAGACAGGAAGCTATTT
ATGAACATCATGAACAAGATCCTAAAAATCCGAAAAAAGTTTCAAAAGCAGAAATACAAGCCGAGTTA
TGGGATTTTGTTTTAAAAATGCAAAAATGTAATAGTTTTACACATGAAGTTGACAAAGATGTTGTTTTTA
ACATCCTGCGTGAACTATATGAAGAGTTGGTCCCTAGTTCAGTCGAGAAAAAGGGTGAAGCCAATCAAT
TATCGAATAAGTTTCTGTACCCGCTAGTTGATCCGAACAGTCAAAGTGGGAAAGGGACGGCATCATCCG
GACGTAAACCTCGGTGGTATAATTTAAAAATAGCAGGCGACCCATCGTGGGAGGAAGAAAAGAAAAAA
TGGGAAGAGGATAAAAAGAAAGATCCCCTTGCTAAAATCTTAGGTAAGTTAGCAGAATATGGGCTTATT
CCGCTATTTATTCCATTTACTGACAGCAACGAACCAATTGTAAAAGAAATTAAATGGATGGAAAAAAGT
CGTAATCAAAGTGTCCGGCGACTTGATAAGGATATGTTTATCCAAGCATTAGAGCGTTTTCTTTCATGGG
AAAGCTGGAACCTTAAAGTAAAGGAAGAGTATGAAAAAGTTGAAAAGGAACACAAAACACTAGAGGA
AAGGATAAAAGAGGACATTCAAGCATTTAAATCCCTTGAACAATATGAAAAAGAACGGCAGGAGCAAC
TTCTTAGAGATACATTGAATACAAATGAATACCGATTAAGCAAAAGAGGATTACGTGGTTGGCGTGAAA
TTATCCAAAAATGGCTAAAGATGGATGAAAATGAACCATCAGAAAAATATTTAGAAGTATTTAAAGATT
ATCAACGGAAACATCCACGAGAAGCCGGGGACTATTCTGTCTATGAATTTTTAAGCAAGAAAGAAAATC
ATTTTATTTGGCGAAATCATCCTGAATATCCTTATTTGTATGCTACATTTTGTGAAATTGACAAAAAAAA
GAAAGACGCTAAGCAACAGGCAACTTTTACTTTGGCTGACCCGATTAACCATCCGTTATGGGTACGATTT
GAAGAAAGAAGCGGTTCGAACTTAAACAAATATCGAATTTTAACAGAGCAATTACACACTGAAAAGTTA
AAAAAGAAATTAACAGTTCAACTTGATCGTTTAATTTATCCAACTGAATCCGGCGGTTGGGAGGAAAAA
GGTAAAGTAGATATCGTTTTGTTGCCGTCAAGACAATTTTATAATCAAATCTTCCTTGATATAGAAGAAA
AGGGGAAACATGCTTTTACTTATAAGGATGAAAGTATTAAATTCCCCCTTAAAGGTACACTTGGTGGTGC
AAGAGTGCAGTTTGACCGTGACCATTTGCGGAGATATCCGCATAAAGTAGAATCAGGAAATGTTGGACG
GATTTATTTTAACATGACAGTAAATATTGAACCAACTGAGAGCCCTGTTAGTAAGTCTTTGAAAATACAT
AGGGACGATTTCCCCAAGTTCGTTAATTTTAAACCGAAAGAGCTCACCGAATGGATAAAAGATAGTAAA
GGGAAAAAATTAAAAAGTGGTATAGAATCCCTTGAAATTGGTCTACGGGTGATGAGTATCGACTTAGGT
CAACGTCAAGCGGCTGCTGCATCGATTTTTGAAGTAGTTGATCAGAAACCGGATATTGAAGGGAAGTTA
TTTTTTCCAATCAAAGGAACTGAGCTTTATGCTGTTCACCGGGCAAGTTTTAACATTAAATTACCGGGTG
AAACATTAGTAAAATCACGGGAAGTATTGCGGAAAGCTCGGGAGGACAACTTAAAATTAATGAATCAA
AAGTTAAACTTTCTAAGAAATGTTCTACATTTCCAACAGTTTGAAGATATCACAGAAAGAGAGAAGCGT
GTAACTAAATGGATTTCTAGACAAGAAAATAGTGATGTTCCTCTTGTATATCAAGATGAGCTAATTCAAA
TTCGTGAATTAATGTATAAACCCTATAAAGATTGGGTTGCCTTTTTAAAACAACTCCATAAACGGCTAGA
AGTCGAGATTGGCAAAGAGGTTAAGCATTGGCGAAAATCATTAAGTGACGGGAGAAAAGGTCTTTACG
GAATCTCCCTAAAAAATATTGATGAAATTGATCGAACAAGGAAATTCCTTTTAAGATGGAGCTTACGTC
CAACAGAACCTGGGGAAGTAAGACGCTTGGAACCAGGACAGCGTTTTGCGATTGATCAATTAAACCACC
TAAATGCATTAAAAGAAGATCGATTAAAAAAGATGGCAAATACGATTATCATGCATGCCTTAGGTTACT
GTTATGATGTAAGAAAGAAAAAGTGGCAGGCAAAAAATCCAGCATGTCAAATTATTTTATTTGAAGATT
TATCTAACTACAATCCTTACGAGGAAAGGTCCCGTTTTGAAAACTCAAAACTGATGAAGTGGTCACGGA
GAGAAATTCCACGACAAGTCGCCTTACAAGGTGAAATTTACGGATTACAAGTTGGGGAAGTAGGTGCCC
AATTCAGTTCAAGATTCCATGCGAAAACCGGGTCGCCGGGAATTCGTTGCAGTGTTGTAACGAAAGAAA
AATTGCAGGATAATCGCTTTTTTAAAAATTTACAAAGAGAAGGACGACTTACTCTTGATAAAATCGCAG
TTTTAAAAGAAGGAGACTTATATCCAGATAAAGGTGGAGAAAAGTTTATTTCTTTATCAAAGGATCGAA
AGTTGGTAACTACGCATGCTGATATTAACGCGGCCCAAAATTTACAGAAGCGTTTTTGGACAAGAACAC
ATGGATTTTATAAAGTTTACTGCAAAGCCTATCAGGTTGATGGACAAACTGTTTATATTCCGGAGAGCAA
GGACCAAAAACAAAAAATAATTGAAGAATTTGGGGAAGGCTATTTTATTTTAAAAGATGGTGTATATGA
ATGGGGTAATGCGGGGAAACTAAAAATTAAAAAAGGTTCCTCTAAACAATCATCGAGTGAATTAGTAGA
TTCGGACATACTGAAAGATTCATTTGATTTAGCAAGTGAACTTAAGGGAGAGAAACTCATGTTATATCG
AGATCCGAGTGGAAACGTATTTCCTTCCGACAAGTGGATGGCAGCAGGAGTATTTTTTGGCAAATTAGA
AAGAATATTGATTTCTAAGTTAACAAATCAATACTCAATATCAACAATAGAAGATGATTCTTCAAAACA
ATCAATGTAA
SEQ ID NO: 35
ATGCCCACCCGCACCATCAATCTGAAACTTGTTCTTGGGAAAAATCCTGAAAACGCAACATTGCGACGC
GCCCTATTTTCGACACACCGTTTGGTTAACCAAGCGACGAAACGTATTGAGGAATTCTTGTTGCTGTGTC
GTGGAGAAGCCTACAGAACAGTGGATAATGAGGGGAAGGAAGCCGAGATTCCACGTCATGCAGTCCAA
GAAGAAGCTCTTGCCTTTGCCAAAGCTGCTCAACGCCACAACGGCTGTATATCCACCTATGAAGACCAA
GAGATTCTTGATGTACTGCGGCAACTGTACGAACGTCTTGTTCCTTCGGTCAACGAAAACAACGAGGCA
GGCGATGCTCAAGCTGCTAACGCCTGGGTCAGTCCGCTCATGTCGGCAGAAAGCGAAGGAGGCTTGTCG
GTCTACGACAAGGTGCTTGATCCACCGCCGGTTTGGATGAAGCTTAAAGAAGAAAAGGCTCCAGGATGG
GAAGCCGCTTCTCAAATTTGGATTCAGAGTGATGAGGGACAGTCGTTACTTAATAAGCCAGGTAGCCCT
CCCCGCTGGATTCGAAAACTGCGATCTGGGCAACCGTGGCAAGATGATTTCGTCAGTGACCAAAAGAAA
AAGCAAGATGAGCTGACCAAAGGGAACGCACCACTTATAAAACAACTCAAAGAAATGGGGTTGTTGCC
TCTTGTTAACCCATTTTTTAGACATCTTCTTGACCCTGAAGGTAAAGGCGTGAGTCCATGGGACCGTCTT
GCTGTACGCGCTGCAGTGGCTCACTTTATCTCCTGGGAAAGTTGGAATCATAGAACACGTGCAGAATAC
AATTCCTTGAAACTACGGCGAGACGAGTTTGAGGCAGCATCCGACGAATTCAAAGACGATTTTACTTTG
CTCCGACAATATGAAGCCAAACGCCATAGTACATTGAAAAGCATCGCGCTGGCCGACGATTCGAACCCT
TACCGGATTGGAGTACGTTCTCTGCGTGCCTGGAACCGCGTTCGTGAAGAATGGATAGACAAGGGTGCA
ACAGAAGAACAACGCGTGACCATATTGTCAAAGCTTCAAACACAACTTCGGGGAAAATTCGGCGATCCC
GATCTGTTCAACTGGCTAGCTCAGGATAGGCATGTCCATTTGTGGTCTCCTCGGGACAGCGTGACACCAT
TGGTTCGCATCAATGCGGTAGATAAAGTTCTGCGTCGACGAAAACCGTATGCATTGATGACCTTTGCCCA
TCCCCGCTTCCACCCTCGATGGATACTGTACGAGGCTCCAGGAGGAAGCAATCTCCGTCAATATGCATTG
GATTGTACAGAAAACGCTCTACACATCACGTTGCCTTTGCTTGTCGACGATGCGCACGGAACCTGGATTG
AAAAAAAGATCAGGGTGCCGCTGGCACCATCCGGACAAATTCAAGATTTAACTCTGGAAAAACTTGAGA
AGAAAAAAAATCGTTTATACTACCGTTCCGGTTTTCAGCAGTTTGCCGGCTTGGCTGGCGGAGCTGAGGT
TCTTTTCCACAGACCCTATATGGAACACGACGAACGCAGCGAGGAGTCTCTTTTGGAACGTCCGGGAGC
CGTTTGGTTCAAATTGACCCTGGATGTGGCAACACAGGCTCCCCCGAACTGGCTTGATGGTAAGGGCCG
TGTCCGTACACCGCCGGAGGTACATCATTTTAAAACCGCATTGTCGAATAAAAGCAAACATACACGTAC
GCTGCAGCCGGGTCTCCGTGTCTTGTCAGTAGACTTGGGCATGCGAACATTCGCCTCCTGCTCAGTATTT
GAACTCATCGAGGGAAAGCCTGAGACAGGCCGTGCCTTCCCTGTTGCCGATGAGAGATCAATGGACAGC
CCGAATAAACTGTGGGCCAAGCATGAACGTAGTTTTAAACTGACGCTCCCCGGCGAAACCCCTTCTCGA
AAGGAAGAGGAAGAGCGTAGCATAGCAAGAGCGGAAATTTATGCACTGAAACGCGACATACAACGCCT
CAAAAGCCTACTCCGCTTAGGTGAAGAAGATAACGATAACCGTCGTGATGCATTGCTTGAACAGTTCTTT
AAAGGATGGGGAGAAGAAGACGTTGTGCCTGGACAAGCGTTTCCACGCTCTCTTTTCCAAGGGTTGGGA
GCTGCCCCGTTTCGCTCAACTCCAGAGTTATGGCGTCAGCATTGCCAAACATATTATGACAAAGCGGAA
GCCTGTCTGGCTAAACATATCAGTGATTGGCGCAAGCGAACTCGTCCCCGTCCGACATCGCGGGAGATG
TGGTACAAAACACGTTCCTATCATGGCGGCAAGTCCATTTGGATGTTGGAATATCTTGATGCCGTTCGAA
AACTGCTTCTCAGTTGGAGCTTACGTGGTCGTACTTACGGTGCCATTAATCGCCAGGATACAGCCCGGTT
TGGTTCTTTGGCATCACGGCTGCTCCACCATATCAATTCCCTAAAGGAAGACCGCATCAAAACAGGAGC
CGACTCTATCGTTCAGGCTGCTCGCGGGTATATTCCTCTCCCTCATGGCAAGGGTTGGGAACAAAGATAT
GAGCCTTGTCAGCTCATATTATTTGAAGACCTCGCACGATATCGCTTTCGCGTGGATCGACCTCGTCGAG
AGAACAGCCAACTCATGCAGTGGAACCATCGAGCCATCGTGGCAGAAACAACGATGCAAGCCGAACTC
TACGGACAAATTGTCGAAAATACTGCAGCGGGGTTCAGCAGTCGTTTTCACGCGGCGACAGGTGCCCCC
GGTGTACGTTGTCGTTTTCTTCTAGAAAGAGACTTTGATAACGATTTGCCCAAACCGTACCTTCTCAGGG
AACTTTCTTGGATGCTCGGCAATACAAAAGTCGAGTCTGAAGAAGAAAAGCTTCGATTGCTGTCTGAAA
AAATCAGGCCAGGCAGTCTTGTTCCTTGGGATGGAGGCGAACAGTTCGCTACCCTGCATCCCAAAAGAC
AAACACTTTGCGTCATTCATGCCGATATGAATGCTGCCCAAAATTTACAACGCCGGTTTTTCGGTCGATG
CGGCGAGGCCTTTCGGCTTGTTTGTCAACCCCACGGTGACGACGTGTTACGACTCGCATCCACCCCAGGA
GCTCGTCTTCTTGGAGCCCTGCAGCAGCTTGAAAATGGACAAGGAGCTTTCGAGTTGGTTCGAGACATG
GGGTCAACAAGTCAAATGAACCGGTTCGTCATGAAGTCTTTGGGAAAAAAGAAAATAAAACCCCTTCAG
GACAACAATGGAGACGACGAGCTTGAAGACGTGTTGTCCGTACTCCCGGAGGAAGACGACACAGGACG
TATCACAGTCTTCCGCGATTCATCAGGAATCTTTTTTCCTTGCAACGTCTGGATACCGGCCAAACAGTTTT
GGCCAGCAGTACGCGCCATGATTTGGAAGGTCATGGCTTCCCATTCTTTGGGGTGA
SEQ ID NO: 36
ATGACAAAGTTAAGACACCGACAGAAAAAATTAACACACGACTGGGCTGGCTCCAAAAAGAGGGAAGT
ATTAGGCTCAAATGGCAAGCTTCAGAATCCGTTGTTAATGCCGGTTAAAAAAGGTCAGGTTACTGAGTT
CCGGAAAGCGTTTTCTGCGTATGCTCGCGCAACGAAAGGAGAAATGACTGACGGCCGAAAGAATATGTT
TACGCATAGTTTCGAGCCATTTAAGACAAAGCCCTCGCTTCATCAGTGTGAATTGGCAGATAAAGCATAT
CAATCTTTACATTCGTATCTGCCTGGTTCTCTTGCTCATTTTCTATTATCTGCTCACGCATTAGGTTTTCGT
ATTTTTTCAAAATCTGGTGAAGCAACTGCATTCCAGGCATCCTCTAAAATTGAAGCTTACGAATCAAAAT
TGGCAAGCGAATTAGCTTGTGTAGATTTATCTATTCAAAACTTGACTATTTCAACGCTTTTTAATGCGCTT
ACAACGTCTGTAAGAGGGAAGGGCGAAGAAACTAGCGCTGACCCCTTAATTGCACGATTTTACACCTTA
CTTACTGGCAAGCCTCTGTCTCGAGACACTCAAGGGCCTGAACGTGATTTAGCAGAAGTTATCTCGCGTA
AGATAGCTAGTTCTTTTGGCACATGGAAAGAAATGACGGCAAACCCTCTTCAGTCATTACAATTTTTTGA
AGAGGAACTCCATGCGCTGGATGCCAATGTCTCGCTCTCACCCGCCTTCGACGTTTTAATTAAAATGAAT
GATTTGCAGGGCGATTTAAAAAATCGAACCATTGTTTTTGATCCTGACGCCCCTGTTTTTGAATATAACG
CAGAAGACCCTGCCGACATAATTATTAAACTTACAGCTCGTTACGCTAAAGAAGCTGTCATCAAAAATC
AAAACGTAGGAAATTACGTTAAAAACGCTATTACTACCACAAATGCCAATGGTCTTGGTTGGCTTTTGA
ACAAAGGTTTGTCGTTACTCCCTGTCTCGACCGATGACGAATTGCTAGAGTTTATTGGCGTTGAACGATC
TCATCCCTCATGCCATGCCTTAATTGAATTGATTGCACAATTAGAAGCCCCCGAGCTCTTTGAGAAGAAC
GTATTTTCAGATACTCGTTCTGAAGTTCAAGGTATGATTGATTCAGCTGTTTCTAATCATATTGCTCGTCT
TTCCAGCTCTAGAAATAGCTTGTCAATGGATAGTGAAGAATTAGAACGTTTAATCAAAAGCTTTCAGAT
ACACACACCTCATTGCTCACTTTTTATTGGCGCCCAATCACTTTCACAGCAGTTAGAATCTTTGCCTGAA
GCCCTTCAATCGGGCGTTAATTCAGCCGATATTTTACTAGGCTCTACTCAATATATGCTCACCAATTCTTT
GGTTGAAGAGTCAATTGCAACTTATCAAAGAACACTTAATCGCATCAATTACTTGTCAGGTGTTGCAGGT
CAGATTAACGGCGCAATAAAGCGAAAAGCGATAGATGGAGAAAAAATTCACTTGCCTGCAGCTTGGTC
AGAGTTGATATCTTTACCATTTATAGGCCAGCCTGTTATAGATGTTGAAAGCGATTTAGCTCATCTAAAA
AATCAATACCAAACACTTTCAAATGAGTTTGATACTCTTATATCTGCTTTGCAAAAGAATTTTGATTTGA
ACTTTAATAAAGCGCTCCTTAATCGTACTCAGCATTTTGAAGCCATGTGTAGAAGCACTAAGAAAAACG
CTTTATCCAAACCAGAGATCGTTTCCTATCGCGACCTGCTTGCTCGATTAACTTCTTGTTTGTATCGAGGC
TCTTTAGTTTTGCGTCGTGCCGGCATTGAAGTGTTAAAAAAACATAAAATATTTGAGTCAAACAGCGAAC
TTCGTGAACATGTTCATGAAAGAAAGCATTTCGTGTTTGTTAGTCCTCTAGATCGCAAAGCCAAGAAACT
CCTTCGATTAACTGATTCGCGTCCAGACTTGTTACATGTTATTGATGAAATATTGCAGCACGATAATCTT
GAAAACAAAGACCGCGAGTCACTTTGGCTAGTTCGCTCTGGTTATTTGCTTGCAGGACTTCCAGATCAAC
TTTCTTCATCTTTTATTAACTTGCCTATCATTACTCAAAAAGGAGATAGACGCCTTATAGACCTGATTCAG
TATGATCAAATTAATCGTGATGCTTTTGTTATGTTAGTGACCTCTGCATTCAAGTCTAATTTGTCTGGTCT
GCAGTATCGTGCCAATAAGCAATCGTTCGTTGTTACTCGCACGCTAAGCCCTTATCTCGGCTCAAAACTT
GTCTACGTACCCAAGGATAAAGATTGGTTAGTTCCTTCTCAAATGTTTGAAGGACGATTTGCTGACATTC
TTCAATCAGATTATATGGTCTGGAAAGATGCCGGTCGTCTTTGTGTTATTGATACTGCAAAACACCTTTC
TAATATAAAGAAGTCTGTATTTTCATCCGAAGAAGTTCTCGCTTTTTTAAGAGAACTCCCTCACCGCACA
TTTATCCAGACCGAAGTTCGCGGCCTTGGCGTTAATGTCGATGGAATTGCATTTAATAATGGTGATATTC
CGTCATTAAAAACCTTTTCAAATTGCGTTCAGGTAAAAGTTTCTCGGACTAATACATCCCTAGTTCAAAC
ACTTAATCGTTGGTTTGAAGGAGGAAAAGTTTCTCCTCCGAGCATTCAATTTGAACGGGCGTATTATAAA
AAAGACGATCAAATTCATGAAGACGCAGCGAAAAGAAAGATACGATTCCAGATGCCCGCAACTGAGTT
GGTTCATGCTTCTGACGATGCGGGGTGGACACCAAGTTATTTGCTCGGCATTGATCCTGGCGAGTATGGA
ATGGGTCTTTCATTGGTTTCGATTAATAACGGAGAAGTCTTAGATTCAGGCTTTATTCATATTAATTCTCT
GATCAATTTTGCCTCTAAAAAGAGCAACCATCAAACTAAGGTTGTTCCGCGTCAGCAGTACAAATCTCCT
TATGCAAATTATTTAGAACAATCTAAAGATTCTGCTGCTGGTGATATTGCGCATATACTCGATCGACTTA
TATACAAATTAAATGCGTTGCCTGTTTTTGAGGCTCTTTCAGGTAATTCTCAGAGTGCTGCTGATCAAGTT
TGGACGAAAGTCTTATCGTTTTACACTTGGGGTGATAATGACGCTCAGAATTCTATTAGAAAGCAGCATT
GGTTTGGAGCCAGTCATTGGGATATCAAAGGTATGTTAAGGCAACCCCCTACGGAGAAGAAGCCTAAAC
CGTATATTGCTTTTCCTGGCTCTCAGGTTTCTTCGTATGGTAATTCCCAACGTTGCTCTTGCTGCGGTCGC
AATCCTATTGAACAACTTCGAGAAATGGCAAAGGATACCTCTATTAAAGAGCTAAAAATTCGCAATTCT
GAGATACAGCTTTTTGACGGAACCATTAAATTATTTAATCCAGACCCATCCACTGTGATAGAGAGAAGG
CGACATAATCTTGGTCCATCAAGAATTCCTGTTGCTGACCGTACTTTCAAAAACATCAGTCCATCAAGTC
TAGAATTTAAAGAATTGATTACTATCGTGTCTCGATCTATCCGTCATTCACCTGAGTTTATCGCTAAAAA
ACGCGGCATAGGGTTCTGAGTATTTTTGCGCTTATTCCGATTGCAACTCATCCTTAAATTCTGAAGCTAAC
GCAGCTGCTAACGTAGCGCAAAAATTTCAAAAACAGTTATTTTTTGAGTTATAA
SEQ ID NO: 37
ATGAAGAGAATTCTGAACAGTCTGAAAGTTGCTGCCTTGAGACTTCTGTTTCGAGGCAAAGGTTCTGAAT
TAGTGAAGACAGTCAAATATCCATTGGTTTCCCCGGTTCAAGGCGCGGTTGAAGAACTTGCTGAAGCAA
TTCGGCACGACAACCTGCACCTTTTTGGGCAGAAGGAAATAGTGGATCTTATGGAGAAAGACGAAGGAA
CCCAGGTGTATTCGGTTGTGGATTTTTGGTTGGATACCCTGCGTTTAGGGATGTTTTTCTCACCATCAGCG
AATGCGTTGAAAATCACGCTGGGAAAATTCAATTCTGATCAGGTTTCACCTTTTCGTAAGGTTTTGGAGC
AGTCACCTTTTTTTCTTGCGGGTCGCTTGAAGGTTGAACCTGCGGAAAGGATACTTTCTGTTGAAATCAG
AAAGATTGGTAAAAGAGAAAACAGAGTTGAGAACTATGCCGCCGATGTGGAGACATGCTTCATTGGTCA
GCTTTCTTCAGATGAGAAACAGAGTATCCAGAAGCTGGCAAATGATATCTGGGATAGCAAGGATCATGA
GGAACAGAGAATGTTGAAGGCGGATTTTTTTGCTATACCTCTTATAAAAGACCCCAAAGCTGTCACAGA
AGAAGATCCTGAAAATGAAACGGCGGGAAAACAGAAACCGCTTGAATTATGTGTTTGTCTTGTTCCTGA
GTTGTATACCCGAGGTTTCGGCTCCATTGCTGATTTTCTGGTTCAGCGACTTACCTTGCTGCGTGACAAA
ATGAGTACCGACACGGCGGAAGATTGCCTCGAGTATGTTGGCATTGAGGAAGAAAAAGGCAATGGAAT
GAATTCCTTGCTCGGCACTTTTTTGAAGAACCTGCAGGGTGATGGTTTTGAACAGATTTTTCAGTTTATGC
TTGGGTCTTATGTTGGCTGGCAGGGGAAGGAAGATGTACTGCGCGAACGATTGGATTTGCTGGCCGAAA
AAGTCAAAAGATTACCAAAGCCAAAATTTGCCGGAGAATGGAGTGGTCATCGTATGTTTCTCCATGGTC
AGCTGAAAAGCTGGTCGTCGAATTTCTTCCGTCTTTTTAATGAGACGCGGGAACTTCTGGAAAGTATCAA
GAGTGATATTCAACATGCCACCATGCTCATTAGCTATGTGGAAGAGAAAGGAGGCTATCATCCACAGCT
GTTGAGTCAGTATCGGAAGTTAATGGAACAATTACCGGCGTTGCGGACTAAGGTTTTGGATCCTGAGAT
TGAGATGACGCATATGTCCGAGGCTGTTCGAAGTTACATTATGATACACAAGTCTGTAGCGGGATTTCTG
CCGGATTTACTCGAGTCTTTGGATCGAGATAAGGATAGGGAATTTTTGCTTTCCATCTTTCCTCGTATTCC
AAAGATAGATAAGAAGACGAAAGAGATCGTTGCATGGGAGCTACCGGGCGAGCCAGAGGAAGGCTATT
TGTTCACAGCAAACAACCTTTTCCGGAATTTTCTTGAGAATCCGAAACATGTGCCACGATTTATGGCAGA
GAGGATTCCCGAGGATTGGACGCGTTTGCGCTCGGCCCCTGTGTGGTTTGATGGGATGGTGAAGCAATG
GCAGAAGGTGGTGAATCAGTTGGTTGAATCTCCAGGCGCCCTTTATCAGTTCAATGAAAGTTTTTTGCGT
CAAAGACTGCAAGCAATGCTTACGGTCTATAAGCGGGATCTCCAGACTGAGAAGTTTCTGAAGCTGCTG
GCTGATGTCTGTCGTCCACTCGTTGATTTTTTCGGACTTGGAGGAAATGATATTATCTTCAAGTCATGTCA
GGATCCAAGAAAGCAATGGCAGACTGTTATTCCACTCAGTGTCCCAGCGGATGTTTATACAGCATGTGA
AGGCTTGGCTATTCGTCTCCGCGAAACTCTTGGATTCGAATGGAAAAATCTGAAAGGACACGAGCGGGA
AGATTTTTTACGGCTGCATCAGTTGCTGGGAAATCTGCTGTTCTGGATCAGGGATGCGAAACTTGTCGTG
AAGCTGGAAGACTGGATGAACAATCCTTGTGTTCAGGAGTATGTGGAAGCACGAAAAGCCATTGATCTT
CCCTTGGAGATTTTCGGATTTGAGGTGCCGATTTTTCTCAATGGCTATCTCTTTTCGGAACTGCGCCAGCT
GGAATTGTTGCTGAGGCGTAAGTCGGTGATGACGTCTTACAGCGTCAAAACGACAGGCTCGCCAAATAG
GCTCTTCCAGTTGGTTTACCTACCTCTAAACCCTTCAGATCCGGAAAAGAAAAATTCCAACAACTTTCAG
GAGCGCCTCGATACACCTACCGGTTTGTCGCGTCGTTTTCTGGATCTTACGCTGGATGCATTTGCTGGCA
AACTCTTGACGGATCCGGTAACTCAGGAACTGAAGACGATGGCCGGTTTTTACGATCATCTCTTTGGCTT
CAAGTTGCCGTGTAAACTGGCGGCGATGAGTAACCATCCAGGATCCTCTTCCAAAATGGTGGTTCTGGC
AAAACCAAAGAAGGGTGTTGCTAGTAACATCGGCTTTGAACCTATTCCCGATCCTGCTCATCCTGTGTTC
CGGGTGAGAAGTTCCTGGCCGGAGTTGAAGTACCTGGAGGGGTTGTTGTATCTTCCCGAAGATACACCA
CTGACCATTGAACTGGCGGAAACGTCGGTCAGTTGTCAGTCTGTGAGTTCAGTCGCTTTCGATTTGAAGA
ATCTGACGACTATCTTGGGTCGTGTTGGTGAATTCAGGGTGACGGCAGATCAACCTTTCAAGCTGACGCC
CATTATTCCTGAGAAAGAGGAATCCTTCATCGGGAAGACCTACCTCGGTCTTGATGCTGGAGAGCGATC
TGGCGTTGGTTTCGCGATTGTGACGGTTGACGGCGATGGGTATGAGGTGCAGAGGTTGGGTGTGCATGA
AGATACTCAGCTTATGGCGCTTCAGCAAGTCGCCAGCAAGTCTCTTAAGGAGCCGGTTTTCCAGCCACTC
CGTAAGGGCACATTTCGTCAGCAGGAGCGCATTCGCAAAAGCCTCCGCGGTTGCTACTGGAATTTCTATC
ATGCATTGATGATCAAGTACCGAGCTAAAGTTGTGCATGAGGAATCGGTGGGTTCATCCGGTCTGGTGG
GGCAGTGGCTGCGTGCATTTCAGAAGGATCTCAAAAAGGCTGATGTTCTGCCCAAGAAGGGTGGAAAAA
ATGGTGTAGACAAAAAAAAGAGAGAAAGCAGCGCTCAGGATACCTTATGGGGAGGAGCTTTCTCGAAG
AAGGAAGAGCAGCAGATAGCCTTTGAGGTTCAGGCAGCTGGATCAAGCCAGTTTTGTCTGAAGTGTGGT
TGGTGGTTTCAGTTGGGGATGCGGGAAGTAAATCGTGTGCAGGAGAGTGGCGTGGTGCTGGACTGGAAC
CGGTCCATTGTAACCTTCCTCATCGAATCCTCAGGAGAAAAGGTATATGGTTTCAGTCCTCAGCAACTGG
AAAAAGGCTTTCGTCCTGACATCGAAACGTTCAAAAAAATGGTAAGGGATTTTATGAGACCCCCCATGT
TTGATCGCAAAGGTCGGCCGGCCGCGGCGTATGAAAGATTCGTACTGGGACGTCGTCACCGTCGTTATC
GCTTTGATAAAGTTTTTGAAGAGAGATTTGGTCGCAGTGCTCTTTTCATCTGCCCGCGGGTCGGGTGTGG
GAATTTCGATCACTCCAGTGAGCAGTCAGCCGTTGTCCTTGCCCTTATTGGTTACATTGCTGATAAGGAA
GGGATGAGTGGTAAGAAGCTTGTTTATGTGAGGCTGGCTGAACTTATGGCTGAGTGGAAGCTGAAGAAA
CTGGAGAGATCAAGGGTGGAAGAACAGAGCTCGGCACAATAA
SEQ ID NO: 38
ATGGCAGAAAGCAAGCAGATGCAATGCCGCAAGTGCGGCGCAAGCATGAAGTATGAAGTAATTGGATT
GGGCAAGAAGTCATGCAGATATATGTGCCCAGATTGCGGCAATCACACCAGCGCGCGCAAGATTCAGA
ACAAGAAAAAGCGCGACAAAAAGTATGGATCCGCAAGCAAAGCGCAGAGCCAGAGGATAGCTGTGGCT
GGCGCGCTTTATCCAGACAAAAAAGTGCAGACCATAAAGACCTACAAATACCCAGCGGATCTTAATGGC
GAAGTTCATGACAGCGGCGTCGCAGAGAAGATTGCGCAGGCGATTCAGGAAGATGAGATCGGCCTGCTT
GGCCCGTCCAGCGAATACGCTTGCTGGATTGCTTCACAAAAACAGAGCGAGCCGTATTCAGTTGTAGAT
TTTTGGTTTGACGCGGTGTGCGCAGGCGGAGTATTCGCGTATTCTGGCGCGCGCCTGCTTTCCACAGTCC
TCCAGTTGAGTGGCGAGGAAAGCGTTTTGCGCGCTGCTTTAGCATCTAGCCCGTTTGTAGATGACATTAA
TTTGGCGCAAGCGGAAAAGTTCCTAGCCGTTAGCCGGCGCACAGGCCAAGATAAGCTAGGCAAGCGCAT
TGGAGAATGTTTTGCGGAAGGCCGGCTTGAAGCGCTTGGCATCAAAGATCGCATGCGCGAATTCGTGCA
AGCGATTGATGTGGCCCAAACCGCGGGCCAGCGGTTCGCGGCCAAGCTAAAGATATTCGGCATCAGTCA
GATGCCTGAAGCCAAGCAATGGAACAATGATTCCGGGCTCACTGTATGTATTTTGCCGGATTATTATGTC
CCGGAAGAAAACCGCGCGGACCAGCTGGTTGTTTTGCTTCGGCGCTTACGCGAGATCGCGTATTGCATG
GGAATTGAGGATGAAGCAGGATTTGAGCATCTAGGCATTGACCCTGGTGCTCTTTCCAATTTTTCCAATG
GCAATCCAAAGCGAGGATTTCTCGGCCGCCTGCTCAATAATGACATTATAGCGCTGGCAAACAACATGT
CAGCCATGACGCCGTATTGGGAAGGCAGAAAAGGCGAGTTGATTGAGCGCCTTGCATGGCTTAAACATC
GCGCTGAAGGATTGTATTTGAAAGAGCCACATTTCGGCAACTCCTGGGCAGACCACCGCAGCAGGATTT
TCAGTCGCATTGCGGGCTGGCTTTCCGGATGCGCGGGCAAGCTCAAGATTGCCAAGGATCAGATTTCAG
GCGTGCGTACGGATTTGTTTCTGCTCAAGCGCCTTCTGGATGCGGTACCGCAAAGCGCGCCGTCGCCGGA
CTTTATTGCTTCCATCAGCGCGCTGGATCGGTTTTTGGAAGCGGCAGAAAGCAGCCAGGATCCGGCAGA
ACAGGTACGCGCTTTGTACGCGTTTCATCTGAACGCGCCTGCGGTCCGATCCATCGCCAACAAGGCGGT
ACAGAGGTCTGATTCCCAGGAGTGGCTTATCAAGGAACTGGATGCTGTAGATCACCTTGAATTCAACAA
AGCATTTCCGTTTTTTTCGGATACAGGAAAGAAAAAGAAGAAAGGAGCGAATAGCAACGGAGCGCCTTC
TGAAGAAGAATACACGGAAACAGAATCCATTCAACAACCAGAAGATGCAGAGCAGGAAGTGAATGGTC
AAGAAGGAAATGGCGCTTCAAAGAACCAGAAAAAGTTTCAGCGCATTCCTCGATTTTTCGGGGAAGGGT
CAAGGAGTGAGTATCGAATTTTAACAGAAGCGCCGCAATATTTTGACATGTTCTGCAATAATATGCGCG
CGATCTTTATGCAGCTAGAGAGTCAGCCGCGCAAGGCGCCTCGTGATTTCAAATGCTTTCTGCAGAATCG
TTTGCAGAAGCTTTACAAGCAAACCTTTCTCAATGCTCGCAGTAATAAATGCCGCGCGCTTCTGGAATCC
GTCCTTATTTCATGGGGAGAATTTTATACTTATGGCGCGAATGAAAAGAAGTTTCGTCTGCGCCATGAAG
CGAGCGAGCGCAGCTCGGATCCGGACTATGTGGTTCAGCAGGCATTGGAAATCGCGCGCCGGCTTTTCT
TGTTCGGATTTGAGTGGCGCGATTGCTCTGCTGGAGAGCGCGTGGATTTGGTTGAAATCCACAAAAAAG
CAATCTCATTTTTGCTTGCAATCACTCAGGCCGAGGTTTCAGTTGGTTCCTATAACTGGCTTGGGAATAG
CACCGTGAGCCGGTATCTTTCGGTTGCTGGCACAGACACATTGTACGGCACTCAACTGGAGGAGTTTTTG
AACGCCACAGTGCTTTCACAGATGCGTGGGCTGGCGATTCGGCTTTCATCTCAGGAGTTAAAAGACGGA
TTTGATGTTCAGTTGGAGAGTTCGTGCCAGGACAATCTCCAGCATCTGCTGGTGTATCGCGCTTCGCGCG
ACTTGGCTGCGTGCAAACGCGCTACATGCCCGGCTGAATTGGATCCGAAAATTCTTGTTCTGCCGGTTGG
TGCGTTTATCGCGAGCGTAATGAAAATGATTGAGCGTGGCGATGAACCATTAGCAGGCGCGTATTTGCG
TCATCGGCCGCATTCATTCGGCTGGCAGATACGGGTTCGTGGAGTGGCGGAAGTAGGCATGGATCAGGG
CACAGCGCTAGCATTCCAGAAGCCGACTGAATCAGAGCCGTTTAAAATAAAGCCGTTTTCCGCTCAATA
CGGCCCAGTACTTTGGCTTAATTCTTCATCCTATAGCCAGAGCCAGTATCTGGATGGATTTTTAAGCCAG
CCAAAGAATTGGTCTATGCGGGTGCTACCTCAAGCCGGATCAGTGCGCGTGGAACAGCGCGTTGCTCTG
ATATGGAATTTGCAGGCAGGCAAGATGCGGCTGGAGCGCTCTGGAGCGCGCGCGTTTTTCATGCCAGTG
CCATTCAGCTTCAGGCCGTCTGGTTCAGGAGATGAAGCAGTATTGGCGCCGAATCGGTACTTGGGACTTT
TTCCGCATTCCGGAGGAATAGAATACGCGGTGGTGGATGTATTAGATTCCGCGGGTTTCAAAATTCTTGA
GCGCGGTACGATTGCGGTAAATGGCTTTTCCCAGAAGCGCGGCGAACGCCAAGAGGAGGCACACAGAG
AAAAACAGAGACGCGGAATTTCTGATATAGGCCGCAAGAAGCCGGTGCAAGCTGAAGTTGACGCAGCC
AATGAATTGCACCGCAAATACACCGATGTTGCCACTCGTTTAGGGTGCAGAATTGTGGTTCAGTGGGCG
CCCCAGCCAAAGCCGGGCACAGCGCCGACCGCGCAAACAGTATACGCGCGCGCAGTGCGGACCGAAGC
GCCGCGATCTGGAAATCAAGAGGATCATGCTCGTATGAAATCCTCTTGGGGATATACCTGGGGCACCTA
TTGGGAGAAGCGCAAACCAGAGGATATTTTGGGCATCTCAACCCAAGTATACTGGACCGGCGGTATAGG
CGAGTCATGTCCCGCAGTCGCGGTTGCGCTTTTGGGGCACATTAGGGCAACATCCACTCAAACTGAATG
GGAAAAAGAGGAGGTTGTATTCGGTCGACTGAAGAAGTTCTTTCCAAGCTAG
SEQ ID NO: 39
ATGGAAAAGAGAATAAACAAGATACGAAAGAAACTATCGGCCGATAATGCCACAAAGCCTGTGAGCAG
GAGCGGCCCCATGAAAACACTCCTTGTCCGGGTCATGACGGACGACTTGAAAAAAAGACTGGAGAAGC
GTCGGAAAAAGCCGGAAGTTATGCCGCAGGTTATTTCAAATAACGCAGCAAACAATCTTAGAATGCTCC
TTGATGACTATACAAAGATGAAGGAGGCGATACTACAAGTTTACTGGCAGGAATTTAAGGACGACCATG
TGGGCTTGATGTGCAAATTTGCCCAGCCTGCTTCCAAAAAAATTGACCAGAACAAACTAAAACCGGAAA
TGGATGAAAAAGGAAATCTAACAACTGCCGGTTTTGCATGTTCTCAATGCGGTCAGCCGCTATTTGTTTA
TAAGCTTGAACAGGTGAGTGAAAAAGGCAAGGCTTATACAAATTACTTCGGCCGGTGTAATGTGGCCGA
GCATGAGAAATTGATTCTTCTTGCTCAATTAAAACCTGAAAAAGACAGTGACGAAGCAGTGACATACTC
CCTTGGCAAATTCGGCCAGAGGGCATTGGACTTTTATTCAATCCACGTAACAAAAGAATCCACCCATCC
AGTAAAGCCCCTGGCACAGATTGCGGGCAACCGCTATGCAAGCGGACCTGTTGGCAAGGCCCTTTCCGA
TGCCTGTATGGGCACTATAGCCAGTTTTCTTTCGAAATATCAAGACATCATCATAGAACATCAAAAGGTT
GTGAAGGGTAATCAAAAGAGGTTAGAGAGTCTCAGGGAATTGGCAGGGAAAGAAAATCTTGAGTACCC
ATCGGTTACACTGCCGCCGCAGCCGCATACGAAAGAAGGGGTTGACGCTTATAACGAAGTTATTGCAAG
GGTACGTATGTGGGTTAATCTTAATCTGTGGCAAAAGCTGAAGCTCAGCCGTGATGACGCAAAACCGCT
ACTGCGGCTAAAAGGATTCCCATCTTTCCCTGTTGTGGAGCGGCGTGAAAACGAAGTTGACTGGTGGAA
TACGATTAATGAAGTAAAAAAACTGATTGACGCTAAACGAGATATGGGACGGGTATTCTGGAGCGGCGT
TACCGCAGAAAAGAGAAATACCATCCTTGAAGGATACAACTATCTGCCAAATGAGAATGACCATAAAA
AGAGAGAGGGCAGTTTGGAAAACCCTAAGAAGCCTGCCAAACGCCAGTTTGGAGACCTCTTGCTGTATC
TTGAAAAGAAATATGCCGGAGACTGGGGAAAGGTCTTCGATGAGGCATGGGAGAGGATAGATAAGAAA
ATAGCCGGACTCACAAGCCATATAGAGCGCGAAGAAGCAAGAAACGCGGAAGACGCTCAATCCAAAGC
CGTACTTACAGACTGGCTAAGGGCAAAGGCATCATTTGTTCTTGAAAGACTGAAGGAAATGGATGAAAA
GGAATTCTATGCGTGTGAAATCCAACTTCAAAAATGGTATGGCGATCTTCGAGGCAACCCGTTTGCCGTT
GAAGCTGAGAATAGAGTTGTTGATATAAGCGGGTTTTCTATCGGAAGCGATGGCCATTCAATCCAATAC
AGAAATCTCCTTGCCTGGAAATATCTGGAGAACGGCAAGCGTGAATTCTATCTGTTAATGAATTATGGC
AAGAAAGGGCGCATCAGATTTACAGATGGAACAGATATTAAAAAGAGCGGCAAATGGCAGGGACTATT
ATATGGCGGTGGCAAGGCAAAGGTTATTGATCTGACTTTCGACCCCGATGATGAACAGTTGATAATCCT
GCCGCTGGCCTTTGGCACAAGGCAAGGCCGCGAGTTTATCTGGAACGATTTGCTGAGTCTTGAAACAGG
CCTGATAAAGCTCGCAAACGGAAGAGTTATCGAAAAAACAATCTATAACAAAAAAATAGGGCGGGATG
AACCGGCTCTATTCGTTGCCTTAACATTTGAGCGCCGGGAAGTTGTTGATCCATCAAATATAAAGCCTGT
AAACCTTATAGGCGTTGACCGCGGCGAAAACATCCCGGCGGTTATTGCATTGACAGACCCTGAAGGTTG
TCCTTTACCGGAATTCAAGGATTCATCAGGGGGCCCAACAGACATCCTGCGAATAGGAGAAGGATATAA
GGAAAAGCAGAGGGCTATTCAGGCAGCAAAGGAGGTAGAGCAAAGGCGGGCTGGCGGTTATTCACGGA
AGTTTGCATCCAAGTCGAGGAACCTGGCGGACGACATGGTGAGAAATTCAGCGCGAGACCTTTTTTACC
ATGCCGTTACCCACGATGCCGTCCTTGTCTTTGAAAACCTGAGCAGGGGTTTTGGAAGGCAGGGCAAAA
GGACCTTCATGACGGAAAGACAATATACAAAGATGGAAGACTGGCTGACAGCGAAGCTCGCATACGAA
GGTCTTACGTCAAAAACCTACCTTTCAAAGACGCTGGCGCAATATACGTCAAAAACATGCTCCAACTGC
GGGTTTACTATAACGACTGCCGATTATGACGGGATGTTGGTAAGGCTTAAAAAGACTTCTGATGGATGG
GCAACTACCCTCAACAACAAAGAATTAAAAGCCGAAGGCCAGATAACGTATTATAACCGGTATAAAAG
GCAAACCGTGGAAAAAGAACTCTCCGCAGAGCTTGACAGGCTTTCAGAAGAGTCGGGCAATAATGATAT
TTCTAAGTGGACCAAGGGTCGCCGGGACGAGGCATTATTTTTGTTAAAGAAAAGATTCAGCCATCGGCC
TGTTCAGGAACAGTTTGTTTGCCTCGATTGCGGCCATGAAGTCCACGCCGATGAACAGGCAGCCTTGAAT
ATTGCAAGGTCATGGCTTTTTCTAAACTCAAATTCAACAGAATTCAAAAGTTATAAATCGGGTAAACAG
CCCTTCGTTGGTGCTTGGCAGGCCTTTTACAAAAGGAGGCTTAAAGAGGTATGGAAGCCCAACGCC
SEQ ID NO: 40
ATGAAAAGGATAAATAAAATACGAAGGAGATTGGTAAAGGATAGCAACACGAAAAAAGCCGGCAAAA
CCGGCCCTATGAAAACCTTGCTCGTTCGGGTTATGACACCTGACCTGAGAGAAAGGTTAGAGAATCTTC
GCAAAAAGCCGGAAAACATTCCTCAGCCCATTTCAAATACTTCACGTGCAAATTTAAATAAACTCCTCA
CTGACTATACGGAAATGAAGAAAGCAATCCTGCATGTTTATTGGGAAGAGTTCCAAAAAGACCCTGTCG
GATTGATGAGCAGGGTTGCACAACCAGCGCCCAAGAATATTGATCAGAGAAAATTGATTCCGGTGAAGG
ACGGAAATGAGAGACTAACAAGTTCTGGATTTGCCTGTTCTCAGTGCTGTCAACCCCTCTATGTTTATAA
GCTTGAACAAGTGAATGACAAGGGTAAGCCCCATACAAATTACTTTGGCCGTTGTAATGTCTCCGAGCA
TGAACGTTTGATATTGCTCTCGCCGCATAAACCGGAGGCAAATGACGAGCTAGTAACGTATTCGTTGGG
GAAGTTCGGTCAAAGGGCATTGGACTTTTATTCAATCCACGTAACAAGAGAATCGAACCATCCTGTAAA
GCCGCTAGAACAGATCGGTGGCAATAGCTGCGCAAGTGGTCCCGTTGGTAAGGCTTTATCTGATGCCTG
TATGGGAGCAGTAGCCAGTTTCCTTACAAAGTACCAGGACATCATCCTCGAACACCAAAAGGTTATAAA
AAAAAACGAAAAGAGATTGGCAAATCTAAAGGATATAGCAAGTGCAAACGGGCTTGCATTTCCTAAAA
TCACTCTTCCACCGCAACCGCATACAAAAGAAGGGATTGAAGCTTATAACAATGTTGTTGCTCAGATAG
TGATCTGGGTAAACCTGAATCTTTGGCAGAAACTCAAAATTGGCAGGGATGAGGCAAAGCCCTTACAGC
GGCTTAAGGGTTTTCCGTCCTTCCCTCTTGTTGAACGCCAGGCGAATGAGGTTGATTGGTGGGATATGGT
CTGTAATGTCAAAAAGTTGATTAACGAAAAGAAAGAGGACGGGAAGGTCTTCTGGCAAAATCTTGCTGG
ATATAAAAGGCAGGAAGCCTTGCTTCCATATCTTTCGTCTGAAGAAGACCGTAAAAAAGGAAAAAAGTT
TGCGCGTTATCAGTTTGGTGACCTTTTGCTTCACCTTGAAAAGAAACACGGTGAAGATTGGGGCAAAGTT
TATGATGAGGCATGGGAAAGAATAGATAAAAAAGTTGAAGGTCTGAGTAAGCACATAAAGTTGGAGGA
AGAAAGAAGGTCTGAAGATGCTCAATCAAAGGCTGCCCTCACTGATTGGCTCAGGGCAAAGGCCTCTTT
TGTTATTGAAGGGCTCAAAGAAGCTGATAAGGATGAGTTTTGCAGGTGTGAGTTAAAGCTTCAAAAGTG
GTATGGAGATTTGAGAGGAAAACCATTTGCTATAGAAGCAGAGAACAGCATTTTAGATATAAGCGGATT
TTCTAAACAGTATAATTGTGCATTTATATGGCAGAAAGACGGCGTAAAGAAGTTAAATCTTTATTTAATA
ATAAATTACTTCAAAGGTGGTAAGCTACGCTTCAAAAAAATCAAGCCAGAAGCTTTTGAAGCAAATAGG
TTTTATACAGTAATTAATAAAAAAAGCGGTGAGATTGTGCCTATGGAGGTCAACTTCAATTTTGATGACC
CGAATTTGATAATTCTGCCTTTGGCCTTTGGAAAAAGGCAGGGGAGGGAGTTTATCTGGAACGACCTATT
GAGCCTTGAGACGGGTTCATTGAAACTCGCCAATGGCAGGGTTATTGAAAAAACGCTCTATAACAGAAG
GACGAGACAGGATGAACCAGCACTTTTTGTTGCCCTGACATTTGAAAGAAGAGAGGTGCTTGACTCATC
GAATATAAAACCGATGAATCTGATAGGAATAGACCGGGGAGAAAATATCCCGGCAGTCATAGCATTAA
CAGACCCGGAAGGATGCCCCTTGTCAAGATTCAAAGATTCATTGGGCAATCCAACGCATATTTTGCGAA
TAGGAGAAAGTTATAAGGAAAAACAACGGACTATTCAGGCTGCTAAAGAAGTTGAACAAAGGCGGGCA
GGCGGATATTCGAGAAAATATGCATCAAAGGCGAAGAATCTGGCGGACGATATGGTAAGAAATACAGC
TCGTGACCTCTTATATTATGCTGTTACTCAAGATGCAATGCTCATTTTTGAAAATCTTTCCCGCGGTTTTG
GTAGACAAGGCAAGAGGACTTTTATGGCGGAAAGGCAGTACACGAGGATGGAAGACTGGCTGACTGCA
AAGCTTGCCTATGAAGGTCTGCCATCAAAAACCTATCTTTCAAAGACTCTGGCACAGTATACCTCAAAG
ACATGTTCTAATTGTGGTTTTACAATCACAAGTGCAGATTATGACAGGGTGCTCGAAAAGCTCAAGAAG
ACGGCTACTGGATGGATGACTACAATCAATGGAAAAGAGTTAAAAGTTGAAGGACAGATAACATACTA
TAACCGGTATAAAAGGCAGAATGTGGTAAAAGACCTCTCTGTAGAGCTGGATAGACTTTCGGAAGAGTC
GGTAAATAATGATATTTCTAGTTGGACAAAAGGCCGCAGTGGTGAAGCTTTATCTCTGCTAAAAAAGAG
ATTTAGTCACAGGCCGGTGCAGGAAAAGTTTGTTTGCCTGAACTGTGGTTTTGAAACCCATGCAGACGA
ACAAGCAGCACTGAATATTGCAAGGTCGTGGCTCTTTCTCCGTTCTCAAGAATATAAGAAGTATCAAAC
CAATAAAACGACCGGAAATACTGACAAAAGGGCATTTGTTGAAACATGGCAATCCTTTTACAGAAAGAA
GCTCAAAGAAGTATGGAAACCA
SEQ ID NO: 41
ATGGGTAAAATGTATTACCTTGGTTTAGACATTGGCACGAATTCCGTGGGCTACGCGGTGACCGACCCCT
CATACCACCTGCTGAAGTTTAAGGGGGAACCAATGTGGGGTGCGCACGTATTTGCCGCCGGTAATCAGA
GCGCGGAACGACGCTCGTTCCGCACATCGCGTCGTCGTTTGGACCGACGCCAACAGCGCGTTAAACTGG
TACAGGAGATTTTTGCCCCGGTGATTAGTCCGATCGACCCACGCTTCTTCATTCGTCTGCATGAATCCGC
CCTGTGGCGCGATGACGTCGCGGAGACGGATAAACATATCTTTTTCAATGATCCTACCTATACCGATAAG
GAATATTATAGCGATTACCCGACTATCCATCACCTGATCGTTGATCTGATGGAAAGCTCTGAGAAACAC
GATCCGCGGCTGGTGTACCTTGCAGTGGCGTGGTTAGTGGCACACCGTGGTCATTTTCTGAACGAGGTGG
ACAAGGATAATATTGGAGATGTGTTGTCGTTCGACGCATTTTATCCGGAGTTTCTCGCGTTCCTGTCGGA
CAACGGTGTATCACCGTGGGTGTGCGAAAGCAAAGCGCTGCAGGCGACCTTGCTGAGCCGTAACTCAGT
GAACGACAAATATAAAGCCCTTAAGTCTCTGATCTTCGGATCCCAGAAACCTGAAGATAACTTCGATGC
CAATATTTCGGAAGATGGACTCATTCAACTGCTGGCCGGCAAAAAGGTAAAAGTTAACAAACTGTTCCC
TCAGGAATCGAACGATGCATCCTTCACATTGAATGATAAAGAAGACGCGATAGAAGAAATCCTGGGTAC
GCTTACACCAGATGAATGTGAATGGATTGCGCATATACGCCGCCTTTTTGACTGGGCTATCATGAAACAT
GCTCTGAAAGATGGCAGGACTATTAGCGAGTCAAAAGTCAAACTGTATGAGCAGCACCATCACGATCTG
ACCCAACTTAAATACTTCGTGAAAACCTACCTTGCAAAAGAATACGACGATATTTTCCGCAACGTGGAT
AGCGAAACAACGAAAAACTATGTAGCGTATTCCTATCATGTGAAAGAGGTGAAAGGCACTCTGCCTAAA
AATAAGGCAACGCAAGAAGAGTTTTGTAAGTATGTCCTGGGCAAGGTTAAAAACATTGAATGCTCTGAA
GCAGACAAGGTTGACTTTGATGAGATGATTCAGCGTCTTACCGACAACTCTTTTATGCCTAAGCAGGTTT
CGGGCGAAAACCGCGTTATTCCTTATCAGTTATATTATTATGAACTGAAGACAATTCTGAATAAAGCAGC
CTCGTACCTGCCTTTCCTGACGCAGTGTGGAAAAGATGCAATTTCGAACCAGGACAAACTACTGTCGATC
ATGACGTTCCGTATTCCTTACTTCGTCGGACCCTTGCGAAAAGATAATTCGGAACATGCATGGCTCGAAC
GAAAGGCCGGTAAGATTTATCCGTGGAACTTTAACGACAAAGTGGACTTGGATAAATCAGAAGAAGCGT
TCATTCGCCGAATGACCAATACCTGTACCTATTATCCCGGCGAAGATGTTTTACCGTTGGATTCGCTGAT
CTATGAGAAATTTATGATTTTAAATGAAATCAATAATATTCGTATTGACGGCTACCCGATTAGTGTTGAC
GTTAAACAGCAGGTTTTTGGCTTGTTCGAAAAAAAACGACGCGTAACCGTGAAAGATATTCAGAACCTG
CTGCTGTCTCTCGGAGCTCTGGACAAACACGGGAAGCTGACAGGCATCGATACCACTATCCACTCAAAC
TATAATACGTATCACCATTTTAAATCTCTCATGGAACGCGGCGTCCTGACCCGGGATGACGTGGAACGC
ATCGTTGAAAGGATGACCTACAGCGACGATACTAAGCGTGTGCGTCTGTGGCTGAATAACAACTATGGT
ACTTTAACCGCCGACGATGTGAAACACATTTCGCGTCTGCGCAAACACGATTTTGGCCGTTTATCCAAAA
TGTTCTTAACAGGTCTGAAGGGTGTCCATAAGGAGACCGGTGAACGTGCCTCCATACTGGATTTCATGTG
GAACACGAACGATAACCTGATGCAGCTCCTTTCCGAATGCTACACGTTCAGTGATGAAATCACAAAGCT
GCAAGAGGCGTATTATGCAAAAGCCCAGTTGTCTTTAAACGATTTTTTAGACTCGATGTACATCTCTAAC
GCGGTGAAACGTCCGATTTACAGAACTCTGGCAGTGGTGAACGATATTCGAAAAGCATGTGGGACGGCC
CCTAAACGCATTTTCATCGAAATGGCTCGTGATGGTGAATCAAAAAAAAAGAGAAGTGTTACACGTCGC
GAGCAGATCAAAAACCTGTACCGCTCGATTCGTAAAGATTTCCAGCAGGAAGTTGATTTTCTGGAAAAG
ATCCTGGAAAATAAATCTGATGGTCAACTTCAGTCAGATGCTTTGTATCTTTACTTTGCACAATTAGGGC
GCGATATGTACACGGGCGATCCAATAAAGCTGGAGCACATCAAAGATCAGAGTTTCTATAACATAGACC
ATATTTACCCGCAGTCTATGGTGAAAGACGATTCCCTAGATAACAAAGTGCTGGTGCAAAGCGAAATTA
ACGGCGAGAAAAGCTCGCGATACCCTTTGGACGCCGCGATCCGCAATAAAATGAAGCCCCTTTGGGACG
CTTACTATAATCATGGCCTGATCTCCTTAAAGAAATACCAGCGTCTAACGCGCTCGACCCCGTTTACCGA
TGATGAAAAATGGGACTTTATTAATCGCCAGTTAGTGGAAACCCGTCAATCTACCAAAGCGCTGGCCAT
TTTGTTGAAGCGTAAGTTTCCAGACACCGAAATTGTGTATTCGAAGGCGGGGTTATCGTCCGACTTCAGA
CATGAATTCGGCCTTGTAAAAAGTCGCAATATTAATGATTTGCACCACGCTAAAGACGCATTCTTGGCTA
TCGTTACCGGCAATGTGTACCATGAAAGATTCAATCGCAGATGGTTTATGGTGAACCAGCCGTACTCAGT
TAAAACTAAAACTCTTTTTACCCACAGCATAAAGAATGGCAACTTCGTTGCCTGGAACGGCGAAGAAGA
TCTCGGTCGTATTGTAAAAATGCTGAAGCAAAACAAAAATACCATTCACTTCACGCGCTTCTCCTTCGAT
CGCAAAGAAGGATTATTTGATATCCAACCTCTGAAAGCCAGCACCGGCTTAGTCCCACGAAAAGCCGGT
CTGGATGTCGTTAAATACGGCGGATATGACAAATCTACCGCGGCCTATTACCTGCTGGTGAGGTTCACGC
TCGAGGACAAGAAAACCCAGCACAAGCTGATGATGATTCCTGTAGAAGGCCTGTACAAGGCTCGCATTG
ATCATGACAAGGAATTTCTTACCGATTATGCGCAAACGACTATAAGCGAAATCCTACAGAAAGATAAAC
AGAAAGTGATCAATATTATGTTTCCAATGGGTACGAGGCATATAAAACTCAATTCAATGATTAGTATCG
ATGGCTTCTATCTTAGTATCGGCGGAAAGTCCTCTAAAGGTAAGTCAGTTCTATGTCACGCAATGGTTCC
ACTGATCGTCCCTCACAAAATCGAATGTTACATTAAAGCAATGGAAAGCTTCGCCCGGAAGTTTAAAGA
AAACAACAAGCTGCGCATCGTAGAAAAATTCGATAAAATCACCGTTGAAGACAACCTGAATCTCTACGA
GCTCTTTCTCCAAAAACTGCAGCATAATCCCTATAATAAGTTTTTTTCGACACAGTTTGACGTACTGACG
AACGGCCGTTCTACTTTCACAAAACTGTCGCCGGAGGAACAGGTACAGACGCTCTTGAACATTTTAAGT
ATCTTTAAAACATGCCGCAGTTCGGGTTGCGACCTGAAATCCATCAACGGCAGTGCCCAGGCAGCGCGC
ATCATGATTAGCGCTGACTTAACTGGACTGTCGAAAAAATATTCAGATATTAGGTTGGTTGAACAGTCA
GCTTCTGGTTTGTTCGTATCCAAAAGTCAGAACTTACTGGAGTATCTCTAA
SEQ ID NO: 42
ATGTCATCGCTCACGAAATTCACTAACAAATACTCTAAACAGCTCACCATTAAGAATGAACTCATCCCA
GTTGGCAAAACACTGGAGAACATCAAAGAGAATGGTCTGATAGATGGCGACGAACAGCTGAATGAGAA
TTATCAGAAGGCGAAAATTATTGTGGATGATTTTCTGCGGGACTTCATTAATAAAGCACTGAATAATACG
CAGATCGGGAACTGGCGCGAACTGGCGGATGCCCTTAATAAAGAGGATGAAGATAACATCGAGAAATT
GCAGGATAAAATTCGGGGAATCATTGTATCCAAATTTGAAACGTTTGATCTGTTTAGCAGCTATTCTATT
AAGAAAGATGAAAAGATTATTGACGACGACAATGATGTTGAAGAAGAGGAACTGGATCTGGGCAAGAA
GACCAGCTCATTTAAATACATATTTAAAAAAAACCTGTTTAAGTTAGTGTTGCCATCCTACCTGAAAACC
ACAAACCAGGACAAGCTGAAGATTATTAGCTCGTTTGATAATTTTTCAACGTACTTCCGCGGGTTCTTTG
AAAACCGGAAAAACATTTTTACCAAGAAACCGATCTCCACAAGTATTGCGTATCGCATTGTTCATGATA
ACTTCCCGAAATTCCTTGATAACATTCGTTGTTTTAATGTGTGGCAGACGGAATGCCCGCAACTAATCGT
GAAAGCAGATAACTATCTGAAAAGCAAAAATGTTATAGCGAAAGATAAAAGTTTGGCAAACTATTTTAC
CGTGGGCGCGTATGACTATTTCCTGTCTCAGAATGGTATAGATTTTTACAACAATATTATAGGTGGACTG
CCAGCGTTCGCCGGCCATGAGAAAATCCAAGGTCTCAATGAATTCATCAATCAAGAGTGCCAAAAAGAC
AGCGAGCTGAAAAGTAAGCTGAAAAACCGTCACGCGTTCAAAATGGCGGTACTGTTCAAACAGATACTC
AGCGATCGTGAAAAAAGTTTTGTAATTGATGAGTTCGAGTCGGATGCTCAAGTTATTGACGCCGTTAAA
AACTTTTACGCCGAACAGTGCAAAGATAACAATGTTATTTTTAACTTATTAAATCTTATCAAGAATATCG
CTTTCTTAAGTGATGACGAACTGGACGGCATATTCATTGAAGGGAAATACCTGTCGAGCGTTAGTCAAA
AACTCTATAGCGATTGGTCAAAATTACGTAACGACATTGAGGATTCGGCTAACTCTAAACAAGGCAATA
AAGAGCTGGCCAAGAAGATCAAAACCAACAAAGGGGATGTAGAAAAAGCGATCTCGAAATATGAGTTC
TCGCTGTCGGAACTGAACTCGATTGTACATGATAACACCAAGTTTTCTGACCTCCTTAGTTGTACACTGC
ATAAGGTGGCTTCTGAGAAACTGGTGAAGGTCAATGAAGGCGACTGGCCGAAACATCTCAAGAATAAT
GAAGAGAAACAAAAAATCAAAGAGCCGCTTGATGCTCTGCTGGAGATCTATAATACACTTCTGATTTTT
AACTGCAAAAGCTTCAATAAAAACGGCAACTTCTATGTCGACTATGATCGTTGCATCAATGAACTGAGT
TCGGTCGTGTATCTGTATAATAAAACACGTAACTATTGCACTAAAAAACCCTATAACACGGACAAGTTC
AAACTCAATTTTAACAGTCCGCAGCTCGGTGAAGGCTTTTCCAAGTCGAAAGAAAATGACTGTCTGACT
CTTTTGTTTAAAAAAGACGACAACTATTATGTAGGCATTATCCGCAAAGGTGCAAAAATCAATTTTGATG
ATACACAAGCAATCGCCGATAACACCGACAATTGCATCTTTAAAATGAATTATTTCCTACTTAAAGACGC
AAAAAAATTTATCCCGAAATGTAGCATTCAGCTGAAAGAAGTCAAGGCCCATTTTAAGAAATCTGAAGA
TGATTACATTTTGTCTGATAAAGAGAAATTTGCTAGCCCGCTGGTCATTAAAAAGAGCACATTTTTGCTG
GCAACTGCACATGTGAAAGGGAAAAAAGGCAATATCAAGAAATTTCAGAAAGAATATTCGAAAGAAAA
CCCCACTGAGTATCGCAATTCTTTAAACGAATGGATTGCTTTTTGTAAAGAGTTCTTAAAAACTTATAAA
GCGGCTACCATTTTTGATATAACCACATTGAAAAAGGCAGAGGAATATGCTGATATTGTAGAATTCTAC
AAGGATGTCGATAATCTGTGCTACAAACTGGAGTTCTGCCCGATTAAAACCTCGTTTATAGAAAACCTG
ATAGATAACGGCGACCTGTATCTGTTTCGCATCAATAACAAAGACTTCAGCAGTAAATCGACCGGCACC
AAGAACCTTCATACGTTATATTTACAAGCTATATTCGATGAACGTAATCTGAACAATCCGACAATTATGC
TGAATGGGGGAGCAGAACTGTTCTATCGTAAAGAAAGTATTGAGCAGAAAAACCGTATCACACACAAA
GCCGGTTCAATTCTCGTGAATAAGGTGTGTAAAGACGGTACAAGCCTGGATGATAAGATACGTAATGAA
ATTTATCAATATGAGAATAAATTTATTGATACCCTGTCTGATGAAGCTAAAAAGGTGTTACCGAATGTCA
TTAAAAAGGAAGCTACCCATGACATTACAAAAGATAAACGTTTCACTAGTGACAAATTCTTCTTTCACTG
CCCCCTGACAATTAATTATAAGGAAGGCGATACCAAGCAGTTCAATAACGAAGTGCTGAGTTTTCTGCG
TGGAAATCCTGACATCAACATTATCGGCATTGACCGCGGAGAGCGTAATTTAATCTATGTAACGGTTATA
AACCAGAAAGGCGAGATTCTGGATTCGGTTTCATTCAATACCGTGACCAACAAGAGTTCAAAAATCGAG
CAGACAGTCGATTATGAAGAGAAATTGGCAGTCCGCGAGAAAGAGAGGATTGAAGCAAAACGTTCCTG
GGACTCTATCTCAAAAATTGCGACACTAAAGGAAGGTTATCTGAGCGCAATAGTTCACGAGATCTGTCT
GTTAATGATTAAACACAACGCGATCGTTGTCTTAGAGAATCTTAATGCAGGCTTTAAGCGTATTCGTGGC
GGTTTATCAGAAAAAAGTGTTTATCAAAAATTCGAAAAAATGTTGATTAACAAACTGAACTATTTTGTCA
GCAAGAAGGAATCCGACTGGAATAAACCGTCTGGTCTGCTGAATGGACTGCAGCTTTCGGATCAGTTTG
AAAGCTTCGAAAAACTGGGTATTCAGTCTGGTTTTATTTTTTACGTGCCGGCTGCATATACCTCAAAGAT
TGATCCGACCACGGGCTTCGCCAATGTTCTGAATCTGTCGAAGGTACGCAATGTTGATGCGATCAAAAG
CTTTTTTTCTAACTTCAACGAAATTAGTTATAGCAAGAAAGAAGCCCTTTTCAAATTCTCATTCGATCTGG
ATTCACTGAGTAAGAAAGGCTTTAGTAGCTTTGTGAAATTTAGTAAGAGTAAATGGAACGTCTACACCTT
TGGAGAACGTATCATAAAGCCAAAGAATAAGCAAGGTTATCGGGAGGACAAAAGAATCAACTTGACCT
TCGAGATGAAGAAGTTACTTAACGAGTATAAGGTTTCTTTTGATCTTGAAAATAACTTGATTCCGAATCT
CACGAGTGCCAACCTGAAGGATACTTTTTGGAAAGAGCTATTCTTTATCTTCAAGACTACGCTGCAGCTC
CGTAACAGCGTTACTAACGGTAAAGAAGATGTGCTCATCTCTCCGGTCAAAAATGCGAAGGGTGAATTC
TTCGTTTCGGGAACGCATAACAAGACTCTTCCGCAAGATTGCGATGCGAACGGTGCATACCATATTGCGT
TGAAAGGTCTGATGATACTCGAACGTAACAACCTTGTACGTGAGGAGAAAGATACGAAAAAGATTATG
GCGATTTCAAACGTGGATTGGTTCGAGTACGTGCAGAAACGTAGAGGCGTTCTGTAA
SEQ ID NO: 43
ATGAACAACTACGACGAATTCACCAAACTGTACCCGATCCAGAAAACCATCCGTTTCGAACTGAAACCG
CAGGGTCGTACCATGGAACACCTGGAAACCTTCAACTTCTTCGAAGAAGACCGTGACCGTGCGGAAAAA
TACAAAATCCTGAAAGAAGCGATCGACGAATACCACAAAAAATTCATCGACGAACACCTGACCAACAT
GTCTCTGGACTGGAACTCTCTGAAACAGATCTCTGAAAAATACTACAAATCTCGTGAAGAAAAAGACAA
AAAAGTTTTCCTGTCTGAACAGAAACGTATGCGTCAGGAAATCGTTTCTGAATTCAAAAAAGACGACCG
TTTCAAAGACCTGTTCTCTAAAAAACTGTTCTCTGAACTGCTGAAAGAAGAAATCTACAAAAAAGGTAA
CCACCAGGAAATCGACGCGCTGAAATCTTTCGACAAATTCTCTGGTTACTTCATCGGTCTGCACGAAAAC
CGTAAAAACATGTACTCTGACGGTGACGAAATCACCGCGATCTCTAACCGTATCGTTAACGAAAACTTC
CCGAAATTCCTGGACAACCTGCAGAAATACCAGGAAGCGCGTAAAAAATACCCGGAATGGATCATCAA
AGCGGAATCTGCGCTGGTTGCGCACAACATCAAAATGGACGAAGTTTTCTCTCTGGAATACTTCAACAA
AGTTCTGAACCAGGAAGGTATCCAGCGTTACAACCTGGCGCTGGGTGGTTACGTTACCAAATCTGGTGA
AAAAATGATGGGTCTGAACGACGCGCTGAACCTGGCGCACCAGTCTGAAAAATCTTCTAAAGGTCGTAT
CCACATGACCCCGCTGTTCAAACAGATCCTGTCTGAAAAAGAATCTTTCTCTTACATCCCGGACGTTTTC
ACCGAAGACTCTCAGCTGCTGCCGTCTATCGGTGGTTTCTTCGCGCAGATCGAAAACGACAAAGACGGT
AACATCTTCGACCGTGCGCTGGAACTGATCTCTTCTTACGCGGAATACGACACCGAACGTATCTACATCC
GTCAGGCGGACATCAACCGTGTTTCTAACGTTATCTTCGGTGAATGGGGTACCCTGGGTGGTCTGATGCG
TGAATACAAAGCGGACTCTATCAACGACATCAACCTGGAACGTACCTGCAAAAAAGTTGACAAATGGCT
GGACTCTAAAGAATTCGCGCTGTCTGACGTTCTGGAAGCGATCAAACGTACCGGTAACAACGACGCGTT
CAACGAATACATCTCTAAAATGCGTACCGCGCGTGAAAAAATCGACGCGGCGCGTAAAGAAATGAAAT
TCATCTCTGAAAAAATCTCTGGTGACGAAGAATCTATCCACATCATCAAAACCCTGCTGGACTCTGTTCA
GCAGTTCCTGCACTTCTTCAACCTGTTCAAAGCGCGTCAGGACATCCCGCTGGACGGTGCGTTCTACGCG
GAATTCGACGAAGTTCACTCTAAACTGTTCGCGATCGTTCCGCTGTACAACAAAGTTCGTAACTACCTGA
CCAAAAACAACCTGAACACCAAAAAAATCAAACTGAACTTCAAAAACCCGACCCTGGCGAACGGTTGG
GACCAGAACAAAGTTTACGACTACGCGTCTCTGATCTTCCTGCGTGACGGTAACTACTACCTGGGTATCA
TCAACCCGAAACGTAAAAAAAACATCAAATTCGAACAGGGTTCTGGTAACGGTCCGTTCTACCGTAAAA
TGGTTTACAAACAGATCCCGGGTCCGAACAAAAACCTGCCGCGTGTTTTCCTGACCTCTACCAAAGGTA
AAAAAGAATACAAACCGTCTAAAGAAATCATCGAAGGTTACGAAGCGGACAAACACATCCGTGGTGAC
AAATTCGACCTGGACTTCTGCCACAAACTGATCGACTTCTTCAAAGAATCTATCGAAAAACACAAAGAC
TGGTCTAAATTCAACTTCTACTTCTCTCCGACCGAATCTTACGGTGACATCTCTGAATTCTACCTGGACGT
TGAAAAACAGGGTTACCGTATGCACTTCGAAAACATCTCTGCGGAAACCATCGACGAATACGTTGAAAA
AGGTGACCTGTTCCTGTTCCAGATCTACAACAAAGACTTCGTTAAAGCGGCGACCGGTAAAAAAGACAT
GCACACCATCTACTGGAACGCGGCGTTCTCTCCGGAAAACCTGCAGGACGTTGTTGTTAAACTGAACGG
TGAAGCGGAACTGTTCTACCGTGACAAATCTGACATCAAAGAAATCGTTCACCGTGAAGGTGAAATCCT
GGTTAACCGTACCTACAACGGTCGTACCCCGGTTCCGGACAAAATCCACAAAAAACTGACCGACTACCA
CAACGGTCGTACCAAAGACCTGGGTGAAGCGAAAGAATACCTGGACAAAGTTCGTTACTTCAAAGCGCA
CTACGACATCACCAAAGACCGTCGTTACCTGAACGACAAAATCTACTTCCACGTTCCGCTGACCCTGAAC
TTCAAAGCGAACGGTAAAAAAAACCTGAACAAAATGGTTATCGAAAAATTCCTGTCTGACGAAAAAGC
GCACATCATCGGTATCGACCGTGGTGAACGTAACCTGCTGTACTACTCTATCATCGACCGTTCTGGTAAA
ATCATCGACCAGCAGTCTCTGAACGTTATCGACGGTTTCGACTACCGTGAAAAACTGAACCAGCGTGAA
ATCGAAATGAAAGACGCGCGTCAGTCTTGGAACGCGATCGGTAAAATCAAAGACCTGAAAGAAGGTTA
CCTGTCTAAAGCGGTTCACGAAATCACCAAAATGGCGATCCAGTACAACGCGATCGTTGTTATGGAAGA
ACTGAACTACGGTTTCAAACGTGGTCGTTTCAAAGTTGAAAAACAGATCTACCAGAAATTCGAAAACAT
GCTGATCGACAAAATGAACTACCTGGTTTTCAAAGACGCGCCGGACGAATCTCCGGGTGGTGTTCTGAA
CGCGTACCAGCTGACCAACCCGCTGGAATCTTTCGCGAAACTGGGTAAACAGACCGGTATCCTGTTCTA
CGTTCCGGCGGCGTACACCTCTAAAATCGACCCGACCACCGGTTTCGTTAACCTGTTCAACACCTCTTCT
AAAACCAACGCGCAGGAACGTAAAGAATTCCTGCAGAAATTCGAATCTATCTCTTACTCTGCGAAAGAC
GGTGGTATCTTCGCGTTCGCGTTCGACTACCGTAAATTCGGTACCTCTAAAACCGACCACAAAAACGTTT
GGACCGCGTACACCAACGGTGAACGTATGCGTTACATCAAAGAAAAAAAACGTAACGAACTGTTCGAC
CCGTCTAAAGAAATCAAAGAAGCGCTGACCTCTTCTGGTATCAAATACGACGGTGGTCAGAACATCCTG
CCGGACATCCTGCGTTCTAACAACAACGGTCTGATCTACACCATGTACTCTTCTTTCATCGCGGCGATCC
AGATGCGTGTTTACGACGGTAAAGAAGACTACATCATCTCTCCGATCAAAAACTCTAAAGGTGAATTCT
TCCGTACCGACCCGAAACGTCGTGAACTGCCGATCGACGCGGACGCGAACGGTGCGTACAACATCGCGC
TGCGTGGTGAACTGACCATGCGTGCGATCGCGGAAAAATTCGACCCGGACTCTGAAAAAATGGCGAAAC
TGGAACTGAAACACAAAGACTGGTTCGAATTCATGCAGACCCGTGGTGACTAA
SEQ ID NO: 44
ATGACTAAAACATTTGATTCAGAGTTTTTTAATTTGTACTCGCTGCAAAAAACGGTACGCTTTGAGTTAA
AACCCGTGGGAGAAACCGCGTCATTTGTGGAAGACTTTAAAAACGAGGGCTTGAAACGTGTTGTGAGCG
AAGATGAAAGGCGAGCCGTCGATTACCAGAAAGTTAAGGAAATAATTGACGATTACCATCGGGATTTCA
TTGAAGAAAGTTTAAATTATTTTCCGGAACAGGTGAGTAAAGATGCTCTTGAGCAGGCGTTTCATCTTTA
TCAGAAACTGAAGGCAGCAAAAGTTGAGGAAAGGGAAAAAGCGCTGAAAGAATGGGAAGCGCTGCAG
AAAAAGCTACGTGAAAAAGTGGTGAAATGCTTCTCGGACTCGAATAAAGCCCGCTTCTCAAGGATTGAT
AAAAAGGAACTGATTAAGGAAGACCTGATAAATTGGTTGGTCGCCCAGAATCGCGAGGATGATATCCCT
ACGGTCGAAACGTTTAACAACTTCACCACATATTTTACCGGCTTCCATGAGAATCGTAAAAATATTTACT
CCAAAGATGATCACGCCACCGCTATTAGCTTTCGCCTTATTCATGAAAATCTTCCAAAGTTTTTTGACAA
CGTGATTAGCTTCAATAAGTTGAAAGAGGGTTTCCCTGAATTAAAATTTGATAAAGTGAAAGAGGATTT
AGAAGTAGATTATGATCTGAAGCATGCGTTTGAAATAGAATATTTCGTTAACTTCGTGACCCAAGCGGG
CATAGATCAGTATAATTATCTGTTAGGAGGGAAAACCCTGGAGGACGGGACGAAAAAACAAGGGATGA
ATGAGCAAATTAATCTGTTCAAACAACAGCAAACGCGAGATAAAGCGCGTCAGATTCCCAAACTGATCC
CCCTGTTCAAACAGATTCTTAGCGAAAGGACTGAAAGCCAGTCCTTTATTCCTAAACAATTTGAAAGTGA
TCAGGAGTTGTTCGATTCACTGCAGAAGTTACATAATAACTGCCAGGATAAATTCACCGTGCTGCAACA
AGCCATTCTCGGTCTGGCAGAGGCGGATCTTAAGAAGGTCTTCATCAAAACCTCTGATTTAAATGCCTTA
TCTAACACCATTTTCGGGAATTACAGCGTCTTTTCCGATGCACTGAACCTGTATAAAGAAAGCCTGAAAA
CGAAAAAAGCGCAGGAGGCTTTTGAGAAACTACCGGCCCATTCTATTCACGACCTCATTCAATACTTGG
AACAGTTCAATTCCAGCCTGGACGCGGAAAAACAACAGAGCACCGACACCGTCCTGAACTACTTCATCA
AGACCGATGAATTATATTCTCGCTTCATTAAATCCACTAGCGAGGCTTTCACTCAGGTGCAGCCTTTGTT
CGAACTGGAAGCCCTGTCATCTAAGCGCCGCCCACCGGAATCGGAAGATGAAGGGGCAAAAGGGCAGG
AAGGCTTCGAGCAGATCAAGCGTATTAAAGCTTACCTGGATACGCTTATGGAAGCGGTACACTTTGCAA
AGCCGTTGTATCTTGTTAAGGGTCGTAAAATGATCGAAGGGCTCGATAAAGACCAGTCCTTTTATGAAG
CGTTTGAAATGGCGTACCAAGAACTTGAATCGTTAATCATTCCTATCTATAACAAAGCGCGGAGCTATCT
GTCGCGGAAACCTTTCAAGGCCGATAAATTCAAGATTAATTTTGACAACAACACGCTACTGAGCGGATG
GGATGCGAACAAGGAAACTGCTAACGCGTCCATTCTGTTTAAGAAAGACGGGTTATATTACCTTGGAAT
TATGCCGAAAGGTAAGACCTTTCTCTTTGACTACTTTGTATCGAGCGAGGATTCAGAGAAACTGAAACA
GCGTCGCCAGAAGACCGCCGAAGAAGCTCTGGCGCAGGATGGTGAAAGTTACTTCGAAAAAATTCGTTA
TAAACTGTTACCAGGGGCTTCAAAGATGTTACCGAAAGTCTTTTTTAGCAACAAAAATATTGGCTTTTAC
AACCCGTCGGATGACATTTTACGCATTCGCAACACAGCCTCTCACACCAAAAACGGGACCCCTCAGAAA
GGCCACTCAAAAGTTGAGTTTAACCTGAATGATTGTCATAAGATGATTGATTTCTTCAAATCATCAATTC
AGAAACACCCGGAATGGGGGTCTTTTGGCTTTACGTTTTCTGATACCAGTGATTTTGAAGACATGAGTGC
CTTCTACCGGGAAGTAGAAAACCAGGGTTACGTAATTAGCTTTGACAAAATCAAAGAGACCTATATACA
GAGCCAGGTGGAACAGGGTAATCTCTACTTATTCCAGATTTATAACAAGGATTTCTCGCCCTACAGCAA
AGGCAAACCAAACCTGCATACTCTGTACTGGAAAGCCCTGTTTGAAGAAGCGAACCTGAATAACGTAGT
GGCGAAGTTGAACGGTGAAGCGGAAATCTTCTTCCGTCGTCACTCCATTAAGGCCTCTGATAAAGTTGTC
CATCCGGCAAATCAGGCCATTGATAATAAGAATCCACACACGGAAAAAACGCAGTCAACCTTTGAATAT
GACCTCGTTAAAGACAAACGCTACACGCAAGATAAGTTCTTTTTCCACGTCCCAATCAGCCTCAACTTTA
AAGCACAAGGGGTTTCAAAGTTTAATGATAAAGTCAATGGGTTCCTCAAGGGCAACCCGGATGTCAACA
TTATAGGTATAGACAGGGGCGAACGCCATCTGCTTTACTTTACCGTAGTGAATCAGAAAGGTGAAATAC
TGGTTCAGGAATCATTAAATACCTTGATGTCGGACAAAGGGCACGTTAATGATTACCAGCAGAAACTGG
ATAAAAAAGAACAGGAACGTGATGCTGCGCGTAAATCGTGGACCACGGTTGAGAACATTAAAGAGCTG
AAAGAGGGGTATCTAAGCCATGTGGTACACAAACTGGCGCACCTCATCATTAAATATAACGCAATAGTC
TGCCTAGAAGACTTGAATTTTGGCTTTAAACGCGGCCGCTTCAAAGTGGAAAAACAAGTTTATCAAAAA
TTTGAAAAGGCGCTTATAGATAAACTGAATTATCTGGTTTTTAAAGAAAAGGAACTTGGTGAGGTAGGG
CACTACTTGACAGCTTATCAACTGACGGCCCCGTTCGAATCATTCAAAAAACTGGGCAAACAGTCTGGC
ATTCTGTTTTACGTGCCGGCAGATTATACTTCAAAAATCGATCCAACAACTGGCTTTGTGAACTTCCTGG
ACCTGAGATATCAGTCTGTAGAAAAAGCTAAACAACTTCTTAGCGATTTTAATGCCATTCGTTTTAACAG
CGTTCAGAATTACTTTGAATTCGAAATTGACTATAAAAAACTTACTCCGAAACGTAAAGTCGGAACCCA
AAGTAAATGGGTAATTTGTACGTATGGCGATGTCAGGTATCAGAACCGTCGGAATCAAAAAGGTCATTG
GGAGACCGAAGAAGTGAACGTGACCGAAAAGCTGAAGGCTCTGTTCGCCAGCGATTCAAAAACTACAA
CTGTGATCGATTACGCAAATGATGATAACCTGATAGATGTGATTTTAGAGCAGGATAAAGCCAGCTTTTT
TAAAGAACTGTTGTGGCTCCTGAAACTTACGATGACCTTACGACATTCCAAGATCAAATCGGAAGATGA
TTTTATTCTGTCACCGGTCAAGAATGAGCAGGGTGAATTCTATGATAGTAGGAAAGCCGGCGAAGTGTG
GCCGAAAGACGCCGACGCCAATGGCGCCTATCATATCGCGCTCAAAGGGCTTTGGAATTTGCAGCAGAT
TAACCAGTGGGAAAAAGGTAAAACCCTGAATCTGGCTATCAAAAACCAGGATTGGTTTAGCTTTATCCA
AGAGAAACCGTATCAGGAATGA
SEQ ID NO: 45
ATGCATACAGGCGGTCTTCTTAGTATGGACGCGAAAGAGTTCACAGGTCAGTATCCGTTGTCGAAAACA
TTACGATTCGAACTTCGGCCCATCGGCCGCACGTGGGATAACCTGGAGGCCTCAGGCTACTTAGCGGAA
GACCGCCATCGTGCCGAATGTTATCCTCGTGCGAAAGAGTTATTGGATGACAACCATCGTGCCTTCCTGA
ATCGTGTGTTGCCACAAATCGATATGGATTGGCACCCGATTGCGGAGGCCTTTTGTAAGGTACATAAAA
ACCCTGGTAATAAAGAACTTGCCCAGGATTACAACCTTCAGTTGTCAAAGCGCCGTAAGGAGATCAGCG
CATATCTTCAGGATGCAGATGGCTATAAAGGCCTGTTCGCGAAGCCCGCCTTAGACGAAGCTATGAAAA
TTGCGAAAGAAAACGGGAACGAAAGTGATATTGAGGTTCTCGAAGCGTTTAACGGTTTTAGCGTATACT
TCACCGGTTATCATGAGTCACGCGAGAACATTTATAGCGATGAGGATATGGTGAGCGTAGCCTACCGAA
TTACTGAGGATAATTTCCCGCGCTTTGTCTCAAACGCTTTGATCTTTGATAAATTAAACGAAAGCCATCC
GGATATTATCTCTGAAGTATCGGGCAATCTTGGAGTTGATGACATTGGTAAGTACTTTGACGTGTCGAAC
TATAACAATTTTCTTTCCCAGGCCGGTATAGATGACTACAATCACATTATTGGCGGCCATACAACCGAAG
ACGGACTGATACAAGCGTTTAATGTCGTATTGAACTTACGTCACCAAAAAGACCCTGGCTTTGAAAAAA
TTCAGTTCAAACAGCTCTACAAACAAATCCTGAGCGTGCGTACCAGCAAAAGCTACATCCCGAAACAGT
TTGACAACTCTAAGGAGATGGTTGACTGCATTTGCGATTATGTCAGCAAAATAGAGAAATCCGAAACAG
TAGAACGGGCCCTGAAACTAGTCCGTAATATCAGTTCTTTCGACTTGCGCGGGATCTTTGTCAATAAAAA
GAACTTGCGCATACTGAGCAACAAACTGATAGGAGATTGGGACGCGATCGAAACCGCATTGATGCATAG
TTCTTCATCAGAAAACGATAAGAAAAGCGTATATGATAGCGCGGAGGCTTTTACGTTGGATGACATCTTT
TCAAGCGTGAAAAAATTTTCTGATGCCTCTGCCGAAGATATTGGCAACAGGGCGGAAGACATCTGTAGA
GTGATAAGTGAGACGGCCCCTTTTATCAACGATCTGCGAGCGGTGGACCTGGATAGCCTGAACGACGAT
GGTTATGAAGCGGCCGTCTCAAAAATTCGGGAGTCGCTGGAGCCTTATATGGATCTTTTCCATGAACTGG
AAATTTTCTCGGTTGGCGATGAGTTCCCAAAATGCGCAGCATTTTACAGCGAACTGGAGGAAGTCAGCG
AACAGCTGATCGAAATTATTCCGTTATTCAACAAGGCGCGTTCGTTCTGCACCCGGAAACGCTATAGCAC
CGATAAGATTAAAGTGAACTTAAAATTCCCGACCTTGGCGGACGGGTGGGACCTGAACAAAGAGAGAG
ACAACAAAGCCGCGATTCTGCGGAAAGACGGTAAGTATTATCTGGCAATTCTGGATATGAAGAAAGATC
TGTCAAGCATTAGGACCAGCGACGAAGATGAATCCAGCTTCGAAAAGATGGAGTATAAACTGTTACCGA
GTCCAGTAAAAATGCTGCCAAAGATATTCGTAAAATCGAAAGCCGCTAAGGAAAAATATGGCCTGACA
GATCGTATGCTTGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCGTTTGATCTTGGCTTTTGCC
ATGAACTCATTGATTATTACAAGCGTTGTATCGCGGAGTACCCAGGCTGGGATGTGTTCGATTTCAAGTT
TCGCGAAACTTCCGATTATGGGTCCATGAAAGAGTTCAATGAAGATGTGGCCGGAGCCGGTTACTATAT
GAGTCTGAGAAAAATTCCGTGCAGCGAAGTGTACCGTCTGTTAGACGAGAAATCGATTTATCTATTTCA
AATTTATAACAAAGATTACTCTGAAAATGCACATGGTAATAAGAACATGCATACCATGTACTGGGAGGG
TCTCTTTTCCCCGCAAAACCTGGAGTCGCCCGTTTTCAAGTTGTCGGGTGGGGCAGAACTTTTCTTTCGA
AAATCCTCAATCCCTAACGATGCCAAAACAGTACACCCGAAAGGCTCAGTGCTGGTTCCACGTAATGAT
GTTAACGGTCGGCGTATTCCAGATTCAATCTACCGCGAACTGACACGCTATTTTAACCGTGGCGATTGCC
GAATCAGTGACGAAGCCAAAAGTTATCTTGACAAGGTTAAGACTAAAAAAGCGGACCATGACATTGTG
AAAGATCGCCGCTTTACCGTGGATAAAATGATGTTCCACGTCCCGATTGCGATGAACTTTAAGGCGATC
AGTAAACCGAACTTAAACAAAAAAGTCATTGATGGCATCATTGATGATCAGGATCTGAAAATCATTGGT
ATTGATCGTGGCGAGCGGAACTTAATTTACGTCACGATGGTTGACAGAAAAGGGAATATCTTATATCAG
GATTCTCTTAACATCCTCAATGGCTACGACTATCGTAAAGCTCTGGATGTGCGCGAATATGACAACAAG
GAAGCGCGTCGTAACTGGACTAAAGTGGAGGGCATTCGCAAAATGAAGGAAGGCTATCTGTCATTAGCG
GTCTCGAAATTAGCGGATATGATTATCGAAAATAACGCCATCATCGTTATGGAGGACCTGAACCACGGA
TTCAAAGCGGGCCGCTCAAAGATTGAAAAACAAGTTTATCAGAAATTTGAGAGTATGCTGATTAACAAA
CTGGGCTATATGGTGTTAAAAGACAAGTCAATTGACCAATCAGGTGGCGCGCTGCATGGATACCAGCTG
GCGAACCATGTTACCACCTTAGCATCAGTTGGAAAGCAGTGTGGGGTTATCTTTTATATACCGGCAGCGT
TCACTAGTAAAATAGATCCGACCACTGGTTTCGCCGATCTCTTTGCCCTGAGTAACGTTAAAAACGTAGC
GAGCATGCGTGAATTCTTTTCCAAAATGAAATCTGTCATTTATGATAAAGCTGAAGGCAAATTCGCATTC
ACCTTTGATTACTTGGATTACAACGTGAAGAGCGAATGTGGTCGTACGCTGTGGACCGTTTACACCGTTG
GTGAGCGCTTCACCTATTCCCGTGTGAACCGCGAATATGTACGTAAAGTCCCCACCGATATTATCTATGA
TGCCCTCCAGAAAGCAGGCATTAGCGTCGAAGGAGACTTAAGGGACAGAATTGCCGAAAGCGATGGCG
ATACGCTGAAGTCTATTTTTTACGCATTCAAATACGCGCTAGATATGCGCGTTGAGAATCGCGAGGAAG
ACTACATTCAATCACCTGTGAAAAATGCCTCTGGGGAATTTTTTTGTTCAAAAAATGCTGGTAAAAGCCT
CCCACAAGATAGCGATGCAAACGGTGCATATAACATTGCCCTGAAAGGTATTCTTCAATTACGCATGCT
GTCTGAGCAGTACGACCCCAACGCGGAATCTATTAGACTTCCGCTGATAACCAATAAAGCCTGGCTGAC
ATTCATGCAGTCTGGCATGAAGACCTGGAAAAATTAG
SEQ ID NO: 46
ATGGATAGTTTAAAAGATTTTACGAATCTATATCCCGTAAGCAAAACTCTTCGTTTTGAACTGAAACCTG
TTGGAAAAACGTTGGAGAATATCGAGAAAGCGGGCATCCTGAAAGAAGACGAGCACCGTGCCGAAAGC
TACAGGCGTGTCAAAAAGATTATCGATACTTATCACAAAGTGTTCATTGATAGCAGTCTGGAGAACATG
GCAAAAATGGGCATAGAAAATGAAATCAAAGCAATGCTGCAGAGCTTTTGCGAGCTCTACAAGAAAGA
TCACCGAACGGAAGGTGAAGATAAAGCACTGGACAAAATTCGCGCCGTTCTTCGCGGTCTGATTGTTGG
CGCGTTCACCGGCGTGTGCGGCCGCCGTGAAAACACCGTGCAGAACGAAAAGTACGAGTCGCTGTTCAA
AGAAAAACTGATAAAAGAAATTTTGCCTGACTTTGTGCTTTCGACCGAAGCGGAATCCCTGCCATTTTCT
GTCGAAGAAGCGACCCGCAGCCTGAAAGAATTTGACTCATTCACAAGTTACTTTGCAGGCTTCTACGAA
AACCGTAAAAACATCTACAGCACGAAGCCACAGAGCACGGCTATTGCTTATCGCCTGATTCATGAGAAC
CTGCCGAAGTTCATCGATAACATCCTTGTTTTTCAAAAAATTAAAGAGCCGATTGCGAAAGAGTTAGAA
CATATTCGAGCTGACTTTTCTGCGGGTGGGTACATTAAAAAAGATGAGCGGCTGGAAGACATCTTCAGT
CTAAACTATTATATCCACGTTCTGTCGCAGGCAGGCATTGAGAAATATAATGCGCTGATTGGTAAGATTG
TCACAGAAGGCGATGGTGAGATGAAAGGTCTTAATGAACATATCAATCTGTATAACCAGCAGCGTGGTC
GCGAAGACCGTCTTCCACTGTTCCGCCCACTGTATAAACAGATCCTGTCTGACCGGGAACAGCTGTCCTA
CCTGCCGGAAAGCTTTGAAAAGGATGAAGAGCTACTTCGCGCATTAAAGGAGTTTTACGACCATATTGC
GGAAGACATTTTGGGTAGAACGCAGCAACTGATGACGTCAATTTCTGAATACGATCTGAGTAGAATCTA
CGTTAGGAATGATAGCCAGCTGACCGATATTAGCAAAAAAATGCTGGGCGACTGGAACGCTATCTATAT
GGCACGTGAACGTGCATATGATCATGAACAAGCACCGAAACGTATAACCGCGAAATATGAGCGTGATC
GCATTAAGGCGCTAAAGGGAGAAGAAAGCATCTCACTCGCAAACCTGAACTCCTGTATCGCTTTCTTAG
ATAACGTGCGCGATTGTCGCGTCGACACGTATCTGTCAACCCTTGGGCAGAAAGAGGGTCCACATGGTC
TGTCTAACCTGGTGGAAAATGTCTTTGCGAGTTACCATGAAGCGGAACAACTGCTGTCTTTTCCATACCC
CGAAGAAAACAATCTAATACAGGATAAAGATAACGTGGTGTTAATCAAAAACCTGCTGGACAACATCA
GCGATCTGCAACGTTTCCTGAAACCTTTGTGGGGTATGGGTGACGAGCCAGACAAAGACGAACGTTTTT
ATGGTGAGTATAATTATATACGTGGCGCCCTTGACCAAGTTATTCCGCTGTATAACAAAGTACGGAACTA
TCTGACCCGTAAGCCATATTCTACCCGTAAAGTGAAACTGAACTTTGGCAACTCGCAACTGCTGTCGGGT
TGGGATCGTAACAAAGAAAAAGATAATAGTTGTGTTATCCTGCGTAAGGGACAAAATTTTTACCTCGCG
ATTATGAACAACAGACACAAGCGTTCATTTGAAAATAAGGTTCTGCCGGAGTATAAAGAGGGCGAACCG
TACTTCGAGAAAATGGATTATAAGTTCTTACCAGACCCTAATAAGATGTTACCGAAAGTCTTTCTTTCGA
AAAAAGGCATAGAAATCTATAAGCCGTCCCCGAAATTACTCGAACAGTATGGGCACGGGACCCACAAG
AAAGGGGATACTTTTAGCATGGACGATCTGCACGAACTGATCGATTTTTTTAAACACTCCATCGAAGCCC
ATGAAGACTGGAAACAGTTTGGGTTCAAGTTCTCTGATACAGCCACATACGAGAATGTGTCTAGTTTTTA
TCGGGAAGTGGAGGATCAGGGCTACAAACTTAGTTTTCGTAAAGTTTCAGAGAGTTATGTTTATAGTTTA
ATTGATCAGGGAAAACTTTACCTGTTCCAGATCTACAACAAAGATTTCTCGCCATGTAGTAAGGGTACCC
CGAATCTGCATACACTCTATTGGAGAATGTTATTCGATGAGCGTAACTTAGCGGATGTCATTTATAAATT
GGACGGGAAAGCAGAGATCTTTTTTCGTGAAAAATCACTGAAGAATGACCACCCGACTCATCCGGCCGG
GAAACCGATCAAAAAAAAATCCCGCCAGAAAAAAGGAGAAGAGTCTCTGTTTGAATATGATCTGGTGA
AAGACCGTCATTACACTATGGATAAATTTCAATTTCATGTTCCAATTACAATGAACTTCAAATGTTCGGC
GGGTTCCAAAGTAAATGATATGGTAAACGCCCATATTCGCGAAGCGAAAGATATGCATGTTATTGGCAT
CGATAGAGGCGAAAGAAACCTGCTTTATATTTGCGTAATTGACAGCCGTGGTACCATTCTGGACCAGAT
CTCTTTAAACACCATCAATGACATCGATTATCACGACCTGTTGGAGTCTCGGGACAAGGACCGCCAGCA
GGAGCGCCGTAATTGGCAGACAATTGAAGGCATAAAAGAATTAAAACAGGGTTACCTTTCCCAGGCCGT
ACACCGCATAGCGGAACTGATGGTGGCCTACAAAGCCGTAGTTGCCCTGGAAGACTTGAATATGGGGTT
TAAACGTGGCCGTCAAAAAGTCGAGAGCAGCGTGTATCAGCAATTTGAAAAACAGTTGATTGACAAGTT
GAATTATTTGGTTGATAAAAAGAAACGTCCAGAAGATATTGGTGGCTTACTGCGTGCATACCAGTTTAC
GGCACCTTTTAAGTCCTTCAAAGAAATGGGTAAACAGAACGGGTTTCTGTTTTACATCCCGGCCTGGAAT
ACATCCAACATCGATCCTACCACCGGGTTTGTCAACCTGTTTCATGCACAATATGAAAACGTGGATAAA
GCGAAGAGTTTTTTCCAAAAATTCGATAGTATTTCGTATAACCCAAAAAAAGATTGGTTTGAGTTTGCGT
TCGATTATAAAAATTTTACTAAAAAGGCTGAGGGATCCCGCAGTATGTGGATCCTCTGCACCCATGGCA
GTCGTATTAAAAATTTTCGTAATTCGCAAAAGAATGGCCAGTGGGACTCGGAAGAGTTTGCCCTGACCG
AAGCGTTCAAATCGCTGTTTGTACGCTACGAAATTGACTACACAGCAGATCTGAAAACAGCCATCGTCG
ATGAAAAACAGAAAGATTTTTTTGTAGATCTCCTAAAACTGTTCAAACTGACTGTTCAGATGCGCAATTC
CTGGAAAGAGAAAGACCTGGATTATCTGATTAGCCCGGTAGCCGGTGCTGATGGACGATTTTTCGATAC
TCGTGAAGGTAACAAAAGTCTCCCGAAAGATGCTGATGCCAATGGTGCATACAATATTGCATTAAAGGG
GCTATGGGCCTTGCGACAGATCCGCCAGACCAGCGAAGGCGGCAAGCTGAAATTGGCCATATCGAATAA
GGAATGGTTACAATTTGTTCAGGAACGTAGCTATGAAAAAGATTGA
SEQ ID NO: 47
ATGAACAACGGCACAAATAATTTTCAGAACTTCATCGGGATCTCAAGTTTGCAGAAAACGCTGCGCAAT
GCTCTGATCCCCACGGAAACCACGCAACAGTTCATCGTCAAGAACGGAATAATTAAAGAAGATGAGTTA
CGTGGCGAGAACCGCCAGATTCTGAAAGATATCATGGATGACTACTACCGCGGATTCATCTCTGAGACT
CTGAGTTCTATTGATGACATAGATTGGACTAGCCTGTTCGAAAAAATGGAAATTCAGCTGAAAAATGGT
GATAATAAAGATACCTTAATTAAGGAACAGACAGAGTATCGGAAAGCAATCCATAAAAAATTTGCGAA
CGACGATCGGTTTAAGAACATGTTTAGCGCCAAACTGATTAGTGACATATTACCTGAATTTGTCATCCAC
AACAATAATTATTCGGCATCAGAGAAAGAGGAAAAAACCCAGGTGATAAAATTGTTTTCGCGCTTTGCG
ACTAGCTTTAAAGATTACTTCAAGAACCGTGCAAATTGCTTTTCAGCGGACGATATTTCATCAAGCAGCT
GCCATCGCATCGTCAACGACAATGCAGAGATATTCTTTTCAAATGCGCTGGTCTACCGCCGGATCGTAAA
ATCGCTGAGCAATGACGATATCAACAAAATTTCGGGCGATATGAAAGATTCATTAAAAGAAATGAGTCT
GGAAGAAATATATTCTTACGAGAAGTATGGGGAATTTATTACCCAGGAAGGCATTAGCTTCTATAATGA
TATCTGTGGGAAAGTGAATTCTTTTATGAACCTGTATTGTCAGAAAAATAAAGAAAACAAAAATTTATA
CAAACTTCAGAAACTTCACAAACAGATTCTATGCATTGCGGACACTAGCTATGAGGTCCCGTATAAATTT
GAAAGTGACGAGGAAGTGTACCAATCAGTTAACGGCTTCCTTGATAACATTAGCAGCAAACATATAGTC
GAAAGATTACGCAAAATCGGCGATAACTATAACGGCTACAACCTGGATAAAATTTATATCGTGTCCAAA
TTTTACGAGAGCGTTAGCCAAAAAACCTACCGCGACTGGGAAACAATTAATACCGCCCTCGAAATTCAT
TACAATAATATCTTGCCGGGTAACGGTAAAAGTAAAGCCGACAAAGTAAAAAAAGCGGTTAAGAATGA
TTTACAGAAATCCATCACCGAAATAAATGAACTAGTGTCAAACTATAAGCTGTGCAGTGACGACAACAT
CAAAGCGGAGACTTATATACATGAGATTAGCCATATCTTGAATAACTTTGAAGCACAGGAATTGAAATA
CAATCCGGAAATTCACCTAGTTGAATCCGAGCTCAAAGCGAGTGAGCTTAAAAACGTGCTGGACGTGAT
CATGAATGCGTTTCATTGGTGTTCGGTTTTTATGACTGAGGAACTTGTTGATAAAGACAACAATTTTTAT
GCGGAACTGGAGGAGATTTACGATGAAATTTATCCAGTAATTAGTCTGTACAACCTGGTTCGTAACTAC
GTTACCCAGAAACCGTACAGCACGAAAAAGATTAAATTGAACTTTGGAATACCGACGTTAGCAGACGGT
TGGTCAAAGTCCAAAGAGTATTCTAATAACGCTATCATACTGATGCGCGACAATCTGTATTATCTGGGCA
TCTTTAATGCGAAGAATAAACCGGACAAGAAGATTATCGAGGGTAATACGTCAGAAAATAAGGGTGAC
TACAAAAAGATGATTTATAATTTGCTCCCGGGTCCCAACAAAATGATCCCGAAAGTTTTCTTGAGCAGCA
AGACGGGGGTGGAAACGTATAAACCGAGCGCCTATATCCTAGAGGGGTATAAACAGAATAAACATATC
AAGTCTTCAAAAGACTTTGATATCACTTTCTGTCATGATCTGATCGACTACTTCAAAAACTGTATTGCAA
TTCATCCCGAGTGGAAAAACTTCGGTTTTGATTTTAGCGACACCAGTACTTATGAAGACATTTCCGGGTT
TTATCGTGAGGTAGAGTTACAAGGTTACAAGATTGATTGGACATACATTAGCGAAAAAGACATTGATCT
GCTGCAGGAAAAAGGTCAACTGTATCTGTTCCAGATATATAACAAAGATTTTTCGAAAAAATCAACCGG
GAATGACAACCTTCACACCATGTACCTGAAAAATCTTTTCTCAGAAGAAAATCTTAAGGATATCGTCCTG
AAACTTAACGGCGAAGCGGAAATCTTCTTCAGGAAGAGCAGCATAAAGAACCCAATCATTCATAAAAA
AGGCTCGATTTTAGTCAACCGTACCTACGAAGCAGAAGAAAAAGACCAGTTTGGCAACATTCAAATTGT
GCGTAAAAATATTCCGGAAAACATTTATCAGGAGCTGTACAAATACTTCAACGATAAAAGCGACAAAGA
GCTGTCTGATGAAGCAGCCAAACTGAAGAATGTAGTGGGACACCACGAGGCAGCGACGAATATAGTCA
AGGACTATCGCTACACGTATGATAAATACTTCCTTCATATGCCTATTACGATCAATTTCAAAGCCAATAA
AACGGGTTTTATTAATGATAGGATCTTACAGTATATCGCTAAAGAAAAAGACTTACATGTGATCGGCATT
GATCGGGGCGAGCGTAACCTGATCTACGTGTCCGTGATTGATACTTGTGGTAATATAGTTGAACAGAAA
AGCTTTAACATTGTAAACGGCTACGACTATCAGATAAAACTGAAACAACAGGAGGGCGCTAGACAGATT
GCGCGGAAAGAATGGAAAGAAATTGGTAAAATTAAAGAGATCAAAGAGGGCTACCTGAGCTTAGTAAT
CCACGAGATCTCTAAAATGGTAATCAAATACAATGCAATTATAGCGATGGAGGATTTGTCTTATGGTTTT
AAAAAAGGGCGCTTTAAGGTCGAACGGCAAGTTTACCAGAAATTTGAAACCATGCTCATCAATAAACTC
AACTATCTGGTATTTAAAGATATTTCGATTACCGAGAATGGCGGTCTCCTGAAAGGTTATCAGCTGACAT
ACATTCCTGATAAACTTAAAAACGTGGGTCATCAGTGCGGCTGCATTTTTTATGTGCCTGCTGCATACAC
GAGCAAAATTGATCCGACCACCGGCTTTGTGAATATCTTTAAATTTAAAGACCTGACAGTGGACGCAAA
ACGTGAATTCATTAAAAAATTTGACTCAATTCGTTATGACAGTGAAAAAAATCTGTTCTGCTTTACATTT
GACTACAATAACTTTATTACGCAAAACACGGTCATGAGCAAATCATCGTGGAGTGTGTATACATACGGC
GTGCGCATCAAACGTCGCTTTGTGAACGGCCGCTTCTCAAACGAAAGTGATACCATTGACATAACCAAA
GATATGGAGAAAACGTTGGAAATGACGGACATTAACTGGCGCGATGGCCACGATCTTCGTCAAGACATT
ATAGATTATGAAATTGTTCAGCACATATTCGAAATTTTCCGTTTAACAGTGCAAATGCGTAACTCCTTGT
CTGAACTGGAGGACCGTGATTACGATCGTCTCATTTCACCTGTACTGAACGAAAATAACATTTTTTATGA
CAGCGCGAAAGCGGGGGATGCACTTCCTAAGGATGCCGATGCAAATGGTGCGTATTGTATTGCATTAAA
AGGGTTATATGAAATTAAACAAATTACCGAAAATTGGAAAGAAGATGGTAAATTTTCGCGCGATAAACT
CAAAATCAGCAATAAAGATTGGTTCGACTTTATCCAGAATAAGCGCTATCTCTAA
SEQ ID NO: 48
ATGACCAATAAATTCACTAACCAGTATTCTCTCTCTAAGACCCTGCGCTTTGAACTGATTCCGCAGGGGA
AAACCTTGGAGTTCATTCAAGAAAAAGGCCTCTTGTCTCAGGATAAACAGAGGGCTGAATCTTACCAAG
AAATGAAGAAAACTATTGATAAGTTTCATAAATATTTCATTGATTTAGCCTTGTCTAACGCCAAATTAAC
TCACTTGGAAACGTATCTGGAGTTATACAACAAATCTGCCGAAACTAAGAAAGAACAGAAATTTAAAGA
CGATTTGAAAAAAGTACAGGACAATCTGCGTAAAGAAATTGTCAAATCCTTCAGTGACGGCGATGCTAA
AAGCATTTTTGCCATTCTGGACAAAAAAGAGTTGATTACTGTGGAATTAGAAAAGTGGTTTGAAAACAA
TGAGCAGAAAGACATCTACTTCGATGAGAAATTCAAAACTTTCACCACCTATTTTACAGGATTTCATCAA
AACCGGAAGAACATGTACTCAGTAGAACCGAACTCCACGGCCATTGCGTATCGTTTGATCCATGAGAAT
CTGCCTAAATTTCTGGAGAATGCGAAAGCCTTTGAAAAGATTAAGCAGGTCGAATCGCTGCAAGTGAAT
TTTCGTGAACTCATGGGCGAATTTGGTGACGAAGGTCTAATCTTCGTTAACGAACTGGAAGAAATGTTTC
AGATTAATTACTACAATGACGTGCTATCGCAGAACGGTATCACAATCTACAATAGTATTATCTCAGGGTT
CACAAAAAACGATATAAAATACAAAGGCCTGAACGAGTATATCAATAACTACAACCAAACAAAGGACA
AAAAGGATAGGCTTCCGAAACTGAAGCAGTTATACAAACAGATTTTATCTGACAGAATCTCCCTGAGCT
TTCTGCCGGATGCTTTCACTGATGGGAAGCAGGTTCTGAAAGCGATTTTCGATTTTTATAAGATTAACTT
ACTGAGCTACACGATTGAAGGTCAAGAAGAATCTCAAAACTTACTGCTCTTGATCCGTCAAACCATTGA
AAATCTATCATCGTTCGATACGCAGAAAATCTACCTCAAAAACGATACTCACCTGACTACGATCTCTCAG
CAGGTTTTCGGGGATTTTAGTGTATTTTCAACAGCTCTGAACTACTGGTATGAAACCAAAGTCAATCCGA
AATTCGAGACGGAATATTCTAAGGCCAACGAAAAAAAACGTGAGATTCTTGATAAAGCTAAAGCCGTAT
TTACTAAACAGGATTACTTTTCTATTGCTTTCCTGCAGGAAGTTTTATCGGAGTATATCCTGACCCTGGAT
CATACATCTGATATCGTTAAAAAACACAGCAGCAATTGCATCGCTGACTATTTCAAAAACCACTTTGTCG
CCAAAAAAGAAAACGAAACAGACAAGACTTTCGATTTCATTGCTAACATCACCGCAAAATACCAGTGTA
TTCAGGGTATCTTGGAAAACGCCGACCAATACGAAGACGAACTGAAACAAGATCAGAAGCTGATCGAT
AATTTAAAATTCTTCTTAGATGCAATCCTGGAGCTGCTGCACTTCATCAAACCGCTTCATTTAAAGAGCG
AGTCCATTACCGAAAAGGACACCGCCTTCTATGACGTTTTTGAAAATTATTATGAAGCCCTCTCCTTGCT
GACTCCGCTGTATAATATGGTACGCAATTACGTAACCCAGAAACCATATTCTACCGAAAAAATTAAACT
GAACTTTGAAAACGCACAGCTGCTCAACGGTTGGGACGCGAATAAAGAAGGTGACTACCTCACCACCAT
CCTGAAAAAAGATGGTAACTATTTTCTGGCAATTATGGATAAGAAACATAATAAAGCATTCCAGAAATT
TCCTGAAGGGAAAGAAAATTACGAAAAGATGGTGTACAAACTCTTACCTGGAGTTAACAAAATGTTGCC
GAAAGTATTTTTTAGTAATAAGAACATCGCGTACTTTAACCCGTCCAAAGAACTGCTGGAAAATTATAA
AAAGGAGACGCATAAGAAAGGGGATACCTTTAACCTGGAACATTGCCATACCTTAATAGACTTCTTCAA
GGATTCCCTGAATAAACACGAGGATTGGAAATATTTCGATTTTCAGTTTAGTGAGACCAAGTCATACCA
GGATCTTAGCGGCTTTTATCGCGAAGTAGAACACCAAGGCTATAAAATTAACTTCAAAAACATCGACAG
CGAATACATCGACGGTTTAGTTAACGAGGGCAAACTGTTTCTGTTCCAGATCTATTCAAAGGATTTTAGC
CCGTTCTCTAAAGGCAAACCAAATATGCATACGTTGTACTGGAAAGCACTGTTTGAAGAGCAAAACCTG
CAGAATGTGATTTATAAACTGAACGGCCAAGCTGAGATTTTTTTCCGTAAAGCCTCGATTAAACCGAAA
AATATCATCCTTCATAAGAAGAAAATAAAGATCGCTAAAAAACACTTCATAGATAAAAAAACCAAAAC
CTCCGAAATAGTGCCTGTTCAAACAATTAAGAACTTGAATATGTACTACCAGGGCAAGATATCGGAAAA
GGAGTTGACTCAAGACGATCTTCGCTATATCGATAACTTTTCGATTTTTAACGAAAAAAACAAGACGATC
GACATCATCAAAGATAAACGCTTCACTGTAGATAAGTTCCAGTTTCATGTGCCGATTACTATGAACTTCA
AAGCTACCGGGGGTAGCTATATCAACCAAACGGTGTTGGAATACCTGCAGAATAACCCGGAAGTCAAA
ATCATTGGGCTGGACCGCGGAGAACGTCACCTTGTGTACTTGACCTTAATCGATCAGCAAGGCAACATC
TTAAAACAAGAATCGCTGAATACCATTACGGATTCAAAGATTAGCACCCCGTATCATAAGCTGCTCGAT
AACAAGGAGAATGAGCGCGACCTGGCCCGTAAAAACTGGGGCACGGTGGAAAACATTAAGGAGTTAAA
GGAGGGTTATATTTCCCAGGTAGTGCATAAGATCGCCACTCTCATGCTCGAGGAAAATGCGATCGTTGTC
ATGGAAGACTTAAACTTCGGATTTAAACGTGGGCGATTTAAAGTAGAGAAACAAATCTACCAGAAGTTA
GAAAAAATGCTGATTGACAAATTAAATTACTTGGTCCTAAAAGACAAACAGCCGCAAGAATTGGGTGGA
TTATACAACGCCCTCCAACTTACCAATAAATTCGAAAGTTTTCAGAAAATGGGTAAACAGTCAGGCTTTC
TTTTTTATGTTCCTGCGTGGAACACATCCAAAATCGACCCTACAACCGGCTTCGTCAATTACTTCTATACT
AAATATGAAAACGTCGACAAAGCAAAAGCATTCTTTGAAAAGTTCGAAGCAATACGTTTTAACGCTGAG
AAAAAATATTTCGAGTTCGAAGTCAAGAAATACTCAGACTTTAACCCCAAAGCTGAGGGCACACAGCAA
GCGTGGACAATCTGCACCTACGGCGAGCGCATCGAAACGAAGCGTCAAAAAGATCAGAATAACAAATT
TGTTTCAACACCTATCAACCTGACCGAGAAGATTGAAGACTTCTTAGGTAAAAATCAGATTGTTTATGGC
GACGGTAACTGTATAAAATCTCAAATAGCCTCAAAGGATGATAAAGCATTTTTCGAAACATTATTATATT
GGTTCAAAATGACACTGCAGATGCGCAATAGTGAGACGCGTACAGATATTGATTATCTTATCAGCCCGG
TCATGAACGACAACGGTACTTTTTACAACTCCAGAGACTATGAAAAACTTGAGAATCCAACTCTCCCCA
AAGATGCTGATGCGAACGGTGCTTATCACATCGCGAAAAAAGGTCTGATGCTGCTGAACAAAATCGACC
AAGCCGATCTGACTAAGAAAGTTGACCTAAGCATTTCAAATCGGGACTGGTTACAGTTTGTTCAAAAGA
ACAAATGA
SEQ ID NO: 49
ATGGAACAGGAATATTATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTTACTGACAGT
GAATATCACGTTCTAAGAAAGCATGGTAAGGCATTGTGGGGTGTAAGACTTTTCGAATCTGCTTCCACTG
CTGAAGAGCGTAGAATGTTTAGAACGAGTCGACGTAGGCTAGACAGGCGCAATTGGAGAATCGAAATTT
TACAAGAAATTTTTGCGGAAGAGATATCTAAGAAAGACCCAGGCTTTTTCCTGAGAATGAAGGAATCTA
AGTATTACCCTGAGGATAAAAGAGATATAAATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGA
CGATGATTTTACCGATAAGGATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATG
AATACAGAGGAAACCCCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACATAGAGGC
CATTTCTTACTTTCCGGGGATATCAACGAAATCAAAGAGTTTGGTACCACATTTAGTAAGTTACTGGAAA
ACATAAAGAATGAAGAATTGGATTGGAACTTAGAACTCGGAAAAGAAGAATACGCGGTTGTCGAATCT
ATCCTGAAGGATAATATGCTGAATAGGTCGACCAAAAAAACTAGGCTGATCAAAGCACTGAAAGCCAA
ATCTATCTGCGAAAAAGCTGTTTTAAATTTACTTGCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTT
TGGAAGAATTGAACGAAACCGAGCGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTG
GTGAGGTGGAAAACGAGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGGG
CTGTTTTAGTAGAAATCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACTTACGAAAAGCA
CAAGTCCGATCTCCAGTTTTTGAAGAAAATTGTCAGGAAATATCTGACTAAGGAAGAATATAAAGATAT
TTTCGTTAGTACCTCTGACAAACTGAAAAATTACTCCGCTTACATCGGGATGACCAAGATTAATGGCAAA
AAAGTTGATCTGCAAAGCAAAAGGTGTTCGAAGGAAGAATTTTATGATTTCATTAAAAAGAATGTCTTA
AAAAAATTAGAAGGTCAGCCAGAATACGAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACC
AAAACAAGTCAACAGAGATAATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTT
AGGCAATTTACGCGATAAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTTGAATT
CAGAATACCCTATTATGTGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGTAAATTCACATGGGC
CGTCCGCAAATCCAATGAAAAAATTTACCCATGGAACTTTGAAAATGTAGTAGATATTGAAGCGTCTGC
GGAGAAATTTATTCGAAGAATGACTAATAAATGCACTTACTTGATGGGAGAGGATGTTCTGCCTAAAGA
CAGCTTATTATACAGCAAGTACATGGTTCTAAACGAACTTAACAACGTTAAGTTGGACGGTGAGAAATT
AAGTGTAGAATTGAAACAAAGATTGTATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAA
AATTAAGAATTACTTGAAGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGA
TTTCAAAGCATCCCTAACAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTCGCAAAAAA
AGATAAAGAAAACATTATTACTAATATTGTTCTTTTCGGTGATGACAAGAAATTGTTGAAGAAAAGACT
GAATAGACTTTACCCCCAGATTACTCCCAATCAACTTAAGAAAATTTGTGCTTTGTCTTACACAGGATGG
GGTCGTTTTTCAAAAAAGTTCTTAGAAGAGATTACCGCACCTGATCCAGAAACAGGCGAAGTATGGAAT
ATAATTACCGCCTTATGGGAATCGAACAATAATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGG
AAGAAGTTGAGACTTACAACATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGT
ATGTATCACCTTCTGTCAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAGGTAA
TGAAGGAGTCTCCTAAACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCAAAAAGAACCGAG
TCAAGAAAGAAGCAGTTAATCGATTTATATAAGGCTTGTAAAAACGAAGAGAAAGATTGGGTTAAAGA
ATTGGGGGACCAAGAGGAACAAAAACTACGGTCGGATAAGTTGTATTTATACTATACGCAAAAGGGAC
GATGTATGTATTCCGGCGAGGTAATAGAATTGAAGGATTTATGGGACAATACAAAATATGACATAGACC
ATATATATCCCCAATCAAAAACGATGGACGATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATA
ATGCGACCAAATCTGATAAGTATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGT
CCTTGTTAGATGGTGGGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTATCGC
CAGAAGAACTCGCTGGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACCAAAGCCGTTGCTG
AGATCCTAAAGCAAGTTTTCCCAGAGTCGGAGATTGTCTATGTCAAAGCTGGCACAGTGAGCAGGTTTA
GGAAAGACTTCGAACTATTAAAGGTAAGAGAAGTGAACGATTTACATCACGCAAAGGACGCTTACCTAA
ATATCGTTGTAGGTAACTCATATTATGTTAAATTTACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCC
AGGTAGAACATATAACCTGAAAAAGATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGC
ATGGGAAGTTGGTAAGAAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCG
TTACAAGGCAGGTTCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGGAAAGGT
CAAATTGCAATAAAAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGTGGCTATAATAAAGC
TGCGGGTGCATACTTTATGCTTGTTGAATCAAAAGACAAGAAAGGTAAGACTATTAGAACTATAGAATT
TATACCCCTGTACCTTAAAAACAAAATTGAATCGGATGAGTCAATCGCGTTAAATTTTCTAGAGAAAGG
AAGGGGTTTAAAAGAACCAAAGATCCTGTTAAAAAAGATTAAGATTGACACCTTGTTCGATGTAGATGG
ATTTAAAATGTGGTTATCTGGCAGAACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTG
GATGAGAAAATCATTGTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGA
GTTGAAATTATCTGATAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACATTCGTTGAT
AAACTTGAAAATACCGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACATTAATTGATAAACAAAAA
GAATTTGAAAGGCTATCACTGGAAGACAAATCCTCCACCCTATTTGAAATTTTGCATATATTCCAGTGCC
AATCTTCAGCAGCTAATTTAAAAATGATTGGCGGACCTGGGAAAGCCGGCATCCTAGTGATGAACAATA
ATATCTCCAAGTGTAACAAAATATCAATTATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGA
CTTGCTTAAGATATAA
SEQ ID NO: 50
ATGTCTTTCGACTCTTTCACCAACCTGTACTCTCTGTCTAAAACCCTGAAATTCGAAATGCGTCCGGTTGG
TAACACCCAGAAAATGCTGGACAACGCGGGTGTTTTCGAAAAAGACAAACTGATCCAGAAAAAATACG
GTAAAACCAAACCGTACTTCGACCGTCTGCACCGTGAATTCATCGAAGAAGCGCTGACCGGTGTTGAAC
TGATCGGTCTGGACGAAAACTTCCGTACCCTGGTTGACTGGCAGAAAGACAAAAAAAACAACGTTGCGA
TGAAAGCGTACGAAAACTCTCTGCAGCGTCTGCGTACCGAAATCGGTAAAATCTTCAACCTGAAAGCGG
AAGACTGGGTTAAAAACAAATACCCGATCCTGGGTCTGAAAAACAAAAACACCGACATCCTGTTCGAAG
AAGCGGTTTTCGGTATCCTGAAAGCGCGTTACGGTGAAGAAAAAGACACCTTCATCGAAGTTGAAGAAA
TCGACAAAACCGGTAAATCTAAAATCAACCAGATCTCTATCTTCGACTCTTGGAAAGGTTTCACCGGTTA
CTTCAAAAAATTCTTCGAAACCCGTAAAAACTTCTACAAAAACGACGGTACCTCTACCGCGATCGCGAC
CCGTATCATCGACCAGAACCTGAAACGTTTCATCGACAACCTGTCTATCGTTGAATCTGTTCGTCAGAAA
GTTGACCTGGCGGAAACCGAAAAATCTTTCTCTATCTCTCTGTCTCAGTTCTTCTCTATCGACTTCTACAA
CAAATGCCTGCTGCAGGACGGTATCGACTACTACAACAAAATCATCGGTGGTGAAACCCTGAAAAACGG
TGAAAAACTGATCGGTCTGAACGAACTGATCAACCAGTACCGTCAGAACAACAAAGACCAGAAAATCC
CGTTCTTCAAACTGCTGGACAAACAGATCCTGTCTGAAAAAATCCTGTTCCTGGACGAAATCAAAAACG
ACACCGAACTGATCGAAGCGCTGTCTCAGTTCGCGAAAACCGCGGAAGAAAAAACCAAAATCGTTAAA
AAACTGTTCGCGGACTTCGTTGAAAACAACTCTAAATACGACCTGGCGCAGATCTACATCTCTCAGGAA
GCGTTCAACACCATCTCTAACAAATGGACCTCTGAAACCGAAACCTTCGCGAAATACCTGTTCGAAGCG
ATGAAATCTGGTAAACTGGCGAAATACGAAAAAAAAGACAACTCTTACAAATTCCCGGACTTCATCGCG
CTGTCTCAGATGAAATCTGCGCTGCTGTCTATCTCTCTGGAAGGTCACTTCTGGAAAGAAAAATACTACA
AAATCTCTAAATTCCAGGAAAAAACCAACTGGGAACAGTTCCTGGCGATCTTCCTGTACGAATTCAACT
CTCTGTTCTCTGACAAAATCAACACCAAAGACGGTGAAACCAAACAGGTTGGTTACTACCTGTTCGCGA
AAGACCTGCACAACCTGATCCTGTCTGAACAGATCGACATCCCGAAAGACTCTAAAGTTACCATCAAAG
ACTTCGCGGACTCTGTTCTGACCATCTACCAGATGGCGAAATACTTCGCGGTTGAAAAAAAACGTGCGT
GGCTGGCGGAATACGAACTGGACTCTTTCTACACCCAGCCGGACACCGGTTACCTGCAGTTCTACGACA
ACGCGTACGAAGACATCGTTCAGGTTTACAACAAACTGCGTAACTACCTGACCAAAAAACCGTACTCTG
AAGAAAAATGGAAACTGAACTTCGAAAACTCTACCCTGGCGAACGGTTGGGACAAAAACAAAGAATCT
GACAACTCTGCGGTTATCCTGCAGAAAGGTGGTAAATACTACCTGGGTCTGATCACCAAAGGTCACAAC
AAAATCTTCGACGACCGTTTCCAGGAAAAATTCATCGTTGGTATCGAAGGTGGTAAATACGAAAAAATC
GTTTACAAATTCTTCCCGGACCAGGCGAAAATGTTCCCGAAAGTTTGCTTCTCTGCGAAAGGTCTGGAAT
TCTTCCGTCCGTCTGAAGAAATCCTGCGTATCTACAACAACGCGGAATTCAAAAAAGGTGAAACCTACT
CTATCGACTCTATGCAGAAACTGATCGACTTCTACAAAGACTGCCTGACCAAATACGAAGGTTGGGCGT
GCTACACCTTCCGTCACCTGAAACCGACCGAAGAATACCAGAACAACATCGGTGAATTCTTCCGTGACG
TTGCGGAAGACGGTTACCGTATCGACTTCCAGGGTATCTCTGACCAGTACATCCACGAAAAAAACGAAA
AAGGTGAACTGCACCTGTTCGAAATCCACAACAAAGACTGGAACCTGGACAAAGCGCGTGACGGTAAA
TCTAAAACCACCCAGAAAAACCTGCACACCCTGTACTTCGAATCTCTGTTCTCTAACGACAACGTTGTTC
AGAACTTCCCGATCAAACTGAACGGTCAGGCGGAAATCTTCTACCGTCCGAAAACCGAAAAAGACAAA
CTGGAATCTAAAAAAGACAAAAAAGGTAACAAAGTTATCGACCACAAACGTTACTCTGAAAACAAAAT
CTTCTTCCACGTTCCGCTGACCCTGAACCGTACCAAAAACGACTCTTACCGTTTCAACGCGCAGATCAAC
AACTTCCTGGCGAACAACAAAGACATCAACATCATCGGTGTTGACCGTGGTGAAAAACACCTGGTTTAC
TACTCTGTTATCACCCAGGCGTCTGACATCCTGGAATCTGGTTCTCTGAACGAACTGAACGGTGTTAACT
ACGCGGAAAAACTGGGTAAAAAAGCGGAAAACCGTGAACAGGCGCGTCGTGACTGGCAGGACGTTCAG
GGTATCAAAGACCTGAAAAAAGGTTACATCTCTCAGGTTGTTCGTAAACTGGCGGACCTGGCGATCAAA
CACAACGCGATCATCATCCTGGAAGACCTGAACATGCGTTTCAAACAGGTTCGTGGTGGTATCGAAAAA
TCTATCTACCAGCAGCTGGAAAAAGCGCTGATCGACAAACTGTCTTTCCTGGTTGACAAAGGTGAAAAA
AACCCGGAACAGGCGGGTCACCTGCTGAAAGCGTACCAGCTGTCTGCGCCGTTCGAAACCTTCCAGAAA
ATGGGTAAACAGACCGGTATCATCTTCTACACCCAGGCGTCTTACACCTCTAAATCTGACCCGGTTACCG
GTTGGCGTCCGCACCTGTACCTGAAATACTTCTCTGCGAAAAAAGCGAAAGACGACATCGCGAAATTCA
CCAAAATCGAATTCGTTAACGACCGTTTCGAACTGACCTACGACATCAAAGACTTCCAGCAGGCGAAAG
AATACCCGAACAAAACCGTTTGGAAAGTTTGCTCTAACGTTGAACGTTTCCGTTGGGACAAAAACCTGA
ACCAGAACAAAGGTGGTTACACCCACTACACCAACATCACCGAAAACATCCAGGAACTGTTCACCAAAT
ACGGTATCGACATCACCAAAGACCTGCTGACCCAGATCTCTACCATCGACGAAAAACAGAACACCTCTT
TCTTCCGTGACTTCATCTTCTACTTCAACCTGATCTGCCAGATCCGTAACACCGACGACTCTGAAATCGC
GAAAAAAAACGGTAAAGACGACTTCATCCTGTCTCCGGTTGAACCGTTCTTCGACTCTCGTAAAGACAA
CGGTAACAAACTGCCGGAAAACGGTGACGACAACGGTGCGTACAACATCGCGCGTAAAGGTATCGTTAT
CCTGAACAAAATCTCTCAGTACTCTGAAAAAAACGAAAACTGCGAAAAAATGAAATGGGGTGACCTGT
ACGTTTCTAACATCGACTGGGACAACTTCGTT
SEQ ID NO: 51
ATGGAAAACTTTAAAAACTTATACCCAATAAACAAAACGTTACGTTTTGAACTGCGTCCATATGGTAAA
ACACTGGAAAACTTTAAAAAAAGCGGTTTGTTGGAGAAGGATGCATTTAAAGCGAACTCTCGCAGATCC
ATGCAGGCCATCATTGATGAAAAATTTAAAGAGACGATCGAAGAACGTCTGAAATACACGGAATTTAGT
GAGTGTGACTTAGGTAATATGACTTCTAAAGATAAGAAAATCACCGATAAGGCGGCGACCAACCTGAAG
AAGCAAGTCATTTTATCTTTTGATGATGAAATCTTTAACAACTATTTGAAACCGGACAAAAACATCGATG
CCTTATTTAAAAATGACCCTTCGAACCCGGTGATTAGCACATTTAAGGGCTTCACAACGTATTTTGTCAA
TTTTTTTGAAATTCGTAAACATATCTTCAAAGGAGAATCAAGCGGCTCTATGGCTTATCGCATTATTGAT
GAAAACCTGACGACCTATTTGAATAACATTGAAAAAATCAAAAAACTGCCAGAGGAATTAAAGTCTCAG
TTAGAAGGCATCGACCAGATCGACAAACTCAACAACTATAACGAATTTATTACGCAGTCTGGTATCACC
CACTATAATGAAATTATTGGAGGTATCAGTAAATCAGAAAATGTGAAAATCCAAGGGATTAATGAAGGC
ATTAACCTCTATTGCCAGAAAAATAAAGTGAAACTGCCGAGGCTGACTCCACTCTACAAAATGATCCTG
TCTGACCGCGTCTCGAATAGCTTTGTCCTGGACACAATTGAAAACGATACGGAATTGATTGAGATGATA
AGCGATCTGATTAACAAAACCGAAATTTCACAGGATGTAATCATGAGTGATATACAAAACATCTTTATT
AAATATAAACAGCTTGGTAATCTGCCTGGAATTAGCTATTCGTCAATAGTGAACGCAATCTGTTCTGATT
ATGATAACAATTTTGGCGACGGTAAGCGTAAAAAGAGTTATGAAAACGATAGGAAAAAACACCTGGAA
ACTAACGTGTATTCTATCAACTATATCAGCGAACTGCTTACGGACACCGATGTGAGTTCAAACATTAAGA
TGCGGTATAAGGAGCTTGAACAGAACTACCAGGTCTGTAAGGAAAACTTCAACGCAACCAACTGGATGA
ACATTAAAAATATCAAACAATCCGAGAAGACCAACTTAATCAAAGATCTGCTGGATATTTTGAAGAGCA
TTCAACGTTTTTATGATCTGTTCGATATCGTTGATGAAGACAAGAATCCTAGTGCGGAATTTTATACATG
GCTGTCTAAAAATGCGGAGAAATTGGATTTCGAATTCAATTCTGTTTATAATAAATCACGCAACTATTTG
ACCCGCAAACAATACAGCGACAAAAAGATAAAACTAAACTTCGACAGTCCGACATTGGCAAAGGGCTG
GGACGCAAATAAGGAAATCGATAACTCTACGATAATTATGCGTAAGTTCAATAATGATCGAGGTGATTA
TGATTATTTCTTAGGCATTTGGAACAAAAGCACCCCGGCCAACGAAAAGATAATTCCACTGGAGGATAA
CGGTCTGTTCGAAAAAATGCAGTACAAATTATATCCGGATCCAAGCAAGATGCTTCCAAAGCAGTTTCT
GTCTAAAATTTGGAAAGCTAAGCATCCGACCACCCCAGAATTTGACAAGAAATATAAGGAAGGCCGCCA
TAAGAAAGGTCCCGATTTTGAAAAAGAATTCTTGCACGAACTGATTGATTGCTTTAAACATGGCTTAGTC
AATCACGATGAAAAGTATCAAGATGTTTTTGGATTCAATTTGAGAAACACAGAAGACTACAATTCCTAC
ACTGAGTTTCTCGAAGATGTGGAACGATGTAATTATAATCTGAGCTTTAACAAAATCGCGGACACCTCG
AATCTGATTAACGATGGTAAACTTTATGTTTTCCAGATCTGGAGCAAGGATTTCTCTATTGACAGCAAAG
GCACCAAAAACCTGAACACCATTTACTTTGAAAGTCTCTTCAGCGAAGAAAATATGATTGAGAAAATGT
TTAAACTTAGCGGTGAAGCTGAAATATTCTATCGCCCGGCAAGCCTGAACTATTGCGAAGACATTATCA
AAAAGGGTCATCACCACGCTGAACTGAAAGATAAATTTGATTATCCTATCATAAAAGATAAACGCTATA
GCCAGGATAAATTTTTTTTTCATGTTCCTATGGTCATTAACTACAAATCAGAAAAACTGAACTCTAAAAG
CCTCAATAATCGAACCAATGAAAACCTTGGGCAGTTTACCCATATAATTGGAATTGATCGCGGAGAGCG
TCATTTAATCTACCTGACCGTAGTCGATGTATCGACCGGCGAGATCGTCGAGCAGAAGCACTTAGACGA
GATTATCAACACTGATACCAAAGGTGTTGAGCATAAGACGCACTATCTAAACAAGCTGGAGGAAAAATC
GAAAACCCGTGATAATGAACGTAAGAGTTGGGAGGCAATTGAAACGATTAAAGAACTGAAGGAGGGTT
ATATCAGCCACGTAATCAATGAAATTCAAAAACTGCAGGAAAAATACAACGCCCTGATCGTTATGGAAA
ATCTGAATTACGGTTTCAAAAATTCTCGCATCAAAGTGGAAAAACAGGTATATCAGAAGTTCGAGACGG
CATTAATTAAAAAGTTTAATTACATCATTGACAAAAAAGATCCGGAAACTTATATTCATGGCTATCAGCT
GACGAACCCGATCACCACACTGGATAAAATTGGTAACCAGTCTGGTATCGTGCTTTACATCCCTGCCTGG
AATACCAGTAAAATCGATCCGGTAACGGGATTCGTCAACCTTCTATATGCAGATGACCTCAAATATAAG
AATCAGGAACAGGCCAAGTCTTTTATTCAGAAAATCGATAACATTTACTTTGAGAATGGGGAATTCAAA
TTTGATATTGATTTTTCTAAATGGAACAATCGTTATAGTATATCTAAGACGAAATGGACGCTCACCTCGT
ACGGAACCCGAATCCAGACATTCCGCAATCCGCAGAAGAACAATAAATGGGACAGCGCCGAGTATGAT
CTCACTGAAGAATTCAAATTGATTCTGAACATTGACGGTACCCTGAAAAGCCAGGATGTCGAAACCTAT
AAAAAATTTATGTCTCTGTTCAAGCTGATGCTGCAACTTAGGAACTCTGTTACCGGCACTGATATCGATT
ATATGATCTCCCCTGTCACTGATAAAACAGGTACGCATTTCGATTCGCGCGAAAATATCAAAAATCTGCC
CGCAGATGCCGACGCCAATGGGGCGTACAATATTGCACGCAAGGGTATCATGGCGATCGAAAACATTAT
GAATGGTATCAGCGACCCGCTGAAAATCTCAAACGAAGATTATTTGAAATATATCCAAAACCAGCAGGA
ATAA
SEQ ID NO: 52
ATGACCCAGTTCGAAGGTTTCACCAACCTGTACCAGGTTTCTAAAACCCTGCGTTTCGAACTGATCCCGC
AGGGTAAAACCCTGAAACACATCCAGGAACAGGGTTTCATCGAAGAAGACAAAGCGCGTAACGACCAC
TACAAAGAACTGAAACCGATCATCGACCGTATCTACAAAACCTACGCGGACCAGTGCCTGCAGCTGGTT
CAGCTGGACTGGGAAAACCTGTCTGCGGCGATCGACTCTTACCGTAAAGAAAAAACCGAAGAAACCCGT
AACGCGCTGATCGAAGAACAGGCGACCTACCGTAACGCGATCCACGACTACTTCATCGGTCGTACCGAC
AACCTGACCGACGCGATCAACAAACGTCACGCGGAAATCTACAAAGGTCTGTTCAAAGCGGAACTGTTC
AACGGTAAAGTTCTGAAACAGCTGGGTACCGTTACCACCACCGAACACGAAAACGCGCTGCTGCGTTCT
TTCGACAAATTCACCACCTACTTCTCTGGTTTCTACGAAAACCGTAAAAACGTTTTCTCTGCGGAAGACA
TCTCTACCGCGATCCCGCACCGTATCGTTCAGGACAACTTCCCGAAATTCAAAGAAAACTGCCACATCTT
CACCCGTCTGATCACCGCGGTTCCGTCTCTGCGTGAACACTTCGAAAACGTTAAAAAAGCGATCGGTATC
TTCGTTTCTACCTCTATCGAAGAAGTTTTCTCTTTCCCGTTCTACAACCAGCTGCTGACCCAGACCCAGAT
CGACCTGTACAACCAGCTGCTGGGTGGTATCTCTCGTGAAGCGGGTACCGAAAAAATCAAAGGTCTGAA
CGAAGTTCTGAACCTGGCGATCCAGAAAAACGACGAAACCGCGCACATCATCGCGTCTCTGCCGCACCG
TTTCATCCCGCTGTTCAAACAGATCCTGTCTGACCGTAACACCCTGTCTTTCATCCTGGAAGAATTCAAA
TCTGACGAAGAAGTTATCCAGTCTTTCTGCAAATACAAAACCCTGCTGCGTAACGAAAACGTTCTGGAA
ACCGCGGAAGCGCTGTTCAACGAACTGAACTCTATCGACCTGACCCACATCTTCATCTCTCACAAAAAA
CTGGAAACCATCTCTTCTGCGCTGTGCGACCACTGGGACACCCTGCGTAACGCGCTGTACGAACGTCGTA
TCTCTGAACTGACCGGTAAAATCACCAAATCTGCGAAAGAAAAAGTTCAGCGTTCTCTGAAACACGAAG
ACATCAACCTGCAGGAAATCATCTCTGCGGCGGGTAAAGAACTGTCTGAAGCGTTCAAACAGAAAACCT
CTGAAATCCTGTCTCACGCGCACGCGGCGCTGGACCAGCCGCTGCCGACCACCCTGAAAAAACAGGAAG
AAAAAGAAATCCTGAAATCTCAGCTGGACTCTCTGCTGGGTCTGTACCACCTGCTGGACTGGTTCGCGGT
TGACGAATCTAACGAAGTTGACCCGGAATTCTCTGCGCGTCTGACCGGTATCAAACTGGAAATGGAACC
GTCTCTGTCTTTCTACAACAAAGCGCGTAACTACGCGACCAAAAAACCGTACTCTGTTGAAAAATTCAA
ACTGAACTTCCAGATGCCGACCCTGGCGTCTGGTTGGGACGTTAACAAAGAAAAAAACAACGGTGCGAT
CCTGTTCGTTAAAAACGGTCTGTACTACCTGGGTATCATGCCGAAACAGAAAGGTCGTTACAAAGCGCT
GTCTTTCGAACCGACCGAAAAAACCTCTGAAGGTTTCGACAAAATGTACTACGACTACTTCCCGGACGC
GGCGAAAATGATCCCGAAATGCTCTACCCAGCTGAAAGCGGTTACCGCGCACTTCCAGACCCACACCAC
CCCGATCCTGCTGTCTAACAACTTCATCGAACCGCTGGAAATCACCAAAGAAATCTACGACCTGAACAA
CCCGGAAAAAGAACCGAAAAAATTCCAGACCGCGTACGCGAAAAAAACCGGTGACCAGAAAGGTTACC
GTGAAGCGCTGTGCAAATGGATCGACTTCACCCGTGACTTCCTGTCTAAATACACCAAAACCACCTCTAT
CGACCTGTCTTCTCTGCGTCCGTCTTCTCAGTACAAAGACCTGGGTGAATACTACGCGGAACTGAACCCG
CTGCTGTACCACATCTCTTTCCAGCGTATCGCGGAAAAAGAAATCATGGACGCGGTTGAAACCGGTAAA
CTGTACCTGTTCCAGATCTACAACAAAGACTTCGCGAAAGGTCACCACGGTAAACCGAACCTGCACACC
CTGTACTGGACCGGTCTGTTCTCTCCGGAAAACCTGGCGAAAACCTCTATCAAACTGAACGGTCAGGCG
GAACTGTTCTACCGTCCGAAATCTCGTATGAAACGTATGGCGCACCGTCTGGGTGAAAAAATGCTGAAC
AAAAAACTGAAAGACCAGAAAACCCCGATCCCGGACACCCTGTACCAGGAACTGTACGACTACGTTAA
CCACCGTCTGTCTCACGACCTGTCTGACGAAGCGCGTGCGCTGCTGCCGAACGTTATCACCAAAGAAGTT
TCTCACGAAATCATCAAAGACCGTCGTTTCACCTCTGACAAATTCTTCTTCCACGTTCCGATCACCCTGA
ACTACCAGGCGGCGAACTCTCCGTCTAAATTCAACCAGCGTGTTAACGCGTACCTGAAAGAACACCCGG
AAACCCCGATCATCGGTATCGACCGTGGTGAACGTAACCTGATCTACATCACCGTTATCGACTCTACCGG
TAAAATCCTGGAACAGCGTTCTCTGAACACCATCCAGCAGTTCGACTACCAGAAAAAACTGGACAACCG
TGAAAAAGAACGTGTTGCGGCGCGTCAGGCGTGGTCTGTTGTTGGTACCATCAAAGACCTGAAACAGGG
TTACCTGTCTCAGGTTATCCACGAAATCGTTGACCTGATGATCCACTACCAGGCGGTTGTTGTTCTGGAA
AACCTGAACTTCGGTTTCAAATCTAAACGTACCGGTATCGCGGAAAAAGCGGTTTACCAGCAGTTCGAA
AAAATGCTGATCGACAAACTGAACTGCCTGGTTCTGAAAGACTACCCGGCGGAAAAAGTTGGTGGTGTT
CTGAACCCGTACCAGCTGACCGACCAGTTCACCTCTTTCGCGAAAATGGGTACCCAGTCTGGTTTCCTGT
TCTACGTTCCGGCGCCGTACACCTCTAAAATCGACCCGCTGACCGGTTTCGTTGACCCGTTCGTTTGGAA
AACCATCAAAAACCACGAATCTCGTAAACACTTCCTGGAAGGTTTCGACTTCCTGCACTACGACGTTAA
AACCGGTGACTTCATCCTGCACTTCAAAATGAACCGTAACCTGTCTTTCCAGCGTGGTCTGCCGGGTTTC
ATGCCGGCGTGGGACATCGTTTTCGAAAAAAACGAAACCCAGTTCGACGCGAAAGGTACCCCGTTCATC
GCGGGTAAACGTATCGTTCCGGTTATCGAAAACCACCGTTTCACCGGTCGTTACCGTGACCTGTACCCGG
CGAACGAACTGATCGCGCTGCTGGAAGAAAAAGGTATCGTTTTCCGTGACGGTTCTAACATCCTGCCGA
AACTGCTGGAAAACGACGACTCTCACGCGATCGACACCATGGTTGCGCTGATCCGTTCTGTTCTGCAGAT
GCGTAACTCTAACGCGGCGACCGGTGAAGACTACATCAACTCTCCGGTTCGTGACCTGAACGGTGTTTG
CTTCGACTCTCGTTTCCAGAACCCGGAATGGCCGATGGACGCGGACGCGAACGGTGCGTACCACATCGC
GCTGAAAGGTCAGCTGCTGCTGAACCACCTGAAAGAATCTAAAGACCTGAAACTGCAGAACGGTATCTC
TAACCAGGACTGGCTGGCGTACATCCAGGAACTGCGTAACTA
SEQ ID NO: 53
ATGGCGGTTAAATCTATCAAAGTTAAACTGCGTCTGGACGACATGCCGGAAATCCGTGCGGGTCTGTGG
AAACTGCACAAAGAAGTTAACGCGGGTGTTCGTTACTACACCGAATGGCTGTCTCTGCTGCGTCAGGAA
AACCTGTACCGTCGTTCTCCGAACGGTGACGGTGAACAGGAATGCGACAAAACCGCGGAAGAATGCAA
AGCGGAACTGCTGGAACGTCTGCGTGCGCGTCAGGTTGAAAACGGTCACCGTGGTCCGGCGGGTTCTGA
CGACGAACTGCTGCAGCTGGCGCGTCAGCTGTACGAACTGCTGGTTCCGCAGGCGATCGGTGCGAAAGG
TGACGCGCAGCAGATCGCGCGTAAATTCCTGTCTCCGCTGGCGGACAAAGACGCGGTTGGTGGTCTGGG
TATCGCGAAAGCGGGTAACAAACCGCGTTGGGTTCGTATGCGTGAAGCGGGTGAACCGGGTTGGGAAG
AAGAAAAAGAAAAAGCGGAAACCCGTAAATCTGCGGACCGTACCGCGGACGTTCTGCGTGCGCTGGCG
GACTTCGGTCTGAAACCGCTGATGCGTGTTTACACCGACTCTGAAATGTCTTCTGTTGAATGGAAACCGC
TGCGTAAAGGTCAGGCGGTTCGTACCTGGGACCGTGACATGTTCCAGCAGGCGATCGAACGTATGATGT
CTTGGGAATCTTGGAACCAGCGTGTTGGTCAGGAATACGCGAAACTGGTTGAACAGAAAAACCGTTTCG
AACAGAAAAACTTCGTTGGTCAGGAACACCTGGTTCACCTGGTTAACCAGCTGCAGCAGGACATGAAAG
AAGCGTCTCCGGGTCTGGAATCTAAAGAACAGACCGCGCACTACGTTACCGGTCGTGCGCTGCGTGGTT
CTGACAAAGTTTTCGAAAAATGGGGTAAACTGGCGCCGGACGCGCCGTTCGACCTGTACGACGCGGAAA
TCAAAAACGTTCAGCGTCGTAACACCCGTCGTTTCGGTTCTCACGACCTGTTCGCGAAACTGGCGGAACC
GGAATACCAGGCGCTGTGGCGTGAAGACGCGTCTTTCCTGACCCGTTACGCGGTTTACAACTCTATCCTG
CGTAAACTGAACCACGCGAAAATGTTCGCGACCTTCACCCTGCCGGACGCGACCGCGCACCCGATCTGG
ACCCGTTTCGACAAACTGGGTGGTAACCTGCACCAGTACACCTTCCTGTTCAACGAATTCGGTGAACGTC
GTCACGCGATCCGTTTCCACAAACTGCTGAAAGTTGAAAACGGTGTTGCGCGTGAAGTTGACGACGTTA
CCGTTCCGATCTCTATGTCTGAACAGCTGGACAACCTGCTGCCGCGTGACCCGAACGAACCGATCGCGCT
GTACTTCCGTGACTACGGTGCGGAACAGCACTTCACCGGTGAATTCGGTGGTGCGAAAATCCAGTGCCG
TCGTGACCAGCTGGCGCACATGCACCGTCGTCGTGGTGCGCGTGACGTTTACCTGAACGTTTCTGTTCGT
GTTCAGTCTCAGTCTGAAGCGCGTGGTGAACGTCGTCCGCCGTACGCGGCGGTTTTCCGTCTGGTTGGTG
ACAACCACCGTGCGTTCGTTCACTTCGACAAACTGTCTGACTACCTGGCGGAACACCCGGACGACGGTA
AACTGGGTTCTGAAGGTCTGCTGTCTGGTCTGCGTGTTATGTCTGTTGACCTGGGTCTGCGTACCTCTGCG
TCTATCTCTGTTTTCCGTGTTGCGCGTAAAGACGAACTGAAACCGAACTCTAAAGGTCGTGTTCCGTTCT
TCTTCCCGATCAAAGGTAACGACAACCTGGTTGCGGTTCACGAACGTTCTCAGCTGCTGAAACTGCCGG
GTGAAACCGAATCTAAAGACCTGCGTGCGATCCGTGAAGAACGTCAGCGTACCCTGCGTCAGCTGCGTA
CCCAGCTGGCGTACCTGCGTCTGCTGGTTCGTTGCGGTTCTGAAGACGTTGGTCGTCGTGAACGTTCTTG
GGCGAAACTGATCGAACAGCCGGTTGACGCGGCGAACCACATGACCCCGGACTGGCGTGAAGCGTTCG
AAAACGAACTGCAGAAACTGAAATCTCTGCACGGTATCTGCTCTGACAAAGAATGGATGGACGCGGTTT
ACGAATCTGTTCGTCGTGTTTGGCGTCACATGGGTAAACAGGTTCGTGACTGGCGTAAAGACGTTCGTTC
TGGTGAACGTCCGAAAATCCGTGGTTACGCGAAAGACGTTGTTGGTGGTAACTCTATCGAACAGATCGA
ATACCTGGAACGTCAGTACAAATTCCTGAAATCTTGGTCTTTCTTCGGTAAAGTTTCTGGTCAGGTTATC
CGTGCGGAAAAAGGTTCTCGTTTCGCGATCACCCTGCGTGAACACATCGACCACGCGAAAGAAGACCGT
CTGAAAAAACTGGCGGACCGTATCATCATGGAAGCGCTGGGTTACGTTTACGCGCTGGACGAACGTGGT
AAAGGTAAATGGGTTGCGAAATACCCGCCGTGCCAGCTGATCCTGCTGGAAGAACTGTCTGAATACCAG
TTCAACAACGACCGTCCGCCGTCTGAAAACAACCAGCTGATGCAGTGGTCTCACCGTGGTGTTTTCCAGG
AACTGATCAACCAGGCGCAGGTTCACGACCTGCTGGTTGGTACCATGTACGCGGCGTTCTCTTCTCGTTT
CGACGCGCGTACCGGTGCGCCGGGTATCCGTTGCCGTCGTGTTCCGGCGCGTTGCACCCAGGAACACAA
CCCGGAACCGTTCCCGTGGTGGCTGAACAAATTCGTTGTTGAACACACCCTGGACGCGTGCCCGCTGCGT
GCGGACGACCTGATCCCGACCGGTGAAGGTGAAATCTTCGTTTCTCCGTTCTCTGCGGAAGAAGGTGAC
TTCCACCAGATCCACGCGGACCTGAACGCGGCGCAGAACCTGCAGCAGCGTCTGTGGTCTGACTTCGAC
ATCTCTCAGATCCGTCTGCGTTGCGACTGGGGTGAAGTTGACGGTGAACTGGTTCTGATCCCGCGTCTGA
CCGGTAAACGTACCGCGGACTCTTACTCTAACAAAGTTTTCTACACCAACACCGGTGTTACCTACTACGA
ACGTGAACGTGGTAAAAAACGTCGTAAAGTTTTCGCGCAGGAAAAACTGTCTGAAGAAGAAGCGGAAC
TGCTGGTTGAAGCGGACGAAGCGCGTGAAAAATCTGTTGTTCTGATGCGTGACCCGTCTGGTATCATCA
ACCGTGGTAACTGGACCCGTCAGAAAGAATTCTGGTCTATGGTTAACCAGCGTATCGAAGGTTACCTGG
TTAAACAGATCCGTTCTCGTGTTCCGCTGCAGGACTCTGCGTGCGAAAACACCGGTGACATCTAA
SEQ ID NO: 54
ATGGCGACCCGTTCTTTCATCCTGAAAATCGAACCGAACGAAGAAGTTAAAAAAGGTCTGTGGAAAACC
CACGAAGTTCTGAACCACGGTATCGCGTACTACATGAACATCCTGAAACTGATCCGTCAGGAAGCGATC
TACGAACACCACGAACAGGACCCGAAAAACCCGAAAAAAGTTTCTAAAGCGGAAATCCAGGCGGAACT
GTGGGACTTCGTTCTGAAAATGCAGAAATGCAACTCTTTCACCCACGAAGTTGACAAAGACGTTGTTTTC
AACATCCTGCGTGAACTGTACGAAGAACTGGTTCCGTCTTCTGTTGAAAAAAAAGGTGAAGCGAACCAG
CTGTCTAACAAATTCCTGTACCCGCTGGTTGACCCGAACTCTCAGTCTGGTAAAGGTACCGCGTCTTCTG
GTCGTAAACCGCGTTGGTACAACCTGAAAATCGCGGGTGACCCGTCTTGGGAAGAAGAAAAAAAAAAA
TGGGAAGAAGACAAAAAAAAAGACCCGCTGGCGAAAATCCTGGGTAAACTGGCGGAATACGGTCTGAT
CCCGCTGTTCATCCCGTTCACCGACTCTAACGAACCGATCGTTAAAGAAATCAAATGGATGGAAAAATC
TCGTAACCAGTCTGTTCGTCGTCTGGACAAAGACATGTTCATCCAGGCGCTGGAACGTTTCCTGTCTTGG
GAATCTTGGAACCTGAAAGTTAAAGAAGAATACGAAAAAGTTGAAAAAGAACACAAAACCCTGGAAGA
ACGTATCAAAGAAGACATCCAGGCGTTCAAATCTCTGGAACAGTACGAAAAAGAACGTCAGGAACAGC
TGCTGCGTGACACCCTGAACACCAACGAATACCGTCTGTCTAAACGTGGTCTGCGTGGTTGGCGTGAAA
TCATCCAGAAATGGCTGAAAATGGACGAAAACGAACCGTCTGAAAAATACCTGGAAGTTTTCAAAGACT
ACCAGCGTAAACACCCGCGTGAAGCGGGTGACTACTCTGTTTACGAATTCCTGTCTAAAAAAGAAAACC
ACTTCATCTGGCGTAACCACCCGGAATACCCGTACCTGTACGCGACCTTCTGCGAAATCGACAAAAAAA
AAAAAGACGCGAAACAGCAGGCGACCTTCACCCTGGCGGACCCGATCAACCACCCGCTGTGGGTTCGTT
TCGAAGAACGTTCTGGTTCTAACCTGAACAAATACCGTATCCTGACCGAACAGCTGCACACCGAAAAAC
TGAAAAAAAAACTGACCGTTCAGCTGGACCGTCTGATCTACCCGACCGAATCTGGTGGTTGGGAAGAAA
AAGGTAAAGTTGACATCGTTCTGCTGCCGTCTCGTCAGTTCTACAACCAGATCTTCCTGGACATCGAAGA
AAAAGGTAAACACGCGTTCACCTACAAAGACGAATCTATCAAATTCCCGCTGAAAGGTACCCTGGGTGG
TGCGCGTGTTCAGTTCGACCGTGACCACCTGCGTCGTTACCCGCACAAAGTTGAATCTGGTAACGTTGGT
CGTATCTACTTCAACATGACCGTTAACATCGAACCGACCGAATCTCCGGTTTCTAAATCTCTGAAAATCC
ACCGTGACGACTTCCCGAAATTCGTTAACTTCAAACCGAAAGAACTGACCGAATGGATCAAAGACTCTA
AAGGTAAAAAACTGAAATCTGGTATCGAATCTCTGGAAATCGGTCTGCGTGTTATGTCTATCGACCTGG
GTCAGCGTCAGGCGGCGGCGGCGTCTATCTTCGAAGTTGTTGACCAGAAACCGGACATCGAAGGTAAAC
TGTTCTTCCCGATCAAAGGTACCGAACTGTACGCGGTTCACCGTGCGTCTTTCAACATCAAACTGCCGGG
TGAAACCCTGGTTAAATCTCGTGAAGTTCTGCGTAAAGCGCGTGAAGACAACCTGAAACTGATGAACCA
GAAACTGAACTTCCTGCGTAACGTTCTGCACTTCCAGCAGTTCGAAGACATCACCGAACGTGAAAAACG
TGTTACCAAATGGATCTCTCGTCAGGAAAACTCTGACGTTCCGCTGGTTTACCAGGACGAACTGATCCAG
ATCCGTGAACTGATGTACAAACCGTACAAAGACTGGGTTGCGTTCCTGAAACAGCTGCACAAACGTCTG
GAAGTTGAAATCGGTAAAGAAGTTAAACACTGGCGTAAATCTCTGTCTGACGGTCGTAAAGGTCTGTAC
GGTATCTCTCTGAAAAACATCGACGAAATCGACCGTACCCGTAAATTCCTGCTGCGTTGGTCTCTGCGTC
CGACCGAACCGGGTGAAGTTCGTCGTCTGGAACCGGGTCAGCGTTTCGCGATCGACCAGCTGAACCACC
TGAACGCGCTGAAAGAAGACCGTCTGAAAAAAATGGCGAACACCATCATCATGCACGCGCTGGGTTACT
GCTACGACGTTCGTAAAAAAAAATGGCAGGCGAAAAACCCGGCGTGCCAGATCATCCTGTTCGAAGACC
TGTCTAACTACAACCCGTACGAAGAACGTTCTCGTTTCGAAAACTCTAAACTGATGAAATGGTCTCGTCG
TGAAATCCCGCGTCAGGTTGCGCTGCAGGGTGAAATCTACGGTCTGCAGGTTGGTGAAGTTGGTGCGCA
GTTCTCTTCTCGTTTCCACGCGAAAACCGGTTCTCCGGGTATCCGTTGCTCTGTTGTTACCAAAGAAAAA
CTGCAGGACAACCGTTTCTTCAAAAACCTGCAGCGTGAAGGTCGTCTGACCCTGGACAAAATCGCGGTT
CTGAAAGAAGGTGACCTGTACCCGGACAAAGGTGGTGAAAAATTCATCTCTCTGTCTAAAGACCGTAAA
CTGGTTACCACCCACGCGGACATCAACGCGGCGCAGAACCTGCAGAAACGTTTCTGGACCCGTACCCAC
GGTTTCTACAAAGTTTACTGCAAAGCGTACCAGGTTGACGGTCAGACCGTTTACATCCCGGAATCTAAA
GACCAGAAACAGAAAATCATCGAAGAATTCGGTGAAGGTTACTTCATCCTGAAAGACGGTGTTTACGAA
TGGGGTAACGCGGGTAAACTGAAAATCAAAAAAGGTTCTTCTAAACAGTCTTCTTCTGAACTGGTTGAC
TCTGACATCCTGAAAGACTCTTTCGACCTGGCGTCTGAACTGAAAGGTGAAAAACTGATGCTGTACCGT
GACCCGTCTGGTAACGTTTTCCCGTCTGACAAATGGATGGCGGCGGGTGTTTTCTTCGGTAAACTGGAAC
GTATCCTGATCTCTAAACTGACCAACCAGTACTCTATCTCTACCATCGAAGACGACTCTTCTAAACAGTC
TATGTAA
SEQ ID NO: 55
ATGCCGACCCGTACCATCAACCTGAAACTGGTTCTGGGTAAAAACCCGGAAAACGCGACCCTGCGTCGT
GCGCTGTTCTCTACCCACCGTCTGGTTAACCAGGCGACCAAACGTATCGAAGAATTCCTGCTGCTGTGCC
GTGGTGAAGCGTACCGTACCGTTGACAACGAAGGTAAAGAAGCGGAAATCCCGCGTCACGCGGTTCAG
GAAGAAGCGCTGGCGTTCGCGAAAGCGGCGCAGCGTCACAACGGTTGCATCTCTACCTACGAAGACCAG
GAAATCCTGGACGTTCTGCGTCAGCTGTACGAACGTCTGGTTCCGTCTGTTAACGAAAACAACGAAGCG
GGTGACGCGCAGGCGGCGAACGCGTGGGTTTCTCCGCTGATGTCTGCGGAATCTGAAGGTGGTCTGTCT
GTTTACGACAAAGTTCTGGACCCGCCGCCGGTTTGGATGAAACTGAAAGAAGAAAAAGCGCCGGGTTGG
GAAGCGGCGTCTCAGATCTGGATCCAGTCTGACGAAGGTCAGTCTCTGCTGAACAAACCGGGTTCTCCG
CCGCGTTGGATCCGTAAACTGCGTTCTGGTCAGCCGTGGCAGGACGACTTCGTTTCTGACCAGAAAAAA
AAACAGGACGAACTGACCAAAGGTAACGCGCCGCTGATCAAACAGCTGAAAGAAATGGGTCTGCTGCC
GCTGGTTAACCCGTTCTTCCGTCACCTGCTGGACCCGGAAGGTAAAGGTGTTTCTCCGTGGGACCGTCTG
GCGGTTCGTGCGGCGGTTGCGCACTTCATCTCTTGGGAATCTTGGAACCACCGTACCCGTGCGGAATACA
ACTCTCTGAAACTGCGTCGTGACGAATTCGAAGCGGCGTCTGACGAATTCAAAGACGACTTCACCCTGC
TGCGTCAGTACGAAGCGAAACGTCACTCTACCCTGAAATCTATCGCGCTGGCGGACGACTCTAACCCGT
ACCGTATCGGTGTTCGTTCTCTGCGTGCGTGGAACCGTGTTCGTGAAGAATGGATCGACAAAGGTGCGA
CCGAAGAACAGCGTGTTACCATCCTGTCTAAACTGCAGACCCAGCTGCGTGGTAAATTCGGTGACCCGG
ACCTGTTCAACTGGCTGGCGCAGGACCGTCACGTTCACCTGTGGTCTCCGCGTGACTCTGTTACCCCGCT
GGTTCGTATCAACGCGGTTGACAAAGTTCTGCGTCGTCGTAAACCGTACGCGCTGATGACCTTCGCGCAC
CCGCGTTTCCACCCGCGTTGGATCCTGTACGAAGCGCCGGGTGGTTCTAACCTGCGTCAGTACGCGCTGG
ACTGCACCGAAAACGCGCTGCACATCACCCTGCCGCTGCTGGTTGACGACGCGCACGGTACCTGGATCG
AAAAAAAAATCCGTGTTCCGCTGGCGCCGTCTGGTCAGATCCAGGACCTGACCCTGGAAAAACTGGAAA
AAAAAAAAAACCGTCTGTACTACCGTTCTGGTTTCCAGCAGTTCGCGGGTCTGGCGGGTGGTGCGGAAG
TTCTGTTCCACCGTCCGTACATGGAACACGACGAACGTTCTGAAGAATCTCTGCTGGAACGTCCGGGTGC
GGTTTGGTTCAAACTGACCCTGGACGTTGCGACCCAGGCGCCGCCGAACTGGCTGGACGGTAAAGGTCG
TGTTCGTACCCCGCCGGAAGTTCACCACTTCAAAACCGCGCTGTCTAACAAATCTAAACACACCCGTACC
CTGCAGCCGGGTCTGCGTGTTCTGTCTGTTGACCTGGGTATGCGTACCTTCGCGTCTTGCTCTGTTTTCGA
ACTGATCGAAGGTAAACCGGAAACCGGTCGTGCGTTCCCGGTTGCGGACGAACGTTCTATGGACTCTCC
GAACAAACTGTGGGCGAAACACGAACGTTCTTTCAAACTGACCCTGCCGGGTGAAACCCCGTCTCGTAA
AGAAGAAGAAGAACGTTCTATCGCGCGTGCGGAAATCTACGCGCTGAAACGTGACATCCAGCGTCTGAA
ATCTCTGCTGCGTCTGGGTGAAGAAGACAACGACAACCGTCGTGACGCGCTGCTGGAACAGTTCTTCAA
AGGTTGGGGTGAAGAAGACGTTGTTCCGGGTCAGGCGTTCCCGCGTTCTCTGTTCCAGGGTCTGGGTGCG
GCGCCGTTCCGTTCTACCCCGGAACTGTGGCGTCAGCACTGCCAGACCTACTACGACAAAGCGGAAGCG
TGCCTGGCGAAACACATCTCTGACTGGCGTAAACGTACCCGTCCGCGTCCGACCTCTCGTGAAATGTGGT
ACAAAACCCGTTCTTACCACGGTGGTAAATCTATCTGGATGCTGGAATACCTGGACGCGGTTCGTAAACT
GCTGCTGTCTTGGTCTCTGCGTGGTCGTACCTACGGTGCGATCAACCGTCAGGACACCGCGCGTTTCGGT
TCTCTGGCGTCTCGTCTGCTGCACCACATCAACTCTCTGAAAGAAGACCGTATCAAAACCGGTGCGGACT
CTATCGTTCAGGCGGCGCGTGGTTACATCCCGCTGCCGCACGGTAAAGGTTGGGAACAGCGTTACGAAC
CGTGCCAGCTGATCCTGTTCGAAGACCTGGCGCGTTACCGTTTCCGTGTTGACCGTCCGCGTCGTGAAAA
CTCTCAGCTGATGCAGTGGAACCACCGTGCGATCGTTGCGGAAACCACCATGCAGGCGGAACTGTACGG
TCAGATCGTTGAAAACACCGCGGCGGGTTTCTCTTCTCGTTTCCACGCGGCGACCGGTGCGCCGGGTGTT
CGTTGCCGTTTCCTGCTGGAACGTGACTTCGACAACGACCTGCCGAAACCGTACCTGCTGCGTGAACTGT
CTTGGATGCTGGGTAACACCAAAGTTGAATCTGAAGAAGAAAAACTGCGTCTGCTGTCTGAAAAAATCC
GTCCGGGTTCTCTGGTTCCGTGGGACGGTGGTGAACAGTTCGCGACCCTGCACCCGAAACGTCAGACCC
TGTGCGTTATCCACGCGGACATGAACGCGGCGCAGAACCTGCAGCGTCGTTTCTTCGGTCGTTGCGGTGA
AGCGTTCCGTCTGGTTTGCCAGCCGCACGGTGACGACGTTCTGCGTCTGGCGTCTACCCCGGGTGCGCGT
CTGCTGGGTGCGCTGCAGCAGCTGGAAAACGGTCAGGGTGCGTTCGAACTGGTTCGTGACATGGGTTCT
ACCTCTCAGATGAACCGTTTCGTTATGAAATCTCTGGGTAAAAAAAAAATCAAACCGCTGCAGGACAAC
AACGGTGACGACGAACTGGAAGACGTTCTGTCTGTTCTGCCGGAAGAAGACGACACCGGTCGTATCACC
GTTTTCCGTGACTCTTCTGGTATCTTCTTCCCGTGCAACGTTTGGATCCCGGCGAAACAGTTCTGGCCGGC
GGTTCGTGCGATGATCTGGAAAGTTATGGCGTCTCACTCTCTGGGTTAA
SEQ ID NO: 56
ATGACCAAACTGCGTCACCGTCAGAAAAAACTGACCCACGACTGGGCGGGTTCTAAAAAACGTGAAGTT
CTGGGTTCTAACGGTAAACTGCAGAACCCGCTGCTGATGCCGGTTAAAAAAGGTCAGGTTACCGAATTC
CGTAAAGCGTTCTCTGCGTACGCGCGTGCGACCAAAGGTGAAATGACCGACGGTCGTAAAAACATGTTC
ACCCACTCTTTCGAACCGTTCAAAACCAAACCGTCTCTGCACCAGTGCGAACTGGCGGACAAAGCGTAC
CAGTCTCTGCACTCTTACCTGCCGGGTTCTCTGGCGCACTTCCTGCTGTCTGCGCACGCGCTGGGTTTCCG
TATCTTCTCTAAATCTGGTGAAGCGACCGCGTTCCAGGCGTCTTCTAAAATCGAAGCGTACGAATCTAAA
CTGGCGTCTGAACTGGCGTGCGTTGACCTGTCTATCCAGAACCTGACCATCTCTACCCTGTTCAACGCGC
TGACCACCTCTGTTCGTGGTAAAGGTGAAGAAACCTCTGCGGACCCGCTGATCGCGCGTTTCTACACCCT
GCTGACCGGTAAACCGCTGTCTCGTGACACCCAGGGTCCGGAACGTGACCTGGCGGAAGTTATCTCTCG
TAAAATCGCGTCTTCTTTCGGTACCTGGAAAGAAATGACCGCGAACCCGCTGCAGTCTCTGCAGTTCTTC
GAAGAAGAACTGCACGCGCTGGACGCGAACGTTTCTCTGTCTCCGGCGTTCGACGTTCTGATCAAAATG
AACGACCTGCAGGGTGACCTGAAAAACCGTACCATCGTTTTCGACCCGGACGCGCCGGTTTTCGAATAC
AACGCGGAAGACCCGGCGGACATCATCATCAAACTGACCGCGCGTTACGCGAAAGAAGCGGTTATCAA
AAACCAGAACGTTGGTAACTACGTTAAAAACGCGATCACCACCACCAACGCGAACGGTCTGGGTTGGCT
GCTGAACAAAGGTCTGTCTCTGCTGCCGGTTTCTACCGACGACGAACTGCTGGAATTCATCGGTGTTGAA
CGTTCTCACCCGTCTTGCCACGCGCTGATCGAACTGATCGCGCAGCTGGAAGCGCCGGAACTGTTCGAA
AAAAACGTTTTCTCTGACACCCGTTCTGAAGTTCAGGGTATGATCGACTCTGCGGTTTCTAACCACATCG
CGCGTCTGTCTTCTTCTCGTAACTCTCTGTCTATGGACTCTGAAGAACTGGAACGTCTGATCAAATCTTTC
CAGATCCACACCCCGCACTGCTCTCTGTTCATCGGTGCGCAGTCTCTGTCTCAGCAGCTGGAATCTCTGC
CGGAAGCGCTGCAGTCTGGTGTTAACTCTGCGGACATCCTGCTGGGTTCTACCCAGTACATGCTGACCAA
CTCTCTGGTTGAAGAATCTATCGCGACCTACCAGCGTACCCTGAACCGTATCAACTACCTGTCTGGTGTT
GCGGGTCAGATCAACGGTGCGATCAAACGTAAAGCGATCGACGGTGAAAAAATCCACCTGCCGGCGGC
GTGGTCTGAACTGATCTCTCTGCCGTTCATCGGTCAGCCGGTTATCGACGTTGAATCTGACCTGGCGCAC
CTGAAAAACCAGTACCAGACCCTGTCTAACGAATTCGACACCCTGATCTCTGCGCTGCAGAAAAACTTC
GACCTGAACTTCAACAAAGCGCTGCTGAACCGTACCCAGCACTTCGAAGCGATGTGCCGTTCTACCAAA
AAAAACGCGCTGTCTAAACCGGAAATCGTTTCTTACCGTGACCTGCTGGCGCGTCTGACCTCTTGCCTGT
ACCGTGGTTCTCTGGTTCTGCGTCGTGCGGGTATCGAAGTTCTGAAAAAACACAAAATCTTCGAATCTAA
CTCTGAACTGCGTGAACACGTTCACGAACGTAAACACTTCGTTTTCGTTTCTCCGCTGGACCGTAAAGCG
AAAAAACTGCTGCGTCTGACCGACTCTCGTCCGGACCTGCTGCACGTTATCGACGAAATCCTGCAGCAC
GACAACCTGGAAAACAAAGACCGTGAATCTCTGTGGCTGGTTCGTTCTGGTTACCTGCTGGCGGGTCTGC
CGGACCAGCTGTCTTCTTCTTTCATCAACCTGCCGATCATCACCCAGAAAGGTGACCGTCGTCTGATCGA
CCTGATCCAGTACGACCAGATCAACCGTGACGCGTTCGTTATGCTGGTTACCTCTGCGTTCAAATCTAAC
CTGTCTGGTCTGCAGTACCGTGCGAACAAACAGTCTTTCGTTGTTACCCGTACCCTGTCTCCGTACCTGG
GTTCTAAACTGGTTTACGTTCCGAAAGACAAAGACTGGCTGGTTCCGTCTCAGATGTTCGAAGGTCGTTT
CGCGGACATCCTGCAGTCTGACTACATGGTTTGGAAAGACGCGGGTCGTCTGTGCGTTATCGACACCGC
GAAACACCTGTCTAACATCAAAAAATCTGTTTTCTCTTCTGAAGAAGTTCTGGCGTTCCTGCGTGAACTG
CCGCACCGTACCTTCATCCAGACCGAAGTTCGTGGTCTGGGTGTTAACGTTGACGGTATCGCGTTCAACA
ACGGTGACATCCCGTCTCTGAAAACCTTCTCTAACTGCGTTCAGGTTAAAGTTTCTCGTACCAACACCTC
TCTGGTTCAGACCCTGAACCGTTGGTTCGAAGGTGGTAAAGTTTCTCCGCCGTCTATCCAGTTCGAACGT
GCGTACTACAAAAAAGACGACCAGATCCACGAAGACGCGGCGAAACGTAAAATCCGTTTCCAGATGCC
GGCGACCGAACTGGTTCACGCGTCTGACGACGCGGGTTGGACCCCGTCTTACCTGCTGGGTATCGACCC
GGGTGAATACGGTATGGGTCTGTCTCTGGTTTCTATCAACAACGGTGAAGTTCTGGACTCTGGTTTCATC
CACATCAACTCTCTGATCAACTTCGCGTCTAAAAAATCTAACCACCAGACCAAAGTTGTTCCGCGTCAGC
AGTACAAATCTCCGTACGCGAACTACCTGGAACAGTCTAAAGACTCTGCGGCGGGTGACATCGCGCACA
TCCTGGACCGTCTGATCTACAAACTGAACGCGCTGCCGGTTTTCGAAGCGCTGTCTGGTAACTCTCAGTC
TGCGGCGGACCAGGTTTGGACCAAAGTTCTGTCTTTCTACACCTGGGGTGACAACGACGCGCAGAACTC
TATCCGTAAACAGCACTGGTTCGGTGCGTCTCACTGGGACATCAAAGGTATGCTGCGTCAGCCGCCGAC
CGAAAAAAAACCGAAACCGTACATCGCGTTCCCGGGTTCTCAGGTTTCTTCTTACGGTAACTCTCAGCGT
TGCTCTTGCTGCGGTCGTAACCCGATCGAACAGCTGCGTGAAATGGCGAAAGACACCTCTATCAAAGAA
CTGAAAATCCGTAACTCTGAAATCCAGCTGTTCGACGGTACCATCAAACTGTTCAACCCGGACCCGTCTA
CCGTTATCGAACGTCGTCGTCACAACCTGGGTCCGTCTCGTATCCCGGTTGCGGACCGTACCTTCAAAAA
CATCTCTCCGTCTTCTCTGGAATTCAAAGAACTGATCACCATCGTTTCTCGTTCTATCCGTCACTCTCCGG
AATTCATCGCGAAAAAACGTGGTATCGGTTCTGAATACTTCTGCGCGTACTCTGACTGCAACTCTTCTCT
GAACTCTGAAGCGAACGCGGCGGCGAACGTTGCGCAGAAATTCCAGAAACAGCTGTTCTTCGAACTGTA
A
SEQ ID NO: 57
ATGAAACGTATCCTGAACTCTCTGAAAGTTGCGGCGCTGCGTCTGCTGTTCCGTGGTAAAGGTTCTGAAC
TGGTTAAAACCGTTAAATACCCGCTGGTTTCTCCGGTTCAGGGTGCGGTTGAAGAACTGGCGGAAGCGA
TCCGTCACGACAACCTGCACCTGTTCGGTCAGAAAGAAATCGTTGACCTGATGGAAAAAGACGAAGGTA
CCCAGGTTTACTCTGTTGTTGACTTCTGGCTGGACACCCTGCGTCTGGGTATGTTCTTCTCTCCGTCTGCG
AACGCGCTGAAAATCACCCTGGGTAAATTCAACTCTGACCAGGTTTCTCCGTTCCGTAAAGTTCTGGAAC
AGTCTCCGTTCTTCCTGGCGGGTCGTCTGAAAGTTGAACCGGCGGAACGTATCCTGTCTGTTGAAATCCG
TAAAATCGGTAAACGTGAAAACCGTGTTGAAAACTACGCGGCGGACGTTGAAACCTGCTTCATCGGTCA
GCTGTCTTCTGACGAAAAACAGTCTATCCAGAAACTGGCGAACGACATCTGGGACTCTAAAGACCACGA
AGAACAGCGTATGCTGAAAGCGGACTTCTTCGCGATCCCGCTGATCAAAGACCCGAAAGCGGTTACCGA
AGAAGACCCGGAAAACGAAACCGCGGGTAAACAGAAACCGCTGGAACTGTGCGTTTGCCTGGTTCCGG
AACTGTACACCCGTGGTTTCGGTTCTATCGCGGACTTCCTGGTTCAGCGTCTGACCCTGCTGCGTGACAA
AATGTCTACCGACACCGCGGAAGACTGCCTGGAATACGTTGGTATCGAAGAAGAAAAAGGTAACGGTA
TGAACTCTCTGCTGGGTACCTTCCTGAAAAACCTGCAGGGTGACGGTTTCGAACAGATCTTCCAGTTCAT
GCTGGGTTCTTACGTTGGTTGGCAGGGTAAAGAAGACGTTCTGCGTGAACGTCTGGACCTGCTGGCGGA
AAAAGTTAAACGTCTGCCGAAACCGAAATTCGCGGGTGAATGGTCTGGTCACCGTATGTTCCTGCACGG
TCAGCTGAAATCTTGGTCTTCTAACTTCTTCCGTCTGTTCAACGAAACCCGTGAACTGCTGGAATCTATC
AAATCTGACATCCAGCACGCGACCATGCTGATCTCTTACGTTGAAGAAAAAGGTGGTTACCACCCGCAG
CTGCTGTCTCAGTACCGTAAACTGATGGAACAGCTGCCGGCGCTGCGTACCAAAGTTCTGGACCCGGAA
ATCGAAATGACCCACATGTCTGAAGCGGTTCGTTCTTACATCATGATCCACAAATCTGTTGCGGGTTTCC
TGCCGGACCTGCTGGAATCTCTGGACCGTGACAAAGACCGTGAATTCCTGCTGTCTATCTTCCCGCGTAT
CCCGAAAATCGACAAAAAAACCAAAGAAATCGTTGCGTGGGAACTGCCGGGTGAACCGGAAGAAGGTT
ACCTGTTCACCGCGAACAACCTGTTCCGTAACTTCCTGGAAAACCCGAAACACGTTCCGCGTTTCATGGC
GGAACGTATCCCGGAAGACTGGACCCGTCTGCGTTCTGCGCCGGTTTGGTTCGACGGTATGGTTAAACA
GTGGCAGAAAGTTGTTAACCAGCTGGTTGAATCTCCGGGTGCGCTGTACCAGTTCAACGAATCTTTCCTG
CGTCAGCGTCTGCAGGCGATGCTGACCGTTTACAAACGTGACCTGCAGACCGAAAAATTCCTGAAACTG
CTGGCGGACGTTTGCCGTCCGCTGGTTGACTTCTTCGGTCTGGGTGGTAACGACATCATCTTCAAATCTT
GCCAGGACCCGCGTAAACAGTGGCAGACCGTTATCCCGCTGTCTGTTCCGGCGGACGTTTACACCGCGT
GCGAAGGTCTGGCGATCCGTCTGCGTGAAACCCTGGGTTTCGAATGGAAAAACCTGAAAGGTCACGAAC
GTGAAGACTTCCTGCGTCTGCACCAGCTGCTGGGTAACCTGCTGTTCTGGATCCGTGACGCGAAACTGGT
TGTTAAACTGGAAGACTGGATGAACAACCCGTGCGTTCAGGAATACGTTGAAGCGCGTAAAGCGATCGA
CCTGCCGCTGGAAATCTTCGGTTTCGAAGTTCCGATCTTCCTGAACGGTTACCTGTTCTCTGAACTGCGTC
AGCTGGAACTGCTGCTGCGTCGTAAATCTGTTATGACCTCTTACTCTGTTAAAACCACCGGTTCTCCGAA
CCGTCTGTTCCAGCTGGTTTACCTGCCGCTGAACCCGTCTGACCCGGAAAAAAAAAACTCTAACAACTTC
CAGGAACGTCTGGACACCCCGACCGGTCTGTCTCGTCGTTTCCTGGACCTGACCCTGGACGCGTTCGCGG
GTAAACTGCTGACCGACCCGGTTACCCAGGAACTGAAAACCATGGCGGGTTTCTACGACCACCTGTTCG
GTTTCAAACTGCCGTGCAAACTGGCGGCGATGTCTAACCACCCGGGTTCTTCTTCTAAAATGGTTGTTCT
GGCGAAACCGAAAAAAGGTGTTGCGTCTAACATCGGTTTCGAACCGATCCCGGACCCGGCGCACCCGGT
TTTCCGTGTTCGTTCTTCTTGGCCGGAACTGAAATACCTGGAAGGTCTGCTGTACCTGCCGGAAGACACC
CCGCTGACCATCGAACTGGCGGAAACCTCTGTTTCTTGCCAGTCTGTTTCTTCTGTTGCGTTCGACCTGAA
AAACCTGACCACCATCCTGGGTCGTGTTGGTGAATTCCGTGTTACCGCGGACCAGCCGTTCAAACTGACC
CCGATCATCCCGGAAAAAGAAGAATCTTTCATCGGTAAAACCTACCTGGGTCTGGACGCGGGTGAACGT
TCTGGTGTTGGTTTCGCGATCGTTACCGTTGACGGTGACGGTTACGAAGTTCAGCGTCTGGGTGTTCACG
AAGACACCCAGCTGATGGCGCTGCAGCAGGTTGCGTCTAAATCTCTGAAAGAACCGGTTTTCCAGCCGC
TGCGTAAAGGTACCTTCCGTCAGCAGGAACGTATCCGTAAATCTCTGCGTGGTTGCTACTGGAACTTCTA
CCACGCGCTGATGATCAAATACCGTGCGAAAGTTGTTCACGAAGAATCTGTTGGTTCTTCTGGTCTGGTT
GGTCAGTGGCTGCGTGCGTTCCAGAAAGACCTGAAAAAAGCGGACGTTCTGCCGAAAAAAGGTGGTAA
AAACGGTGTTGACAAAAAAAAACGTGAATCTTCTGCGCAGGACACCCTGTGGGGTGGTGCGTTCTCTAA
AAAAGAAGAACAGCAGATCGCGTTCGAAGTTCAGGCGGCGGGTTCTTCTCAGTTCTGCCTGAAATGCGG
TTGGTGGTTCCAGCTGGGTATGCGTGAAGTTAACCGTGTTCAGGAATCTGGTGTTGTTCTGGACTGGAAC
CGTTCTATCGTTACCTTCCTGATCGAATCTTCTGGTGAAAAAGTTTACGGTTTCTCTCCGCAGCAGCTGGA
AAAAGGTTTCCGTCCGGACATCGAAACCTTCAAAAAAATGGTTCGTGACTTCATGCGTCCGCCGATGTTC
GACCGTAAAGGTCGTCCGGCGGCGGCGTACGAACGTTTCGTTCTGGGTCGTCGTCACCGTCGTTACCGTT
TCGACAAAGTTTTCGAAGAACGTTTCGGTCGTTCTGCGCTGTTCATCTGCCCGCGTGTTGGTTGCGGTAA
CTTCGACCACTCTTCTGAACAGTCTGCGGTTGTTCTGGCGCTGATCGGTTACATCGCGGACAAAGAAGGT
ATGTCTGGTAAAAAACTGGTTTACGTTCGTCTGGCGGAACTGATGGCGGAATGGAAACTGAAAAAACTG
GAACGTTCTCGTGTTGAAGAACAGTCTTCTGCGCAGTAA
SEQ ID NO: 58
ATGGCGGAATCTAAACAGATGCAGTGCCGTAAATGCGGTGCGTCTATGAAATACGAAGTTATCGGTCTG
GGTAAAAAATCTTGCCGTTACATGTGCCCGGACTGCGGTAACCACACCTCTGCGCGTAAAATCCAGAAC
AAAAAAAAACGTGACAAAAAATACGGTTCTGCGTCTAAAGCGCAGTCTCAGCGTATCGCGGTTGCGGGT
GCGCTGTACCCGGACAAAAAAGTTCAGACCATCAAAACCTACAAATACCCGGCGGACCTGAACGGTGA
AGTTCACGACTCTGGTGTTGCGGAAAAAATCGCGCAGGCGATCCAGGAAGACGAAATCGGTCTGCTGGG
TCCGTCTTCTGAATACGCGTGCTGGATCGCGTCTCAGAAACAGTCTGAACCGTACTCTGTTGTTGACTTC
TGGTTCGACGCGGTTTGCGCGGGTGGTGTTTTCGCGTACTCTGGTGCGCGTCTGCTGTCTACCGTTCTGCA
GCTGTCTGGTGAAGAATCTGTTCTGCGTGCGGCGCTGGCGTCTTCTCCGTTCGTTGACGACATCAACCTG
GCGCAGGCGGAAAAATTCCTGGCGGTTTCTCGTCGTACCGGTCAGGACAAACTGGGTAAACGTATCGGT
GAATGCTTCGCGGAAGGTCGTCTGGAAGCGCTGGGTATCAAAGACCGTATGCGTGAATTCGTTCAGGCG
ATCGACGTTGCGCAGACCGCGGGTCAGCGTTTCGCGGCGAAACTGAAAATCTTCGGTATCTCTCAGATG
CCGGAAGCGAAACAGTGGAACAACGACTCTGGTCTGACCGTTTGCATCCTGCCGGACTACTACGTTCCG
GAAGAAAACCGTGCGGACCAGCTGGTTGTTCTGCTGCGTCGTCTGCGTGAAATCGCGTACTGCATGGGT
ATCGAAGACGAAGCGGGTTTCGAACACCTGGGTATCGACCCGGGTGCGCTGTCTAACTTCTCTAACGGT
AACCCGAAACGTGGTTTCCTGGGTCGTCTGCTGAACAACGACATCATCGCGCTGGCGAACAACATGTCT
GCGATGACCCCGTACTGGGAAGGTCGTAAAGGTGAACTGATCGAACGTCTGGCGTGGCTGAAACACCGT
GCGGAAGGTCTGTACCTGAAAGAACCGCACTTCGGTAACTCTTGGGCGGACCACCGTTCTCGTATCTTCT
CTCGTATCGCGGGTTGGCTGTCTGGTTGCGCGGGTAAACTGAAAATCGCGAAAGACCAGATCTCTGGTG
TTCGTACCGACCTGTTCCTGCTGAAACGTCTGCTGGACGCGGTTCCGCAGTCTGCGCCGTCTCCGGACTT
CATCGCGTCTATCTCTGCGCTGGACCGTTTCCTGGAAGCGGCGGAATCTTCTCAGGACCCGGCGGAACA
GGTTCGTGCGCTGTACGCGTTCCACCTGAACGCGCCGGCGGTTCGTTCTATCGCGAACAAAGCGGTTCAG
CGTTCTGACTCTCAGGAATGGCTGATCAAAGAACTGGACGCGGTTGACCACCTGGAATTCAACAAAGCG
TTCCCGTTCTTCTCTGACACCGGTAAAAAAAAAAAAAAAGGTGCGAACTCTAACGGTGCGCCGTCTGAA
GAAGAATACACCGAAACCGAATCTATCCAGCAGCCGGAAGACGCGGAACAGGAAGTTAACGGTCAGGA
AGGTAACGGTGCGTCTAAAAACCAGAAAAAATTCCAGCGTATCCCGCGTTTCTTCGGTGAAGGTTCTCG
TTCTGAATACCGTATCCTGACCGAAGCGCCGCAGTACTTCGACATGTTCTGCAACAACATGCGTGCGATC
TTCATGCAGCTGGAATCTCAGCCGCGTAAAGCGCCGCGTGACTTCAAATGCTTCCTGCAGAACCGTCTGC
AGAAACTGTACAAACAGACCTTCCTGAACGCGCGTTCTAACAAATGCCGTGCGCTGCTGGAATCTGTTCT
GATCTCTTGGGGTGAATTCTACACCTACGGTGCGAACGAAAAAAAATTCCGTCTGCGTCACGAAGCGTC
TGAACGTTCTTCTGACCCGGACTACGTTGTTCAGCAGGCGCTGGAAATCGCGCGTCGTCTGTTCCTGTTC
GGTTTCGAATGGCGTGACTGCTCTGCGGGTGAACGTGTTGACCTGGTTGAAATCCACAAAAAAGCGATC
TCTTTCCTGCTGGCGATCACCCAGGCGGAAGTTTCTGTTGGTTCTTACAACTGGCTGGGTAACTCTACCG
TTTCTCGTTACCTGTCTGTTGCGGGTACCGACACCCTGTACGGTACCCAGCTGGAAGAATTCCTGAACGC
GACCGTTCTGTCTCAGATGCGTGGTCTGGCGATCCGTCTGTCTTCTCAGGAACTGAAAGACGGTTTCGAC
GTTCAGCTGGAATCTTCTTGCCAGGACAACCTGCAGCACCTGCTGGTTTACCGTGCGTCTCGTGACCTGG
CGGCGTGCAAACGTGCGACCTGCCCGGCGGAACTGGACCCGAAAATCCTGGTTCTGCCGGTTGGTGCGT
TCATCGCGTCTGTTATGAAAATGATCGAACGTGGTGACGAACCGCTGGCGGGTGCGTACCTGCGTCACC
GTCCGCACTCTTTCGGTTGGCAGATCCGTGTTCGTGGTGTTGCGGAAGTTGGTATGGACCAGGGTACCGC
GCTGGCGTTCCAGAAACCGACCGAATCTGAACCGTTCAAAATCAAACCGTTCTCTGCGCAGTACGGTCC
GGTTCTGTGGCTGAACTCTTCTTCTTACTCTCAGTCTCAGTACCTGGACGGTTTCCTGTCTCAGCCGAAAA
ACTGGTCTATGCGTGTTCTGCCGCAGGCGGGTTCTGTTCGTGTTGAACAGCGTGTTGCGCTGATCTGGAA
CCTGCAGGCGGGTAAAATGCGTCTGGAACGTTCTGGTGCGCGTGCGTTCTTCATGCCGGTTCCGTTCTCT
TTCCGTCCGTCTGGTTCTGGTGACGAAGCGGTTCTGGCGCCGAACCGTTACCTGGGTCTGTTCCCGCACT
CTGGTGGTATCGAATACGCGGTTGTTGACGTTCTGGACTCTGCGGGTTTCAAAATCCTGGAACGTGGTAC
CATCGCGGTTAACGGTTTCTCTCAGAAACGTGGTGAACGTCAGGAAGAAGCGCACCGTGAAAAACAGCG
TCGTGGTATCTCTGACATCGGTCGTAAAAAACCGGTTCAGGCGGAAGTTGACGCGGCGAACGAACTGCA
CCGTAAATACACCGACGTTGCGACCCGTCTGGGTTGCCGTATCGTTGTTCAGTGGGCGCCGCAGCCGAA
ACCGGGTACCGCGCCGACCGCGCAGACCGTTTACGCGCGTGCGGTTCGTACCGAAGCGCCGCGTTCTGG
TAACCAGGAAGACCACGCGCGTATGAAATCTTCTTGGGGTTACACCTGGGGTACCTACTGGGAAAAACG
TAAACCGGAAGACATCCTGGGTATCTCTACCCAGGTTTACTGGACCGGTGGTATCGGTGAATCTTGCCCG
GCGGTTGCGGTTGCGCTGCTGGGTCACATCCGTGCGACCTCTACCCAGACCGAATGGGAAAAAGAAGAA
GTTGTTTTCGGTCGTCTGAAAAAATTCTTCCCGTCTTAA
SEQ ID NO: 59
ATGGAAAAACGTATCAACAAAATCCGTAAAAAACTGTCTGCGGACAACGCGACCAAACCGGTTTCTCGT
TCTGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCGACGACCTGAAAAAACGTCTGGAAAAACGT
CGTAAAAAACCGGAAGTTATGCCGCAGGTTATCTCTAACAACGCGGCGAACAACCTGCGTATGCTGCTG
GACGACTACACCAAAATGAAAGAAGCGATCCTGCAGGTTTACTGGCAGGAATTCAAAGACGACCACGTT
GGTCTGATGTGCAAATTCGCGCAGCCGGCGTCTAAAAAAATCGACCAGAACAAACTGAAACCGGAAAT
GGACGAAAAAGGTAACCTGACCACCGCGGGTTTCGCGTGCTCTCAGTGCGGTCAGCCGCTGTTCGTTTA
CAAACTGGAACAGGTTTCTGAAAAAGGTAAAGCGTACACCAACTACTTCGGTCGTTGCAACGTTGCGGA
ACACGAAAAACTGATCCTGCTGGCGCAGCTGAAACCGGAAAAAGACTCTGACGAAGCGGTTACCTACTC
TCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCAAAGAATCTACCCACCCG
GTTAAACCGCTGGCGCAGATCGCGGGTAACCGTTACGCGTCTGGTCCGGTTGGTAAAGCGCTGTCTGAC
GCGTGCATGGGTACCATCGCGTCTTTCCTGTCTAAATACCAGGACATCATCATCGAACACCAGAAAGTTG
TTAAAGGTAACCAGAAACGTCTGGAATCTCTGCGTGAACTGGCGGGTAAAGAAAACCTGGAATACCCGT
CTGTTACCCTGCCGCCGCAGCCGCACACCAAAGAAGGTGTTGACGCGTACAACGAAGTTATCGCGCGTG
TTCGTATGTGGGTTAACCTGAACCTGTGGCAGAAACTGAAACTGTCTCGTGACGACGCGAAACCGCTGC
TGCGTCTGAAAGGTTTCCCGTCTTTCCCGGTTGTTGAACGTCGTGAAAACGAAGTTGACTGGTGGAACAC
CATCAACGAAGTTAAAAAACTGATCGACGCGAAACGTGACATGGGTCGTGTTTTCTGGTCTGGTGTTAC
CGCGGAAAAACGTAACACCATCCTGGAAGGTTACAACTACCTGCCGAACGAAAACGACCACAAAAAAC
GTGAAGGTTCTCTGGAAAACCCGAAAAAACCGGCGAAACGTCAGTTCGGTGACCTGCTGCTGTACCTGG
AAAAAAAATACGCGGGTGACTGGGGTAAAGTTTTCGACGAAGCGTGGGAACGTATCGACAAAAAAATC
GCGGGTCTGACCTCTCACATCGAACGTGAAGAAGCGCGTAACGCGGAAGACGCGCAGTCTAAAGCGGTT
CTGACCGACTGGCTGCGTGCGAAAGCGTCTTTCGTTCTGGAACGTCTGAAAGAAATGGACGAAAAAGAA
TTCTACGCGTGCGAAATCCAGCTGCAGAAATGGTACGGTGACCTGCGTGGTAACCCGTTCGCGGTTGAA
GCGGAAAACCGTGTTGTTGACATCTCTGGTTTCTCTATCGGTTCTGACGGTCACTCTATCCAGTACCGTA
ACCTGCTGGCGTGGAAATACCTGGAAAACGGTAAACGTGAATTCTACCTGCTGATGAACTACGGTAAAA
AAGGTCGTATCCGTTTCACCGACGGTACCGACATCAAAAAATCTGGTAAATGGCAGGGTCTGCTGTACG
GTGGTGGTAAAGCGAAAGTTATCGACCTGACCTTCGACCCGGACGACGAACAGCTGATCATCCTGCCGC
TGGCGTTCGGTACCCGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGTCTCTGGAAACCGGTCTGAT
CAAACTGGCGAACGGTCGTGTTATCGAAAAAACCATCTACAACAAAAAAATCGGTCGTGACGAACCGG
CGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTGTTGACCCGTCTAACATCAAACCGGTTAACCT
GATCGGTGTTGACCGTGGTGAAAACATCCCGGCGGTTATCGCGCTGACCGACCCGGAAGGTTGCCCGCT
GCCGGAATTCAAAGACTCTTCTGGTGGTCCGACCGACATCCTGCGTATCGGTGAAGGTTACAAAGAAAA
ACAGCGTGCGATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAAATTCGC
GTCTAAATCTCGTAACCTGGCGGACGACATGGTTCGTAACTCTGCGCGTGACCTGTTCTACCACGCGGTT
ACCCACGACGCGGTTCTGGTTTTCGAAAACCTGTCTCGTGGTTTCGGTCGTCAGGGTAAACGTACCTTCA
TGACCGAACGTCAGTACACCAAAATGGAAGACTGGCTGACCGCGAAACTGGCGTACGAAGGTCTGACCT
CTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCTCTAAAACCTGCTCTAACTGCGGTTTCACCAT
CACCACCGCGGACTACGACGGTATGCTGGTTCGTCTGAAAAAAACCTCTGACGGTTGGGCGACCACCCT
GAACAACAAAGAACTGAAAGCGGAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGACCGTTG
AAAAAGAACTGTCTGCGGAACTGGACCGTCTGTCTGAAGAATCTGGTAACAACGACATCTCTAAATGGA
CCAAAGGTCGTCGTGACGAAGCGCTGTTCCTGCTGAAAAAACGTTTCTCTCACCGTCCGGTTCAGGAAC
AGTTCGTTTGCCTGGACTGCGGTCACGAAGTTCACGCGGACGAACAGGCGGCGCTGAACATCGCGCGTT
CTTGGCTGTTCCTGAACTCTAACTCTACCGAATTCAAATCTTACAAATCTGGTAAACAGCCGTTCGTTGG
TGCGTGGCAGGCGTTCTACAAACGTCGTCTGAAAGAAGTTTGGAAACCGAACGCG
SEQ ID NO: 60
ATGAAACGTATCAACAAAATCCGTCGTCGTCTGGTTAAAGACTCTAACACCAAAAAAGCGGGTAAAACC
GGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCCCGGACCTGCGTGAACGTCTGGAAAACCTGCGT
AAAAAACCGGAAAACATCCCGCAGCCGATCTCTAACACCTCTCGTGCGAACCTGAACAAACTGCTGACC
GACTACACCGAAATGAAAAAAGCGATCCTGCACGTTTACTGGGAAGAATTCCAGAAAGACCCGGTTGGT
CTGATGTCTCGTGTTGCGCAGCCGGCGCCGAAAAACATCGACCAGCGTAAACTGATCCCGGTTAAAGAC
GGTAACGAACGTCTGACCTCTTCTGGTTTCGCGTGCTCTCAGTGCTGCCAGCCGCTGTACGTTTACAAAC
TGGAACAGGTTAACGACAAAGGTAAACCGCACACCAACTACTTCGGTCGTTGCAACGTTTCTGAACACG
AACGTCTGATCCTGCTGTCTCCGCACAAACCGGAAGCGAACGACGAACTGGTTACCTACTCTCTGGGTA
AATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCCGTGAATCTAACCACCCGGTTAAACC
GCTGGAACAGATCGGTGGTAACTCTTGCGCGTCTGGTCCGGTTGGTAAAGCGCTGTCTGACGCGTGCAT
GGGTGCGGTTGCGTCTTTCCTGACCAAATACCAGGACATCATCCTGGAACACCAGAAAGTTATCAAAAA
AAACGAAAAACGTCTGGCGAACCTGAAAGACATCGCGTCTGCGAACGGTCTGGCGTTCCCGAAAATCAC
CCTGCCGCCGCAGCCGCACACCAAAGAAGGTATCGAAGCGTACAACAACGTTGTTGCGCAGATCGTTAT
CTGGGTTAACCTGAACCTGTGGCAGAAACTGAAAATCGGTCGTGACGAAGCGAAACCGCTGCAGCGTCT
GAAAGGTTTCCCGTCTTTCCCGCTGGTTGAACGTCAGGCGAACGAAGTTGACTGGTGGGACATGGTTTGC
AACGTTAAAAAACTGATCAACGAAAAAAAAGAAGACGGTAAAGTTTTCTGGCAGAACCTGGCGGGTTA
CAAACGTCAGGAAGCGCTGCTGCCGTACCTGTCTTCTGAAGAAGACCGTAAAAAAGGTAAAAAATTCGC
GCGTTACCAGTTCGGTGACCTGCTGCTGCACCTGGAAAAAAAACACGGTGAAGACTGGGGTAAAGTTTA
CGACGAAGCGTGGGAACGTATCGACAAAAAAGTTGAAGGTCTGTCTAAACACATCAAACTGGAAGAAG
AACGTCGTTCTGAAGACGCGCAGTCTAAAGCGGCGCTGACCGACTGGCTGCGTGCGAAAGCGTCTTTCG
TTATCGAAGGTCTGAAAGAAGCGGACAAAGACGAATTCTGCCGTTGCGAACTGAAACTGCAGAAATGGT
ACGGTGACCTGCGTGGTAAACCGTTCGCGATCGAAGCGGAAAACTCTATCCTGGACATCTCTGGTTTCTC
TAAACAGTACAACTGCGCGTTCATCTGGCAGAAAGACGGTGTTAAAAAACTGAACCTGTACCTGATCAT
CAACTACTTCAAAGGTGGTAAACTGCGTTTCAAAAAAATCAAACCGGAAGCGTTCGAAGCGAACCGTTT
CTACACCGTTATCAACAAAAAATCTGGTGAAATCGTTCCGATGGAAGTTAACTTCAACTTCGACGACCC
GAACCTGATCATCCTGCCGCTGGCGTTCGGTAAACGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTG
TCTCTGGAAACCGGTTCTCTGAAACTGGCGAACGGTCGTGTTATCGAAAAAACCCTGTACAACCGTCGT
ACCCGTCAGGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTCTGGACTCTTCTA
ACATCAAACCGATGAACCTGATCGGTATCGACCGTGGTGAAAACATCCCGGCGGTTATCGCGCTGACCG
ACCCGGAAGGTTGCCCGCTGTCTCGTTTCAAAGACTCTCTGGGTAACCCGACCCACATCCTGCGTATCGG
TGAATCTTACAAAGAAAAACAGCGTACCATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTG
GTTACTCTCGTAAATACGCGTCTAAAGCGAAAAACCTGGCGGACGACATGGTTCGTAACACCGCGCGTG
ACCTGCTGTACTACGCGGTTACCCAGGACGCGATGCTGATCTTCGAAAACCTGTCTCGTGGTTTCGGTCG
TCAGGGTAAACGTACCTTCATGGCGGAACGTCAGTACACCCGTATGGAAGACTGGCTGACCGCGAAACT
GGCGTACGAAGGTCTGCCGTCTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCTCTAAAACCTG
CTCTAACTGCGGTTTCACCATCACCTCTGCGGACTACGACCGTGTTCTGGAAAAACTGAAAAAAACCGC
GACCGGTTGGATGACCACCATCAACGGTAAAGAACTGAAAGTTGAAGGTCAGATCACCTACTACAACCG
TTACAAACGTCAGAACGTTGTTAAAGACCTGTCTGTTGAACTGGACCGTCTGTCTGAAGAATCTGTTAAC
AACGACATCTCTTCTTGGACCAAAGGTCGTTCTGGTGAAGCGCTGTCTCTGCTGAAAAAACGTTTCTCTC
ACCGTCCGGTTCAGGAAAAATTCGTTTGCCTGAACTGCGGTTTCGAAACCCACGCGGACGAACAGGCGG
CGCTGAACATCGCGCGTTCTTGGCTGTTCCTGCGTTCTCAGGAATACAAAAAATACCAGACCAACAAAA
CCACCGGTAACACCGACAAACGTGCGTTCGTTGAAACCTGGCAGTCTTTCTACCGTAAAAAACTGAAAG
AAGTTTGGAAACCG
SEQ ID NO: 61
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactcgcgtcttttactggctttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaa
caaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtcc
acattgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagc
cggatctacctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcacc
gcctatctcgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGGGTAAAATGTATT
ACCTTGGTTTAGACATTGGCACGAATTCCGTGGGCTACGCGGTGACCGACCCCTCATACCACCTGCTGAA
GTTTAAGGGGGAACCAATGTGGGGTGCGCACGTATTTGCCGCCGGTAATCAGAGCGCGGAACGACGCTC
GTTCCGCACATCGCGTCGTCGTTTGGACCGACGCCAACAGCGCGTTAAACTGGTACAGGAGATTTTTGCC
CCGGTGATTAGTCCGATCGACCCACGCTTCTTCATTCGTCTGCATGAATCCGCCCTGTGGCGCGATGACG
TCGCGGAGACGGATAAACATATCTTTTTCAATGATCCTACCTATACCGATAAGGAATATTATAGCGATTA
CCCGACTATCCATCACCTGATCGTTGATCTGATGGAAAGCTCTGAGAAACACGATCCGCGGCTGGTGTA
CCTTGCAGTGGCGTGGTTAGTGGCACACCGTGGTCATTTTCTGAACGAGGTGGACAAGGATAATATTGG
AGATGTGTTGTCGTTCGACGCATTTTATCCGGAGTTTCTCGCGTTCCTGTCGGACAACGGTGTATCACCG
TGGGTGTGCGAAAGCAAAGCGCTGCAGGCGACCTTGCTGAGCCGTAACTCAGTGAACGACAAATATAA
AGCCCTTAAGTCTCTGATCTTCGGATCCCAGAAACCTGAAGATAACTTCGATGCCAATATTTCGGAAGAT
GGACTCATTCAACTGCTGGCCGGCAAAAAGGTAAAAGTTAACAAACTGTTCCCTCAGGAATCGAACGAT
GCATCCTTCACATTGAATGATAAAGAAGACGCGATAGAAGAAATCCTGGGTACGCTTACACCAGATGAA
TGTGAATGGATTGCGCATATACGCCGCCTTTTTGACTGGGCTATCATGAAACATGCTCTGAAAGATGGCA
GGACTATTAGCGAGTCAAAAGTCAAACTGTATGAGCAGCACCATCACGATCTGACCCAACTTAAATACT
TCGTGAAAACCTACCTTGCAAAAGAATACGACGATATTTTCCGCAACGTGGATAGCGAAACAACGAAAA
ACTATGTAGCGTATTCCTATCATGTGAAAGAGGTGAAAGGCACTCTGCCTAAAAATAAGGCAACGCAAG
AAGAGTTTTGTAAGTATGTCCTGGGCAAGGTTAAAAACATTGAATGCTCTGAAGCAGACAAGGTTGACT
TTGATGAGATGATTCAGCGTCTTACCGACAACTCTTTTATGCCTAAGCAGGTTTCGGGCGAAAACCGCGT
TATTCCTTATCAGTTATATTATTATGAACTGAAGACAATTCTGAATAAAGCAGCCTCGTACCTGCCTTTCC
TGACGCAGTGTGGAAAAGATGCAATTTCGAACCAGGACAAACTACTGTCGATCATGACGTTCCGTATTC
CTTACTTCGTCGGACCCTTGCGAAAAGATAATTCGGAACATGCATGGCTCGAACGAAAGGCCGGTAAGA
TTTATCCGTGGAACTTTAACGACAAAGTGGACTTGGATAAATCAGAAGAAGCGTTCATTCGCCGAATGA
CCAATACCTGTACCTATTATCCCGGCGAAGATGTTTTACCGTTGGATTCGCTGATCTATGAGAAATTTAT
GATTTTAAATGAAATCAATAATATTCGTATTGACGGCTACCCGATTAGTGTTGACGTTAAACAGCAGGTT
TTTGGCTTGTTCGAAAAAAAACGACGCGTAACCGTGAAAGATATTCAGAACCTGCTGCTGTCTCTCGGA
GCTCTGGACAAACACGGGAAGCTGACAGGCATCGATACCACTATCCACTCAAACTATAATACGTATCAC
CATTTTAAATCTCTCATGGAACGCGGCGTCCTGACCCGGGATGACGTGGAACGCATCGTTGAAAGGATG
ACCTACAGCGACGATACTAAGCGTGTGCGTCTGTGGCTGAATAACAACTATGGTACTTTAACCGCCGAC
GATGTGAAACACATTTCGCGTCTGCGCAAACACGATTTTGGCCGTTTATCCAAAATGTTCTTAACAGGTC
TGAAGGGTGTCCATAAGGAGACCGGTGAACGTGCCTCCATACTGGATTTCATGTGGAACACGAACGATA
ACCTGATGCAGCTCCTTTCCGAATGCTACACGTTCAGTGATGAAATCACAAAGCTGCAAGAGGCGTATT
ATGCAAAAGCCCAGTTGTCTTTAAACGATTTTTTAGACTCGATGTACATCTCTAACGCGGTGAAACGTCC
GATTTACAGAACTCTGGCAGTGGTGAACGATATTCGAAAAGCATGTGGGACGGCCCCTAAACGCATTTT
CATCGAAATGGCTCGTGATGGTGAATCAAAAAAAAAGAGAAGTGTTACACGTCGCGAGCAGATCAAAA
ACCTGTACCGCTCGATTCGTAAAGATTTCCAGCAGGAAGTTGATTTTCTGGAAAAGATCCTGGAAAATA
AATCTGATGGTCAACTTCAGTCAGATGCTTTGTATCTTTACTTTGCACAATTAGGGCGCGATATGTACAC
GGGCGATCCAATAAAGCTGGAGCACATCAAAGATCAGAGTTTCTATAACATAGACCATATTTACCCGCA
GTCTATGGTGAAAGACGATTCCCTAGATAACAAAGTGCTGGTGCAAAGCGAAATTAACGGCGAGAAAA
GCTCGCGATACCCTTTGGACGCCGCGATCCGCAATAAAATGAAGCCCCTTTGGGACGCTTACTATAATCA
TGGCCTGATCTCCTTAAAGAAATACCAGCGTCTAACGCGCTCGACCCCGTTTACCGATGATGAAAAATG
GGACTTTATTAATCGCCAGTTAGTGGAAACCCGTCAATCTACCAAAGCGCTGGCCATTTTGTTGAAGCGT
AAGTTTCCAGACACCGAAATTGTGTATTCGAAGGCGGGGTTATCGTCCGACTTCAGACATGAATTCGGC
CTTGTAAAAAGTCGCAATATTAATGATTTGCACCACGCTAAAGACGCATTCTTGGCTATCGTTACCGGCA
ATGTGTACCATGAAAGATTCAATCGCAGATGGTTTATGGTGAACCAGCCGTACTCAGTTAAAACTAAAA
CTCTTTTTACCCACAGCATAAAGAATGGCAACTTCGTTGCCTGGAACGGCGAAGAAGATCTCGGTCGTAT
TGTAAAAATGCTGAAGCAAAACAAAAATACCATTCACTTCACGCGCTTCTCCTTCGATCGCAAAGAAGG
ATTATTTGATATCCAACCTCTGAAAGCCAGCACCGGCTTAGTCCCACGAAAAGCCGGTCTGGATGTCGTT
AAATACGGCGGATATGACAAATCTACCGCGGCCTATTACCTGCTGGTGAGGTTCACGCTCGAGGACAAG
AAAACCCAGCACAAGCTGATGATGATTCCTGTAGAAGGCCTGTACAAGGCTCGCATTGATCATGACAAG
GAATTTCTTACCGATTATGCGCAAACGACTATAAGCGAAATCCTACAGAAAGATAAACAGAAAGTGATC
AATATTATGTTTCCAATGGGTACGAGGCATATAAAACTCAATTCAATGATTAGTATCGATGGCTTCTATC
TTAGTATCGGCGGAAAGTCCTCTAAAGGTAAGTCAGTTCTATGTCACGCAATGGTTCCACTGATCGTCCC
TCACAAAATCGAATGTTACATTAAAGCAATGGAAAGCTTCGCCCGGAAGTTTAAAGAAAACAACAAGCT
GCGCATCGTAGAAAAATTCGATAAAATCACCGTTGAAGACAACCTGAATCTCTACGAGCTCTTTCTCCA
AAAACTGCAGCATAATCCCTATAATAAGTTTTTTTCGACACAGTTTGACGTACTGACGAACGGCCGTTCT
ACTTTCACAAAACTGTCGCCGGAGGAACAGGTACAGACGCTCTTGAACATTTTAAGTATCTTTAAAACAT
GCCGCAGTTCGGGTTGCGACCTGAAATCCATCAACGGCAGTGCCCAGGCAGCGCGCATCATGATTAGCG
CTGACTTAACTGGACTGTCGAAAAAATATTCAGATATTAGGTTGGTTGAACAGTCAGCTTCTGGTTTGTT
CGTATCCAAAAGTCAGAACTTACTGGAGTATCTCTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTT
TTATCTGAAATTTATTATATCGCGTTGATTATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTG
TTATTAATTGAATGAATTTTATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACT
CAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTA
TTTCC
SEQ ID NO: 62
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgtc
actgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaacaa
agcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattga
ttatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctacctgac
gctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatctcgtgtga
gataggcggagatacgaactttaagAAGGAGatatacCATGTCATCGCTCACGA
AATTCACTAACAAATACTCTAAACAGCTCACCATTAAGAATGAACTCATCCCAGTTGGCAAAACACTGG
AGAACATCAAAGAGAATGGTCTGATAGATGGCGACGAACAGCTGAATGAGAATTATCAGAAGGCGAAA
ATTATTGTGGATGATTTTCTGCGGGACTTCATTAATAAAGCACTGAATAATACGCAGATCGGGAACTGGC
GCGAACTGGCGGATGCCCTTAATAAAGAGGATGAAGATAACATCGAGAAATTGCAGGATAAAATTCGG
GGAATCATTGTATCCAAATTTGAAACGTTTGATCTGTTTAGCAGCTATTCTATTAAGAAAGATGAAAAGA
TTATTGACGACGACAATGATGTTGAAGAAGAGGAACTGGATCTGGGCAAGAAGACCAGCTCATTTAAAT
ACATATTTAAAAAAAACCTGTTTAAGTTAGTGTTGCCATCCTACCTGAAAACCACAAACCAGGACAAGC
TGAAGATTATTAGCTCGTTTGATAATTTTTCAACGTACTTCCGCGGGTTCTTTGAAAACCGGAAAAACAT
TTTTACCAAGAAACCGATCTCCACAAGTATTGCGTATCGCATTGTTCATGATAACTTCCCGAAATTCCTT
GATAACATTCGTTGTTTTAATGTGTGGCAGACGGAATGCCCGCAACTAATCGTGAAAGCAGATAACTAT
CTGAAAAGCAAAAATGTTATAGCGAAAGATAAAAGTTTGGCAAACTATTTTACCGTGGGCGCGTATGAC
TATTTCCTGTCTCAGAATGGTATAGATTTTTACAACAATATTATAGGTGGACTGCCAGCGTTCGCCGGCC
ATGAGAAAATCCAAGGTCTCAATGAATTCATCAATCAAGAGTGCCAAAAAGACAGCGAGCTGAAAAGT
AAGCTGAAAAACCGTCACGCGTTCAAAATGGCGGTACTGTTCAAACAGATACTCAGCGATCGTGAAAAA
AGTTTTGTAATTGATGAGTTCGAGTCGGATGCTCAAGTTATTGACGCCGTTAAAAACTTTTACGCCGAAC
AGTGCAAAGATAACAATGTTATTTTTAACTTATTAAATCTTATCAAGAATATCGCTTTCTTAAGTGATGA
CGAACTGGACGGCATATTCATTGAAGGGAAATACCTGTCGAGCGTTAGTCAAAAACTCTATAGCGATTG
GTCAAAATTACGTAACGACATTGAGGATTCGGCTAACTCTAAACAAGGCAATAAAGAGCTGGCCAAGA
AGATCAAAACCAACAAAGGGGATGTAGAAAAAGCGATCTCGAAATATGAGTTCTCGCTGTCGGAACTG
AACTCGATTGTACATGATAACACCAAGTTTTCTGACCTCCTTAGTTGTACACTGCATAAGGTGGCTTCTG
AGAAACTGGTGAAGGTCAATGAAGGCGACTGGCCGAAACATCTCAAGAATAATGAAGAGAAACAAAAA
ATCAAAGAGCCGCTTGATGCTCTGCTGGAGATCTATAATACACTTCTGATTTTTAACTGCAAAAGCTTCA
ATAAAAACGGCAACTTCTATGTCGACTATGATCGTTGCATCAATGAACTGAGTTCGGTCGTGTATCTGTA
TAATAAAACACGTAACTATTGCACTAAAAAACCCTATAACACGGACAAGTTCAAACTCAATTTTAACAG
TCCGCAGCTCGGTGAAGGCTTTTCCAAGTCGAAAGAAAATGACTGTCTGACTCTTTTGTTTAAAAAAGAC
GACAACTATTATGTAGGCATTATCCGCAAAGGTGCAAAAATCAATTTTGATGATACACAAGCAATCGCC
GATAACACCGACAATTGCATCTTTAAAATGAATTATTTCCTACTTAAAGACGCAAAAAAATTTATCCCGA
AATGTAGCATTCAGCTGAAAGAAGTCAAGGCCCATTTTAAGAAATCTGAAGATGATTACATTTTGTCTG
ATAAAGAGAAATTTGCTAGCCCGCTGGTCATTAAAAAGAGCACATTTTTGCTGGCAACTGCACATGTGA
AAGGGAAAAAAGGCAATATCAAGAAATTTCAGAAAGAATATTCGAAAGAAAACCCCACTGAGTATCGC
AATTCTTTAAACGAATGGATTGCTTTTTGTAAAGAGTTCTTAAAAACTTATAAAGCGGCTACCATTTTTG
ATATAACCACATTGAAAAAGGCAGAGGAATATGCTGATATTGTAGAATTCTACAAGGATGTCGATAATC
TGTGCTACAAACTGGAGTTCTGCCCGATTAAAACCTCGTTTATAGAAAACCTGATAGATAACGGCGACC
TGTATCTGTTTCGCATCAATAACAAAGACTTCAGCAGTAAATCGACCGGCACCAAGAACCTTCATACGTT
ATATTTACAAGCTATATTCGATGAACGTAATCTGAACAATCCGACAATTATGCTGAATGGGGGAGCAGA
ACTGTTCTATCGTAAAGAAAGTATTGAGCAGAAAAACCGTATCACACACAAAGCCGGTTCAATTCTCGT
GAATAAGGTGTGTAAAGACGGTACAAGCCTGGATGATAAGATACGTAATGAAATTTATCAATATGAGAA
TAAATTTATTGATACCCTGTCTGATGAAGCTAAAAAGGTGTTACCGAATGTCATTAAAAAGGAAGCTAC
CCATGACATTACAAAAGATAAACGTTTCACTAGTGACAAATTCTTCTTTCACTGCCCCCTGACAATTAAT
TATAAGGAAGGCGATACCAAGCAGTTCAATAACGAAGTGCTGAGTTTTCTGCGTGGAAATCCTGACATC
AACATTATCGGCATTGACCGCGGAGAGCGTAATTTAATCTATGTAACGGTTATAAACCAGAAAGGCGAG
ATTCTGGATTCGGTTTCATTCAATACCGTGACCAACAAGAGTTCAAAAATCGAGCAGACAGTCGATTAT
GAAGAGAAATTGGCAGTCCGCGAGAAAGAGAGGATTGAAGCAAAACGTTCCTGGGACTCTATCTCAAA
AATTGCGACACTAAAGGAAGGTTATCTGAGCGCAATAGTTCACGAGATCTGTCTGTTAATGATTAAACA
CAACGCGATCGTTGTCTTAGAGAATCTTAATGCAGGCTTTAAGCGTATTCGTGGCGGTTTATCAGAAAAA
AGTGTTTATCAAAAATTCGAAAAAATGTTGATTAACAAACTGAACTATTTTGTCAGCAAGAAGGAATCC
GACTGGAATAAACCGTCTGGTCTGCTGAATGGACTGCAGCTTTCGGATCAGTTTGAAAGCTTCGAAAAA
CTGGGTATTCAGTCTGGTTTTATTTTTTACGTGCCGGCTGCATATACCTCAAAGATTGATCCGACCACGG
GCTTCGCCAATGTTCTGAATCTGTCGAAGGTACGCAATGTTGATGCGATCAAAAGCTTTTTTTCTAACTT
CAACGAAATTAGTTATAGCAAGAAAGAAGCCCTTTTCAAATTCTCATTCGATCTGGATTCACTGAGTAAG
AAAGGCTTTAGTAGCTTTGTGAAATTTAGTAAGAGTAAATGGAACGTCTACACCTTTGGAGAACGTATC
ATAAAGCCAAAGAATAAGCAAGGTTATCGGGAGGACAAAAGAATCAACTTGACCTTCGAGATGAAGAA
GTTACTTAACGAGTATAAGGTTTCTTTTGATCTTGAAAATAACTTGATTCCGAATCTCACGAGTGCCAAC
CTGAAGGATACTTTTTGGAAAGAGCTATTCTTTATCTTCAAGACTACGCTGCAGCTCCGTAACAGCGTTA
CTAACGGTAAAGAAGATGTGCTCATCTCTCCGGTCAAAAATGCGAAGGGTGAATTCTTCGTTTCGGGAA
CGCATAACAAGACTCTTCCGCAAGATTGCGATGCGAACGGTGCATACCATATTGCGTTGAAAGGTCTGA
TGATACTCGAACGTAACAACCTTGTACGTGAGGAGAAAGATACGAAAAAGATTATGGCGATTTCAAACG
TGGATTGGTTCGAGTACGTGCAGAAACGTAGAGGCGTTCTGTAAGAAATCATCCTTAGCGAAAGCTAAG
GATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGATGCTGTTTTTAGTTTTAACGGCAATTAAT
ATATGTGTTATTAATTGAATGAATTTTATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAAT
ATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGG
GATGTTATTTCC
SEQ ID NO: 63
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATAACAACTACGACGAATTCACCAAACTGTACCCGATCCAGAAAACCATCCGTTTCGAACTGAAA
CCGCAGGGTCGTACCATGGAACACCTGGAAACCTTCAACTTCTTCGAAGAAGACCGTGACCGTGCGGAA
AAATACAAAATCCTGAAAGAAGCGATCGACGAATACCACAAAAAATTCATCGACGAACACCTGACCAA
CATGTCTCTGGACTGGAACTCTCTGAAACAGATCTCTGAAAAATACTACAAATCTCGTGAAGAAAAAGA
CAAAAAAGTTTTCCTGTCTGAACAGAAACGTATGCGTCAGGAAATCGTTTCTGAATTCAAAAAAGACGA
CCGTTTCAAAGACCTGTTCTCTAAAAAACTGTTCTCTGAACTGCTGAAAGAAGAAATCTACAAAAAAGG
TAACCACCAGGAAATCGACGCGCTGAAATCTTTCGACAAATTCTCTGGTTACTTCATCGGTCTGCACGAA
AACCGTAAAAACATGTACTCTGACGGTGACGAAATCACCGCGATCTCTAACCGTATCGTTAACGAAAAC
TTCCCGAAATTCCTGGACAACCTGCAGAAATACCAGGAAGCGCGTAAAAAATACCCGGAATGGATCATC
AAAGCGGAATCTGCGCTGGTTGCGCACAACATCAAAATGGACGAAGTTTTCTCTCTGGAATACTTCAAC
AAAGTTCTGAACCAGGAAGGTATCCAGCGTTACAACCTGGCGCTGGGTGGTTACGTTACCAAATCTGGT
GAAAAAATGATGGGTCTGAACGACGCGCTGAACCTGGCGCACCAGTCTGAAAAATCTTCTAAAGGTCGT
ATCCACATGACCCCGCTGTTCAAACAGATCCTGTCTGAAAAAGAATCTTTCTCTTACATCCCGGACGTTT
TCACCGAAGACTCTCAGCTGCTGCCGTCTATCGGTGGTTTCTTCGCGCAGATCGAAAACGACAAAGACG
GTAACATCTTCGACCGTGCGCTGGAACTGATCTCTTCTTACGCGGAATACGACACCGAACGTATCTACAT
CCGTCAGGCGGACATCAACCGTGTTTCTAACGTTATCTTCGGTGAATGGGGTACCCTGGGTGGTCTGATG
CGTGAATACAAAGCGGACTCTATCAACGACATCAACCTGGAACGTACCTGCAAAAAAGTTGACAAATGG
CTGGACTCTAAAGAATTCGCGCTGTCTGACGTTCTGGAAGCGATCAAACGTACCGGTAACAACGACGCG
TTCAACGAATACATCTCTAAAATGCGTACCGCGCGTGAAAAAATCGACGCGGCGCGTAAAGAAATGAA
ATTCATCTCTGAAAAAATCTCTGGTGACGAAGAATCTATCCACATCATCAAAACCCTGCTGGACTCTGTT
CAGCAGTTCCTGCACTTCTTCAACCTGTTCAAAGCGCGTCAGGACATCCCGCTGGACGGTGCGTTCTACG
CGGAATTCGACGAAGTTCACTCTAAACTGTTCGCGATCGTTCCGCTGTACAACAAAGTTCGTAACTACCT
GACCAAAAACAACCTGAACACCAAAAAAATCAAACTGAACTTCAAAAACCCGACCCTGGCGAACGGTT
GGGACCAGAACAAAGTTTACGACTACGCGTCTCTGATCTTCCTGCGTGACGGTAACTACTACCTGGGTAT
CATCAACCCGAAACGTAAAAAAAACATCAAATTCGAACAGGGTTCTGGTAACGGTCCGTTCTACCGTAA
AATGGTTTACAAACAGATCCCGGGTCCGAACAAAAACCTGCCGCGTGTTTTCCTGACCTCTACCAAAGG
TAAAAAAGAATACAAACCGTCTAAAGAAATCATCGAAGGTTACGAAGCGGACAAACACATCCGTGGTG
ACAAATTCGACCTGGACTTCTGCCACAAACTGATCGACTTCTTCAAAGAATCTATCGAAAAACACAAAG
ACTGGTCTAAATTCAACTTCTACTTCTCTCCGACCGAATCTTACGGTGACATCTCTGAATTCTACCTGGAC
GTTGAAAAACAGGGTTACCGTATGCACTTCGAAAACATCTCTGCGGAAACCATCGACGAATACGTTGAA
AAAGGTGACCTGTTCCTGTTCCAGATCTACAACAAAGACTTCGTTAAAGCGGCGACCGGTAAAAAAGAC
ATGCACACCATCTACTGGAACGCGGCGTTCTCTCCGGAAAACCTGCAGGACGTTGTTGTTAAACTGAAC
GGTGAAGCGGAACTGTTCTACCGTGACAAATCTGACATCAAAGAAATCGTTCACCGTGAAGGTGAAATC
CTGGTTAACCGTACCTACAACGGTCGTACCCCGGTTCCGGACAAAATCCACAAAAAACTGACCGACTAC
CACAACGGTCGTACCAAAGACCTGGGTGAAGCGAAAGAATACCTGGACAAAGTTCGTTACTTCAAAGCG
CACTACGACATCACCAAAGACCGTCGTTACCTGAACGACAAAATCTACTTCCACGTTCCGCTGACCCTGA
ACTTCAAAGCGAACGGTAAAAAAAACCTGAACAAAATGGTTATCGAAAAATTCCTGTCTGACGAAAAA
GCGCACATCATCGGTATCGACCGTGGTGAACGTAACCTGCTGTACTACTCTATCATCGACCGTTCTGGTA
AAATCATCGACCAGCAGTCTCTGAACGTTATCGACGGTTTCGACTACCGTGAAAAACTGAACCAGCGTG
AAATCGAAATGAAAGACGCGCGTCAGTCTTGGAACGCGATCGGTAAAATCAAAGACCTGAAAGAAGGT
TACCTGTCTAAAGCGGTTCACGAAATCACCAAAATGGCGATCCAGTACAACGCGATCGTTGTTATGGAA
GAACTGAACTACGGTTTCAAACGTGGTCGTTTCAAAGTTGAAAAACAGATCTACCAGAAATTCGAAAAC
ATGCTGATCGACAAAATGAACTACCTGGTTTTCAAAGACGCGCCGGACGAATCTCCGGGTGGTGTTCTG
AACGCGTACCAGCTGACCAACCCGCTGGAATCTTTCGCGAAACTGGGTAAACAGACCGGTATCCTGTTC
TACGTTCCGGCGGCGTACACCTCTAAAATCGACCCGACCACCGGTTTCGTTAACCTGTTCAACACCTCTT
CTAAAACCAACGCGCAGGAACGTAAAGAATTCCTGCAGAAATTCGAATCTATCTCTTACTCTGCGAAAG
ACGGTGGTATCTTCGCGTTCGCGTTCGACTACCGTAAATTCGGTACCTCTAAAACCGACCACAAAAACGT
TTGGACCGCGTACACCAACGGTGAACGTATGCGTTACATCAAAGAAAAAAAACGTAACGAACTGTTCGA
CCCGTCTAAAGAAATCAAAGAAGCGCTGACCTCTTCTGGTATCAAATACGACGGTGGTCAGAACATCCT
GCCGGACATCCTGCGTTCTAACAACAACGGTCTGATCTACACCATGTACTCTTCTTTCATCGCGGCGATC
CAGATGCGTGTTTACGACGGTAAAGAAGACTACATCATCTCTCCGATCAAAAACTCTAAAGGTGAATTC
TTCCGTACCGACCCGAAACGTCGTGAACTGCCGATCGACGCGGACGCGAACGGTGCGTACAACATCGCG
CTGCGTGGTGAACTGACCATGCGTGCGATCGCGGAAAAATTCGACCCGGACTCTGAAAAAATGGCGAAA
CTGGAACTGAAACACAAAGACTGGTTCGAATTCATGCAGACCCGTGGTGACTAAGAAATCATCCTTAGC
GAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTAT
TACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 64
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaac
aaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacat
tgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctacc
tgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatctcg
tgtgagataggcggagatacgaactttaagAAGGAGatataccATGACTAAAACATTTG
ATTCAGAGTTTTTTAATTTGTACTCGCTGCAAAAAACGGTACGCTTTGAGTTAAAACCCGTGGGAGAAAC
CGCGTCATTTGTGGAAGACTTTAAAAACGAGGGCTTGAAACGTGTTGTGAGCGAAGATGAAAGGCGAGC
CGTCGATTACCAGAAAGTTAAGGAAATAATTGACGATTACCATCGGGATTTCATTGAAGAAAGTTTAAA
TTATTTTCCGGAACAGGTGAGTAAAGATGCTCTTGAGCAGGCGTTTCATCTTTATCAGAAACTGAAGGCA
GCAAAAGTTGAGGAAAGGGAAAAAGCGCTGAAAGAATGGGAAGCGCTGCAGAAAAAGCTACGTGAAA
AAGTGGTGAAATGCTTCTCGGACTCGAATAAAGCCCGCTTCTCAAGGATTGATAAAAAGGAACTGATTA
AGGAAGACCTGATAAATTGGTTGGTCGCCCAGAATCGCGAGGATGATATCCCTACGGTCGAAACGTTTA
ACAACTTCACCACATATTTTACCGGCTTCCATGAGAATCGTAAAAATATTTACTCCAAAGATGATCACGC
CACCGCTATTAGCTTTCGCCTTATTCATGAAAATCTTCCAAAGTTTTTTGACAACGTGATTAGCTTCAATA
AGTTGAAAGAGGGTTTCCCTGAATTAAAATTTGATAAAGTGAAAGAGGATTTAGAAGTAGATTATGATC
TGAAGCATGCGTTTGAAATAGAATATTTCGTTAACTTCGTGACCCAAGCGGGCATAGATCAGTATAATTA
TCTGTTAGGAGGGAAAACCCTGGAGGACGGGACGAAAAAACAAGGGATGAATGAGCAAATTAATCTGT
TCAAACAACAGCAAACGCGAGATAAAGCGCGTCAGATTCCCAAACTGATCCCCCTGTTCAAACAGATTC
TTAGCGAAAGGACTGAAAGCCAGTCCTTTATTCCTAAACAATTTGAAAGTGATCAGGAGTTGTTCGATTC
ACTGCAGAAGTTACATAATAACTGCCAGGATAAATTCACCGTGCTGCAACAAGCCATTCTCGGTCTGGC
AGAGGCGGATCTTAAGAAGGTCTTCATCAAAACCTCTGATTTAAATGCCTTATCTAACACCATTTTCGGG
AATTACAGCGTCTTTTCCGATGCACTGAACCTGTATAAAGAAAGCCTGAAAACGAAAAAAGCGCAGGAG
GCTTTTGAGAAACTACCGGCCCATTCTATTCACGACCTCATTCAATACTTGGAACAGTTCAATTCCAGCC
TGGACGCGGAAAAACAACAGAGCACCGACACCGTCCTGAACTACTTCATCAAGACCGATGAATTATATT
CTCGCTTCATTAAATCCACTAGCGAGGCTTTCACTCAGGTGCAGCCTTTGTTCGAACTGGAAGCCCTGTC
ATCTAAGCGCCGCCCACCGGAATCGGAAGATGAAGGGGCAAAAGGGCAGGAAGGCTTCGAGCAGATCA
AGCGTATTAAAGCTTACCTGGATACGCTTATGGAAGCGGTACACTTTGCAAAGCCGTTGTATCTTGTTAA
GGGTCGTAAAATGATCGAAGGGCTCGATAAAGACCAGTCCTTTTATGAAGCGTTTGAAATGGCGTACCA
AGAACTTGAATCGTTAATCATTCCTATCTATAACAAAGCGCGGAGCTATCTGTCGCGGAAACCTTTCAAG
GCCGATAAATTCAAGATTAATTTTGACAACAACACGCTACTGAGCGGATGGGATGCGAACAAGGAAACT
GCTAACGCGTCCATTCTGTTTAAGAAAGACGGGTTATATTACCTTGGAATTATGCCGAAAGGTAAGACCT
TTCTCTTTGACTACTTTGTATCGAGCGAGGATTCAGAGAAACTGAAACAGCGTCGCCAGAAGACCGCCG
AAGAAGCTCTGGCGCAGGATGGTGAAAGTTACTTCGAAAAAATTCGTTATAAACTGTTACCAGGGGCTT
CAAAGATGTTACCGAAAGTCTTTTTTAGCAACAAAAATATTGGCTTTTACAACCCGTCGGATGACATTTT
ACGCATTCGCAACACAGCCTCTCACACCAAAAACGGGACCCCTCAGAAAGGCCACTCAAAAGTTGAGTT
TAACCTGAATGATTGTCATAAGATGATTGATTTCTTCAAATCATCAATTCAGAAACACCCGGAATGGGG
GTCTTTTGGCTTTACGTTTTCTGATACCAGTGATTTTGAAGACATGAGTGCCTTCTACCGGGAAGTAGAA
AACCAGGGTTACGTAATTAGCTTTGACAAAATCAAAGAGACCTATATACAGAGCCAGGTGGAACAGGGT
AATCTCTACTTATTCCAGATTTATAACAAGGATTTCTCGCCCTACAGCAAAGGCAAACCAAACCTGCATA
CTCTGTACTGGAAAGCCCTGTTTGAAGAAGCGAACCTGAATAACGTAGTGGCGAAGTTGAACGGTGAAG
CGGAAATCTTCTTCCGTCGTCACTCCATTAAGGCCTCTGATAAAGTTGTCCATCCGGCAAATCAGGCCAT
TGATAATAAGAATCCACACACGGAAAAAACGCAGTCAACCTTTGAATATGACCTCGTTAAAGACAAACG
CTACACGCAAGATAAGTTCTTTTTCCACGTCCCAATCAGCCTCAACTTTAAAGCACAAGGGGTTTCAAAG
TTTAATGATAAAGTCAATGGGTTCCTCAAGGGCAACCCGGATGTCAACATTATAGGTATAGACAGGGGC
GAACGCCATCTGCTTTACTTTACCGTAGTGAATCAGAAAGGTGAAATACTGGTTCAGGAATCATTAAAT
ACCTTGATGTCGGACAAAGGGCACGTTAATGATTACCAGCAGAAACTGGATAAAAAAGAACAGGAACG
TGATGCTGCGCGTAAATCGTGGACCACGGTTGAGAACATTAAAGAGCTGAAAGAGGGGTATCTAAGCCA
TGTGGTACACAAACTGGCGCACCTCATCATTAAATATAACGCAATAGTCTGCCTAGAAGACTTGAATTTT
GGCTTTAAACGCGGCCGCTTCAAAGTGGAAAAACAAGTTTATCAAAAATTTGAAAAGGCGCTTATAGAT
AAACTGAATTATCTGGTTTTTAAAGAAAAGGAACTTGGTGAGGTAGGGCACTACTTGACAGCTTATCAA
CTGACGGCCCCGTTCGAATCATTCAAAAAACTGGGCAAACAGTCTGGCATTCTGTTTTACGTGCCGGCAG
ATTATACTTCAAAAATCGATCCAACAACTGGCTTTGTGAACTTCCTGGACCTGAGATATCAGTCTGTAGA
AAAAGCTAAACAACTTCTTAGCGATTTTAATGCCATTCGTTTTAACAGCGTTCAGAATTACTTTGAATTC
GAAATTGACTATAAAAAACTTACTCCGAAACGTAAAGTCGGAACCCAAAGTAAATGGGTAATTTGTACG
TATGGCGATGTCAGGTATCAGAACCGTCGGAATCAAAAAGGTCATTGGGAGACCGAAGAAGTGAACGT
GACCGAAAAGCTGAAGGCTCTGTTCGCCAGCGATTCAAAAACTACAACTGTGATCGATTACGCAAATGA
TGATAACCTGATAGATGTGATTTTAGAGCAGGATAAAGCCAGCTTTTTTAAAGAACTGTTGTGGCTCCTG
AAACTTACGATGACCTTACGACATTCCAAGATCAAATCGGAAGATGATTTTATTCTGTCACCGGTCAAGA
ATGAGCAGGGTGAATTCTATGATAGTAGGAAAGCCGGCGAAGTGTGGCCGAAAGACGCCGACGCCAAT
GGCGCCTATCATATCGCGCTCAAAGGGCTTTGGAATTTGCAGCAGATTAACCAGTGGGAAAAAGGTAAA
ACCCTGAATCTGGCTATCAAAAACCAGGATTGGTTTAGCTTTATCCAAGAGAAACCGTATCAGGAATGA
GAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGATGC
TGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCATTCATAATAAGTAT
GTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACAG
AATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
SEQ ID NO: 65
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATCATACAGGCGGTCTTCTTAGTATGGACGCGAAAGAGTTCACAGGTCAGTATCCGTTGTCGAAA
ACATTACGATTCGAACTTCGGCCCATCGGCCGCACGTGGGATAACCTGGAGGCCTCAGGCTACTTAGCG
GAAGACCGCCATCGTGCCGAATGTTATCCTCGTGCGAAAGAGTTATTGGATGACAACCATCGTGCCTTCC
TGAATCGTGTGTTGCCACAAATCGATATGGATTGGCACCCGATTGCGGAGGCCTTTTGTAAGGTACATAA
AAACCCTGGTAATAAAGAACTTGCCCAGGATTACAACCTTCAGTTGTCAAAGCGCCGTAAGGAGATCAG
CGCATATCTTCAGGATGCAGATGGCTATAAAGGCCTGTTCGCGAAGCCCGCCTTAGACGAAGCTATGAA
AATTGCGAAAGAAAACGGGAACGAAAGTGATATTGAGGTTCTCGAAGCGTTTAACGGTTTTAGCGTATA
CTTCACCGGTTATCATGAGTCACGCGAGAACATTTATAGCGATGAGGATATGGTGAGCGTAGCCTACCG
AATTACTGAGGATAATTTCCCGCGCTTTGTCTCAAACGCTTTGATCTTTGATAAATTAAACGAAAGCCAT
CCGGATATTATCTCTGAAGTATCGGGCAATCTTGGAGTTGATGACATTGGTAAGTACTTTGACGTGTCGA
ACTATAACAATTTTCTTTCCCAGGCCGGTATAGATGACTACAATCACATTATTGGCGGCCATACAACCGA
AGACGGACTGATACAAGCGTTTAATGTCGTATTGAACTTACGTCACCAAAAAGACCCTGGCTTTGAAAA
AATTCAGTTCAAACAGCTCTACAAACAAATCCTGAGCGTGCGTACCAGCAAAAGCTACATCCCGAAACA
GTTTGACAACTCTAAGGAGATGGTTGACTGCATTTGCGATTATGTCAGCAAAATAGAGAAATCCGAAAC
AGTAGAACGGGCCCTGAAACTAGTCCGTAATATCAGTTCTTTCGACTTGCGCGGGATCTTTGTCAATAAA
AAGAACTTGCGCATACTGAGCAACAAACTGATAGGAGATTGGGACGCGATCGAAACCGCATTGATGCAT
AGTTCTTCATCAGAAAACGATAAGAAAAGCGTATATGATAGCGCGGAGGCTTTTACGTTGGATGACATC
TTTTCAAGCGTGAAAAAATTTTCTGATGCCTCTGCCGAAGATATTGGCAACAGGGCGGAAGACATCTGT
AGAGTGATAAGTGAGACGGCCCCTTTTATCAACGATCTGCGAGCGGTGGACCTGGATAGCCTGAACGAC
GATGGTTATGAAGCGGCCGTCTCAAAAATTCGGGAGTCGCTGGAGCCTTATATGGATCTTTTCCATGAAC
TGGAAATTTTCTCGGTTGGCGATGAGTTCCCAAAATGCGCAGCATTTTACAGCGAACTGGAGGAAGTCA
GCGAACAGCTGATCGAAATTATTCCGTTATTCAACAAGGCGCGTTCGTTCTGCACCCGGAAACGCTATA
GCACCGATAAGATTAAAGTGAACTTAAAATTCCCGACCTTGGCGGACGGGTGGGACCTGAACAAAGAG
AGAGACAACAAAGCCGCGATTCTGCGGAAAGACGGTAAGTATTATCTGGCAATTCTGGATATGAAGAA
AGATCTGTCAAGCATTAGGACCAGCGACGAAGATGAATCCAGCTTCGAAAAGATGGAGTATAAACTGTT
ACCGAGTCCAGTAAAAATGCTGCCAAAGATATTCGTAAAATCGAAAGCCGCTAAGGAAAAATATGGCCT
GACAGATCGTATGCTTGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCGTTTGATCTTGGCTTT
TGCCATGAACTCATTGATTATTACAAGCGTTGTATCGCGGAGTACCCAGGCTGGGATGTGTTCGATTTCA
AGTTTCGCGAAACTTCCGATTATGGGTCCATGAAAGAGTTCAATGAAGATGTGGCCGGAGCCGGTTACT
ATATGAGTCTGAGAAAAATTCCGTGCAGCGAAGTGTACCGTCTGTTAGACGAGAAATCGATTTATCTATT
TCAAATTTATAACAAAGATTACTCTGAAAATGCACATGGTAATAAGAACATGCATACCATGTACTGGGA
GGGTCTCTTTTCCCCGCAAAACCTGGAGTCGCCCGTTTTCAAGTTGTCGGGTGGGGCAGAACTTTTCTTT
CGAAAATCCTCAATCCCTAACGATGCCAAAACAGTACACCCGAAAGGCTCAGTGCTGGTTCCACGTAAT
GATGTTAACGGTCGGCGTATTCCAGATTCAATCTACCGCGAACTGACACGCTATTTTAACCGTGGCGATT
GCCGAATCAGTGACGAAGCCAAAAGTTATCTTGACAAGGTTAAGACTAAAAAAGCGGACCATGACATT
GTGAAAGATCGCCGCTTTACCGTGGATAAAATGATGTTCCACGTCCCGATTGCGATGAACTTTAAGGCG
ATCAGTAAACCGAACTTAAACAAAAAAGTCATTGATGGCATCATTGATGATCAGGATCTGAAAATCATT
GGTATTGATCGTGGCGAGCGGAACTTAATTTACGTCACGATGGTTGACAGAAAAGGGAATATCTTATAT
CAGGATTCTCTTAACATCCTCAATGGCTACGACTATCGTAAAGCTCTGGATGTGCGCGAATATGACAACA
AGGAAGCGCGTCGTAACTGGACTAAAGTGGAGGGCATTCGCAAAATGAAGGAAGGCTATCTGTCATTA
GCGGTCTCGAAATTAGCGGATATGATTATCGAAAATAACGCCATCATCGTTATGGAGGACCTGAACCAC
GGATTCAAAGCGGGCCGCTCAAAGATTGAAAAACAAGTTTATCAGAAATTTGAGAGTATGCTGATTAAC
AAACTGGGCTATATGGTGTTAAAAGACAAGTCAATTGACCAATCAGGTGGCGCGCTGCATGGATACCAG
CTGGCGAACCATGTTACCACCTTAGCATCAGTTGGAAAGCAGTGTGGGGTTATCTTTTATATACCGGCAG
CGTTCACTAGTAAAATAGATCCGACCACTGGTTTCGCCGATCTCTTTGCCCTGAGTAACGTTAAAAACGT
AGCGAGCATGCGTGAATTCTTTTCCAAAATGAAATCTGTCATTTATGATAAAGCTGAAGGCAAATTCGC
ATTCACCTTTGATTACTTGGATTACAACGTGAAGAGCGAATGTGGTCGTACGCTGTGGACCGTTTACACC
GTTGGTGAGCGCTTCACCTATTCCCGTGTGAACCGCGAATATGTACGTAAAGTCCCCACCGATATTATCT
ATGATGCCCTCCAGAAAGCAGGCATTAGCGTCGAAGGAGACTTAAGGGACAGAATTGCCGAAAGCGAT
GGCGATACGCTGAAGTCTATTTTTTACGCATTCAAATACGCGCTAGATATGCGCGTTGAGAATCGCGAG
GAAGACTACATTCAATCACCTGTGAAAAATGCCTCTGGGGAATTTTTTTGTTCAAAAAATGCTGGTAAAA
GCCTCCCACAAGATAGCGATGCAAACGGTGCATATAACATTGCCCTGAAAGGTATTCTTCAATTACGCA
TGCTGTCTGAGCAGTACGACCCCAACGCGGAATCTATTAGACTTCCGCTGATAACCAATAAAGCCTGGC
TGACATTCATGCAGTCTGGCATGAAGACCTGGAAAAATTAGGAAATCATCCTTAGCGAAAGCTAAGGAT
TTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCA
AAGAGGATTACA
SEQ ID NO: 66
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaac
aaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacat
tgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctacc
tgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatctcg
tgtgagataggcggagatacgaactttaagAAGGAGatataccatgGATAGTTTGAAAGA
TTTCACCAATCTGTACCCTGTCAGTAAGACATTGAGATTTGAATTAAAGCCCGTTGGAAAGACTTTAGAA
AATATCGAGAAAGCAGGTATTTTGAAAGAGGATGAGCATCGTGCAGAAAGTTATCGGAGGGTGAAGAA
AATAATTGATACTTATCATAAGGTATTTATCGATTCTTCTCTTGAAAATATGGCTAAAATGGGTATTGAG
AATGAAATAAAAGCAATGCTCCAAAGTTTCTGCGAATTGTATAAAAAAGATCATCGCACTGAGGGTGAA
GACAAGGCATTAGATAAAATTCGAGCAGTACTTCGTGGCCTGATTGTTGGGGCTTTCACTGGTGTTTGCG
GAAGACGGGAAAATACAGTCCAAAACGAGAAGTACGAGAGTTTGTTCAAAGAAAAGTTGATAAAAGAA
ATTTTACCTGATTTTGTGCTCTCTACTGAGGCTGAAAGCTTGCCTTTCTCTGTTGAAGAAGCTACGAGGTC
ACTGAAGGAGTTTGATAGCTTTACATCCTACTTTGCTGGTTTTTACGAGAATAGAAAGAATATATACTCG
ACGAAACCTCAATCCACTGCCATTGCTTATCGTCTTATTCATGAGAACTTGCCGAAGTTCATTGATAATA
TTCTTGTTTTTCAGAAGATCAAAGAGCCTATAGCCAAAGAGCTGGAACATATTCGTGCGGACTTTTCTGC
CGGGGGGTACATAAAAAAGGATGAGAGATTGGAGGATATTTTTTCGTTGAACTATTATATCCACGTGTT
ATCTCAGGCTGGGATCGAAAAATATAACGCATTGATTGGGAAGATTGTGACAGAAGGAGATGGAGAGA
TGAAAGGGCTCAATGAACACATCAACCTTTACAACCAACAAAGAGGCAGAGAGGATCGGCTCCCTCTTT
TTAGGCCTCTTTATAAACAGATATTGAGTGACAGAGAGCAATTATCATACTTGCCTGAGAGTTTTGAAAA
AGATGAGGAGCTCCTCAGGGCTCTAAAAGAGTTCTATGATCATATCGCAGAAGACATTCTCGGACGTAC
TCAACAGTTGATGACTTCTATTTCAGAATATGATTTATCTCGGATATACGTAAGGAACGATAGCCAATTG
ACTGATATATCAAAAAAAATGTTGGGAGATTGGAATGCTATCTACATGGCTAGAGAACGAGCATATGAC
CACGAGCAGGCTCCCAAAAGAATCACGGCGAAATACGAGAGGGACAGGATTAAAGCTCTTAAAGGAGA
AGAGAGTATAAGTCTGGCAAATCTTAATAGTTGTATTGCCTTTCTGGACAATGTTAGAGATTGCCGTGTA
GATACTTATCTTTCCACACTGGGCCAGAAGGAAGGACCACATGGTCTATCTAATCTCGTTGAGAACGTTT
TTGCCTCATACCATGAAGCAGAGCAATTGTTGAGCTTTCCATACCCCGAAGAGAATAATCTGATTCAGG
ACAAGGACAATGTGGTGTTAATTAAGAATCTTCTCGACAATATCAGTGATCTGCAGAGGTTCTTGAAAC
CTCTTTGGGGTATGGGAGACGAACCCGATAAAGATGAAAGATTTTATGGAGAGTATAATTATATCCGAG
GAGCTCTAGATCAGGTGATCCCTCTGTACAATAAGGTAAGGAACTACCTCACTCGGAAGCCTTATTCGA
CCAGAAAAGTAAAACTCAATTTTGGGAATTCTCAATTGCTTAGTGGTTGGGATAGAAATAAGGAAAAGG
ATAATAGCTGTGTGATTTTGCGTAAGGGGCAGAACTTCTATTTGGCTATTATGAACAATAGGCACAAAA
GAAGTTTCGAAAACAAGGTGTTGCCCGAGTATAAGGAGGGAGAACCTTACTTCGAAAAGATGGATTATA
AATTTTTGCCTGATCCTAATAAAATGCTTCCTAAGGTTTTTCTTTCGAAAAAAGGAATAGAGATATACAA
ACCAAGTCCGAAGCTTTTAGAACAATATGGACATGGAACTCACAAAAAGGGAGATACCTTTAGTATGGA
TGATTTGCACGAACTGATCGATTTCTTCAAACACTCAATCGAGGCTCATGAAGATTGGAAGCAATTCGG
ATTCAAATTTTCTGATACGGCTACTTATGAGAATGTATCTAGTTTCTATAGAGAAGTTGAGGATCAGGGG
TATAAGCTCTCTTTCCGAAAAGTTTCGGAATCTTATGTCTATTCATTAATAGATCAAGGCAAGTTGTATTT
ATTTCAGATATACAACAAGGACTTTTCTCCCTGCAGCAAAGGGACACCTAATCTGCATACCTTGTATTGG
AGAATGCTTTTTGACGAGCGCAATTTGGCAGATGTCATATACAAACTGGATGGGAAGGCTGAAATCTTT
TTCCGAGAGAAGAGTTTGAAAAATGATCATCCCACGCATCCGGCTGGTAAGCCTATCAAAAAGAAAAGT
CGACAAAAAAAAGGAGAGGAGAGTCTGTTTGAGTATGATTTAGTCAAGGATAGGCACTATACGATGGA
TAAGTTCCAGTTTCATGTGCCTATTACTATGAATTTTAAATGTTCTGCAGGAAGCAAAGTCAATGATATG
GTTAATGCTCATATTCGAGAGGCAAAGGATATGCATGTCATTGGAATTGATCGTGGAGAACGCAATCTG
CTGTATATATGCGTGATAGATAGTCGAGGGACGATTTTGGATCAAATTTCTCTGAATACGATTAACGATA
TAGACTATCATGATTTATTGGAGAGTCGAGACAAAGACCGTCAGCAGGAGCGCCGAAACTGGCAAACTA
TCGAAGGGATCAAGGAGCTAAAACAAGGCTACCTTAGTCAGGCGGTTCATCGGATAGCCGAACTGATGG
TGGCTTATAAGGCTGTAGTTGCTTTGGAGGATTTGAATATGGGGTTCAAACGTGGGCGGCAGAAAGTAG
AAAGTTCTGTTTATCAGCAGTTTGAGAAACAGCTGATAGATAAGCTCAACTATCTTGTGGACAAGAAGA
AAAGGCCTGAAGATATTGGAGGATTGTTGAGAGCCTATCAATTTACGGCCCCATTTAAGAGTTTTAAGG
AAATGGGAAAGCAAAACGGCTTCTTGTTTTATATCCCGGCTTGGAACACGAGCAACATAGATCCGACTA
CTGGATTTGTTAATTTATTTCATGCCCAGTATGAAAATGTAGATAAAGCGAAGAGCTTCTTTCAAAAGTT
TGATTCAATTAGTTACAACCCGAAGAAAGACTGGTTTGAGTTTGCATTCGATTATAAAAACTTTACTAAA
AAGGCTGAAGGAAGTCGTTCTATGTGGATATTATGCACACATGGTTCCCGAATAAAGAATTTTAGAAAT
TCCCAGAAGAATGGTCAATGGGATTCCGAAGAATTCGCCTTGACGGAGGCTTTTAAGTCTCTTTTTGTGC
GATATGAGATAGATTATACCGCTGATTTGAAAACAGCTATTGTGGACGAAAAGCAAAAAGACTTCTTCG
TGGATCTTCTGAAGCTATTCAAATTGACAGTACAGATGCGCAACAGCTGGAAAGAGAAGGATTTGGATT
ATCTAATCTCTCCTGTAGCAGGGGCTGATGGCCGTTTCTTCGATACAAGAGAGGGAAATAAAAGTCTGC
CTAAGGATGCAGATGCCAATGGAGCTTATAATATTGCCCTAAAAGGACTTTGGGCTCTACGCCAGATTC
GGCAAACTTCAGAAGGCGGTAAACTCAAATTGGCGATTTCCAATAAGGAATGGCTACAGTTTGTGCAAG
AGAGATCTTACGAGAAAGACtgaGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTAT
TATATCGCGTTGATTATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGA
ATTTTATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACT
CAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
SEQ ID NO: 67
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaac
aaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacat
tgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctacc
tgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatct
cgtgtgagataggcggagatacgaactttaagAAGGAGatataccATGAACAACGGCACAA
ATAATTTTCAGAACTTCATCGGGATCTCAAGTTTGCAGAAAACGCTGCGCAATGCTCTGATCCCCACGGA
AACCACGCAACAGTTCATCGTCAAGAACGGAATAATTAAAGAAGATGAGTTACGTGGCGAGAACCGCC
AGATTCTGAAAGATATCATGGATGACTACTACCGCGGATTCATCTCTGAGACTCTGAGTTCTATTGATGA
CATAGATTGGACTAGCCTGTTCGAAAAAATGGAAATTCAGCTGAAAAATGGTGATAATAAAGATACCTT
AATTAAGGAACAGACAGAGTATCGGAAAGCAATCCATAAAAAATTTGCGAACGACGATCGGTTTAAGA
ACATGTTTAGCGCCAAACTGATTAGTGACATATTACCTGAATTTGTCATCCACAACAATAATTATTCGGC
ATCAGAGAAAGAGGAAAAAACCCAGGTGATAAAATTGTTTTCGCGCTTTGCGACTAGCTTTAAAGATTA
CTTCAAGAACCGTGCAAATTGCTTTTCAGCGGACGATATTTCATCAAGCAGCTGCCATCGCATCGTCAAC
GACAATGCAGAGATATTCTTTTCAAATGCGCTGGTCTACCGCCGGATCGTAAAATCGCTGAGCAATGAC
GATATCAACAAAATTTCGGGCGATATGAAAGATTCATTAAAAGAAATGAGTCTGGAAGAAATATATTCT
TACGAGAAGTATGGGGAATTTATTACCCAGGAAGGCATTAGCTTCTATAATGATATCTGTGGGAAAGTG
AATTCTTTTATGAACCTGTATTGTCAGAAAAATAAAGAAAACAAAAATTTATACAAACTTCAGAAACTT
CACAAACAGATTCTATGCATTGCGGACACTAGCTATGAGGTCCCGTATAAATTTGAAAGTGACGAGGAA
GTGTACCAATCAGTTAACGGCTTCCTTGATAACATTAGCAGCAAACATATAGTCGAAAGATTACGCAAA
ATCGGCGATAACTATAACGGCTACAACCTGGATAAAATTTATATCGTGTCCAAATTTTACGAGAGCGTTA
GCCAAAAAACCTACCGCGACTGGGAAACAATTAATACCGCCCTCGAAATTCATTACAATAATATCTTGC
CGGGTAACGGTAAAAGTAAAGCCGACAAAGTAAAAAAAGCGGTTAAGAATGATTTACAGAAATCCATC
ACCGAAATAAATGAACTAGTGTCAAACTATAAGCTGTGCAGTGACGACAACATCAAAGCGGAGACTTAT
ATACATGAGATTAGCCATATCTTGAATAACTTTGAAGCACAGGAATTGAAATACAATCCGGAAATTCAC
CTAGTTGAATCCGAGCTCAAAGCGAGTGAGCTTAAAAACGTGCTGGACGTGATCATGAATGCGTTTCAT
TGGTGTTCGGTTTTTATGACTGAGGAACTTGTTGATAAAGACAACAATTTTTATGCGGAACTGGAGGAGA
TTTACGATGAAATTTATCCAGTAATTAGTCTGTACAACCTGGTTCGTAACTACGTTACCCAGAAACCGTA
CAGCACGAAAAAGATTAAATTGAACTTTGGAATACCGACGTTAGCAGACGGTTGGTCAAAGTCCAAAGA
GTATTCTAATAACGCTATCATACTGATGCGCGACAATCTGTATTATCTGGGCATCTTTAATGCGAAGAAT
AAACCGGACAAGAAGATTATCGAGGGTAATACGTCAGAAAATAAGGGTGACTACAAAAAGATGATTTA
TAATTTGCTCCCGGGTCCCAACAAAATGATCCCGAAAGTTTTCTTGAGCAGCAAGACGGGGGTGGAAAC
GTATAAACCGAGCGCCTATATCCTAGAGGGGTATAAACAGAATAAACATATCAAGTCTTCAAAAGACTT
TGATATCACTTTCTGTCATGATCTGATCGACTACTTCAAAAACTGTATTGCAATTCATCCCGAGTGGAAA
AACTTCGGTTTTGATTTTAGCGACACCAGTACTTATGAAGACATTTCCGGGTTTTATCGTGAGGTAGAGT
TACAAGGTTACAAGATTGATTGGACATACATTAGCGAAAAAGACATTGATCTGCTGCAGGAAAAAGGTC
AACTGTATCTGTTCCAGATATATAACAAAGATTTTTCGAAAAAATCAACCGGGAATGACAACCTTCACA
CCATGTACCTGAAAAATCTTTTCTCAGAAGAAAATCTTAAGGATATCGTCCTGAAACTTAACGGCGAAG
CGGAAATCTTCTTCAGGAAGAGCAGCATAAAGAACCCAATCATTCATAAAAAAGGCTCGATTTTAGTCA
ACCGTACCTACGAAGCAGAAGAAAAAGACCAGTTTGGCAACATTCAAATTGTGCGTAAAAATATTCCGG
AAAACATTTATCAGGAGCTGTACAAATACTTCAACGATAAAAGCGACAAAGAGCTGTCTGATGAAGCAG
CCAAACTGAAGAATGTAGTGGGACACCACGAGGCAGCGACGAATATAGTCAAGGACTATCGCTACACG
TATGATAAATACTTCCTTCATATGCCTATTACGATCAATTTCAAAGCCAATAAAACGGGTTTTATTAATG
ATAGGATCTTACAGTATATCGCTAAAGAAAAAGACTTACATGTGATCGGCATTGATCGGGGCGAGCGTA
ACCTGATCTACGTGTCCGTGATTGATACTTGTGGTAATATAGTTGAACAGAAAAGCTTTAACATTGTAAA
CGGCTACGACTATCAGATAAAACTGAAACAACAGGAGGGCGCTAGACAGATTGCGCGGAAAGAATGGA
AAGAAATTGGTAAAATTAAAGAGATCAAAGAGGGCTACCTGAGCTTAGTAATCCACGAGATCTCTAAAA
TGGTAATCAAATACAATGCAATTATAGCGATGGAGGATTTGTCTTATGGTTTTAAAAAAGGGCGCTTTAA
GGTCGAACGGCAAGTTTACCAGAAATTTGAAACCATGCTCATCAATAAACTCAACTATCTGGTATTTAA
AGATATTTCGATTACCGAGAATGGCGGTCTCCTGAAAGGTTATCAGCTGACATACATTCCTGATAAACTT
AAAAACGTGGGTCATCAGTGCGGCTGCATTTTTTATGTGCCTGCTGCATACACGAGCAAAATTGATCCGA
CCACCGGCTTTGTGAATATCTTTAAATTTAAAGACCTGACAGTGGACGCAAAACGTGAATTCATTAAAA
AATTTGACTCAATTCGTTATGACAGTGAAAAAAATCTGTTCTGCTTTACATTTGACTACAATAACTTTATT
ACGCAAAACACGGTCATGAGCAAATCATCGTGGAGTGTGTATACATACGGCGTGCGCATCAAACGTCGC
TTTGTGAACGGCCGCTTCTCAAACGAAAGTGATACCATTGACATAACCAAAGATATGGAGAAAACGTTG
GAAATGACGGACATTAACTGGCGCGATGGCCACGATCTTCGTCAAGACATTATAGATTATGAAATTGTT
CAGCACATATTCGAAATTTTCCGTTTAACAGTGCAAATGCGTAACTCCTTGTCTGAACTGGAGGACCGTG
ATTACGATCGTCTCATTTCACCTGTACTGAACGAAAATAACATTTTTTATGACAGCGCGAAAGCGGGGG
ATGCACTTCCTAAGGATGCCGATGCAAATGGTGCGTATTGTATTGCATTAAAAGGGTTATATGAAATTAA
ACAAATTACCGAAAATTGGAAAGAAGATGGTAAATTTTCGCGCGATAAACTCAAAATCAGCAATAAAG
ATTGGTTCGACTTTATCCAGAATAAGCGCTATCTCTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTT
TTTATCTGAAATTTATTATATCGCGTTGATTATTGATGCTGTTTTTAGTTTTAACGGCAATTAATATATGT
GTTATTAATTGAATGAATTTTATCATTCATAATAAGTATGTGTAGGATCAAGCTCAGGTTAAATATTCAC
TCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACAGAATTATCTCATAACAAGTGTTAAGGGATGTT
ATTTCC
SEQ ID NO: 68
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATACCAATAAATTCACTAACCAGTATTCTCTCTCTAAGACCCTGCGCTTTGAACTGATTCCGCAGG
GGAAAACCTTGGAGTTCATTCAAGAAAAAGGCCTCTTGTCTCAGGATAAACAGAGGGCTGAATCTTACC
AAGAAATGAAGAAAACTATTGATAAGTTTCATAAATATTTCATTGATTTAGCCTTGTCTAACGCCAAATT
AACTCACTTGGAAACGTATCTGGAGTTATACAACAAATCTGCCGAAACTAAGAAAGAACAGAAATTTAA
AGACGATTTGAAAAAAGTACAGGACAATCTGCGTAAAGAAATTGTCAAATCCTTCAGTGACGGCGATGC
TAAAAGCATTTTTGCCATTCTGGACAAAAAAGAGTTGATTACTGTGGAATTAGAAAAGTGGTTTGAAAA
CAATGAGCAGAAAGACATCTACTTCGATGAGAAATTCAAAACTTTCACCACCTATTTTACAGGATTTCAT
CAAAACCGGAAGAACATGTACTCAGTAGAACCGAACTCCACGGCCATTGCGTATCGTTTGATCCATGAG
AATCTGCCTAAATTTCTGGAGAATGCGAAAGCCTTTGAAAAGATTAAGCAGGTCGAATCGCTGCAAGTG
AATTTTCGTGAACTCATGGGCGAATTTGGTGACGAAGGTCTAATCTTCGTTAACGAACTGGAAGAAATG
TTTCAGATTAATTACTACAATGACGTGCTATCGCAGAACGGTATCACAATCTACAATAGTATTATCTCAG
GGTTCACAAAAAACGATATAAAATACAAAGGCCTGAACGAGTATATCAATAACTACAACCAAACAAAG
GACAAAAAGGATAGGCTTCCGAAACTGAAGCAGTTATACAAACAGATTTTATCTGACAGAATCTCCCTG
AGCTTTCTGCCGGATGCTTTCACTGATGGGAAGCAGGTTCTGAAAGCGATTTTCGATTTTTATAAGATTA
ACTTACTGAGCTACACGATTGAAGGTCAAGAAGAATCTCAAAACTTACTGCTCTTGATCCGTCAAACCAT
TGAAAATCTATCATCGTTCGATACGCAGAAAATCTACCTCAAAAACGATACTCACCTGACTACGATCTCT
CAGCAGGTTTTCGGGGATTTTAGTGTATTTTCAACAGCTCTGAACTACTGGTATGAAACCAAAGTCAATC
CGAAATTCGAGACGGAATATTCTAAGGCCAACGAAAAAAAACGTGAGATTCTTGATAAAGCTAAAGCC
GTATTTACTAAACAGGATTACTTTTCTATTGCTTTCCTGCAGGAAGTTTTATCGGAGTATATCCTGACCCT
GGATCATACATCTGATATCGTTAAAAAACACAGCAGCAATTGCATCGCTGACTATTTCAAAAACCACTTT
GTCGCCAAAAAAGAAAACGAAACAGACAAGACTTTCGATTTCATTGCTAACATCACCGCAAAATACCAG
TGTATTCAGGGTATCTTGGAAAACGCCGACCAATACGAAGACGAACTGAAACAAGATCAGAAGCTGATC
GATAATTTAAAATTCTTCTTAGATGCAATCCTGGAGCTGCTGCACTTCATCAAACCGCTTCATTTAAAGA
GCGAGTCCATTACCGAAAAGGACACCGCCTTCTATGACGTTTTTGAAAATTATTATGAAGCCCTCTCCTT
GCTGACTCCGCTGTATAATATGGTACGCAATTACGTAACCCAGAAACCATATTCTACCGAAAAAATTAA
ACTGAACTTTGAAAACGCACAGCTGCTCAACGGTTGGGACGCGAATAAAGAAGGTGACTACCTCACCAC
CATCCTGAAAAAAGATGGTAACTATTTTCTGGCAATTATGGATAAGAAACATAATAAAGCATTCCAGAA
ATTTCCTGAAGGGAAAGAAAATTACGAAAAGATGGTGTACAAACTCTTACCTGGAGTTAACAAAATGTT
GCCGAAAGTATTTTTTAGTAATAAGAACATCGCGTACTTTAACCCGTCCAAAGAACTGCTGGAAAATTAT
AAAAAGGAGACGCATAAGAAAGGGGATACCTTTAACCTGGAACATTGCCATACCTTAATAGACTTCTTC
AAGGATTCCCTGAATAAACACGAGGATTGGAAATATTTCGATTTTCAGTTTAGTGAGACCAAGTCATAC
CAGGATCTTAGCGGCTTTTATCGCGAAGTAGAACACCAAGGCTATAAAATTAACTTCAAAAACATCGAC
AGCGAATACATCGACGGTTTAGTTAACGAGGGCAAACTGTTTCTGTTCCAGATCTATTCAAAGGATTTTA
GCCCGTTCTCTAAAGGCAAACCAAATATGCATACGTTGTACTGGAAAGCACTGTTTGAAGAGCAAAACC
TGCAGAATGTGATTTATAAACTGAACGGCCAAGCTGAGATTTTTTTCCGTAAAGCCTCGATTAAACCGAA
AAATATCATCCTTCATAAGAAGAAAATAAAGATCGCTAAAAAACACTTCATAGATAAAAAAACCAAAA
CCTCCGAAATAGTGCCTGTTCAAACAATTAAGAACTTGAATATGTACTACCAGGGCAAGATATCGGAAA
AGGAGTTGACTCAAGACGATCTTCGCTATATCGATAACTTTTCGATTTTTAACGAAAAAAACAAGACGA
TCGACATCATCAAAGATAAACGCTTCACTGTAGATAAGTTCCAGTTTCATGTGCCGATTACTATGAACTT
CAAAGCTACCGGGGGTAGCTATATCAACCAAACGGTGTTGGAATACCTGCAGAATAACCCGGAAGTCAA
AATCATTGGGCTGGACCGCGGAGAACGTCACCTTGTGTACTTGACCTTAATCGATCAGCAAGGCAACAT
CTTAAAACAAGAATCGCTGAATACCATTACGGATTCAAAGATTAGCACCCCGTATCATAAGCTGCTCGA
TAACAAGGAGAATGAGCGCGACCTGGCCCGTAAAAACTGGGGCACGGTGGAAAACATTAAGGAGTTAA
AGGAGGGTTATATTTCCCAGGTAGTGCATAAGATCGCCACTCTCATGCTCGAGGAAAATGCGATCGTTG
TCATGGAAGACTTAAACTTCGGATTTAAACGTGGGCGATTTAAAGTAGAGAAACAAATCTACCAGAAGT
TAGAAAAAATGCTGATTGACAAATTAAATTACTTGGTCCTAAAAGACAAACAGCCGCAAGAATTGGGTG
GATTATACAACGCCCTCCAACTTACCAATAAATTCGAAAGTTTTCAGAAAATGGGTAAACAGTCAGGCT
TTCTTTTTTATGTTCCTGCGTGGAACACATCCAAAATCGACCCTACAACCGGCTTCGTCAATTACTTCTAT
ACTAAATATGAAAACGTCGACAAAGCAAAAGCATTCTTTGAAAAGTTCGAAGCAATACGTTTTAACGCT
GAGAAAAAATATTTCGAGTTCGAAGTCAAGAAATACTCAGACTTTAACCCCAAAGCTGAGGGCACACAG
CAAGCGTGGACAATCTGCACCTACGGCGAGCGCATCGAAACGAAGCGTCAAAAAGATCAGAATAACAA
ATTTGTTTCAACACCTATCAACCTGACCGAGAAGATTGAAGACTTCTTAGGTAAAAATCAGATTGTTTAT
GGCGACGGTAACTGTATAAAATCTCAAATAGCCTCAAAGGATGATAAAGCATTTTTCGAAACATTATTA
TATTGGTTCAAAATGACACTGCAGATGCGCAATAGTGAGACGCGTACAGATATTGATTATCTTATCAGCC
CGGTCATGAACGACAACGGTACTTTTTACAACTCCAGAGACTATGAAAAACTTGAGAATCCAACTCTCC
CCAAAGATGCTGATGCGAACGGTGCTTATCACATCGCGAAAAAAGGTCTGATGCTGCTGAACAAAATCG
ACCAAGCCGATCTGACTAAGAAAGTTGACCTAAGCATTTCAAATCGGGACTGGTTACAGTTTGTTCAAA
AGAACAAATGAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTC
AGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 69
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaa
caaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccac
attgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcct
acctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctat
ctcgtgtgagataggcggagatacgaactttaagAAGGAGatatacCATGGAACAGGAATATT
ATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTTACTGACAGTGAATATCACGTTCTAAG
AAAGCATGGTAAGGCATTGTGGGGTGTAAGACTTTTCGAATCTGCTTCCACTGCTGAAGAGCGTAGAAT
GTTTAGAACGAGTCGACGTAGGCTAGACAGGCGCAATTGGAGAATCGAAATTTTACAAGAAATTTTTGC
GGAAGAGATATCTAAGAAAGACCCAGGCTTTTTCCTGAGAATGAAGGAATCTAAGTATTACCCTGAGGA
TAAAAGAGATATAAATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGACGATGATTTTACCGAT
AAGGATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATGAATACAGAGGAAACC
CCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACATAGAGGCCATTTCTTACTTTCCG
GGGATATCAACGAAATCAAAGAGTTTGGTACCACATTTAGTAAGTTACTGGAAAACATAAAGAATGAAG
AATTGGATTGGAACTTAGAACTCGGAAAAGAAGAATACGCGGTTGTCGAATCTATCCTGAAGGATAATA
TGCTGAATAGGTCGACCAAAAAAACTAGGCTGATCAAAGCACTGAAAGCCAAATCTATCTGCGAAAAA
GCTGTTTTAAATTTACTTGCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTTTGGAAGAATTGAACG
AAACCGAGCGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTGGTGAGGTGGAAAACG
AGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGGGCTGTTTTAGTAGAAA
TCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACTTACGAAAAGCACAAGTCCGATCTCCA
GTTTTTGAAGAAAATTGTCAGGAAATATCTGACTAAGGAAGAATATAAAGATATTTTCGTTAGTACCTCT
GACAAACTGAAAAATTACTCCGCTTACATCGGGATGACCAAGATTAATGGCAAAAAAGTTGATCTGCAA
AGCAAAAGGTGTTCGAAGGAAGAATTTTATGATTTCATTAAAAAGAATGTCTTAAAAAAATTAGAAGGT
CAGCCAGAATACGAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACCAAAACAAGTCAACAG
AGATAATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTTAGGCAATTTACGCGAT
AAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTTGAATTCAGAATACCCTATTATG
TGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGTAAATTCACATGGGCCGTCCGCAAATCCAATG
AAAAAATTTACCCATGGAACTTTGAAAATGTAGTAGATATTGAAGCGTCTGCGGAGAAATTTATTCGAA
GAATGACTAATAAATGCACTTACTTGATGGGAGAGGATGTTCTGCCTAAAGACAGCTTATTATACAGCA
AGTACATGGTTCTAAACGAACTTAACAACGTTAAGTTGGACGGTGAGAAATTAAGTGTAGAATTGAAAC
AAAGATTGTATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAAAATTAAGAATTACTTGA
AGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGATTTCAAAGCATCCCTAA
CAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTCGCAAAAAAAGATAAAGAAAACATT
ATTACTAATATTGTTCTTTTCGGTGATGACAAGAAATTGTTGAAGAAAAGACTGAATAGACTTTACCCCC
AGATTACTCCCAATCAACTTAAGAAAATTTGTGCTTTGTCTTACACAGGATGGGGTCGTTTTTCAAAAAA
GTTCTTAGAAGAGATTACCGCACCTGATCCAGAAACAGGCGAAGTATGGAATATAATTACCGCCTTATG
GGAATCGAACAATAATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGGAAGAAGTTGAGACTTA
CAACATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGTATGTATCACCTTCTGT
CAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAGGTAATGAAGGAGTCTCCTA
AACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCAAAAAGAACCGAGTCAAGAAAGAAGCAG
TTAATCGATTTATATAAGGCTTGTAAAAACGAAGAGAAAGATTGGGTTAAAGAATTGGGGGACCAAGA
GGAACAAAAACTACGGTCGGATAAGTTGTATTTATACTATACGCAAAAGGGACGATGTATGTATTCCGG
CGAGGTAATAGAATTGAAGGATTTATGGGACAATACAAAATATGACATAGACCATATATATCCCCAATC
AAAAACGATGGACGATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATAATGCGACCAAATCTG
ATAAGTATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGTCCTTGTTAGATGGTG
GGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTATCGCCAGAAGAACTCGCT
GGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACCAAAGCCGTTGCTGAGATCCTAAAGCAA
GTTTTCCCAGAGTCGGAGATTGTCTATGTCAAAGCTGGCACAGTGAGCAGGTTTAGGAAAGACTTCGAA
CTATTAAAGGTAAGAGAAGTGAACGATTTACATCACGCAAAGGACGCTTACCTAAATATCGTTGTAGGT
AACTCATATTATGTTAAATTTACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCCAGGTAGAACATAT
AACCTGAAAAAGATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGCATGGGAAGTTGGT
AAGAAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCGTTACAAGGCAGGT
TCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGGAAAGGTCAAATTGCAATAA
AAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGTGGCTATAATAAAGCTGCGGGTGCATAC
TTTATGCTTGTTGAATCAAAAGACAAGAAAGGTAAGACTATTAGAACTATAGAATTTATACCCCTGTACC
TTAAAAACAAAATTGAATCGGATGAGTCAATCGCGTTAAATTTTCTAGAGAAAGGAAGGGGTTTAAAAG
AACCAAAGATCCTGTTAAAAAAGATTAAGATTGACACCTTGTTCGATGTAGATGGATTTAAAATGTGGT
TATCTGGCAGAACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTGGATGAGAAAATCAT
TGTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGAGTTGAAATTATCTGA
TAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACATTCGTTGATAAACTTGAAAATAC
CGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACATTAATTGATAAACAAAAAGAATTTGAAAGGCT
ATCACTGGAAGACAAATCCTCCACCCTATTTGAAATTTTGCATATATTCCAGTGCCAATCTTCAGCAGCT
AATTTAAAAATGATTGGCGGACCTGGGAAAGCCGGCATCCTAGTGATGAACAATAATATCTCCAAGTGT
AACAAAATATCAATTATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGACTTGCTTAAGATAT
AAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGAT
GCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCATTCATAATAAGT
ATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTAC
AGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
SEQ ID NO: 70
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATTCTTTCGACTCTTTCACCAACCTGTACTCTCTGTCTAAAACCCTGAAATTCGAAATGCGTCCGGT
TGGTAACACCCAGAAAATGCTGGACAACGCGGGTGTTTTCGAAAAAGACAAACTGATCCAGAAAAAAT
ACGGTAAAACCAAACCGTACTTCGACCGTCTGCACCGTGAATTCATCGAAGAAGCGCTGACCGGTGTTG
AACTGATCGGTCTGGACGAAAACTTCCGTACCCTGGTTGACTGGCAGAAAGACAAAAAAAACAACGTTG
CGATGAAAGCGTACGAAAACTCTCTGCAGCGTCTGCGTACCGAAATCGGTAAAATCTTCAACCTGAAAG
CGGAAGACTGGGTTAAAAACAAATACCCGATCCTGGGTCTGAAAAACAAAAACACCGACATCCTGTTCG
AAGAAGCGGTTTTCGGTATCCTGAAAGCGCGTTACGGTGAAGAAAAAGACACCTTCATCGAAGTTGAAG
AAATCGACAAAACCGGTAAATCTAAAATCAACCAGATCTCTATCTTCGACTCTTGGAAAGGTTTCACCG
GTTACTTCAAAAAATTCTTCGAAACCCGTAAAAACTTCTACAAAAACGACGGTACCTCTACCGCGATCG
CGACCCGTATCATCGACCAGAACCTGAAACGTTTCATCGACAACCTGTCTATCGTTGAATCTGTTCGTCA
GAAAGTTGACCTGGCGGAAACCGAAAAATCTTTCTCTATCTCTCTGTCTCAGTTCTTCTCTATCGACTTCT
ACAACAAATGCCTGCTGCAGGACGGTATCGACTACTACAACAAAATCATCGGTGGTGAAACCCTGAAAA
ACGGTGAAAAACTGATCGGTCTGAACGAACTGATCAACCAGTACCGTCAGAACAACAAAGACCAGAAA
ATCCCGTTCTTCAAACTGCTGGACAAACAGATCCTGTCTGAAAAAATCCTGTTCCTGGACGAAATCAAA
AACGACACCGAACTGATCGAAGCGCTGTCTCAGTTCGCGAAAACCGCGGAAGAAAAAACCAAAATCGT
TAAAAAACTGTTCGCGGACTTCGTTGAAAACAACTCTAAATACGACCTGGCGCAGATCTACATCTCTCA
GGAAGCGTTCAACACCATCTCTAACAAATGGACCTCTGAAACCGAAACCTTCGCGAAATACCTGTTCGA
AGCGATGAAATCTGGTAAACTGGCGAAATACGAAAAAAAAGACAACTCTTACAAATTCCCGGACTTCAT
CGCGCTGTCTCAGATGAAATCTGCGCTGCTGTCTATCTCTCTGGAAGGTCACTTCTGGAAAGAAAAATAC
TACAAAATCTCTAAATTCCAGGAAAAAACCAACTGGGAACAGTTCCTGGCGATCTTCCTGTACGAATTC
AACTCTCTGTTCTCTGACAAAATCAACACCAAAGACGGTGAAACCAAACAGGTTGGTTACTACCTGTTC
GCGAAAGACCTGCACAACCTGATCCTGTCTGAACAGATCGACATCCCGAAAGACTCTAAAGTTACCATC
AAAGACTTCGCGGACTCTGTTCTGACCATCTACCAGATGGCGAAATACTTCGCGGTTGAAAAAAAACGT
GCGTGGCTGGCGGAATACGAACTGGACTCTTTCTACACCCAGCCGGACACCGGTTACCTGCAGTTCTAC
GACAACGCGTACGAAGACATCGTTCAGGTTTACAACAAACTGCGTAACTACCTGACCAAAAAACCGTAC
TCTGAAGAAAAATGGAAACTGAACTTCGAAAACTCTACCCTGGCGAACGGTTGGGACAAAAACAAAGA
ATCTGACAACTCTGCGGTTATCCTGCAGAAAGGTGGTAAATACTACCTGGGTCTGATCACCAAAGGTCA
CAACAAAATCTTCGACGACCGTTTCCAGGAAAAATTCATCGTTGGTATCGAAGGTGGTAAATACGAAAA
AATCGTTTACAAATTCTTCCCGGACCAGGCGAAAATGTTCCCGAAAGTTTGCTTCTCTGCGAAAGGTCTG
GAATTCTTCCGTCCGTCTGAAGAAATCCTGCGTATCTACAACAACGCGGAATTCAAAAAAGGTGAAACC
TACTCTATCGACTCTATGCAGAAACTGATCGACTTCTACAAAGACTGCCTGACCAAATACGAAGGTTGG
GCGTGCTACACCTTCCGTCACCTGAAACCGACCGAAGAATACCAGAACAACATCGGTGAATTCTTCCGT
GACGTTGCGGAAGACGGTTACCGTATCGACTTCCAGGGTATCTCTGACCAGTACATCCACGAAAAAAAC
GAAAAAGGTGAACTGCACCTGTTCGAAATCCACAACAAAGACTGGAACCTGGACAAAGCGCGTGACGG
TAAATCTAAAACCACCCAGAAAAACCTGCACACCCTGTACTTCGAATCTCTGTTCTCTAACGACAACGTT
GTTCAGAACTTCCCGATCAAACTGAACGGTCAGGCGGAAATCTTCTACCGTCCGAAAACCGAAAAAGAC
AAACTGGAATCTAAAAAAGACAAAAAAGGTAACAAAGTTATCGACCACAAACGTTACTCTGAAAACAA
AATCTTCTTCCACGTTCCGCTGACCCTGAACCGTACCAAAAACGACTCTTACCGTTTCAACGCGCAGATC
AACAACTTCCTGGCGAACAACAAAGACATCAACATCATCGGTGTTGACCGTGGTGAAAAACACCTGGTT
TACTACTCTGTTATCACCCAGGCGTCTGACATCCTGGAATCTGGTTCTCTGAACGAACTGAACGGTGTTA
ACTACGCGGAAAAACTGGGTAAAAAAGCGGAAAACCGTGAACAGGCGCGTCGTGACTGGCAGGACGTT
CAGGGTATCAAAGACCTGAAAAAAGGTTACATCTCTCAGGTTGTTCGTAAACTGGCGGACCTGGCGATC
AAACACAACGCGATCATCATCCTGGAAGACCTGAACATGCGTTTCAAACAGGTTCGTGGTGGTATCGAA
AAATCTATCTACCAGCAGCTGGAAAAAGCGCTGATCGACAAACTGTCTTTCCTGGTTGACAAAGGTGAA
AAAAACCCGGAACAGGCGGGTCACCTGCTGAAAGCGTACCAGCTGTCTGCGCCGTTCGAAACCTTCCAG
AAAATGGGTAAACAGACCGGTATCATCTTCTACACCCAGGCGTCTTACACCTCTAAATCTGACCCGGTTA
CCGGTTGGCGTCCGCACCTGTACCTGAAATACTTCTCTGCGAAAAAAGCGAAAGACGACATCGCGAAAT
TCACCAAAATCGAATTCGTTAACGACCGTTTCGAACTGACCTACGACATCAAAGACTTCCAGCAGGCGA
AAGAATACCCGAACAAAACCGTTTGGAAAGTTTGCTCTAACGTTGAACGTTTCCGTTGGGACAAAAACC
TGAACCAGAACAAAGGTGGTTACACCCACTACACCAACATCACCGAAAACATCCAGGAACTGTTCACCA
AATACGGTATCGACATCACCAAAGACCTGCTGACCCAGATCTCTACCATCGACGAAAAACAGAACACCT
CTTTCTTCCGTGACTTCATCTTCTACTTCAACCTGATCTGCCAGATCCGTAACACCGACGACTCTGAAATC
GCGAAAAAAAACGGTAAAGACGACTTCATCCTGTCTCCGGTTGAACCGTTCTTCGACTCTCGTAAAGAC
AACGGTAACAAACTGCCGGAAAACGGTGACGACAACGGTGCGTACAACATCGCGCGTAAAGGTATCGT
TATCCTGAACAAAATCTCTCAGTACTCTGAAAAAAACGAAAACTGCGAAAAAATGAAATGGGGTGACCT
GTACGTTTCTAACATCGACTGGGACAACTTCGTTGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTA
TCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGA
TTACA
SEQ ID NO: 71
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATAACAAATTCGAAAACTTCACCGGTCTGTACCCGATCTCTAAAACCCTGCGTTTCGAACTGATCC
CGCAGGGTAAAACCCTGGAATACATCGAAAAATCTGAAATCCTGGAAAACGACAACTACCGTGCGGAA
AAATACGAAGAAGTTAAAGACATCATCGACGGTTACCACAAATGGTTCATCAACGAAACCCTGCACGAC
CTGCACATCAACTGGTCTGAACTGAAAGTTGCGCTGGAAAACAACCGTATCGAAAAATCTGACGCGTCT
AAAAAAGAACTGCAGCGTGTTCAGAAAATCAAACGTGAAGAAATCTACAACGCGTTCATCGAACACGA
AGCGTTCCAGTACCTGTTCAAAGAAAACCTGCTGTCTGACCTGCTGCCGATCCAGATCGAACAGTCTGA
AGACCTGGACGCGGAAAAAAAAAAACAGGCGGTTGAAACCTTCAACCGTTTCTCTACCTACTTCACCGG
TTTCCACGAAAACCGTAAAAACATCTACTCTAAAGAAGGTATCTCTACCTCTGTTACCTACCGTATCGTT
CACGACAACTTCCCGAAATTCCTGGAAAACATGAAAGTTTTCGAAATCCTGCGTAACGAATGCCCGGAA
GTTATCTCTGACACCGCGAACGAACTGGCGCCGTTCATCGACGGTGTTCGTATCGAAGACATCTTCCTGA
TCGACTTCTTCAACTCTACCTTCTCTCAGAACGGTATCGACTACTACAACCGTATCCTGGGTGGTGTTACC
ACCGAAACCGGTGAAAAATACCGTGGTATCAACGAATTCACCAACCTGTACCGTCAGCAGCACCCGGAA
TTCGGTAAATCTAAAAAAGCGACCAAAATGGTTGTTCTGTTCAAACAGATCCTGTCTGACCGTGACACCC
TGTCTTTCATCCCGGAAATGTTCGGTAACGACAAACAGGTTCAGAACTCTATCCAGCTGTTCTACAACCG
TGAAATCTCTCAGTTCGAAAACGAAGGTGTTAAAACCGACGTTTGCACCGCGCTGGCGACCCTGACCTC
TAAAATCGCGGAATTCGACACCGAAAAAATCTACATCCAGCAGCCGGAACTGCCGAACGTTTCTCAGCG
TCTGTTCGGTTCTTGGAACGAACTGAACGCGTGCCTGTTCAAATACGCGGAACTGAAATTCGGTACCGCG
GAAAAAGTTGCGAACCGTAAAAAAATCGACAAATGGCTGAAATCTGACCTGTTCTCTTTCACCGAACTG
AACAAAGCGCTGGAATTCTCTGGTAAAGACGAACGTATCGAAAACTACTTCTCTGAAACCGGTATCTTC
GCGCAGCTGGTTAAAACCGGTTTCGACGAAGCGCAGTCTATCCTGGAAACCGAATACACCTCTGAAGTT
CACCTGAAAGACCAGCAGACCGACATCGAAAAAATCAAAACCTTCCTGGACGCGCTGCAGAACCTGAT
GCACCTGCTGAAATCTCTGTGCGTTTCTGAAGAAGCGGACCGTGACGCGGCGTTCTACAACGAATTCGA
CATGCTGTACAACCAGCTGAAACTGGTTGTTCCGCTGTACAACAAAGTTCGTAACTACATCACCCAGAA
ACTGTTCCGTTCTGACAAAATCAAAATCTACTTCGAAAACAAAGGTCAGTTCCTGGGTGGTTGGGTTGAC
TCTCAGACCGAAAACTCTGACAACGGTACCCAGGCGGGTGGTTACATCTTCCGTAAAGAAAACGTTATC
AACGAATACGACTACTACCTGGGTATCTGCTCTGACCCGAAACTGTTCCGTCGTACCACCATCGTTTCTG
AAAACGACCGTTCTTCTTTCGAACGTCTGGACTACTACCAGCTGAAAACCGCGTCTGTTTACGGTAACTC
TTACTGCGGTAAACACCCGTACACCGAAGACAAAAACGAACTGGTTAACTCTATCGACCGTTTCGTTCA
CCTGTCTGGTAACAACATCCTGATCGAAAAAATCGCGAAAGACAAAGTTAAATCTAACCCGACCACCAA
CACCCCGTCTGGTTACCTGAACTTCATCCACCGTGAAGCGCCGAACACCTACGAATGCCTGCTGCAGGA
CGAAAACTTCGTTTCTCTGAACCAGCGTGTTGTTTCTGCGCTGAAAGCGACCCTGGCGACCCTGGTTCGT
GTTCCGAAAGCGCTGGTTTACGCGAAAAAAGACTACCACCTGTTCTCTGAAATCATCAACGACATCGAC
GAACTGTCTTACGAAAAAGCGTTCTCTTACTTCCCGGTTTCTCAGACCGAATTCGAAAACTCTTCTAACC
GTACCATCAAACCGCTGCTGCTGTTCAAAATCTCTAACAAAGACCTGTCTTTCGCGGAAAACTTCGAAAA
AGGTAACCGTCAGAAAATCGGTAAAAAAAACCTGCACACCCTGTACTTCGAAGCGCTGATGAAAGGTA
ACCAGGACACCATCGACATCGGTACCGGTATGGTTTTCCACCGTGTTAAATCTCTGAACTACAACGAAA
AAACCCTGAAATACGGTCACCACTCTACCCAGCTGAACGAAAAATTCTCTTACCCGATCATCAAAGACA
AACGTTTCGCGTCTGACAAATTCCTGTTCCACCTGTCTACCGAAATCAACTACAAAGAAAAACGTAAAC
CGCTGAACAACTCTATCATCGAATTCCTGACCAACAACCCGGACATCAACATCATCGGTCTGGACCGTG
GTGAACGTCACCTGATCTACCTGACCCTGATCAACCAGAAAGGTGAAATCCTGCGTCAGAAAACCTTCA
ACATCGTTGGTAACACCAACTACCACGAAAAACTGAACCAGCGTGAAAAAGAACGTGACAACGCGCGT
AAATCTTGGGCGACCATCGGTAAAATCAAAGAACTGAAAGAAGGTTTCCTGTCTCTGGTTATCCACGAA
ATCGCGAAAATCATGGTTGAAAACAACGCGATCGTTGTTCTGGAAGACCTGAACTTCGGTTTCAAACGT
GGTCGTTTCAAAGTTGAAAAACAGATCTACCAGAAATTCGAAAAAATGCTGATCGACAAACTGAACTAC
CTGGTTTTCAAAGACAAAAAAGCGAACGAAGCGGGTGGTGTTCTGAAAGGTTACCAGCTGGCGGAAAA
ATTCGAATCTTTCCAGAAAATGGGTAAACAGTCTGGTTTCCTGTTCTACGTTCCGGCGGCGTACACCTCT
AAAATCGACCCGACCACCGGTTTCGTTAACATGCTGAACCTGAACTACACCAACATGAAAGACGCGCAG
ACCCTGCTGTCTGGTATGGACAAAATCTCTTTCAACGCGGACGCGAACTACTTCGAATTCGAACTGGACT
ACGAAAAATTCAAAACCAACCAGACCGACCACACCAACAAATGGACCATCTGCACCGTTGGTGAAAAA
CGTTTCACCTACAACTCTGCGACCAAAGAAACCACCACCGTTAACGTTACCGAAGACCTGAAAAAACTG
CTGGACAAATTCGAAGTTAAATACTCTAACGGTGACAACATCAAAGACGAAATCTGCCGTCAGACCGAC
GCGAAATTCTTCGAAATCATCCTGTGGCTGCTGAAACTGACCATGCAGATGCGTAACTCTAACACCAAA
ACCGAAGAAGACTTCATCCTGTCTCCGGTTAAAAACTCTAACGGTGAATTCTTCCGTTCTAACGACGACG
CGAACGGTATCTGGCCGGCGGACGCGGACGCGAACGGTGCGTACCACATCGCGCTGAAAGGTCTGTACC
TGGTTAAAGAATGCTTCAACAAAAACGAAAAATCTCTGAAAATCGAACACAAAAACTGGTTCAAATTCG
CGCAGACCCGTTTCAACGGTTCTCTGACCAAAAACGGTTAAGAAATCATCCTTAGCGAAAGCTAAGGAT
TTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCA
AAGAGGATTACA
SEQ ID NO: 72
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATACCCAGTTCGAAGGTTTCACCAACCTGTACCAGGTTTCTAAAACCCTGCGTTTCGAACTGATCC
CGCAGGGTAAAACCCTGAAACACATCCAGGAACAGGGTTTCATCGAAGAAGACAAAGCGCGTAACGAC
CACTACAAAGAACTGAAACCGATCATCGACCGTATCTACAAAACCTACGCGGACCAGTGCCTGCAGCTG
GTTCAGCTGGACTGGGAAAACCTGTCTGCGGCGATCGACTCTTACCGTAAAGAAAAAACCGAAGAAACC
CGTAACGCGCTGATCGAAGAACAGGCGACCTACCGTAACGCGATCCACGACTACTTCATCGGTCGTACC
GACAACCTGACCGACGCGATCAACAAACGTCACGCGGAAATCTACAAAGGTCTGTTCAAAGCGGAACT
GTTCAACGGTAAAGTTCTGAAACAGCTGGGTACCGTTACCACCACCGAACACGAAAACGCGCTGCTGCG
TTCTTTCGACAAATTCACCACCTACTTCTCTGGTTTCTACGAAAACCGTAAAAACGTTTTCTCTGCGGAA
GACATCTCTACCGCGATCCCGCACCGTATCGTTCAGGACAACTTCCCGAAATTCAAAGAAAACTGCCAC
ATCTTCACCCGTCTGATCACCGCGGTTCCGTCTCTGCGTGAACACTTCGAAAACGTTAAAAAAGCGATCG
GTATCTTCGTTTCTACCTCTATCGAAGAAGTTTTCTCTTTCCCGTTCTACAACCAGCTGCTGACCCAGACC
CAGATCGACCTGTACAACCAGCTGCTGGGTGGTATCTCTCGTGAAGCGGGTACCGAAAAAATCAAAGGT
CTGAACGAAGTTCTGAACCTGGCGATCCAGAAAAACGACGAAACCGCGCACATCATCGCGTCTCTGCCG
CACCGTTTCATCCCGCTGTTCAAACAGATCCTGTCTGACCGTAACACCCTGTCTTTCATCCTGGAAGAAT
TCAAATCTGACGAAGAAGTTATCCAGTCTTTCTGCAAATACAAAACCCTGCTGCGTAACGAAAACGTTCT
GGAAACCGCGGAAGCGCTGTTCAACGAACTGAACTCTATCGACCTGACCCACATCTTCATCTCTCACAA
AAAACTGGAAACCATCTCTTCTGCGCTGTGCGACCACTGGGACACCCTGCGTAACGCGCTGTACGAACG
TCGTATCTCTGAACTGACCGGTAAAATCACCAAATCTGCGAAAGAAAAAGTTCAGCGTTCTCTGAAACA
CGAAGACATCAACCTGCAGGAAATCATCTCTGCGGCGGGTAAAGAACTGTCTGAAGCGTTCAAACAGAA
AACCTCTGAAATCCTGTCTCACGCGCACGCGGCGCTGGACCAGCCGCTGCCGACCACCCTGAAAAAACA
GGAAGAAAAAGAAATCCTGAAATCTCAGCTGGACTCTCTGCTGGGTCTGTACCACCTGCTGGACTGGTT
CGCGGTTGACGAATCTAACGAAGTTGACCCGGAATTCTCTGCGCGTCTGACCGGTATCAAACTGGAAAT
GGAACCGTCTCTGTCTTTCTACAACAAAGCGCGTAACTACGCGACCAAAAAACCGTACTCTGTTGAAAA
ATTCAAACTGAACTTCCAGATGCCGACCCTGGCGTCTGGTTGGGACGTTAACAAAGAAAAAAACAACGG
TGCGATCCTGTTCGTTAAAAACGGTCTGTACTACCTGGGTATCATGCCGAAACAGAAAGGTCGTTACAA
AGCGCTGTCTTTCGAACCGACCGAAAAAACCTCTGAAGGTTTCGACAAAATGTACTACGACTACTTCCC
GGACGCGGCGAAAATGATCCCGAAATGCTCTACCCAGCTGAAAGCGGTTACCGCGCACTTCCAGACCCA
CACCACCCCGATCCTGCTGTCTAACAACTTCATCGAACCGCTGGAAATCACCAAAGAAATCTACGACCT
GAACAACCCGGAAAAAGAACCGAAAAAATTCCAGACCGCGTACGCGAAAAAAACCGGTGACCAGAAA
GGTTACCGTGAAGCGCTGTGCAAATGGATCGACTTCACCCGTGACTTCCTGTCTAAATACACCAAAACC
ACCTCTATCGACCTGTCTTCTCTGCGTCCGTCTTCTCAGTACAAAGACCTGGGTGAATACTACGCGGAAC
TGAACCCGCTGCTGTACCACATCTCTTTCCAGCGTATCGCGGAAAAAGAAATCATGGACGCGGTTGAAA
CCGGTAAACTGTACCTGTTCCAGATCTACAACAAAGACTTCGCGAAAGGTCACCACGGTAAACCGAACC
TGCACACCCTGTACTGGACCGGTCTGTTCTCTCCGGAAAACCTGGCGAAAACCTCTATCAAACTGAACG
GTCAGGCGGAACTGTTCTACCGTCCGAAATCTCGTATGAAACGTATGGCGCACCGTCTGGGTGAAAAAA
TGCTGAACAAAAAACTGAAAGACCAGAAAACCCCGATCCCGGACACCCTGTACCAGGAACTGTACGAC
TACGTTAACCACCGTCTGTCTCACGACCTGTCTGACGAAGCGCGTGCGCTGCTGCCGAACGTTATCACCA
AAGAAGTTTCTCACGAAATCATCAAAGACCGTCGTTTCACCTCTGACAAATTCTTCTTCCACGTTCCGAT
CACCCTGAACTACCAGGCGGCGAACTCTCCGTCTAAATTCAACCAGCGTGTTAACGCGTACCTGAAAGA
ACACCCGGAAACCCCGATCATCGGTATCGACCGTGGTGAACGTAACCTGATCTACATCACCGTTATCGA
CTCTACCGGTAAAATCCTGGAACAGCGTTCTCTGAACACCATCCAGCAGTTCGACTACCAGAAAAAACT
GGACAACCGTGAAAAAGAACGTGTTGCGGCGCGTCAGGCGTGGTCTGTTGTTGGTACCATCAAAGACCT
GAAACAGGGTTACCTGTCTCAGGTTATCCACGAAATCGTTGACCTGATGATCCACTACCAGGCGGTTGTT
GTTCTGGAAAACCTGAACTTCGGTTTCAAATCTAAACGTACCGGTATCGCGGAAAAAGCGGTTTACCAG
CAGTTCGAAAAAATGCTGATCGACAAACTGAACTGCCTGGTTCTGAAAGACTACCCGGCGGAAAAAGTT
GGTGGTGTTCTGAACCCGTACCAGCTGACCGACCAGTTCACCTCTTTCGCGAAAATGGGTACCCAGTCTG
GTTTCCTGTTCTACGTTCCGGCGCCGTACACCTCTAAAATCGACCCGCTGACCGGTTTCGTTGACCCGTTC
GTTTGGAAAACCATCAAAAACCACGAATCTCGTAAACACTTCCTGGAAGGTTTCGACTTCCTGCACTACG
ACGTTAAAACCGGTGACTTCATCCTGCACTTCAAAATGAACCGTAACCTGTCTTTCCAGCGTGGTCTGCC
GGGTTTCATGCCGGCGTGGGACATCGTTTTCGAAAAAAACGAAACCCAGTTCGACGCGAAAGGTACCCC
GTTCATCGCGGGTAAACGTATCGTTCCGGTTATCGAAAACCACCGTTTCACCGGTCGTTACCGTGACCTG
TACCCGGCGAACGAACTGATCGCGCTGCTGGAAGAAAAAGGTATCGTTTTCCGTGACGGTTCTAACATC
CTGCCGAAACTGCTGGAAAACGACGACTCTCACGCGATCGACACCATGGTTGCGCTGATCCGTTCTGTTC
TGCAGATGCGTAACTCTAACGCGGCGACCGGTGAAGACTACATCAACTCTCCGGTTCGTGACCTGAACG
GTGTTTGCTTCGACTCTCGTTTCCAGAACCCGGAATGGCCGATGGACGCGGACGCGAACGGTGCGTACC
ACATCGCGCTGAAAGGTCAGCTGCTGCTGAACCACCTGAAAGAATCTAAAGACCTGAAACTGCAGAACG
GTATCTCTAACCAGGACTGGCTGGCGTACATCCAGGAACTGCGTAACTAGAAATCATCCTTAGCGAAAG
CTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTC
AGGAAGCAAAGAGGATTACA
SEQ ID NO: 73
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATGCGGTTAAATCTATCAAAGTTAAACTGCGTCTGGACGACATGCCGGAAATCCGTGCGGGTCTG
TGGAAACTGCACAAAGAAGTTAACGCGGGTGTTCGTTACTACACCGAATGGCTGTCTCTGCTGCGTCAG
GAAAACCTGTACCGTCGTTCTCCGAACGGTGACGGTGAACAGGAATGCGACAAAACCGCGGAAGAATG
CAAAGCGGAACTGCTGGAACGTCTGCGTGCGCGTCAGGTTGAAAACGGTCACCGTGGTCCGGCGGGTTC
TGACGACGAACTGCTGCAGCTGGCGCGTCAGCTGTACGAACTGCTGGTTCCGCAGGCGATCGGTGCGAA
AGGTGACGCGCAGCAGATCGCGCGTAAATTCCTGTCTCCGCTGGCGGACAAAGACGCGGTTGGTGGTCT
GGGTATCGCGAAAGCGGGTAACAAACCGCGTTGGGTTCGTATGCGTGAAGCGGGTGAACCGGGTTGGG
AAGAAGAAAAAGAAAAAGCGGAAACCCGTAAATCTGCGGACCGTACCGCGGACGTTCTGCGTGCGCTG
GCGGACTTCGGTCTGAAACCGCTGATGCGTGTTTACACCGACTCTGAAATGTCTTCTGTTGAATGGAAAC
CGCTGCGTAAAGGTCAGGCGGTTCGTACCTGGGACCGTGACATGTTCCAGCAGGCGATCGAACGTATGA
TGTCTTGGGAATCTTGGAACCAGCGTGTTGGTCAGGAATACGCGAAACTGGTTGAACAGAAAAACCGTT
TCGAACAGAAAAACTTCGTTGGTCAGGAACACCTGGTTCACCTGGTTAACCAGCTGCAGCAGGACATGA
AAGAAGCGTCTCCGGGTCTGGAATCTAAAGAACAGACCGCGCACTACGTTACCGGTCGTGCGCTGCGTG
GTTCTGACAAAGTTTTCGAAAAATGGGGTAAACTGGCGCCGGACGCGCCGTTCGACCTGTACGACGCGG
AAATCAAAAACGTTCAGCGTCGTAACACCCGTCGTTTCGGTTCTCACGACCTGTTCGCGAAACTGGCGG
AACCGGAATACCAGGCGCTGTGGCGTGAAGACGCGTCTTTCCTGACCCGTTACGCGGTTTACAACTCTAT
CCTGCGTAAACTGAACCACGCGAAAATGTTCGCGACCTTCACCCTGCCGGACGCGACCGCGCACCCGAT
CTGGACCCGTTTCGACAAACTGGGTGGTAACCTGCACCAGTACACCTTCCTGTTCAACGAATTCGGTGAA
CGTCGTCACGCGATCCGTTTCCACAAACTGCTGAAAGTTGAAAACGGTGTTGCGCGTGAAGTTGACGAC
GTTACCGTTCCGATCTCTATGTCTGAACAGCTGGACAACCTGCTGCCGCGTGACCCGAACGAACCGATCG
CGCTGTACTTCCGTGACTACGGTGCGGAACAGCACTTCACCGGTGAATTCGGTGGTGCGAAAATCCAGT
GCCGTCGTGACCAGCTGGCGCACATGCACCGTCGTCGTGGTGCGCGTGACGTTTACCTGAACGTTTCTGT
TCGTGTTCAGTCTCAGTCTGAAGCGCGTGGTGAACGTCGTCCGCCGTACGCGGCGGTTTTCCGTCTGGTT
GGTGACAACCACCGTGCGTTCGTTCACTTCGACAAACTGTCTGACTACCTGGCGGAACACCCGGACGAC
GGTAAACTGGGTTCTGAAGGTCTGCTGTCTGGTCTGCGTGTTATGTCTGTTGACCTGGGTCTGCGTACCT
CTGCGTCTATCTCTGTTTTCCGTGTTGCGCGTAAAGACGAACTGAAACCGAACTCTAAAGGTCGTGTTCC
GTTCTTCTTCCCGATCAAAGGTAACGACAACCTGGTTGCGGTTCACGAACGTTCTCAGCTGCTGAAACTG
CCGGGTGAAACCGAATCTAAAGACCTGCGTGCGATCCGTGAAGAACGTCAGCGTACCCTGCGTCAGCTG
CGTACCCAGCTGGCGTACCTGCGTCTGCTGGTTCGTTGCGGTTCTGAAGACGTTGGTCGTCGTGAACGTT
CTTGGGCGAAACTGATCGAACAGCCGGTTGACGCGGCGAACCACATGACCCCGGACTGGCGTGAAGCGT
TCGAAAACGAACTGCAGAAACTGAAATCTCTGCACGGTATCTGCTCTGACAAAGAATGGATGGACGCGG
TTTACGAATCTGTTCGTCGTGTTTGGCGTCACATGGGTAAACAGGTTCGTGACTGGCGTAAAGACGTTCG
TTCTGGTGAACGTCCGAAAATCCGTGGTTACGCGAAAGACGTTGTTGGTGGTAACTCTATCGAACAGAT
CGAATACCTGGAACGTCAGTACAAATTCCTGAAATCTTGGTCTTTCTTCGGTAAAGTTTCTGGTCAGGTT
ATCCGTGCGGAAAAAGGTTCTCGTTTCGCGATCACCCTGCGTGAACACATCGACCACGCGAAAGAAGAC
CGTCTGAAAAAACTGGCGGACCGTATCATCATGGAAGCGCTGGGTTACGTTTACGCGCTGGACGAACGT
GGTAAAGGTAAATGGGTTGCGAAATACCCGCCGTGCCAGCTGATCCTGCTGGAAGAACTGTCTGAATAC
CAGTTCAACAACGACCGTCCGCCGTCTGAAAACAACCAGCTGATGCAGTGGTCTCACCGTGGTGTTTTCC
AGGAACTGATCAACCAGGCGCAGGTTCACGACCTGCTGGTTGGTACCATGTACGCGGCGTTCTCTTCTCG
TTTCGACGCGCGTACCGGTGCGCCGGGTATCCGTTGCCGTCGTGTTCCGGCGCGTTGCACCCAGGAACAC
AACCCGGAACCGTTCCCGTGGTGGCTGAACAAATTCGTTGTTGAACACACCCTGGACGCGTGCCCGCTG
CGTGCGGACGACCTGATCCCGACCGGTGAAGGTGAAATCTTCGTTTCTCCGTTCTCTGCGGAAGAAGGT
GACTTCCACCAGATCCACGCGGACCTGAACGCGGCGCAGAACCTGCAGCAGCGTCTGTGGTCTGACTTC
GACATCTCTCAGATCCGTCTGCGTTGCGACTGGGGTGAAGTTGACGGTGAACTGGTTCTGATCCCGCGTC
TGACCGGTAAACGTACCGCGGACTCTTACTCTAACAAAGTTTTCTACACCAACACCGGTGTTACCTACTA
CGAACGTGAACGTGGTAAAAAACGTCGTAAAGTTTTCGCGCAGGAAAAACTGTCTGAAGAAGAAGCGG
AACTGCTGGTTGAAGCGGACGAAGCGCGTGAAAAATCTGTTGTTCTGATGCGTGACCCGTCTGGTATCA
TCAACCGTGGTAACTGGACCCGTCAGAAAGAATTCTGGTCTATGGTTAACCAGCGTATCGAAGGTTACC
TGGTTAAACAGATCCGTTCTCGTGTTCCGCTGCAGGACTCTGCGTGCGAAAACACCGGTGACATCTAAG
AAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTC
ACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 74
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATGCGACCCGTTCTTTCATCCTGAAAATCGAACCGAACGAAGAAGTTAAAAAAGGTCTGTGGAAA
ACCCACGAAGTTCTGAACCACGGTATCGCGTACTACATGAACATCCTGAAACTGATCCGTCAGGAAGCG
ATCTACGAACACCACGAACAGGACCCGAAAAACCCGAAAAAAGTTTCTAAAGCGGAAATCCAGGCGGA
ACTGTGGGACTTCGTTCTGAAAATGCAGAAATGCAACTCTTTCACCCACGAAGTTGACAAAGACGTTGTT
TTCAACATCCTGCGTGAACTGTACGAAGAACTGGTTCCGTCTTCTGTTGAAAAAAAAGGTGAAGCGAAC
CAGCTGTCTAACAAATTCCTGTACCCGCTGGTTGACCCGAACTCTCAGTCTGGTAAAGGTACCGCGTCTT
CTGGTCGTAAACCGCGTTGGTACAACCTGAAAATCGCGGGTGACCCGTCTTGGGAAGAAGAAAAAAAA
AAATGGGAAGAAGACAAAAAAAAAGACCCGCTGGCGAAAATCCTGGGTAAACTGGCGGAATACGGTCT
GATCCCGCTGTTCATCCCGTTCACCGACTCTAACGAACCGATCGTTAAAGAAATCAAATGGATGGAAAA
ATCTCGTAACCAGTCTGTTCGTCGTCTGGACAAAGACATGTTCATCCAGGCGCTGGAACGTTTCCTGTCT
TGGGAATCTTGGAACCTGAAAGTTAAAGAAGAATACGAAAAAGTTGAAAAAGAACACAAAACCCTGGA
AGAACGTATCAAAGAAGACATCCAGGCGTTCAAATCTCTGGAACAGTACGAAAAAGAACGTCAGGAAC
AGCTGCTGCGTGACACCCTGAACACCAACGAATACCGTCTGTCTAAACGTGGTCTGCGTGGTTGGCGTG
AAATCATCCAGAAATGGCTGAAAATGGACGAAAACGAACCGTCTGAAAAATACCTGGAAGTTTTCAAA
GACTACCAGCGTAAACACCCGCGTGAAGCGGGTGACTACTCTGTTTACGAATTCCTGTCTAAAAAAGAA
AACCACTTCATCTGGCGTAACCACCCGGAATACCCGTACCTGTACGCGACCTTCTGCGAAATCGACAAA
AAAAAAAAAGACGCGAAACAGCAGGCGACCTTCACCCTGGCGGACCCGATCAACCACCCGCTGTGGGT
TCGTTTCGAAGAACGTTCTGGTTCTAACCTGAACAAATACCGTATCCTGACCGAACAGCTGCACACCGA
AAAACTGAAAAAAAAACTGACCGTTCAGCTGGACCGTCTGATCTACCCGACCGAATCTGGTGGTTGGGA
AGAAAAAGGTAAAGTTGACATCGTTCTGCTGCCGTCTCGTCAGTTCTACAACCAGATCTTCCTGGACATC
GAAGAAAAAGGTAAACACGCGTTCACCTACAAAGACGAATCTATCAAATTCCCGCTGAAAGGTACCCTG
GGTGGTGCGCGTGTTCAGTTCGACCGTGACCACCTGCGTCGTTACCCGCACAAAGTTGAATCTGGTAACG
TTGGTCGTATCTACTTCAACATGACCGTTAACATCGAACCGACCGAATCTCCGGTTTCTAAATCTCTGAA
AATCCACCGTGACGACTTCCCGAAATTCGTTAACTTCAAACCGAAAGAACTGACCGAATGGATCAAAGA
CTCTAAAGGTAAAAAACTGAAATCTGGTATCGAATCTCTGGAAATCGGTCTGCGTGTTATGTCTATCGAC
CTGGGTCAGCGTCAGGCGGCGGCGGCGTCTATCTTCGAAGTTGTTGACCAGAAACCGGACATCGAAGGT
AAACTGTTCTTCCCGATCAAAGGTACCGAACTGTACGCGGTTCACCGTGCGTCTTTCAACATCAAACTGC
CGGGTGAAACCCTGGTTAAATCTCGTGAAGTTCTGCGTAAAGCGCGTGAAGACAACCTGAAACTGATGA
ACCAGAAACTGAACTTCCTGCGTAACGTTCTGCACTTCCAGCAGTTCGAAGACATCACCGAACGTGAAA
AACGTGTTACCAAATGGATCTCTCGTCAGGAAAACTCTGACGTTCCGCTGGTTTACCAGGACGAACTGAT
CCAGATCCGTGAACTGATGTACAAACCGTACAAAGACTGGGTTGCGTTCCTGAAACAGCTGCACAAACG
TCTGGAAGTTGAAATCGGTAAAGAAGTTAAACACTGGCGTAAATCTCTGTCTGACGGTCGTAAAGGTCT
GTACGGTATCTCTCTGAAAAACATCGACGAAATCGACCGTACCCGTAAATTCCTGCTGCGTTGGTCTCTG
CGTCCGACCGAACCGGGTGAAGTTCGTCGTCTGGAACCGGGTCAGCGTTTCGCGATCGACCAGCTGAAC
CACCTGAACGCGCTGAAAGAAGACCGTCTGAAAAAAATGGCGAACACCATCATCATGCACGCGCTGGG
TTACTGCTACGACGTTCGTAAAAAAAAATGGCAGGCGAAAAACCCGGCGTGCCAGATCATCCTGTTCGA
AGACCTGTCTAACTACAACCCGTACGAAGAACGTTCTCGTTTCGAAAACTCTAAACTGATGAAATGGTCT
CGTCGTGAAATCCCGCGTCAGGTTGCGCTGCAGGGTGAAATCTACGGTCTGCAGGTTGGTGAAGTTGGT
GCGCAGTTCTCTTCTCGTTTCCACGCGAAAACCGGTTCTCCGGGTATCCGTTGCTCTGTTGTTACCAAAG
AAAAACTGCAGGACAACCGTTTCTTCAAAAACCTGCAGCGTGAAGGTCGTCTGACCCTGGACAAAATCG
CGGTTCTGAAAGAAGGTGACCTGTACCCGGACAAAGGTGGTGAAAAATTCATCTCTCTGTCTAAAGACC
GTAAACTGGTTACCACCCACGCGGACATCAACGCGGCGCAGAACCTGCAGAAACGTTTCTGGACCCGTA
CCCACGGTTTCTACAAAGTTTACTGCAAAGCGTACCAGGTTGACGGTCAGACCGTTTACATCCCGGAATC
TAAAGACCAGAAACAGAAAATCATCGAAGAATTCGGTGAAGGTTACTTCATCCTGAAAGACGGTGTTTA
CGAATGGGGTAACGCGGGTAAACTGAAAATCAAAAAAGGTTCTTCTAAACAGTCTTCTTCTGAACTGGT
TGACTCTGACATCCTGAAAGACTCTTTCGACCTGGCGTCTGAACTGAAAGGTGAAAAACTGATGCTGTA
CCGTGACCCGTCTGGTAACGTTTTCCCGTCTGACAAATGGATGGCGGCGGGTGTTTTCTTCGGTAAACTG
GAACGTATCCTGATCTCTAAACTGACCAACCAGTACTCTATCTCTACCATCGAAGACGACTCTTCTAAAC
AGTCTATGTAAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCA
GGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 75
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATCCGACCCGTACCATCAACCTGAAACTGGTTCTGGGTAAAAACCCGGAAAACGCGACCCTGCGT
CGTGCGCTGTTCTCTACCCACCGTCTGGTTAACCAGGCGACCAAACGTATCGAAGAATTCCTGCTGCTGT
GCCGTGGTGAAGCGTACCGTACCGTTGACAACGAAGGTAAAGAAGCGGAAATCCCGCGTCACGCGGTTC
AGGAAGAAGCGCTGGCGTTCGCGAAAGCGGCGCAGCGTCACAACGGTTGCATCTCTACCTACGAAGACC
AGGAAATCCTGGACGTTCTGCGTCAGCTGTACGAACGTCTGGTTCCGTCTGTTAACGAAAACAACGAAG
CGGGTGACGCGCAGGCGGCGAACGCGTGGGTTTCTCCGCTGATGTCTGCGGAATCTGAAGGTGGTCTGT
CTGTTTACGACAAAGTTCTGGACCCGCCGCCGGTTTGGATGAAACTGAAAGAAGAAAAAGCGCCGGGTT
GGGAAGCGGCGTCTCAGATCTGGATCCAGTCTGACGAAGGTCAGTCTCTGCTGAACAAACCGGGTTCTC
CGCCGCGTTGGATCCGTAAACTGCGTTCTGGTCAGCCGTGGCAGGACGACTTCGTTTCTGACCAGAAAA
AAAAACAGGACGAACTGACCAAAGGTAACGCGCCGCTGATCAAACAGCTGAAAGAAATGGGTCTGCTG
CCGCTGGTTAACCCGTTCTTCCGTCACCTGCTGGACCCGGAAGGTAAAGGTGTTTCTCCGTGGGACCGTC
TGGCGGTTCGTGCGGCGGTTGCGCACTTCATCTCTTGGGAATCTTGGAACCACCGTACCCGTGCGGAATA
CAACTCTCTGAAACTGCGTCGTGACGAATTCGAAGCGGCGTCTGACGAATTCAAAGACGACTTCACCCT
GCTGCGTCAGTACGAAGCGAAACGTCACTCTACCCTGAAATCTATCGCGCTGGCGGACGACTCTAACCC
GTACCGTATCGGTGTTCGTTCTCTGCGTGCGTGGAACCGTGTTCGTGAAGAATGGATCGACAAAGGTGC
GACCGAAGAACAGCGTGTTACCATCCTGTCTAAACTGCAGACCCAGCTGCGTGGTAAATTCGGTGACCC
GGACCTGTTCAACTGGCTGGCGCAGGACCGTCACGTTCACCTGTGGTCTCCGCGTGACTCTGTTACCCCG
CTGGTTCGTATCAACGCGGTTGACAAAGTTCTGCGTCGTCGTAAACCGTACGCGCTGATGACCTTCGCGC
ACCCGCGTTTCCACCCGCGTTGGATCCTGTACGAAGCGCCGGGTGGTTCTAACCTGCGTCAGTACGCGCT
GGACTGCACCGAAAACGCGCTGCACATCACCCTGCCGCTGCTGGTTGACGACGCGCACGGTACCTGGAT
CGAAAAAAAAATCCGTGTTCCGCTGGCGCCGTCTGGTCAGATCCAGGACCTGACCCTGGAAAAACTGGA
AAAAAAAAAAAACCGTCTGTACTACCGTTCTGGTTTCCAGCAGTTCGCGGGTCTGGCGGGTGGTGCGGA
AGTTCTGTTCCACCGTCCGTACATGGAACACGACGAACGTTCTGAAGAATCTCTGCTGGAACGTCCGGGT
GCGGTTTGGTTCAAACTGACCCTGGACGTTGCGACCCAGGCGCCGCCGAACTGGCTGGACGGTAAAGGT
CGTGTTCGTACCCCGCCGGAAGTTCACCACTTCAAAACCGCGCTGTCTAACAAATCTAAACACACCCGTA
CCCTGCAGCCGGGTCTGCGTGTTCTGTCTGTTGACCTGGGTATGCGTACCTTCGCGTCTTGCTCTGTTTTC
GAACTGATCGAAGGTAAACCGGAAACCGGTCGTGCGTTCCCGGTTGCGGACGAACGTTCTATGGACTCT
CCGAACAAACTGTGGGCGAAACACGAACGTTCTTTCAAACTGACCCTGCCGGGTGAAACCCCGTCTCGT
AAAGAAGAAGAAGAACGTTCTATCGCGCGTGCGGAAATCTACGCGCTGAAACGTGACATCCAGCGTCTG
AAATCTCTGCTGCGTCTGGGTGAAGAAGACAACGACAACCGTCGTGACGCGCTGCTGGAACAGTTCTTC
AAAGGTTGGGGTGAAGAAGACGTTGTTCCGGGTCAGGCGTTCCCGCGTTCTCTGTTCCAGGGTCTGGGT
GCGGCGCCGTTCCGTTCTACCCCGGAACTGTGGCGTCAGCACTGCCAGACCTACTACGACAAAGCGGAA
GCGTGCCTGGCGAAACACATCTCTGACTGGCGTAAACGTACCCGTCCGCGTCCGACCTCTCGTGAAATGT
GGTACAAAACCCGTTCTTACCACGGTGGTAAATCTATCTGGATGCTGGAATACCTGGACGCGGTTCGTA
AACTGCTGCTGTCTTGGTCTCTGCGTGGTCGTACCTACGGTGCGATCAACCGTCAGGACACCGCGCGTTT
CGGTTCTCTGGCGTCTCGTCTGCTGCACCACATCAACTCTCTGAAAGAAGACCGTATCAAAACCGGTGCG
GACTCTATCGTTCAGGCGGCGCGTGGTTACATCCCGCTGCCGCACGGTAAAGGTTGGGAACAGCGTTAC
GAACCGTGCCAGCTGATCCTGTTCGAAGACCTGGCGCGTTACCGTTTCCGTGTTGACCGTCCGCGTCGTG
AAAACTCTCAGCTGATGCAGTGGAACCACCGTGCGATCGTTGCGGAAACCACCATGCAGGCGGAACTGT
ACGGTCAGATCGTTGAAAACACCGCGGCGGGTTTCTCTTCTCGTTTCCACGCGGCGACCGGTGCGCCGG
GTGTTCGTTGCCGTTTCCTGCTGGAACGTGACTTCGACAACGACCTGCCGAAACCGTACCTGCTGCGTGA
ACTGTCTTGGATGCTGGGTAACACCAAAGTTGAATCTGAAGAAGAAAAACTGCGTCTGCTGTCTGAAAA
AATCCGTCCGGGTTCTCTGGTTCCGTGGGACGGTGGTGAACAGTTCGCGACCCTGCACCCGAAACGTCA
GACCCTGTGCGTTATCCACGCGGACATGAACGCGGCGCAGAACCTGCAGCGTCGTTTCTTCGGTCGTTGC
GGTGAAGCGTTCCGTCTGGTTTGCCAGCCGCACGGTGACGACGTTCTGCGTCTGGCGTCTACCCCGGGTG
CGCGTCTGCTGGGTGCGCTGCAGCAGCTGGAAAACGGTCAGGGTGCGTTCGAACTGGTTCGTGACATGG
GTTCTACCTCTCAGATGAACCGTTTCGTTATGAAATCTCTGGGTAAAAAAAAAATCAAACCGCTGCAGG
ACAACAACGGTGACGACGAACTGGAAGACGTTCTGTCTGTTCTGCCGGAAGAAGACGACACCGGTCGTA
TCACCGTTTTCCGTGACTCTTCTGGTATCTTCTTCCCGTGCAACGTTTGGATCCCGGCGAAACAGTTCTGG
CCGGCGGTTCGTGCGATGATCTGGAAAGTTATGGCGTCTCACTCTCTGGGTTAAGAAATCATCCTTAGCG
AAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATT
ACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 76
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATACCAAACTGCGTCACCGTCAGAAAAAACTGACCCACGACTGGGCGGGTTCTAAAAAACGTGAA
GTTCTGGGTTCTAACGGTAAACTGCAGAACCCGCTGCTGATGCCGGTTAAAAAAGGTCAGGTTACCGAA
TTCCGTAAAGCGTTCTCTGCGTACGCGCGTGCGACCAAAGGTGAAATGACCGACGGTCGTAAAAACATG
TTCACCCACTCTTTCGAACCGTTCAAAACCAAACCGTCTCTGCACCAGTGCGAACTGGCGGACAAAGCG
TACCAGTCTCTGCACTCTTACCTGCCGGGTTCTCTGGCGCACTTCCTGCTGTCTGCGCACGCGCTGGGTTT
CCGTATCTTCTCTAAATCTGGTGAAGCGACCGCGTTCCAGGCGTCTTCTAAAATCGAAGCGTACGAATCT
AAACTGGCGTCTGAACTGGCGTGCGTTGACCTGTCTATCCAGAACCTGACCATCTCTACCCTGTTCAACG
CGCTGACCACCTCTGTTCGTGGTAAAGGTGAAGAAACCTCTGCGGACCCGCTGATCGCGCGTTTCTACAC
CCTGCTGACCGGTAAACCGCTGTCTCGTGACACCCAGGGTCCGGAACGTGACCTGGCGGAAGTTATCTC
TCGTAAAATCGCGTCTTCTTTCGGTACCTGGAAAGAAATGACCGCGAACCCGCTGCAGTCTCTGCAGTTC
TTCGAAGAAGAACTGCACGCGCTGGACGCGAACGTTTCTCTGTCTCCGGCGTTCGACGTTCTGATCAAAA
TGAACGACCTGCAGGGTGACCTGAAAAACCGTACCATCGTTTTCGACCCGGACGCGCCGGTTTTCGAAT
ACAACGCGGAAGACCCGGCGGACATCATCATCAAACTGACCGCGCGTTACGCGAAAGAAGCGGTTATC
AAAAACCAGAACGTTGGTAACTACGTTAAAAACGCGATCACCACCACCAACGCGAACGGTCTGGGTTGG
CTGCTGAACAAAGGTCTGTCTCTGCTGCCGGTTTCTACCGACGACGAACTGCTGGAATTCATCGGTGTTG
AACGTTCTCACCCGTCTTGCCACGCGCTGATCGAACTGATCGCGCAGCTGGAAGCGCCGGAACTGTTCG
AAAAAAACGTTTTCTCTGACACCCGTTCTGAAGTTCAGGGTATGATCGACTCTGCGGTTTCTAACCACAT
CGCGCGTCTGTCTTCTTCTCGTAACTCTCTGTCTATGGACTCTGAAGAACTGGAACGTCTGATCAAATCTT
TCCAGATCCACACCCCGCACTGCTCTCTGTTCATCGGTGCGCAGTCTCTGTCTCAGCAGCTGGAATCTCT
GCCGGAAGCGCTGCAGTCTGGTGTTAACTCTGCGGACATCCTGCTGGGTTCTACCCAGTACATGCTGACC
AACTCTCTGGTTGAAGAATCTATCGCGACCTACCAGCGTACCCTGAACCGTATCAACTACCTGTCTGGTG
TTGCGGGTCAGATCAACGGTGCGATCAAACGTAAAGCGATCGACGGTGAAAAAATCCACCTGCCGGCG
GCGTGGTCTGAACTGATCTCTCTGCCGTTCATCGGTCAGCCGGTTATCGACGTTGAATCTGACCTGGCGC
ACCTGAAAAACCAGTACCAGACCCTGTCTAACGAATTCGACACCCTGATCTCTGCGCTGCAGAAAAACT
TCGACCTGAACTTCAACAAAGCGCTGCTGAACCGTACCCAGCACTTCGAAGCGATGTGCCGTTCTACCA
AAAAAAACGCGCTGTCTAAACCGGAAATCGTTTCTTACCGTGACCTGCTGGCGCGTCTGACCTCTTGCCT
GTACCGTGGTTCTCTGGTTCTGCGTCGTGCGGGTATCGAAGTTCTGAAAAAACACAAAATCTTCGAATCT
AACTCTGAACTGCGTGAACACGTTCACGAACGTAAACACTTCGTTTTCGTTTCTCCGCTGGACCGTAAAG
CGAAAAAACTGCTGCGTCTGACCGACTCTCGTCCGGACCTGCTGCACGTTATCGACGAAATCCTGCAGC
ACGACAACCTGGAAAACAAAGACCGTGAATCTCTGTGGCTGGTTCGTTCTGGTTACCTGCTGGCGGGTCT
GCCGGACCAGCTGTCTTCTTCTTTCATCAACCTGCCGATCATCACCCAGAAAGGTGACCGTCGTCTGATC
GACCTGATCCAGTACGACCAGATCAACCGTGACGCGTTCGTTATGCTGGTTACCTCTGCGTTCAAATCTA
ACCTGTCTGGTCTGCAGTACCGTGCGAACAAACAGTCTTTCGTTGTTACCCGTACCCTGTCTCCGTACCT
GGGTTCTAAACTGGTTTACGTTCCGAAAGACAAAGACTGGCTGGTTCCGTCTCAGATGTTCGAAGGTCGT
TTCGCGGACATCCTGCAGTCTGACTACATGGTTTGGAAAGACGCGGGTCGTCTGTGCGTTATCGACACCG
CGAAACACCTGTCTAACATCAAAAAATCTGTTTTCTCTTCTGAAGAAGTTCTGGCGTTCCTGCGTGAACT
GCCGCACCGTACCTTCATCCAGACCGAAGTTCGTGGTCTGGGTGTTAACGTTGACGGTATCGCGTTCAAC
AACGGTGACATCCCGTCTCTGAAAACCTTCTCTAACTGCGTTCAGGTTAAAGTTTCTCGTACCAACACCT
CTCTGGTTCAGACCCTGAACCGTTGGTTCGAAGGTGGTAAAGTTTCTCCGCCGTCTATCCAGTTCGAACG
TGCGTACTACAAAAAAGACGACCAGATCCACGAAGACGCGGCGAAACGTAAAATCCGTTTCCAGATGC
CGGCGACCGAACTGGTTCACGCGTCTGACGACGCGGGTTGGACCCCGTCTTACCTGCTGGGTATCGACC
CGGGTGAATACGGTATGGGTCTGTCTCTGGTTTCTATCAACAACGGTGAAGTTCTGGACTCTGGTTTCAT
CCACATCAACTCTCTGATCAACTTCGCGTCTAAAAAATCTAACCACCAGACCAAAGTTGTTCCGCGTCAG
CAGTACAAATCTCCGTACGCGAACTACCTGGAACAGTCTAAAGACTCTGCGGCGGGTGACATCGCGCAC
ATCCTGGACCGTCTGATCTACAAACTGAACGCGCTGCCGGTTTTCGAAGCGCTGTCTGGTAACTCTCAGT
CTGCGGCGGACCAGGTTTGGACCAAAGTTCTGTCTTTCTACACCTGGGGTGACAACGACGCGCAGAACT
CTATCCGTAAACAGCACTGGTTCGGTGCGTCTCACTGGGACATCAAAGGTATGCTGCGTCAGCCGCCGA
CCGAAAAAAAACCGAAACCGTACATCGCGTTCCCGGGTTCTCAGGTTTCTTCTTACGGTAACTCTCAGCG
TTGCTCTTGCTGCGGTCGTAACCCGATCGAACAGCTGCGTGAAATGGCGAAAGACACCTCTATCAAAGA
ACTGAAAATCCGTAACTCTGAAATCCAGCTGTTCGACGGTACCATCAAACTGTTCAACCCGGACCCGTCT
ACCGTTATCGAACGTCGTCGTCACAACCTGGGTCCGTCTCGTATCCCGGTTGCGGACCGTACCTTCAAAA
ACATCTCTCCGTCTTCTCTGGAATTCAAAGAACTGATCACCATCGTTTCTCGTTCTATCCGTCACTCTCCG
GAATTCATCGCGAAAAAACGTGGTATCGGTTCTGAATACTTCTGCGCGTACTCTGACTGCAACTCTTCTC
TGAACTCTGAAGCGAACGCGGCGGCGAACGTTGCGCAGAAATTCCAGAAACAGCTGTTCTTCGAACTGT
AAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAAT
ATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 77
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATAAACGTATCCTGAACTCTCTGAAAGTTGCGGCGCTGCGTCTGCTGTTCCGTGGTAAAGGTTCTG
AACTGGTTAAAACCGTTAAATACCCGCTGGTTTCTCCGGTTCAGGGTGCGGTTGAAGAACTGGCGGAAG
CGATCCGTCACGACAACCTGCACCTGTTCGGTCAGAAAGAAATCGTTGACCTGATGGAAAAAGACGAAG
GTACCCAGGTTTACTCTGTTGTTGACTTCTGGCTGGACACCCTGCGTCTGGGTATGTTCTTCTCTCCGTCT
GCGAACGCGCTGAAAATCACCCTGGGTAAATTCAACTCTGACCAGGTTTCTCCGTTCCGTAAAGTTCTGG
AACAGTCTCCGTTCTTCCTGGCGGGTCGTCTGAAAGTTGAACCGGCGGAACGTATCCTGTCTGTTGAAAT
CCGTAAAATCGGTAAACGTGAAAACCGTGTTGAAAACTACGCGGCGGACGTTGAAACCTGCTTCATCGG
TCAGCTGTCTTCTGACGAAAAACAGTCTATCCAGAAACTGGCGAACGACATCTGGGACTCTAAAGACCA
CGAAGAACAGCGTATGCTGAAAGCGGACTTCTTCGCGATCCCGCTGATCAAAGACCCGAAAGCGGTTAC
CGAAGAAGACCCGGAAAACGAAACCGCGGGTAAACAGAAACCGCTGGAACTGTGCGTTTGCCTGGTTC
CGGAACTGTACACCCGTGGTTTCGGTTCTATCGCGGACTTCCTGGTTCAGCGTCTGACCCTGCTGCGTGA
CAAAATGTCTACCGACACCGCGGAAGACTGCCTGGAATACGTTGGTATCGAAGAAGAAAAAGGTAACG
GTATGAACTCTCTGCTGGGTACCTTCCTGAAAAACCTGCAGGGTGACGGTTTCGAACAGATCTTCCAGTT
CATGCTGGGTTCTTACGTTGGTTGGCAGGGTAAAGAAGACGTTCTGCGTGAACGTCTGGACCTGCTGGC
GGAAAAAGTTAAACGTCTGCCGAAACCGAAATTCGCGGGTGAATGGTCTGGTCACCGTATGTTCCTGCA
CGGTCAGCTGAAATCTTGGTCTTCTAACTTCTTCCGTCTGTTCAACGAAACCCGTGAACTGCTGGAATCT
ATCAAATCTGACATCCAGCACGCGACCATGCTGATCTCTTACGTTGAAGAAAAAGGTGGTTACCACCCG
CAGCTGCTGTCTCAGTACCGTAAACTGATGGAACAGCTGCCGGCGCTGCGTACCAAAGTTCTGGACCCG
GAAATCGAAATGACCCACATGTCTGAAGCGGTTCGTTCTTACATCATGATCCACAAATCTGTTGCGGGTT
TCCTGCCGGACCTGCTGGAATCTCTGGACCGTGACAAAGACCGTGAATTCCTGCTGTCTATCTTCCCGCG
TATCCCGAAAATCGACAAAAAAACCAAAGAAATCGTTGCGTGGGAACTGCCGGGTGAACCGGAAGAAG
GTTACCTGTTCACCGCGAACAACCTGTTCCGTAACTTCCTGGAAAACCCGAAACACGTTCCGCGTTTCAT
GGCGGAACGTATCCCGGAAGACTGGACCCGTCTGCGTTCTGCGCCGGTTTGGTTCGACGGTATGGTTAA
ACAGTGGCAGAAAGTTGTTAACCAGCTGGTTGAATCTCCGGGTGCGCTGTACCAGTTCAACGAATCTTTC
CTGCGTCAGCGTCTGCAGGCGATGCTGACCGTTTACAAACGTGACCTGCAGACCGAAAAATTCCTGAAA
CTGCTGGCGGACGTTTGCCGTCCGCTGGTTGACTTCTTCGGTCTGGGTGGTAACGACATCATCTTCAAAT
CTTGCCAGGACCCGCGTAAACAGTGGCAGACCGTTATCCCGCTGTCTGTTCCGGCGGACGTTTACACCGC
GTGCGAAGGTCTGGCGATCCGTCTGCGTGAAACCCTGGGTTTCGAATGGAAAAACCTGAAAGGTCACGA
ACGTGAAGACTTCCTGCGTCTGCACCAGCTGCTGGGTAACCTGCTGTTCTGGATCCGTGACGCGAAACTG
GTTGTTAAACTGGAAGACTGGATGAACAACCCGTGCGTTCAGGAATACGTTGAAGCGCGTAAAGCGATC
GACCTGCCGCTGGAAATCTTCGGTTTCGAAGTTCCGATCTTCCTGAACGGTTACCTGTTCTCTGAACTGC
GTCAGCTGGAACTGCTGCTGCGTCGTAAATCTGTTATGACCTCTTACTCTGTTAAAACCACCGGTTCTCC
GAACCGTCTGTTCCAGCTGGTTTACCTGCCGCTGAACCCGTCTGACCCGGAAAAAAAAAACTCTAACAA
CTTCCAGGAACGTCTGGACACCCCGACCGGTCTGTCTCGTCGTTTCCTGGACCTGACCCTGGACGCGTTC
GCGGGTAAACTGCTGACCGACCCGGTTACCCAGGAACTGAAAACCATGGCGGGTTTCTACGACCACCTG
TTCGGTTTCAAACTGCCGTGCAAACTGGCGGCGATGTCTAACCACCCGGGTTCTTCTTCTAAAATGGTTG
TTCTGGCGAAACCGAAAAAAGGTGTTGCGTCTAACATCGGTTTCGAACCGATCCCGGACCCGGCGCACC
CGGTTTTCCGTGTTCGTTCTTCTTGGCCGGAACTGAAATACCTGGAAGGTCTGCTGTACCTGCCGGAAGA
CACCCCGCTGACCATCGAACTGGCGGAAACCTCTGTTTCTTGCCAGTCTGTTTCTTCTGTTGCGTTCGACC
TGAAAAACCTGACCACCATCCTGGGTCGTGTTGGTGAATTCCGTGTTACCGCGGACCAGCCGTTCAAACT
GACCCCGATCATCCCGGAAAAAGAAGAATCTTTCATCGGTAAAACCTACCTGGGTCTGGACGCGGGTGA
ACGTTCTGGTGTTGGTTTCGCGATCGTTACCGTTGACGGTGACGGTTACGAAGTTCAGCGTCTGGGTGTT
CACGAAGACACCCAGCTGATGGCGCTGCAGCAGGTTGCGTCTAAATCTCTGAAAGAACCGGTTTTCCAG
CCGCTGCGTAAAGGTACCTTCCGTCAGCAGGAACGTATCCGTAAATCTCTGCGTGGTTGCTACTGGAACT
TCTACCACGCGCTGATGATCAAATACCGTGCGAAAGTTGTTCACGAAGAATCTGTTGGTTCTTCTGGTCT
GGTTGGTCAGTGGCTGCGTGCGTTCCAGAAAGACCTGAAAAAAGCGGACGTTCTGCCGAAAAAAGGTG
GTAAAAACGGTGTTGACAAAAAAAAACGTGAATCTTCTGCGCAGGACACCCTGTGGGGTGGTGCGTTCT
CTAAAAAAGAAGAACAGCAGATCGCGTTCGAAGTTCAGGCGGCGGGTTCTTCTCAGTTCTGCCTGAAAT
GCGGTTGGTGGTTCCAGCTGGGTATGCGTGAAGTTAACCGTGTTCAGGAATCTGGTGTTGTTCTGGACTG
GAACCGTTCTATCGTTACCTTCCTGATCGAATCTTCTGGTGAAAAAGTTTACGGTTTCTCTCCGCAGCAG
CTGGAAAAAGGTTTCCGTCCGGACATCGAAACCTTCAAAAAAATGGTTCGTGACTTCATGCGTCCGCCG
ATGTTCGACCGTAAAGGTCGTCCGGCGGCGGCGTACGAACGTTTCGTTCTGGGTCGTCGTCACCGTCGTT
ACCGTTTCGACAAAGTTTTCGAAGAACGTTTCGGTCGTTCTGCGCTGTTCATCTGCCCGCGTGTTGGTTGC
GGTAACTTCGACCACTCTTCTGAACAGTCTGCGGTTGTTCTGGCGCTGATCGGTTACATCGCGGACAAAG
AAGGTATGTCTGGTAAAAAACTGGTTTACGTTCGTCTGGCGGAACTGATGGCGGAATGGAAACTGAAAA
AACTGGAACGTTCTCGTGTTGAAGAACAGTCTTCTGCGCAGTAAGAAATCATCCTTAGCGAAAGCTAAG
GATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAA
GCAAAGAGGATTACA
SEQ ID NO: 78
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATGCGGAATCTAAACAGATGCAGTGCCGTAAATGCGGTGCGTCTATGAAATACGAAGTTATCGGT
CTGGGTAAAAAATCTTGCCGTTACATGTGCCCGGACTGCGGTAACCACACCTCTGCGCGTAAAATCCAG
AACAAAAAAAAACGTGACAAAAAATACGGTTCTGCGTCTAAAGCGCAGTCTCAGCGTATCGCGGTTGCG
GGTGCGCTGTACCCGGACAAAAAAGTTCAGACCATCAAAACCTACAAATACCCGGCGGACCTGAACGGT
GAAGTTCACGACTCTGGTGTTGCGGAAAAAATCGCGCAGGCGATCCAGGAAGACGAAATCGGTCTGCTG
GGTCCGTCTTCTGAATACGCGTGCTGGATCGCGTCTCAGAAACAGTCTGAACCGTACTCTGTTGTTGACT
TCTGGTTCGACGCGGTTTGCGCGGGTGGTGTTTTCGCGTACTCTGGTGCGCGTCTGCTGTCTACCGTTCTG
CAGCTGTCTGGTGAAGAATCTGTTCTGCGTGCGGCGCTGGCGTCTTCTCCGTTCGTTGACGACATCAACC
TGGCGCAGGCGGAAAAATTCCTGGCGGTTTCTCGTCGTACCGGTCAGGACAAACTGGGTAAACGTATCG
GTGAATGCTTCGCGGAAGGTCGTCTGGAAGCGCTGGGTATCAAAGACCGTATGCGTGAATTCGTTCAGG
CGATCGACGTTGCGCAGACCGCGGGTCAGCGTTTCGCGGCGAAACTGAAAATCTTCGGTATCTCTCAGA
TGCCGGAAGCGAAACAGTGGAACAACGACTCTGGTCTGACCGTTTGCATCCTGCCGGACTACTACGTTC
CGGAAGAAAACCGTGCGGACCAGCTGGTTGTTCTGCTGCGTCGTCTGCGTGAAATCGCGTACTGCATGG
GTATCGAAGACGAAGCGGGTTTCGAACACCTGGGTATCGACCCGGGTGCGCTGTCTAACTTCTCTAACG
GTAACCCGAAACGTGGTTTCCTGGGTCGTCTGCTGAACAACGACATCATCGCGCTGGCGAACAACATGT
CTGCGATGACCCCGTACTGGGAAGGTCGTAAAGGTGAACTGATCGAACGTCTGGCGTGGCTGAAACACC
GTGCGGAAGGTCTGTACCTGAAAGAACCGCACTTCGGTAACTCTTGGGCGGACCACCGTTCTCGTATCTT
CTCTCGTATCGCGGGTTGGCTGTCTGGTTGCGCGGGTAAACTGAAAATCGCGAAAGACCAGATCTCTGG
TGTTCGTACCGACCTGTTCCTGCTGAAACGTCTGCTGGACGCGGTTCCGCAGTCTGCGCCGTCTCCGGAC
TTCATCGCGTCTATCTCTGCGCTGGACCGTTTCCTGGAAGCGGCGGAATCTTCTCAGGACCCGGCGGAAC
AGGTTCGTGCGCTGTACGCGTTCCACCTGAACGCGCCGGCGGTTCGTTCTATCGCGAACAAAGCGGTTCA
GCGTTCTGACTCTCAGGAATGGCTGATCAAAGAACTGGACGCGGTTGACCACCTGGAATTCAACAAAGC
GTTCCCGTTCTTCTCTGACACCGGTAAAAAAAAAAAAAAAGGTGCGAACTCTAACGGTGCGCCGTCTGA
AGAAGAATACACCGAAACCGAATCTATCCAGCAGCCGGAAGACGCGGAACAGGAAGTTAACGGTCAGG
AAGGTAACGGTGCGTCTAAAAACCAGAAAAAATTCCAGCGTATCCCGCGTTTCTTCGGTGAAGGTTCTC
GTTCTGAATACCGTATCCTGACCGAAGCGCCGCAGTACTTCGACATGTTCTGCAACAACATGCGTGCGAT
CTTCATGCAGCTGGAATCTCAGCCGCGTAAAGCGCCGCGTGACTTCAAATGCTTCCTGCAGAACCGTCTG
CAGAAACTGTACAAACAGACCTTCCTGAACGCGCGTTCTAACAAATGCCGTGCGCTGCTGGAATCTGTT
CTGATCTCTTGGGGTGAATTCTACACCTACGGTGCGAACGAAAAAAAATTCCGTCTGCGTCACGAAGCG
TCTGAACGTTCTTCTGACCCGGACTACGTTGTTCAGCAGGCGCTGGAAATCGCGCGTCGTCTGTTCCTGT
TCGGTTTCGAATGGCGTGACTGCTCTGCGGGTGAACGTGTTGACCTGGTTGAAATCCACAAAAAAGCGA
TCTCTTTCCTGCTGGCGATCACCCAGGCGGAAGTTTCTGTTGGTTCTTACAACTGGCTGGGTAACTCTACC
GTTTCTCGTTACCTGTCTGTTGCGGGTACCGACACCCTGTACGGTACCCAGCTGGAAGAATTCCTGAACG
CGACCGTTCTGTCTCAGATGCGTGGTCTGGCGATCCGTCTGTCTTCTCAGGAACTGAAAGACGGTTTCGA
CGTTCAGCTGGAATCTTCTTGCCAGGACAACCTGCAGCACCTGCTGGTTTACCGTGCGTCTCGTGACCTG
GCGGCGTGCAAACGTGCGACCTGCCCGGCGGAACTGGACCCGAAAATCCTGGTTCTGCCGGTTGGTGCG
TTCATCGCGTCTGTTATGAAAATGATCGAACGTGGTGACGAACCGCTGGCGGGTGCGTACCTGCGTCAC
CGTCCGCACTCTTTCGGTTGGCAGATCCGTGTTCGTGGTGTTGCGGAAGTTGGTATGGACCAGGGTACCG
CGCTGGCGTTCCAGAAACCGACCGAATCTGAACCGTTCAAAATCAAACCGTTCTCTGCGCAGTACGGTC
CGGTTCTGTGGCTGAACTCTTCTTCTTACTCTCAGTCTCAGTACCTGGACGGTTTCCTGTCTCAGCCGAAA
AACTGGTCTATGCGTGTTCTGCCGCAGGCGGGTTCTGTTCGTGTTGAACAGCGTGTTGCGCTGATCTGGA
ACCTGCAGGCGGGTAAAATGCGTCTGGAACGTTCTGGTGCGCGTGCGTTCTTCATGCCGGTTCCGTTCTC
TTTCCGTCCGTCTGGTTCTGGTGACGAAGCGGTTCTGGCGCCGAACCGTTACCTGGGTCTGTTCCCGCAC
TCTGGTGGTATCGAATACGCGGTTGTTGACGTTCTGGACTCTGCGGGTTTCAAAATCCTGGAACGTGGTA
CCATCGCGGTTAACGGTTTCTCTCAGAAACGTGGTGAACGTCAGGAAGAAGCGCACCGTGAAAAACAGC
GTCGTGGTATCTCTGACATCGGTCGTAAAAAACCGGTTCAGGCGGAAGTTGACGCGGCGAACGAACTGC
ACCGTAAATACACCGACGTTGCGACCCGTCTGGGTTGCCGTATCGTTGTTCAGTGGGCGCCGCAGCCGA
AACCGGGTACCGCGCCGACCGCGCAGACCGTTTACGCGCGTGCGGTTCGTACCGAAGCGCCGCGTTCTG
GTAACCAGGAAGACCACGCGCGTATGAAATCTTCTTGGGGTTACACCTGGGGTACCTACTGGGAAAAAC
GTAAACCGGAAGACATCCTGGGTATCTCTACCCAGGTTTACTGGACCGGTGGTATCGGTGAATCTTGCCC
GGCGGTTGCGGTTGCGCTGCTGGGTCACATCCGTGCGACCTCTACCCAGACCGAATGGGAAAAAGAAGA
AGTTGTTTTCGGTCGTCTGAAAAAATTCTTCCCGTCTTAAGAAATCATCCTTAGCGAAAGCTAAGGATTT
TTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAA
GAGGATTACA
SEQ ID NO: 79
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATGAAAAACGTATCAACAAAATCCGTAAAAAACTGTCTGCGGACAACGCGACCAAACCGGTTTCT
CGTTCTGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCGACGACCTGAAAAAACGTCTGGAAAAA
CGTCGTAAAAAACCGGAAGTTATGCCGCAGGTTATCTCTAACAACGCGGCGAACAACCTGCGTATGCTG
CTGGACGACTACACCAAAATGAAAGAAGCGATCCTGCAGGTTTACTGGCAGGAATTCAAAGACGACCA
CGTTGGTCTGATGTGCAAATTCGCGCAGCCGGCGTCTAAAAAAATCGACCAGAACAAACTGAAACCGGA
AATGGACGAAAAAGGTAACCTGACCACCGCGGGTTTCGCGTGCTCTCAGTGCGGTCAGCCGCTGTTCGT
TTACAAACTGGAACAGGTTTCTGAAAAAGGTAAAGCGTACACCAACTACTTCGGTCGTTGCAACGTTGC
GGAACACGAAAAACTGATCCTGCTGGCGCAGCTGAAACCGGAAAAAGACTCTGACGAAGCGGTTACCT
ACTCTCTGGGTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCAAAGAATCTACCCA
CCCGGTTAAACCGCTGGCGCAGATCGCGGGTAACCGTTACGCGTCTGGTCCGGTTGGTAAAGCGCTGTC
TGACGCGTGCATGGGTACCATCGCGTCTTTCCTGTCTAAATACCAGGACATCATCATCGAACACCAGAA
AGTTGTTAAAGGTAACCAGAAACGTCTGGAATCTCTGCGTGAACTGGCGGGTAAAGAAAACCTGGAATA
CCCGTCTGTTACCCTGCCGCCGCAGCCGCACACCAAAGAAGGTGTTGACGCGTACAACGAAGTTATCGC
GCGTGTTCGTATGTGGGTTAACCTGAACCTGTGGCAGAAACTGAAACTGTCTCGTGACGACGCGAAACC
GCTGCTGCGTCTGAAAGGTTTCCCGTCTTTCCCGGTTGTTGAACGTCGTGAAAACGAAGTTGACTGGTGG
AACACCATCAACGAAGTTAAAAAACTGATCGACGCGAAACGTGACATGGGTCGTGTTTTCTGGTCTGGT
GTTACCGCGGAAAAACGTAACACCATCCTGGAAGGTTACAACTACCTGCCGAACGAAAACGACCACAA
AAAACGTGAAGGTTCTCTGGAAAACCCGAAAAAACCGGCGAAACGTCAGTTCGGTGACCTGCTGCTGTA
CCTGGAAAAAAAATACGCGGGTGACTGGGGTAAAGTTTTCGACGAAGCGTGGGAACGTATCGACAAAA
AAATCGCGGGTCTGACCTCTCACATCGAACGTGAAGAAGCGCGTAACGCGGAAGACGCGCAGTCTAAA
GCGGTTCTGACCGACTGGCTGCGTGCGAAAGCGTCTTTCGTTCTGGAACGTCTGAAAGAAATGGACGAA
AAAGAATTCTACGCGTGCGAAATCCAGCTGCAGAAATGGTACGGTGACCTGCGTGGTAACCCGTTCGCG
GTTGAAGCGGAAAACCGTGTTGTTGACATCTCTGGTTTCTCTATCGGTTCTGACGGTCACTCTATCCAGT
ACCGTAACCTGCTGGCGTGGAAATACCTGGAAAACGGTAAACGTGAATTCTACCTGCTGATGAACTACG
GTAAAAAAGGTCGTATCCGTTTCACCGACGGTACCGACATCAAAAAATCTGGTAAATGGCAGGGTCTGC
TGTACGGTGGTGGTAAAGCGAAAGTTATCGACCTGACCTTCGACCCGGACGACGAACAGCTGATCATCC
TGCCGCTGGCGTTCGGTACCCGTCAGGGTCGTGAATTCATCTGGAACGACCTGCTGTCTCTGGAAACCGG
TCTGATCAAACTGGCGAACGGTCGTGTTATCGAAAAAACCATCTACAACAAAAAAATCGGTCGTGACGA
ACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTGTTGACCCGTCTAACATCAAACCGGTT
AACCTGATCGGTGTTGACCGTGGTGAAAACATCCCGGCGGTTATCGCGCTGACCGACCCGGAAGGTTGC
CCGCTGCCGGAATTCAAAGACTCTTCTGGTGGTCCGACCGACATCCTGCGTATCGGTGAAGGTTACAAA
GAAAAACAGCGTGCGATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGGTGGTTACTCTCGTAA
ATTCGCGTCTAAATCTCGTAACCTGGCGGACGACATGGTTCGTAACTCTGCGCGTGACCTGTTCTACCAC
GCGGTTACCCACGACGCGGTTCTGGTTTTCGAAAACCTGTCTCGTGGTTTCGGTCGTCAGGGTAAACGTA
CCTTCATGACCGAACGTCAGTACACCAAAATGGAAGACTGGCTGACCGCGAAACTGGCGTACGAAGGTC
TGACCTCTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCTCTAAAACCTGCTCTAACTGCGGTTT
CACCATCACCACCGCGGACTACGACGGTATGCTGGTTCGTCTGAAAAAAACCTCTGACGGTTGGGCGAC
CACCCTGAACAACAAAGAACTGAAAGCGGAAGGTCAGATCACCTACTACAACCGTTACAAACGTCAGA
CCGTTGAAAAAGAACTGTCTGCGGAACTGGACCGTCTGTCTGAAGAATCTGGTAACAACGACATCTCTA
AATGGACCAAAGGTCGTCGTGACGAAGCGCTGTTCCTGCTGAAAAAACGTTTCTCTCACCGTCCGGTTCA
GGAACAGTTCGTTTGCCTGGACTGCGGTCACGAAGTTCACGCGGACGAACAGGCGGCGCTGAACATCGC
GCGTTCTTGGCTGTTCCTGAACTCTAACTCTACCGAATTCAAATCTTACAAATCTGGTAAACAGCCGTTC
GTTGGTGCGTGGCAGGCGTTCTACAAACGTCGTCTGAAAGAAGTTTGGAAACCGAACGCGTAAGAAATC
ATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGACCCTCAGGTTAAATATTCACTCA
GGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 80
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTGCCGTCACTGCGTC
TTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCTGTAACAAAGCGGGA
CCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAGTCCACATTGATTATT
TGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTT
TTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGCTAGCAGTAATACGACTCACTATAGGG
GTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTTTAAGAGGAGGATATACCATGCACCATCATCAT
CACCATAAACGTATCAACAAAATCCGTCGTCGTCTGGTTAAAGACTCTAACACCAAAAAAGCGGGTAAA
ACCGGTCCGATGAAAACCCTGCTGGTTCGTGTTATGACCCCGGACCTGCGTGAACGTCTGGAAAACCTG
CGTAAAAAACCGGAAAACATCCCGCAGCCGATCTCTAACACCTCTCGTGCGAACCTGAACAAACTGCTG
ACCGACTACACCGAAATGAAAAAAGCGATCCTGCACGTTTACTGGGAAGAATTCCAGAAAGACCCGGTT
GGTCTGATGTCTCGTGTTGCGCAGCCGGCGCCGAAAAACATCGACCAGCGTAAACTGATCCCGGTTAAA
GACGGTAACGAACGTCTGACCTCTTCTGGTTTCGCGTGCTCTCAGTGCTGCCAGCCGCTGTACGTTTACA
AACTGGAACAGGTTAACGACAAAGGTAAACCGCACACCAACTACTTCGGTCGTTGCAACGTTTCTGAAC
ACGAACGTCTGATCCTGCTGTCTCCGCACAAACCGGAAGCGAACGACGAACTGGTTACCTACTCTCTGG
GTAAATTCGGTCAGCGTGCGCTGGACTTCTACTCTATCCACGTTACCCGTGAATCTAACCACCCGGTTAA
ACCGCTGGAACAGATCGGTGGTAACTCTTGCGCGTCTGGTCCGGTTGGTAAAGCGCTGTCTGACGCGTG
CATGGGTGCGGTTGCGTCTTTCCTGACCAAATACCAGGACATCATCCTGGAACACCAGAAAGTTATCAA
AAAAAACGAAAAACGTCTGGCGAACCTGAAAGACATCGCGTCTGCGAACGGTCTGGCGTTCCCGAAAA
TCACCCTGCCGCCGCAGCCGCACACCAAAGAAGGTATCGAAGCGTACAACAACGTTGTTGCGCAGATCG
TTATCTGGGTTAACCTGAACCTGTGGCAGAAACTGAAAATCGGTCGTGACGAAGCGAAACCGCTGCAGC
GTCTGAAAGGTTTCCCGTCTTTCCCGCTGGTTGAACGTCAGGCGAACGAAGTTGACTGGTGGGACATGGT
TTGCAACGTTAAAAAACTGATCAACGAAAAAAAAGAAGACGGTAAAGTTTTCTGGCAGAACCTGGCGG
GTTACAAACGTCAGGAAGCGCTGCTGCCGTACCTGTCTTCTGAAGAAGACCGTAAAAAAGGTAAAAAAT
TCGCGCGTTACCAGTTCGGTGACCTGCTGCTGCACCTGGAAAAAAAACACGGTGAAGACTGGGGTAAAG
TTTACGACGAAGCGTGGGAACGTATCGACAAAAAAGTTGAAGGTCTGTCTAAACACATCAAACTGGAAG
AAGAACGTCGTTCTGAAGACGCGCAGTCTAAAGCGGCGCTGACCGACTGGCTGCGTGCGAAAGCGTCTT
TCGTTATCGAAGGTCTGAAAGAAGCGGACAAAGACGAATTCTGCCGTTGCGAACTGAAACTGCAGAAAT
GGTACGGTGACCTGCGTGGTAAACCGTTCGCGATCGAAGCGGAAAACTCTATCCTGGACATCTCTGGTTT
CTCTAAACAGTACAACTGCGCGTTCATCTGGCAGAAAGACGGTGTTAAAAAACTGAACCTGTACCTGAT
CATCAACTACTTCAAAGGTGGTAAACTGCGTTTCAAAAAAATCAAACCGGAAGCGTTCGAAGCGAACCG
TTTCTACACCGTTATCAACAAAAAATCTGGTGAAATCGTTCCGATGGAAGTTAACTTCAACTTCGACGAC
CCGAACCTGATCATCCTGCCGCTGGCGTTCGGTAAACGTCAGGGTCGTGAATTCATCTGGAACGACCTGC
TGTCTCTGGAAACCGGTTCTCTGAAACTGGCGAACGGTCGTGTTATCGAAAAAACCCTGTACAACCGTC
GTACCCGTCAGGACGAACCGGCGCTGTTCGTTGCGCTGACCTTCGAACGTCGTGAAGTTCTGGACTCTTC
TAACATCAAACCGATGAACCTGATCGGTATCGACCGTGGTGAAAACATCCCGGCGGTTATCGCGCTGAC
CGACCCGGAAGGTTGCCCGCTGTCTCGTTTCAAAGACTCTCTGGGTAACCCGACCCACATCCTGCGTATC
GGTGAATCTTACAAAGAAAAACAGCGTACCATCCAGGCGGCGAAAGAAGTTGAACAGCGTCGTGCGGG
TGGTTACTCTCGTAAATACGCGTCTAAAGCGAAAAACCTGGCGGACGACATGGTTCGTAACACCGCGCG
TGACCTGCTGTACTACGCGGTTACCCAGGACGCGATGCTGATCTTCGAAAACCTGTCTCGTGGTTTCGGT
CGTCAGGGTAAACGTACCTTCATGGCGGAACGTCAGTACACCCGTATGGAAGACTGGCTGACCGCGAAA
CTGGCGTACGAAGGTCTGCCGTCTAAAACCTACCTGTCTAAAACCCTGGCGCAGTACACCTCTAAAACCT
GCTCTAACTGCGGTTTCACCATCACCTCTGCGGACTACGACCGTGTTCTGGAAAAACTGAAAAAAACCG
CGACCGGTTGGATGACCACCATCAACGGTAAAGAACTGAAAGTTGAAGGTCAGATCACCTACTACAACC
GTTACAAACGTCAGAACGTTGTTAAAGACCTGTCTGTTGAACTGGACCGTCTGTCTGAAGAATCTGTTAA
CAACGACATCTCTTCTTGGACCAAAGGTCGTTCTGGTGAAGCGCTGTCTCTGCTGAAAAAACGTTTCTCT
CACCGTCCGGTTCAGGAAAAATTCGTTTGCCTGAACTGCGGTTTCGAAACCCACGCGGACGAACAGGCG
GCGCTGAACATCGCGCGTTCTTGGCTGTTCCTGCGTTCTCAGGAATACAAAAAATACCAGACCAACAAA
ACCACCGGTAACACCGACAAACGTGCGTTCGTTGAAACCTGGCAGTCTTTCTACCGTAAAAAACTGAAA
GAAGTTTGGAAACCGGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATGTAGGGAGAC
CCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTACA
SEQ ID NO: 81
tgccgtcactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaacaaagcgggaccaaagccat
gacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacattgattatttgcacggcgtcacactttgctatgccatagcatttt
tatccataagattagcggatcctacctgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatctc
gtgtgagataggcggagatacgaactttaagAAGGAGatatacc
SEQ ID NO: 82
TGCCGTCACTGCGTCTTTTACTGGCTCTTCTCGCTAACCAAACCGGTAACCCCGCTTATTAAAAGCATTCT
GTAACAAAGCGGGACCAAAGCCATGACAAAAACGCGTAACAAAAGTGTCTATAATCACGGCAGAAAAG
TCCACATTGATTATTTGCACGGCGTCACACTTTGCTATGCCATAGCATTTTTATCCATAAGATTAGCGGAT
CCTACCTGACGCTTTTTATCGCAACTCTCTACTGTTTCTCCATACCCGTTTTTTTGGGTAGCGGATCCTAC
CTGAC
SEQ ID NO: 83
AATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTCAAACAGGTTTCTAGAGCACAGCT
AACACCACGTCGTCCCTATCTGCTGCCCTAGGTCTATGAGTGGTTGCTGGATAACTTTACGGGCATGCAT
AAGGCTCGTAATATATATTCAGGGAGACCACAACGGTTTCCCTCTACAAATAATTTTGTTTAACTTTTAC
TAGAGCTAGCAGTAATACGACTCACTATAGGGGTCTCATCTCGTGTGAGATAGGCGGAGATACGAACTT
TAAGAGGAGGATATACCA
SEQ ID NO: 84
GTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC
SEQ ID NO: 85
actacattttttaagacctaattttgagt
SEQ ID NO: 86
ctcaaaactcattcgaatctctactctttgtagat
SEQ ID NO: 87
CTCTAGCAGGCCTGGCAAATTTCTACTGTTGTAGAT
SEQ ID NO: 88
CCGTCTAAAACTCATTCAGAATTTCTACTAGTGTAGAT
SEQ ID NO: 89
GTCTAGGTACTCTCTTTAATTTCTACTATTGT
SEQ ID NO: 90
gttaagttatatagaataatttctactgttgtaga
SEQ ID NO: 91
gtttaaaaccactttaaaatttctactattgta
SEQ ID NO: 92
GTTTGAGAATGATGTAAAAATGTATGGTACACAGAAATGTTTTAATACCATATTTTTACATCACTCTCAA
ACATACATCTCTTGTTACTGTTTATCGTATCCAGATTAAATTTCACGTTTTT
SEQ ID NO: 93
CTCTACAACTGATAAAGAATTTCTACTTTTGTAGAT
SEQ ID NO: 94
GTCTGGCCCCAAATTTTAATTTCTACTGTTGTAGAT
SEQ ID NO: 95
GTCAAAAGACCTTTTTAATTTCTACTCTTGTAGAT
SEQ ID NO: 96
GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCCCGTTGAGCT
TCTACGGAAGTGGCAC
SEQ ID NO: 97
CGAGGTTCTGTCTTTTGGTCAGGACAACCGTCTAGCTATAAGTGCTGCAGGGGTGTGAGAAACTCCTATT
GCTGGACGATGTCTCTTTTAACGAGGCATTAGCAC
SEQ ID NO: 98
GAACGAGGGACGTTTTGTCTCCAATGATTTTGCTATGACGACCTCGAACTGTGCCTTCAAGTCTGAGGCG
AAAAAGAAATGGAAAAAAGTGTCTCATCGCTCTACCTCGTAGTTAGAGG
SEQ ID NO: 99
AATTACTGATGTTGTGATGAAGG
SEQ ID NO: 100
TATACCATAAGGATTTAAAGACT
SEQ ID NO: 101
GTCTTTACTCTCACCTTTCCACCTG
SEQ ID NO: 102
ATTTGAAGGTATCTCCGATAAGTAAAACGCATCAAAG
SEQ ID NO: 103
GTTTGAAGATATCTCCGATAAATAAGAAGCATCAAAG
SEQ ID NO: 104
TTGTTTTAATACCATATTTTTACATCACTCTCAAAC
SEQ ID NO: 105
AAAGAACGCTCGCTCAGTGTTCTGACCTTTCGAGCGCCTGTTCAGGGCGAAAACCCTGGGAGGCGCTCG
AATCATAGGTGGGACAAGGGATTCGCGGCGAAAA
SEQ ID NO: 106
GTTTGAGAATGATGTAAAAATGTATGGTACACAGAAATGTTTTAATACCATATTTTTACATCACTCTCAA
ACATACATCTCTTGTTACTGTTTATCGTATCCAGATTAAATTTCACGTTTTT
SEQ ID NO: 107
GTCTAGAGGACAGAATTTTTCAACGGGTGTGCCAATGGCCACTTTCCAGGTGGCAAAGCCCGTTGAGCT
TCTACGGAAGTGGCAC
SEQ ID NO: 108
MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVC
ISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLIL
WLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSNDIPTSIIYRIVDDNLPK
FLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITK
FNTIIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVT
TMQSFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEY
ITQQIAPKNLDNPSKKEQELIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFD
EIAQNKDNLAQISIKYQNQGKKDLLQASAEDDVKAIKDLLDQTNNLLHKLKIFHISQSEDKANILDKDEH
FYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYL
GVMNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKN
GSPQKGYEKFEFNIEDCRKFIDFYKQSISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENIS
ESYIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK
ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND
VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKEM
KEGYLSQVVHEIAKLVIEYNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDKTGG
VLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICYNLD
KGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGEC
IKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDADANGAY
HIGLKGLMLLGRIKNNQEGKKLNLVIKNEEYFEFVQNRNN
SEQ ID NO: 109
MSIYQEFVNKYSLSKTLRFELIPQGKTLENIKARGLILDDEKRAKDYKKAKQIIDKYHQFFIEEILSSVC
ISEDLLQNYSDVYFKLKKSDDDNLQKDFKSAKDTIKKQISEYIKDSEKFKNLFNQNLIDAKKGQESDLIL
WLKQSKDNGIELFKANSDITDIDEALEIIKSFKGWTTYFKGFHENRKNVYSSDDIPTSIIYRIVDDNLPK
FLENKAKYESLKDKAPEAINYEQIKKDLAEELTFDIDYKTSEVNQRVFSLDEVFEIANFNNYLNQSGITK
FNTIIGGKFVNGENTKRKGINEYINLYSQQINDKTLKKYKMSVLFKQILSDTESKSFVIDKLEDDSDVVT
TMQSFYEQIAAFKTVEEKSIKETLSLLFDDLKAQKLDLSKIYFKNDKSLTDLSQQVFDDYSVIGTAVLEY
ITQQVAPKNLDNPSKKEQDLIAKKTEKAKYLSLETIKLALEEFNKHRDIDKQCRFEEILANFAAIPMIFD
EIAQNKDNLAQISLKYQNQGKKDLLQASAEEDVKAIKDLLDQTNNLLHRLKIFHISQSEDKANILDKDEH
FYLVFEECYFELANIVPLYNKIRNYITQKPYSDEKFKLNFENSTLANGWDKNKEPDNTAILFIKDDKYYL
GVMNKKNNKIFDDKAIKENKGEGYKKIVYKLLPGANKMLPKVFFSAKSIKFYNPSEDILRIRNHSTHTKN
GNPQKGYEKFEFNIEDCRKFIDFYKESISKHPEWKDFGFRFSDTQRYNSIDEFYREVENQGYKLTFENIS
ESYIDSVVNQGKLYLFQIYNKDFSAYSKGRPNLHTLYWKALFDERNLQDVVYKLNGEAELFYRKQSIPKK
ITHPAKEAIANKNKDNPKKESVFEYDLIKDKRFTEDKFFFHCPITINFKSSGANKFNDEINLLLKEKAND
VHILSIDRGERHLAYYTLVDGKGNIIKQDTFNIIGNDRMKTNYHDKLAAIEKDRDSARKDWKKINNIKEM
KEGYLSQVVHEIAKLVIEHNAIVVFEDLNFGFKRGRFKVEKQVYQKLEKMLIEKLNYLVFKDNEFDKTGG
VLRAYQLTAPFETFKKMGKQTGIIYYVPAGFTSKICPVTGFVNQLYPKYESVSKSQEFFSKFDKICYNLD
KGYFEFSFDYKNFGDKAAKGKWTIASFGSRLINFRNSDKNHNWDTREVYPTKELEKLLKDYSIEYGHGEC
IKAAICGESDKKFFAKLTSVLNTILQMRNSKTGTELDYLISPVADVNGNFFDSRQAPKNMPQDADANGAY
HIGLKGLMLLDRIKNNQEGKKLNLVIKNEEYFEFVQNRNN
SEQ ID NO: 110
MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRY
TRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDST
DKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSR
RLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLA
AKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDG
GASQEEFYKFIKPILEKNIDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREK
IEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSL
LYEYFTVYNELTKVKYVTEGNIRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED
RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTG
WGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSP
AIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENT
QLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEV
VKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDEN
DKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDV
RKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSNIPQ
VNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKEL
LGITIMERSSFEKNPIDFLEAKGYKEVKKDLHKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLY
LASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENHH
LFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD
SEQ ID NO: 111
PKKKRKV
SEQ ID NO: 112
KRPAATKKAGQAKKKK
SEQ ID NO: 113
PAAKRVKLD
SEQ ID NO: 114
RQRRNELKRSP
SEQ ID NO: 115
NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY
SEQ ID NO: 116
RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV
SEQ ID NO: 117
VSRKRPRP
SEQ ID NO: 118
PPKKARED
SEQ ID NO: 119
PQPKKKPL
SEQ ID NO: 120
SALIKKKKKMAP
SEQ ID NO: 121
DRLRR
SEQ ID NO: 122
PKQKKRK
SEQ ID NO: 123
RKLKKKIKKL
SEQ ID NO: 124
REKKKFLKRR
SEQ ID NO: 125
KRKGDEVDGVDEVAKKKSKK
SEQ ID NO: 126
RKCLQAGMNLEARKTKK
SEQ ID NO: 127
ATGGGTAAGATGTATTATCTGGGTTTGGATATAGGCACTAACTCTGTGGGATATGCAGTAACTGATCCCT
CGTATCACTTGTTAAAGTTCAAAGGCGAACCCATGTGGGGAGCACATGTATTTGCTGCGGGTAATCAGA
GTGCCGAAAGGCGATCTTTCAGAACATCCAGGAGGCGATTAGATAGGAGACAGCAAAGAGTAAAGCTT
GTGCAAGAGATCTTTGCTCCTGTCATTTCACCTATAGACCCTCGTTTTTTTATAAGATTGCACGAATCGGC
TCTATGGAGAGACGATGTTGCCGAAACAGATAAACATATCTTTTTCAATGATCCCACTTATACAGACAA
GGAATACTACTCCGACTACCCGACAATTCATCATTTGATCGTCGATCTTATGGAGAGCTCTGAAAAGCAT
GACCCCCGACTTGTCTATTTGGCTGTAGCTTGGTTAGTTGCTCATAGAGGTCATTTCTTGAATGAAGTAG
ATAAAGACAATATAGGTGATGTACTTTCTTTTGATGCTTTCTACCCGGAATTTTTGGCCTTTTTGTCAGAC
AATGGCGTCAGTCCCTGGGTCTGTGAGTCGAAGGCCCTTCAAGCTACTCTGCTGTCTAGGAATAGCGTCA
ACGACAAATATAAAGCATTAAAATCGCTGATATTCGGATCGCAAAAACCGGAAGATAACTTTGACGCTA
ACATCTCTGAAGATGGTTTAATCCAATTGCTGGCGGGTAAGAAAGTTAAAGTAAACAAACTATTCCCAC
AAGAGTCCAACGATGCTAGCTTTACGTTGAATGATAAAGAAGACGCTATTGAAGAAATTCTAGGTACTT
TAACGCCTGACGAGTGCGAATGGATCGCTCATATTCGCAGATTGTTCGATTGGGCCATCATGAAACACG
CGCTAAAGGATGGCAGGACGATATCTGAATCAAAAGTGAAGCTATACGAGCAGCATCATCATGACTTGA
CTCAGTTAAAGTACTTTGTGAAGACCTACCTAGCTAAAGAGTATGATGATATCTTCAGAAACGTAGACTC
CGAGACAACTAAAAATTATGTAGCTTATTCTTACCATGTGAAGGAAGTGAAAGGCACATTACCAAAAAA
TAAAGCAACGCAAGAAGAATTTTGTAAATACGTCCTTGGCAAAGTCAAAAACATTGAATGTTCCGAAGC
AGACAAGGTTGATTTTGATGAAATGATACAACGACTTACGGACAATTCTTTTATGCCAAAGCAAGTCTC
AGGTGAAAATAGAGTAATACCATACCAGTTGTACTACTATGAATTAAAGACAATTTTAAACAAAGCCGC
CTCATATCTACCTTTTTTGACACAATGCGGTAAAGATGCTATTTCTAACCAAGACAAATTACTGTCTATA
ATGACATTTCGCATACCATATTTCGTCGGCCCTTTAAGGAAAGATAATTCAGAACATGCCTGGTTGGAAC
GTAAAGCGGGTAAAATTTACCCGTGGAACTTTAATGATAAAGTAGATCTTGATAAATCGGAGGAAGCCT
TTATCCGTAGGATGACCAATACTTGCACGTATTACCCAGGAGAAGACGTGTTACCATTAGATTCACTTAT
CTATGAAAAGTTTATGATCTTGAATGAGATAAACAATATTAGGATTGACGGATACCCCATTTCTGTTGAT
GTGAAACAACAAGTATTTGGTTTATTTGAGAAGAAAAGGCGAGTAACAGTTAAGGATATTCAAAATCTA
CTATTATCTCTTGGAGCGTTGGATAAACACGGTAAGCTGACTGGTATTGACACGACAATACACTCTAATT
ATAACACTTATCATCATTTTAAATCTCTTATGGAGCGGGGAGTATTGACCAGAGATGATGTGGAAAGAA
TAGTGGAAAGAATGACATATTCTGACGATACTAAGAGGGTCAGACTGTGGTTAAATAATAATTATGGAA
CTCTAACAGCTGACGATGTTAAGCATATCTCAAGACTCAGAAAACACGATTTCGGCCGTTTGTCTAAAAT
GTTTTTGACAGGATTGAAAGGTGTTCATAAGGAGACAGGCGAGAGAGCAAGTATACTGGATTTTATGTG
GAATACTAACGACAATTTAATGCAACTACTGTCCGAATGTTACACATTCTCGGATGAGATCACCAAATTA
CAAGAGGCCTACTACGCAAAAGCTCAATTATCGCTAAATGACTTCTTGGACTCTATGTATATATCAAACG
CCGTTAAGAGACCTATTTATCGGACCTTAGCGGTAGTAAATGATATTAGAAAGGCATGCGGGACGGCAC
CTAAAAGAATTTTCATCGAGATGGCGCGAGATGGAGAGTCTAAGAAGAAAAGATCTGTGACTCGTAGA
GAGCAAATTAAAAATCTCTATAGATCAATTCGTAAAGACTTTCAACAAGAAGTTGATTTTCTGGAAAAG
ATATTGGAAAATAAGAGTGACGGGCAGCTTCAGTCTGACGCTTTATATTTGTATTTTGCTCAATTAGGCA
GAGACATGTACACAGGTGATCCAATCAAATTAGAACATATTAAAGACCAATCTTTTTACAACATTGATC
ATATTTATCCTCAATCGATGGTGAAAGATGACAGTTTGGATAACAAGGTACTAGTCCAAAGCGAAATCA
ATGGCGAAAAGAGTTCGCGCTATCCATTAGACGCAGCCATTAGAAACAAAATGAAGCCGTTGTGGGATG
CCTACTATAATCATGGATTAATTTCTCTTAAGAAATACCAGCGTTTGACGAGATCTACTCCATTTACGGA
CGACGAGAAGTGGGATTTTATCAATCGTCAGCTAGTTGAAACTAGGCAATCTACTAAAGCTTTAGCAAT
ATTGTTAAAGCGTAAGTTTCCAGATACTGAAATAGTTTACTCAAAGGCTGGACTATCCAGCGATTTTAGA
CATGAATTCGGCCTGGTTAAGAGTAGGAATATTAATGATCTACACCATGCTAAAGATGCCTTTCTCGCAA
TAGTTACTGGGAACGTTTATCATGAAAGATTTAATAGAAGATGGTTTATGGTTAACCAGCCATACTCTGT
GAAAACTAAGACATTGTTTACCCATTCAATTAAGAATGGCAACTTTGTCGCTTGGAATGGAGAAGAAGA
TCTTGGACGTATCGTAAAGATGTTGAAACAAAACAAGAACACAATCCACTTCACCAGGTTTTCCTTTGAT
AGGAAGGAGGGATTGTTCGATATTCAACCTCTCAAAGCTTCTACCGGATTGGTTCCACGAAAAGCAGGG
TTGGATGTTGTTAAATATGGAGGATACGATAAAAGCACTGCCGCGTATTATTTATTAGTACGTTTTACAC
TCGAGGATAAGAAGACTCAACACAAATTGATGATGATTCCTGTTGAAGGTCTCTACAAAGCACGTATTG
ACCATGATAAAGAGTTTTTAACAGATTATGCTCAGACCACGATCAGCGAAATTCTTCAAAAGGACAAGC
AGAAAGTGATCAACATCATGTTCCCTATGGGCACGAGACATATCAAACTGAATTCGATGATTTCTATTGA
TGGATTCTATCTTTCTATTGGTGGGAAGAGTAGCAAAGGTAAGTCAGTACTATGTCATGCTATGGTGCCA
TTAATCGTCCCACACAAGATAGAATGTTATATCAAGGCTATGGAATCGTTTGCAAGAAAATTCAAAGAA
AATAATAAATTGAGGATCGTTGAAAAGTTTGATAAAATAACTGTTGAAGATAACTTGAACTTATACGAG
CTTTTTCTACAAAAGTTGCAACATAACCCATATAATAAATTTTTCTCTACACAATTTGATGTGTTGACGA
ACGGTAGAAGTACATTCACCAAATTGTCTCCAGAGGAGCAAGTCCAGACTTTACTTAATATACTGAGTA
TATTTAAAACTTGTCGTTCTTCTGGGTGTGATTTAAAATCAATAAATGGTTCCGCTCAAGCGGCTAGAAT
TATGATATCCGCTGATTTAACTGGCTTATCAAAAAAGTATTCAGATATTAGATTAGTTGAGCAAAGCGCA
TCAGGTCTATTTGTTTCAAAATCTCAAAATCTCTTGGAATACTTGCCAAAAAAGAAAAGGAAAGTTTAG
SEQ ID NO: 128
ATGAGTAGTTTAACAAAGTTTACCAATAAATATAGTAAGCAACTAACTATAAAGAACGAATTGATACCG
GTCGGTAAGACTTTGGAAAACATAAAAGAAAATGGGTTGATTGATGGAGACGAGCAATTGAATGAGAA
TTATCAAAAAGCAAAGATAATAGTAGATGATTTTTTGAGAGACTTTATTAATAAAGCTCTAAATAACACT
CAAATTGGTAACTGGAGAGAGCTAGCCGACGCCTTGAACAAGGAAGATGAGGATAATATTGAGAAATT
ACAAGATAAGATTAGAGGGATTATCGTGTCTAAGTTTGAGACTTTTGATCTGTTCAGTTCGTATTCGATT
AAAAAGGACGAGAAAATCATCGATGATGATAACGATGTGGAAGAAGAGGAGCTAGACCTTGGGAAGAA
GACATCTAGCTTCAAATACATATTCAAGAAAAATTTGTTCAAACTTGTCCTTCCTTCATATTTAAAAACA
ACAAATCAAGATAAGTTAAAAATCATTTCTTCCTTCGATAATTTTAGTACTTATTTTCGTGGTTTTTTCGA
AAACAGGAAAAATATATTCACTAAAAAGCCTATATCTACCTCTATAGCTTATAGAATTGTTCACGATAAT
TTCCCAAAATTTCTAGATAATATCAGGTGTTTTAATGTTTGGCAAACCGAGTGTCCTCAGTTAATAGTCA
AGGCCGACAACTACCTTAAAAGCAAGAATGTGATTGCAAAAGATAAGTCTTTGGCTAACTATTTTACAG
TCGGTGCCTATGATTATTTTCTGAGTCAAAATGGTATCGATTTCTATAACAACATTATTGGCGGCTTACC
AGCTTTTGCCGGGCATGAGAAGATTCAGGGTTTGAACGAATTTATCAATCAAGAATGTCAAAAGGATTC
TGAATTAAAGTCTAAGCTCAAGAATAGGCACGCTTTCAAAATGGCAGTCTTATTCAAACAAATCCTTTCA
GACAGAGAAAAGTCATTTGTGATTGACGAGTTCGAATCAGACGCTCAGGTAATTGATGCTGTTAAAAAT
TTTTACGCGGAACAATGCAAAGATAATAACGTCATATTTAATTTATTGAATCTGATCAAGAATATTGCTT
TTTTGTCGGATGATGAGTTAGACGGCATTTTCATAGAGGGTAAATACCTGTCCTCTGTGTCTCAAAAATT
GTATAGTGATTGGTCAAAGTTGAGAAATGATATTGAAGATTCGGCTAATTCTAAACAGGGTAACAAAGA
ATTAGCGAAGAAAATCAAAACTAACAAGGGTGATGTTGAAAAGGCTATAAGTAAGTACGAGTTCAGTTT
ATCTGAACTAAATTCAATTGTTCATGATAACACAAAATTTTCCGATCTTTTATCATGCACATTACATAAA
GTTGCAAGTGAAAAATTAGTCAAAGTAAACGAAGGTGATTGGCCAAAACATCTAAAAAACAACGAGGA
AAAACAGAAGATAAAAGAACCTCTTGACGCTTTATTGGAAATATACAATACTCTATTAATATTTAACTGT
AAAAGTTTTAACAAAAATGGTAATTTCTATGTCGACTACGATCGCTGCATTAATGAGTTGTCCAGTGTTG
TGTACTTGTATAATAAAACTCGTAATTATTGTACGAAAAAGCCGTACAACACTGACAAATTTAAGTTGA
ATTTCAACTCCCCACAACTGGGTGAGGGCTTCTCTAAAAGTAAAGAGAATGATTGCCTTACATTATTATT
TAAAAAAGATGATAATTATTATGTCGGAATCATAAGAAAGGGGGCAAAGATCAACTTCGATGACACTCA
GGCCATAGCAGACAACACAGATAACTGTATATTCAAAATGAATTATTTTTTGCTGAAGGATGCTAAAAA
ATTTATCCCCAAATGTTCAATACAATTAAAAGAGGTTAAGGCCCATTTCAAAAAGTCGGAAGATGACTA
TATTTTGTCCGATAAGGAAAAATTCGCTAGTCCGCTTGTTATTAAAAAATCCACATTTCTTCTCGCTACG
GCTCATGTGAAAGGAAAGAAGGGCAATATTAAGAAATTTCAGAAAGAATACTCCAAAGAAAATCCTAC
GGAGTATAGAAATAGTCTGAACGAATGGATAGCATTCTGCAAAGAGTTCTTGAAGACCTATAAAGCTGC
CACCATCTTTGATATTACAACTTTGAAAAAGGCCGAGGAATACGCTGACATTGTGGAATTCTATAAGGA
TGTAGATAATCTTTGTTACAAGTTAGAATTTTGCCCTATCAAAACTTCTTTTATCGAAAATCTTATAGATA
ATGGCGATTTATACCTGTTTAGAATTAATAACAAGGACTTTTCTTCAAAAAGTACAGGCACGAAAAACTT
ACACACATTATACTTGCAGGCTATATTTGACGAGCGAAACTTAAACAACCCCACGATAATGTTGAATGG
AGGTGCAGAGTTATTCTACAGAAAAGAATCTATAGAACAGAAAAATCGGATCACGCACAAAGCCGGTA
GTATCTTAGTGAATAAAGTGTGCAAAGATGGTACAAGTCTAGATGACAAAATCCGTAACGAAATTTACC
AGTATGAAAACAAATTCATTGATACTCTTTCGGACGAAGCTAAAAAGGTTCTGCCAAACGTTATTAAGA
AAGAGGCTACGCATGATATAACAAAAGATAAACGTTTCACTAGCGACAAATTCTTCTTTCATTGTCCTTT
AACAATCAACTACAAGGAAGGTGACACCAAACAATTTAATAATGAAGTGCTCTCATTCCTTAGAGGTAA
CCCCGATATCAATATTATCGGCATTGATAGAGGAGAAAGAAACCTAATCTATGTAACAGTCATTAACCA
AAAAGGCGAAATATTGGATAGCGTCTCCTTCAATACTGTCACCAATAAGTCATCGAAGATAGAACAAAC
TGTTGATTACGAAGAAAAATTGGCCGTTAGAGAAAAGGAACGTATCGAAGCGAAGAGATCTTGGGATA
GCATATCCAAGATTGCCACCTTGAAGGAGGGTTATCTAAGCGCGATCGTACATGAAATCTGCTTATTAAT
GATTAAGCATAATGCTATTGTCGTGTTAGAAAACCTGAATGCCGGTTTTAAAAGGATTAGAGGTGGTTTG
TCAGAAAAGTCAGTATATCAAAAGTTTGAAAAGATGCTTATTAATAAACTCAACTACTTCGTTAGCAAG
AAAGAAAGTGATTGGAATAAACCGTCAGGTTTGCTCAATGGTCTTCAGTTAAGTGATCAATTTGAGTCTT
TCGAAAAATTAGGAATTCAAAGTGGATTCATTTTTTATGTACCAGCCGCGTACACTTCAAAAATTGACCC
TACGACCGGATTTGCCAACGTCTTGAATTTGTCCAAGGTCAGAAATGTTGACGCCATCAAAAGTTTTTTT
AGCAACTTCAATGAAATCTCTTATTCCAAAAAGGAAGCCCTTTTCAAGTTTTCTTTTGACCTAGACTCGTT
ATCGAAGAAAGGATTTTCATCTTTCGTAAAGTTTAGCAAGTCCAAGTGGAATGTATACACATTCGGCGA
GAGAATTATCAAGCCCAAGAACAAACAGGGCTATAGAGAAGACAAGAGAATCAACTTGACTTTTGAGA
TGAAAAAATTACTCAACGAATACAAGGTTTCATTTGATTTGGAGAACAACTTGATTCCCAATTTGACATC
AGCTAACTTGAAGGATACGTTCTGGAAGGAGTTATTCTTTATATTCAAAACGACATTACAACTGCGTAAT
AGTGTTACAAACGGTAAAGAAGATGTATTAATCTCACCTGTAAAGAATGCCAAAGGAGAATTTTTCGTA
TCCGGTACTCACAATAAGACACTACCACAGGATTGCGACGCTAACGGTGCGTATCATATTGCGTTGAAA
GGATTAATGATACTTGAAAGAAATAACCTTGTTCGCGAAGAAAAAGACACCAAGAAGATCATGGCTATT
AGCAATGTTGATTGGTTTGAATACGTGCAAAAGAGGAGAGGTGTTTTGTAA
SEQ ID NO: 129
ATGAACAATTATGACGAGTTCACAAAGCTATACCCTATCCAAAAAACTATCAGGTTCGAATTGAAACCA
CAAGGGAGAACAATGGAACATCTGGAGACATTCAACTTTTTTGAAGAGGACAGAGACAGAGCGGAGAA
ATACAAAATTTTAAAAGAGGCCATCGATGAATATCACAAAAAGTTTATCGACGAGCATTTAACAAACAT
GTCTTTGGACTGGAATTCACTTAAACAAATTTCTGAGAAATATTATAAGTCTCGGGAGGAAAAAGACAA
AAAGGTCTTTTTGTCCGAGCAAAAGAGAATGAGACAAGAAATTGTCTCGGAGTTTAAAAAAGATGATCG
GTTCAAAGATTTGTTTAGCAAGAAATTGTTTTCTGAATTGTTGAAGGAGGAGATATACAAGAAAGGCAA
CCATCAAGAAATAGATGCTTTGAAATCGTTTGACAAGTTCAGCGGTTACTTCATTGGTTTACATGAAAAT
AGGAAGAACATGTATAGCGACGGCGATGAGATCACCGCTATATCGAATAGAATCGTTAACGAAAATTTT
CCGAAATTTTTGGATAATTTGCAAAAATACCAGGAAGCTAGGAAAAAGTACCCTGAATGGATAATAAAG
GCGGAATCAGCTTTGGTGGCTCACAACATAAAGATGGATGAAGTCTTCTCGCTGGAATATTTTAACAAA
GTATTAAATCAGGAAGGAATCCAAAGATACAACTTAGCCTTGGGTGGATACGTAACCAAATCAGGTGAG
AAAATGATGGGCTTAAATGATGCACTTAATCTAGCTCACCAATCCGAAAAGTCCTCTAAAGGGAGGATA
CACATGACACCATTGTTTAAGCAAATCCTTTCGGAGAAAGAATCTTTTTCATATATCCCCGATGTTTTCA
CTGAGGATAGTCAATTGTTGCCCAGCATTGGTGGATTTTTTGCACAAATAGAAAATGATAAAGATGGTA
ACATCTTCGATAGAGCCTTGGAATTGATAAGCTCCTATGCAGAATACGATACGGAACGAATATACATTA
GACAAGCTGACATCAACAGAGTAAGCAATGTTATTTTTGGTGAGTGGGGAACTTTAGGTGGATTAATGC
GGGAGTACAAAGCTGACTCAATCAATGATATTAATTTGGAACGTACGTGCAAAAAAGTCGATAAGTGGC
TTGATAGTAAGGAGTTTGCTCTGTCGGATGTACTAGAAGCAATTAAGAGAACAGGAAACAATGATGCAT
TTAATGAATATATTAGTAAAATGAGGACGGCTAGAGAAAAGATAGACGCCGCACGTAAGGAAATGAAG
TTTATTTCCGAGAAAATATCTGGCGATGAAGAGTCGATTCACATCATCAAGACCCTACTCGATTCTGTTC
AGCAATTTCTCCATTTTTTTAACCTCTTCAAAGCAAGACAAGACATTCCCTTAGATGGGGCTTTTTATGCC
GAATTTGATGAAGTTCATTCAAAGTTGTTTGCTATTGTTCCTCTTTACAATAAGGTCCGTAATTACCTTAC
TAAAAATAACTTGAACACCAAGAAAATAAAGTTAAACTTCAAGAATCCGACTCTTGCCAACGGGTGGGA
TCAGAATAAAGTTTATGATTATGCTAGCTTAATATTTCTAAGAGATGGGAATTATTACTTAGGAATCATC
AATCCAAAGCGTAAGAAAAACATTAAATTTGAACAAGGGTCAGGCAATGGCCCATTCTATAGAAAAAT
GGTGTATAAGCAAATACCAGGACCTAACAAGAACTTGCCTCGCGTATTTTTAACTTCAACAAAGGGTAA
AAAAGAATATAAACCAAGCAAAGAAATTATTGAAGGTTACGAAGCAGATAAACACATCAGAGGTGATA
AGTTCGATCTGGATTTCTGCCATAAATTGATTGACTTTTTTAAGGAATCTATAGAAAAACATAAGGACTG
GTCCAAATTTAATTTCTACTTCTCACCTACAGAAAGTTATGGTGACATTTCAGAATTTTATTTAGACGTTG
AGAAACAAGGATATAGGATGCATTTTGAAAATATTTCAGCGGAAACCATCGACGAATACGTTGAGAAG
GGTGATTTATTCTTGTTCCAAATTTACAATAAAGACTTCGTTAAAGCTGCAACCGGAAAGAAGGATATGC
ATACCATATATTGGAACGCTGCATTCTCGCCAGAAAACTTACAAGATGTCGTTGTAAAGCTTAATGGAG
AAGCTGAGCTGTTCTATAGAGACAAGAGTGATATAAAAGAGATTGTGCATCGGGAAGGTGAAATTCTGG
TGAACAGAACTTACAATGGTCGTACACCCGTTCCAGACAAAATACATAAAAAACTGACCGATTATCATA
ATGGTAGGACAAAGGACTTGGGCGAGGCCAAGGAGTACCTCGATAAAGTTAGATATTTCAAGGCACACT
ATGATATTACGAAAGACAGGAGATATTTAAACGATAAAATTTACTTTCATGTCCCTTTGACCCTTAACTT
TAAAGCTAATGGTAAAAAGAATTTGAACAAAATGGTAATTGAGAAGTTTTTATCGGACGAAAAAGCTCA
CATAATCGGAATCGACCGCGGAGAGAGAAATTTACTGTATTATAGTATCATCGACAGAAGTGGAAAGAT
TATTGATCAGCAATCTTTGAACGTCATTGATGGGTTTGACTATCGGGAAAAGTTAAATCAAAGGGAAAT
TGAAATGAAGGATGCGAGACAATCATGGAATGCCATTGGTAAAATTAAAGATCTCAAGGAGGGGTACTT
ATCAAAAGCTGTACACGAGATAACTAAAATGGCTATCCAATATAATGCAATTGTTGTAATGGAAGAATT
GAATTATGGTTTTAAACGCGGCAGGTTTAAAGTCGAAAAACAAATATACCAAAAGTTTGAAAACATGTT
AATTGATAAGATGAACTATCTTGTTTTCAAAGATGCACCTGATGAGAGTCCTGGCGGTGTGCTGAACGCC
TATCAATTAACAAACCCATTAGAGTCCTTTGCTAAACTGGGTAAACAAACTGGCATTCTATTTTATGTTC
CAGCCGCTTACACCTCAAAGATCGATCCAACGACCGGTTTTGTAAACTTATTTAATACTTCTTCCAAAAC
AAACGCGCAAGAACGCAAAGAATTCCTACAAAAATTTGAATCAATATCCTATAGCGCAAAAGATGGAG
GTATATTCGCTTTCGCTTTTGACTACAGAAAGTTTGGCACTTCCAAGACAGATCATAAAAATGTGTGGAC
CGCTTATACCAACGGAGAAAGGATGCGTTATATTAAAGAAAAAAAGAGGAACGAACTATTTGATCCATC
GAAAGAAATTAAAGAAGCTTTGACAAGCAGCGGAATCAAATATGATGGAGGTCAAAACATACTTCCAG
ATATTCTCAGATCTAATAATAACGGTCTTATTTACACGATGTATTCATCTTTTATCGCTGCCATCCAAATG
CGTGTGTATGATGGCAAGGAAGATTATATTATATCTCCTATTAAAAATTCAAAGGGTGAATTTTTTCGCA
CGGATCCAAAAAGAAGAGAGCTTCCAATTGACGCCGATGCTAACGGTGCTTACAATATTGCATTGCGTG
GTGAACTTACTATGAGAGCCATCGCCGAAAAGTTTGATCCGGACAGTGAAAAAATGGCGAAATTGGAGC
TAAAGCACAAGGATTGGTTTGAATTCATGCAGACCCGTGGCGATTGA
SEQ ID NO: 130
ATGACTAAAACGTTCGACTCCGAGTTTTTTAATCTCTATTCCTTGCAAAAGACCGTTAGGTTTGAATTGA
AACCAGTTGGTGAAACTGCCTCATTTGTCGAAGACTTTAAAAACGAGGGATTGAAAAGAGTGGTTAGTG
AAGATGAAAGAAGGGCAGTAGACTATCAAAAGGTTAAAGAAATCATTGACGATTACCACAGAGATTTT
ATAGAAGAATCTCTGAACTATTTTCCAGAGCAGGTTTCAAAAGATGCTCTAGAGCAAGCGTTTCATTTGT
ATCAAAAGTTGAAAGCAGCGAAGGTGGAAGAAAGGGAAAAAGCTTTAAAAGAATGGGAAGCATTACA
GAAAAAATTGCGAGAAAAAGTCGTCAAATGTTTCAGCGACTCTAATAAAGCTCGCTTTTCTAGAATCGA
TAAAAAAGAATTGATTAAGGAAGATTTAATAAATTGGCTGGTAGCACAAAACAGAGAGGATGATATTCC
TACTGTTGAAACGTTCAATAATTTTACTACTTACTTCACTGGTTTCCATGAGAACAGGAAGAATATTTAC
TCTAAAGATGATCACGCTACTGCTATAAGTTTTAGGTTGATTCACGAAAACTTGCCTAAATTTTTTGACA
ATGTCATCAGTTTTAACAAGTTGAAAGAAGGTTTCCCGGAATTAAAATTCGACAAAGTTAAAGAAGATT
TAGAAGTAGATTACGACTTGAAGCATGCGTTTGAAATTGAATATTTCGTTAATTTCGTCACACAAGCTGG
TATCGACCAATATAATTACCTGCTTGGAGGCAAAACTCTAGAAGACGGTACGAAGAAACAAGGAATGA
ATGAACAGATTAATTTATTTAAGCAACAACAAACTCGCGATAAAGCTAGACAGATTCCAAAACTGATTC
CACTTTTCAAACAGATTCTATCTGAGAGAACTGAATCTCAGAGTTTTATCCCTAAGCAGTTCGAGTCTGA
TCAGGAACTATTCGATTCCCTGCAGAAATTGCATAACAACTGTCAAGATAAGTTTACCGTTTTGCAACAG
GCGATCTTGGGATTGGCTGAGGCAGATCTTAAAAAGGTCTTTATTAAAACTAGTGATCTAAACGCATTGT
CTAACACTATTTTTGGAAATTATTCTGTGTTCTCAGACGCGCTCAATTTATATAAAGAGTCGCTAAAAAC
TAAAAAGGCTCAAGAAGCTTTTGAAAAGTTGCCTGCACATAGTATTCATGATTTAATCCAATACTTAGAA
CAATTTAATTCGTCTCTCGATGCTGAAAAGCAACAGTCTACCGATACTGTATTAAACTACTTTATTAAAA
CCGACGAATTATATAGTCGTTTCATTAAATCCACCTCTGAGGCATTCACCCAAGTACAACCTCTCTTTGA
ACTGGAAGCTTTGAGCTCCAAAAGAAGACCCCCAGAAAGTGAAGATGAGGGGGCTAAAGGCCAAGAAG
GTTTCGAACAAATTAAGAGAATCAAAGCTTATCTAGACACTCTAATGGAGGCTGTCCACTTTGCTAAGCC
TTTGTATCTTGTCAAGGGTAGAAAGATGATAGAGGGTCTAGACAAGGATCAAAGCTTCTACGAAGCGTT
TGAAATGGCCTACCAGGAGTTGGAGTCTTTAATCATCCCCATTTACAATAAGGCCAGATCTTACCTGTCT
AGGAAGCCATTTAAAGCGGATAAATTCAAAATTAATTTTGACAATAATACACTTCTATCTGGGTGGGAT
GCTAACAAGGAGACGGCTAACGCCAGCATATTGTTTAAGAAGGATGGTTTATACTACCTGGGAATCATG
CCAAAAGGCAAAACTTTCTTGTTCGATTATTTCGTTAGTTCAGAAGATTCTGAAAAGTTGAAACAACGGA
GACAGAAAACCGCAGAGGAAGCGCTCGCACAGGATGGAGAATCCTATTTTGAAAAAATACGGTATAAA
CTCCTACCAGGTGCTAGTAAGATGTTGCCAAAGGTATTTTTTAGCAATAAAAATATTGGGTTTTACAATC
CCTCAGATGATATTCTACGAATTCGGAATACGGCCTCTCATACTAAGAATGGTACTCCCCAGAAGGGTC
ATTCCAAGGTAGAATTTAACTTGAATGACTGTCACAAAATGATTGATTTTTTTAAATCTTCCATACAGAA
ACATCCCGAGTGGGGATCCTTTGGTTTCACTTTTTCTGATACGTCGGACTTTGAAGATATGAGTGCTTTCT
ACCGAGAAGTTGAAAATCAAGGTTACGTTATAAGTTTTGATAAAATAAAAGAAACTTACATTCAGTCTC
AAGTTGAGCAAGGTAACTTATATTTATTTCAAATTTACAACAAAGATTTTAGTCCGTATTCAAAGGGAAA
GCCAAACCTGCACACTTTATACTGGAAAGCTCTGTTTGAAGAGGCTAATTTGAATAACGTAGTGGCTAA
GCTAAACGGCGAAGCAGAAATCTTTTTCAGAAGACACAGTATCAAAGCATCTGATAAAGTGGTACATCC
TGCTAATCAAGCTATAGATAATAAGAATCCCCATACTGAGAAGACGCAGTCCACATTTGAATATGACTT
GGTCAAAGACAAAAGATATACCCAAGACAAATTTTTTTTTCATGTACCGATATCTTTAAACTTTAAGGCT
CAGGGCGTTTCAAAGTTTAATGATAAGGTAAATGGATTCTTAAAGGGCAATCCCGACGTTAATATAATC
GGTATAGATCGAGGTGAGAGACATCTTTTATACTTTACCGTGGTGAATCAAAAAGGAGAAATATTAGTG
CAAGAGTCCTTGAATACATTAATGTCTGACAAGGGTCATGTCAACGATTATCAACAGAAATTGGACAAG
AAGGAACAGGAAAGGGACGCTGCCAGGAAGTCCTGGACGACAGTAGAAAATATTAAAGAATTAAAAGA
AGGTTATTTATCACATGTGGTTCATAAACTTGCACATTTAATCATCAAATATAACGCAATAGTGTGCTTG
GAAGATCTTAATTTTGGCTTCAAGAGGGGTAGGTTCAAGGTCGAAAAACAGGTCTACCAGAAGTTCGAG
AAAGCTCTGATCGATAAATTGAATTATCTTGTTTTCAAAGAAAAAGAATTAGGAGAAGTTGGTCATTATC
TTACAGCATACCAACTCACTGCACCATTTGAAAGCTTCAAAAAGCTAGGCAAGCAATCTGGGATTTTGTT
CTATGTTCCGGCTGATTATACATCAAAGATAGATCCTACCACAGGCTTTGTAAATTTTTTAGATCTTAGG
TACCAATCCGTTGAAAAAGCTAAACAGTTGCTGTCCGATTTTAATGCGATAAGATTTAATAGTGTTCAGA
ATTATTTTGAGTTCGAAATTGATTATAAAAAATTGACACCAAAACGTAAAGTAGGAACACAATCTAAAT
GGGTTATTTGTACCTATGGAGATGTTAGATACCAAAACAGAAGAAATCAGAAAGGTCACTGGGAAACTG
AAGAAGTTAACGTTACTGAAAAACTTAAAGCTCTATTTGCGAGCGATTCAAAAACGACGACGGTGATCG
ATTATGCAAATGATGATAACCTTATTGATGTAATTCTGGAACAAGATAAGGCATCATTTTTTAAAGAACT
ACTATGGTTGTTAAAGCTAACCATGACCCTAAGGCACTCCAAGATAAAGTCAGAGGATGATTTTATCCTC
TCTCCAGTGAAAAACGAACAAGGTGAGTTTTACGACTCAAGAAAGGCGGGTGAAGTCTGGCCTAAGGAT
GCTGATGCCAATGGAGCTTATCACATCGCTCTGAAGGGGCTATGGAACTTACAGCAAATTAACCAATGG
GAAAAAGGTAAAACTTTAAACCTCGCCATAAAGAACCAGGATTGGTTCAGCTTTATCCAAGAAAAACCA
TATCAAGAATAA
SEQ ID NO: 131
ATGCACACAGGAGGTCTACTCTCGATGGATGCTAAGGAATTTACCGGTCAATATCCGCTGTCCAAAACTT
TGCGTTTTGAGCTTAGACCTATTGGCCGAACGTGGGATAACCTAGAGGCTTCTGGTTATTTGGCGGAAGA
TAGACATAGAGCTGAGTGTTATCCCCGAGCTAAAGAATTGCTGGATGATAACCACAGGGCGTTCCTGAA
TAGAGTTCTACCGCAAATCGATATGGATTGGCATCCAATTGCTGAAGCTTTCTGCAAGGTGCACAAAAA
TCCAGGTAATAAAGAATTGGCTCAGGATTATAATTTGCAGCTTAGTAAGAGAAGAAAAGAAATTTCCGC
TTATTTGCAGGATGCTGATGGATACAAGGGGTTGTTCGCGAAACCTGCCCTGGACGAAGCTATGAAAAT
AGCTAAGGAAAACGGCAATGAATCTGATATTGAAGTTTTGGAAGCCTTCAATGGATTTTCCGTTTATTTC
ACTGGTTATCATGAGAGTAGGGAGAATATATACTCAGACGAAGATATGGTATCCGTCGCCTATCGCATA
ACTGAAGATAATTTTCCAAGGTTCGTGTCGAACGCGTTAATTTTTGATAAACTAAATGAATCGCACCCGG
ATATTATTTCGGAAGTGTCCGGTAATCTGGGGGTAGACGATATTGGTAAATATTTTGATGTGTCCAACTA
CAATAATTTCCTTAGTCAAGCAGGAATTGATGACTACAACCATATTATAGGAGGGCATACAACTGAAGA
CGGTCTCATTCAAGCTTTTAACGTAGTGTTAAACCTAAGGCACCAAAAAGACCCAGGTTTTGAGAAAAT
TCAATTTAAGCAACTCTACAAGCAGATACTGAGCGTTAGGACTAGTAAGTCATATATCCCAAAGCAATT
CGATAACTCAAAGGAAATGGTCGACTGTATATGCGACTACGTCTCAAAAATAGAAAAATCTGAAACAGT
AGAAAGAGCTCTGAAATTGGTAAGAAATATATCTTCTTTTGATTTAAGAGGTATTTTCGTAAATAAAAAA
AACCTTCGAATTTTGTCTAATAAGTTAATTGGAGACTGGGACGCAATAGAGACAGCTTTGATGCACAGTT
CCAGCAGTGAAAACGATAAGAAATCAGTGTATGACTCTGCAGAGGCATTCACCCTTGATGATATCTTCA
GTTCTGTGAAAAAGTTCAGCGACGCCTCCGCTGAGGATATAGGAAACCGCGCTGAAGACATATGTCGTG
TTATCTCAGAAACAGCTCCTTTCATTAACGACTTAAGGGCTGTAGATTTGGATTCTTTAAATGATGACGG
CTATGAAGCGGCCGTGTCTAAAATACGGGAATCTCTTGAACCCTACATGGATCTATTTCACGAATTGGAG
ATCTTTAGCGTGGGTGATGAGTTTCCTAAATGTGCTGCCTTTTATAGCGAGTTGGAAGAGGTCTCAGAAC
AACTGATTGAAATCATTCCTTTATTTAACAAAGCAAGAAGTTTTTGCACAAGGAAAAGGTATTCAACCG
ACAAAATCAAAGTCAATTTAAAATTCCCTACTCTGGCAGATGGATGGGATCTAAATAAAGAAAGGGATA
ACAAAGCCGCAATTCTAAGAAAAGACGGTAAATACTACCTGGCAATTTTAGACATGAAGAAAGATCTCA
GTAGTATTCGTACGAGCGATGAGGACGAGTCTTCTTTTGAAAAGATGGAATATAAATTGCTCCCTTCTCC
TGTGAAAATGCTTCCAAAAATTTTTGTTAAATCGAAAGCCGCCAAAGAAAAGTACGGGTTGACCGATAG
AATGTTAGAATGCTACGATAAAGGTATGCATAAGTCGGGTAGTGCTTTTGATTTGGGTTTTTGTCATGAA
TTGATCGATTACTATAAGCGCTGCATTGCCGAGTACCCAGGCTGGGATGTTTTCGACTTTAAATTTCGTG
AGACAAGCGATTACGGATCCATGAAAGAATTTAATGAAGACGTCGCTGGCGCAGGTTACTATATGTCAC
TTAGAAAGATTCCATGTTCCGAAGTTTATCGTTTACTGGACGAGAAGTCAATTTACTTGTTTCAAATATA
TAATAAGGATTATAGCGAAAACGCACATGGGAATAAGAATATGCATACGATGTATTGGGAGGGCTTGTT
CTCACCACAAAATTTGGAATCACCAGTCTTCAAATTGTCCGGAGGCGCAGAACTTTTTTTCAGAAAGTCA
TCTATTCCTAATGACGCTAAAACGGTACATCCGAAAGGTTCAGTTCTTGTTCCCAGAAACGACGTCAATG
GTAGAAGAATACCAGACTCGATCTACAGAGAGTTGACAAGGTATTTTAACCGTGGGGATTGCAGGATCA
GTGATGAAGCTAAGTCTTACCTGGACAAGGTCAAGACAAAAAAAGCGGACCATGACATTGTTAAGGAT
AGAAGATTTACTGTAGATAAGATGATGTTCCATGTTCCGATTGCCATGAATTTTAAAGCTATAAGTAAAC
CAAATCTTAATAAGAAAGTTATTGATGGCATAATAGATGATCAAGATTTGAAAATCATCGGTATCGATC
GTGGTGAGAGAAATCTTATTTATGTGACCATGGTCGATAGGAAGGGGAATATATTGTATCAAGACAGTC
TTAATATTTTAAATGGATACGATTACCGCAAAGCTTTAGACGTGAGGGAATATGATAACAAAGAAGCTA
GAAGGAATTGGACTAAAGTAGAAGGTATTAGAAAAATGAAAGAAGGTTATTTATCTTTAGCTGTTAGTA
AATTGGCCGATATGATCATCGAAAATAATGCTATAATCGTAATGGAAGATTTGAATCACGGGTTTAAGG
CAGGTCGTTCCAAAATTGAAAAGCAGGTGTATCAAAAATTCGAATCAATGTTAATCAACAAGTTAGGAT
ACATGGTGCTAAAAGACAAGTCCATTGACCAGTCTGGTGGAGCCCTTCATGGTTACCAATTAGCCAATC
ATGTTACGACCTTAGCTAGCGTGGGTAAACAATGTGGAGTAATTTTTTACATACCTGCAGCTTTTACTTC
GAAGATTGATCCCACCACGGGCTTTGCTGATTTATTCGCTCTCTCTAATGTGAAGAATGTCGCTTCTATG
AGAGAGTTCTTCTCCAAAATGAAGTCAGTAATATATGACAAGGCGGAAGGCAAATTCGCCTTTACATTT
GATTATTTGGATTATAACGTTAAAAGCGAATGTGGACGTACCTTATGGACTGTGTATACAGTTGGTGAAC
GCTTCACCTACTCTAGAGTAAACCGAGAGTATGTTCGGAAAGTCCCAACAGATATCATCTATGATGCATT
ACAAAAAGCTGGTATTAGCGTCGAAGGTGACCTTAGAGATAGAATCGCGGAAAGCGACGGTGACACAT
TAAAGTCTATATTCTACGCTTTTAAATACGCGTTGGATATGAGAGTCGAAAACAGAGAGGAAGACTATA
TACAGTCACCTGTGAAGAATGCTTCTGGTGAGTTCTTTTGTTCAAAAAACGCCGGAAAGTCTTTGCCGCA
GGATTCAGATGCAAATGGTGCCTATAATATAGCTCTGAAAGGGATCCTACAACTCAGAATGTTGAGCGA
ACAATACGATCCAAATGCAGAATCGATTAGATTGCCACTTATAACTAACAAGGCATGGTTAACTTTTATG
CAATCCGGTATGAAAACTTGGAAGAATTAA
SEQ ID NO: 132
ATGGATTCTCTTAAGGATTTCACTAATTTATATCCAGTCTCGAAAACATTGCGGTTCGAATTGAAACCAG
TTGGGAAAACTCTAGAAAACATTGAAAAAGCCGGTATATTGAAAGAAGATGAACACAGAGCGGAATCC
TACCGCCGGGTAAAAAAGATAATTGACACATACCATAAAGTGTTTATTGACAGCTCCTTAGAGAACATG
GCTAAAATGGGGATAGAAAATGAAATCAAGGCTATGCTGCAGTCTTTTTGTGAACTCTATAAGAAAGAC
CACAGGACAGAAGGAGAAGATAAAGCTCTTGATAAAATTAGAGCTGTTCTTAGAGGTTTAATCGTTGGG
GCTTTCACTGGTGTATGTGGAAGACGAGAAAACACAGTACAAAATGAAAAGTACGAGAGTTTGTTCAAA
GAAAAATTGATAAAGGAAATTTTGCCAGATTTCGTGTTGTCCACCGAGGCTGAGTCTCTTCCATTCAGCG
TTGAAGAAGCAACAAGGAGCTTAAAAGAGTTTGACTCATTCACTTCTTATTTTGCTGGTTTTTACGAAAA
TAGAAAGAATATTTATTCCACGAAACCGCAAAGTACTGCGATAGCCTACAGATTAATTCATGAAAACTT
GCCTAAATTTATAGATAATATTTTGGTCTTCCAGAAGATTAAAGAACCAATCGCTAAAGAACTTGAGCA
CATAAGAGCAGATTTTAGCGCAGGCGGATATATCAAAAAAGATGAACGGCTAGAAGACATATTCTCATT
AAATTACTACATTCATGTCCTTTCTCAAGCTGGTATAGAAAAATATAATGCTTTAATCGGGAAGATAGTG
ACGGAAGGTGATGGTGAAATGAAAGGTCTTAATGAACATATTAACTTATATAACCAACAGAGGGGTCGA
GAGGATAGGTTGCCCTTGTTTAGGCCTCTATACAAGCAAATCCTGTCCGATAGAGAGCAATTGTCTTATT
TACCTGAATCATTTGAAAAAGATGAAGAGCTGCTTAGAGCACTTAAGGAATTTTACGATCACATCGCCG
AAGACATCTTGGGTAGAACACAGCAATTGATGACTTCAATTTCTGAATACGACTTGTCCCGTATTTATGT
CAGAAATGATTCTCAACTTACAGACATCTCGAAGAAAATGCTAGGAGATTGGAACGCCATTTATATGGC
TAGAGAACGAGCCTACGACCACGAACAGGCTCCTAAACGTATTACTGCTAAATACGAACGTGATAGAAT
CAAGGCCTTAAAAGGTGAAGAGTCAATTTCATTGGCGAATCTGAACAGCTGTATAGCTTTCTTGGACAA
TGTAAGGGATTGTCGAGTTGACACATACCTATCAACTTTGGGGCAGAAAGAGGGTCCTCATGGCTTAAG
TAACTTGGTGGAAAACGTCTTCGCCTCATATCATGAAGCAGAACAGTTATTGTCGTTTCCTTACCCCGAA
GAGAACAACCTTATTCAGGACAAAGACAATGTAGTTTTGATCAAAAACCTATTGGATAATATAAGTGAT
TTACAACGTTTCCTTAAACCTTTGTGGGGAATGGGCGATGAACCTGACAAAGACGAAAGGTTTTACGGT
GAATACAACTATATTAGAGGAGCGCTTGACCAGGTAATACCTTTGTACAATAAAGTAAGGAACTACTTG
ACTCGTAAACCATATTCTACTAGAAAAGTTAAATTGAACTTTGGTAATTCACAGCTGCTGAGTGGTTGGG
ATCGTAATAAAGAAAAAGATAACTCCTGTGTTATCTTGCGAAAAGGACAAAACTTTTACTTGGCAATTA
TGAACAACCGTCACAAAAGGTCCTTCGAGAACAAAGTTCTGCCTGAATACAAAGAAGGTGAACCATATT
TTGAAAAAATGGACTATAAATTCCTGCCAGATCCTAATAAAATGTTGCCTAAGGTCTTCTTGTCTAAAAA
AGGTATAGAAATATATAAACCATCCCCGAAGTTGCTGGAGCAATATGGTCATGGAACGCACAAAAAAG
GTGACACTTTTAGTATGGATGACTTGCACGAGTTGATTGATTTTTTTAAACATTCCATTGAAGCGCACGA
AGATTGGAAACAATTTGGTTTCAAGTTCTCTGACACAGCCACTTACGAAAATGTATCGTCCTTTTATAGA
GAAGTGGAAGATCAGGGTTATAAACTGTCATTCCGTAAGGTTAGTGAAAGCTATGTGTACTCGTTGATC
GATCAAGGGAAGCTTTATCTTTTTCAAATCTATAATAAAGATTTCTCTCCTTGTTCAAAGGGCACACCTA
ATCTTCATACACTATACTGGAGAATGCTTTTCGATGAAAGAAATTTGGCTGATGTGATCTATAAATTAGA
CGGTAAAGCTGAGATTTTTTTCAGAGAGAAATCCCTGAAAAACGACCATCCAACTCATCCGGCAGGTAA
ACCGATTAAAAAGAAATCCCGGCAAAAAAAGGGCGAAGAGAGTTTATTCGAGTATGATTTAGTTAAGG
ACAGACATTATACAATGGACAAATTTCAATTTCATGTGCCCATTACTATGAACTTTAAGTGTAGTGCAGG
GTCTAAGGTTAATGATATGGTAAACGCACATATTAGAGAAGCTAAAGATATGCACGTCATCGGTATTGA
TCGCGGAGAAAGAAATTTACTTTACATTTGCGTTATCGATTCTAGGGGCACCATCTTGGATCAAATCTCT
TTGAACACTATAAATGATATTGACTATCATGATCTACTAGAGAGTCGGGATAAAGACAGGCAACAAGAA
AGAAGAAATTGGCAAACAATTGAAGGTATTAAAGAATTAAAGCAAGGCTATCTAAGCCAGGCTGTACA
CAGAATTGCCGAATTAATGGTAGCATATAAAGCTGTCGTAGCTCTAGAAGACTTGAACATGGGTTTCAA
AAGAGGGCGCCAGAAGGTCGAAAGTAGTGTTTATCAACAATTTGAAAAACAGTTAATAGATAAGTTGA
ATTATCTAGTGGATAAAAAAAAGCGTCCTGAGGACATTGGCGGTTTATTAAGAGCCTACCAATTCACTG
CGCCATTTAAATCGTTCAAAGAAATGGGTAAACAAAACGGTTTTCTATTCTACATCCCCGCATGGAATAC
CTCAAATATAGATCCAACTACCGGTTTCGTCAACTTATTTCATGCTCAATATGAGAATGTGGACAAAGCA
AAATCATTCTTTCAAAAATTTGATAGCATTAGCTACAATCCTAAAAAAGATTGGTTTGAATTTGCGTTCG
ATTATAAAAATTTCACCAAGAAGGCTGAAGGTTCCAGATCTATGTGGATATTGTGCACCCACGGAAGTA
GAATTAAGAACTTCCGTAATTCACAGAAAAACGGCCAGTGGGACAGCGAAGAATTCGCCCTAACCGAA
GCTTTCAAAAGTCTTTTCGTAAGATACGAGATAGACTATACAGCTGATCTAAAGACAGCTATTGTGGATG
AGAAGCAAAAAGACTTCTTTGTCGACCTTCTTAAGTTGTTCAAGTTAACTGTGCAGATGAGAAATAGTTG
GAAGGAAAAAGACCTAGATTACTTGATTAGCCCAGTCGCTGGTGCAGATGGCAGATTTTTTGATACACG
TGAAGGCAATAAATCACTACCAAAAGACGCGGACGCTAATGGCGCATACAACATCGCATTGAAGGGTTT
GTGGGCTCTCAGGCAGATTAGGCAGACAAGTGAGGGTGGTAAGCTTAAGCTGGCGATTTCTAATAAGGA
ATGGTTACAGTTTGTTCAAGAAAGATCCTACGAAAAAGATTAA
SEQ ID NO: 133
ATGAACAATGGTACTAATAATTTTCAAAACTTCATAGGGATTTCTAGCCTTCAAAAGACATTGAGAAAT
GCTTTAATTCCAACAGAAACGACTCAACAATTCATAGTGAAAAATGGTATTATAAAAGAAGACGAGTTG
CGTGGCGAGAATAGACAAATTTTGAAAGATATCATGGATGACTACTACAGAGGGTTCATCTCCGAAACA
TTGTCTTCTATTGACGACATTGACTGGACCAGCTTATTCGAAAAAATGGAAATACAGCTGAAGAACGGA
GATAACAAGGACACTCTTATAAAGGAGCAAACGGAATATAGAAAGGCTATACACAAAAAGTTTGCTAA
TGACGATAGATTTAAAAACATGTTTAGTGCGAAGTTAATTTCTGATATTCTACCCGAGTTTGTCATTCAT
AATAATAACTACTCTGCATCTGAAAAAGAGGAGAAGACCCAGGTTATAAAGTTGTTTTCAAGATTTGCC
ACATCATTTAAAGACTACTTCAAGAACAGGGCGAATTGCTTCTCTGCTGATGATATTAGCTCTTCCAGCT
GTCATAGAATTGTTAACGATAATGCCGAAATTTTTTTTAGTAATGCCTTGGTATATAGACGCATAGTCAA
GTCACTAAGCAATGATGATATAAACAAGATTAGTGGTGATATGAAAGATAGCCTTAAAGAAATGAGCCT
TGAAGAGATATATTCATATGAGAAGTACGGTGAATTTATAACTCAAGAAGGAATTTCTTTTTATAACGAT
ATTTGTGGTAAGGTTAATTCTTTTATGAATTTGTATTGCCAGAAGAACAAGGAAAATAAGAATCTATATA
AACTACAAAAGTTGCATAAACAGATTTTGTGTATAGCTGATACATCCTACGAAGTTCCGTATAAATTTGA
ATCTGATGAGGAAGTTTATCAATCGGTAAACGGTTTTCTTGACAACATTTCCAGCAAACATATCGTTGAG
AGACTACGTAAAATTGGAGACAACTATAATGGTTACAATCTAGATAAAATATACATAGTGTCCAAGTTT
TATGAGTCTGTCTCTCAAAAGACATATCGTGATTGGGAGACCATTAATACTGCACTTGAAATTCATTATA
ACAACATATTGCCTGGTAACGGGAAGAGTAAAGCTGATAAGGTTAAAAAGGCCGTCAAAAACGACTTG
CAAAAGTCTATTACCGAGATAAATGAATTAGTGTCAAACTACAAACTATGCTCAGATGATAATATTAAA
GCGGAAACATACATCCACGAAATTTCCCACATACTGAATAACTTTGAAGCTCAGGAGCTTAAATATAAC
CCGGAAATACACTTGGTTGAGAGCGAGTTAAAAGCATCTGAGTTGAAAAATGTATTAGACGTCATCATG
AATGCGTTTCATTGGTGTTCAGTTTTCATGACTGAAGAATTAGTCGACAAAGATAACAATTTTTATGCCG
AATTAGAGGAAATATATGATGAAATTTATCCCGTAATTAGTTTATACAATCTAGTTAGAAATTATGTTAC
ACAAAAGCCGTATAGTACCAAGAAAATAAAGCTTAATTTCGGAATACCTACGCTTGCTGATGGTTGGTC
AAAAAGTAAAGAATATAGCAATAATGCAATAATTTTAATGAGAGATAACCTATATTATTTGGGTATTTTT
AACGCTAAGAACAAACCAGACAAGAAAATAATTGAAGGTAATACATCTGAAAACAAGGGCGACTATAA
AAAGATGATATACAATTTGCTCCCAGGTCCTAATAAAATGATTCCTAAGGTTTTCCTGAGTAGCAAGACT
GGCGTTGAAACTTACAAGCCTAGTGCGTATATCCTGGAGGGTTATAAACAGAACAAGCATATCAAATCC
TCTAAGGACTTCGATATCACCTTTTGCCATGACTTAATCGATTATTTTAAAAATTGTATCGCAATTCATCC
AGAATGGAAAAATTTCGGATTTGATTTTAGTGATACCAGCACTTACGAGGATATCTCTGGGTTCTACAGA
GAAGTGGAGTTGCAGGGCTACAAAATCGATTGGACTTACATATCTGAAAAGGACATAGATTTGCTGCAG
GAGAAAGGTCAGCTATATTTGTTTCAAATCTACAACAAAGACTTTTCTAAAAAGTCTACCGGTAATGAC
AATCTGCACACAATGTACTTGAAGAACTTATTCTCCGAGGAGAACTTAAAGGACATTGTACTCAAGTTG
AATGGAGAAGCCGAGATTTTTTTTAGAAAGAGCAGTATAAAGAATCCTATAATCCACAAGAAGGGCTCA
ATTCTCGTGAATAGGACGTATGAGGCAGAAGAAAAGGACCAATTTGGGAATATACAAATTGTAAGAAA
AAACATCCCAGAAAATATCTACCAGGAATTATATAAGTATTTTAATGACAAATCTGATAAGGAACTGTC
TGACGAAGCCGCTAAGCTCAAGAATGTTGTGGGCCACCATGAAGCTGCTACTAATATAGTGAAGGACTA
CAGATATACCTACGATAAATATTTCCTGCATATGCCAATTACTATAAACTTCAAAGCAAATAAAACAGG
TTTTATAAATGATAGAATCCTGCAGTATATTGCTAAAGAAAAGGATTTACATGTAATTGGGATTGATAGA
GGTGAACGCAATCTGATCTATGTCAGCGTAATAGATACTTGTGGTAATATTGTGGAACAAAAGTCCTTTA
ATATTGTGAACGGATATGATTACCAAATCAAGTTGAAACAACAAGAGGGAGCACGCCAAATTGCCCGTA
AGGAATGGAAAGAGATAGGTAAGATCAAGGAAATTAAGGAAGGTTATCTTTCATTAGTTATTCACGAAA
TTTCGAAGATGGTAATCAAATACAACGCAATAATTGCTATGGAGGACCTGTCATATGGATTTAAGAAAG
GTAGATTCAAGGTTGAGAGACAGGTATACCAGAAATTTGAAACTATGTTGATCAACAAATTAAATTACT
TAGTCTTTAAGGACATATCAATAACGGAAAACGGCGGGCTTTTAAAAGGGTATCAACTTACATACATAC
CTGATAAGTTGAAAAATGTGGGTCATCAGTGTGGGTGCATCTTTTATGTTCCAGCCGCTTACACATCAAA
AATCGATCCTACTACTGGGTTCGTAAACATATTTAAATTTAAAGATCTAACCGTTGATGCAAAAAGAGA
GTTTATCAAGAAATTTGATAGCATTAGGTACGATTCAGAAAAAAATCTATTCTGTTTTACTTTTGACTAC
AACAACTTTATAACGCAGAATACAGTGATGTCAAAATCGTCCTGGTCAGTGTATACTTATGGTGTTAGAA
TTAAGAGACGTTTCGTAAACGGTCGTTTTTCTAACGAGTCCGATACAATCGACATCACTAAAGATATGGA
AAAAACTTTGGAAATGACAGATATAAACTGGAGAGATGGTCACGACCTTAGACAAGATATAATCGATTA
TGAAATCGTACAGCATATTTTTGAAATTTTTCGCTTAACAGTTCAGATGCGTAACTCTCTTAGTGAGCTA
GAAGATAGAGATTATGATAGACTTATCTCGCCTGTTCTTAACGAAAATAATATCTTCTATGACTCGGCAA
AAGCCGGTGATGCACTTCCAAAAGATGCTGATGCAAATGGCGCGTACTGCATCGCATTGAAGGGGCTCT
ACGAGATTAAACAAATCACCGAAAACTGGAAAGAAGATGGTAAATTTTCTAGGGATAAGTTGAAAATC
AGTAATAAAGATTGGTTCGATTTTATACAAAATAAGCGATACTTATAG
SEQ ID NO: 134
ATGACCAATAAGTTTACTAATCAATACTCATTGTCTAAAACGTTAAGATTCGAGTTAATTCCCCAGGGAA
AGACACTAGAATTTATTCAAGAAAAAGGTCTTCTCTCTCAGGATAAACAAAGAGCAGAATCATACCAGG
AGATGAAAAAAACCATAGATAAATTTCATAAGTACTTCATCGACTTGGCACTATCGAACGCCAAGCTAA
CACATTTGGAAACCTACCTGGAGTTGTATAATAAATCGGCAGAGACGAAAAAGGAACAAAAATTCAAG
GATGACCTGAAGAAGGTTCAAGATAATCTGCGAAAGGAAATAGTGAAGTCGTTTAGTGATGGTGATGCA
AAGTCAATCTTTGCTATTTTAGACAAGAAGGAATTAATAACCGTGGAACTTGAAAAGTGGTTTGAAAAT
AACGAACAGAAAGATATTTACTTCGACGAAAAATTTAAAACGTTTACTACGTACTTTACAGGGTTCCATC
AGAACCGCAAAAACATGTACTCCGTTGAACCAAACTCTACTGCAATCGCCTACAGATTAATACACGAAA
ATTTGCCTAAGTTTTTAGAAAATGCAAAGGCTTTTGAAAAGATAAAGCAAGTCGAATCGTTACAGGTAA
ACTTTCGCGAATTAATGGGCGAATTTGGAGATGAAGGTCTTATTTTTGTCAATGAATTAGAGGAAATGTT
TCAAATTAATTATTATAACGATGTCTTGAGTCAGAACGGCATTACTATCTACAACTCAATTATCAGTGGT
TTCACTAAGAATGATATAAAATATAAAGGTTTGAATGAATACATTAATAATTATAATCAAACTAAAGAT
AAGAAGGACAGGCTTCCGAAATTGAAGCAATTGTACAAGCAGATTCTAAGTGATAGGATTAGTTTGTCT
TTCTTGCCAGACGCATTTACTGATGGCAAGCAAGTCTTAAAGGCTATATTCGATTTCTACAAGATTAACC
TACTTTCGTACACAATTGAAGGTCAAGAAGAATCTCAAAATCTGCTGCTTTTGATTAGGCAAACTATAGA
AAATTTGTCGTCCTTTGACACTCAAAAAATTTACCTGAAGAATGATACACACCTGACTACAATATCACAG
CAGGTCTTTGGGGATTTTTCTGTCTTCTCCACGGCCCTAAACTATTGGTATGAGACAAAAGTTAATCCAA
AATTTGAAACAGAATATAGTAAGGCGAATGAAAAAAAGAGAGAAATTTTGGATAAAGCGAAGGCAGTA
TTCACAAAACAAGACTATTTTTCTATCGCATTTCTCCAAGAAGTCTTATCCGAATATATTTTGACACTCGA
TCACACCTCTGATATAGTTAAGAAACATTCGTCCAACTGCATCGCAGATTACTTCAAGAATCACTTCGTG
GCTAAGAAAGAAAACGAAACGGATAAAACTTTTGACTTCATTGCTAACATAACCGCTAAATACCAATGT
ATTCAGGGCATATTAGAAAATGCAGACCAGTACGAAGACGAGTTAAAACAGGACCAAAAGTTAATAGA
TAATCTAAAGTTTTTCTTAGATGCTATACTTGAGTTATTACATTTTATAAAGCCATTGCATCTAAAATCGG
AAAGTATTACTGAAAAAGACACTGCGTTCTATGATGTGTTCGAAAATTATTATGAGGCTTTATCTTTATT
GACCCCCCTTTACAACATGGTCCGCAATTATGTTACTCAGAAGCCTTACTCTACTGAAAAGATCAAATTA
AACTTTGAAAATGCTCAGTTGCTGAATGGTTGGGATGCCAATAAGGAAGGTGACTACCTGACGACTATT
CTAAAAAAAGACGGTAATTATTTCTTAGCAATCATGGATAAAAAACATAACAAGGCATTTCAAAAATTT
CCAGAAGGAAAAGAAAACTATGAAAAGATGGTTTATAAATTGTTGCCTGGAGTTAATAAAATGTTGCCA
AAAGTTTTTTTTAGCAATAAGAACATAGCTTACTTTAATCCATCTAAGGAACTGCTCGAGAACTACAAGA
AGGAAACACATAAAAAAGGTGATACATTTAATTTGGAACATTGCCATACTCTGATTGATTTTTTTAAGGA
CTCTCTTAATAAACATGAAGACTGGAAATATTTTGATTTTCAATTTTCGGAAACTAAATCATACCAAGAT
CTAAGTGGATTTTACAGAGAAGTTGAACACCAAGGTTATAAGATTAACTTCAAGAATATAGATTCTGAA
TACATTGATGGTCTTGTAAACGAGGGTAAACTATTCCTGTTCCAAATCTACTCTAAGGACTTCTCACCTTT
TTCCAAAGGAAAACCTAATATGCATACGTTGTACTGGAAGGCTCTATTTGAAGAACAAAATTTGCAAAA
TGTAATCTACAAACTGAACGGCCAAGCTGAAATATTCTTCAGAAAAGCCTCAATTAAGCCAAAAAACAT
TATTCTTCATAAAAAGAAGATCAAGATTGCGAAGAAACATTTTATTGATAAGAAGACCAAGACTTCCGA
AATTGTACCAGTACAAACAATCAAGAATCTCAATATGTATTATCAAGGCAAGATAAGTGAGAAAGAGTT
AACCCAGGATGATTTACGTTATATAGACAATTTCTCTATATTCAACGAGAAGAACAAAACAATAGACAT
TATCAAAGATAAAAGGTTTACTGTTGACAAATTTCAATTTCATGTGCCTATCACAATGAACTTTAAGGCC
ACAGGTGGTTCGTACATTAATCAAACTGTTTTAGAATATCTGCAAAATAACCCAGAGGTCAAGATCATC
GGTCTTGATAGGGGTGAGAGACATCTGGTGTATCTAACACTCATTGATCAACAAGGCAACATCTTGAAG
CAAGAATCATTGAACACTATCACAGACTCCAAGATCTCGACTCCATATCACAAACTCCTTGACAATAAA
GAAAACGAAAGGGATCTTGCCAGAAAAAATTGGGGTACAGTTGAAAATATTAAGGAACTAAAAGAAGG
TTACATTTCGCAAGTAGTTCACAAGATTGCAACACTCATGTTGGAAGAAAACGCAATCGTTGTCATGGA
AGATTTAAATTTCGGATTTAAGAGAGGAAGATTTAAAGTAGAAAAGCAAATCTACCAGAAGTTGGAGA
AGATGTTAATTGACAAATTGAACTACTTAGTGCTGAAAGACAAACAGCCTCAAGAATTGGGCGGTCTAT
ACAACGCTTTACAACTGACAAATAAATTTGAGTCATTCCAAAAGATGGGTAAGCAGAGTGGTTTTTTGTT
TTATGTTCCGGCATGGAACACATCCAAAATCGATCCAACTACAGGCTTCGTGAATTATTTCTACACTAAA
TATGAAAATGTGGATAAAGCAAAAGCTTTCTTTGAGAAGTTCGAGGCGATCCGTTTTAACGCTGAAAAG
AAGTACTTCGAGTTCGAGGTCAAAAAGTATTCAGATTTTAACCCCAAGGCTGAAGGCACCCAGCAAGCA
TGGACTATTTGCACGTACGGTGAGCGAATCGAAACTAAAAGGCAAAAGGATCAAAATAATAAGTTTGTA
AGCACACCCATTAACTTGACAGAAAAGATAGAAGATTTTCTTGGAAAAAACCAAATTGTATATGGTGAC
GGTAACTGTATCAAGTCACAAATTGCTTCTAAAGACGATAAGGCCTTCTTCGAAACTCTGCTATACTGGT
TTAAAATGACGTTGCAAATGAGAAACAGTGAAACTAGAACTGATATCGACTATTTAATATCACCCGTGA
TGAACGATAATGGTACCTTTTACAATTCAAGAGATTACGAGAAATTGGAGAACCCCACACTACCAAAAG
ACGCAGACGCTAATGGTGCCTACCATATTGCTAAAAAGGGACTGATGTTGTTGAACAAGATAGATCAAG
CCGACTTAACTAAAAAAGTTGATTTGTCAATTTCGAATAGAGATTGGTTGCAATTCGTCCAGAAAAATA
AGTAA
SEQ ID NO: 135
ATGGAACAGGAATACTACTTGGGTTTGGATATGGGAACTGGTTCAGTCGGTTGGGCTGTTACGGACTCC
GAGTACCACGTGTTGAGAAAACACGGAAAGGCTTTATGGGGTGTCAGACTATTCGAATCAGCATCGACC
GCGGAAGAGAGAAGAATGTTTAGAACTTCAAGAAGAAGGCTGGATCGTAGGAATTGGCGGATAGAAAT
TTTACAAGAAATATTCGCCGAAGAAATCTCTAAAAAAGATCCAGGATTTTTTCTACGTATGAAGGAATC
CAAATACTATCCGGAAGATAAACGTGATATTAATGGCAATTGTCCAGAGTTACCCTATGCTTTATTTGTG
GACGACGATTTCACCGATAAAGATTACCATAAGAAGTTCCCAACAATTTACCATCTGAGAAAGATGTTA
ATGAACACTGAAGAAACCCCGGATATAAGACTGGTCTATCTAGCCATTCATCATATGATGAAACACAGG
GGACACTTCTTGCTATCAGGGGATATAAATGAAATTAAAGAATTTGGTACAACATTTTCTAAATTATTGG
AAAATATTAAAAACGAAGAATTAGATTGGAATTTAGAATTAGGCAAGGAGGAATACGCAGTTGTCGAA
TCGATTCTGAAAGATAACATGTTGAACAGATCAACGAAAAAAACAAGGCTGATCAAGGCTTTAAAAGC
GAAATCAATATGCGAAAAAGCAGTATTGAATTTGTTAGCTGGGGGGACTGTCAAGTTGTCTGATATTTTC
GGATTGGAAGAATTGAATGAAACAGAGAGACCGAAGATATCCTTCGCCGATAATGGCTACGATGATTAT
ATAGGCGAAGTCGAAAATGAGCTGGGCGAACAATTCTACATTATCGAGACTGCCAAGGCTGTTTATGAT
TGGGCGGTGTTAGTCGAAATCCTTGGCAAATACACTTCCATCTCCGAAGCTAAGGTGGCAACCTACGAA
AAGCATAAAAGTGATTTGCAATTCCTTAAGAAAATTGTCCGAAAGTACTTGACCAAAGAAGAGTACAAG
GATATTTTCGTATCAACATCGGACAAACTGAAGAATTATTCAGCTTATATTGGCATGACGAAAATTAATG
GTAAGAAAGTTGATTTGCAATCCAAGAGATGTTCTAAAGAAGAATTTTACGATTTCATTAAAAAAAATG
TCCTAAAAAAGTTGGAGGGACAACCTGAATATGAGTATTTAAAGGAAGAACTGGAAAGAGAAACTTTC
CTACCAAAGCAAGTTAATCGTGATAATGGCGTTATTCCATACCAAATACACTTGTACGAATTAAAGAAG
ATCTTGGGTAACTTGAGGGACAAAATTGATTTAATCAAGGAAAATGAAGACAAACTGGTACAATTATTT
GAATTTAGAATACCTTACTACGTGGGCCCTTTAAACAAAATAGACGATGGTAAGGAAGGGAAGTTCACA
TGGGCAGTCAGAAAGTCCAATGAAAAAATTTACCCATGGAATTTCGAAAACGTTGTAGATATTGAAGCT
TCTGCTGAGAAATTTATTAGGAGAATGACAAATAAATGCACTTATCTTATGGGGGAAGACGTGTTGCCT
AAAGATAGTTTATTATATTCAAAGTATATGGTCTTAAATGAATTAAACAATGTTAAATTAGATGGTGAAA
AACTTTCCGTCGAATTGAAACAAAGATTGTATACAGATGTATTCTGCAAATATAGAAAAGTAACTGTAA
AGAAGATTAAAAACTACCTTAAATGTGAAGGCATTATCAGCGGAAATGTTGAGATCACTGGTATCGATG
GTGATTTTAAGGCATCTTTAACCGCATATCACGACTTTAAGGAAATATTGACGGGTACTGAGCTTGCTAA
AAAAGACAAAGAGAACATTATCACCAATATCGTGCTCTTCGGAGACGACAAGAAATTATTGAAAAAGA
GATTGAACCGCCTATACCCTCAGATTACCCCTAACCAATTGAAGAAAATCTGCGCTCTGTCTTATACTGG
ATGGGGTCGTTTTAGCAAGAAGTTTCTAGAAGAAATTACTGCTCCGGATCCTGAAACTGGGGAAGTCTG
GAATATAATTACCGCGCTATGGGAATCGAATAATAATTTAATGCAATTACTATCTAATGAATACAGATTT
ATGGAAGAAGTCGAAACTTACAATATGGGAAAACAAACAAAAACTTTGAGCTACGAAACAGTAGAGAA
TATGTATGTCTCACCATCTGTAAAGCGGCAGATCTGGCAAACCTTGAAGATAGTTAAAGAATTAGAAAA
AGTGATGAAGGAAAGTCCAAAAAGGGTTTTTATTGAAATGGCCCGAGAAAAACAAGAATCTAAAAGGA
CGGAAAGTAGGAAAAAGCAACTTATAGATCTATATAAAGCCTGCAAAAATGAAGAAAAAGATTGGGTA
AAGGAATTAGGTGACCAGGAAGAGCAAAAATTGAGATCTGACAAGCTGTACTTGTATTATACGCAAAA
GGGCCGGTGTATGTATTCGGGTGAGGTAATAGAATTGAAAGATTTATGGGATAACACTAAGTATGACAT
TGACCATATTTACCCCCAGTCTAAGACAATGGACGATTCATTAAATAACCGAGTTCTTGTCAAAAAGAA
GTACAATGCCACAAAGAGCGATAAGTACCCATTGAACGAAAATATAAGACATGAACGAAAAGGTTTCT
GGAAATCATTGTTGGACGGTGGATTTATTTCCAAAGAAAAATACGAGAGATTGATTAGAAACACTGAAC
TATCTCCAGAGGAGTTAGCTGGCTTTATCGAAAGACAAATTGTTGAAACTAGACAGTCTACAAAAGCAG
TTGCAGAAATCTTAAAACAAGTATTTCCAGAATCCGAAATTGTGTACGTCAAAGCCGGAACAGTAAGTA
GATTTAGAAAAGACTTTGAATTATTGAAAGTACGAGAGGTTAACGACCTACATCATGCTAAGGATGCTT
ATTTAAATATAGTCGTTGGTAATTCGTATTACGTGAAATTCACAAAAAACGCATCTTGGTTCATCAAGGA
GAATCCTGGTAGGACATACAACTTGAAAAAGATGTTTACATCAGGATGGAATATCGAAAGAAATGGTGA
GGTTGCGTGGGAGGTAGGCAAGAAGGGAACCATTGTTACTGTAAAGCAAATTATGAATAAAAACAATA
TACTTGTTACGAGACAGGTGCACGAAGCCAAAGGAGGGTTGTTTGACCAGCAAATCATGAAGAAAGGT
AAAGGTCAGATAGCAATAAAAGAGACTGATGAGCGTTTAGCTAGTATAGAAAAATATGGGGGCTACAA
TAAGGCAGCTGGTGCTTACTTCATGTTGGTCGAATCAAAGGATAAAAAAGGGAAGACGATCCGGACCAT
AGAGTTTATCCCTCTGTACTTGAAGAATAAGATTGAGTCTGACGAAAGCATCGCATTGAATTTCTTGGAA
AAGGGGCGCGGTCTAAAGGAGCCAAAAATATTGTTAAAGAAAATTAAAATAGACACCCTATTCGACGTC
GATGGGTTTAAGATGTGGCTTAGTGGTCGTACTGGGGACAGATTATTATTCAAGTGTGCCAATCAGTTAA
TCCTTGACGAGAAAATCATTGTTACAATGAAAAAAATTGTTAAGTTTATTCAAAGGCGACAAGAAAATA
GAGAACTAAAGTTGAGTGATAAGGATGGAATCGATAATGAAGTGTTAATGGAGATTTATAACACTTTTG
TCGACAAATTGGAGAATACGGTGTACAGAATTAGGCTATCTGAACAGGCTAAAACCCTAATTGATAAAC
AGAAGGAGTTTGAGCGACTTTCTCTTGAAGACAAATCTTCAACTCTTTTCGAGATCCTACATATCTTTCA
GTGTCAATCTTCTGCAGCTAATTTGAAAATGATTGGAGGTCCTGGTAAGGCTGGTATATTAGTCATGAAC
AACAACATATCTAAGTGTAATAAGATTAGTATAATTAACCAATCACCGACAGGTATCTTTGAAAATGAA
ATTGATTTACTTAAA
SEQ ID NO: 136
ATGAAATCATTCGACTCGTTCACCAACTTGTACTCCCTGTCTAAAACATTGAAATTTGAAATGCGACCTG
TTGGTAACACCCAAAAGATGTTAGATAATGCAGGAGTTTTCGAAAAGGATAAACTGATCCAGAAAAAAT
ACGGTAAAACGAAACCATATTTCGATAGGTTGCATCGGGAATTTATAGAAGAAGCTTTGACTGGTGTAG
AATTAATTGGCTTAGATGAGAATTTCCGTACTCTAGTCGATTGGCAAAAAGATAAAAAGAACAATGTTG
CCATGAAGGCATACGAAAATAGTCTACAAAGACTAAGAACAGAGATCGGGAAAATTTTCAATTTGAAG
GCAGAAGACTGGGTGAAGAACAAATATCCAATATTGGGTCTTAAGAATAAGAATACTGATATATTGTTC
GAGGAGGCCGTTTTCGGTATTCTTAAGGCAAGATATGGTGAAGAGAAAGACACGTTTATTGAAGTTGAG
GAGATTGATAAAACCGGTAAGTCCAAAATCAACCAGATCTCTATCTTCGACAGTTGGAAGGGCTTCACT
GGTTATTTTAAGAAGTTCTTCGAAACTAGGAAGAACTTCTATAAAAACGATGGTACTTCCACGGCTATTG
CTACAAGAATTATCGACCAAAACCTTAAGCGTTTTATTGATAACCTATCAATTGTTGAAAGTGTTCGACA
GAAAGTAGATTTGGCTGAAACTGAAAAATCTTTTAGTATCTCCTTATCCCAGTTTTTCTCTATAGATTTTT
ATAATAAATGTTTGCTGCAAGATGGCATTGACTACTATAATAAAATAATTGGTGGAGAGACATTGAAAA
ACGGAGAGAAGCTGATTGGCCTTAATGAGTTGATAAATCAATATAGACAAAATAATAAGGACCAGAAA
ATCCCTTTCTTTAAATTGCTAGACAAACAGATTTTGTCTGAAAAGATCCTATTCTTGGATGAAATAAAGA
ACGATACTGAATTGATTGAAGCTTTGTCCCAGTTTGCTAAAACAGCTGAAGAAAAGACAAAGATTGTGA
AAAAATTGTTTGCTGATTTCGTAGAAAACAATTCTAAATATGATCTAGCCCAGATTTATATAAGTCAAGA
AGCTTTCAATACAATAAGTAATAAGTGGACAAGTGAAACAGAAACTTTTGCTAAGTATTTATTCGAAGC
CATGAAGTCTGGTAAACTTGCCAAATACGAAAAAAAAGATAACAGTTATAAATTTCCAGACTTTATAGC
CCTTTCACAGATGAAGTCTGCCTTATTGTCGATATCCTTAGAAGGTCATTTTTGGAAGGAAAAATATTAT
AAGATAAGCAAGTTCCAAGAAAAGACTAATTGGGAACAATTTTTGGCTATATTTCTATATGAGTTCAATT
CATTATTTTCCGATAAAATCAACACTAAGGATGGAGAGACTAAGCAAGTTGGCTACTATTTGTTCGCAA
AAGATCTGCACAATTTGATTCTATCAGAACAAATAGATATACCAAAAGATTCAAAGGTAACTATAAAGG
ATTTCGCAGATTCCGTCCTCACCATTTATCAAATGGCTAAATATTTTGCCGTTGAAAAAAAGAGAGCGTG
GTTAGCAGAATACGAGTTGGACTCGTTTTATACTCAGCCAGATACTGGATACTTGCAATTCTACGATAAT
GCATACGAAGACATTGTACAGGTATACAATAAACTTAGAAATTACTTAACCAAGAAGCCCTACAGTGAA
GAAAAATGGAAGCTGAACTTTGAAAATTCGACTTTGGCAAATGGTTGGGATAAAAATAAAGAAAGTGA
CAACTCCGCAGTGATTTTGCAAAAGGGTGGGAAATATTACTTGGGTTTAATCACAAAAGGCCACAATAA
GATTTTTGATGATAGATTTCAAGAAAAATTCATAGTTGGTATAGAAGGTGGCAAATACGAGAAAATTGT
CTATAAATTCTTCCCTGATCAAGCCAAAATGTTCCCAAAAGTTTGCTTTTCTGCTAAAGGATTGGAGTTTT
TCCGGCCTAGCGAGGAGATCCTTCGTATCTACAACAATGCTGAATTCAAAAAAGGAGAAACCTATAGCA
TAGATTCTATGCAAAAACTGATAGATTTTTATAAGGATTGTTTAACAAAGTACGAAGGCTGGGCCTGCTA
TACATTTAGACATTTAAAGCCCACAGAAGAATACCAAAATAACATTGGTGAATTCTTTCGGGACGTTGC
CGAAGACGGCTATAGGATCGATTTTCAAGGTATCTCAGATCAATATATCCACGAAAAGAACGAGAAGGG
TGAGCTGCACCTTTTCGAAATTCATAATAAGGACTGGAATTTGGATAAGGCGAGAGATGGTAAATCGAA
GACCACTCAAAAGAACTTGCATACTTTATATTTTGAGTCCTTGTTTTCTAATGATAACGTCGTCCAAAATT
TTCCAATAAAGTTGAATGGACAAGCGGAAATTTTCTATCGGCCTAAGACAGAGAAAGACAAATTAGAAT
CAAAGAAAGATAAAAAGGGAAATAAAGTCATTGATCACAAACGATACTCTGAGAATAAAATATTTTTCC
ACGTACCATTGACACTCAACAGGACTAAGAATGACTCTTATAGATTTAATGCTCAGATTAATAATTTTTT
GGCAAATAACAAGGATATTAACATAATTGGGGTGGATAGAGGTGAAAAGCACTTGGTATATTACTCTGT
CATCACTCAGGCTTCTGATATATTGGAAAGCGGGTCTCTAAATGAATTGAACGGTGTTAACTACGCCGA
AAAGCTAGGTAAAAAAGCTGAAAACAGAGAGCAGGCTCGGCGCGATTGGCAAGATGTTCAAGGAATTA
AAGACCTTAAAAAAGGCTACATTAGTCAAGTAGTTAGAAAGTTAGCCGATCTTGCTATTAAACATAACG
CAATCATTATTCTGGAGGACCTAAATATGCGTTTTAAGCAAGTTAGGGGTGGCATAGAAAAAAGTATTT
ATCAGCAGCTTGAGAAGGCTTTGATAGATAAGTTATCGTTCCTAGTTGACAAAGGTGAAAAAAATCCTG
AACAAGCTGGTCATCTGTTGAAAGCTTATCAGCTGAGCGCACCTTTTGAAACATTTCAAAAAATGGGAA
AACAAACAGGTATTATTTTCTATACTCAAGCGAGTTATACAAGTAAATCTGACCCAGTGACAGGATGGA
GACCACACCTTTATCTAAAATATTTTTCTGCTAAAAAGGCCAAAGATGACATCGCTAAGTTTACAAAAAT
AGAATTTGTCAACGATAGATTTGAATTGACTTACGATATTAAAGATTTTCAGCAAGCAAAAGAATACCC
AAATAAGACAGTGTGGAAAGTATGCTCCAATGTGGAGAGATTTAGATGGGATAAAAATCTCAATCAAA
ACAAGGGTGGTTACACACATTATACTAATATAACTGAAAATATTCAAGAATTGTTTACTAAGTACGGAA
TTGACATAACCAAAGACTTACTAACTCAGATTTCAACTATTGACGAAAAACAAAATACCTCATTTTTCCG
CGACTTTATTTTTTATTTCAACTTGATCTGTCAAATTCGTAACACGGATGATTCCGAAATTGCCAAGAAG
AACGGAAAAGATGATTTCATCCTATCTCCAGTGGAACCATTTTTTGACTCAAGAAAAGATAATGGTAAT
AAGTTGCCTGAGAACGGAGATGATAACGGCGCTTATAATATCGCTCGGAAGGGTATTGTAATTCTTAAT
AAAATATCTCAGTACTCTGAAAAGAACGAAAACTGCGAGAAAATGAAGTGGGGCGACTTGTATGTATCT
AATATAGATTGGGATAATTTCGTTACTCAAGCCAACGCGAGACATTGA
SEQ ID NO: 137
ATGGAAAATTTTAAAAACCTATATCCAATTAATAAGACACTTAGATTCGAGCTTAGGCCATACGGCAAA
ACACTAGAAAATTTTAAGAAGTCAGGCCTATTAGAAAAAGACGCCTTTAAGGCAAATTCCAGAAGATCA
ATGCAGGCAATTATTGATGAGAAATTTAAAGAGACTATCGAGGAAAGGTTGAAATACACTGAATTCTCT
GAGTGCGATCTGGGAAACATGACTTCCAAGGATAAAAAGATTACCGATAAGGCTGCTACCAACCTCAAA
AAGCAAGTCATCTTATCGTTTGATGATGAAATTTTTAATAACTACTTAAAGCCGGACAAAAACATTGACG
CCCTATTCAAAAATGATCCGTCCAACCCCGTAATTTCAACTTTTAAGGGTTTTACCACGTACTTTGTAAAT
TTTTTTGAGATTCGTAAACATATCTTCAAAGGAGAATCGTCGGGTTCCATGGCCTATAGGATAATTGATG
AAAATCTTACGACTTACTTAAACAATATCGAAAAGATAAAAAAGTTACCAGAAGAATTAAAGTCTCAAT
TGGAAGGTATTGACCAAATAGACAAATTAAATAACTATAATGAGTTCATAACTCAAAGCGGTATCACAC
ATTACAATGAAATTATCGGTGGTATATCTAAAAGTGAGAACGTAAAAATACAGGGAATAAACGAGGGG
ATCAATCTATACTGTCAGAAGAATAAAGTAAAATTACCAAGACTAACGCCATTATACAAAATGATTCTG
TCTGATAGAGTTTCCAACTCGTTCGTGCTTGATACTATAGAAAATGATACTGAATTAATTGAGATGATTA
GCGACTTGATTAATAAAACAGAAATATCTCAAGACGTAATAATGTCAGACATTCAGAACATTTTCATAA
AATATAAACAGCTTGGTAATTTACCGGGGATAAGTTACTCTAGCATCGTGAATGCTATTTGCTCCGATTA
TGACAATAATTTTGGTGACGGAAAAAGAAAAAAATCATATGAGAACGATAGGAAGAAACACCTTGAAA
CAAACGTATACTCAATTAACTATATATCGGAACTGTTAACAGACACCGATGTATCATCTAATATAAAAAT
GAGATATAAGGAACTTGAACAAAATTACCAGGTGTGTAAGGAGAATTTCAATGCTACCAACTGGATGAA
CATTAAGAATATTAAACAGAGTGAAAAGACAAACTTGATTAAAGATCTACTAGATATACTGAAATCAAT
ACAGAGATTCTACGATCTGTTTGATATAGTTGATGAAGACAAAAATCCTAGTGCTGAGTTTTACACGTGG
CTAAGTAAAAATGCGGAAAAGTTAGATTTCGAGTTCAACTCTGTTTATAATAAATCTAGGAATTATTTAA
CTAGAAAGCAGTATTCTGATAAAAAGATAAAATTGAACTTCGACTCCCCTACGTTGGCAAAGGGTTGGG
ATGCAAACAAAGAAATCGATAACTCCACCATAATAATGCGTAAGTTTAACAATGATAGGGGGGATTACG
ATTATTTTTTGGGAATTTGGAACAAATCTACCCCAGCGAATGAAAAAATTATTCCCCTTGAAGACAATGG
TCTTTTTGAAAAAATGCAGTATAAATTATATCCAGACCCATCCAAGATGCTTCCAAAGCAATTTCTGTCA
AAAATTTGGAAGGCTAAACACCCTACTACTCCTGAATTTGATAAGAAGTATAAGGAGGGCCGACACAAA
AAGGGTCCAGATTTTGAAAAAGAATTCCTGCATGAATTGATAGATTGTTTTAAGCATGGTTTGGTAAATC
ATGATGAAAAATATCAGGATGTCTTTGGATTCAATTTGAGAAATACAGAGGATTACAACTCATATACAG
AATTTCTCGAGGACGTCGAACGTTGCAATTATAATCTCAGTTTCAACAAGATCGCAGACACTTCAAACTT
AATTAACGACGGAAAATTGTACGTTTTTCAAATCTGGTCGAAAGACTTTAGTATTGATTCAAAGGGTACA
AAAAACCTAAATACAATATATTTCGAAAGTCTATTCTCGGAAGAAAACATGATCGAAAAAATGTTCAAA
CTGTCAGGCGAAGCTGAAATATTCTACCGTCCCGCAAGCCTTAATTATTGTGAGGATATCATTAAAAAA
GGACATCACCATGCAGAGTTAAAAGATAAATTCGATTACCCAATAATTAAAGATAAAAGATACTCCCAG
GATAAGTTCTTTTTCCATGTACCTATGGTTATTAACTACAAGTCGGAAAAACTAAACTCGAAGTCATTAA
ATAATAGAACTAACGAGAACTTGGGACAATTCACACATATAATTGGTATTGATCGTGGCGAAAGACATT
TAATATATCTGACTGTTGTTGATGTTTCAACAGGAGAAATTGTTGAACAGAAACATCTTGATGAAATTAT
AAACACAGATACAAAAGGCGTTGAGCATAAAACTCATTATCTAAATAAATTGGAGGAAAAGTCGAAGA
CTCGCGATAACGAGAGAAAGAGTTGGGAAGCAATTGAAACCATAAAAGAGCTTAAAGAAGGTTACATT
AGTCACGTCATCAATGAAATACAAAAGTTACAAGAAAAGTATAACGCTTTGATTGTAATGGAAAATCTA
AATTATGGTTTTAAGAATTCAAGAATCAAAGTCGAAAAGCAGGTCTATCAGAAATTTGAAACGGCACTT
ATTAAAAAGTTTAACTACATTATTGATAAAAAGGACCCAGAAACTTATATTCATGGTTACCAACTGACG
AACCCAATCACAACATTGGACAAAATTGGAAACCAAAGTGGAATTGTTTTATACATTCCAGCTTGGAAT
ACATCCAAAATAGACCCTGTCACGGGGTTTGTCAACTTGTTATATGCCGACGATTTAAAGTATAAAAACC
AAGAACAAGCAAAGTCTTTTATTCAAAAGATTGATAATATTTATTTCGAAAACGGTGAATTTAAATTCGA
CATAGATTTTTCTAAATGGAACAACCGTTATTCAATAAGTAAAACTAAATGGACACTCACCTCATACGGC
ACTCGTATCCAAACCTTTCGGAATCCCCAAAAAAATAACAAATGGGATTCTGCAGAATACGACTTGACC
GAGGAATTTAAATTAATTCTTAATATAGACGGTACACTCAAAAGTCAAGACGTGGAGACATACAAGAAG
TTTATGTCGTTATTCAAGCTTATGCTTCAGTTGAGGAACTCCGTTACAGGCACTGATATTGATTACATGAT
TTCACCAGTAACGGATAAGACTGGGACTCATTTCGATTCTAGGGAAAATATTAAAAATTTACCTGCTGAC
GCAGACGCAAACGGCGCATACAATATAGCAAGAAAAGGGATTATGGCCATTGAGAATATTATGAATGG
CATATCAGATCCATTAAAGATAAGCAATGAAGACTACTTAAAATACATTCAGAATCAGCAAGAATAA
SEQ ID NO: 138
ATGACCCAGTTTGAAGGTTTCACCAATTTGTACCAAGTAAGTAAAACCTTGAGGTTCGAATTGATCCCAC
AGGGCAAGACATTGAAGCATATTCAAGAGCAAGGATTTATAGAAGAAGATAAAGCGAGAAACGATCAC
TATAAAGAGTTAAAACCCATTATTGACAGGATCTATAAAACATACGCCGATCAATGCCTTCAATTAGTG
CAATTAGATTGGGAAAACTTGAGCGCTGCCATCGATTCCTACAGGAAGGAAAAAACAGAAGAAACAAG
AAATGCCTTAATCGAGGAACAAGCAACCTATAGAAACGCTATACACGATTACTTCATCGGTAGAACTGA
TAATCTAACAGATGCAATAAATAAGAGACATGCTGAGATATATAAAGGACTATTTAAAGCAGAATTATT
CAACGGAAAGGTGTTGAAACAGTTAGGTACCGTTACAACTACTGAGCATGAAAATGCCTTGCTGAGAAG
CTTTGACAAGTTTACTACCTACTTTTCGGGTTTCTACGAAAATCGCAAAAATGTATTTTCTGCGGAAGAT
ATTTCAACTGCAATCCCTCATAGGATTGTTCAAGATAATTTCCCTAAGTTTAAAGAGAACTGTCACATTT
TTACAAGGTTAATTACTGCGGTTCCAAGTCTAAGAGAACATTTTGAGAATGTAAAAAAAGCGATTGGTA
TATTTGTATCCACTAGCATTGAAGAGGTTTTCAGCTTCCCTTTTTATAACCAATTACTTACCCAAACACAG
ATCGACCTGTACAACCAATTGTTAGGTGGTATATCGAGGGAGGCTGGTACGGAAAAGATTAAAGGATTA
AATGAAGTTCTTAATTTGGCCATACAAAAAAATGATGAAACCGCGCACATTATCGCATCTTTACCACATA
GGTTTATACCGTTATTCAAGCAAATATTATCTGATCGTAATACCTTATCGTTCATATTAGAGGAGTTTAA
ATCTGACGAAGAAGTTATACAATCTTTTTGCAAGTATAAGACGCTATTGAGAAACGAAAACGTTCTGGA
AACAGCCGAAGCACTGTTCAATGAATTAAACAGTATCGACTTGACTCATATTTTTATATCGCATAAAAAG
TTGGAGACAATTTCTTCAGCATTGTGCGATCACTGGGACACTTTAAGGAACGCACTATATGAACGTAGG
ATCTCAGAATTGACAGGTAAGATAACGAAGTCTGCTAAAGAGAAAGTGCAGAGATCCCTAAAACACGA
GGATATAAATTTGCAGGAGATAATTTCAGCTGCAGGTAAAGAGTTGTCTGAAGCGTTCAAGCAAAAGAC
TTCCGAAATCTTGTCACACGCACACGCCGCATTAGATCAACCTTTACCCACTACTTTGAAAAAACAAGAA
GAGAAGGAGATATTAAAATCACAACTTGATTCTTTACTTGGCCTTTATCATCTTTTAGATTGGTTCGCTGT
TGACGAGAGCAATGAAGTGGATCCAGAGTTTTCCGCAAGATTGACCGGTATAAAGTTGGAAATGGAACC
TTCGTTATCATTTTACAACAAAGCTAGGAACTATGCTACAAAAAAACCTTATTCTGTCGAAAAATTTAAA
CTGAACTTCCAAATGCCTACTCTAGCAAGTGGCTGGGATGTTAATAAAGAAAAGAACAATGGCGCTATT
TTGTTTGTAAAAAATGGCCTATACTATCTTGGAATTATGCCTAAACAAAAAGGTCGCTACAAGGCTTTGT
CATTTGAACCTACTGAAAAGACTAGCGAAGGTTTCGATAAGATGTATTACGATTATTTCCCGGATGCCGC
TAAAATGATCCCCAAGTGCTCTACTCAATTGAAGGCAGTAACTGCTCATTTCCAAACGCATACCACGCCA
ATACTGCTTTCTAACAACTTTATAGAACCACTAGAAATAACGAAAGAAATTTACGACCTAAATAACCCA
GAGAAAGAACCAAAAAAGTTCCAGACGGCCTACGCCAAAAAGACAGGGGACCAAAAAGGTTACCGCG
AGGCGTTATGTAAATGGATTGATTTTACTAGGGACTTTTTATCAAAATACACTAAAACGACGTCTATTGA
TCTTAGCTCCTTACGCCCGTCCTCCCAATACAAGGATCTAGGTGAGTATTACGCAGAGTTGAACCCGCTA
TTATACCATATTTCCTTCCAAAGGATTGCTGAAAAGGAAATTATGGACGCTGTTGAAACTGGGAAATTGT
ACCTGTTTCAGATTTATAATAAGGACTTCGCAAAGGGTCACCATGGTAAGCCTAACCTTCACACTTTGTA
CTGGACCGGACTATTCTCGCCTGAAAATTTGGCTAAAACAAGTATCAAGTTAAACGGTCAGGCCGAGTT
ATTTTATAGACCCAAATCTAGAATGAAAAGAATGGCCCATAGATTAGGCGAAAAGATGTTAAACAAGA
AATTAAAGGACCAAAAAACCCCGATACCAGACACTCTATACCAAGAACTGTACGACTATGTGAATCACA
GGCTTAGTCACGATTTATCAGATGAAGCGAGGGCTTTATTGCCAAATGTCATCACCAAGGAAGTATCAC
ATGAAATAATTAAGGATAGAAGGTTCACATCTGATAAATTCTTTTTTCATGTCCCAATTACATTGAATTA
TCAAGCAGCGAACTCACCATCTAAATTTAATCAGCGCGTCAACGCCTATTTGAAAGAACATCCCGAAAC
ACCAATCATCGGCATAGATCGAGGTGAGAGAAACTTAATATATATAACTGTGATTGATTCTACAGGAAA
AATCCTGGAGCAACGATCTTTAAATACCATACAACAGTTTGATTATCAAAAAAAGTTGGATAACAGAGA
AAAAGAACGTGTTGCCGCTAGGCAGGCTTGGTCTGTGGTAGGAACAATTAAGGACTTAAAGCAGGGCTA
TCTGTCCCAAGTTATTCATGAAATAGTCGATCTGATGATACATTATCAGGCAGTTGTCGTGTTGGAAAAT
TTGAATTTTGGCTTTAAATCAAAAAGAACTGGCATAGCAGAAAAAGCTGTGTACCAGCAGTTTGAAAAG
ATGTTAATCGATAAGCTAAACTGCCTTGTTCTTAAAGATTACCCCGCAGAAAAAGTAGGTGGTGTTCTTA
ATCCATATCAGTTGACAGACCAATTTACATCCTTTGCGAAAATGGGTACGCAAAGCGGGTTCTTATTCTA
CGTACCGGCCCCCTATACTTCTAAGATCGACCCACTAACAGGTTTTGTGGACCCTTTTGTTTGGAAGACG
ATAAAGAACCACGAGTCACGCAAACATTTCTTAGAGGGCTTTGATTTCTTGCACTACGACGTGAAAACT
GGTGATTTTATCTTACACTTTAAAATGAACAGAAATCTCTCTTTCCAACGTGGACTGCCCGGATTCATGC
CGGCTTGGGACATCGTTTTTGAAAAGAATGAAACGCAGTTTGACGCCAAAGGTACACCATTTATAGCGG
GTAAGAGAATTGTGCCGGTCATAGAAAACCATAGATTTACAGGTAGATATAGGGATCTGTACCCTGCTA
ATGAATTGATTGCATTACTCGAAGAGAAAGGAATTGTGTTTCGAGATGGATCGAATATTTTACCTAAGTT
GTTGGAAAATGATGATTCACACGCAATTGATACTATGGTTGCCCTCATAAGATCGGTATTGCAAATGAG
AAACTCAAATGCTGCTACGGGAGAGGATTATATAAACAGCCCCGTTCGCGATCTTAATGGTGTTTGTTTT
GATTCACGTTTTCAGAACCCCGAATGGCCAATGGATGCCGACGCAAACGGAGCATATCATATTGCTCTT
AAAGGCCAACTACTATTAAATCACTTAAAGGAATCCAAAGACCTAAAATTGCAAAACGGGATATCTAAT
CAGGATTGGCTGGCTTACATACAAGAACTACGTAACTAG
SEQ ID NO: 139
ATGGCCGTTAAGTCAATCAAAGTGAAACTTAGACTGGATGACATGCCAGAGATTCGTGCGGGGTTATGG
AAACTTCATAAGGAAGTTAACGCAGGGGTAAGATATTATACCGAATGGTTATCATTACTTCGACAAGAG
AATTTGTACAGAAGGTCCCCGAACGGCGACGGTGAGCAAGAATGCGATAAGACGGCTGAAGAATGTAA
GGCAGAACTTTTGGAGCGCCTGAGAGCCCGTCAGGTTGAAAATGGCCATAGAGGTCCTGCGGGATCTGA
TGATGAGCTTTTACAGCTAGCTAGACAATTGTATGAATTGTTGGTCCCTCAGGCTATTGGGGCTAAAGGA
GACGCTCAACAAATCGCCAGAAAGTTCTTGTCACCTCTGGCTGACAAAGATGCCGTGGGAGGATTAGGT
ATCGCTAAAGCAGGTAATAAACCAAGATGGGTTAGAATGAGAGAAGCAGGCGAACCTGGTTGGGAAGA
AGAGAAAGAAAAGGCCGAAACTAGAAAAAGCGCTGACAGAACCGCAGATGTTTTACGGGCCTTGGCTG
ATTTTGGACTGAAGCCTTTGATGAGAGTGTATACTGATTCAGAAATGTCTTCCGTTGAATGGAAGCCCCT
AAGGAAGGGACAAGCGGTCAGAACCTGGGATAGGGATATGTTTCAACAGGCTATTGAAAGGATGATGT
CATGGGAATCCTGGAATCAAAGAGTAGGTCAAGAATACGCTAAACTGGTCGAACAAAAGAATAGATTT
GAACAAAAAAATTTTGTAGGTCAAGAACATTTAGTACATTTGGTTAATCAACTTCAACAAGATATGAAA
GAGGCATCTCCTGGTTTGGAATCAAAAGAACAAACAGCACACTATGTTACCGGCCGAGCTTTGCGAGGT
TCTGACAAAGTATTTGAAAAGTGGGGGAAATTAGCTCCCGATGCCCCCTTTGATCTATATGATGCTGAAA
TTAAAAACGTTCAAAGAAGGAACACTAGACGTTTTGGATCCCATGATCTTTTTGCAAAGCTAGCTGAGC
CAGAATACCAGGCTCTATGGCGTGAAGACGCCTCGTTTTTGACTAGATACGCAGTATACAATTCAATACT
CAGAAAACTAAACCATGCCAAGATGTTTGCTACATTCACCCTGCCCGATGCTACCGCTCATCCTATTTGG
ACTAGATTTGACAAGTTGGGGGGGAATCTACATCAGTACACATTTTTATTTAATGAATTCGGTGAAAGA
AGACACGCTATTAGATTCCACAAGCTCCTAAAGGTTGAAAACGGCGTTGCGAGAGAAGTTGATGATGTA
ACAGTTCCCATTTCTATGTCGGAGCAATTGGATAATCTATTGCCTAGAGACCCTAATGAACCAATTGCTT
TGTACTTTCGTGACTACGGTGCAGAACAACACTTTACAGGTGAATTCGGCGGAGCCAAGATTCAATGTA
GACGTGATCAACTCGCACACATGCATAGAAGAAGAGGCGCTCGTGATGTTTATTTAAATGTGTCTGTTA
GAGTTCAATCCCAATCGGAGGCTAGAGGTGAAAGAAGGCCACCATACGCAGCAGTTTTTAGGTTAGTAG
GTGATAATCATAGGGCATTTGTCCACTTCGACAAATTAAGTGATTATTTAGCAGAGCACCCTGATGATGG
AAAGTTGGGCAGTGAGGGATTATTAAGTGGGTTGAGGGTAATGTCTGTAGATCTTGGTCTTCGTACTTCT
GCGAGTATCTCTGTCTTTAGAGTAGCACGTAAGGATGAGTTGAAACCTAATAGCAAAGGAAGAGTCCCG
TTTTTTTTTCCTATTAAGGGTAACGATAACCTGGTGGCCGTGCATGAAAGATCACAACTTTTGAAATTGC
CAGGAGAAACGGAGTCCAAGGACTTGAGGGCAATTAGAGAGGAACGTCAGCGTACATTGCGACAGCTG
AGAACTCAATTGGCTTATTTGAGGTTGTTGGTTAGGTGTGGTTCCGAGGATGTTGGCAGAAGAGAAAGG
TCTTGGGCCAAATTGATAGAACAACCAGTGGACGCCGCAAATCACATGACACCAGATTGGAGAGAAGCT
TTCGAAAATGAACTCCAGAAATTAAAGAGCCTACATGGCATATGCTCTGATAAAGAGTGGATGGATGCC
GTATACGAATCCGTTCGTAGAGTCTGGCGCCACATGGGTAAGCAAGTACGGGACTGGAGAAAGGATGTT
CGTTCCGGCGAAAGACCGAAGATAAGGGGGTATGCAAAGGACGTTGTAGGCGGTAATTCTATTGAACA
GATTGAGTATTTGGAAAGGCAGTACAAATTTCTTAAATCCTGGAGCTTCTTCGGCAAAGTGTCAGGACA
AGTCATCAGGGCTGAAAAAGGTTCCAGATTTGCTATTACGCTAAGGGAACATATTGATCATGCGAAAGA
AGATAGACTGAAAAAACTAGCAGATAGAATAATTATGGAAGCACTTGGTTACGTCTATGCACTTGATGA
AAGAGGCAAGGGGAAATGGGTAGCTAAATACCCGCCTTGTCAACTTATTTTATTAGAAGAATTAAGCGA
GTACCAATTTAACAACGATAGACCTCCATCCGAAAATAATCAGCTGATGCAATGGTCCCATAGGGGTGT
TTTTCAAGAATTGATAAATCAAGCTCAAGTACACGATTTGCTGGTAGGTACTATGTACGCAGCGTTTTCG
AGCCGTTTTGATGCAAGAACTGGTGCCCCAGGTATCAGATGTCGACGTGTTCCGGCCAGATGTACACAG
GAACATAACCCTGAGCCATTTCCGTGGTGGCTTAATAAGTTTGTTGTCGAGCACACATTAGACGCATGCC
CTCTGAGAGCAGATGACCTTATACCCACTGGAGAAGGCGAAATATTTGTTAGTCCATTCTCTGCAGAAG
AAGGTGACTTTCACCAGATACATGCAGACTTAAATGCAGCACAGAATCTCCAACAAAGGTTGTGGTCGG
ATTTTGATATTTCGCAAATAAGACTAAGATGCGATTGGGGAGAGGTTGATGGAGAATTGGTGCTGATTC
CAAGATTAACCGGAAAGCGAACTGCCGATTCCTATTCTAACAAGGTGTTTTACACAAATACTGGTGTTAC
CTATTACGAAAGAGAAAGGGGTAAGAAGAGACGTAAAGTATTTGCTCAAGAAAAATTGTCAGAAGAGG
AGGCAGAACTGTTAGTAGAAGCAGACGAAGCCAGAGAAAAATCAGTTGTGCTTATGCGTGACCCTTCCG
GCATTATAAATCGTGGTAATTGGACACGACAAAAAGAATTTTGGTCTATGGTCAATCAACGTATCGAAG
GCTACCTAGTTAAGCAAATCAGGTCTAGGGTTCCACTACAAGATAGCGCATGTGAAAATACGGGTGATA
TATAA
SEQ ID NO: 140
ATGGCTACTAGATCTTTCATTTTAAAAATTGAACCTAATGAAGAAGTGAAGAAGGGTCTCTGGAAAACT
CACGAAGTACTTAATCATGGCATTGCCTATTATATGAATATCCTGAAGCTTATTCGTCAAGAAGCTATAT
ACGAGCATCATGAGCAAGATCCTAAGAACCCTAAGAAAGTAAGCAAAGCGGAAATTCAGGCTGAATTG
TGGGACTTCGTCTTGAAGATGCAGAAGTGTAACAGTTTTACGCACGAAGTTGATAAAGATGTGGTGTTT
AATATTTTGAGGGAGCTATATGAGGAGTTGGTGCCCTCGAGTGTCGAAAAAAAAGGAGAAGCTAATCAG
CTGTCAAATAAATTTTTATATCCTCTGGTGGATCCAAACTCTCAATCAGGTAAAGGCACTGCCAGTAGTG
GTCGAAAACCGAGATGGTATAATTTGAAAATCGCAGGTGATCCATCGTGGGAAGAAGAAAAAAAAAAA
TGGGAAGAAGATAAAAAAAAAGATCCCCTTGCCAAAATACTAGGTAAGCTAGCCGAGTATGGACTTAT
ACCATTATTCATTCCTTTCACGGACTCTAATGAACCAATTGTGAAGGAAATCAAATGGATGGAAAAATC
ACGTAATCAGTCTGTTAGGAGGTTGGACAAAGATATGTTTATACAGGCTCTTGAGAGGTTTTTGTCGTGG
GAGTCCTGGAATTTGAAAGTGAAAGAAGAATATGAAAAAGTGGAAAAGGAGCATAAGACGTTGGAAGA
AAGGATTAAGGAAGATATTCAGGCCTTTAAGAGTCTGGAACAGTACGAAAAAGAAAGACAGGAACAGT
TATTGAGAGATACTCTAAACACTAATGAATATAGGCTTTCCAAGAGGGGCTTGCGAGGATGGAGAGAGA
TAATTCAGAAATGGTTGAAAATGGATGAGAACGAGCCATCGGAGAAATATCTAGAGGTGTTTAAAGATT
ACCAAAGAAAGCACCCTCGCGAAGCTGGTGATTACTCTGTTTATGAATTCCTTTCGAAGAAGGAAAATC
ACTTCATCTGGCGAAATCATCCAGAGTACCCATATTTATATGCTACATTTTGCGAAATTGACAAGAAAAA
AAAAGATGCTAAACAGCAAGCGACATTCACCCTCGCTGATCCCATCAACCACCCATTATGGGTCAGGTT
CGAAGAGAGATCAGGCTCGAACCTGAATAAGTACAGGATCTTGACTGAGCAATTGCATACTGAGAAGTT
AAAAAAGAAATTGACGGTCCAACTTGACAGATTGATTTATCCCACTGAATCTGGTGGATGGGAGGAGAA
AGGTAAGGTTGATATTGTCCTATTGCCTTCTCGTCAATTTTACAACCAAATATTTCTGGACATCGAAGAG
AAGGGTAAACATGCTTTTACCTATAAGGATGAGAGTATTAAATTTCCATTGAAGGGAACGCTTGGCGGC
GCTAGAGTTCAGTTCGATAGAGATCATTTGAGAAGATACCCGCATAAAGTGGAATCTGGTAATGTAGGT
CGGATCTACTTTAACATGACGGTAAATATTGAACCTACCGAGTCACCAGTCAGTAAGTCTTTAAAGATTC
ATAGGGATGATTTCCCTAAATTTGTCAACTTCAAGCCTAAGGAACTAACCGAGTGGATCAAAGACAGTA
AAGGCAAAAAGTTAAAGAGCGGTATTGAGTCCCTGGAGATAGGTCTTAGAGTCATGTCTATCGATTTGG
GTCAAAGACAAGCAGCCGCAGCATCTATTTTCGAAGTTGTTGACCAAAAACCGGATATCGAGGGGAAAT
TATTTTTTCCAATAAAAGGAACTGAGCTATACGCTGTGCATCGCGCATCCTTCAATATAAAACTGCCAGG
AGAAACACTAGTAAAATCTAGAGAGGTCTTGCGTAAAGCACGTGAGGACAATCTCAAATTAATGAATCA
GAAGTTAAATTTCCTTAGGAACGTGTTGCATTTCCAACAGTTCGAGGACATAACTGAACGCGAGAAAAG
AGTCACTAAGTGGATCTCAAGACAAGAAAATAGTGATGTGCCATTAGTGTATCAAGACGAACTTATTCA
AATAAGAGAGCTAATGTATAAACCATATAAAGACTGGGTGGCATTCTTAAAACAATTACACAAGCGGCT
TGAAGTAGAAATAGGAAAAGAAGTAAAGCATTGGAGGAAGAGTCTGTCCGATGGTCGCAAAGGCCTGT
ACGGGATATCACTTAAAAATATTGATGAAATTGACAGAACACGAAAATTTTTGTTAAGATGGTCATTGA
GACCAACCGAACCAGGTGAGGTTAGAAGGTTGGAACCAGGCCAAAGGTTTGCCATCGATCAATTAAACC
ATCTTAACGCACTGAAAGAAGATAGATTGAAGAAGATGGCGAACACTATTATTATGCACGCTCTAGGTT
ATTGCTATGATGTGAGAAAGAAAAAATGGCAAGCCAAGAACCCTGCATGCCAAATTATTTTGTTTGAAG
ATCTTTCTAATTACAATCCATACGAAGAGCGTTCACGTTTTGAAAACTCTAAATTGATGAAATGGTCTAG
AAGAGAGATTCCGAGACAGGTCGCTCTACAAGGGGAGATTTACGGTCTTCAAGTCGGTGAGGTTGGTGC
TCAATTTTCTTCCAGATTTCATGCAAAAACTGGGTCTCCAGGCATTAGGTGTTCGGTCGTTACTAAGGAA
AAGTTACAGGACAACCGTTTCTTCAAAAATTTGCAACGTGAAGGCCGTTTAACACTTGATAAGATAGCT
GTCCTTAAGGAAGGCGATCTGTACCCAGATAAAGGTGGTGAGAAATTCATATCTTTGAGTAAAGACAGG
AAACTGGTTACAACACACGCCGACATTAACGCAGCTCAGAACTTGCAAAAGAGATTCTGGACAAGGACC
CACGGCTTCTATAAGGTGTACTGTAAAGCTTATCAAGTAGATGGACAAACGGTTTATATTCCTGAATCAA
AGGACCAGAAACAAAAAATTATAGAAGAATTTGGTGAAGGATACTTTATCTTGAAGGATGGAGTTTATG
AGTGGGGCAATGCAGGTAAGTTAAAGATAAAGAAAGGTTCATCAAAGCAATCAAGTAGCGAACTGGTC
GATTCGGATATTTTAAAGGATAGCTTTGATCTAGCTAGTGAATTGAAGGGAGAAAAGTTAATGTTATAC
AGAGATCCCAGTGGGAATGTATTTCCATCTGATAAGTGGATGGCCGCCGGAGTGTTTTTTGGCAAATTAG
AGAGAATCTTGATTTCTAAACTGACCAATCAATACTCAATTTCGACCATCGAAGACGACTCTTCAAAACA
ATCCATGTGA
SEQ ID NO: 141
ATGCCTACTCGCACCATCAATCTGAAGTTAGTTTTGGGGAAGAACCCAGAAAATGCGACTCTAAGACGG
GCACTATTCTCTACACATAGACTTGTCAACCAAGCGACTAAGAGAATTGAAGAATTTTTACTGTTGTGTA
GAGGAGAAGCTTATCGTACCGTAGATAATGAAGGTAAAGAAGCTGAGATCCCACGCCATGCTGTTCAAG
AAGAGGCGCTTGCTTTTGCAAAAGCTGCACAACGACATAACGGCTGTATCTCCACATATGAGGACCAGG
AAATCTTGGATGTGCTTAGACAATTGTATGAAAGATTAGTACCTAGCGTCAATGAAAACAACGAGGCTG
GGGATGCCCAAGCCGCTAACGCTTGGGTGAGTCCATTAATGAGTGCAGAGTCCGAAGGTGGACTATCGG
TCTATGATAAAGTGTTAGACCCGCCGCCAGTATGGATGAAACTCAAAGAAGAGAAAGCGCCTGGTTGGG
AAGCTGCTTCTCAGATTTGGATACAGTCCGACGAAGGTCAATCGCTGCTAAATAAACCGGGTAGCCCAC
CACGTTGGATTAGAAAACTTAGATCTGGTCAACCGTGGCAAGATGACTTCGTTTCAGACCAAAAAAAAA
AGCAAGATGAACTAACGAAAGGTAACGCACCACTCATAAAACAATTGAAAGAGATGGGCCTCTTGCCTT
TAGTTAATCCCTTTTTTAGACATTTGTTGGATCCCGAGGGTAAGGGTGTATCCCCATGGGACAGATTGGC
CGTAAGGGCCGCGGTGGCGCACTTCATCTCTTGGGAAAGTTGGAACCACAGAACAAGAGCTGAGTATAA
CAGTTTGAAACTGCGAAGAGATGAATTTGAGGCCGCATCTGATGAATTCAAGGACGATTTTACATTGCT
ACGACAATATGAGGCTAAGCGACATAGTACGCTTAAGTCAATTGCCTTAGCTGATGACTCTAACCCGTA
CCGAATTGGTGTAAGGTCCTTGAGAGCCTGGAATAGGGTTAGAGAAGAATGGATTGACAAAGGCGCAA
CCGAGGAACAAAGGGTTACCATCCTTAGTAAGCTTCAAACACAATTACGGGGTAAATTCGGTGATCCAG
ACCTATTTAATTGGCTAGCCCAAGATAGACACGTACACCTGTGGTCCCCGAGAGATTCCGTCACGCCCCT
CGTAAGGATTAATGCCGTCGACAAAGTGCTTAGAAGACGTAAGCCTTATGCACTGATGACTTTTGCACA
TCCGAGATTCCATCCAAGATGGATTCTATACGAAGCGCCTGGTGGTTCTAACTTGCGACAATACGCTTTA
GATTGTACTGAAAATGCTCTGCATATTACACTTCCATTACTCGTCGACGACGCCCATGGTACATGGATTG
AGAAAAAAATCCGCGTACCACTCGCTCCTAGTGGACAAATACAAGATTTAACTTTAGAAAAACTTGAAA
AGAAAAAAAACAGATTATACTATAGATCAGGATTCCAACAATTTGCTGGATTAGCCGGTGGTGCTGAGG
TGTTGTTTCATAGGCCGTATATGGAACATGATGAGAGATCAGAAGAATCTCTGTTGGAAAGGCCAGGCG
CTGTGTGGTTCAAATTAACCTTAGATGTTGCTACCCAAGCACCACCTAACTGGTTAGATGGTAAAGGCAG
AGTTAGGACACCTCCAGAAGTTCATCATTTCAAAACCGCTCTGTCAAATAAATCTAAACATACGAGAAC
CTTGCAACCAGGATTGAGAGTCCTTTCTGTTGATTTGGGTATGAGAACATTTGCTTCTTGTTCTGTTTTCG
AATTGATCGAAGGTAAACCTGAAACAGGTAGAGCATTCCCTGTTGCTGACGAAAGATCAATGGATAGTC
CAAATAAGTTATGGGCCAAGCACGAGAGAAGCTTTAAACTAACTCTGCCTGGAGAAACACCGAGCAGA
AAGGAGGAAGAAGAGAGAAGCATTGCTAGGGCAGAGATTTACGCGCTGAAAAGAGATATTCAAAGACT
GAAATCACTCCTAAGATTAGGTGAGGAAGATAATGATAATAGAAGAGATGCTTTGTTAGAGCAATTCTT
TAAAGGATGGGGTGAAGAGGACGTAGTTCCTGGTCAAGCTTTCCCTAGAAGCCTCTTTCAGGGATTAGG
CGCTGCACCCTTTAGGTCAACACCCGAATTGTGGAGACAGCACTGTCAGACGTATTACGACAAAGCGGA
AGCTTGCCTGGCAAAGCATATTTCCGACTGGAGGAAGAGAACTAGACCTCGTCCGACTTCGAGAGAGAT
GTGGTATAAGACAAGATCTTACCATGGTGGCAAAAGTATTTGGATGCTAGAATACTTAGATGCTGTCCG
CAAATTACTACTTTCATGGTCGTTAAGAGGTCGTACTTACGGAGCTATTAATAGACAAGACACCGCTCGT
TTTGGTTCCTTAGCTTCTAGATTGTTGCATCATATCAACTCTTTAAAGGAAGACCGCATCAAAACCGGTG
CAGATAGTATTGTGCAGGCCGCAAGGGGCTATATTCCTCTCCCACATGGCAAGGGTTGGGAACAGCGTT
ATGAACCCTGTCAGTTGATATTATTTGAAGATCTAGCTAGGTACAGATTTCGTGTAGACAGACCTCGGAG
AGAGAATTCGCAATTGATGCAGTGGAATCATCGAGCTATAGTAGCAGAAACGACGATGCAAGCTGAACT
ATACGGTCAAATAGTCGAAAATACCGCTGCTGGTTTCTCCTCAAGATTTCATGCTGCAACTGGTGCTCCT
GGTGTCAGATGTCGCTTTTTGTTAGAACGAGATTTCGATAATGACCTACCAAAGCCGTACTTACTGAGAG
AACTAAGTTGGATGTTAGGTAACACAAAGGTTGAATCAGAGGAAGAAAAATTGCGTCTTCTAAGCGAGA
AAATTAGACCAGGTTCATTAGTCCCTTGGGATGGGGGTGAACAATTCGCGACATTACACCCGAAAAGAC
AAACTCTTTGTGTCATTCACGCAGATATGAACGCTGCTCAAAACCTGCAACGCAGATTTTTCGGAAGGTG
TGGGGAAGCCTTTCGCCTTGTGTGTCAGCCACATGGTGATGATGTTTTGAGGCTAGCGTCTACACCAGGT
GCAAGACTTTTGGGTGCATTACAACAACTGGAAAATGGTCAGGGAGCTTTCGAATTAGTTCGTGATATG
GGTAGCACATCACAAATGAATCGTTTCGTCATGAAGTCGTTGGGCAAAAAAAAGATCAAGCCATTACAA
GACAATAACGGGGATGATGAACTAGAAGACGTGCTATCTGTTTTACCTGAAGAAGATGATACCGGACGA
ATTACTGTATTTCGGGACTCTTCGGGTATATTCTTCCCTTGTAACGTTTGGATCCCGGCAAAACAGTTCTG
GCCTGCGGTCCGTGCTATGATTTGGAAGGTTATGGCATCACATTCATTGGGTTAG
SEQ ID NO: 142
ATGACAAAGTTAAGGCATAGACAGAAGAAGTTAACTCACGATTGGGCGGGGTCTAAAAAGAGAGAAGT
TCTAGGGAGCAATGGTAAATTACAGAATCCATTGCTAATGCCCGTCAAAAAAGGTCAGGTGACAGAATT
TCGAAAAGCATTTTCCGCATACGCCCGAGCAACCAAAGGGGAAATGACGGATGGCAGAAAAAATATGT
TTACTCACTCATTTGAACCATTCAAGACCAAGCCTTCGTTACATCAGTGCGAACTGGCTGACAAAGCCTA
CCAGAGCTTGCATTCATATTTACCGGGTTCTTTGGCGCATTTTCTTTTATCTGCCCATGCACTTGGTTTTA
GGATTTTTAGCAAATCAGGGGAAGCCACTGCATTCCAAGCGTCCTCAAAGATTGAAGCTTACGAAAGCA
AGTTAGCTAGCGAGCTTGCTTGTGTTGATTTGTCTATTCAGAACTTGACTATTTCAACTTTGTTCAACGCA
TTAACGACTTCCGTAAGAGGTAAAGGTGAGGAGACATCGGCAGATCCACTGATAGCTAGATTTTACACC
TTACTTACCGGTAAACCACTAAGCAGAGACACTCAGGGCCCAGAACGAGATTTAGCCGAGGTGATAAGC
AGAAAAATTGCAAGTTCTTTTGGAACTTGGAAGGAGATGACTGCCAATCCACTTCAATCTCTTCAATTTT
TTGAAGAGGAGTTGCATGCGCTAGATGCAAATGTTAGTTTGTCACCTGCCTTCGATGTTCTGATTAAGAT
GAACGACCTGCAGGGTGACTTGAAGAACAGAACGATAGTTTTTGATCCAGATGCTCCTGTGTTTGAATA
TAATGCTGAGGATCCTGCTGACATCATCATTAAACTGACAGCTAGATATGCGAAAGAAGCAGTGATTAA
AAATCAAAATGTCGGGAATTATGTTAAGAACGCTATTACGACAACTAACGCAAACGGACTAGGTTGGTT
GCTGAACAAAGGCCTTTCCTTATTGCCTGTCTCCACTGATGACGAACTATTGGAGTTTATTGGGGTCGAG
AGATCCCATCCTAGCTGTCATGCGTTGATAGAACTTATCGCTCAGTTAGAAGCACCTGAACTGTTCGAAA
AAAATGTTTTTTCTGATACTCGTTCCGAGGTTCAAGGTATGATAGATTCAGCTGTAAGCAATCATATCGC
CAGGCTGTCAAGCTCTCGTAATTCATTGAGCATGGACTCAGAGGAACTTGAGAGATTGATAAAATCTTTT
CAAATTCATACACCACATTGTTCATTATTTATAGGGGCTCAATCCTTATCTCAACAATTGGAAAGCCTAC
CCGAAGCATTGCAGTCAGGAGTGAACAGTGCTGATATTCTGCTCGGCTCAACCCAATACATGTTGACAA
ATTCTTTGGTCGAGGAGTCAATCGCTACGTATCAGAGAACCTTAAATAGAATTAACTACCTGTCCGGCGT
TGCAGGACAGATTAACGGTGCTATTAAGAGGAAAGCTATTGATGGTGAGAAGATACATTTACCCGCTGC
TTGGTCAGAGTTAATTTCTTTACCCTTTATTGGGCAACCAGTGATTGATGTTGAATCAGATTTAGCCCACT
TAAAGAACCAATACCAGACATTGTCTAACGAATTTGATACGCTGATTTCCGCACTGCAAAAGAATTTCG
ACTTAAATTTTAATAAAGCCTTGCTTAATCGAACACAACATTTCGAGGCTATGTGTAGATCAACAAAAA
AGAATGCCCTTTCTAAGCCTGAGATCGTTAGTTATAGAGATTTGCTAGCCAGGTTGACTTCTTGTCTTTAT
AGGGGCTCTCTAGTCTTGAGGAGGGCGGGTATAGAAGTACTGAAAAAGCACAAGATATTTGAGTCCAAC
TCTGAATTAAGAGAGCACGTTCATGAAAGAAAACACTTCGTATTTGTTTCTCCGCTCGATAGAAAAGCC
AAGAAGCTCCTACGTTTGACTGACTCTAGGCCTGATTTATTGCACGTAATTGATGAAATACTACAACATG
ATAATTTAGAGAACAAGGATAGAGAATCTTTGTGGTTAGTTCGATCTGGTTATTTACTGGCCGGCCTACC
AGACCAACTCTCCTCTTCCTTTATAAATCTTCCAATCATTACTCAAAAAGGCGATCGTCGCTTGATAGAT
CTCATTCAATACGACCAAATTAATAGAGATGCTTTTGTGATGTTGGTAACTTCCGCTTTTAAGTCGAACT
TAAGTGGGCTGCAGTACAGAGCAAACAAACAATCTTTTGTGGTTACGCGCACTTTGTCACCATATTTGGG
ATCTAAATTGGTTTATGTGCCCAAAGATAAAGATTGGCTGGTCCCTTCCCAAATGTTCGAGGGGAGATTT
GCGGACATTTTGCAATCCGATTATATGGTGTGGAAGGACGCTGGAAGATTGTGTGTTATTGACACAGCT
AAGCATTTGTCTAACATTAAAAAATCTGTATTCTCAAGTGAAGAAGTCCTCGCGTTTTTAAGAGAATTGC
CACACCGTACGTTTATCCAAACTGAGGTCAGGGGTTTAGGGGTGAATGTGGACGGTATTGCATTTAATA
ACGGGGATATACCCTCTCTGAAGACGTTTAGCAATTGCGTGCAAGTCAAAGTGAGTCGGACAAACACTA
GTCTGGTCCAAACATTAAATAGATGGTTTGAAGGCGGTAAGGTCTCGCCGCCTAGCATCCAATTTGAGA
GAGCATATTACAAAAAAGATGATCAAATCCACGAGGACGCTGCAAAAAGGAAGATAAGGTTTCAAATG
CCAGCTACAGAGTTGGTACACGCGTCAGACGACGCAGGATGGACCCCCTCCTATTTACTTGGTATCGATC
CCGGTGAATATGGTATGGGTTTGTCATTGGTCTCAATAAATAATGGCGAAGTTTTAGATAGCGGATTTAT
ACACATAAATTCATTGATAAATTTCGCTTCTAAGAAATCAAATCATCAAACCAAAGTTGTTCCGAGGCA
GCAATACAAGTCACCATACGCCAACTATCTAGAACAATCTAAAGATTCTGCAGCAGGAGACATAGCTCA
TATTTTGGATAGACTTATCTACAAGTTGAACGCCCTACCCGTTTTCGAAGCTCTATCTGGCAATAGTCAA
AGCGCAGCGGATCAGGTTTGGACAAAAGTCCTCAGCTTCTACACCTGGGGAGATAATGATGCACAAAAT
TCAATTCGTAAGCAACATTGGTTCGGTGCTTCACACTGGGACATTAAAGGCATGTTGAGGCAACCGCCA
ACAGAAAAAAAGCCCAAACCATACATTGCCTTTCCCGGTTCACAAGTTTCTTCTTATGGTAATTCTCAAA
GGTGTTCATGTTGTGGACGTAACCCAATTGAACAATTGCGCGAAATGGCGAAGGACACATCCATTAAGG
AGTTGAAGATTAGAAATTCAGAAATTCAATTGTTCGACGGTACTATAAAGTTATTTAATCCAGACCCGTC
AACGGTCATAGAAAGAAGAAGACATAATTTAGGGCCATCAAGAATTCCTGTAGCTGATAGAACTTTCAA
AAATATAAGTCCAAGCTCACTAGAATTCAAAGAACTAATAACGATTGTGTCACGGTCTATACGTCATTCC
CCAGAATTTATTGCTAAAAAAAGAGGTATAGGTAGTGAGTACTTTTGTGCTTATAGTGATTGTAATTCCT
CCTTAAATTCAGAAGCAAATGCGGCTGCGAACGTTGCCCAAAAGTTCCAAAAGCAATTGTTTTTCGAATT
ATAG
SEQ ID NO: 143
ATGAAAAGAATCTTGAACTCTTTAAAGGTTGCCGCCCTGCGTTTGTTATTTAGAGGTAAAGGATCTGAAC
TTGTCAAGACTGTTAAATACCCTTTGGTCTCGCCGGTTCAGGGTGCAGTTGAGGAGTTAGCTGAGGCGAT
CCGCCATGATAACCTACATCTGTTTGGTCAAAAAGAAATTGTTGACCTTATGGAAAAGGATGAAGGTAC
GCAAGTTTACTCAGTGGTTGATTTCTGGTTAGATACCCTTCGTTTGGGGATGTTTTTCAGTCCATCAGCAA
ACGCATTAAAAATCACGCTGGGTAAGTTTAATTCTGATCAGGTTAGCCCTTTTAGGAAAGTGTTAGAGCA
GTCTCCATTCTTCTTGGCTGGTAGGCTGAAGGTTGAACCGGCAGAACGTATATTATCTGTCGAGATCCGT
AAGATTGGGAAGAGGGAAAACAGAGTTGAGAACTATGCTGCTGACGTAGAAACGTGTTTTATAGGCCA
ATTAAGTTCAGATGAGAAACAGTCAATACAAAAATTAGCTAATGATATCTGGGATAGTAAAGATCATGA
AGAGCAAAGAATGTTAAAGGCAGATTTCTTCGCTATCCCTTTGATTAAGGATCCAAAGGCTGTGACCGA
AGAGGATCCTGAAAATGAAACTGCTGGTAAACAAAAACCCTTGGAGTTGTGTGTCTGCCTTGTCCCAGA
ACTTTACACAAGAGGATTCGGGTCAATAGCCGATTTTTTGGTTCAACGCTTAACTCTTTTAAGGGATAAA
ATGTCTACAGATACTGCAGAAGATTGTTTAGAATATGTCGGGATTGAGGAGGAAAAAGGTAACGGCATG
AACTCATTGTTGGGAACGTTCTTAAAGAATTTGCAAGGCGATGGATTTGAGCAGATTTTCCAATTTATGT
TAGGGAGCTATGTCGGTTGGCAAGGGAAGGAAGATGTTTTAAGAGAGAGATTAGACTTATTGGCTGAAA
AAGTGAAGAGGTTACCGAAACCAAAATTTGCTGGCGAATGGTCTGGTCATAGGATGTTCTTGCATGGCC
AATTGAAGTCTTGGTCTTCAAATTTTTTTAGACTATTTAACGAGACAAGGGAACTTCTAGAGTCTATTAA
GTCAGATATACAGCATGCCACAATGCTAATATCATATGTAGAAGAAAAAGGTGGTTATCATCCTCAATT
ACTTAGTCAATATAGAAAACTTATGGAACAACTACCAGCTTTGCGTACCAAGGTATTGGACCCTGAGAT
TGAAATGACACATATGTCCGAAGCAGTTCGCTCTTATATAATGATACATAAATCTGTTGCGGGTTTTTTA
CCGGATTTATTAGAATCATTAGATAGAGACAAGGATCGTGAGTTTCTGCTTAGTATTTTTCCAAGAATCC
CAAAAATTGATAAAAAAACCAAGGAAATTGTAGCTTGGGAACTGCCGGGAGAACCAGAAGAAGGTTAT
TTATTTACTGCTAATAACTTGTTCAGAAACTTCTTAGAGAATCCGAAACATGTCCCGAGATTTATGGCCG
AAAGGATCCCAGAAGATTGGACTCGATTACGCTCTGCTCCTGTCTGGTTCGATGGAATGGTAAAACAAT
GGCAAAAAGTCGTTAACCAGTTAGTAGAATCACCAGGTGCTTTATATCAATTTAACGAATCCTTCTTGAG
ACAAAGGTTACAGGCCATGTTAACTGTGTATAAGAGGGACTTACAAACTGAAAAATTTCTTAAACTTTT
GGCGGATGTTTGTAGGCCTCTTGTAGATTTTTTTGGTTTGGGTGGAAATGATATTATTTTTAAGAGCTGTC
AAGACCCAAGAAAACAATGGCAAACCGTTATTCCTCTCTCTGTTCCGGCAGATGTCTATACTGCTTGCGA
AGGTTTGGCGATTAGACTAAGGGAGACATTAGGATTCGAATGGAAGAATTTGAAAGGTCACGAGAGAG
AAGATTTCTTAAGATTGCACCAGTTATTGGGCAATTTACTTTTCTGGATTCGTGATGCTAAATTGGTAGT
AAAATTAGAGGATTGGATGAACAACCCATGTGTTCAGGAATATGTAGAAGCCCGGAAAGCTATCGATCT
TCCACTAGAAATATTCGGTTTTGAAGTGCCTATCTTCCTGAATGGCTATCTATTTTCGGAGTTGAGACAA
TTAGAACTTTTGCTTAGGAGAAAAAGTGTGATGACTAGCTACAGTGTAAAGACTACTGGATCTCCTAAT
AGGCTATTTCAGCTAGTTTATTTACCTCTAAACCCTAGTGACCCCGAAAAGAAGAACTCAAATAACTTTC
AAGAACGTTTGGATACCCCAACTGGTTTGTCCCGTCGTTTCCTAGACCTAACCCTTGATGCATTCGCAGG
TAAGTTACTTACCGATCCAGTTACACAAGAATTGAAGACAATGGCAGGTTTTTACGATCATCTTTTTGGA
TTCAAATTGCCATGTAAACTCGCCGCCATGTCGAATCATCCAGGTTCTTCTTCAAAGATGGTTGTGTTAG
CGAAACCCAAAAAAGGTGTTGCTTCTAATATAGGGTTTGAACCGATCCCAGATCCCGCTCATCCCGTATT
TAGGGTTAGATCCAGTTGGCCAGAGTTGAAGTACCTCGAGGGGCTATTGTATTTGCCAGAAGACACACC
TTTGACCATCGAATTAGCAGAGACCTCCGTATCGTGCCAAAGTGTCTCGTCAGTTGCATTCGATTTGAAA
AACTTGACAACGATCTTAGGTCGTGTGGGAGAATTTAGGGTCACAGCTGATCAACCCTTTAAACTAACG
CCTATAATCCCGGAGAAAGAAGAATCTTTTATTGGTAAAACTTATTTGGGTCTCGACGCGGGTGAAAGG
AGCGGCGTCGGTTTCGCTATTGTTACAGTGGACGGAGATGGGTACGAAGTGCAAAGATTGGGGGTCCAC
GAGGATACACAGCTTATGGCCTTGCAGCAAGTTGCTAGTAAATCCTTAAAAGAGCCAGTATTTCAGCCT
CTAAGAAAAGGCACCTTTAGACAACAAGAAAGAATACGGAAATCCTTACGTGGTTGCTACTGGAATTTT
TATCATGCCTTGATGATAAAATATAGGGCCAAAGTAGTACATGAGGAATCTGTCGGAAGTAGTGGTCTT
GTGGGTCAATGGTTGAGGGCTTTTCAGAAGGATTTGAAGAAAGCCGATGTTCTCCCCAAGAAGGGCGGT
AAAAACGGTGTAGATAAGAAGAAGAGAGAGTCCTCAGCTCAAGACACTCTTTGGGGTGGTGCTTTCTCT
AAAAAGGAGGAGCAACAGATTGCGTTTGAGGTGCAAGCTGCAGGTTCTTCGCAATTTTGTTTGAAGTGC
GGATGGTGGTTCCAACTAGGCATGCGTGAAGTAAACAGGGTACAAGAATCGGGCGTCGTGTTAGATTGG
AATAGAAGCATAGTTACCTTTTTAATAGAATCATCCGGCGAAAAAGTTTATGGTTTCTCCCCACAGCAAT
TAGAGAAGGGTTTCAGACCAGACATCGAAACTTTTAAAAAGATGGTAAGAGACTTTATGAGACCTCCTA
TGTTTGATAGAAAAGGCAGACCGGCCGCAGCTTACGAGAGATTTGTTTTAGGAAGGAGACATCGAAGGT
ACAGGTTTGATAAAGTATTTGAGGAAAGATTTGGGAGGTCTGCTCTTTTCATTTGTCCTAGAGTAGGTTG
TGGAAATTTTGACCACAGCTCCGAACAGTCCGCGGTTGTTTTGGCCTTGATCGGATATATTGCCGATAAG
GAGGGAATGTCAGGTAAGAAGTTGGTTTATGTACGGCTGGCCGAACTTATGGCCGAATGGAAACTAAAA
AAATTAGAAAGATCCAGAGTTGAAGAACAATCATCCGCTCAATAA
SEQ ID NO: 144
ATGGCAGAAAGCAAACAAATGCAGTGTAGGAAATGTGGAGCTAGTATGAAGTACGAAGTCATCGGTTT
GGGTAAAAAGTCATGTAGATACATGTGTCCCGATTGTGGCAACCATACCTCGGCAAGAAAGATACAAAA
CAAAAAAAAAAGAGATAAAAAATATGGGTCAGCCAGTAAAGCCCAATCTCAAAGAATTGCTGTAGCAG
GTGCTCTTTACCCTGACAAAAAAGTACAAACTATCAAAACCTATAAATATCCAGCAGACTTGAATGGTG
AGGTGCATGATAGCGGTGTTGCCGAGAAAATCGCACAAGCAATACAAGAGGACGAGATTGGACTTTTG
GGACCAAGCTCAGAATATGCATGCTGGATTGCATCTCAAAAACAGTCTGAGCCTTACAGTGTAGTCGAT
TTCTGGTTTGATGCAGTGTGCGCAGGGGGAGTCTTCGCCTACTCTGGCGCTAGATTATTGAGTACAGTTT
TACAGTTATCCGGTGAGGAATCGGTGCTTAGAGCTGCCTTAGCCTCGTCTCCATTCGTTGACGATATAAA
CTTAGCGCAAGCCGAAAAGTTTTTGGCGGTTAGCAGGCGTACAGGTCAAGATAAGTTAGGTAAGAGAAT
TGGGGAGTGCTTTGCAGAAGGAAGATTGGAAGCTTTAGGGATAAAAGATAGAATGAGGGAATTTGTTCA
AGCTATCGATGTTGCACAGACCGCCGGACAACGTTTCGCTGCCAAATTGAAGATATTCGGTATAAGTCA
GATGCCAGAAGCTAAGCAATGGAATAACGATTCCGGACTGACTGTCTGTATACTACCTGATTATTATGTT
CCCGAAGAGAATCGCGCGGACCAACTTGTAGTGTTGTTAAGAAGACTTCGCGAGATTGCATATTGCATG
GGTATTGAAGATGAAGCGGGTTTCGAACATCTTGGAATAGATCCTGGTGCTCTTTCGAATTTTTCAAACG
GTAACCCTAAGAGAGGATTTCTAGGGAGGCTGTTAAATAACGATATTATTGCGTTGGCAAACAATATGA
GTGCGATGACTCCATATTGGGAAGGGCGTAAGGGTGAACTCATAGAAAGGCTTGCGTGGTTAAAGCACA
GGGCAGAAGGGCTGTATCTTAAAGAACCTCATTTCGGTAACTCCTGGGCCGATCATAGGTCACGAATTTT
CTCAAGGATCGCAGGCTGGTTATCTGGTTGCGCTGGCAAGTTGAAAATTGCGAAAGACCAAATTTCTGG
AGTACGTACAGATCTATTTCTGCTAAAAAGACTGCTGGACGCAGTTCCGCAATCGGCGCCATCCCCCGAT
TTTATTGCGTCAATTTCGGCACTTGACAGGTTTTTAGAAGCTGCAGAATCGAGCCAGGACCCTGCTGAAC
AAGTGAGGGCTCTCTACGCTTTTCACTTGAACGCACCTGCAGTCCGAAGTATAGCCAATAAAGCAGTGC
AAAGGTCCGACAGCCAAGAATGGCTGATAAAAGAACTAGACGCTGTTGACCATTTAGAATTTAACAAAG
CGTTCCCATTTTTCTCTGACACAGGAAAAAAAAAAAAAAAAGGTGCTAATAGCAACGGTGCTCCATCGG
AAGAAGAGTACACTGAAACGGAATCAATACAACAACCTGAGGACGCGGAACAGGAAGTAAACGGACA
AGAAGGGAACGGAGCGTCTAAAAATCAAAAGAAATTTCAAAGAATACCTAGATTCTTCGGTGAAGGCT
CCAGATCTGAATACAGAATTTTAACGGAAGCTCCACAGTATTTCGATATGTTTTGTAATAACATGAGGGC
TATATTTATGCAGTTAGAAAGTCAACCCCGTAAAGCTCCCAGAGATTTTAAATGTTTCCTACAAAATCGA
TTACAAAAATTATACAAACAGACTTTCTTGAATGCACGAAGCAACAAGTGTCGCGCTCTGCTTGAGTCA
GTTTTAATCTCTTGGGGAGAATTTTATACATACGGTGCCAACGAAAAGAAATTTAGATTAAGACATGAA
GCTTCAGAACGCAGCAGTGACCCAGATTACGTAGTTCAGCAAGCCTTGGAAATCGCGCGTCGTCTATTC
CTTTTTGGCTTCGAATGGAGAGATTGCTCCGCTGGTGAAAGAGTGGATTTGGTTGAAATTCACAAAAAG
GCTATCAGTTTTTTGTTGGCTATTACTCAAGCTGAGGTCTCTGTTGGTTCATACAATTGGCTTGGCAACTC
AACAGTATCGAGATATTTATCCGTTGCGGGAACTGATACCTTATACGGTACCCAATTGGAAGAATTCCTG
AACGCTACAGTGTTGAGTCAAATGCGTGGTCTGGCCATTAGATTGAGTTCTCAAGAACTTAAGGACGGT
TTTGATGTGCAGCTCGAGTCTTCCTGCCAGGACAATCTGCAACACCTATTGGTGTATAGGGCTTCGAGAG
ATTTGGCGGCTTGCAAGCGCGCTACTTGTCCAGCCGAACTCGATCCTAAGATTTTAGTTTTACCGGTAGG
TGCATTCATCGCTTCCGTAATGAAAATGATAGAAAGAGGTGACGAACCTTTAGCTGGTGCTTATTTACGG
CATAGGCCACACTCTTTCGGATGGCAAATTAGGGTCCGCGGTGTTGCTGAGGTAGGGATGGATCAGGGT
ACAGCATTGGCCTTTCAAAAGCCAACAGAGTCAGAACCTTTTAAAATTAAGCCCTTCTCTGCACAGTATG
GACCAGTTCTGTGGTTGAACAGTAGTAGTTATTCTCAATCACAATATTTGGACGGTTTTCTATCTCAACC
AAAAAATTGGAGTATGAGGGTGTTGCCTCAGGCGGGTTCAGTTCGCGTCGAACAACGAGTTGCTTTGAT
ATGGAACTTACAAGCAGGCAAGATGAGACTAGAACGCTCCGGTGCGAGGGCCTTTTTCATGCCTGTACC
GTTTTCATTTAGGCCATCCGGCAGTGGGGACGAAGCAGTTTTGGCGCCCAACCGGTACTTGGGTCTGTTC
CCTCATTCCGGAGGTATAGAATACGCTGTAGTGGATGTCCTGGATTCTGCTGGATTTAAAATTCTTGAAA
GAGGCACTATTGCTGTCAATGGTTTCTCTCAGAAAAGGGGAGAGCGCCAAGAAGAAGCCCATCGTGAAA
AACAAAGAAGGGGGATAAGTGATATAGGGCGAAAGAAGCCTGTGCAGGCAGAAGTCGATGCGGCGAA
CGAATTGCATAGAAAGTACACTGATGTTGCCACAAGATTAGGTTGTAGAATCGTCGTTCAATGGGCACC
ACAACCTAAACCAGGGACAGCACCGACAGCGCAAACTGTTTACGCGAGGGCTGTTAGGACAGAAGCTC
CGAGGAGCGGCAACCAAGAAGATCATGCAAGAATGAAAAGTTCTTGGGGTTACACCTGGGGTACGTATT
GGGAGAAACGAAAACCAGAAGATATTTTAGGGATTTCTACACAGGTGTATTGGACAGGAGGTATAGGC
GAATCCTGTCCTGCTGTAGCAGTCGCTTTATTAGGTCATATTAGAGCAACTTCAACACAAACGGAGTGGG
AAAAGGAAGAAGTTGTCTTTGGAAGACTGAAGAAGTTCTTTCCGAGTTAA
SEQ ID NO: 145
ATGGAGAAGAGAATTAATAAGATACGGAAAAAATTATCTGCGGATAATGCAACAAAGCCAGTCTCTCGT
TCAGGCCCCATGAAAACCCTGCTTGTAAGAGTAATGACGGATGATTTAAAAAAGAGGTTGGAAAAGCGT
AGAAAAAAACCAGAAGTGATGCCGCAAGTGATCTCAAATAACGCAGCTAATAATCTAAGGATGCTACTT
GATGATTATACAAAAATGAAAGAAGCAATCCTGCAAGTTTACTGGCAGGAATTCAAGGATGACCATGTT
GGACTAATGTGCAAATTCGCACAACCAGCGTCTAAGAAAATTGACCAAAATAAATTGAAACCCGAAATG
GACGAAAAAGGGAATTTAACAACTGCCGGGTTTGCCTGCTCGCAATGTGGGCAACCATTATTTGTTTATA
AATTAGAGCAGGTTTCGGAAAAAGGAAAGGCTTACACAAATTACTTCGGCAGATGTAATGTTGCCGAAC
ACGAAAAACTCATATTGTTAGCTCAGTTGAAGCCTGAGAAAGACTCTGATGAGGCCGTTACTTACTCGTT
GGGGAAGTTTGGTCAAAGAGCTCTCGATTTTTATTCTATTCATGTGACAAAGGAGTCCACACATCCCGTC
AAGCCCTTGGCACAAATTGCGGGTAATAGATACGCTTCGGGTCCAGTTGGGAAGGCCCTTTCTGATGCA
TGTATGGGCACAATTGCTAGCTTTCTTAGTAAATACCAGGATATCATAATAGAGCATCAAAAAGTTGTA
AAGGGTAACCAAAAGAGATTAGAATCGCTGCGTGAGTTGGCGGGTAAAGAAAACTTGGAATATCCATCT
GTCACTCTGCCTCCTCAACCTCATACTAAGGAAGGTGTAGATGCGTACAATGAAGTTATCGCTAGAGTCC
GTATGTGGGTGAATTTAAATTTGTGGCAAAAATTGAAGTTATCGCGTGATGATGCAAAACCTCTTCTTAG
ACTAAAGGGCTTTCCTAGCTTCCCTGTAGTGGAAAGACGCGAAAATGAAGTCGATTGGTGGAATACAAT
TAACGAAGTCAAAAAACTGATCGATGCAAAGCGAGATATGGGTCGAGTTTTTTGGTCTGGTGTTACAGC
TGAAAAAAGGAATACGATCTTAGAAGGTTACAACTACTTGCCAAATGAGAACGATCATAAAAAAAGAG
AAGGCAGTTTAGAAAATCCAAAAAAGCCAGCTAAGAGACAATTTGGTGATTTGCTACTTTACCTAGAAA
AAAAGTACGCCGGAGATTGGGGGAAAGTCTTTGACGAAGCTTGGGAGAGAATAGATAAAAAAATAGCA
GGATTGACGTCACACATTGAAAGAGAAGAGGCGAGAAATGCAGAAGATGCTCAGTCCAAAGCTGTCCT
CACCGACTGGTTGAGAGCCAAAGCGTCCTTTGTTCTCGAACGCCTAAAAGAAATGGATGAGAAGGAATT
TTATGCCTGCGAAATCCAGCTACAAAAATGGTACGGAGACTTGAGAGGTAACCCCTTTGCCGTGGAAGC
AGAGAACCGTGTTGTAGATATCTCCGGTTTCTCAATCGGTAGCGATGGACACTCCATTCAGTATCGCAAC
TTGTTGGCCTGGAAATATTTGGAAAACGGTAAGAGGGAATTCTATTTACTTATGAATTATGGCAAGAAA
GGTAGAATCAGGTTTACTGACGGAACAGACATTAAAAAGAGTGGTAAGTGGCAAGGCCTTTTGTACGGT
GGTGGCAAGGCCAAAGTAATAGACTTAACATTTGACCCCGACGACGAACAACTGATAATACTGCCTTTA
GCTTTTGGTACTCGACAGGGGCGAGAGTTCATTTGGAATGATCTTTTGTCACTCGAGACTGGTTTGATAA
AACTTGCAAATGGAAGAGTCATCGAGAAGACAATTTACAACAAAAAGATAGGTCGCGATGAGCCTGCA
CTATTTGTGGCCTTGACCTTTGAGAGAAGGGAAGTTGTCGACCCATCCAATATTAAACCAGTCAACCTAA
TCGGTGTAGATAGAGGTGAAAACATCCCAGCTGTTATCGCTCTGACAGACCCTGAAGGTTGCCCTTTGCC
AGAATTTAAAGATTCGTCTGGTGGACCAACAGATATATTACGTATTGGGGAAGGCTATAAAGAGAAACA
ACGTGCTATTCAGGCTGCAAAAGAAGTTGAACAGAGGAGAGCTGGAGGTTACAGTAGAAAATTCGCCA
GTAAAAGTAGAAACTTAGCAGATGACATGGTTAGAAACTCTGCCCGGGATTTGTTCTATCATGCGGTTA
CTCACGATGCAGTCTTAGTCTTTGAAAATCTATCGCGCGGTTTTGGTAGGCAAGGCAAGAGGACTTTTAT
GACAGAGAGACAATATACAAAAATGGAAGATTGGTTAACCGCGAAGCTCGCATATGAAGGTCTTACTTC
GAAAACGTACCTCAGCAAAACGCTGGCTCAATATACTTCTAAAACTTGTTCAAATTGTGGTTTTACTATT
ACCACGGCAGACTACGACGGGATGTTGGTGAGATTGAAGAAGACGAGCGATGGTTGGGCAACAACATT
GAATAATAAGGAATTAAAAGCAGAAGGACAGATTACGTATTACAATCGTTATAAACGCCAAACGGTTG
AGAAAGAGTTGTCAGCCGAGTTGGATAGACTAAGTGAAGAGAGCGGTAACAATGATATCTCAAAGTGG
ACTAAAGGGAGGCGGGATGAAGCCCTCTTTTTACTAAAGAAGAGATTCTCACATAGACCTGTGCAAGAA
CAATTCGTTTGTTTAGATTGTGGCCATGAGGTTCATGCAGACGAACAGGCTGCGTTAAATATTGCGAGAA
GCTGGCTATTTCTAAATTCTAATTCAACAGAGTTCAAGAGCTATAAATCCGGAAAACAACCTTTCGTAGG
CGCGTGGCAAGCCTTCTATAAAAGGAGATTAAAAGAGGTTTGGAAACCAAATGCA
SEQ ID NO: 146
ATGAAAAGAATTAACAAAATTAGAAGGAGGCTGGTCAAAGATTCTAATACCAAGAAAGCTGGTAAGAC
TGGTCCGATGAAAACCCTATTAGTCAGAGTTATGACCCCAGATTTGAGAGAAAGATTGGAGAACCTCAG
GAAAAAGCCCGAAAACATCCCACAACCCATTAGTAACACATCAAGAGCTAATTTAAACAAGTTATTAAC
TGACTACACTGAAATGAAAAAAGCAATATTGCATGTTTACTGGGAAGAGTTCCAGAAAGATCCTGTTGG
GTTGATGTCTAGAGTTGCTCAACCGGCCCCAAAGAATATAGATCAAAGGAAACTTATTCCTGTGAAGGA
CGGCAATGAAAGATTAACCAGCTCCGGTTTCGCTTGCTCCCAGTGCTGCCAACCCCTGTATGTATACAAA
CTGGAACAAGTAAATGATAAAGGTAAGCCACATACTAACTACTTTGGTAGGTGTAATGTATCCGAGCAT
GAAAGATTGATCTTGTTAAGTCCCCATAAACCAGAAGCTAATGATGAGTTAGTAACTTATAGTTTAGGTA
AGTTCGGACAACGAGCTTTAGATTTCTATAGCATCCATGTTACAAGAGAAAGCAATCACCCCGTCAAAC
CACTGGAACAAATCGGTGGTAATAGTTGTGCGTCAGGTCCAGTAGGCAAAGCTTTATCAGACGCTTGCA
TGGGTGCCGTGGCTAGTTTTTTGACGAAATACCAAGATATTATACTGGAACATCAAAAGGTAATTAAAA
AGAATGAAAAGAGACTCGCTAACTTAAAAGATATTGCAAGTGCCAATGGTTTAGCTTTTCCTAAAATTA
CCTTGCCACCTCAGCCACATACAAAGGAGGGAATTGAAGCTTACAATAATGTAGTAGCCCAAATAGTTA
TTTGGGTGAACCTTAACCTATGGCAAAAGTTAAAAATTGGTAGAGACGAAGCCAAACCCCTGCAGAGGC
TGAAGGGTTTTCCCTCCTTCCCCTTAGTAGAGAGACAAGCTAATGAAGTGGACTGGTGGGATATGGTGT
GCAATGTTAAAAAATTGATTAATGAGAAGAAAGAGGATGGTAAAGTGTTTTGGCAGAATCTTGCTGGCT
ACAAGAGACAGGAAGCTTTACTGCCTTATTTATCTTCTGAGGAAGATAGGAAAAAAGGTAAAAAATTTG
CTAGATATCAATTCGGAGACCTACTTCTGCATTTAGAAAAAAAACATGGCGAAGATTGGGGTAAAGTTT
ATGATGAAGCCTGGGAAAGAATTGATAAGAAGGTAGAAGGTCTCTCCAAACATATTAAATTAGAGGAA
GAACGTAGGTCCGAAGACGCTCAATCAAAGGCAGCATTAACTGATTGGTTGAGAGCAAAAGCCTCTTTC
GTTATTGAAGGATTAAAAGAAGCCGACAAAGATGAATTTTGTAGATGTGAGTTAAAGTTGCAAAAGTGG
TATGGAGACCTCCGTGGTAAACCTTTTGCTATTGAGGCTGAAAATTCTATACTCGATATCTCTGGATTTTC
AAAACAATATAACTGCGCATTTATATGGCAGAAAGATGGTGTTAAAAAGCTAAATCTATACTTAATTAT
CAATTACTTTAAAGGTGGTAAATTGCGTTTTAAGAAGATAAAGCCTGAAGCCTTTGAGGCAAACCGTTTT
TACACTGTTATCAATAAAAAATCTGGGGAAATCGTACCAATGGAAGTTAATTTCAATTTCGATGATCCTA
ATCTTATTATTTTACCTCTTGCTTTCGGCAAAAGGCAAGGTAGGGAGTTTATTTGGAATGATTTATTGTCG
CTGGAAACGGGGTCTCTCAAACTCGCAAACGGTAGGGTGATAGAAAAAACATTATACAACAGGAGAAC
TCGGCAGGATGAGCCAGCTCTTTTTGTGGCTCTGACATTCGAGAGAAGGGAAGTTTTAGATTCATCTAAC
ATCAAACCAATGAATTTAATAGGTATTGACCGGGGTGAAAATATACCTGCAGTTATTGCTTTAACTGATC
CTGAGGGATGTCCTCTTAGCAGATTCAAGGACTCGTTGGGTAACCCTACTCACATCTTAAGGATTGGAGA
AAGTTACAAGGAGAAACAAAGGACAATACAAGCTGCTAAAGAAGTAGAACAAAGGAGGGCGGGTGGA
TATAGTCGGAAATATGCCAGCAAGGCCAAGAATTTAGCTGACGACATGGTTAGGAATACAGCTAGAGAC
CTTTTATACTATGCCGTCACCCAGGATGCCATGTTGATATTTGAAAATTTAAGTAGAGGCTTCGGTAGAC
AAGGTAAGCGCACCTTCATGGCAGAGAGACAATATACTAGAATGGAAGATTGGTTGACTGCCAAATTGG
CATACGAAGGTCTACCTAGTAAGACGTACTTATCTAAAACACTAGCGCAGTATACTTCCAAGACATGCA
GTAATTGTGGTTTCACAATCACTTCTGCCGATTACGATCGCGTCTTGGAAAAACTAAAAAAAACAGCGA
CAGGTTGGATGACTACTATTAATGGGAAAGAATTGAAGGTCGAAGGACAAATAACTTACTATAATAGAT
ATAAACGGCAAAACGTTGTAAAAGACCTGTCAGTCGAACTCGATCGACTTAGTGAAGAATCTGTTAATA
ATGATATTAGTTCGTGGACAAAAGGTAGATCCGGTGAAGCTTTGAGCCTCCTGAAAAAACGTTTTAGCC
ATAGGCCTGTCCAAGAAAAGTTTGTATGTTTAAACTGTGGTTTTGAGACCCATGCAGACGAGCAGGCCG
CTCTTAATATTGCTAGATCATGGTTATTTTTAAGATCTCAGGAATACAAGAAGTACCAGACTAACAAGAC
AACAGGCAACACAGATAAGCGAGCATTCGTTGAGACTTGGCAATCTTTTTATAGAAAGAAATTGAAGGA
AGTCTGGAAACCA
SEQ ID NO: 147
ATGGGAAAAATGTATTATCTAGGCCTGGACATAGGGACCAATTCAGTAGGCTACGCTGTCACTGACCCC
TCCTACCATTTGCTGAAGTTCAAGGGGGAACCCATGTGGGGAGCACACGTGTTTGCGGCCGGCAACCAG
AGCGCAGAGCGGAGAAGCTTCCGCACCTCCAGGAGAAGGCTGGATCGCAGGCAGCAGCGTGTGAAGCT
GGTCCAAGAGATATTTGCCCCAGTGATTTCCCCCATCGATCCGCGCTTCTTTATTAGGCTCCACGAGTCC
GCTCTCTGGCGCGACGACGTGGCCGAAACTGATAAACATATTTTCTTTAATGACCCAACATACACTGACA
AGGAGTACTATTCAGATTACCCAACAATTCACCATTTGATCGTGGACCTTATGGAAAGTTCGGAGAAGC
ATGATCCTCGACTTGTCTATTTGGCCGTGGCGTGGCTCGTGGCACATAGGGGCCACTTCTTGAACGAGGT
GGACAAGGATAACATCGGGGATGTGTTATCTTTCGACGCTTTCTATCCTGAATTCCTTGCTTTTCTGTCTG
ACAATGGCGTCAGCCCGTGGGTCTGCGAATCCAAGGCCCTCCAGGCTACGCTATTGTCAAGAAATAGCG
TGAACGACAAGTACAAGGCTCTTAAGTCTTTGATTTTTGGAAGCCAGAAGCCCGAGGACAACTTTGATG
CAAATATCTCGGAGGACGGGCTGATTCAGCTCCTCGCTGGGAAAAAGGTCAAGGTCAATAAGCTGTTTC
CACAGGAGTCAAATGACGCGAGCTTCACCCTTAACGACAAAGAGGATGCCATTGAAGAGATCCTGGGG
ACACTCACCCCAGACGAGTGCGAGTGGATAGCCCATATTAGGCGCCTCTTTGATTGGGCCATAATGAAA
CATGCGCTTAAGGACGGGCGCACGATATCCGAAAGCAAGGTCAAATTGTACGAGCAGCACCACCATGAT
CTGACCCAGCTAAAATATTTTGTAAAAACATATCTGGCCAAGGAGTACGATGATATCTTCCGCAACGTG
GATAGTGAGACCACCAAAAACTACGTCGCGTACTCATACCACGTGAAAGAAGTTAAGGGCACGCTGCCT
AAGAACAAGGCAACACAAGAGGAGTTCTGCAAGTACGTTCTCGGGAAAGTTAAAAATATAGAGTGCAG
CGAGGCCGACAAAGTGGATTTTGACGAGATGATTCAACGCCTGACCGACAATTCGTTTATGCCTAAACA
GGTGAGTGGAGAGAATCGCGTGATTCCATATCAGCTCTATTACTATGAACTCAAGACTATTCTGAATAA
GGCCGCTAGCTATTTACCCTTCCTTACGCAGTGCGGGAAGGATGCCATTTCTAACCAGGATAAACTCTTG
AGTATAATGACATTTCGAATTCCCTATTTCGTGGGTCCGCTTCGTAAGGATAACAGTGAGCACGCTTGGC
TGGAGCGGAAGGCTGGCAAAATTTATCCATGGAATTTCAACGACAAGGTGGATCTGGACAAATCCGAAG
AAGCCTTTATCCGCAGGATGACCAATACTTGCACATACTATCCTGGGGAGGATGTCCTTCCACTGGACTC
TCTGATCTACGAAAAGTTCATGATTTTGAATGAAATTAACAACATAAGGATCGATGGGTATCCTATTTCC
GTCGACGTGAAGCAGCAGGTGTTCGGGCTCTTTGAGAAGAAGCGACGGGTGACCGTGAAGGATATTCAG
AATCTTCTCTTATCGCTGGGAGCCCTGGATAAACACGGAAAACTGACCGGGATAGATACTACGATTCAT
TCTAATTACAACACGTATCACCATTTTAAGTCACTGATGGAGAGGGGCGTCCTAACAAGAGATGACGTG
GAGAGAATAGTGGAACGAATGACATATTCTGATGACACCAAGAGAGTGCGGCTTTGGCTGAATAACAA
CTACGGCACTCTGACGGCGGATGATGTAAAGCATATTTCCCGACTCCGTAAGCATGACTTCGGGCGGCT
GTCTAAGATGTTTCTAACAGGCCTCAAGGGTGTGCATAAGGAAACTGGGGAGCGCGCTAGCATCCTGGA
TTTTATGTGGAACACCAATGATAACCTGATGCAGCTCCTGTCAGAATGCTACACATTTTCGGACGAAATC
ACCAAGCTGCAGGAGGCTTACTATGCCAAGGCCCAACTAAGCTTGAATGATTTCCTGGATTCTATGTACA
TCAGCAACGCCGTAAAACGACCAATTTATAGGACACTGGCAGTGGTTAACGACATTAGGAAAGCATGCG
GAACAGCTCCCAAGCGAATCTTTATCGAGATGGCCCGCGACGGCGAGAGTAAGAAGAAAAGGTCAGTG
ACTAGGCGGGAGCAGATCAAGAACCTTTACCGCTCTATCCGAAAAGACTTCCAGCAAGAGGTTGATTTC
CTTGAGAAGATCTTAGAGAACAAGTCAGATGGACAGCTCCAATCCGATGCTCTGTATCTGTACTTCGCTC
AGCTGGGACGAGATATGTACACTGGCGACCCCATTAAACTAGAACATATCAAGGACCAATCGTTTTATA
ATATCGACCACATCTACCCTCAGTCCATGGTGAAAGACGATAGTCTGGACAATAAGGTGCTCGTCCAAA
GTGAGATTAACGGAGAAAAGTCGAGCAGATATCCTTTGGACGCTGCGATCCGCAACAAGATGAAGCCCC
TGTGGGATGCTTACTACAATCATGGACTGATCAGCCTGAAGAAGTATCAGAGACTGACCCGGAGTACCC
CTTTCACAGACGATGAGAAGTGGGATTTTATCAATAGACAACTGGTGGAAACCAGGCAGTCCACGAAAG
CTCTGGCCATTCTTCTGAAGAGAAAGTTTCCAGACACAGAGATCGTCTATTCAAAGGCCGGCCTCAGTTC
CGACTTTAGACATGAGTTCGGACTCGTTAAATCACGAAATATAAACGATCTCCACCATGCAAAGGACGC
ATTCCTCGCGATTGTGACTGGAAATGTCTATCACGAAAGATTTAATAGGCGGTGGTTCATGGTTAACCAG
CCATACTCAGTGAAGACCAAGACCCTTTTCACTCACTCTATTAAAAATGGCAACTTCGTGGCTTGGAATG
GTGAGGAGGATCTTGGAAGAATTGTGAAGATGTTAAAACAGAATAAGAATACCATCCACTTTACTAGAT
TCAGCTTTGACCGAAAAGAGGGGCTATTCGATATTCAACCGTTAAAGGCTTCAACAGGTCTCGTTCCACG
AAAGGCCGGACTGGACGTAGTGAAATACGGCGGCTATGATAAGAGCACCGCAGCTTACTACCTCCTTGT
GCGATTTACGCTCGAGGATAAGAAGACCCAACACAAGCTGATGATGATTCCCGTGGAGGGACTGTACAA
AGCTCGAATTGACCATGATAAAGAGTTTCTCACAGATTACGCACAAACCACCATCTCTGAGATTCTCCAG
AAAGACAAACAAAAAGTTATAAACATAATGTTTCCAATGGGTACAAGGCATATTAAACTGAACAGCATG
ATCTCCATTGATGGCTTTTATTTGTCCATTGGAGGAAAGTCTAGTAAAGGCAAGTCTGTCCTCTGCCATG
CCATGGTACCCCTAATCGTCCCACACAAGATTGAATGCTACATCAAGGCTATGGAGAGTTTTGCTCGGA
AATTTAAAGAGAATAATAAGCTGCGTATTGTGGAAAAATTCGACAAGATAACCGTTGAAGACAATCTGA
ATCTGTACGAGCTCTTTCTGCAGAAGCTGCAGCATAACCCCTATAATAAGTTCTTCTCCACACAGTTCGA
TGTACTGACCAACGGGCGATCAACTTTCACAAAGCTAAGTCCTGAGGAACAGGTGCAAACACTCCTAAA
CATTCTTTCCATTTTTAAGACCTGCAGATCTTCAGGATGCGACTTGAAGAGCATTAACGGGAGCGCACAG
GCAGCTAGGATCATGATCTCAGCTGACCTGACAGGGCTGAGTAAAAAATACTCCGACATTCGGCTTGTA
GAGCAAAGCGCCAGTGGGTTGTTCGTTAGTAAGTCGCAGAACCTGCTGGAATACCTGTAA
SEQ ID NO: 148
ATGTCTTCTTTGACGAAGTTTACAAACAAATACTCTAAGCAGCTTACAATTAAGAACGAACTGATTCCCG
TAGGAAAGACTCTGGAAAACATCAAAGAGAATGGGCTGATAGACGGCGACGAACAACTGAATGAGAAC
TATCAGAAGGCCAAAATTATCGTGGATGACTTCCTGAGGGATTTTATTAACAAGGCCCTGAATAATACC
CAGATCGGCAATTGGCGGGAACTGGCCGACGCTCTGAACAAAGAAGATGAGGACAATATCGAAAAATT
ACAAGACAAAATCAGGGGCATTATTGTCAGTAAGTTCGAGACATTCGATCTGTTCTCTTCGTACTCCATT
AAGAAGGACGAGAAAATCATCGATGATGACAATGACGTTGAGGAAGAAGAACTGGACTTGGGTAAAAA
GACCTCATCCTTCAAGTATATTTTTAAAAAAAATCTGTTTAAATTAGTGCTCCCCAGTTATTTAAAGACA
ACTAACCAGGACAAGCTTAAGATTATCTCCTCTTTTGACAACTTTAGCACCTATTTTAGAGGCTTCTTTGA
AAATCGCAAGAATATTTTCACTAAGAAGCCCATAAGCACCTCTATTGCCTACAGAATCGTACATGATAA
CTTCCCAAAATTTTTGGATAACATTAGATGTTTTAATGTATGGCAGACCGAATGTCCTCAGTTAATTGTG
AAGGCGGATAACTACCTCAAATCCAAGAATGTGATCGCCAAAGATAAGTCTCTTGCTAACTACTTTACG
GTCGGAGCCTACGATTACTTCTTATCTCAAAACGGTATTGACTTTTACAATAACATTATCGGGGGATTGC
CTGCCTTCGCCGGCCATGAGAAAATTCAGGGCTTAAACGAGTTCATAAATCAGGAATGTCAAAAGGACT
CAGAGCTGAAATCAAAGCTTAAGAATCGACACGCATTTAAAATGGCGGTCTTGTTCAAACAGATCCTCA
GCGATAGAGAGAAAAGCTTCGTTATTGATGAATTCGAGAGCGACGCACAGGTGATTGATGCCGTGAAGA
ACTTCTATGCGGAACAGTGTAAAGACAATAATGTTATTTTCAACCTATTAAACTTGATTAAGAATATCGC
GTTTTTAAGTGACGATGAACTCGACGGTATCTTTATAGAAGGCAAGTACCTGTCCTCTGTCAGCCAAAAA
CTCTACTCAGATTGGTCCAAGCTAAGAAATGACATCGAGGACAGTGCTAACAGCAAACAGGGCAATAA
AGAGCTGGCAAAGAAAATCAAGACTAATAAAGGGGATGTGGAGAAGGCGATATCTAAATATGAGTTCT
CCCTCTCCGAACTGAACTCCATCGTCCACGATAATACCAAGTTTAGTGATCTGTTGTCGTGTACACTGCA
CAAAGTGGCCAGTGAAAAACTCGTCAAGGTGAACGAAGGCGATTGGCCCAAACACCTGAAAAATAATG
AGGAGAAACAGAAGATCAAAGAACCTTTGGATGCGTTGCTCGAAATATATAACACACTGTTGATCTTCA
ACTGTAAAAGCTTCAACAAGAACGGGAACTTTTATGTAGACTACGATCGATGTATAAATGAACTGAGCA
GCGTCGTTTACCTGTACAACAAGACTCGCAATTATTGTACGAAAAAACCATATAACACCGATAAGTTCA
AGCTTAATTTCAACAGTCCCCAGCTGGGAGAAGGGTTCAGCAAATCAAAAGAAAACGATTGCCTGACAT
TACTCTTTAAAAAGGATGATAATTATTATGTTGGGATTATTAGGAAAGGCGCTAAGATCAACTTTGACGA
CACACAGGCCATAGCTGACAACACTGATAACTGCATCTTTAAAATGAATTACTTTCTGTTGAAGGACGCC
AAAAAATTCATTCCAAAATGCTCTATTCAGCTCAAGGAGGTTAAGGCCCATTTCAAGAAGTCTGAAGAT
GACTACATCCTCTCTGACAAGGAAAAATTCGCTAGTCCTCTGGTTATCAAAAAAAGTACCTTCTTGCTGG
CTACAGCTCACGTGAAAGGCAAGAAAGGGAACATTAAGAAGTTCCAAAAGGAATACAGCAAAGAGAAT
CCAACCGAGTACAGAAATTCTCTGAACGAATGGATCGCATTCTGTAAAGAATTTCTAAAGACGTACAAG
GCCGCTACCATTTTCGATATTACCACCTTGAAAAAAGCCGAGGAGTACGCCGACATCGTCGAATTCTATA
AAGACGTGGATAACCTGTGTTACAAATTGGAATTCTGCCCAATTAAGACCTCTTTCATTGAAAACCTCAT
CGACAATGGGGACCTCTACTTATTTAGAATTAACAATAAGGATTTTTCTTCGAAATCTACCGGAACTAAA
AATCTGCACACACTGTATCTGCAAGCAATCTTCGATGAACGTAATCTCAACAACCCTACAATAATGCTGA
ACGGCGGTGCTGAACTGTTCTACCGTAAAGAGAGTATTGAACAGAAGAATCGAATCACACACAAAGCG
GGCAGTATTCTCGTCAATAAGGTGTGCAAAGACGGGACCAGCCTGGACGATAAGATCAGGAATGAAAT
ATATCAGTATGAGAACAAGTTTATCGACACCTTGTCGGATGAGGCAAAGAAGGTGCTACCTAACGTTAT
CAAGAAGGAAGCTACCCATGACATAACCAAGGATAAGCGGTTCACTTCTGACAAGTTCTTCTTCCACTG
TCCTCTGACCATTAACTACAAGGAAGGAGACACTAAACAATTCAATAATGAAGTACTTAGCTTTTTGCG
GGGTAATCCCGATATTAACATAATTGGTATCGACCGGGGAGAACGGAACCTGATATACGTGACAGTAAT
TAATCAGAAAGGAGAAATCCTGGATTCCGTATCCTTCAATACCGTGACTAATAAATCTAGTAAAATCGA
GCAGACGGTCGACTACGAGGAAAAGTTAGCAGTCAGAGAGAAGGAGAGAATCGAGGCCAAACGTTCCT
GGGATAGTATCAGCAAGATTGCTACTCTGAAAGAAGGATATCTGTCCGCTATCGTCCATGAGATCTGTTT
GTTGATGATCAAGCACAATGCTATAGTGGTTCTGGAGAACCTGAACGCAGGCTTCAAGCGAATTAGAGG
GGGCCTGTCGGAAAAAAGCGTTTACCAGAAGTTTGAAAAGATGCTAATCAATAAGTTAAATTACTTTGT
AAGTAAAAAAGAAAGCGATTGGAATAAGCCATCAGGACTTTTAAACGGGCTGCAACTGAGCGACCAGT
TTGAGTCATTCGAAAAACTGGGTATTCAGAGTGGTTTCATATTCTACGTACCTGCCGCTTACACTTCAAA
GATCGATCCTACAACTGGTTTTGCGAATGTCCTGAATCTGTCTAAGGTGAGGAATGTGGACGCAATCAA
GTCTTTCTTCAGCAACTTCAACGAGATATCTTACAGCAAGAAAGAGGCTCTGTTTAAATTCAGTTTTGAT
CTGGATAGCCTGAGCAAGAAAGGATTCTCTTCTTTCGTAAAGTTTTCTAAGTCCAAATGGAACGTCTACA
CGTTCGGAGAGAGAATCATTAAACCAAAGAACAAGCAGGGGTATCGGGAAGACAAAAGGATCAATCTG
ACTTTCGAAATGAAGAAACTATTGAATGAGTACAAAGTCTCATTCGATTTGGAGAACAATCTGATCCCC
AATCTGACCAGCGCTAACCTCAAAGACACATTCTGGAAGGAGCTGTTTTTCATCTTTAAGACCACCCTGC
AGCTACGGAATAGTGTCACAAATGGGAAAGAGGATGTACTGATCTCACCTGTGAAAAACGCCAAGGGG
GAGTTCTTTGTGTCCGGCACCCATAACAAAACCCTGCCTCAGGACTGTGACGCGAACGGGGCCTACCAC
ATCGCGCTAAAGGGGTTAATGATTCTCGAACGTAATAATCTGGTGCGCGAAGAAAAAGACACAAAGAA
AATTATGGCCATCAGCAACGTTGACTGGTTTGAGTACGTGCAGAAGCGTCGAGGAGTTTTGTAA
SEQ ID NO: 149
ATGAACAACTATGACGAGTTCACTAAACTTTACCCCATTCAGAAAACCATCAGATTTGAACTGAAGCCT
CAGGGTCGTACCATGGAACACTTGGAAACTTTCAACTTTTTCGAGGAGGACAGGGATAGAGCTGAGAAA
TACAAGATCTTGAAAGAGGCCATCGACGAGTATCACAAAAAATTCATCGATGAGCATCTCACCAACATG
TCGCTGGATTGGAACAGTCTCAAGCAGATTTCCGAGAAGTACTATAAATCTCGGGAGGAGAAAGATAAA
AAGGTGTTTTTGAGCGAGCAAAAGCGAATGCGACAGGAGATAGTCTCTGAATTTAAGAAAGATGATCGG
TTTAAAGACCTATTTTCCAAAAAGCTTTTTTCAGAGCTGCTGAAGGAAGAGATCTATAAAAAAGGCAAT
CACCAAGAAATTGATGCCCTGAAATCATTCGACAAATTCAGTGGGTATTTCATAGGACTGCATGAGAAC
CGGAAGAATATGTATAGTGATGGAGACGAGATCACAGCCATAAGCAATCGAATCGTTAACGAGAATTTC
CCGAAGTTCCTGGATAACCTGCAGAAGTATCAAGAGGCTAGGAAAAAGTACCCTGAGTGGATCATCAAG
GCTGAATCAGCTCTGGTGGCTCACAATATCAAGATGGATGAAGTCTTTAGTCTTGAGTACTTTAATAAAG
TCCTTAACCAGGAGGGCATCCAGCGCTATAACCTGGCTCTCGGTGGCTACGTCACAAAAAGCGGAGAAA
AGATGATGGGTCTCAACGATGCACTGAATTTGGCTCATCAGTCGGAGAAGTCATCTAAGGGACGCATAC
ACATGACACCACTGTTTAAACAAATCCTGAGCGAAAAGGAATCATTTTCCTACATTCCCGACGTATTCAC
CGAGGACTCACAACTGCTGCCTAGTATAGGGGGGTTTTTCGCTCAGATAGAGAACGACAAAGATGGCAA
CATTTTTGACAGAGCCTTGGAGTTGATTTCATCTTACGCCGAGTACGATACGGAGCGCATTTATATTCGC
CAGGCGGATATCAACAGGGTTTCCAATGTGATCTTTGGCGAGTGGGGAACGCTGGGCGGGCTGATGCGG
GAATACAAAGCCGACTCGATCAATGACATCAACCTGGAGAGAACATGCAAGAAGGTCGATAAATGGTT
GGATAGCAAAGAGTTCGCCCTGAGTGACGTCTTGGAAGCTATCAAAAGAACCGGAAATAATGACGCGTT
CAACGAGTATATCTCTAAAATGAGGACCGCGAGAGAAAAAATTGATGCAGCAAGGAAGGAGATGAAGT
TTATATCTGAGAAGATCTCAGGCGATGAAGAGTCCATCCATATTATTAAAACTCTTCTGGACTCAGTGCA
GCAATTCCTGCACTTTTTTAACCTCTTCAAGGCCAGGCAGGATATACCGTTAGACGGGGCTTTTTATGCC
GAGTTTGATGAAGTTCATTCGAAACTTTTTGCTATAGTGCCTCTCTATAATAAAGTTCGCAATTACCTGA
CAAAGAATAACTTAAACACAAAGAAAATCAAGCTCAACTTCAAAAACCCAACACTGGCAAACGGATGG
GATCAGAACAAGGTATATGATTACGCCTCATTGATTTTCCTCCGGGACGGGAATTACTATCTGGGGATCA
TCAACCCTAAGCGCAAAAAGAACATTAAGTTCGAACAGGGATCTGGCAATGGTCCCTTCTATAGGAAAA
TGGTATACAAACAGATTCCTGGCCCCAACAAGAATCTCCCACGCGTCTTTCTGACGTCCACTAAGGGAA
AGAAGGAGTACAAGCCGTCTAAAGAAATTATCGAGGGCTATGAGGCAGACAAGCATATTAGGGGTGAC
AAGTTTGACCTAGACTTTTGTCATAAGCTTATCGACTTTTTCAAGGAGTCCATAGAGAAGCACAAAGATT
GGTCAAAGTTTAATTTCTATTTTTCTCCAACAGAGTCCTACGGGGATATCTCTGAGTTCTATCTGGATGTT
GAAAAGCAGGGGTACAGAATGCACTTCGAAAATATCTCAGCAGAAACTATCGATGAGTACGTAGAGAA
AGGAGATCTGTTTCTTTTCCAAATCTACAATAAGGATTTTGTGAAGGCCGCCACTGGGAAGAAGGACAT
GCACACTATTTACTGGAACGCTGCATTTTCCCCTGAAAATCTGCAGGACGTAGTAGTGAAATTAAATGGT
GAGGCAGAACTGTTTTACCGCGATAAATCAGACATCAAGGAAATAGTGCACCGGGAAGGCGAGATTCTT
GTTAACCGAACATATAATGGCAGGACACCTGTCCCTGATAAAATTCATAAGAAACTGACCGATTACCAC
AACGGTCGAACCAAGGATCTGGGCGAGGCCAAGGAATACCTCGATAAGGTGAGGTACTTCAAAGCCCA
TTATGACATCACCAAGGACCGAAGATACCTTAACGACAAAATCTACTTCCATGTCCCACTCACCTTGAAC
TTCAAAGCTAACGGTAAGAAGAACCTCAATAAAATGGTGATTGAAAAATTTCTGTCCGATGAGAAGGCC
CATATCATCGGCATTGATCGCGGCGAGAGAAATCTCCTTTACTATTCTATCATTGATCGGTCGGGAAAGA
TTATCGACCAACAATCACTGAATGTCATCGACGGATTCGACTATAGAGAGAAGCTGAACCAACGGGAAA
TCGAGATGAAGGACGCGCGCCAGTCCTGGAACGCTATCGGCAAAATTAAAGATTTGAAAGAAGGTTACC
TCTCCAAAGCAGTGCACGAAATTACCAAAATGGCAATCCAGTACAATGCTATTGTGGTAATGGAGGAGT
TAAATTACGGATTTAAGCGCGGGAGGTTCAAGGTTGAAAAGCAAATTTACCAAAAATTTGAGAACATGT
TGATTGATAAGATGAACTACCTGGTGTTCAAGGACGCACCTGACGAGTCGCCAGGCGGCGTGTTAAATG
CATATCAGCTGACAAATCCACTGGAGAGCTTTGCCAAGCTAGGAAAGCAGACTGGCATTCTCTTTTACGT
CCCTGCAGCGTATACATCCAAAATTGACCCCACCACTGGCTTCGTCAATCTGTTTAACACCTCCTCCAAA
ACCAACGCACAAGAACGGAAAGAATTTTTGCAAAAGTTTGAGTCCATTAGCTACTCTGCCAAAGACGGC
GGGATCTTTGCTTTCGCATTCGACTACAGGAAATTCGGGACGAGTAAGACAGACCACAAGAACGTCTGG
ACCGCGTACACTAATGGGGAACGCATGCGCTACATCAAAGAGAAAAAGAGGAATGAACTTTTTGACCCT
TCAAAGGAAATCAAGGAAGCTCTCACCTCAAGCGGTATCAAATACGATGGCGGGCAGAATATTTTGCCA
GATATCCTCAGATCGAACAATAATGGACTTATCTATACTATGTACTCCTCCTTCATTGCAGCAATTCAAA
TGAGAGTGTACGATGGAAAGGAGGATTACATTATATCGCCAATTAAGAACTCCAAAGGCGAATTCTTCC
GCACGGATCCTAAGCGAAGAGAACTCCCAATCGACGCTGATGCGAACGGCGCCTATAATATAGCCCTGC
GGGGTGAATTAACAATGCGCGCTATTGCCGAGAAGTTCGACCCCGATTCAGAAAAAATGGCTAAGCTTG
AGCTGAAACACAAAGATTGGTTCGAATTCATGCAGACAAGAGGCGACTAA
SEQ ID NO: 150
ATGACTAAGACCTTCGATTCCGAGTTCTTCAACCTTTATTCCCTGCAGAAAACTGTAAGGTTTGAGCTGA
AGCCGGTGGGCGAGACAGCCAGCTTCGTAGAGGATTTCAAGAATGAGGGTCTCAAACGGGTAGTTAGTG
AGGATGAGAGGAGAGCAGTGGACTATCAGAAGGTGAAAGAGATCATCGATGACTATCACCGGGATTTC
ATAGAGGAGTCGTTGAATTACTTCCCTGAGCAAGTATCCAAAGACGCGCTGGAACAGGCCTTTCATCTTT
ACCAGAAACTGAAGGCAGCGAAGGTTGAGGAGCGGGAAAAGGCCTTGAAAGAGTGGGAAGCCCTGCA
GAAAAAGCTCAGAGAAAAGGTTGTCAAATGCTTCAGCGACAGCAACAAAGCCAGGTTCAGTAGGATCG
ATAAGAAAGAACTGATCAAAGAAGACTTGATCAATTGGCTGGTTGCACAGAACCGGGAAGATGATATTC
CCACCGTAGAGACCTTCAACAACTTCACAACTTACTTCACCGGCTTCCATGAGAATCGTAAAAACATCTA
CAGTAAAGATGATCATGCAACCGCCATCTCCTTCCGGTTGATCCACGAGAATCTCCCCAAGTTCTTTGAC
AACGTGATAAGTTTCAATAAGTTGAAAGAGGGATTTCCCGAACTCAAGTTCGATAAAGTGAAGGAGGAT
CTGGAAGTGGATTATGACCTTAAGCACGCTTTCGAGATAGAGTACTTCGTGAACTTTGTGACTCAGGCCG
GCATCGATCAGTATAACTACCTCCTCGGGGGTAAGACGCTCGAGGACGGTACTAAGAAGCAAGGAATG
AATGAGCAAATTAATCTATTTAAACAGCAGCAGACCAGGGATAAGGCTAGACAGATCCCCAAGCTTATT
CCTCTTTTTAAACAGATCCTAAGTGAAAGGACAGAAAGTCAAAGCTTCATACCTAAGCAATTTGAAAGT
GATCAGGAGCTGTTTGACTCCCTGCAAAAGCTGCACAACAATTGCCAGGACAAGTTTACCGTGCTGCAG
CAGGCTATCCTCGGACTGGCTGAGGCGGATCTTAAGAAGGTATTCATTAAGACTAGCGACCTCAATGCC
CTTAGTAACACCATCTTTGGAAATTACTCCGTTTTCAGCGATGCCCTCAATCTATACAAAGAGAGCTTGA
AGACTAAAAAAGCTCAGGAAGCTTTTGAAAAATTACCGGCACATTCTATACACGACCTTATACAATACT
TAGAGCAGTTCAACAGCAGCCTCGACGCTGAGAAACAGCAATCCACAGACACCGTCCTGAATTACTTCA
TCAAAACCGATGAACTGTACTCCCGATTTATCAAGAGCACTTCAGAAGCCTTCACGCAAGTTCAGCCTCT
GTTCGAGCTGGAGGCACTGTCCAGCAAGAGACGACCGCCAGAGTCTGAAGACGAGGGAGCCAAGGGTC
AAGAGGGGTTTGAACAGATAAAGCGAATTAAGGCTTACTTGGATACTCTCATGGAGGCGGTGCATTTCG
CTAAGCCTTTGTACCTGGTTAAAGGCCGAAAAATGATTGAGGGGCTAGATAAGGATCAGTCTTTTTACG
AGGCTTTTGAAATGGCCTACCAGGAATTGGAATCCTTGATCATTCCAATCTATAATAAAGCCCGGAGTTA
TCTGAGCAGGAAGCCCTTCAAAGCCGACAAGTTCAAAATAAATTTTGACAATAATACGCTACTGTCTGG
TTGGGACGCTAACAAGGAAACAGCCAATGCTTCCATCCTGTTTAAGAAAGACGGCCTGTACTACCTGGG
AATTATGCCAAAAGGCAAAACTTTTTTGTTCGATTACTTTGTGTCATCAGAGGATAGCGAGAAGTTAAAG
CAAAGACGGCAGAAGACCGCCGAAGAAGCCCTCGCACAAGACGGAGAATCATATTTCGAGAAAATTCG
ATATAAGCTCCTGCCTGGCGCATCAAAGATGTTGCCAAAAGTCTTCTTTTCCAACAAAAACATCGGCTTT
TATAACCCCAGCGATGATATCCTTCGCATCCGGAACACCGCCTCACATACCAAAAATGGAACTCCACAG
AAGGGCCACTCGAAGGTTGAATTCAACCTTAACGATTGTCACAAAATGATTGATTTTTTTAAGAGCTCCA
TTCAGAAACACCCCGAATGGGGGTCCTTTGGCTTCACCTTTTCTGATACTTCAGACTTCGAGGACATGTC
CGCCTTCTACAGGGAGGTGGAGAACCAGGGCTATGTCATCTCCTTCGACAAAATAAAAGAGACATACAT
TCAGAGCCAGGTCGAGCAGGGAAATCTGTACCTGTTTCAGATCTATAACAAGGATTTCAGTCCCTATAG
CAAGGGCAAGCCCAATTTACATACCCTGTACTGGAAGGCCCTGTTCGAAGAGGCAAACCTTAACAATGT
AGTTGCTAAGCTGAATGGGGAAGCAGAGATCTTCTTCCGAAGGCACAGCATCAAGGCAAGCGACAAAG
TTGTACATCCTGCTAACCAGGCCATCGATAACAAGAACCCGCATACAGAAAAGACACAGTCAACCTTTG
AATACGACCTCGTGAAGGACAAGAGGTACACACAAGATAAATTCTTCTTCCACGTGCCCATCAGCTTGA
ATTTTAAAGCGCAGGGAGTGAGCAAATTTAACGACAAGGTCAACGGCTTCCTGAAGGGAAACCCCGAC
GTGAATATCATCGGAATTGATCGCGGTGAAAGACATCTCCTCTACTTTACTGTGGTGAACCAGAAGGGT
GAGATCCTAGTACAGGAGAGCCTGAACACCCTTATGAGTGATAAGGGCCATGTGAATGATTACCAGCAG
AAGCTGGACAAGAAGGAACAGGAAAGGGACGCAGCGCGGAAGTCCTGGACCACTGTTGAGAATATCAA
AGAACTGAAGGAGGGATATCTTAGCCATGTGGTACACAAACTTGCACATCTGATTATCAAGTATAATGC
CATAGTCTGCCTGGAAGACTTGAACTTCGGTTTCAAGCGAGGAAGGTTTAAAGTGGAGAAGCAGGTGTA
CCAGAAGTTTGAGAAAGCCCTTATTGATAAGCTAAACTACCTTGTCTTTAAGGAAAAAGAACTCGGCGA
AGTTGGCCACTATTTAACCGCCTACCAACTAACCGCCCCTTTCGAGTCTTTTAAGAAACTGGGAAAGCAG
AGCGGAATACTCTTCTATGTGCCTGCAGACTACACCTCTAAGATCGACCCCACTACCGGCTTTGTAAACT
TTCTAGATCTCCGCTATCAGTCAGTAGAAAAAGCCAAACAGCTCTTGTCAGATTTTAACGCCATCCGATT
TAATTCCGTCCAAAATTACTTCGAGTTCGAAATCGACTATAAAAAACTTACCCCCAAGAGAAAGGTTGG
GACGCAGTCTAAGTGGGTAATCTGCACTTACGGTGACGTGAGATACCAGAACCGCCGAAACCAGAAAG
GTCATTGGGAAACCGAGGAAGTGAATGTGACTGAGAAGCTCAAGGCCCTCTTCGCTAGCGACAGTAAAA
CAACAACAGTTATCGATTACGCCAATGACGATAATCTTATAGACGTGATCTTGGAACAAGACAAAGCCT
CTTTTTTTAAGGAATTGTTGTGGTTGCTGAAACTTACAATGACCCTTAGGCACAGCAAGATCAAATCAGA
GGATGACTTCATCCTCAGCCCGGTGAAGAATGAACAGGGAGAGTTCTACGATTCACGGAAGGCTGGAGA
GGTGTGGCCCAAGGATGCCGACGCGAACGGGGCCTACCACATAGCTCTAAAAGGTCTGTGGAACCTGCA
ACAAATCAATCAATGGGAGAAAGGTAAGACACTGAACCTGGCCATCAAAAATCAAGATTGGTTCTCATT
CATCCAGGAAAAGCCTTATCAAGAGTGA
SEQ ID NO: 151
ATGCATACGGGAGGCCTTTTATCAATGGACGCAAAAGAGTTCACCGGGCAGTATCCATTATCTAAGACA
CTCCGCTTCGAGCTGAGGCCCATTGGCAGGACCTGGGACAACCTGGAGGCGTCGGGCTACCTGGCTGAG
GACAGACATCGCGCAGAATGCTATCCGAGAGCTAAGGAGCTTTTGGACGACAATCATCGCGCGTTCCTT
AACCGGGTGCTCCCACAGATCGATATGGACTGGCACCCGATCGCTGAGGCTTTTTGCAAGGTCCATAAG
AACCCTGGGAACAAAGAGCTCGCCCAGGACTACAACTTGCAGCTGAGCAAGCGACGGAAAGAGATTTC
TGCCTACCTTCAAGACGCCGATGGCTACAAAGGGCTCTTCGCAAAGCCCGCATTGGATGAGGCCATGAA
AATCGCCAAGGAGAACGGGAATGAAAGTGACATCGAAGTTCTCGAAGCGTTTAACGGATTTAGCGTGTA
CTTTACCGGCTATCATGAGTCAAGGGAGAATATTTATAGCGATGAGGACATGGTCTCTGTGGCCTACCG
GATTACCGAGGATAATTTCCCGAGGTTTGTTTCAAATGCACTAATATTCGACAAGTTAAATGAGAGCCAC
CCAGACATCATCTCGGAGGTCAGCGGCAACCTCGGAGTTGACGATATTGGCAAATACTTCGACGTGAGC
AACTATAACAACTTCCTCTCACAGGCTGGCATCGACGACTATAATCATATTATAGGCGGCCACACTACTG
AGGATGGTCTCATTCAGGCATTCAATGTAGTCTTGAATCTTAGGCACCAGAAGGACCCTGGGTTTGAAA
AGATACAGTTCAAGCAGCTGTATAAGCAGATATTATCCGTGCGAACATCTAAAAGTTACATCCCCAAAC
AGTTTGATAACTCAAAGGAGATGGTGGATTGCATATGCGATTATGTGTCAAAAATTGAAAAGAGCGAGA
CTGTGGAGCGGGCTCTGAAGCTCGTCAGGAACATTAGCTCCTTTGACCTTAGAGGAATTTTCGTCAATAA
AAAGAATCTGAGGATCCTGAGCAATAAGCTAATAGGAGATTGGGACGCCATAGAGACAGCATTGATGC
ATTCCAGCTCAAGCGAGAATGATAAGAAGTCTGTCTACGATAGCGCTGAAGCCTTCACGCTGGACGATA
TCTTCTCTTCCGTGAAAAAATTTAGTGATGCGTCCGCAGAAGATATCGGGAATCGAGCCGAAGATATCT
GCAGGGTAATTTCAGAGACCGCCCCTTTCATCAATGACCTGCGCGCCGTGGACCTGGATAGCCTGAATG
ACGATGGTTACGAAGCTGCAGTTTCTAAGATCAGGGAGTCTCTGGAGCCATATATGGACTTGTTTCACGA
ACTTGAGATCTTTAGCGTGGGCGACGAGTTCCCGAAATGCGCAGCTTTCTATAGCGAGTTAGAGGAGGT
CAGCGAGCAATTAATCGAGATCATACCCCTGTTTAATAAGGCACGGAGCTTTTGTACTCGCAAGCGCTA
CAGCACCGACAAGATTAAAGTTAATCTGAAATTTCCAACTCTCGCAGACGGGTGGGACCTAAACAAGGA
ACGCGATAATAAAGCCGCCATCCTTAGAAAGGACGGAAAGTACTATCTTGCCATCCTAGATATGAAAAA
AGATCTGAGTTCCATTCGTACTAGCGATGAAGACGAATCTTCTTTCGAAAAAATGGAGTATAAGCTGCTC
CCCTCGCCAGTCAAGATGCTACCCAAGATCTTTGTGAAGAGCAAAGCAGCCAAGGAAAAGTACGGGCTG
ACGGACAGGATGCTGGAGTGCTACGATAAGGGAATGCATAAATCAGGGTCAGCTTTTGACTTGGGCTTT
TGCCATGAGCTAATCGATTACTACAAGCGCTGTATCGCCGAGTATCCAGGATGGGACGTTTTCGACTTTA
AATTTCGGGAGACTTCTGATTATGGTTCAATGAAGGAGTTCAACGAAGATGTCGCTGGTGCCGGTTACTA
CATGAGCCTTCGCAAGATTCCTTGTTCCGAAGTCTACCGGCTACTGGACGAGAAATCTATATATTTGTTC
CAGATATATAACAAGGACTACAGTGAGAATGCACATGGGAATAAGAATATGCATACTATGTATTGGGAA
GGTCTCTTTTCACCCCAAAATTTGGAGTCACCCGTGTTCAAACTTAGCGGTGGCGCAGAGCTGTTCTTTA
GGAAATCCAGTATACCCAATGACGCCAAGACAGTCCACCCAAAGGGTAGCGTCCTGGTGCCCAGAAAC
GATGTGAACGGCAGGAGAATCCCTGACAGCATTTACCGAGAACTTACCAGGTACTTCAACCGCGGCGAC
TGTAGAATCTCTGATGAGGCAAAGTCTTATCTGGATAAGGTGAAGACTAAGAAGGCAGATCATGACATT
GTGAAAGACCGCCGCTTTACTGTCGACAAAATGATGTTTCACGTGCCTATCGCAATGAATTTTAAGGCAA
TCTCAAAACCGAATCTGAACAAGAAGGTGATAGATGGCATTATCGATGACCAGGACCTCAAGATCATCG
GAATCGACAGAGGTGAGCGAAACCTGATATACGTCACAATGGTAGATCGGAAGGGTAATATTCTGTACC
AGGATTCACTAAACATCCTCAATGGATATGACTATCGAAAAGCTCTCGATGTCAGGGAATACGACAACA
AGGAGGCGCGACGGAATTGGACAAAGGTGGAAGGCATACGGAAGATGAAGGAAGGCTATCTGTCACTA
GCTGTCTCCAAATTGGCTGATATGATTATAGAGAACAACGCCATTATCGTGATGGAAGATCTCAACCAT
GGATTCAAGGCAGGAAGAAGTAAAATTGAGAAGCAGGTGTATCAGAAGTTCGAAAGCATGCTTATTAA
TAAGTTGGGTTATATGGTCTTAAAGGACAAGTCTATCGATCAGAGCGGCGGCGCACTCCATGGGTATCA
GCTGGCTAACCATGTCACCACACTAGCATCCGTAGGCAAACAGTGTGGCGTGATTTTCTACATTCCTGCT
GCGTTCACTTCTAAGATCGATCCTACCACGGGATTCGCAGACCTGTTCGCACTGAGCAATGTTAAAAACG
TGGCCTCCATGAGGGAGTTCTTTAGCAAAATGAAAAGCGTGATTTATGACAAGGCCGAGGGCAAGTTCG
CTTTCACATTTGACTACCTGGACTACAATGTGAAATCAGAGTGCGGGAGAACCCTGTGGACCGTATACA
CGGTAGGGGAAAGATTCACTTACAGTCGAGTTAATCGGGAGTATGTCCGTAAAGTGCCAACTGACATCA
TCTACGATGCCCTTCAGAAGGCTGGCATAAGTGTTGAGGGGGATCTAAGGGACAGGATCGCTGAATCGG
ATGGCGATACTCTCAAATCAATCTTCTACGCCTTCAAGTATGCCCTCGACATGAGGGTAGAGAACCGGG
AGGAGGACTATATACAGTCTCCCGTGAAGAATGCGTCGGGAGAGTTCTTCTGCTCAAAAAACGCCGGGA
AATCTTTGCCGCAGGATTCTGATGCAAATGGGGCTTATAACATTGCTCTCAAAGGCATCCTGCAGCTGCG
CATGCTATCTGAACAATATGACCCAAACGCTGAAAGCATTAGATTGCCATTGATCACCAATAAGGCTTG
GCTGACTTTCATGCAGAGCGGTATGAAGACATGGAAAAACTAA
SEQ ID NO: 152
ATGGATTCCCTTAAGGACTTCACAAATCTTTACCCCGTGAGTAAAACCCTGAGATTTGAACTCAAGCCCG
TGGGAAAGACTCTCGAGAATATCGAGAAGGCCGGGATTTTGAAGGAAGACGAGCATCGGGCGGAAAGT
TACAGACGGGTGAAGAAGATTATAGATACTTATCACAAGGTCTTTATAGACAGCTCTTTAGAGAACATG
GCAAAGATGGGCATCGAGAACGAAATCAAGGCCATGCTGCAGTCCTTCTGCGAGCTGTATAAAAAGGAT
CATCGGACCGAAGGCGAAGACAAGGCGCTGGATAAGATCAGGGCAGTGCTGCGCGGCCTCATTGTGGG
TGCCTTCACTGGGGTGTGCGGGCGGAGAGAGAACACTGTGCAGAATGAGAAATACGAGAGTTTGTTCAA
AGAGAAACTCATCAAGGAAATCCTGCCCGACTTCGTCTTAAGCACAGAAGCCGAATCTCTCCCATTTTCT
GTCGAGGAGGCCACGCGTTCCCTTAAAGAGTTCGACAGTTTCACTTCATACTTTGCCGGATTTTATGAAA
ACCGTAAAAATATATACTCCACTAAACCACAGTCAACTGCAATAGCTTACAGGTTAATCCACGAAAACC
TGCCAAAATTCATCGACAATATACTCGTCTTTCAAAAAATCAAGGAACCAATCGCGAAGGAACTTGAAC
ACATCCGGGCTGACTTTAGTGCGGGAGGATACATCAAAAAAGACGAGCGCCTGGAGGATATATTTTCAC
TAAATTATTATATTCATGTACTGAGCCAGGCTGGCATAGAAAAGTACAACGCTCTAATTGGGAAAATCG
TGACAGAAGGTGACGGGGAAATGAAAGGGCTAAACGAACATATTAACTTATATAACCAACAGCGGGGT
CGAGAAGATCGTCTGCCCCTGTTCAGACCTCTGTATAAGCAAATACTCTCCGACAGAGAGCAGCTATCA
TATCTGCCCGAGTCCTTTGAGAAAGATGAAGAGCTGCTCCGGGCGCTCAAGGAGTTCTATGATCATATA
GCCGAGGACATTTTGGGCAGAACTCAGCAACTCATGACGTCTATTTCTGAATATGATCTGTCTCGTATCT
ATGTCAGGAATGATAGCCAGCTGACCGATATATCCAAGAAGATGCTGGGGGACTGGAACGCCATTTATA
TGGCGAGGGAGCGAGCATACGATCACGAGCAGGCACCCAAGAGAATCACAGCCAAATATGAGAGAGAC
CGCATTAAGGCGCTGAAGGGCGAAGAAAGTATCAGTCTGGCCAATCTGAACTCCTGCATAGCTTTCCTT
GATAACGTGAGGGATTGCAGAGTTGATACTTACCTGAGTACCCTGGGCCAGAAGGAAGGGCCTCACGGC
CTCTCTAATCTAGTGGAGAATGTATTTGCCTCCTACCACGAAGCTGAGCAGCTGCTGTCATTTCCGTACC
CAGAGGAAAATAATTTAATACAGGATAAGGACAACGTAGTGCTTATCAAAAATCTACTGGATAACATTT
CCGACCTCCAGCGCTTTCTCAAACCACTTTGGGGGATGGGCGACGAGCCTGATAAGGATGAGCGCTTTT
ACGGCGAGTACAACTACATCAGGGGCGCCTTGGACCAGGTGATTCCCCTCTATAATAAAGTCAGGAATT
ACCTGACCCGAAAGCCATACAGTACAAGAAAGGTGAAATTAAATTTCGGCAATAGTCAGCTGCTGTCTG
GTTGGGACCGAAATAAGGAGAAAGACAACAGCTGCGTAATTCTCAGAAAAGGACAGAACTTTTATTTGG
CCATCATGAATAACAGACACAAGAGATCTTTCGAGAACAAAGTGCTCCCTGAGTATAAGGAGGGGGAA
CCCTACTTCGAGAAGATGGACTATAAATTCCTTCCTGATCCAAATAAAATGCTGCCTAAAGTATTTCTGT
CAAAAAAAGGTATAGAAATCTACAAACCTTCACCTAAGCTACTTGAACAGTATGGCCACGGCACCCATA
AAAAAGGGGACACGTTCAGCATGGACGACCTACACGAACTGATTGACTTCTTTAAGCACAGCATAGAAG
CTCATGAGGACTGGAAACAGTTCGGATTCAAATTCTCAGATACCGCGACCTACGAAAACGTGTCTAGTT
TTTACCGGGAAGTCGAGGACCAGGGCTACAAGCTCAGCTTCAGAAAAGTTAGCGAATCTTACGTCTACT
CCCTTATAGATCAAGGTAAGCTGTATCTCTTTCAAATCTACAACAAGGACTTTTCCCCATGTAGCAAGGG
CACCCCCAATCTGCACACTCTCTACTGGCGGATGCTGTTCGACGAGCGTAACCTGGCAGACGTGATCTAC
AAATTAGATGGTAAAGCTGAGATCTTCTTTCGTGAAAAGAGCCTAAAGAACGATCACCCCACTCACCCC
GCCGGAAAGCCCATTAAGAAGAAAAGTAGGCAGAAGAAAGGAGAAGAATCGCTATTTGAGTACGACCT
CGTCAAGGATCGGCATTATACAATGGATAAGTTCCAGTTCCATGTGCCAATAACTATGAATTTCAAGTGC
AGTGCTGGCAGTAAGGTGAATGACATGGTAAACGCTCATATCCGGGAGGCAAAGGACATGCATGTTATT
GGAATTGATAGGGGTGAGCGTAATCTCCTCTACATCTGTGTTATTGACTCCCGCGGCACAATCCTCGATC
AGATTTCCTTGAATACAATTAATGATATAGACTACCATGACTTGCTTGAGTCTCGCGACAAAGATAGACA
GCAGGAGAGAAGAAATTGGCAGACCATCGAAGGCATCAAGGAACTCAAGCAAGGCTACCTTTCTCAGG
CAGTGCATCGAATAGCCGAGCTGATGGTGGCTTATAAAGCCGTCGTGGCACTAGAAGACCTAAATATGG
GATTTAAACGAGGCAGGCAGAAGGTGGAATCATCCGTATACCAGCAGTTCGAAAAACAGTTGATAGAC
AAACTCAATTACCTTGTAGACAAGAAGAAGCGGCCTGAGGACATAGGGGGCCTGCTTAGAGCGTATCAA
TTTACAGCCCCATTCAAGTCTTTCAAAGAAATGGGTAAACAGAACGGTTTTCTGTTTTACATCCCAGCGT
GGAACACCAGCAATATAGATCCAACCACTGGCTTCGTCAATCTGTTTCATGCTCAGTATGAAAATGTGG
ACAAGGCCAAATCCTTCTTTCAGAAATTTGACAGCATCTCCTATAACCCAAAGAAAGACTGGTTTGAATT
CGCCTTTGACTATAAGAATTTCACTAAGAAGGCCGAGGGATCAAGAAGCATGTGGATATTGTGCACGCA
TGGCTCACGTATAAAGAACTTTAGAAACTCGCAAAAAAACGGGCAGTGGGACTCAGAAGAATTCGCACT
CACCGAGGCTTTCAAATCCCTCTTCGTCCGGTATGAGATCGATTACACCGCCGATCTGAAGACGGCAATC
GTCGACGAGAAACAGAAAGACTTCTTTGTAGATCTACTTAAGCTCTTTAAGCTAACCGTTCAGATGCGA
AACAGTTGGAAAGAAAAGGATCTCGACTATCTCATTAGTCCAGTGGCTGGCGCGGATGGTAGATTTTTC
GATACCCGGGAAGGTAACAAGTCCCTTCCCAAAGACGCCGACGCGAATGGTGCCTACAATATTGCACTA
AAGGGGCTCTGGGCGCTGCGGCAAATTAGACAGACATCTGAAGGGGGCAAGCTTAAGCTGGCTATTTCT
AATAAAGAGTGGTTGCAGTTTGTGCAGGAAAGGAGTTATGAGAAGGACTAG
SEQ ID NO: 153
ATGAACAACGGCACCAACAACTTCCAGAACTTCATCGGCATATCGTCTCTGCAGAAAACACTTAGGAAT
GCCCTGATTCCAACTGAGACAACACAGCAGTTTATTGTGAAGAATGGGATCATCAAAGAGGACGAATTG
CGCGGGGAGAATAGGCAGATCCTGAAGGACATCATGGACGATTACTACAGGGGTTTTATCTCCGAAACG
CTGAGCTCGATTGACGATATTGACTGGACGTCCCTCTTTGAGAAGATGGAAATCCAACTTAAAAATGGC
GATAATAAAGATACCCTGATAAAGGAACAAACCGAATATAGAAAGGCTATACACAAAAAATTCGCAAA
TGACGACCGCTTTAAGAACATGTTTTCTGCAAAACTGATTAGCGATATTCTGCCCGAGTTTGTGATTCAC
AATAATAACTATTCCGCTTCGGAGAAGGAGGAAAAGACTCAGGTGATTAAACTGTTTTCTCGGTTCGCC
ACTTCTTTCAAAGATTATTTCAAAAATCGCGCCAACTGTTTTTCCGCTGACGACATCTCCTCCTCTTCCTG
CCACCGGATCGTAAACGACAATGCCGAGATCTTTTTTAGTAACGCCCTTGTGTATCGGAGGATAGTGAA
GAGCCTGTCCAATGATGACATAAACAAAATTTCTGGCGATATGAAGGATAGCCTCAAAGAGATGAGCCT
TGAAGAAATTTACTCCTACGAGAAGTATGGGGAGTTCATCACCCAGGAGGGGATTTCCTTCTATAATGA
CATCTGTGGCAAGGTGAACAGCTTCATGAACCTGTACTGCCAGAAGAATAAGGAAAACAAAAATCTGTA
CAAGCTTCAGAAGTTACATAAGCAGATCCTGTGTATCGCGGATACCTCATATGAGGTTCCTTATAAGTTC
GAGAGTGATGAAGAAGTGTACCAGTCTGTAAATGGATTCTTAGACAATATTTCGTCCAAACATATAGTG
GAGAGACTGAGAAAGATCGGGGACAATTACAATGGGTACAATCTCGACAAGATTTATATCGTGTCGAAG
TTTTACGAATCTGTGAGCCAGAAAACATACAGGGATTGGGAAACCATTAATACCGCGCTTGAAATTCAC
TACAATAATATTCTGCCTGGCAACGGAAAAAGCAAGGCCGATAAGGTAAAAAAGGCAGTCAAAAATGA
CCTTCAGAAAAGTATCACCGAAATCAATGAGTTGGTGAGCAACTACAAATTGTGTTCAGACGATAATAT
TAAAGCGGAAACGTACATACATGAAATTAGCCATATTCTGAATAACTTTGAGGCGCAGGAACTTAAGTA
CAACCCTGAAATTCATCTCGTCGAAAGCGAATTGAAGGCCTCTGAATTGAAAAACGTTCTTGACGTGAT
AATGAACGCTTTCCATTGGTGCTCTGTGTTTATGACTGAAGAGCTGGTTGATAAGGACAACAACTTTTAT
GCTGAACTTGAGGAAATCTACGACGAGATCTACCCTGTGATTAGCTTGTATAACCTCGTCAGAAACTAC
GTTACCCAGAAGCCGTACAGCACGAAAAAAATAAAGCTGAACTTTGGTATTCCGACTCTCGCCGATGGA
TGGAGCAAGTCGAAGGAATATTCCAACAATGCCATCATTCTTATGCGAGACAATCTGTATTACCTCGGC
ATCTTTAACGCCAAAAACAAGCCGGATAAGAAAATCATTGAAGGGAATACGAGCGAGAATAAGGGCGA
CTATAAGAAAATGATCTACAACTTACTGCCAGGTCCCAATAAAATGATTCCTAAGGTGTTTCTGTCATCG
AAAACAGGTGTAGAAACATATAAGCCCAGCGCATACATCCTGGAAGGCTACAAGCAAAACAAACACAT
CAAAAGCAGCAAGGACTTTGATATCACATTCTGCCACGATCTAATCGACTACTTCAAAAATTGCATCGCC
ATTCACCCTGAGTGGAAGAACTTCGGCTTTGACTTCTCCGACACCAGTACCTACGAAGACATTTCTGGAT
TCTACCGTGAGGTTGAGCTGCAGGGTTATAAAATTGACTGGACATACATCAGTGAAAAAGACATCGATC
TACTGCAGGAGAAGGGGCAGCTCTATCTCTTCCAGATTTATAATAAGGATTTCAGCAAGAAGTCCACTG
GAAACGACAATCTGCATACAATGTATCTTAAGAACTTGTTTAGCGAAGAGAATTTGAAAGATATCGTTC
TAAAGTTAAACGGGGAAGCCGAGATTTTCTTTCGAAAGTCTTCCATTAAGAATCCAATTATTCACAAGA
AGGGCAGTATCCTGGTCAACAGAACCTATGAGGCCGAGGAAAAGGACCAGTTCGGTAATATACAAATT
GTGCGCAAGAACATCCCCGAGAACATTTACCAGGAGCTCTATAAATACTTCAACGACAAAAGCGATAAG
GAGCTTTCCGACGAGGCTGCCAAGCTGAAAAACGTGGTGGGACACCATGAAGCAGCCACCAACATCGTC
AAAGATTATCGTTATACATATGACAAATATTTTCTGCACATGCCTATTACAATAAACTTTAAGGCAAACA
AGACCGGGTTCATCAATGACCGGATACTCCAGTACATCGCAAAAGAGAAGGACCTGCATGTGATCGGCA
TCGACCGCGGTGAAAGAAATCTCATTTACGTCAGCGTTATCGACACTTGTGGAAACATTGTGGAGCAGA
AGTCCTTCAACATTGTTAACGGCTATGACTATCAGATCAAGCTCAAACAGCAGGAAGGTGCTCGTCAGA
TTGCGAGGAAAGAATGGAAAGAGATCGGCAAGATCAAGGAGATCAAAGAAGGGTATCTGAGCTTGGTC
ATTCACGAGATCTCCAAAATGGTCATCAAGTACAACGCTATTATCGCGATGGAAGACCTCTCTTACGGCT
TTAAGAAGGGGCGCTTTAAAGTGGAGCGCCAGGTCTATCAGAAGTTCGAGACTATGCTTATCAATAAGC
TGAATTACTTGGTCTTTAAGGATATCAGTATCACCGAGAACGGAGGACTGCTGAAAGGTTACCAGCTCA
CATATATTCCCGATAAGCTCAAGAATGTGGGCCACCAATGCGGTTGTATTTTTTACGTTCCAGCTGCCTA
CACATCTAAGATCGATCCTACCACCGGATTCGTCAATATATTTAAATTTAAAGATCTAACCGTTGATGCC
AAGCGTGAGTTTATTAAGAAATTTGATTCAATCAGGTACGACAGCGAAAAGAACCTCTTCTGTTTCACTT
TCGACTACAACAACTTCATCACACAAAATACTGTGATGAGCAAGTCATCATGGAGCGTTTATACTTATGG
TGTAAGGATAAAAAGGCGCTTTGTTAATGGAAGGTTTTCCAATGAAAGCGATACAATAGACATCACAAA
AGACATGGAGAAGACACTGGAGATGACAGATATTAATTGGAGGGACGGGCATGACCTTAGACAGGACA
TCATCGACTACGAAATCGTCCAACACATTTTTGAGATATTCAGACTCACTGTCCAGATGCGAAACAGCCT
GTCGGAACTCGAAGACCGGGACTACGATAGACTGATCTCCCCGGTGTTAAACGAAAATAATATTTTCTA
CGATTCTGCTAAGGCAGGAGACGCTCTTCCTAAAGATGCGGACGCCAATGGCGCTTACTGTATAGCGTT
GAAGGGATTGTATGAGATTAAACAGATCACTGAGAATTGGAAAGAAGACGGTAAATTCTCCAGAGACA
AGCTGAAAATCTCCAACAAAGACTGGTTTGATTTTATTCAAAATAAGCGCTACCTGTAA
SEQ ID NO: 154
ATGACAAACAAATTTACTAATCAGTACAGCCTGTCAAAGACCCTCCGCTTCGAACTGATTCCACAAGGG
AAGACCCTTGAATTCATCCAGGAAAAGGGTTTATTATCCCAGGATAAACAACGCGCAGAAAGCTATCAA
GAGATGAAGAAGACGATCGATAAATTTCATAAGTATTTCATAGATTTAGCCCTGAGCAACGCTAAATTG
ACCCACCTGGAAACCTATTTGGAGCTGTACAACAAGTCAGCCGAGACAAAGAAAGAGCAGAAGTTTAA
GGACGACCTGAAAAAAGTACAGGACAATTTGCGAAAAGAGATCGTCAAGTCTTTTTCCGACGGAGACGC
CAAGTCAATATTTGCCATCCTGGACAAAAAGGAACTCATCACTGTGGAGTTGGAGAAGTGGTTTGAGAA
TAATGAGCAGAAGGACATCTATTTTGACGAAAAGTTCAAGACATTTACTACTTACTTCACCGGATTTCAC
CAAAACCGGAAGAACATGTACTCTGTTGAGCCGAACTCAACCGCCATCGCCTACCGCCTTATTCACGAA
AATCTGCCAAAGTTTCTCGAGAATGCTAAAGCCTTTGAGAAAATTAAGCAGGTCGAGTCGCTCCAGGTG
AACTTTCGAGAGCTGATGGGTGAATTCGGGGACGAGGGCCTGATTTTCGTGAATGAACTCGAAGAGATG
TTTCAGATCAACTACTATAATGATGTACTCTCACAGAACGGGATCACTATCTACAACAGCATTATCTCTG
GATTCACTAAGAACGATATCAAGTATAAAGGGCTGAATGAATACATCAACAATTATAATCAGACTAAGG
ACAAAAAGGACAGGCTGCCTAAATTGAAACAGCTGTATAAGCAGATCCTCAGTGATAGAATTAGCTTGT
CATTTCTCCCAGATGCCTTCACTGACGGAAAGCAGGTGCTTAAGGCGATATTCGATTTCTATAAGATCAA
CCTCCTCTCTTATACAATCGAGGGCCAGGAGGAGTCACAGAACCTCCTGCTCCTGATTCGACAAACTATT
GAAAATCTGTCCTCTTTCGATACGCAGAAGATATACCTGAAAAATGACACCCATCTCACTACAATATCCC
AACAGGTATTCGGAGATTTCTCCGTCTTCAGTACAGCCCTGAATTACTGGTACGAGACAAAGGTGAACC
CTAAGTTCGAAACAGAGTACAGCAAGGCGAACGAAAAGAAGAGGGAGATCCTGGACAAAGCCAAAGC
CGTTTTCACCAAGCAAGATTACTTTAGCATCGCATTTCTGCAGGAAGTCCTGTCTGAGTACATACTGACA
CTCGATCACACAAGCGACATAGTTAAGAAGCACTCTTCCAATTGTATCGCGGACTACTTCAAAAATCATT
TTGTCGCGAAAAAGGAGAACGAGACAGATAAGACCTTCGATTTTATCGCGAATATTACCGCAAAGTATC
AATGCATTCAGGGTATCTTGGAGAACGCCGACCAGTACGAAGACGAGCTTAAACAGGATCAGAAGCTC
ATCGACAACCTAAAGTTCTTTTTGGACGCTATACTGGAACTCCTTCATTTTATTAAGCCACTACATCTGA
AGAGTGAGTCTATCACTGAGAAGGACACTGCTTTTTACGACGTTTTCGAGAATTACTACGAAGCACTGTC
TCTGCTAACCCCTCTGTATAACATGGTGAGAAACTATGTGACACAGAAACCTTATAGTACCGAGAAGAT
TAAGTTGAACTTCGAGAACGCACAATTGCTGAATGGGTGGGATGCAAACAAAGAGGGTGATTACCTCAC
AACAATCCTCAAGAAAGATGGCAATTACTTCCTGGCCATTATGGATAAAAAACATAACAAGGCATTTCA
GAAATTTCCCGAGGGGAAGGAAAATTATGAAAAGATGGTATACAAGTTGCTGCCCGGGGTGAACAAAA
TGCTCCCGAAGGTGTTTTTCTCGAATAAGAATATCGCGTACTTTAACCCGTCCAAGGAACTGTTGGAAAA
TTATAAAAAGGAAACACACAAGAAGGGGGACACTTTTAATTTGGAGCACTGCCACACACTCATTGACTT
CTTTAAAGATAGTCTCAACAAACATGAGGATTGGAAATATTTTGACTTTCAGTTTAGCGAGACCAAGTCT
TATCAGGATCTGTCGGGATTTTATAGGGAAGTTGAGCACCAGGGTTACAAGATAAATTTCAAGAACATC
GATAGCGAGTACATTGACGGACTGGTGAACGAAGGGAAGCTGTTCCTGTTTCAGATTTACAGCAAAGAT
TTCTCTCCTTTCTCAAAAGGCAAGCCGAACATGCATACCCTGTATTGGAAGGCCCTGTTCGAGGAGCAAA
ACCTTCAGAATGTGATTTACAAGCTGAACGGTCAGGCCGAGATTTTTTTTAGGAAGGCCTCTATCAAGCC
CAAAAACATCATTCTGCACAAGAAAAAGATAAAGATCGCCAAAAAACACTTCATTGATAAAAAGACAA
AGACTTCTGAGATCGTACCTGTTCAGACAATCAAGAATCTCAACATGTATTATCAGGGGAAGATTAGCG
AGAAAGAGCTGACACAGGACGATTTGAGGTACATCGACAACTTCTCTATCTTTAACGAGAAGAACAAGA
CAATCGATATCATCAAGGACAAGCGGTTTACCGTCGATAAATTCCAGTTCCATGTGCCTATCACGATGAA
TTTCAAGGCCACCGGTGGGAGTTATATCAACCAGACTGTGCTGGAGTATCTGCAGAACAACCCCGAAGT
AAAAATTATTGGCCTGGACAGAGGAGAGCGGCATCTGGTGTACTTGACCCTCATCGATCAGCAGGGAAA
TATCCTGAAACAAGAATCTCTGAATACTATTACGGACTCCAAAATCAGCACACCTTACCACAAGCTGCTT
GATAATAAAGAGAATGAGAGGGACTTGGCCCGCAAAAATTGGGGCACCGTCGAGAATATTAAGGAATT
GAAAGAAGGATACATCTCACAGGTGGTTCACAAAATCGCAACCCTGATGTTAGAAGAGAACGCTATTGT
GGTGATGGAGGACTTAAACTTCGGATTTAAAAGAGGAAGATTTAAAGTCGAGAAACAGATTTATCAGAA
ACTGGAAAAAATGCTCATTGACAAATTAAATTACCTGGTGCTGAAAGATAAACAGCCACAGGAGCTGGG
TGGCCTGTATAATGCTCTGCAGCTGACCAACAAGTTCGAGTCGTTTCAGAAAATGGGCAAGCAGTCAGG
CTTCCTTTTTTACGTGCCCGCTTGGAACACCTCAAAAATCGACCCTACAACAGGCTTTGTGAATTATTTCT
ATACCAAGTATGAAAACGTGGACAAGGCAAAGGCCTTTTTCGAGAAGTTTGAAGCAATCAGGTTCAATG
CCGAGAAAAAATACTTTGAGTTCGAGGTCAAAAAATATAGCGACTTCAACCCTAAGGCCGAAGGCACGC
AACAAGCCTGGACAATATGCACGTATGGGGAGAGAATTGAGACTAAGCGGCAGAAGGATCAGAATAAC
AAATTCGTGAGCACACCGATTAACCTGACAGAGAAGATAGAGGACTTCCTCGGCAAGAATCAGATCGTG
TACGGCGACGGCAATTGCATCAAGTCACAAATTGCATCTAAAGATGACAAAGCATTCTTCGAAACACTG
CTGTATTGGTTCAAGATGACACTCCAGATGCGAAATAGCGAAACAAGAACAGATATTGACTACCTCATC
AGCCCTGTGATGAATGATAACGGCACGTTTTACAATTCCCGGGACTATGAAAAATTAGAGAACCCGACA
CTGCCAAAAGACGCCGACGCAAATGGTGCATATCACATCGCAAAGAAAGGTTTGATGCTGTTGAACAAA
ATTGATCAGGCTGATCTGACAAAAAAGGTCGATCTGAGTATCAGTAACCGCGACTGGTTGCAGTTTGTC
CAGAAGAACAAATAA
SEQ ID NO: 155
ATGGAACAAGAGTACTATCTGGGCCTGGACATGGGCACCGGGAGTGTCGGATGGGCAGTCACCGACTCA
GAGTACCACGTCCTCAGAAAGCACGGTAAGGCACTTTGGGGAGTGCGACTCTTCGAGTCCGCTAGTACT
GCTGAAGAGAGGAGGATGTTTCGAACTTCCAGGCGCAGGCTGGATCGGCGAAACTGGAGAATAGAGAT
TCTCCAGGAGATATTTGCTGAAGAGATTTCAAAGAAGGATCCTGGTTTTTTCCTGCGCATGAAAGAATCT
AAGTATTACCCCGAAGATAAACGCGACATCAACGGCAATTGTCCTGAACTGCCCTATGCTCTGTTTGTCG
ACGACGATTTCACCGACAAAGATTACCACAAGAAATTCCCCACCATATACCACCTGAGAAAGATGTTGA
TGAACACCGAGGAGACACCCGACATACGTCTGGTTTACCTGGCTATCCATCATATGATGAAGCACCGCG
GGCATTTCCTGCTGTCTGGAGACATCAATGAGATAAAGGAATTTGGTACTACGTTCTCCAAGTTGTTAGA
AAACATTAAGAATGAAGAGTTGGACTGGAATCTTGAACTGGGAAAGGAAGAGTATGCAGTTGTAGAGT
CGATTTTGAAAGATAACATGTTAAACCGGTCAACTAAGAAAACCAGGTTAATTAAGGCACTAAAGGCCA
AATCGATATGCGAGAAGGCTGTGCTAAATCTGCTGGCTGGAGGCACCGTGAAACTGTCTGATATTTTCG
GCCTGGAAGAGCTCAATGAAACCGAGCGGCCTAAAATTTCTTTCGCCGATAACGGATACGATGACTATA
TTGGGGAGGTGGAAAACGAGCTCGGAGAACAATTCTACATTATTGAAACCGCTAAGGCAGTCTATGACT
GGGCCGTGCTCGTCGAGATTTTAGGCAAGTACACCAGCATTAGCGAAGCAAAGGTGGCTACCTATGAAA
AGCACAAATCTGACCTCCAGTTTCTGAAAAAGATTGTGCGCAAATACTTAACAAAAGAAGAGTACAAGG
ACATCTTTGTGAGCACATCAGATAAGCTCAAGAATTACTCAGCATACATTGGAATGACAAAGATTAACG
GGAAGAAGGTGGATCTCCAAAGCAAACGTTGTTCAAAGGAGGAGTTTTACGATTTCATAAAGAAGAAC
GTGCTGAAGAAACTGGAGGGACAACCGGAGTACGAGTATTTAAAGGAGGAGCTCGAGCGAGAAACTTT
CCTGCCCAAGCAAGTGAACAGAGACAATGGTGTCATTCCTTACCAGATTCACTTATATGAGCTGAAGAA
AATCCTGGGGAACTTGAGAGACAAGATAGACCTCATCAAGGAAAATGAAGATAAGTTGGTCCAGTTGTT
CGAATTCAGAATCCCATATTACGTCGGCCCGCTCAATAAGATCGACGACGGCAAGGAAGGCAAATTCAC
TTGGGCGGTGCGAAAAAGCAACGAAAAAATATACCCATGGAACTTTGAGAACGTCGTTGACATCGAGG
CCAGCGCCGAGAAATTTATAAGACGCATGACTAATAAGTGTACTTACCTCATGGGCGAGGATGTTCTGC
CCAAGGACAGCCTGCTGTATTCCAAGTACATGGTGCTTAACGAGCTGAATAATGTAAAGTTAGATGGTG
AGAAGCTCAGCGTGGAGCTTAAACAGAGGCTGTACACTGATGTGTTTTGCAAGTATCGGAAAGTTACCG
TTAAGAAGATAAAGAATTACCTGAAATGCGAAGGGATCATTTCCGGCAACGTGGAAATTACCGGAATCG
ACGGCGATTTTAAGGCGTCGTTGACCGCTTATCATGATTTCAAGGAGATTTTAACCGGCACGGAGCTCGC
GAAGAAAGACAAGGAGAACATAATCACGAATATAGTTCTGTTTGGGGACGATAAAAAACTTCTTAAAA
AACGACTCAATCGACTGTATCCGCAGATTACCCCCAACCAGCTGAAGAAGATTTGCGCTCTGAGCTATA
CCGGGTGGGGCCGGTTCTCTAAGAAATTCCTCGAGGAGATCACAGCACCAGACCCAGAGACTGGTGAGG
TGTGGAATATTATTACAGCTCTGTGGGAATCCAATAATAACCTTATGCAATTGTTGAGCAATGAATATAG
GTTCATGGAGGAAGTGGAAACCTACAATATGGGCAAGCAGACAAAGACCCTATCTTACGAGACCGTTGA
GAATATGTATGTCTCCCCTTCAGTGAAACGGCAAATCTGGCAAACTTTGAAGATCGTGAAGGAGCTCGA
AAAGGTGATGAAAGAGAGCCCGAAGAGGGTTTTTATTGAAATGGCCAGAGAGAAACAGGAGAGCAAGA
GAACAGAGTCTAGGAAGAAGCAGCTAATCGATTTGTATAAAGCCTGCAAGAACGAGGAAAAAGACTGG
GTCAAGGAGCTAGGCGATCAGGAAGAACAGAAGTTGCGCTCTGATAAGCTGTACTTATATTATACCCAG
AAAGGACGGTGCATGTACTCAGGTGAGGTCATTGAGCTGAAAGATCTGTGGGACAATACTAAGTATGAT
ATTGATCACATCTACCCTCAGTCAAAAACTATGGACGACTCCCTCAACAACAGGGTGTTGGTTAAGAAG
AAATACAATGCTACAAAGTCCGATAAATACCCTCTTAACGAAAACATCCGGCACGAAAGAAAGGGCTTC
TGGAAGTCCCTGCTGGATGGGGGTTTTATCAGTAAAGAAAAGTATGAGAGGCTGATCCGAAATACCGAG
CTCTCCCCCGAGGAACTGGCTGGCTTTATCGAAAGGCAGATCGTAGAGACTAGGCAATCTACAAAGGCA
GTCGCTGAGATCCTGAAGCAAGTGTTTCCTGAGTCAGAAATCGTGTACGTCAAAGCTGGCACAGTGTCA
CGGTTCCGAAAGGACTTTGAGTTGTTAAAAGTTCGGGAGGTGAATGACCTGCACCACGCTAAAGACGCC
TATCTGAATATCGTTGTGGGGAACTCCTATTATGTTAAGTTTACTAAGAATGCGTCCTGGTTTATTAAGG
AGAACCCGGGGCGCACCTATAACCTGAAGAAGATGTTCACCTCCGGCTGGAACATAGAACGGAACGGA
GAAGTCGCGTGGGAGGTGGGTAAGAAAGGGACCATTGTGACCGTCAAACAGATTATGAACAAAAACAA
CATATTGGTAACTCGCCAGGTGCATGAGGCCAAAGGGGGCCTCTTTGATCAGCAGATTATGAAAAAGGG
CAAAGGACAGATCGCAATCAAGGAAACCGACGAGCGCCTGGCATCCATTGAGAAGTACGGAGGCTACA
ACAAGGCGGCAGGTGCGTACTTCATGCTCGTCGAGTCCAAAGATAAGAAAGGCAAAACTATTAGAACA
ATCGAGTTCATCCCTCTATATTTGAAAAATAAGATCGAAAGTGACGAAAGCATCGCCCTTAACTTCTTGG
AGAAGGGCCGGGGCTTAAAGGAACCAAAGATTCTGCTCAAGAAGATCAAGATCGACACACTCTTCGAT
GTGGATGGTTTTAAGATGTGGCTGTCAGGCAGGACAGGGGATCGCTTGCTGTTCAAATGCGCAAATCAG
TTGATTCTGGACGAAAAGATCATTGTGACGATGAAGAAGATCGTTAAATTCATTCAGCGGAGACAGGAA
AACAGAGAACTGAAACTCTCCGATAAGGATGGAATTGACAATGAAGTCCTCATGGAGATTTACAATACC
TTTGTGGACAAGCTTGAGAACACAGTCTATCGGATCCGACTGTCCGAACAGGCAAAGACTCTGATCGAC
AAACAGAAAGAATTCGAAAGACTAAGCTTAGAGGACAAAAGTTCAACTCTCTTTGAAATTCTCCACATC
TTCCAATGTCAAAGTAGTGCAGCCAACTTGAAGATGATCGGGGGTCCCGGCAAGGCTGGAATCTTAGTC
ATGAACAACAACATCTCCAAATGTAACAAAATCTCCATCATAAACCAGTCTCCCACCGGCATTTTCGAG
AACGAAATTGATTTACTCAAG
SEQ ID NO: 156
ATGAAATCTTTCGATTCTTTCACCAACCTCTACTCCCTTAGCAAAACCCTTAAGTTTGAAATGAGGCCGG
TGGGGAATACACAGAAGATGCTTGACAATGCTGGCGTCTTTGAAAAGGACAAATTAATCCAGAAGAAGT
ATGGTAAAACAAAGCCATATTTTGACCGATTGCATCGGGAATTCATTGAAGAGGCTCTTACAGGAGTAG
AATTGATCGGACTGGACGAGAACTTCCGTACCTTAGTAGACTGGCAGAAGGACAAGAAGAACAACGTG
GCAATGAAGGCCTATGAGAACTCACTCCAGCGCCTTAGAACCGAGATCGGAAAGATCTTTAATCTTAAG
GCGGAAGATTGGGTAAAAAATAAGTACCCGATCCTGGGACTGAAAAACAAAAACACAGACATCCTGTT
TGAAGAAGCCGTCTTTGGTATCTTGAAGGCCAGGTATGGAGAGGAGAAAGACACGTTTATAGAGGTAGA
GGAGATTGATAAAACAGGCAAGAGTAAGATTAATCAGATCAGTATCTTTGATTCTTGGAAGGGGTTCAC
AGGCTACTTTAAGAAGTTTTTCGAAACCAGGAAAAATTTCTATAAGAACGATGGCACCTCCACAGCTAT
CGCGACACGCATCATAGATCAGAATCTGAAACGGTTCATTGATAATCTGAGCATTGTTGAATCCGTGCG
CCAGAAGGTCGACCTAGCTGAGACTGAGAAGTCTTTCTCTATATCACTCTCCCAGTTCTTCTCAATAGAT
TTTTATAATAAGTGCCTTCTGCAAGATGGCATAGACTACTATAACAAGATCATCGGCGGCGAAACTCTCA
AAAACGGTGAAAAGCTCATTGGCCTGAATGAGCTCATCAACCAATATAGACAAAATAACAAGGATCAG
AAAATCCCATTCTTTAAGCTGCTAGATAAACAGATCCTATCAGAAAAAATCCTGTTCCTCGACGAAATCA
AAAACGACACCGAACTCATCGAGGCTCTCTCGCAGTTTGCCAAGACGGCTGAGGAGAAGACGAAGATT
GTGAAAAAGCTGTTTGCAGACTTTGTGGAGAACAACTCTAAATACGATTTGGCTCAGATTTATATCTCCC
AGGAAGCATTTAACACAATCTCCAATAAGTGGACTAGCGAGACTGAAACCTTCGCCAAATACCTGTTCG
AGGCCATGAAAAGCGGCAAGCTCGCCAAATACGAGAAGAAGGACAATTCCTATAAGTTTCCCGATTTCA
TCGCATTATCTCAGATGAAGTCCGCGCTACTTAGCATTAGCCTGGAAGGCCATTTTTGGAAGGAGAAAT
ACTATAAGATTTCCAAATTCCAAGAAAAGACCAATTGGGAGCAGTTCTTGGCTATTTTTCTATACGAGTT
CAACTCTTTGTTCAGTGACAAGATCAACACTAAGGACGGTGAGACCAAACAAGTGGGGTACTACCTCTT
CGCCAAAGATCTTCATAACCTGATACTGTCCGAACAGATCGACATACCCAAGGATTCAAAGGTGACCAT
CAAGGATTTTGCGGATTCGGTATTGACGATCTATCAGATGGCGAAGTATTTCGCTGTCGAGAAAAAGCG
GGCATGGCTGGCCGAATACGAGTTGGACTCCTTCTATACTCAACCCGATACAGGGTACCTGCAGTTTTAC
GATAATGCATACGAGGATATAGTCCAGGTGTACAATAAACTCAGGAACTACCTCACTAAGAAACCATAC
TCCGAAGAAAAATGGAAACTTAATTTTGAGAATAGTACACTGGCCAATGGATGGGACAAGAACAAGGA
ATCAGACAACTCCGCTGTAATTCTCCAGAAGGGTGGCAAGTATTATCTGGGACTGATAACAAAGGGCCA
TAACAAGATTTTCGATGACCGTTTTCAGGAGAAGTTTATAGTGGGCATAGAGGGTGGCAAGTATGAAAA
AATAGTCTACAAGTTCTTTCCCGATCAGGCGAAGATGTTCCCCAAAGTATGCTTCAGTGCTAAAGGCCTC
GAGTTTTTCCGGCCATCTGAAGAGATACTCCGCATCTATAATAACGCAGAGTTTAAAAAGGGAGAGACG
TACTCAATCGACTCGATGCAGAAACTCATTGACTTCTACAAAGATTGTCTCACAAAATACGAGGGCTGG
GCTTGCTACACGTTTCGGCACTTGAAGCCAACCGAGGAATATCAAAACAACATCGGGGAGTTCTTCCGT
GACGTCGCCGAAGACGGCTATAGAATTGACTTTCAGGGCATAAGTGATCAGTATATTCACGAGAAGAAT
GAGAAAGGTGAGTTGCATCTTTTCGAAATCCACAATAAAGACTGGAATCTTGACAAGGCTCGCGATGGA
AAATCAAAGACTACCCAGAAGAATCTTCATACACTTTACTTCGAGTCCCTCTTTTCCAACGACAACGTCG
TACAGAATTTCCCAATAAAACTGAACGGCCAGGCCGAAATTTTTTACAGGCCCAAAACCGAAAAAGATA
AACTGGAATCCAAGAAAGACAAGAAGGGAAATAAGGTGATAGATCACAAAAGGTATTCCGAGAACAAG
ATTTTTTTCCACGTACCTCTTACCCTGAACAGAACGAAGAACGACTCTTATAGATTCAATGCCCAGATAA
ACAACTTTCTCGCAAACAACAAAGATATCAATATTATCGGCGTCGATAGAGGTGAGAAGCACTTGGTAT
ATTATTCTGTGATCACGCAAGCATCCGATATCTTGGAGTCCGGTTCTTTGAACGAACTGAATGGTGTCAA
CTACGCCGAGAAACTCGGTAAGAAAGCTGAGAATCGGGAGCAGGCTAGAAGGGACTGGCAGGACGTTC
AGGGTATCAAGGACCTGAAGAAGGGCTACATTTCTCAGGTGGTTCGAAAACTGGCTGATTTGGCCATTA
AGCACAATGCAATCATCATTTTAGAAGATTTGAACATGCGGTTTAAACAAGTCAGGGGGGGGATAGAGA
AATCAATTTACCAACAGCTGGAAAAAGCTCTGATTGATAAACTCTCTTTTTTGGTTGATAAGGGCGAAAA
GAACCCCGAGCAAGCAGGACATCTCCTTAAAGCCTATCAACTGAGCGCACCTTTCGAGACATTCCAGAA
GATGGGAAAGCAAACCGGCATCATTTTCTATACCCAGGCTTCCTATACATCCAAGTCTGATCCAGTGACT
GGGTGGAGACCCCATCTCTACCTCAAGTACTTTTCTGCCAAAAAAGCTAAGGACGACATTGCTAAGTTC
ACAAAAATCGAGTTCGTGAACGACAGGTTCGAGCTGACTTATGACATAAAAGATTTCCAGCAGGCCAAG
GAGTACCCAAACAAGACAGTTTGGAAAGTGTGTTCCAATGTGGAGAGGTTTCGGTGGGACAAGAATCTG
AATCAGAATAAAGGGGGATATACTCACTACACCAACATTACCGAGAACATCCAAGAGTTGTTCACCAAA
TACGGCATCGACATTACTAAAGATCTGCTGACACAGATCTCCACCATCGATGAGAAGCAGAACACATCT
TTCTTCCGGGATTTCATCTTTTATTTTAACTTGATCTGTCAGATTAGAAATACCGACGACAGTGAGATAG
CTAAAAAAAACGGGAAAGACGATTTCATTCTCTCTCCCGTGGAGCCGTTTTTTGACTCCCGCAAAGACA
ATGGCAATAAGCTTCCGGAAAACGGGGACGATAACGGCGCCTACAACATCGCTCGTAAGGGAATCGTTA
TCCTCAATAAAATAAGCCAGTATTCCGAGAAGAACGAGAATTGTGAAAAAATGAAGTGGGGGGACCTTT
ACGTCAGCAACATCGATTGGGATAACTTTGTGACACAAGCCAATGCGAGACACTAG
SEQ ID NO: 157
ATGGAAAACTTCAAAAACCTCTACCCCATCAACAAGACCTTGAGGTTTGAGCTCCGGCCATATGGGAAG
ACACTGGAGAACTTCAAAAAGTCCGGTCTGCTGGAAAAGGATGCTTTTAAGGCTAACTCTAGGAGGTCT
ATGCAGGCCATTATCGATGAGAAATTCAAGGAGACCATAGAGGAGCGTCTGAAATATACTGAGTTTTCC
GAGTGTGACCTAGGAAATATGACCAGTAAGGACAAAAAGATCACCGACAAGGCAGCGACAAACCTGAA
GAAACAGGTGATTTTAAGCTTTGATGATGAGATTTTCAATAACTACTTGAAGCCGGACAAAAACATCGA
CGCTCTGTTCAAGAATGATCCAAGCAACCCGGTCATCTCTACTTTCAAGGGCTTCACCACATACTTTGTA
AATTTCTTCGAAATACGGAAACACATCTTCAAGGGAGAGTCTTCCGGTAGCATGGCTTACAGAATAATC
GATGAGAACCTAACTACATATCTAAACAATATCGAGAAGATCAAGAAATTGCCTGAAGAACTGAAATCT
CAGCTTGAGGGAATCGATCAAATTGACAAACTGAACAACTATAACGAGTTCATCACCCAGTCCGGCATT
ACTCATTATAACGAAATTATTGGAGGGATTTCGAAGTCTGAAAATGTCAAAATTCAAGGCATTAACGAA
GGGATTAATCTTTACTGTCAAAAGAATAAAGTGAAGCTACCACGCTTAACTCCTCTGTATAAGATGATTC
TCTCTGATCGGGTCTCTAATTCCTTTGTGCTGGATACCATTGAAAATGATACCGAGTTAATTGAAATGAT
CTCTGATCTGATAAATAAGACAGAGATAAGTCAGGATGTTATTATGTCCGACATCCAAAATATTTTCATC
AAATATAAACAACTCGGCAACTTGCCGGGGATTAGCTACTCATCTATAGTGAATGCTATCTGTTCGGATT
ACGACAATAACTTTGGTGACGGCAAACGTAAAAAAAGCTATGAGAATGATCGCAAAAAACACCTCGAG
ACTAACGTGTATAGCATTAACTATATCTCAGAGTTACTGACAGACACCGACGTCTCCAGCAACATAAAG
ATGCGGTACAAAGAGCTGGAGCAGAATTATCAGGTATGCAAGGAAAATTTCAACGCCACTAACTGGATG
AACATCAAAAACATTAAGCAGTCTGAGAAAACCAATCTGATCAAGGACCTTCTTGACATCCTCAAGAGC
ATCCAGCGGTTTTATGATTTGTTTGACATCGTGGATGAAGACAAAAATCCTAGTGCTGAGTTCTATACCT
GGCTGTCTAAAAACGCGGAGAAACTGGACTTCGAGTTTAATTCAGTGTACAACAAGAGCAGGAACTACC
TCACGAGAAAGCAGTACTCCGATAAAAAGATTAAGTTGAACTTCGATAGTCCTACTCTCGCCAAGGGGT
GGGATGCGAACAAAGAAATTGATAATAGCACAATTATCATGAGGAAGTTCAACAACGACCGGGGCGAT
TACGATTACTTCTTGGGGATCTGGAATAAGAGCACACCTGCCAACGAAAAGATCATCCCATTAGAGGAT
AATGGACTGTTTGAAAAAATGCAATATAAGCTGTATCCCGATCCTAGTAAAATGCTGCCAAAGCAATTC
CTTTCTAAGATCTGGAAAGCTAAACATCCAACTACACCCGAGTTTGATAAGAAGTACAAAGAAGGTCGG
CACAAGAAGGGGCCTGATTTTGAGAAAGAGTTTCTGCACGAGTTGATCGATTGCTTTAAGCATGGATTG
GTAAACCACGACGAAAAATATCAGGATGTGTTCGGGTTCAATCTGCGCAACACGGAAGACTACAACTCT
TATACAGAGTTTCTGGAGGACGTCGAAAGGTGCAACTATAATCTTAGTTTCAATAAAATCGCTGACACG
TCTAACTTGATAAATGATGGGAAACTCTATGTTTTTCAGATCTGGAGCAAGGATTTCAGCATAGATAGCA
AGGGAACAAAAAACTTGAACACAATATACTTTGAATCCCTCTTCTCGGAGGAAAATATGATCGAGAAGA
TGTTCAAGCTCTCAGGGGAAGCCGAAATATTCTATCGTCCAGCAAGTTTGAATTATTGTGAAGATATTAT
CAAGAAGGGACACCACCACGCCGAACTGAAGGACAAATTCGACTATCCCATCATCAAGGACAAGCGAT
ATAGCCAGGACAAATTTTTTTTTCATGTCCCCATGGTTATCAACTACAAAAGCGAGAAGTTAAACTCCAA
ATCACTTAACAATAGGACGAACGAAAATTTAGGCCAATTCACGCACATCATCGGTATCGACCGCGGAGA
GCGACATCTCATCTACCTGACCGTGGTGGATGTGTCCACCGGTGAGATCGTTGAGCAAAAGCACCTGGA
TGAAATTATAAATACAGATACAAAAGGCGTCGAGCATAAAACTCATTATCTCAATAAATTAGAAGAGAA
GTCCAAGACGCGGGATAATGAAAGAAAGTCCTGGGAAGCAATCGAGACGATTAAGGAGCTGAAAGAAG
GCTATATTAGCCACGTGATCAATGAAATCCAGAAATTGCAGGAAAAGTATAACGCACTGATAGTGATGG
AGAACCTCAATTATGGGTTTAAGAACTCGCGTATCAAAGTGGAAAAGCAGGTCTACCAGAAATTCGAGA
CCGCCCTGATTAAAAAGTTTAATTACATCATTGACAAGAAAGATCCTGAAACCTACATTCATGGATACC
AACTGACGAATCCAATCACTACACTCGATAAAATTGGTAACCAGAGCGGTATTGTGTTGTACATTCCGG
CTTGGAATACAAGCAAGATTGATCCAGTCACTGGTTTCGTTAACCTCCTGTATGCAGACGATTTGAAATA
CAAGAACCAGGAGCAGGCTAAAAGCTTTATCCAGAAAATCGATAATATCTACTTCGAAAATGGTGAGTT
TAAATTTGATATAGATTTCAGCAAATGGAACAACCGCTACTCAATTAGCAAGACGAAATGGACACTGAC
AAGCTACGGAACCCGGATACAGACGTTCCGAAACCCCCAGAAAAATAACAAGTGGGACAGCGCCGAGT
ATGACCTGACCGAAGAGTTTAAATTAATCCTGAACATCGATGGTACTCTGAAATCTCAGGATGTGGAAA
CCTATAAGAAATTCATGTCTTTATTCAAGCTGATGTTGCAGCTGCGAAACTCCGTTACTGGAACAGACAT
TGACTACATGATTAGCCCTGTGACAGATAAAACTGGAACCCACTTTGATTCACGGGAGAATATCAAGAA
CCTGCCCGCCGATGCTGATGCGAACGGAGCTTACAACATTGCTAGGAAGGGCATCATGGCAATCGAGAA
TATTATGAACGGCATTAGCGACCCTCTGAAGATCAGTAATGAGGACTACCTGAAGTACATTCAGAACCA
ACAAGAGTAA
SEQ ID NO: 158
ATGACCCAGTTTGAGGGTTTCACCAATCTTTATCAGGTGTCAAAAACACTCAGATTTGAGCTCATCCCAC
AGGGTAAAACTTTAAAGCATATTCAAGAGCAGGGCTTTATAGAGGAAGACAAAGCCAGAAACGACCAT
TATAAGGAACTAAAACCGATCATTGACCGCATCTACAAAACCTATGCCGACCAATGCCTTCAGCTCGTC
CAACTCGATTGGGAGAATCTGAGCGCCGCTATTGACAGCTACAGGAAGGAGAAGACCGAGGAGACTAG
AAACGCCCTGATCGAGGAGCAGGCGACCTATAGAAACGCTATTCACGATTATTTTATCGGCCGCACCGA
CAATTTGACAGATGCCATCAACAAGCGGCACGCCGAAATTTATAAGGGGTTATTTAAGGCCGAGCTGTT
CAATGGAAAAGTACTGAAACAGCTGGGCACCGTAACAACCACCGAACACGAGAATGCTCTGTTGAGGT
CCTTCGACAAGTTTACTACCTACTTTAGCGGCTTCTACGAAAACCGTAAAAACGTGTTTTCCGCGGAGGA
TATTTCAACAGCCATTCCTCATAGGATCGTGCAGGATAATTTCCCCAAGTTTAAGGAGAACTGCCATATC
TTTACCAGACTTATCACTGCTGTGCCAAGTTTACGAGAACACTTCGAGAATGTTAAGAAGGCTATAGGC
ATATTCGTTTCCACCTCCATCGAAGAAGTATTCAGTTTTCCATTCTACAATCAGTTACTCACGCAGACCC
AGATAGATCTCTACAATCAGCTGCTCGGAGGCATTTCTAGAGAAGCAGGCACGGAAAAGATCAAGGGCT
TAAATGAAGTACTCAATCTTGCAATTCAGAAGAACGATGAGACAGCACACATTATTGCATCTCTCCCTCA
CAGATTCATTCCCCTGTTCAAACAGATCCTGTCCGATCGCAACACACTAAGCTTTATACTTGAGGAGTTT
AAGTCAGATGAGGAAGTGATCCAGAGCTTCTGTAAGTATAAGACTTTGCTCCGTAATGAAAACGTGCTT
GAGACAGCAGAGGCTCTCTTTAACGAGTTGAATTCCATCGACCTGACACACATTTTTATCAGCCATAAAA
AGCTGGAAACGATTAGCTCTGCCTTGTGCGACCACTGGGACACCCTGCGTAACGCCCTCTATGAAAGGC
GCATTTCCGAGCTCACCGGGAAGATCACAAAAAGTGCCAAGGAAAAAGTCCAGAGGTCCCTTAAACAT
GAAGACATCAACCTACAAGAGATCATCTCTGCGGCTGGGAAAGAGCTGTCAGAAGCATTTAAACAGAA
GACTTCCGAGATCCTGAGCCACGCACACGCCGCATTAGACCAGCCCCTGCCTACAACTCTTAAAAAACA
GGAGGAGAAGGAGATTTTAAAGAGCCAGCTGGACTCATTACTCGGCCTGTATCATCTCCTGGACTGGTT
CGCCGTGGACGAATCCAACGAGGTGGACCCAGAATTTAGCGCCAGGCTGACAGGAATTAAACTGGAAA
TGGAGCCAAGTTTGAGCTTTTACAACAAGGCTCGGAACTATGCCACTAAAAAGCCCTACAGCGTGGAAA
AGTTCAAGCTGAATTTTCAGATGCCGACCCTGGCTTCCGGGTGGGATGTTAATAAGGAAAAGAATAATG
GGGCTATACTGTTCGTCAAAAATGGTCTCTACTACCTGGGAATCATGCCCAAACAGAAGGGCAGGTACA
AAGCCCTTTCGTTTGAGCCGACCGAAAAAACCAGCGAAGGCTTTGATAAGATGTATTACGACTATTTCCC
AGATGCAGCCAAGATGATCCCAAAATGTAGCACTCAGTTGAAGGCGGTAACCGCTCACTTTCAGACACA
CACCACTCCTATCTTGCTCTCCAACAACTTTATTGAGCCGCTGGAGATCACGAAGGAAATCTACGACCTT
AACAACCCAGAGAAGGAACCCAAGAAATTCCAAACAGCTTATGCTAAGAAGACTGGGGATCAAAAGGG
CTATCGAGAGGCTTTGTGTAAGTGGATTGACTTTACACGGGATTTCCTGAGTAAGTATACCAAGACCACA
TCTATTGACCTGTCCTCACTGAGACCTTCCTCACAATATAAGGATCTCGGAGAGTATTATGCCGAACTCA
ACCCTCTACTCTATCACATCTCTTTCCAGAGGATCGCCGAAAAGGAAATTATGGACGCCGTCGAGACAG
GCAAGCTGTACCTCTTCCAGATTTACAACAAGGATTTCGCAAAGGGCCACCACGGAAAACCCAATTTGC
ACACTTTGTACTGGACAGGGCTCTTCTCTCCCGAAAATTTGGCCAAAACTTCAATAAAACTGAACGGGC
AAGCCGAGCTGTTCTATCGGCCCAAGTCACGTATGAAGCGGATGGCCCACCGGCTGGGCGAGAAGATGC
TCAACAAGAAACTGAAGGATCAGAAGACGCCCATACCAGACACTCTTTACCAAGAGCTGTATGACTACG
TGAATCACAGACTGAGTCACGACCTGTCTGATGAAGCCCGGGCTCTTCTTCCAAATGTGATTACCAAAG
AAGTTTCCCACGAAATTATCAAGGACCGGCGCTTCACCTCTGACAAATTCTTTTTCCACGTCCCAATCAC
CCTCAACTACCAGGCAGCCAATTCCCCTTCAAAGTTTAACCAGCGTGTGAATGCCTACCTGAAAGAGCA
TCCGGAGACCCCCATCATAGGGATAGACAGAGGAGAGCGGAATCTTATCTACATTACTGTGATTGACAG
CACAGGTAAGATCTTGGAGCAGAGATCTTTAAATACAATCCAGCAGTTTGACTACCAGAAGAAACTGGA
TAACCGAGAGAAGGAAAGGGTTGCTGCAAGACAGGCCTGGTCAGTGGTCGGCACCATCAAAGACCTGA
AGCAGGGCTACTTATCCCAAGTAATTCACGAAATTGTCGATCTTATGATTCATTATCAAGCCGTTGTTGT
GCTGGAGAACCTGAATTTTGGCTTCAAAAGCAAACGAACAGGTATCGCCGAGAAAGCCGTGTATCAGCA
GTTCGAAAAGATGCTCATAGACAAGCTGAACTGCTTAGTGCTGAAGGATTATCCTGCTGAGAAGGTCGG
CGGCGTACTTAACCCATACCAGCTGACCGATCAGTTCACTAGTTTCGCCAAGATGGGAACGCAAAGTGG
CTTCCTTTTCTACGTGCCCGCTCCCTACACGAGTAAGATCGACCCTCTGACCGGCTTCGTCGACCCATTCG
TCTGGAAGACCATCAAGAATCACGAATCACGGAAACACTTCTTAGAGGGGTTTGACTTCCTGCACTACG
ACGTGAAGACAGGGGACTTCATCTTACACTTTAAGATGAATCGAAACCTCTCCTTCCAGCGGGGCCTGC
CTGGTTTCATGCCCGCATGGGACATCGTGTTTGAGAAAAACGAGACACAGTTTGACGCTAAGGGAACCC
CCTTTATTGCGGGGAAGCGGATTGTCCCAGTCATCGAAAACCATCGGTTCACCGGGCGATACCGGGATC
TGTACCCGGCCAACGAGCTCATCGCGCTGCTGGAGGAGAAGGGTATTGTGTTTAGGGATGGATCCAACA
TTCTGCCTAAGTTGCTGGAAAATGATGATTCGCACGCCATTGATACCATGGTTGCACTGATTAGATCCGT
ACTGCAGATGAGGAATAGCAATGCTGCAACCGGGGAGGATTATATTAATTCCCCAGTGCGAGATCTGAA
TGGTGTCTGTTTTGACTCGCGCTTTCAGAATCCAGAATGGCCAATGGATGCAGACGCTAACGGGGCGTA
CCACATTGCTCTGAAAGGCCAGCTACTCCTGAACCACCTCAAGGAGAGCAAAGATCTGAAGCTGCAGAA
CGGCATTTCCAACCAAGACTGGCTCGCCTACATACAAGAACTGCGCAATTAA
SEQ ID NO: 159
ATGGCTGTCAAATCCATCAAGGTTAAATTACGGCTTGATGACATGCCCGAGATCCGCGCCGGGCTCTGG
AAACTCCATAAAGAAGTGAATGCTGGCGTTAGATACTACACAGAATGGCTCTCCCTGCTGCGCCAGGAA
AATTTGTACCGCCGGTCACCTAATGGAGATGGAGAGCAGGAATGCGATAAAACAGCAGAAGAGTGCAA
AGCCGAATTGCTGGAGCGACTGCGGGCACGGCAGGTTGAGAATGGACACCGAGGTCCGGCGGGATCGG
ACGACGAGCTGCTCCAGCTCGCCAGACAATTATATGAACTGCTGGTGCCTCAGGCTATTGGGGCAAAGG
GTGACGCACAGCAGATTGCTAGAAAATTTCTGTCTCCCCTCGCCGACAAAGACGCTGTCGGCGGCCTTG
GGATAGCCAAAGCCGGCAACAAACCCCGATGGGTGCGCATGAGGGAGGCTGGTGAGCCTGGCTGGGAG
GAAGAAAAGGAAAAGGCCGAAACCAGAAAGTCCGCCGACAGGACCGCGGACGTACTCCGAGCATTGGC
CGATTTTGGGCTGAAGCCCTTAATGCGAGTCTACACCGATAGTGAAATGTCTAGCGTGGAGTGGAAGCC
ATTACGCAAAGGGCAGGCAGTGCGGACGTGGGACCGTGACATGTTCCAGCAAGCCATCGAGCGAATGA
TGAGCTGGGAGAGCTGGAACCAGAGAGTGGGGCAGGAGTATGCCAAGCTGGTCGAGCAGAAAAACCGG
TTTGAGCAAAAAAATTTTGTAGGTCAGGAACACCTGGTGCATCTCGTTAACCAGCTCCAGCAAGATATG
AAGGAAGCTTCGCCTGGATTAGAGAGCAAAGAGCAGACTGCACACTATGTAACCGGAAGAGCACTGAG
GGGCAGTGACAAAGTGTTCGAAAAATGGGGAAAACTGGCTCCCGATGCCCCCTTTGACCTGTACGACGC
AGAAATAAAAAACGTGCAGCGGCGAAACACCAGGCGATTTGGTAGCCATGATCTGTTCGCCAAATTGGC
AGAGCCGGAATATCAGGCTCTTTGGCGAGAAGACGCATCATTTCTCACTAGGTACGCGGTCTATAACTC
CATTTTGAGGAAATTGAACCACGCAAAAATGTTTGCCACCTTCACGTTGCCTGACGCCACCGCTCATCCC
ATTTGGACACGGTTTGATAAGCTGGGCGGCAATCTGCATCAGTATACATTCCTGTTTAACGAGTTTGGAG
AGCGAAGACATGCGATACGATTCCACAAGCTACTGAAGGTCGAAAATGGCGTGGCACGTGAGGTGGAC
GATGTCACCGTGCCCATCAGCATGAGCGAACAGCTGGATAATTTGTTGCCGCGGGACCCAAATGAACCT
ATAGCCCTTTATTTTAGGGACTACGGGGCGGAGCAACATTTCACTGGGGAGTTTGGCGGCGCAAAAATT
CAGTGCCGACGCGACCAGCTCGCCCACATGCATAGAAGACGCGGGGCCCGGGACGTATACCTTAACGTC
TCTGTGAGGGTGCAGTCCCAGTCAGAGGCAAGAGGGGAACGCAGACCACCTTACGCAGCAGTATTCAG
GCTGGTAGGCGATAACCACCGGGCGTTTGTACACTTTGATAAACTTTCTGACTACCTGGCCGAACACCCG
GATGACGGCAAATTAGGATCGGAGGGGCTGCTTAGCGGCCTGCGTGTGATGAGCGTCGATCTGGGGCTA
CGGACCTCTGCTTCCATCTCTGTGTTCCGTGTGGCCCGAAAGGACGAGTTGAAACCTAATTCGAAGGGCC
GTGTACCATTCTTTTTCCCTATTAAGGGAAATGATAATCTCGTCGCGGTGCACGAGCGTTCCCAACTGCT
GAAACTGCCTGGCGAGACCGAGTCCAAAGATCTCAGAGCAATCCGGGAGGAGCGACAACGTACACTTA
GGCAACTCCGCACCCAGCTGGCCTATCTGCGCTTGCTGGTGCGGTGCGGCTCCGAGGATGTAGGGAGAA
GAGAGCGAAGCTGGGCAAAGCTGATAGAGCAACCAGTTGACGCCGCGAATCACATGACCCCCGACTGG
CGCGAAGCGTTTGAAAATGAGCTGCAGAAGTTGAAATCTCTGCATGGGATTTGCTCAGATAAGGAGTGG
ATGGACGCCGTATACGAGTCTGTTCGCCGGGTATGGCGGCACATGGGGAAGCAGGTGAGAGATTGGAG
AAAGGACGTTCGCTCTGGGGAACGGCCGAAAATTCGGGGATACGCAAAGGATGTCGTGGGCGGCAATA
GCATTGAGCAGATCGAGTACCTGGAAAGGCAATACAAATTTCTGAAATCTTGGTCTTTCTTTGGGAAGGT
AAGCGGACAAGTTATCAGAGCCGAAAAGGGATCTCGCTTTGCTATCACATTGAGGGAACACATTGATCA
CGCCAAAGAAGACAGGTTGAAAAAGTTGGCTGATCGCATTATCATGGAAGCACTCGGTTACGTCTACGC
CCTTGATGAGCGCGGTAAAGGGAAGTGGGTAGCCAAGTATCCCCCATGTCAGCTGATCCTGCTCGAGGA
ACTTTCTGAGTATCAGTTCAATAACGACCGTCCTCCCTCCGAAAATAATCAGCTCATGCAATGGTCCCAC
CGGGGTGTGTTCCAAGAACTGATCAATCAGGCTCAGGTGCACGACCTCCTCGTAGGCACTATGTATGCA
GCCTTTAGCTCCCGTTTTGACGCGCGCACAGGCGCCCCTGGAATACGATGTAGGCGAGTTCCCGCACGGT
GCACTCAAGAACATAACCCGGAGCCTTTCCCATGGTGGCTCAATAAGTTTGTTGTGGAGCATACCCTCGA
CGCTTGCCCATTGAGGGCGGATGACTTGATTCCCACAGGCGAGGGGGAGATCTTCGTGAGCCCATTTTCT
GCCGAAGAAGGGGATTTCCACCAAATACATGCCGACTTGAATGCTGCCCAAAATCTGCAGCAAAGGCTG
TGGTCAGACTTCGACATCTCGCAAATCAGACTGCGGTGTGACTGGGGCGAAGTAGACGGCGAGCTGGTG
CTGATACCTAGACTGACGGGTAAGCGTACCGCCGATAGCTATAGTAATAAGGTTTTTTATACGAATACG
GGGGTGACATATTACGAGCGTGAGAGAGGCAAGAAGCGTCGGAAGGTGTTCGCGCAGGAGAAGCTGAG
CGAAGAGGAGGCGGAGCTACTGGTAGAGGCAGATGAGGCAAGAGAAAAGTCCGTCGTCCTGATGCGGG
ATCCTAGCGGGATTATTAACAGAGGTAATTGGACACGGCAGAAAGAATTCTGGAGCATGGTGAATCAAA
GAATCGAGGGTTACCTGGTGAAGCAAATTCGAAGCCGGGTGCCCCTTCAAGACAGCGCATGTGAAAACA
CTGGGGACATCTAG
SEQ ID NO: 160
ATGGCTACTCGGTCCTTCATCCTGAAAATCGAGCCAAATGAAGAGGTGAAAAAGGGCCTGTGGAAGACC
CATGAGGTACTTAACCACGGCATAGCATACTATATGAATATCCTAAAACTTATACGGCAGGAGGCTATC
TACGAGCATCACGAGCAAGATCCTAAAAATCCAAAGAAGGTTAGTAAGGCTGAAATCCAGGCTGAATT
GTGGGACTTCGTGCTGAAGATGCAGAAATGCAACAGTTTCACGCATGAAGTTGATAAGGACGTCGTGTT
TAATATACTCCGGGAGCTGTACGAAGAACTGGTACCAAGCTCTGTGGAAAAGAAAGGAGAGGCCAACC
AGCTAAGTAATAAGTTCCTCTATCCTCTCGTGGACCCCAATTCACAGAGCGGCAAAGGTACCGCATCTTC
TGGGAGGAAACCACGCTGGTACAACTTGAAGATCGCTGGCGATCCCAGCTGGGAGGAGGAAAAGAAGA
AATGGGAAGAGGATAAAAAGAAAGACCCCCTGGCCAAAATCTTAGGCAAGCTCGCCGAGTACGGTCTG
ATTCCACTTTTCATCCCGTTCACAGATAGCAATGAGCCGATCGTCAAGGAGATTAAGTGGATGGAAAAG
AGCCGCAATCAGAGTGTGCGGAGGCTGGACAAAGACATGTTTATTCAGGCCCTGGAACGCTTCCTTAGC
TGGGAAAGCTGGAACCTGAAGGTTAAGGAAGAGTACGAAAAAGTCGAGAAGGAGCATAAGACTTTGGA
GGAGCGCATCAAAGAAGACATCCAGGCCTTTAAGTCTCTAGAACAGTATGAGAAAGAACGGCAGGAAC
AGCTGCTGCGTGATACACTGAACACAAACGAATATCGCCTGAGCAAGAGGGGACTCAGAGGCTGGAGA
GAAATCATTCAAAAGTGGCTCAAAATGGATGAAAATGAGCCGTCTGAAAAATACCTTGAAGTTTTCAAG
GACTACCAGCGGAAGCACCCTAGAGAAGCCGGCGACTATAGTGTTTACGAATTCTTGAGCAAGAAGGA
GAATCATTTTATATGGAGGAATCACCCGGAGTACCCATATCTGTACGCAACCTTCTGCGAAATCGACAA
GAAAAAAAAAGACGCCAAGCAACAGGCTACATTTACTCTGGCCGACCCTATCAATCACCCTCTATGGGT
CCGGTTTGAGGAGCGCTCCGGAAGCAATCTGAATAAATATCGTATTCTGACTGAACAGTTACACACAGA
GAAGCTCAAGAAGAAACTTACGGTGCAGCTGGACCGCCTGATATACCCAACAGAGTCCGGAGGATGGG
AAGAGAAAGGAAAGGTTGACATCGTACTGCTTCCATCTCGTCAGTTTTACAACCAGATATTCCTGGACAT
CGAGGAGAAGGGGAAACACGCCTTCACATACAAGGACGAGTCCATAAAGTTCCCACTGAAGGGTACTTT
AGGCGGTGCTAGGGTGCAGTTCGACCGCGATCACCTGAGACGGTACCCCCACAAGGTGGAGAGCGGGA
ACGTGGGACGAATCTACTTTAATATGACAGTGAACATTGAACCCACAGAGAGTCCAGTTAGTAAATCCC
TGAAAATTCACCGTGACGACTTTCCGAAATTTGTGAATTTCAAGCCAAAGGAGCTTACGGAGTGGATCA
AGGATTCAAAGGGAAAGAAGCTGAAATCTGGTATCGAATCTCTCGAGATCGGTCTCCGTGTCATGAGCA
TCGATCTGGGACAGCGCCAGGCAGCTGCCGCCAGTATATTCGAGGTGGTAGACCAAAAGCCTGACATCG
AGGGAAAGCTCTTCTTCCCAATCAAAGGCACAGAGCTGTATGCGGTGCACCGGGCGTCCTTTAATATAA
AGCTGCCCGGTGAAACCCTGGTGAAGTCACGGGAGGTGCTTAGAAAAGCGCGAGAGGATAACCTCAAA
CTGATGAACCAAAAACTGAACTTTCTGAGGAACGTCCTGCACTTTCAGCAGTTCGAAGATATTACCGAA
CGCGAAAAGAGAGTAACCAAGTGGATATCTCGTCAAGAGAACAGCGACGTCCCGTTAGTCTATCAGGAC
GAACTCATCCAAATACGGGAGTTGATGTATAAGCCCTACAAGGATTGGGTCGCCTTTCTTAAGCAGCTTC
ACAAACGCCTAGAGGTCGAAATAGGTAAAGAGGTGAAACATTGGCGGAAGTCGCTCAGCGACGGGAGG
AAGGGACTTTATGGCATCTCTTTGAAGAACATTGACGAAATCGATAGAACCAGAAAATTTTTGTTGAGA
TGGTCCCTCCGACCCACCGAGCCTGGAGAGGTGAGGCGGTTAGAACCAGGACAGAGGTTCGCTATCGAT
CAGCTGAATCACCTCAATGCTCTGAAGGAGGACCGCCTCAAGAAAATGGCCAATACAATCATAATGCAC
GCCCTTGGCTACTGCTACGACGTCCGAAAGAAGAAGTGGCAGGCCAAGAATCCCGCCTGTCAAATTATC
CTTTTTGAGGATCTTAGCAATTACAACCCCTATGAAGAGCGGTCCAGATTCGAAAATAGTAAGCTCATG
AAGTGGAGCCGCAGGGAGATCCCGCGCCAAGTGGCCCTTCAGGGGGAAATTTATGGGCTGCAGGTAGG
CGAGGTCGGGGCCCAATTCTCCTCGCGCTTTCATGCGAAAACTGGAAGTCCTGGAATCCGGTGCTCAGT
GGTGACAAAGGAGAAGTTGCAAGACAATCGGTTTTTTAAAAACTTACAGCGGGAGGGAAGGCTGACCC
TGGATAAGATAGCCGTACTTAAGGAAGGAGATCTGTACCCTGACAAAGGCGGTGAAAAGTTCATTAGCT
TGAGCAAGGACCGAAAACTTGTGACCACCCACGCTGACATCAATGCGGCACAGAACCTGCAGAAGAGA
TTTTGGACTCGCACCCACGGATTCTACAAAGTTTACTGCAAAGCATATCAAGTAGACGGACAGACCGTA
TACATCCCCGAGTCCAAAGATCAGAAGCAGAAAATTATTGAAGAGTTTGGGGAAGGGTACTTTATCCTG
AAGGATGGTGTCTACGAATGGGGCAACGCTGGTAAACTTAAAATTAAGAAGGGCAGCTCTAAACAGTCC
TCCAGCGAGTTAGTTGATTCTGATATTCTGAAAGACAGTTTCGACCTGGCCAGCGAACTTAAAGGGGAA
AAATTAATGCTGTACCGGGACCCCAGCGGAAACGTCTTTCCATCCGATAAGTGGATGGCCGCTGGAGTG
TTCTTTGGCAAGTTAGAGAGGATTCTCATAAGTAAGCTGACCAACCAATACTCAATCTCCACAATCGAG
GATGACTCATCCAAGCAGTCTATGTGA
SEQ ID NO: 161
ATGCCTACACGCACTATCAACCTGAAACTGGTTCTTGGCAAGAATCCAGAGAATGCTACCCTTCGTCGG
GCACTATTTTCAACGCATAGACTGGTGAATCAGGCTACCAAACGGATTGAAGAGTTCCTCTTGCTTTGTC
GGGGGGAAGCATATAGGACGGTGGATAATGAGGGGAAAGAGGCTGAAATTCCGAGACACGCCGTGCAG
GAGGAAGCTCTTGCGTTTGCAAAGGCCGCTCAACGGCACAATGGTTGCATCTCTACTTATGAAGACCAG
GAAATCCTGGATGTGCTCCGGCAACTGTATGAAAGGCTGGTGCCTTCTGTGAATGAAAATAATGAAGCA
GGGGACGCTCAAGCCGCAAACGCGTGGGTGTCGCCACTGATGTCCGCCGAGTCCGAGGGAGGGCTCAG
CGTTTACGACAAGGTGCTGGACCCACCCCCAGTGTGGATGAAACTCAAAGAGGAAAAAGCTCCGGGCTG
GGAGGCTGCTTCCCAGATCTGGATCCAGTCCGACGAAGGGCAGTCCCTTCTTAACAAGCCTGGTTCGCCC
CCGCGGTGGATTAGGAAACTGAGGTCAGGCCAGCCTTGGCAGGACGATTTTGTTAGCGACCAGAAAAAG
AAGCAGGACGAGCTGACAAAGGGGAATGCGCCACTGATCAAACAATTAAAGGAAATGGGCTTATTGCC
TCTTGTGAATCCCTTTTTTAGACATCTGCTTGACCCGGAGGGGAAGGGGGTGTCACCTTGGGACAGACTC
GCTGTTAGGGCCGCTGTCGCTCATTTCATATCATGGGAATCATGGAACCACCGGACACGCGCCGAATAC
AATAGTTTGAAGCTGCGGAGGGATGAGTTCGAAGCAGCTTCCGACGAATTCAAGGACGACTTCACGCTG
CTTCGGCAGTACGAGGCTAAGAGGCACTCCACACTGAAGAGTATAGCTTTAGCCGATGATTCAAACCCT
TATAGGATCGGCGTACGCTCCCTCCGCGCTTGGAACCGCGTCCGCGAGGAGTGGATCGACAAGGGAGCG
ACCGAGGAGCAGCGGGTCACCATTCTCAGCAAGTTGCAGACCCAACTAAGGGGCAAATTTGGAGATCCT
GACTTGTTCAACTGGCTGGCGCAGGACCGGCACGTGCACCTCTGGAGCCCTAGAGATAGTGTTACCCCA
CTGGTTAGGATCAACGCTGTTGACAAAGTATTGCGACGGAGAAAACCGTACGCCTTGATGACTTTTGCC
CACCCAAGATTCCACCCTCGGTGGATACTTTACGAAGCCCCAGGGGGCAGCAATCTCCGCCAGTATGCA
CTGGATTGTACCGAAAATGCTCTGCACATTACACTGCCTCTGCTGGTTGACGATGCACATGGCACATGGA
TTGAGAAAAAAATTAGGGTTCCTCTTGCCCCCAGCGGCCAGATTCAGGACCTGACACTAGAAAAGCTCG
AGAAGAAGAAAAATCGTCTCTACTACCGTTCTGGGTTCCAGCAGTTTGCCGGCCTGGCCGGAGGTGCCG
AGGTGCTTTTCCATCGACCATACATGGAGCACGATGAGAGGAGCGAGGAGAGCTTATTAGAACGCCCTG
GTGCTGTTTGGTTCAAACTCACCTTGGACGTGGCAACCCAGGCCCCTCCAAACTGGTTGGACGGAAAGG
GCCGCGTCCGAACGCCCCCCGAGGTTCACCACTTCAAGACAGCCCTCAGTAACAAGTCTAAGCACACAC
GGACCCTCCAGCCCGGACTCAGAGTGTTATCCGTGGATCTGGGAATGCGCACCTTCGCCTCTTGCTCCGT
ATTTGAGCTGATCGAGGGCAAACCAGAGACTGGCAGAGCGTTCCCTGTGGCCGACGAACGTTCCATGGA
TTCACCAAACAAGCTGTGGGCCAAGCACGAAAGATCCTTTAAACTCACGCTCCCCGGCGAAACCCCCAG
TCGGAAAGAAGAGGAGGAACGGAGCATTGCAAGAGCCGAAATCTATGCGTTGAAAAGAGATATTCAGA
GATTAAAAAGTCTTCTGCGCCTGGGGGAAGAGGATAACGATAATAGACGCGATGCACTTCTTGAGCAAT
TTTTCAAGGGCTGGGGCGAGGAAGACGTGGTTCCAGGTCAGGCCTTTCCCCGGAGTCTGTTCCAGGGGC
TGGGGGCCGCCCCATTCAGATCCACCCCTGAGTTGTGGAGACAACACTGTCAAACCTATTATGATAAAG
CAGAGGCGTGCCTGGCTAAACACATCAGCGATTGGCGCAAGAGAACCAGGCCTAGGCCTACCTCACGTG
AGATGTGGTACAAGACACGCTCTTATCACGGCGGAAAGTCAATCTGGATGCTGGAATACCTCGACGCTG
TGAGGAAACTGCTCTTATCCTGGAGCCTCAGAGGCCGGACCTACGGGGCTATCAACAGACAGGACACAG
CAAGGTTCGGGAGCTTAGCCAGCCGGCTCCTTCACCACATTAACTCACTCAAAGAGGATCGAATAAAGA
CCGGAGCCGACTCGATCGTGCAGGCAGCCCGAGGGTACATCCCCCTGCCTCATGGGAAGGGCTGGGAGC
AGCGATATGAACCCTGCCAGCTGATCTTGTTTGAGGACCTTGCCCGTTATAGATTTCGCGTTGATAGACC
TCGCCGTGAGAATTCTCAGCTGATGCAGTGGAACCACAGAGCGATCGTGGCTGAGACCACTATGCAGGC
CGAGCTGTATGGACAGATCGTGGAGAACACCGCCGCAGGGTTCAGTTCTCGGTTTCATGCTGCCACCGG
AGCTCCCGGCGTCCGGTGCCGCTTCCTCTTAGAGCGTGATTTTGACAATGACCTCCCAAAGCCCTATCTG
CTGAGGGAACTGAGCTGGATGCTGGGGAACACAAAAGTAGAATCGGAGGAGGAGAAGCTACGGCTCCT
CTCCGAAAAGATACGTCCAGGCTCTCTGGTACCATGGGACGGAGGAGAGCAGTTCGCGACACTGCATCC
TAAGAGACAGACGTTATGTGTGATTCACGCCGATATGAACGCCGCTCAGAATCTGCAGCGAAGATTCTT
TGGCCGCTGCGGCGAAGCCTTCAGGCTGGTATGTCAGCCCCACGGGGATGATGTGCTGCGGCTGGCCTC
AACCCCTGGGGCTAGACTCTTGGGGGCACTCCAGCAGCTGGAAAATGGCCAAGGGGCTTTCGAACTCGT
TCGGGACATGGGCAGCACAAGCCAGATGAACAGATTCGTCATGAAGAGCCTGGGAAAGAAAAAGATCA
AACCCTTACAGGACAATAATGGCGACGACGAACTGGAGGACGTGTTGTCCGTGCTGCCAGAGGAAGAC
GACACAGGCCGCATCACTGTCTTCCGCGACTCAAGTGGGATATTCTTTCCTTGCAACGTGTGGATTCCGG
CCAAACAGTTCTGGCCTGCCGTCAGAGCCATGATTTGGAAAGTGATGGCTAGTCATTCATTGGGATGA
SEQ ID NO: 162
ATGACAAAGCTGAGGCACAGACAAAAGAAGCTTACACACGACTGGGCAGGGAGCAAGAAACGTGAGGT
CCTTGGGTCAAATGGAAAACTGCAGAACCCCTTGCTCATGCCTGTAAAGAAGGGGCAGGTAACAGAATT
TAGAAAAGCATTCTCCGCGTACGCTCGGGCAACTAAGGGGGAAATGACCGATGGACGGAAGAACATGT
TCACCCATTCTTTCGAGCCATTCAAAACAAAGCCGTCATTGCACCAATGCGAGCTGGCCGATAAGGCTTA
CCAGTCTTTGCATAGTTACCTCCCCGGTTCCCTGGCCCATTTCTTGCTTTCCGCACACGCACTGGGCTTTC
GTATTTTCTCTAAATCTGGGGAGGCAACTGCCTTCCAGGCCAGCTCAAAAATCGAGGCCTATGAGTCCA
AGCTCGCTTCGGAGCTAGCCTGTGTCGATTTGAGTATCCAGAATTTGACGATTAGTACTCTTTTCAACGC
TCTCACAACTTCAGTTCGGGGCAAGGGGGAGGAAACTTCAGCAGATCCCCTTATCGCACGGTTCTACAC
TCTCCTGACGGGCAAGCCCCTGAGCCGAGACACACAGGGCCCAGAACGGGACTTGGCAGAGGTCATCTC
CAGAAAGATCGCCTCGTCCTTCGGCACATGGAAGGAAATGACTGCCAACCCTCTGCAGAGCCTCCAGTT
CTTCGAAGAAGAGCTTCATGCACTAGATGCCAACGTGTCTTTATCTCCAGCTTTTGATGTGTTAATCAAG
ATGAATGATCTCCAAGGTGATCTGAAGAACCGTACTATAGTGTTCGACCCAGATGCACCCGTGTTCGAG
TACAACGCTGAGGATCCAGCCGATATCATCATAAAGCTGACAGCTCGGTATGCGAAGGAGGCCGTCATC
AAGAATCAGAACGTGGGCAATTATGTGAAAAACGCCATTACCACCACTAATGCCAATGGGCTGGGGTGG
CTCCTCAATAAAGGGCTTTCACTACTGCCAGTTTCTACTGACGATGAGCTGCTCGAATTCATTGGGGTGG
AGAGAAGCCATCCCAGCTGTCACGCGCTGATAGAGCTGATTGCCCAGCTAGAGGCGCCGGAACTGTTTG
AGAAGAATGTGTTTAGTGACACCCGTTCCGAGGTTCAGGGTATGATCGACAGTGCAGTGTCGAACCACA
TTGCTCGGCTGTCCAGCAGCCGAAACTCCCTGAGCATGGACAGCGAGGAATTGGAACGCTTGATTAAAT
CTTTCCAGATTCATACTCCCCATTGTTCTCTGTTCATAGGCGCTCAGTCCTTATCTCAGCAGCTGGAGAGC
TTACCTGAGGCGCTGCAGTCCGGAGTGAACAGCGCTGATATCTTATTAGGCAGCACACAGTATATGCTG
ACCAACTCTCTCGTTGAAGAGTCAATTGCAACATATCAAAGGACATTAAATAGGATCAATTACCTGAGT
GGGGTGGCTGGGCAGATTAACGGTGCTATCAAAAGAAAGGCAATCGACGGCGAAAAAATACACCTGCC
TGCCGCCTGGAGTGAGCTCATCTCCTTACCTTTCATTGGACAGCCGGTGATTGATGTGGAGAGCGACCTG
GCACACTTAAAAAACCAGTACCAGACCCTGTCCAATGAATTTGACACCCTCATTTCGGCCCTGCAGAAG
AACTTCGATTTGAATTTCAACAAAGCACTCCTTAACCGCACGCAGCATTTCGAGGCAATGTGCCGGAGC
ACAAAAAAAAATGCTTTATCTAAGCCCGAGATCGTGTCCTACAGAGATCTGCTGGCGCGGCTGACCAGT
TGCCTTTATCGAGGCTCGCTGGTTCTCAGAAGGGCGGGAATCGAAGTTCTGAAAAAGCACAAAATCTTT
GAGTCGAATAGTGAGCTGAGAGAACACGTCCACGAGCGAAAGCACTTCGTGTTCGTTAGTCCATTGGAC
AGAAAGGCAAAAAAACTGTTGCGCCTGACCGATTCCCGCCCTGACTTGCTCCATGTGATCGATGAGATC
CTGCAACATGACAATCTGGAGAATAAGGACAGAGAGTCCCTTTGGCTGGTCCGGTCTGGGTACCTCCTT
GCTGGTCTGCCGGACCAGCTGAGTTCTTCGTTTATCAATCTCCCCATAATCACGCAAAAGGGCGATCGCC
GGCTGATTGACCTGATTCAGTATGACCAGATCAATCGCGATGCTTTCGTAATGTTGGTGACAAGTGCTTT
CAAAAGCAATCTCTCTGGGTTGCAGTACCGCGCTAACAAGCAGTCTTTCGTGGTCACCCGCACCCTGTCT
CCTTACCTGGGTAGTAAGCTCGTATACGTCCCTAAAGACAAAGATTGGCTGGTCCCATCCCAGATGTTTG
AGGGAAGATTCGCCGATATTCTGCAGAGTGACTACATGGTCTGGAAGGATGCCGGACGCCTGTGCGTGA
TCGACACTGCCAAACATCTCTCTAACATTAAAAAAAGCGTGTTTAGTAGCGAAGAAGTCCTTGCTTTTCT
TCGAGAGCTGCCTCACCGGACCTTCATCCAGACCGAGGTACGGGGGTTAGGAGTGAACGTCGATGGAAT
CGCATTTAATAACGGGGATATCCCGAGCTTGAAGACATTCTCGAATTGTGTGCAGGTGAAGGTGAGTAG
GACTAATACTAGTCTCGTGCAGACTCTAAACAGGTGGTTCGAGGGTGGCAAAGTGTCACCTCCCTCTATT
CAGTTCGAAAGAGCTTACTACAAAAAAGACGATCAGATTCACGAGGACGCAGCCAAGAGAAAGATACG
CTTCCAGATGCCAGCAACGGAATTAGTGCACGCCAGCGATGACGCTGGTTGGACCCCCAGCTACCTGCT
GGGCATCGACCCCGGTGAGTACGGAATGGGTCTCAGTTTGGTGTCCATCAACAATGGAGAGGTCCTGGA
TTCTGGATTCATCCACATTAATTCCCTGATCAATTTCGCGTCCAAAAAAAGCAATCACCAGACCAAAGTA
GTCCCCCGCCAGCAGTACAAGTCCCCCTACGCGAATTATCTCGAGCAGTCAAAGGATTCAGCAGCAGGG
GATATAGCTCACATTCTGGATCGGCTAATCTACAAATTGAACGCCTTGCCTGTGTTCGAGGCGCTGTCTG
GCAACAGTCAGAGTGCTGCTGATCAGGTATGGACCAAAGTTCTATCCTTCTATACATGGGGAGACAACG
ACGCACAGAACAGTATACGGAAGCAGCACTGGTTCGGTGCCTCACACTGGGATATTAAGGGGATGCTGC
GCCAACCCCCAACCGAAAAAAAACCCAAACCATATATAGCCTTTCCCGGGAGTCAAGTGTCATCCTATG
GAAATAGTCAAAGGTGTAGTTGTTGCGGCCGCAATCCCATTGAGCAGTTGCGTGAGATGGCAAAGGACA
CGAGTATCAAGGAGCTGAAAATCCGAAATAGTGAGATCCAACTATTCGATGGTACAATCAAGCTGTTTA
ACCCCGACCCTTCCACCGTCATCGAGAGGCGGCGGCATAACCTAGGACCCTCACGCATTCCTGTGGCAG
ACCGAACTTTCAAGAATATTAGCCCTTCTTCGTTAGAGTTCAAGGAGCTCATTACTATCGTTTCTCGAAG
CATCCGCCATAGCCCCGAATTTATTGCTAAGAAACGGGGTATCGGGTCTGAGTACTTTTGTGCTTATTCT
GACTGCAACTCCTCACTGAACTCAGAGGCCAATGCCGCGGCCAATGTGGCACAGAAGTTTCAGAAGCAA
CTCTTTTTCGAACTCTGA
SEQ ID NO: 163
ATGAAACGTATTCTGAACTCTCTGAAAGTCGCCGCACTGAGGCTGCTGTTTCGAGGAAAGGGCTCAGAG
CTGGTGAAGACCGTCAAGTACCCTCTGGTTTCGCCCGTCCAGGGTGCTGTGGAAGAACTCGCCGAAGCA
ATACGCCACGACAACCTACATTTATTTGGGCAGAAGGAAATCGTAGATCTGATGGAGAAGGACGAGGG
CACCCAGGTCTACTCGGTGGTGGACTTTTGGCTCGACACACTCCGTCTAGGGATGTTCTTCAGTCCAAGT
GCTAATGCCCTTAAGATCACTCTGGGGAAGTTTAACAGCGACCAAGTTTCCCCTTTCAGGAAGGTTCTGG
AGCAGTCCCCTTTCTTTCTCGCGGGTAGACTCAAAGTGGAGCCCGCTGAACGTATCCTCAGCGTGGAGAT
CCGCAAGATCGGTAAGAGGGAGAATAGAGTGGAGAACTACGCCGCAGATGTAGAGACTTGTTTTATCG
GTCAGCTGTCTAGTGATGAAAAGCAGTCTATCCAGAAGCTCGCTAACGATATCTGGGACTCTAAGGATC
ACGAAGAGCAAAGGATGCTTAAGGCGGATTTCTTTGCCATTCCCCTCATCAAAGACCCAAAGGCAGTGA
CCGAGGAAGATCCCGAGAATGAAACCGCAGGCAAACAGAAGCCTCTCGAATTATGTGTGTGCTTAGTGC
CCGAGTTGTACACCCGCGGGTTCGGTTCAATAGCGGACTTCCTGGTCCAGCGTCTGACACTATTAAGAGA
CAAAATGAGCACAGACACAGCAGAAGACTGCCTTGAGTATGTCGGCATAGAGGAGGAGAAGGGTAATG
GGATGAACTCGCTGCTGGGGACGTTCCTCAAGAACCTGCAGGGAGACGGGTTCGAACAGATCTTCCAAT
TTATGCTCGGCAGTTACGTGGGATGGCAAGGTAAGGAAGACGTCCTACGCGAACGGCTTGATTTGCTAG
CGGAGAAGGTTAAAAGACTGCCGAAACCTAAGTTTGCCGGCGAGTGGTCCGGCCATCGGATGTTCCTGC
ATGGTCAATTGAAGAGCTGGTCCTCTAACTTTTTCCGCCTGTTTAACGAGACTAGGGAGCTCCTCGAAAG
CATAAAATCCGACATCCAACACGCGACCATGTTAATCAGCTACGTCGAAGAGAAAGGGGGATACCACCC
ACAACTCTTGTCACAGTACAGGAAACTAATGGAGCAGCTGCCAGCTCTCAGAACAAAGGTGTTAGATCC
AGAGATAGAAATGACTCACATGAGCGAGGCGGTAAGGTCGTACATTATGATCCACAAGTCGGTAGCAG
GATTTCTGCCTGACTTACTCGAGTCCCTCGATAGGGACAAGGACAGGGAATTCCTGCTGAGTATATTTCC
AAGGATCCCCAAAATTGACAAAAAAACTAAGGAAATCGTGGCCTGGGAGCTCCCAGGCGAGCCCGAAG
AAGGATACCTGTTCACTGCCAATAATCTTTTTCGCAACTTTCTGGAGAATCCTAAACATGTTCCACGTTTC
ATGGCAGAAAGGATCCCGGAAGATTGGACGCGCCTGCGGTCCGCTCCCGTATGGTTTGACGGCATGGTG
AAACAATGGCAGAAAGTGGTAAACCAGCTGGTGGAGTCACCTGGAGCATTGTATCAGTTCAATGAAAGC
TTTCTCCGACAACGTTTACAGGCAATGCTGACAGTGTATAAGAGAGACCTGCAGACAGAGAAATTCCTT
AAGTTGTTGGCTGATGTCTGCAGGCCTCTGGTGGACTTCTTTGGGCTGGGGGGAAACGATATCATCTTCA
AAAGCTGCCAGGACCCGAGGAAACAATGGCAAACTGTCATTCCCTTGAGTGTCCCCGCTGATGTGTACA
CCGCGTGTGAGGGGCTGGCAATCCGGCTTCGTGAGACATTGGGATTTGAGTGGAAGAACCTTAAGGGCC
ATGAAAGGGAGGACTTTCTAAGACTGCACCAGCTTTTAGGGAATCTGCTTTTCTGGATTCGAGATGCCAA
ACTGGTGGTGAAATTGGAAGATTGGATGAATAATCCCTGTGTTCAGGAGTACGTTGAGGCTCGTAAGGC
CATTGATCTCCCACTGGAGATCTTCGGCTTTGAGGTCCCCATCTTCCTGAACGGATATCTGTTTAGTGAA
CTGAGGCAGTTAGAACTGCTGCTCCGCCGTAAGTCGGTTATGACCAGCTATTCGGTTAAGACAACTGGC
AGTCCAAACAGGCTTTTCCAGTTAGTCTACCTGCCATTAAATCCTTCCGACCCTGAGAAAAAAAATTCTA
ATAACTTTCAGGAACGCCTGGACACCCCCACTGGCTTATCACGTCGCTTCCTGGACCTTACTCTGGACGC
CTTCGCCGGCAAGTTGCTGACAGACCCCGTGACTCAAGAGCTTAAAACTATGGCTGGGTTCTACGATCA
CCTGTTTGGTTTCAAGCTCCCATGTAAGCTGGCAGCCATGTCTAACCACCCTGGCTCTAGCAGCAAGATG
GTCGTGTTGGCCAAACCTAAAAAAGGGGTTGCATCTAATATAGGATTCGAACCAATCCCTGATCCCGCG
CACCCCGTATTCCGGGTGAGATCATCATGGCCAGAGCTGAAGTATCTGGAGGGGTTACTGTATCTTCCAG
AAGACACTCCACTGACAATAGAGCTCGCAGAGACAAGTGTTAGTTGTCAGAGCGTCAGTAGCGTGGCAT
TCGATCTGAAAAATCTGACTACTATCCTTGGACGCGTGGGTGAGTTCCGTGTGACCGCAGACCAGCCTTT
TAAGTTGACCCCCATCATCCCTGAGAAGGAGGAGTCCTTCATAGGAAAAACATATCTAGGCCTTGATGC
CGGGGAACGCTCAGGCGTAGGGTTCGCTATCGTCACAGTCGACGGGGATGGGTACGAGGTACAGCGCCT
GGGGGTGCATGAAGATACACAGCTGATGGCCCTACAGCAGGTGGCCTCTAAAAGCTTGAAGGAGCCGG
TGTTCCAGCCGCTCAGAAAGGGTACTTTTCGGCAGCAGGAACGTATTAGAAAATCTCTCAGAGGATGTT
ATTGGAACTTCTATCACGCTCTGATGATTAAGTACCGCGCCAAGGTAGTGCACGAAGAGAGCGTGGGCA
GTTCCGGCCTGGTTGGGCAGTGGTTACGAGCATTCCAGAAGGACCTCAAGAAAGCCGATGTGTTGCCAA
AAAAGGGAGGCAAAAACGGAGTCGATAAGAAAAAGAGAGAGTCTTCTGCACAAGACACATTGTGGGGA
GGGGCTTTTAGCAAGAAGGAAGAACAGCAGATAGCTTTCGAAGTCCAAGCTGCTGGTTCTAGCCAGTTC
TGCCTGAAGTGCGGATGGTGGTTCCAACTCGGAATGCGTGAGGTTAATCGCGTGCAGGAATCCGGCGTC
GTGCTGGATTGGAATCGGAGTATTGTCACATTCCTGATTGAGAGCTCTGGCGAGAAAGTGTATGGGTTCT
CCCCTCAGCAACTCGAAAAGGGGTTCAGACCAGACATTGAAACCTTCAAGAAGATGGTTCGGGATTTCA
TGCGCCCGCCTATGTTTGACCGGAAGGGTCGCCCAGCAGCTGCCTACGAAAGGTTTGTCTTGGGACGCC
GGCATCGGCGGTATAGATTCGACAAGGTTTTTGAAGAACGATTCGGACGATCCGCGCTATTCATTTGCCC
GAGGGTTGGCTGTGGCAACTTTGACCACAGCAGCGAGCAGTCAGCCGTAGTGCTGGCTCTAATCGGATA
TATTGCCGACAAAGAGGGGATGAGCGGAAAAAAGCTAGTCTACGTGCGTCTGGCAGAACTAATGGCGG
AATGGAAATTGAAGAAACTGGAGAGGAGTAGAGTTGAGGAGCAAAGCTCCGCTCAGTGA
SEQ ID NO: 164
ATGGCGGAGTCGAAGCAAATGCAGTGCAGGAAGTGTGGAGCCTCTATGAAGTACGAAGTGATCGGCCT
CGGGAAGAAAAGCTGCAGATATATGTGTCCCGACTGCGGGAATCACACATCTGCAAGAAAGATTCAGA
ATAAGAAGAAAAGGGACAAGAAGTATGGATCTGCCAGTAAAGCACAAAGCCAACGAATCGCAGTTGCA
GGGGCCTTATACCCGGATAAAAAGGTTCAGACCATCAAGACTTATAAGTATCCAGCCGACCTGAATGGT
GAGGTCCATGACTCAGGGGTGGCCGAAAAAATAGCCCAAGCAATCCAGGAGGATGAAATAGGGCTCCT
CGGCCCCTCTTCCGAGTACGCCTGTTGGATCGCTAGCCAGAAACAGAGCGAGCCCTACAGTGTTGTAGA
CTTTTGGTTTGACGCTGTGTGCGCCGGAGGCGTGTTCGCCTATTCTGGGGCTAGATTGCTGTCTACCGTCC
TGCAGCTATCTGGGGAGGAGAGCGTCCTACGCGCAGCCCTGGCATCCTCCCCTTTTGTCGACGATATCAA
TCTGGCACAGGCCGAAAAATTTCTGGCGGTGTCCAGGCGAACCGGCCAAGATAAGCTGGGGAAGCGCA
TTGGAGAGTGCTTCGCAGAGGGCCGACTTGAGGCCCTAGGCATCAAGGACCGGATGCGTGAATTTGTCC
AGGCTATCGATGTCGCTCAGACCGCTGGGCAGCGTTTTGCCGCGAAACTGAAAATCTTTGGGATTTCTCA
GATGCCCGAGGCAAAGCAGTGGAACAATGACAGCGGACTCACCGTGTGCATCCTGCCCGACTATTACGT
CCCAGAAGAAAATCGCGCAGATCAGTTGGTCGTCCTGCTAAGACGACTGAGAGAGATAGCATACTGTAT
GGGGATCGAAGATGAGGCCGGTTTTGAACATCTTGGAATTGATCCTGGCGCACTATCAAATTTTTCCAAT
GGCAATCCTAAACGCGGATTTTTGGGCCGCCTGCTGAACAATGATATTATTGCCTTAGCGAACAACATGT
CCGCCATGACGCCTTACTGGGAGGGCAGGAAGGGAGAACTGATTGAAAGATTGGCTTGGCTGAAGCAC
CGTGCAGAGGGGCTTTATCTGAAGGAACCGCATTTTGGAAATAGTTGGGCCGACCATAGGTCTAGAATT
TTTTCCAGAATAGCCGGGTGGCTTTCTGGGTGCGCTGGGAAGCTAAAGATCGCCAAAGACCAGATCAGC
GGAGTGCGTACTGATCTGTTCCTTCTGAAGAGACTGCTGGATGCGGTCCCGCAGTCCGCCCCTTCTCCCG
ACTTCATAGCCTCTATCTCTGCCTTGGATCGCTTCCTGGAGGCCGCAGAATCTAGTCAGGATCCTGCCGA
ACAGGTGAGGGCCCTATACGCCTTTCATCTGAACGCACCCGCGGTGCGAAGCATCGCCAACAAGGCAGT
CCAGCGATCCGACAGCCAAGAATGGCTTATAAAGGAACTGGACGCTGTGGACCACCTGGAGTTTAACAA
GGCCTTTCCCTTCTTCTCTGATACGGGAAAGAAGAAAAAGAAAGGGGCTAACTCGAATGGCGCTCCGTC
CGAGGAGGAGTACACCGAGACTGAGAGCATCCAGCAGCCCGAGGACGCTGAGCAAGAGGTTAATGGTC
AGGAAGGCAACGGGGCCTCGAAGAACCAGAAGAAGTTTCAGAGAATCCCCCGATTCTTCGGCGAGGGG
AGTCGCAGCGAGTATCGCATCCTCACTGAAGCCCCGCAGTACTTCGACATGTTCTGTAACAACATGCGG
GCCATCTTTATGCAATTAGAATCCCAACCGCGTAAAGCTCCCAGGGATTTTAAGTGTTTCCTGCAGAATC
GGCTGCAGAAATTGTATAAGCAGACATTCCTGAACGCTCGATCCAACAAGTGCCGGGCATTACTAGAGT
CCGTATTGATTAGTTGGGGAGAGTTTTACACCTACGGGGCTAACGAGAAAAAATTTCGACTGCGTCATG
AAGCTTCTGAGCGCTCCTCGGACCCAGATTACGTGGTGCAACAGGCGCTGGAGATCGCTCGGAGGCTGT
TTCTCTTCGGCTTTGAGTGGAGGGACTGTAGCGCAGGTGAAAGAGTGGATCTGGTCGAAATACATAAGA
AAGCCATATCTTTCCTGTTGGCCATCACTCAGGCTGAGGTGTCTGTGGGCAGCTATAACTGGCTGGGCAA
TTCTACCGTGAGTCGGTACCTGTCCGTGGCAGGGACTGATACCCTTTACGGCACCCAGCTGGAAGAATTC
TTAAATGCAACCGTGTTATCTCAGATGCGGGGGCTGGCTATCAGGTTATCATCTCAGGAACTGAAGGAT
GGATTTGACGTACAGCTGGAGTCTAGTTGCCAGGATAATCTGCAACACTTGCTCGTGTACAGGGCTTCAC
GAGACCTTGCCGCCTGCAAGCGCGCTACTTGTCCAGCTGAGTTGGATCCTAAGATTCTGGTACTGCCCGT
GGGGGCCTTTATCGCTAGCGTGATGAAAATGATTGAAAGAGGGGATGAGCCTTTAGCTGGAGCTTATCT
GAGACACAGACCCCATAGTTTCGGGTGGCAGATCCGCGTTCGAGGTGTGGCAGAGGTGGGAATGGACC
AAGGGACCGCCCTGGCGTTCCAGAAACCGACCGAGAGCGAACCCTTCAAGATAAAGCCGTTTTCCGCTC
AATACGGCCCCGTTCTATGGCTGAACAGCTCCAGTTATAGCCAGAGCCAGTACCTGGACGGGTTCCTATC
ACAGCCCAAGAACTGGAGTATGCGGGTGCTGCCACAGGCCGGCTCAGTGCGGGTAGAACAGCGCGTCG
CCTTGATTTGGAATCTCCAGGCCGGAAAGATGAGGCTGGAACGGAGCGGAGCGCGGGCTTTCTTCATGC
CCGTCCCATTCAGTTTCCGCCCCAGTGGCAGCGGCGACGAGGCAGTCCTGGCTCCAAATAGGTACCTGG
GACTCTTTCCACACAGCGGCGGCATAGAGTACGCTGTGGTCGATGTTCTTGACTCTGCCGGCTTCAAAAT
ACTCGAGAGAGGAACAATAGCCGTCAATGGCTTCTCCCAGAAACGAGGAGAAAGACAAGAGGAAGCCC
ATCGCGAAAAACAAAGACGCGGTATCTCCGATATTGGGCGCAAGAAGCCAGTCCAGGCCGAAGTCGAT
GCGGCCAACGAGCTCCATCGAAAATACACCGATGTTGCTACTCGGCTGGGGTGTCGAATTGTCGTTCAA
TGGGCACCCCAACCCAAACCAGGCACTGCGCCGACCGCTCAGACTGTGTACGCTAGGGCCGTGAGGACT
GAAGCACCAAGATCCGGCAATCAGGAAGATCACGCCAGGATGAAATCTTCCTGGGGATACACATGGGG
TACGTATTGGGAAAAAAGGAAGCCCGAGGACATCCTCGGCATTAGTACCCAGGTGTATTGGACAGGCGG
GATCGGCGAGTCCTGCCCGGCTGTCGCCGTCGCGCTATTGGGACACATCAGGGCCACCTCAACCCAGAC
TGAATGGGAGAAAGAGGAAGTCGTGTTTGGGCGATTGAAAAAGTTCTTCCCATCCTGA
SEQ ID NO: 165
ATGGAGAAGCGCATCAATAAAATTCGCAAGAAGCTGTCTGCCGATAACGCCACAAAACCAGTTAGTCGA
AGCGGCCCAATGAAGACCCTGCTAGTTCGAGTGATGACTGATGATCTGAAGAAAAGGCTCGAAAAGCG
ACGCAAGAAGCCTGAGGTAATGCCTCAGGTTATAAGTAACAATGCAGCAAACAATCTGCGGATGCTGCT
TGACGATTACACAAAGATGAAGGAAGCCATTCTCCAGGTGTATTGGCAGGAGTTCAAGGATGATCACGT
AGGCCTGATGTGTAAATTCGCGCAACCTGCAAGCAAGAAGATCGACCAAAACAAGCTGAAACCCGAGA
TGGATGAAAAAGGCAATTTAACAACCGCCGGATTCGCTTGTTCCCAGTGTGGGCAGCCACTGTTCGTGT
ACAAGTTAGAACAGGTGTCGGAAAAAGGAAAGGCATACACTAACTACTTTGGACGGTGCAATGTTGCA
GAACACGAAAAGCTGATACTGCTTGCCCAGCTTAAGCCCGAAAAAGACAGCGACGAAGCGGTGACCTA
CAGCCTGGGAAAATTCGGGCAGCGGGCACTGGACTTCTATTCTATCCACGTTACCAAGGAGAGCACCCA
CCCAGTGAAGCCGTTGGCCCAAATCGCTGGAAACCGGTACGCCAGCGGACCAGTCGGCAAGGCCCTGTC
CGATGCCTGTATGGGCACAATTGCTTCTTTCCTGTCCAAGTACCAGGACATCATAATCGAGCACCAAAAA
GTTGTGAAAGGGAATCAGAAACGCCTGGAATCCCTTCGAGAACTGGCCGGCAAGGAGAACCTTGAGTA
CCCGTCCGTGACCCTGCCTCCACAGCCACATACCAAAGAGGGCGTAGACGCGTATAATGAGGTCATTGC
CCGCGTTCGCATGTGGGTTAATTTAAACCTGTGGCAGAAATTAAAACTAAGCCGAGATGATGCTAAACC
GTTACTGAGATTGAAGGGATTCCCTAGCTTTCCTGTGGTGGAGAGAAGGGAAAACGAGGTTGATTGGTG
GAATACTATTAATGAGGTGAAAAAGCTTATTGACGCCAAGAGGGATATGGGCAGGGTGTTCTGGAGCGG
GGTGACTGCCGAAAAGAGAAATACCATCCTCGAGGGATACAATTACCTCCCCAACGAGAATGATCATAA
GAAAAGAGAGGGGAGCTTAGAGAATCCAAAGAAACCTGCAAAGAGGCAATTCGGTGATCTCCTGCTCT
ACCTCGAGAAGAAATACGCGGGGGACTGGGGAAAAGTTTTTGACGAAGCCTGGGAGCGCATTGACAAG
AAGATCGCCGGGCTGACGTCTCACATTGAACGGGAAGAGGCACGGAATGCAGAGGACGCCCAGTCTAA
GGCCGTGCTGACTGACTGGCTGCGCGCAAAGGCCTCCTTCGTGCTCGAACGTCTGAAGGAAATGGATGA
GAAAGAGTTTTACGCGTGTGAAATACAGCTGCAGAAGTGGTACGGCGATCTAAGGGGAAATCCCTTCGC
AGTGGAAGCCGAGAATAGGGTAGTTGACATCAGTGGGTTCTCCATCGGCAGTGATGGACATTCTATCCA
GTATAGAAACCTGCTCGCCTGGAAGTACTTAGAGAACGGCAAGAGAGAGTTCTATCTGCTGATGAACTA
CGGGAAAAAAGGTAGAATTCGCTTTACAGATGGCACCGACATAAAGAAGTCCGGAAAGTGGCAAGGCC
TCTTATACGGAGGCGGCAAAGCAAAGGTGATAGACTTGACTTTTGACCCTGACGACGAACAGCTGATAA
TCTTGCCGCTGGCCTTTGGCACAAGACAAGGTAGGGAATTTATCTGGAATGATCTTCTTTCTCTCGAGAC
CGGACTCATCAAGCTCGCAAACGGAAGGGTCATCGAGAAGACAATCTACAATAAAAAGATAGGCCGAG
ACGAGCCAGCCCTGTTTGTGGCTTTGACATTTGAGCGGAGAGAGGTCGTAGATCCCAGCAACATCAAAC
CCGTGAACCTGATCGGTGTTGACAGGGGCGAGAACATCCCGGCGGTTATCGCACTGACGGATCCAGAAG
GATGTCCTCTGCCCGAGTTCAAAGATTCATCGGGAGGGCCAACCGACATTTTGAGGATAGGGGAGGGGT
ACAAGGAGAAGCAGCGAGCTATCCAGGCGGCCAAAGAAGTGGAGCAACGAAGAGCTGGTGGTTATTCT
CGCAAGTTCGCTTCCAAAAGTCGTAACCTGGCTGACGATATGGTGCGCAATTCTGCCCGTGACCTTTTCT
ACCACGCCGTTACACACGACGCCGTGTTAGTGTTTGAAAATCTTAGTCGAGGCTTCGGGCGACAGGGGA
AGCGGACCTTTATGACCGAGAGACAGTATACAAAAATGGAGGATTGGCTGACCGCCAAACTGGCGTATG
AAGGACTCACATCCAAGACCTATCTCTCAAAAACTTTGGCCCAGTATACATCTAAGACGTGCAGTAACT
GTGGCTTCACCATTACCACAGCTGACTACGATGGCATGCTGGTCCGCTTAAAAAAGACATCTGACGGCT
GGGCTACTACCCTCAACAATAAAGAGCTCAAAGCCGAAGGACAAATTACCTATTATAACAGGTATAAAA
GACAGACTGTCGAGAAGGAGTTGAGCGCGGAGCTGGACCGCCTATCAGAGGAGTCAGGGAACAACGAT
ATCTCTAAGTGGACTAAGGGACGCCGAGACGAGGCGTTGTTCTTGCTGAAAAAGCGGTTCTCTCATCGA
CCCGTGCAGGAGCAGTTCGTGTGTCTGGACTGCGGCCACGAGGTTCATGCTGATGAGCAAGCTGCTCTA
AATATTGCCCGTAGTTGGTTGTTCCTGAACAGCAATTCAACAGAGTTCAAGTCATACAAGAGCGGAAAG
CAGCCGTTTGTGGGCGCATGGCAGGCATTTTACAAAAGACGCCTGAAGGAAGTGTGGAAGCCAAACGCC
SEQ ID NO: 166
ATGAAAAGGATTAACAAAATCCGAAGGCGGCTTGTAAAGGATTCTAACACCAAAAAGGCTGGCAAGAC
GGGGCCCATGAAAACATTACTCGTTAGAGTTATGACCCCCGACCTCAGAGAGCGACTGGAAAATTTACG
CAAGAAGCCAGAGAACATACCTCAGCCAATTAGTAATACCTCTCGGGCAAACCTAAACAAGTTGCTTAC
TGATTACACGGAGATGAAAAAGGCCATACTGCATGTGTACTGGGAGGAGTTTCAAAAGGACCCTGTCGG
GCTAATGAGCAGGGTGGCTCAGCCTGCACCTAAAAACATCGACCAGCGGAAACTCATCCCAGTTAAGGA
CGGAAATGAGAGATTGACAAGTTCAGGTTTCGCCTGCTCACAGTGCTGTCAACCGCTGTACGTTTATAAG
TTAGAACAAGTGAATGACAAAGGAAAGCCTCACACAAATTATTTTGGCCGGTGTAATGTCTCTGAGCAT
GAGCGTCTGATTCTGTTGTCCCCGCATAAACCGGAAGCTAATGACGAGCTCGTAACCTACAGCTTGGGG
AAGTTTGGCCAAAGAGCATTGGACTTCTATTCAATCCATGTGACCCGCGAATCCAATCATCCCGTCAAGC
CCTTGGAGCAGATAGGGGGCAATAGTTGCGCTTCTGGCCCTGTGGGCAAAGCCCTGTCCGACGCCTGTA
TGGGAGCCGTGGCTTCATTCCTGACCAAATATCAGGATATCATCTTGGAGCACCAGAAAGTGATCAAGA
AAAATGAAAAAAGGTTAGCAAACCTCAAGGATATTGCAAGCGCTAACGGCTTGGCTTTTCCTAAAATCA
CACTTCCACCTCAGCCTCACACAAAGGAAGGCATCGAGGCATACAACAATGTGGTGGCCCAGATCGTCA
TCTGGGTTAACTTAAACCTGTGGCAGAAACTTAAAATTGGCAGGGATGAGGCAAAACCCTTACAGCGCC
TGAAAGGATTCCCCAGCTTTCCACTGGTGGAGCGCCAGGCTAACGAAGTGGACTGGTGGGATATGGTGT
GTAACGTCAAGAAGCTCATCAATGAAAAGAAAGAGGACGGTAAAGTCTTCTGGCAGAACCTCGCCGGTT
ACAAACGGCAGGAGGCGCTGTTACCTTATCTGTCGAGTGAAGAGGACCGGAAAAAAGGCAAGAAATTT
GCTCGTTATCAGTTTGGTGATTTGCTCCTACATTTGGAGAAGAAGCACGGCGAGGACTGGGGAAAAGTA
TACGATGAGGCCTGGGAGAGGATTGACAAAAAGGTGGAGGGACTGTCAAAGCACATCAAGCTCGAAGA
AGAGCGCAGAAGCGAGGACGCCCAATCCAAAGCAGCGCTGACTGACTGGCTGCGGGCGAAGGCCAGTT
TTGTAATCGAAGGCCTTAAAGAAGCCGACAAGGATGAATTCTGCAGATGCGAATTAAAACTCCAGAAGT
GGTACGGCGATCTCCGAGGTAAGCCTTTCGCAATCGAGGCCGAGAATTCCATACTGGACATTAGTGGAT
TCAGTAAACAGTATAATTGTGCCTTTATATGGCAGAAGGATGGTGTCAAGAAACTCAACCTGTACCTTAT
TATTAATTATTTCAAAGGCGGGAAACTGAGATTTAAGAAGATAAAGCCTGAAGCCTTTGAGGCGAACCG
ATTCTACACAGTTATTAACAAGAAATCTGGTGAAATTGTACCCATGGAGGTAAACTTCAACTTCGATGAT
CCCAATCTGATTATATTGCCACTAGCTTTTGGCAAGCGGCAGGGTAGGGAATTCATTTGGAACGATTTGC
TTTCACTGGAAACAGGGTCCCTTAAGCTGGCAAACGGGAGAGTGATTGAAAAGACATTGTACAATCGGA
GGACACGTCAGGATGAACCTGCCCTTTTCGTGGCTCTGACATTCGAGCGCAGGGAGGTTCTGGACTCTA
GCAATATCAAGCCAATGAACCTGATCGGCATAGACCGAGGAGAGAATATTCCGGCTGTGATCGCACTCA
CCGATCCCGAAGGATGTCCCCTTTCTCGGTTCAAGGACTCCTTAGGCAATCCAACTCATATCCTGAGAAT
CGGCGAGTCATACAAGGAGAAGCAGCGAACAATTCAGGCCGCCAAGGAAGTCGAGCAGAGGCGAGCTG
GCGGCTACAGCCGTAAATACGCTAGTAAAGCTAAGAACCTGGCCGACGATATGGTGCGCAATACTGCTA
GAGACCTGCTGTACTATGCAGTGACGCAGGACGCAATGCTGATATTCGAGAATCTGTCCAGAGGATTCG
GAAGGCAGGGCAAGCGGACGTTCATGGCCGAGCGCCAGTATACAAGGATGGAGGATTGGTTAACGGCC
AAGCTTGCCTATGAGGGGCTACCTAGTAAGACCTATCTGTCTAAGACGCTGGCTCAATACACCAGTAAG
ACCTGCTCAAACTGTGGCTTTACAATCACTTCTGCTGATTATGATAGAGTGCTCGAGAAGCTAAAAAAA
ACTGCCACCGGCTGGATGACTACTATTAATGGGAAGGAACTGAAAGTGGAAGGACAGATTACCTATTAT
AATCGCTACAAGCGTCAAAACGTCGTCAAGGACCTGTCGGTGGAATTGGACAGACTCAGTGAAGAGTCC
GTGAACAATGATATCAGCTCCTGGACAAAAGGGCGCAGTGGGGAGGCACTCAGCTTGCTTAAAAAGAG
GTTTTCACATCGGCCGGTCCAGGAGAAATTTGTCTGCCTGAACTGCGGATTCGAGACACACGCCGACGA
GCAGGCAGCACTGAACATTGCCAGATCCTGGCTGTTCCTTAGGTCCCAGGAATATAAGAAGTACCAGAC
TAACAAAACCACGGGAAACACAGATAAAAGGGCCTTTGTCGAAACTTGGCAATCCTTTTACCGGAAGAA
GTTAAAGGAAGTGTGGAAGCCC
SEQ ID NO: 167
ATGGATAAGAAATACTCAATAGGCTTAGCAATCGGCACAAATAGCGTCGGATGGGCGGTGATCACTGAT
GAATATAAGGTTCCGTCTAAAAAGTTCAAGGTTCTGGGAAATACAGACCGCCACAGTATCAAAAAAAAT
CTTATAGGGGCTCTTTTATTTGACAGTGGAGAGACAGCGGAAGCGACTCGTCTCAAACGGACAGCTCGT
AGAAGGTATACACGTCGGAAGAATCGTATTTGTTATCTACAGGAGATTTTTTCAAATGAGATGGCGAAA
GTAGATGATAGTTTCTTTCATCGACTTGAAGAGTCTTTTTTGGTGGAAGAAGACAAGAAGCATGAACGTC
ATCCTATTTTTGGAAATATAGTAGATGAAGTTGCTTATCATGAGAAATATCCAACTATCTATCATCTGCG
AAAAAAATTGGTAGATTCTACTGATAAAGCGGATTTGCGCTTAATCTATTTGGCCTTAGCGCATATGATT
AAGTTTCGTGGTCATTTTTTGATTGAGGGAGATTTAAATCCTGATAATAGTGATGTGGACAAACTATTTA
TCCAGTTGGTACAAACCTACAATCAATTATTTGAAGAAAACCCTATTAACGCAAGTGGAGTAGATGCTA
AAGCGATTCTTTCTGCACGATTGAGTAAATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGA
GAAGAAAAATGGCTTATTTGGGAATCTCATTGCTTTGTCATTGGGTTTGACCCCTAATTTTAAATCAAAT
TTTGATTTGGCAGAAGATGCTAAATTACAGCTTTCAAAAGATACTTACGATGATGATTTAGATAATTTAT
TGGCGCAAATTGGAGATCAATATGCTGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCTATTTTACT
TTCAGATATCCTAAGAGTAAATACTGAAATAACTAAGGCTCCCCTATCAGCTTCAATGATTAAACGCTAC
GATGAACATCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGACAACAACTTCCAGAAAAGTATAAA
GAAATCTTTTTTGATCAATCAAAAAACGGATATGCAGGTTATATTGATGGGGGAGCTAGCCAAGAAGAA
TTTTATAAATTTATCAAACCAATTTTAGAAAAAATGGATGGTACTGAGGAATTATTGGTGAAACTAAATC
GTGAAGATTTGCTGCGCAAGCAACGGACCTTTGACAACGGCTCTATTCCCCATCAAATTCACTTGGGTGA
GCTGCATGCTATTTTGAGAAGACAAGAAGACTTTTATCCATTTTTAAAAGACAATCGTGAGAAGATTGA
AAAAATCTTGACTTTTCGAATTCCTTATTATGTTGGTCCATTGGCGCGTGGCAATAGTCGTTTTGCATGGA
TGACTCGGAAGTCTGAAGAAACAATTACCCCATGGAATTTTGAAGAAGTTGTCGATAAAGGTGCTTCAG
CTCAATCATTTATTGAACGCATGACAAACTTTGATAAAAATCTTCCAAATGAAAAAGTACTACCAAAAC
ATAGTTTGCTTTATGAGTATTTTACGGTTTATAACGAATTGACAAAGGTCAAATATGTTACTGAAGGAAT
GCGAAAACCAGCATTTCTTTCAGGTGAACAGAAGAAAGCCATTGTTGATTTACTCTTCAAAACAAATCG
AAAAGTAACCGTTAAGCAATTAAAAGAAGATTATTTCAAAAAAATAGAATGTTTTGATAGTGTTGAAAT
TTCAGGAGTTGAAGATAGATTTAATGCTTCATTAGGTACCTACCATGATTTGCTAAAAATTATTAAAGAT
AAAGATTTTTTGGATAATGAAGAAAATGAAGATATCTTAGAGGATATTGTTTTAACATTGACCTTATTTG
AAGATAGGGAGATGATTGAGGAAAGACTTAAAACATATGCTCACCTCTTTGATGATAAGGTGATGAAAC
AGCTTAAACGTCGCCGTTATACTGGTTGGGGACGTTTGTCTCGAAAATTGATTAATGGTATTAGGGATAA
GCAATCTGGCAAAACAATATTAGATTTTTTGAAATCAGATGGTTTTGCCAATCGCAATTTTATGCAGCTG
ATCCATGATGATAGTTTGACATTTAAAGAAGACATTCAAAAAGCACAAGTGTCTGGACAAGGCGATAGT
TTACATGAACATATTGCAAATTTAGCTGGTAGCCCTGCTATTAAAAAAGGTATTTTACAGACTGTAAAAG
TTGTTGATGAATTGGTCAAAGTAATGGGGCGGCATAAGCCAGAAAATATCGTTATTGAAATGGCACGTG
AAAATCAGACAACTCAAAAGGGCCAGAAAAATTCGCGAGAGCGTATGAAACGAATCGAAGAAGGTATC
AAAGAATTAGGAAGTCAGATTCTTAAAGAGCATCCTGTTGAAAATACTCAATTGCAAAATGAAAAGCTC
TATCTCTATTATCTCCAAAATGGAAGAGACATGTATGTGGACCAAGAATTAGATATTAATCGTTTAAGTG
ATTATGATGTCGATCACATTGTTCCACAAAGTTTCCTTAAAGACGATTCAATAGACAATAAGGTCTTAAC
GCGTTCTGATAAAAATCGTGGTAAATCGGATAACGTTCCAAGTGAAGAAGTAGTCAAAAAGATGAAAA
ACTATTGGAGACAACTTCTAAACGCCAAGTTAATCACTCAACGTAAGTTTGATAATTTAACGAAAGCTG
AACGTGGAGGTTTGAGTGAACTTGATAAAGCTGGTTTTATCAAACGCCAATTGGTTGAAACTCGCCAAA
TCACTAAGCATGTGGCACAAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAAACTTA
TTCGAGAGGTTAAAGTGATTACCTTAAAATCTAAATTAGTTTCTGACTTCCGAAAAGATTTCCAATTCTA
TAAAGTACGTGAGATTAACAATTACCATCATGCCCATGATGCGTATCTAAATGCCGTCGTTGGAACTGCT
TTGATTAAGAAATATCCAAAACTTGAATCGGAGTTTGTCTATGGTGATTATAAAGTTTATGATGTTCGTA
AAATGATTGCTAAGTCTGAGCAAGAAATAGGCAAAGCAACCGCAAAATATTTCTTTTACTCTAATATCA
TGAACTTCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACGCCCTCTAATCGAAACTA
ATGGGGAAACTGGAGAAATTGTCTGGGATAAAGGGCGAGATTTTGCCACAGTGCGCAAAGTATTGTCCA
TGCCCCAAGTCAATATTGTCAAGAAAACAGAAGTACAGACAGGCGGATTCTCCAAGGAGTCAATTTTAC
CAAAAAGAAATTCGGACAAGCTTATTGCTCGTAAAAAAGACTGGGATCCAAAAAAATATGGTGGTTTTG
ATAGTCCAACGGTAGCTTATTCAGTCCTAGTGGTTGCTAAGGTGGAAAAAGGGAAATCGAAGAAGTTAA
AATCCGTTAAAGAGTTACTAGGGATCACAATTATGGAAAGAAGTTCCTTTGAAAAAAATCCGATTGACT
TTTTAGAAGCTAAAGGATATAAGGAAGTTAAAAAAGACTTAATCATTAAACTACCTAAATATAGTCTTTT
TGAGTTAGAAAACGGTCGTAAACGGATGCTGGCTAGTGCCGGAGAATTACAAAAAGGAAATGAGCTGG
CTCTGCCAAGCAAATATGTGAATTTTTTATATTTAGCTAGTCATTATGAAAAGTTGAAGGGTAGTCCAGA
AGATAACGAACAAAAACAATTGTTTGTGGAGCAGCATAAGCATTATTTAGATGAGATTATTGAGCAAAT
CAGTGAATTTTCTAAGCGTGTTATTTTAGCAGATGCCAATTTAGATAAAGTTCTTAGTGCATATAACAAA
CATAGAGACAAACCAATACGTGAACAAGCAGAAAATATTATTCATTTATTTACGTTGACGAATCTTGGA
GCTCCCGCTGCTTTTAAATATTTTGATACAACAATTGATCGTAAACGATATACGTCTACAAAAGAAGTTT
TAGATGCCACTCTTATCCATCAATCCATCACTGGTCTTTATGAAACACGCATTGATTTGAGTCAGCTAGG
AGGTGACTGA
SEQ ID NO: 168
ATGGATAAGAAGTATTCAATTGGACTTGCGATTGGCACTAACAGTGTGGGCTGGGCGGTGATTACAGAC
GAGTATAAGGTGCCGTCAAAAAAGTTTAAAGTTCTGGGCAACACTGATCGCCATTCCATCAAGAAAAAC
CTAATCGGGGCCCTTCTTTTTGATAGTGGCGAAACGGCCGAGGCGACGCGTCTAAAACGTACCGCGCGG
CGTCGCTACACCCGACGAAAAAACCGTATTTGTTACCTTCAGGAGATCTTCAGTAACGAAATGGCTAAG
GTGGACGATTCATTCTTCCACCGTCTGGAGGAGTCCTTTTTAGTTGAAGAAGACAAGAAGCATGAGCGA
CACCCAATTTTTGGTAACATTGTCGACGAAGTCGCCTATCACGAAAAATATCCGACCATTTATCACCTGC
GCAAAAAACTGGTCGATAGCACGGATAAAGCGGATCTGCGGCTTATTTACCTGGCGCTTGCCCACATGA
TCAAGTTCCGCGGCCACTTCCTGATAGAAGGAGACCTGAACCCGGATAATAGCGATGTAGACAAACTGT
TTATTCAGCTGGTCCAGACCTACAACCAGCTGTTTGAAGAAAATCCGATTAATGCGTCAGGCGTGGATG
CGAAAGCGATACTGAGTGCCCGCCTGTCGAAATCTCGCCGTCTCGAAAATCTGATTGCACAGCTGCCCG
GCGAAAAAAAAAACGGTCTTTTTGGCAATCTGATCGCGCTGTCACTGGGCCTGACACCAAATTTTAAGA
GCAACTTCGACCTGGCAGAGGATGCGAAGCTTCAACTGTCGAAGGACACCTATGACGATGATCTGGATA
ATCTTCTGGCACAAATCGGTGATCAGTATGCGGATTTATTCCTTGCAGCGAAAAACCTATCTGACGCAAT
TCTGTTGAGCGATATCCTCCGCGTCAACACCGAAATCACTAAAGCCCCCCTGTCAGCGTCGATGATTAAA
CGTTATGATGAGCACCATCAGGATCTGACCTTGCTAAAGGCGCTGGTGCGACAGCAGCTTCCCGAAAAA
TATAAAGAGATCTTTTTTGATCAATCGAAGAATGGTTATGCCGGATACATTGATGGCGGAGCCAGTCAG
GAAGAATTTTACAAATTCATCAAACCGATCCTGGAAAAAATGGATGGCACAGAAGAACTGCTTGTGAAA
TTGAACCGGGAAGATTTACTGCGCAAACAGCGTACGTTCGACAACGGCTCCATACCCCATCAGATTCAC
TTAGGTGAGCTGCATGCAATACTCCGTCGCCAGGAAGATTTTTATCCATTTTTAAAAGACAACCGTGAGA
AGATTGAAAAAATTTTAACTTTTCGTATTCCATATTACGTCGGGCCTTTGGCCCGAGGTAACTCTCGATT
CGCCTGGATGACGAGAAAAAGCGAGGAGACCATCACTCCGTGGAATTTTGAAGAGGTTGTTGATAAAG
GCGCGAGCGCCCAGTCGTTTATCGAACGTATGACCAACTTTGATAAAAATCTGCCGAATGAAAAAGTGC
TTCCGAAGCATTCTCTGTTGTATGAATATTTCACTGTGTACAATGAGTTAACGAAAGTGAAATATGTGAC
CGAAGGCATGCGGAAACCTGCTTTTCTGTCCGGAGAACAGAAAAAAGCAATTGTGGACCTGCTGTTCAA
AACGAACCGGAAAGTAACTGTGAAGCAGCTGAAAGAGGACTACTTCAAAAAAATCGAATGCTTCGACT
CAGTAGAGATCTCTGGTGTTGAAGATCGCTTCAACGCGAGTCTGGGAACGTACCATGATTTGTTGAAAA
TCATCAAAGATAAAGACTTTCTGGATAACGAAGAGAATGAGGACATTCTTGAAGATATTGTTTTGACAC
TGACTCTGTTTGAGGATCGCGAAATGATTGAAGAGCGCCTGAAAACGTATGCCCATTTATTCGATGACA
AAGTCATGAAGCAGCTGAAACGTCGCCGCTATACTGGGTGGGGCAGACTTTCACGTAAATTGATCAATG
GTATAAGAGACAAACAGAGCGGCAAAACTATCTTAGATTTCCTGAAGAGTGATGGATTTGCCAACCGGA
ATTTTATGCAGCTTATACATGATGACTCGCTAACGTTTAAAGAAGACATTCAGAAGGCGCAGGTCAGCG
GCCAGGGTGATTCGCTGCATGAACACATTGCAAATCTTGCCGGATCGCCAGCGATCAAAAAAGGCATCC
TTCAGACAGTAAAAGTTGTGGATGAACTGGTGAAAGTAATGGGTCGTCACAAGCCAGAAAATATTGTGA
TCGAAATGGCCCGGGAAAATCAGACTACTCAAAAAGGTCAGAAAAATTCTCGCGAGCGTATGAAACGT
ATTGAAGAAGGCATCAAAGAGCTAGGCAGCCAGATATTAAAGGAACATCCGGTTGAGAACACTCAGCT
GCAGAATGAAAAACTGTATCTGTATTATCTTCAGAACGGCCGTGACATGTATGTTGATCAAGAACTGGA
TATCAATCGCTTGTCCGATTATGACGTGGATCATATTGTTCCGCAAAGCTTTCTGAAAGACGATTCTATT
GACAATAAAGTACTGACACGTTCGGACAAAAACCGTGGTAAAAGCGATAACGTACCGTCGGAAGAAGT
TGTTAAGAAAATGAAAAATTATTGGCGCCAACTCCTGAATGCTAAATTGATTACCCAGCGGAAATTTGA
TAACTTAACCAAAGCCGAGCGGGGTGGCTTAAGTGAACTGGATAAAGCGGGTTTTATTAAACGCCAACT
GGTAGAAACCCGCCAGATAACGAAACATGTAGCTCAAATCCTCGATAGTCGCATGAATACGAAATATGA
CGAAAATGATAAATTGATCCGTGAAGTAAAAGTGATTACTCTTAAAAGCAAATTGGTATCTGATTTTCG
GAAAGATTTCCAATTCTATAAGGTGAGAGAAATTAACAATTACCATCATGCACATGATGCGTATTTAAA
TGCAGTTGTTGGCACCGCCTTAATCAAAAAATATCCGAAATTAGAATCTGAGTTCGTGTATGGTGATTAT
AAAGTTTATGATGTTCGAAAAATGATTGCTAAGTCTGAACAGGAAATCGGCAAAGCGACCGCAAAGTAT
TTTTTTTATAGCAATATTATGAATTTTTTTAAAACTGAGATTACCCTGGCGAATGGCGAAATTCGCAAAC
GTCCTCTGATTGAAACCAATGGCGAAACCGGCGAGATAGTATGGGACAAGGGCCGTGATTTTGCGACCG
TCCGGAAAGTCCTGTCAATGCCGCAGGTGAATATTGTCAAGAAAACAGAAGTTCAGACAGGCGGTTTTA
GTAAAGAGTCTATTCTGCCCAAACGTAATTCGGATAAATTGATTGCCCGCAAGAAAGATTGGGATCCGA
AGAAATATGGTGGATTCGATTCTCCGACGGTCGCCTATAGCGTTCTAGTCGTCGCCAAGGTCGAAAAAG
GTAAATCCAAAAAACTGAAATCTGTGAAAGAACTGTTAGGCATTACAATCATGGAACGTAGTAGTTTTG
AAAAGAACCCGATCGACTTCCTCGAGGCGAAAGGCTACAAAGAAGTCAAGAAGGATTTGATTATTAAA
CTCCCAAAATATTCATTATTTGAGTTAGAAAACGGTAGGAAGCGTATGCTGGCGAGTGCTGGGGAATTA
CAGAAAGGGAATGAGTTAGCACTGCCGTCAAAATATGTGAACTTTCTGTATCTGGCCTCCCATTACGAG
AAACTGAAAGGTAGCCCGGAAGATAATGAACAGAAACAACTATTTGTCGAGCAACACAAACATTATCT
GGATGAAATTATTGAACAGATTAGTGAATTCTCTAAACGTGTTATTTTAGCGGATGCCAACCTTGACAAG
GTGCTGAGCGCATATAATAAACACCGTGATAAACCCATTCGTGAACAGGCTGAAAATATCATACATCTG
TTCACGTTAACCAACTTGGGAGCTCCTGCCGCTTTTAAATATTTCGATACCACAATTGACCGCAAACGTT
ATACGTCTACAAAAGAGGTGCTCGATGCGACCCTGATCCACCAGTCTATTACAGGCCTGTATGAAACTC
GTATCGACCTGTCACAACTGGGCGGCGACTGA
SEQ ID NO: 169
ATGGACAAGAAATATTCAATCGGTTTAGCAATAGGAACTAACTCAGTAGGTTGGGCTGTAATTACAGAC
GAATACAAGGTACCGTCCAAAAAGTTTAAGGTGTTGGGGAACACAGATAGACACTCTATAAAAAAAAA
TTTAATAGGCGCTTTACTTTTCGATTCAGGCGAAACTGCAGAAGCGACACGTCTGAAGAGAACCGCTAG
ACGTAGATACACGAGGAGAAAGAACAGAATATGTTACCTACAAGAAATTTTTTCTAATGAGATGGCTAA
GGTGGATGATTCGTTTTTTCATAGACTCGAAGAATCTTTCTTAGTTGAAGAAGATAAAAAACACGAAAG
GCATCCTATCTTTGGAAACATAGTTGATGAGGTGGCTTACCATGAAAAATATCCCACTATATATCACCTT
AGAAAAAAGTTGGTTGATTCAACCGACAAAGCGGATCTAAGGTTAATTTACCTCGCGTTGGCTCACATG
ATAAAATTTAGAGGACATTTCTTGATCGAAGGTGATTTAAATCCCGATAACTCTGATGTAGATAAACTGT
TCATCCAGTTGGTTCAAACATATAATCAGTTGTTCGAAGAGAACCCCATTAACGCATCAGGTGTTGATGC
TAAAGCAATCTTATCAGCAAGGTTGAGCAAGAGCAGACGTCTGGAAAACTTGATTGCCCAATTGCCAGG
TGAAAAGAAGAACGGTCTTTTTGGAAATTTAATTGCACTTTCACTTGGGTTGACACCGAATTTTAAAAGC
AATTTCGACCTCGCTGAGGATGCTAAACTCCAGTTATCTAAGGATACATATGACGATGATTTGGATAATC
TATTGGCCCAGATAGGTGATCAGTATGCAGATTTGTTTTTGGCAGCTAAGAATTTATCAGATGCAATTCT
ACTGAGCGATATTTTAAGGGTGAATACAGAAATAACTAAAGCACCTTTGTCTGCATCTATGATAAAAAG
ATACGATGAACACCATCAAGATCTCACACTATTAAAAGCTTTAGTTAGACAACAATTACCAGAAAAATA
TAAAGAAATCTTTTTCGATCAGTCCAAGAACGGATACGCCGGCTATATAGATGGCGGTGCCTCCCAAGA
AGAATTTTACAAATTTATCAAACCCATTTTGGAAAAGATGGATGGTACTGAAGAATTATTGGTCAAATTA
AACAGGGAAGATTTATTAAGAAAACAAAGGACCTTTGATAATGGTTCTATTCCACACCAAATCCATCTA
GGGGAATTACATGCGATTCTTAGAAGACAAGAAGATTTTTATCCATTCTTGAAAGATAACAGGGAAAAG
ATAGAGAAAATCTTAACTTTTAGAATTCCCTACTACGTCGGGCCCTTAGCTAGGGGGAATTCTAGATTCG
CCTGGATGACACGCAAATCAGAAGAAACAATTACGCCTTGGAATTTTGAAGAAGTTGTTGATAAAGGAG
CCTCTGCTCAATCTTTTATTGAACGAATGACCAATTTTGATAAGAATTTACCCAATGAAAAGGTCTTACC
CAAACATTCACTCCTATACGAGTACTTTACTGTTTACAATGAGTTGACAAAAGTGAAGTATGTTACCGAG
GGTATGCGAAAACCTGCTTTCTTGAGTGGTGAACAAAAGAAGGCCATTGTTGACTTGTTATTCAAAACTA
ACAGAAAGGTCACTGTGAAGCAGCTTAAAGAAGATTATTTCAAAAAGATCGAATGTTTCGACTCGGTAG
AAATTAGTGGTGTGGAAGATAGATTTAATGCTTCTCTTGGAACATATCATGATCTACTAAAGATCATCAA
AGATAAAGATTTCTTGGACAATGAAGAAAATGAAGATATTCTTGAAGACATCGTGTTGACACTTACATT
GTTTGAGGACAGAGAAATGATTGAAGAAAGGCTGAAGACCTACGCCCATTTGTTTGATGATAAAGTCAT
GAAACAGTTAAAGAGGAGAAGGTATACCGGATGGGGTAGGCTGTCTCGCAAATTGATTAATGGTATTCG
TGATAAACAATCGGGTAAAACAATCCTAGATTTCCTGAAGTCCGATGGTTTCGCCAACAGGAATTTTATG
CAATTGATTCATGACGATTCTTTGACTTTTAAAGAGGATATTCAGAAAGCACAGGTCTCAGGACAGGGC
GATTCACTCCATGAACATATAGCTAACCTGGCTGGCTCCCCTGCTATTAAGAAAGGTATCTTGCAAACCG
TCAAAGTAGTAGACGAACTTGTTAAAGTTATGGGAAGACACAAACCTGAAAATATCGTTATTGAAATGG
CTCGCGAAAACCAGACAACACAAAAGGGTCAAAAGAATTCGAGAGAGAGAATGAAGCGTATCGAAGA
AGGTATTAAAGAACTTGGGTCCCAAATACTTAAAGAACATCCAGTAGAAAACACTCAGCTTCAAAATGA
AAAATTATACTTATATTATCTTCAGAATGGCCGCGATATGTATGTTGACCAAGAGTTAGATATAAATAGG
TTGTCTGATTACGACGTGGATCATATTGTACCTCAATCTTTTCTAAAAGATGATTCAATTGATAATAAGG
TATTAACGAGAAGTGATAAAAATAGAGGTAAATCTGACAACGTGCCAAGCGAAGAGGTGGTGAAGAAA
ATGAAAAATTATTGGCGTCAACTGTTGAACGCCAAGTTAATTACGCAGAGAAAGTTTGATAATCTAACA
AAAGCTGAAAGAGGAGGCCTATCTGAGTTAGATAAGGCCGGTTTTATCAAACGTCAGTTAGTTGAAACC
AGGCAAATCACGAAGCACGTTGCCCAAATTCTAGATTCAAGGATGAATACCAAATACGATGAAAACGAT
AAACTGATTCGGGAAGTCAAGGTTATAACTCTAAAAAGCAAACTAGTTTCAGATTTTCGCAAAGATTTTC
AATTTTACAAAGTTCGAGAAATCAATAATTATCATCATGCTCACGACGCGTACTTGAACGCGGTCGTTGG
TACAGCTTTAATAAAGAAATATCCTAAACTGGAATCGGAATTTGTATATGGGGATTACAAAGTATACGA
CGTGAGAAAGATGATCGCTAAATCTGAACAAGAAATTGGGAAAGCAACTGCCAAATATTTTTTTTACAG
CAACATAATGAATTTTTTTTAAAACGGAAATTACATTGGCAAATGGCGAAATTAGAAAGCGCCCATTGAT
AGAGACCAATGGAGAGACTGGGGAAATCGTGTGGGATAAAGGACGTGATTTTGCCACAGTGAGGAAAG
TGTTAAGTATGCCACAAGTTAATATTGTAAAAAAGACCGAGGTCCAAACGGGTGGATTTAGCAAAGAAT
CAATTTTACCTAAGAGAAATTCAGATAAATTAATTGCCCGCAAAAAGGATTGGGATCCTAAAAAATATG
GTGGTTTTGATTCCCCAACAGTTGCTTACTCCGTCCTAGTTGTTGCTAAGGTTGAAAAAGGAAAGTCTAA
GAAACTTAAATCCGTAAAAGAGTTACTGGGAATTACAATAATGGAAAGATCCTCTTTCGAAAAGAACCC
TATTGACTTCTTGGAGGCGAAAGGTTATAAAGAAGTCAAAAAAGATTTGATCATAAAACTACCAAAGTA
TTCTCTATTTGAATTGGAAAACGGCAGAAAAAGGATGTTGGCAAGCGCTGGTGAACTACAAAAGGGTAA
CGAATTGGCATTGCCGAGTAAATACGTGAATTTTCTATATTTGGCATCACATTACGAAAAGTTAAAGGG
ATCACCCGAGGATAACGAGCAGAAACAACTGTTTGTTGAACAACACAAACATTATCTTGATGAAATTAT
AGAACAAATTAGTGAGTTCAGTAAGAGAGTTATTTTAGCCGATGCAAATTTAGACAAAGTTTTATCTGCT
TATAACAAACATAGAGATAAGCCTATAAGGGAACAAGCCGAAAATATTATTCATTTGTTTACGTTAACA
AATTTAGGGGCACCAGCAGCATTCAAGTACTTCGATACGACTATCGATCGTAAGCGTTACACATCTACC
AAAGAAGTTCTTGATGCAACTTTGATTCATCAATCTATAACAGGCTTATATGAAACTAGAATCGATCTGT
CACAACTTGGTGGTGACTAA
SEQ ID NO: 170
ATGGACAAGAAGTACTCAATTGGGCTTGCTATCGGCACTAACAGCGTTGGCTGGGCGGTCATCACAGAC
GAATATAAGGTCCCATCAAAGAAATTCAAAGTCCTTGGCAATACGGACCGACATTCAATCAAGAAGAAC
CTGATTGGAGCTCTGCTGTTTGATTCCGGTGAAACCGCCGAGGCAACACGATTGAAACGTACCGCTCGT
AGGAGGTATACGCGGCGGAAAAATAGGATCTGCTATCTGCAGGAAATATTTAGCAACGAAATGGCCAA
GGTAGACGACAGCTTCTTCCACCGGCTCGAGGAATCTTTCCTCGTGGAAGAAGACAAAAAGCACGAGCG
CCACCCCATTTTCGGCAATATCGTGGACGAGGTAGCTTACCATGAAAAGTATCCAACTATTTACCACTTA
CGTAAGAAGTTAGTGGACAGCACCGATAAAGCCGACCTTCGCCTGATTTACCTAGCACTTGCACACATG
ATTAAGTTCCGAGGCCACTTCTTGATAGAGGGAGACCTGAATCCTGACAATTCCGATGTGGATAAATTGT
TCATCCAGCTGGTACAGACATACAATCAGTTGTTTGAGGAAAATCCGATTAATGCCAGTGGCGTGGACG
CCAAGGCTATCCTGTCTGCTCGGCTTAGTAAGAGTAGACGCCTGGAAAATCTAATCGCACAGCTGCCCG
GCGAAAAGAAAAATGGACTGTTCGGTAATTTGATCGCCCTGAGCCTGGGCCTCACCCCTAACTTTAAGT
CTAACTTCGACCTGGCCGAAGATGCTAAGCTCCAGCTGTCCAAAGATACTTACGATGACGATCTCGATA
ATCTACTGGCTCAGATCGGGGACCAGTACGCTGACCTGTTTCTAGCTGCCAAGAACCTCAGTGACGCCAT
TCTCCTGTCCGATATTCTGAGGGTTAACACTGAAATTACAAAGGCCCCGCTGAGCGCGAGCATGATCAA
AAGGTACGACGAGCATCACCAGGACCTCACGCTGCTGAAGGCCTTAGTCAGACAGCAACTGCCCGAAA
AGTACAAAGAAATCTTTTTCGACCAATCCAAGAACGGGTACGCCGGCTACATTGATGGCGGGGCTTCAC
AAGAGGAGTTTTACAAGTTTATCAAGCCCATCCTGGAGAAAATGGACGGCACTGAAGAACTGCTTGTGA
AACTCAATAGGGAAGACTTACTGAGGAAACAGCGCACATTCGATAATGGCTCCATACCCCACCAAATCC
ATCTGGGAGAGTTGCATGCCATCTTGCGAAGGCAGGAGGACTTCTACCCCTTTCTTAAGGACAACAGGG
AGAAAATCGAGAAAATTCTGACTTTCCGTATCCCCTACTACGTGGGCCCACTTGCTCGCGGAAACTCACG
ATTCGCATGGATGACCAGAAAGTCCGAGGAAACAATTACACCCTGGAATTTTGAGGAGGTAGTAGACAA
GGGAGCCAGCGCTCAATCTTTCATTGAGAGGATGACGAATTTCGACAAGAACCTTCCAAACGAGAAAGT
GCTTCCTAAGCACAGCCTGCTGTATGAGTATTTCACGGTGTACAACGAACTTACGAAGGTCAAGTATGTG
ACAGAGGGTATGCGGAAACCTGCTTTTCTGTCTGGTGAACAGAAGAAAGCTATCGTCGATCTCCTGTTTA
AAACCAACCGAAAGGTGACGGTGAAACAGTTGAAGGAGGATTACTTCAAGAAGATCGAGTGTTTTGATT
CTGTTGAAATTTCTGGGGTCGAGGATAGATTCAACGCCAGCCTGGGCACCTACCATGATTTGCTGAAGAT
TATCAAGGATAAGGATTTTCTGGATAATGAGGAGAATGAAGACATTTTGGAGGATATAGTGCTGACCCT
CACCCTGTTCGAGGACCGGGAGATGATCGAGGAGAGACTGAAAACATACGCTCACCTGTTTGACGACAA
GGTCATGAAGCAGCTTAAGAGACGCCGTTACACAGGCTGGGGAAGATTATCCCGCAAATTAATCAACGG
GATACGCGATAAACAAAGTGGCAAGACCATACTCGACTTCCTAAAGAGCGATGGATTCGCAAATCGCAA
TTTCATGCAGTTGATCCACGACGATAGCCTGACCTTCAAAGAGGACATTCAGAAAGCGCAGGTGAGTGG
TCAAGGGGATTCCCTGCACGAACACATTGCTAACTTGGCTGGATCACCAGCCATTAAGAAAGGCATACT
GCAGACCGTTAAAGTGGTAGATGAGCTTGTGAAAGTCATGGGAAGACATAAGCCAGAGAACATAGTGA
TCGAAATGGCCAGGGAAAATCAGACCACGCAAAAGGGGCAGAAGAACTCAAGAGAGCGTATGAAGAG
GATCGAGGAGGGCATCAAGGAGCTGGGTAGCCAGATCCTTAAAGAGCACCCAGTTGAGAATACCCAGC
TGCAGAATGAGAAACTTTATCTCTATTATCTCCAGAACGGAAGGGATATGTATGTCGACCAGGAACTGG
ACATCAATCGGCTGAGTGATTATGACGTCGACCACATTGTGCCTCAAAGCTTTCTGAAGGATGATTCCAT
CGACAATAAAGTTCTGACCCGGTCTGATAAAAATAGAGGCAAATCCGACAACGTACCTAGCGAAGAAG
TCGTCAAAAAAATGAAGAACTATTGGAGGCAGTTGCTGAATGCCAAGCTGATTACACAACGCAAGTTTG
ACAATCTCACCAAGGCAGAAAGGGGGGGCCTGTCAGAACTCGACAAAGCAGGTTTCATTAAAAGGCAG
CTAGTTGAAACTAGGCAGATTACTAAGCACGTGGCCCAGATCCTCGACTCACGGATGAATACAAAGTAT
GATGAGAATGATAAGCTAATCCGGGAGGTGAAGGTGATTACTCTGAAATCTAAGCTGGTGTCAGATTTC
AGAAAAGACTTCCAGTTCTACAAAGTCAGAGAGATCAACAATTATCACCATGCCCACGATGCATATCTT
AATGCAGTAGTGGGGACAGCTCTGATCAAAAAATATCCTAAACTGGAGTCTGAATTCGTTTATGGTGAC
TATAAAGTCTATGACGTCAGAAAAATGATCGCAAAGAGCGAGCAGGAGATAGGGAAGGCCACAGCAAA
GTACTTCTTTTACAGTAATATCATGAACTTTTTCAAAACTGAGATTACATTGGCTAACGGCGAGATCCGC
AAGCGGCCACTGATAGAGACTAACGGAGAGACAGGGGAGATTGTTTGGGATAAGGGCCGTGACTTCGC
CACCGTTAGGAAAGTGCTGTCCATGCCCCAGGTGAACATTGTGAAGAAGACAGAAGTGCAGACGGGTG
GGTTCTCAAAAGAGTCTATTCTGCCTAAGCGGAATAGTGACAAACTGATCGCACGTAAAAAGGACTGGG
ATCCAAAAAAGTACGGCGGATTCGACAGTCCTACCGTTGCATATTCCGTGCTTGTGGTCGCTAAGGTGG
AGAAGGGAAAAAGCAAGAAACTGAAGTCAGTCAAAGAACTACTGGGCATAACGATCATGGAGCGCTCC
AGTTTCGAAAAAAACCCAATCGATTTTCTTGAAGCCAAGGGATACAAGGAGGTAAAGAAAGACCTTATC
ATTAAGCTGCCTAAGTACAGTCTGTTCGAACTGGAGAATGGGAGGAAGCGCATGCTGGCATCAGCTGGA
GAACTCCAAAAAGGGAACGAGTTGGCCCTCCCCTCAAAGTATGTCAATTTTCTCTACCTGGCTTCTCACT
ACGAGAAGTTAAAGGGGTCTCCAGAGGATAATGAGCAGAAACAGCTGTTTGTGGAACAGCACAAGCAC
TATTTGGACGAAATCATCGAACAAATTTCCGAGTTCAGTAAGAGGGTGATTCTGGCCGACGCAAACCTT
GACAAAGTTCTGTCCGCATACAATAAGCACAGAGACAAACCAATCCGCGAGCAAGCCGAGAATATAAT
TCACCTTTTCACTCTGACTAATCTGGGGGCCCCCGCAGCATTTAAATATTTCGATACAACAATCGACCGG
AAGCGGTATACATCTACTAAGGAAGTCCTCGATGCGACACTGATCCACCAGTCAATTACAGGTTTATAT
GAAACAAGAATCGACCTGTCCCAGCTGGGCGGCGACTAG
SEQ ID NO: 171
AAAATTCcatGCAAAATGCTCCGGTTTCATGTCATCAAAATGATGACGTAATTAAGCATTGATAATTGAGA
TCCCTCTCCCTGACAGGATGATTACATAAATAATAGTGACAAAAATAAATTATTTATTTATCCAGAAAAT
GAATTGGAAAATCAGGAGAGCGTTTTCAATCCTACCTCTGGCGCAGTTGATATGTcaaaCAGGTtgccgt
cactgcgtcttttactggctcttctcgctaaccaaaccggtaaccccgcttattaaaagcattctgtaac
aaagcgggaccaaagccatgacaaaaacgcgtaacaaaagtgtctataatcacggcagaaaagtccacat
tgattatttgcacggcgtcacactttgctatgccatagcatttttatccataagattagcggatcctacc
tgacgctttttatcgcaactctctactgtttctccatacccgtttttttgggctagcaccgcctatctcg
tgtgagataggcggagatacgaactttaagAAGGAGatatacCATGGAACAGGAATATT
ATCTGGGCTTGGACATGGGCACCGGTTCCGTCGGCTGGGCTGTTACTGACAGTGAATATCACGTTCTAAG
AAAGCATGGTAAGGCATTGTGGGGTGTAAGACTTTTCGAATCTGCTTCCACTGCTGAAGAGCGTAGAAT
GTTTAGAACGAGTCGACGTAGGCTAGACAGGCGCAATTGGAGAATCGAAATTTTACAAGAAATTTTTGC
GGAAGAGATATCTAAGAAAGACCCAGGCTTTTTCCTGAGAATGAAGGAATCTAAGTATTACCCTGAGGA
TAAAAGAGATATAAATGGTAACTGTCCCGAATTGCCTTACGCATTATTTGTGGACGATGATTTTACCGAT
AAGGATTACCATAAAAAGTTCCCAACTATCTACCATTTACGCAAAATGTTAATGAATACAGAGGAAACC
CCAGACATAAGACTAGTTTATCTGGCAATACACCATATGATGAAACATAGAGGCCATTTCTTACTTTCCG
GGGATATCAACGAAATCAAAGAGTTTGGTACCACATTTAGTAAGTTACTGGAAAACATAAAGAATGAAG
AATTGGATTGGAACTTAGAACTCGGAAAAGAAGAATACGCGGTTGTCGAATCTATCCTGAAGGATAATA
TGCTGAATAGGTCGACCAAAAAAACTAGGCTGATCAAAGCACTGAAAGCCAAATCTATCTGCGAAAAA
GCTGTTTTAAATTTACTTGCTGGTGGCACTGTTAAGTTATCAGACATTTTTGGTTTGGAAGAATTGAACG
AAACCGAGCGTCCAAAAATTAGTTTCGCTGATAATGGCTACGATGATTACATTGGTGAGGTGGAAAACG
AGTTGGGCGAACAATTTTATATTATAGAGACAGCTAAGGCAGTCTATGACTGGGCTGTTTTAGTAGAAA
TCCTTGGTAAATACACATCTATCTCCGAAGCGAAAGTTGCTACTTACGAAAAGCACAAGTCCGATCTCCA
GTTTTTGAAGAAAATTGTCAGGAAATATCTGACTAAGGAAGAATATAAAGATATTTTCGTTAGTACCTCT
GACAAACTGAAAAATTACTCCGCTTACATCGGGATGACCAAGATTAATGGCAAAAAAGTTGATCTGCAA
AGCAAAAGGTGTTCGAAGGAAGAATTTTATGATTTCATTAAAAAGAATGTCTTAAAAAAATTAGAAGGT
CAGCCAGAATACGAATATTTGAAAGAAGAACTGGAAAGAGAGACATTCTTACCAAAACAAGTCAACAG
AGATAATGGGGTAATTCCATATCAAATTCACCTCTACGAATTAAAAAAAATTTTAGGCAATTTACGCGAT
AAAATTGACCTTATCAAAGAAAATGAGGATAAGCTGGTTCAACTCTTTGAATTCAGAATACCCTATTATG
TGGGCCCACTGAACAAGATTGATGACGGCAAAGAAGGTAAATTCACATGGGCCGTCCGCAAATCCAATG
AAAAAATTTACCCATGGAACTTTGAAAATGTAGTAGATATTGAAGCGTCTGCGGAGAAATTTATTCGAA
GAATGACTAATAAATGCACTTACTTGATGGGAGAGGATGTTCTGCCTAAAGACAGCTTATTATACAGCA
AGTACATGGTTCTAAACGAACTTAACAACGTTAAGTTGGACGGTGAGAAATTAAGTGTAGAATTGAAAC
AAAGATTGTATACTGACGTCTTCTGCAAGTACAGAAAAGTGACAGTTAAAAAAATTAAGAATTACTTGA
AGTGCGAAGGTATAATTTCTGGAAACGTAGAGATTACTGGTATTGATGGTGATTTCAAAGCATCCCTAA
CAGCTTACCACGATTTCAAGGAAATCCTGACAGGAACTGAACTCGCAAAAAAAGATAAAGAAAACATT
ATTACTAATATTGTTCTTTTCGGTGATGACAAGAAATTGTTGAAGAAAAGACTGAATAGACTTTACCCCC
AGATTACTCCCAATCAACTTAAGAAAATTTGTGCTTTGTCTTACACAGGATGGGGTCGTTTTTCAAAAAA
GTTCTTAGAAGAGATTACCGCACCTGATCCAGAAACAGGCGAAGTATGGAATATAATTACCGCCTTATG
GGAATCGAACAATAATCTTATGCAACTTCTGAGCAATGAATATCGTTTCATGGAAGAAGTTGAGACTTA
CAACATGGGCAAACAGACGAAGACTTTATCCTATGAAACTGTGGAAAATATGTATGTATCACCTTCTGT
CAAGAGACAAATTTGGCAAACCTTAAAAATTGTCAAAGAATTAGAAAAGGTAATGAAGGAGTCTCCTA
AACGTGTGTTTATTGAAATGGCTAGAGAAAAACAAGAGTCAAAAAGAACCGAGTCAAGAAAGAAGCAG
TTAATCGATTTATATAAGGCTTGTAAAAACGAAGAGAAAGATTGGGTTAAAGAATTGGGGGACCAAGA
GGAACAAAAACTACGGTCGGATAAGTTGTATTTATACTATACGCAAAAGGGACGATGTATGTATTCCGG
CGAGGTAATAGAATTGAAGGATTTATGGGACAATACAAAATATGACATAGACCATATATATCCCCAATC
AAAAACGATGGACGATAGCTTGAACAATAGAGTACTCGTGAAAAAAAAATATAATGCGACCAAATCTG
ATAAGTATCCTCTGAATGAAAATATCAGACATGAAAGAAAGGGGTTCTGGAAGTCCTTGTTAGATGGTG
GGTTTATAAGCAAAGAAAAGTACGAGCGTCTAATAAGAAACACGGAGTTATCGCCAGAAGAACTCGCT
GGTTTTATTGAGAGGCAAATCGTGGAAACGAGACAATCTACCAAAGCCGTTGCTGAGATCCTAAAGCAA
GTTTTCCCAGAGTCGGAGATTGTCTATGTCAAAGCTGGCACAGTGAGCAGGTTTAGGAAAGACTTCGAA
CTATTAAAGGTAAGAGAAGTGAACGATTTACATCACGCAAAGGACGCTTACCTAAATATCGTTGTAGGT
AACTCATATTATGTTAAATTTACCAAGAACGCCTCTTGGTTTATAAAGGAGAACCCAGGTAGAACATAT
AACCTGAAAAAGATGTTCACCTCTGGTTGGAATATTGAGAGAAACGGAGAAGTCGCATGGGAAGTTGGT
AAGAAAGGGACTATAGTGACAGTAAAGCAAATTATGAACAAAAATAATATCCTCGTTACAAGGCAGGT
TCATGAAGCAAAGGGCGGCCTTTTTGACCAACAAATTATGAAGAAAGGGAAAGGTCAAATTGCAATAA
AAGAAACCGATGAGAGACTAGCGTCAATAGAAAAGTATGGTGGCTATAATAAAGCTGCGGGTGCATAC
TTTATGCTTGTTGAATCAAAAGACAAGAAAGGTAAGACTATTAGAACTATAGAATTTATACCCCTGTACC
TTAAAAACAAAATTGAATCGGATGAGTCAATCGCGTTAAATTTTCTAGAGAAAGGAAGGGGTTTAAAAG
AACCAAAGATCCTGTTAAAAAAGATTAAGATTGACACCTTGTTCGATGTAGATGGATTTAAAATGTGGT
TATCTGGCAGAACAGGCGATAGACTTTTGTTTAAGTGCGCTAATCAATTAATTTTGGATGAGAAAATCAT
TGTCACAATGAAAAAAATAGTTAAGTTTATTCAGAGAAGACAAGAAAACAGGGAGTTGAAATTATCTGA
TAAAGATGGTATCGACAATGAAGTTTTAATGGAAATCTACAATACATTCGTTGATAAACTTGAAAATAC
CGTATATCGAATCAGGTTAAGTGAACAAGCCAAAACATTAATTGATAAACAAAAAGAATTTGAAAGGCT
ATCACTGGAAGACAAATCCTCCACCCTATTTGAAATTTTGCATATATTCCAGTGCCAATCTTCAGCAGCT
AATTTAAAAATGATTGGCGGACCTGGGAAAGCCGGCATCCTAGTGATGAACAATAATATCTCCAAGTGT
AACAAAATATCAATTATTAACCAATCTCCGACAGGTATTTTTGAAAATGAAATAGACTTGCTTAAGATAT
AAGAAATCATCCTTAGCGAAAGCTAAGGATTTTTTTTATCTGAAATTTATTATATCGCGTTGATTATTGAT
GCTGTTTTTAGTTTTAACGGCAATTAATATATGTGTTATTAATTGAATGAATTTTATCATTCATAATAAGT
ATGTGTAGGATCAAGCTCAGGTTAAATATTCACTCAGGAAGTTATTACTCAGGAAGCAAAGAGGATTAC
AGAATTATCTCATAACAAGTGTTAAGGGATGTTATTTCC
SEQ ID NO: 172 AATTCAAAGGATAATCAAAC
SEQ ID NO: 173 AATCTCTACTCTTTGTAGAT
SEQ ID NO: 174 AATTTCTACTGTTGTAGAT
SEQ ID NO: 175 AATTTCTACTAGTGTAGAT
SEQ ID NO: 176 AATTTCTACTATTGT
SEQ ID NO: 177 AATTTCTACTGTTGTAGA
SEQ ID NO: 178 AATTTCTACTATTGTA
SEQ ID NO: 179 AATTTCTACTTTTGTAGAT
SEQ ID NO: 180 AATTTCTACTGTTGTAGAT
SEQ ID NO: 181 AATTTCTACTCTTGTAGAT

Claims (6)

What is claimed is:
1. A nucleic acid-guided nuclease system comprising:
(a) a nucleic acid having at least 95% identity to SEQ ID NO: 22, SEQ ID NO: 42, SEQ ID NO: 128, or SEQ ID NO: 148, wherein the nucleic acid encodes a nucleic acid-guided nuclease comprising the amino acid sequence of SEQ ID NO: 2;
(b) an engineered guide nucleic acid capable of complexing with the nucleic acid-guided nuclease, wherein the engineered guide nucleic acid comprises any one of SEQ ID NO: 88, SEQ ID NO: 93, and SEQ ID NO: 94; and
(c) an editing sequence having a change in sequence relative to the sequence of a target region in a genome of a cell and a mutation in a protospacer adjacent motif (PAM);
wherein the system results in a genome edit in the target region in the genome of the cell facilitated by the nuclease, the engineered guide nucleic acid, and the editing sequence.
2. The system of claim 1, wherein the engineered guide nucleic acid and the editing sequence are provided as a single nucleic acid.
3. The system of claim 1, wherein the nucleic acid encoding for the nucleic acid-guided nuclease is codon optimized for E. coli.
4. The system of claim 1, wherein the nucleic acid encoding for the nucleic acid-guided nuclease is codon optimized for S. cerevisiae.
5. The system of claim 1, wherein the nucleic acid encoding for the nucleic acid-guided nuclease is codon optimized for mammalian cells.
6. The system of claim 1, wherein the nucleic acid-guided nuclease has less than 40% protein identity to SEQ ID NO: 12 and SEQ ID NO: 108.
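Claims 1 and 6 are framed in terms of percent identity thresholds (at least 95% nucleic acid identity, less than 40% protein identity). As an illustration only, the following is a minimal sketch of one common way to compute percent identity from two sequences that have already been aligned. The function names, the choice of denominator (aligned columns, skipping columns where both sequences have a gap), and the use of `-` as the gap character are assumptions made for this example; they are not taken from the claims, and the alignment method that governs the claims is not shown here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, and the denominator is the number of
    alignment columns in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1  # a gap opposite a residue never matches

    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


def meets_minimum_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the identity is at or above the threshold (e.g. 95.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Illustrative inputs only: two 19-nt entries from the sequence listing
# (SEQ ID NO: 174 and SEQ ID NO: 175), which are not the sequences recited
# in claim 1. They differ at 2 of 19 positions.
seq_174 = "AATTTCTACTGTTGTAGAT"
seq_175 = "AATTTCTACTAGTGTAGAT"
identity = percent_identity(seq_174, seq_175)
print(f"{identity:.2f}% identity")
print(meets_minimum_identity(seq_174, seq_175, 95.0))
```

For real sequences an alignment step comes first, and the reported identity depends on the alignment parameters and on how gaps are counted, so any threshold comparison should state both.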
US15/631,989 2017-06-23 2017-06-23 Nucleic acid-guided nucleases Active US10011849B1 (en)

Priority Applications (26)

Application Number Priority Date Filing Date Title
US15/631,989 US10011849B1 (en) 2017-06-23 2017-06-23 Nucleic acid-guided nucleases
US15/896,433 US10435714B2 (en) 2017-06-23 2018-02-14 Nucleic acid-guided nucleases
ES18821213T ES2971549T3 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
CA3067951A CA3067951A1 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
HUE18821213A HUE066467T2 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
RU2022103603A RU2022103603A (en) 2017-06-23 2018-05-25 NUCLEIC ACID-DIRECTED NUCLEASE
PCT/US2018/034779 WO2018236548A1 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
EP18821213.8A EP3642334B1 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
NZ760730A NZ760730A (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
KR1020217035078A KR102558931B1 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
RU2020102451A RU2769475C2 (en) 2017-06-23 2018-05-25 Nucleic acid-directed nucleases
KR1020207002319A KR102321388B1 (en) 2017-06-23 2018-05-25 Nucleic Acid Guide Nuclease
JP2019571011A JP7136816B2 (en) 2017-06-23 2018-05-25 nucleic acid-guided nuclease
AU2018289077A AU2018289077B2 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
EP21167880.0A EP3916086A1 (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
CN201880054732.5A CN111511906A (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases
MX2019015047A MX2019015047A (en) 2017-06-23 2018-05-25 Nucleic acid-guided nucleases.
US16/548,631 US10626416B2 (en) 2017-06-23 2019-08-22 Nucleic acid-guided nucleases
IL271342A IL271342A (en) 2017-06-23 2019-12-11 Nucleic acid-guided nucleases
US16/819,896 US20200231987A1 (en) 2017-06-23 2020-03-16 Nucleic acid-guided nucleases
US17/179,193 US11130970B2 (en) 2017-06-23 2021-02-18 Nucleic acid-guided nucleases
US17/387,860 US11220697B2 (en) 2017-06-23 2021-07-28 Nucleic acid-guided nucleases
US17/554,736 US11306327B1 (en) 2017-06-23 2021-12-17 Nucleic acid-guided nucleases
US17/692,069 US20220195464A1 (en) 2017-06-23 2022-03-10 Nucleic acid-guided nucleases
AU2022202248A AU2022202248B2 (en) 2017-06-23 2022-04-04 Nucleic acid-guided nucleases
JP2022138875A JP2022169775A (en) 2017-06-23 2022-09-01 Nucleic acid-guided nucleases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/631,989 US10011849B1 (en) 2017-06-23 2017-06-23 Nucleic acid-guided nucleases

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/896,433 Continuation US10435714B2 (en) 2017-06-23 2018-02-14 Nucleic acid-guided nucleases

Publications (1)

Publication Number Publication Date
US10011849B1 (en) 2018-07-03

Family

ID=62684493

Family Applications (8)

Application Number Title Priority Date Filing Date
US15/631,989 Active US10011849B1 (en) 2017-06-23 2017-06-23 Nucleic acid-guided nucleases
US15/896,433 Active 2037-08-10 US10435714B2 (en) 2017-06-23 2018-02-14 Nucleic acid-guided nucleases
US16/548,631 Active US10626416B2 (en) 2017-06-23 2019-08-22 Nucleic acid-guided nucleases
US16/819,896 Abandoned US20200231987A1 (en) 2017-06-23 2020-03-16 Nucleic acid-guided nucleases
US17/179,193 Active US11130970B2 (en) 2017-06-23 2021-02-18 Nucleic acid-guided nucleases
US17/387,860 Active US11220697B2 (en) 2017-06-23 2021-07-28 Nucleic acid-guided nucleases
US17/554,736 Active US11306327B1 (en) 2017-06-23 2021-12-17 Nucleic acid-guided nucleases
US17/692,069 Pending US20220195464A1 (en) 2017-06-23 2022-03-10 Nucleic acid-guided nucleases

Country Status (1)

Country Link
US (8) US10011849B1 (en)

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10294473B2 (en) 2016-06-24 2019-05-21 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries
US10337028B2 (en) 2017-06-23 2019-07-02 Inscripta, Inc. Nucleic acid-guided nucleases
US10370700B2 (en) 2017-06-13 2019-08-06 Genetics Research, Llc Detection of targeted sequence regions
US10435714B2 (en) 2017-06-23 2019-10-08 Inscripta, Inc. Nucleic acid-guided nucleases
US10435715B2 (en) 2014-02-11 2019-10-08 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10527608B2 (en) 2017-06-13 2020-01-07 Genetics Research, Llc Methods for rare event detection
WO2020086475A1 (en) * 2018-10-22 2020-04-30 Inscripta, Inc. Engineered enzymes
WO2020097360A1 (en) * 2018-11-07 2020-05-14 The Regents Of The University Of Colorado, A Body Corporate Methods and compositions for genome-wide analysis and use of genome cutting and repair
US10704033B1 (en) * 2019-12-13 2020-07-07 Inscripta, Inc. Nucleic acid-guided nucleases
US10711374B1 (en) 2018-04-24 2020-07-14 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US10723995B1 (en) 2018-08-14 2020-07-28 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10737271B1 (en) 2018-04-13 2020-08-11 Inscripta, Inc. Automated cell processing instruments comprising reagent cartridges
US10815467B2 (en) 2019-03-25 2020-10-27 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US10837021B1 (en) 2019-06-06 2020-11-17 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US10883095B1 (en) 2019-12-10 2021-01-05 Inscripta, Inc. Mad nucleases
US10913941B2 (en) * 2019-02-14 2021-02-09 Metagenomi Ip Technologies, Llc Enzymes with RuvC domains
US10920189B2 (en) 2019-06-21 2021-02-16 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced lysine production in E. coli
US10927385B2 (en) 2019-06-25 2021-02-23 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
US10947599B2 (en) 2017-06-13 2021-03-16 Genetics Research, Llc Tumor mutation burden
US10995424B2 (en) 2018-04-24 2021-05-04 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US11001831B2 (en) 2019-03-25 2021-05-11 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11008557B1 (en) 2019-12-18 2021-05-18 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11066663B2 (en) * 2018-10-31 2021-07-20 Zymergen Inc. Multiplexed deterministic assembly of DNA libraries
US11142788B2 (en) 2017-06-13 2021-10-12 Genetics Research, Llc Isolation of target nucleic acids
WO2021207651A3 (en) * 2020-04-09 2021-11-18 Verve Therapeutics, Inc. Chemically modified guide rnas for genome editing with cas12b
US11203762B2 (en) 2019-11-19 2021-12-21 Inscripta, Inc. Methods for increasing observed editing in bacteria
CN113846075A (en) * 2021-11-29 2021-12-28 科稷达隆(北京)生物技术有限公司 MAD7-NLS fusion protein, nucleic acid construct for site-directed editing of plant genome and application thereof
US11214781B2 (en) 2018-10-22 2022-01-04 Inscripta, Inc. Engineered enzyme
CN114045303A (en) * 2018-11-07 2022-02-15 中国农业科学院植物保护研究所 Artificial gene editing system for rice
US11268061B2 (en) 2018-08-14 2022-03-08 Inscripta, Inc. Detection of nuclease edited sequences in automated modules and instruments
US11268088B2 (en) 2020-04-24 2022-03-08 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells via viral delivery
US11293021B1 (en) 2016-06-23 2022-04-05 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
US11299731B1 (en) 2020-09-15 2022-04-12 Inscripta, Inc. CRISPR editing to embed nucleic acid landing pads into genomes of live cells
US11306298B1 (en) 2021-01-04 2022-04-19 Inscripta, Inc. Mad nucleases
US20220136014A1 (en) * 2019-10-03 2022-05-05 Artisan Development Labs, Inc. Crispr systems with engineered dual guide nucleic acids
US11332742B1 (en) 2021-01-07 2022-05-17 Inscripta, Inc. Mad nucleases
US11512297B2 (en) 2020-11-09 2022-11-29 Inscripta, Inc. Affinity tag for recombination protein recruitment
US11555184B2 (en) 2018-04-24 2023-01-17 Inscripta, Inc. Methods for identifying selective binding pairs
WO2023028521A1 (en) 2021-08-24 2023-03-02 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced cellobiohydrolase i production in s. cerevisiae
US11597921B2 (en) 2017-06-30 2023-03-07 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
US11667932B2 (en) 2020-01-27 2023-06-06 Inscripta, Inc. Electroporation modules and instrumentation
US20230235362A1 (en) * 2021-02-25 2023-07-27 Artisan Development Labs, Inc. Compositions and methods for targeting, editing, or modifying genes
WO2023150637A1 (en) 2022-02-02 2023-08-10 Inscripta, Inc. Nucleic acid-guided nickase fusion proteins
WO2023164636A1 (en) 2022-02-25 2023-08-31 Vor Biopharma Inc. Compositions and methods for homology-directed repair gene modification
US11787841B2 (en) 2020-05-19 2023-10-17 Inscripta, Inc. Rationally-designed mutations to the thrA gene for enhanced lysine production in E. coli
US11884924B2 (en) 2021-02-16 2024-01-30 Inscripta, Inc. Dual strand nucleic acid-guided nickase editing
US11920140B2 (en) 2017-08-22 2024-03-05 Napigen, Inc. Organelle genome modification using polynucleotide guided endonuclease
US11946039B2 (en) 2020-03-31 2024-04-02 Metagenomi, Inc. Class II, type II CRISPR systems

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL294014B2 (en) 2015-10-23 2024-07-01 Harvard College Nucleobase editors and uses thereof
US11208649B2 (en) 2015-12-07 2021-12-28 Zymergen Inc. HTP genomic engineering platform
US9988624B2 (en) 2015-12-07 2018-06-05 Zymergen Inc. Microbial strain improvement by a HTP genomic engineering platform
CA3032699A1 (en) 2016-08-03 2018-02-08 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
AU2017308889B2 (en) 2016-08-09 2023-11-09 President And Fellows Of Harvard College Programmable Cas9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
EP3592853A1 (en) 2017-03-09 2020-01-15 President and Fellows of Harvard College Suppression of pain by gene editing
JP2020534795A (en) 2017-07-28 2020-12-03 プレジデント アンド フェローズ オブ ハーバード カレッジ Methods and Compositions for Evolving Base Editing Factors Using Phage-Supported Continuous Evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
DE112020001342T5 (en) 2019-03-19 2022-01-13 President and Fellows of Harvard College Methods and compositions for editing nucleotide sequences
WO2021025999A1 (en) * 2019-08-02 2021-02-11 Monsanto Technology Llc Methods and compositions to promote targeted genome modifications using huh endonucleases
DE112021002672T5 (en) 2020-05-08 2023-04-13 President And Fellows Of Harvard College METHODS AND COMPOSITIONS FOR EDIT BOTH STRANDS SIMULTANEOUSLY OF A DOUBLE STRANDED NUCLEOTIDE TARGET SEQUENCE
US20220017918A1 (en) * 2020-07-17 2022-01-20 Kraig Biocraft Laboratories, Inc. Synthesis of Non-Native Proteins in Bombyx Mori by Modifying Sericin Expression
JP2023543803A (en) * 2020-09-24 2023-10-18 ザ ブロード インスティテュート,インコーポレーテッド Prime Editing Guide RNA, its composition, and its uses
WO2023102481A1 (en) 2021-12-02 2023-06-08 Inscripta, Inc. Trackable nucleic acid-guided editing

Citations (108)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6322969B1 (en) 1998-05-27 2001-11-27 The Regents Of The University Of California Method for preparing permuted, chimeric nucleic acid libraries
US6562594B1 (en) 1999-09-29 2003-05-13 Diversa Corporation Saturation mutagenesis in directed evolution
WO2003106654A2 (en) 2002-06-14 2003-12-24 Diversa Corporation Xylanases, nucleic adics encoding them and methods for making and using them
WO2007144770A2 (en) 2006-06-16 2007-12-21 Danisco A/S Bacterium
US20080287317A1 (en) 2001-08-15 2008-11-20 Charles Boone Yeast arrays, methods of making such arrays, and methods of analyzing such arrays
US20090176653A1 (en) 2001-08-17 2009-07-09 Toolgen, Inc. Zinc finger domain libraries
US20100305001A1 (en) 2007-08-28 2010-12-02 The Johns Hopkins University Functional assay for indentification of loss-of-function mutations in genes
US8153432B2 (en) 2006-10-25 2012-04-10 President And Fellows Of Harvard College Multiplex automated genome engineering
WO2012142591A2 (en) 2011-04-14 2012-10-18 The Regents Of The University Of Colorado Compositions, methods and uses for multiplex protein sequence activity relationship mapping
WO2013176772A1 (en) 2012-05-25 2013-11-28 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
WO2013176915A1 (en) 2012-05-25 2013-11-28 Roman Galetto Methods for engineering allogeneic and immunosuppressive resistant t cell for immunotherapy
WO2014022702A2 (en) 2012-08-03 2014-02-06 The Regents Of The University Of California Methods and compositions for controlling gene expression by rna processing
US20140089681A1 (en) 2004-06-30 2014-03-27 Fujitsu Semiconductor Limited Secure processor and a program for a secure processor
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
WO2014065596A1 (en) 2012-10-23 2014-05-01 Toolgen Incorporated Composition for cleaving a target dna comprising a guide rna specific for the target dna and cas protein-encoding nucleic acid or cas protein, and use thereof
US20140121118A1 (en) 2010-11-23 2014-05-01 Opx Biotechnologies, Inc. Methods, systems and compositions regarding multiplex construction protein amino-acid substitutions and identification of sequence-activity relationships, to provide gene replacement such as with tagged mutant genes, such as via efficient homologous recombination
WO2014093595A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
WO2014099744A1 (en) 2012-12-17 2014-06-26 President And Fellows Of Harvard College Rna-guided human genome engineering
US20140199767A1 (en) 2005-08-26 2014-07-17 Dupont Nutrition Biosciences Aps Use
WO2014110006A1 (en) 2013-01-10 2014-07-17 Ge Healthcare Dharmacon, Inc. Templates, libraries, kits and methods for generating molecules
WO2014143381A1 (en) 2013-03-09 2014-09-18 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple crispr/cas selections of recombineering events
US20140273232A1 (en) 2012-12-12 2014-09-18 The Broad Institute, Inc. Engineering of systems, methods and optimized guide compositions for sequence manipulation
US20140273226A1 (en) 2013-03-15 2014-09-18 System Biosciences, Llc Crispr/cas systems for genomic modification and gene modulation
WO2014150624A1 (en) 2013-03-14 2014-09-25 Caribou Biosciences, Inc. Compositions and methods of nucleic acid-targeting nucleic acids
US20140295557A1 (en) 2013-03-15 2014-10-02 The General Hospital Corporation Using Truncated Guide RNAs (tru-gRNAs) to Increase Specificity for RNA-Guided Genome Editing
WO2014191128A1 (en) 2013-05-29 2014-12-04 Cellectis Methods for engineering t cells for immunotherapy by using rna-guided cas nuclease system
WO2015006290A1 (en) 2013-07-09 2015-01-15 President And Fellows Of Harvard College Multiplex rna-guided genome engineering
WO2015006747A2 (en) 2013-07-11 2015-01-15 Moderna Therapeutics, Inc. Compositions comprising synthetic polynucleotides encoding crispr related proteins and synthetic sgrnas and methods of use.
EP2828386A1 (en) 2012-03-20 2015-01-28 Vilnius University RNA-DIRECTED DNA CLEAVAGE BY THE Cas9-crRNA COMPLEX
WO2015013583A2 (en) 2013-07-26 2015-01-29 President And Fellows Of Harvard College Genome engineering
WO2015017866A1 (en) 2013-08-02 2015-02-05 Enevolv, Inc. Processes and host cells for genome, pathway, and biomolecular engineering
US20150064138A1 (en) 2013-09-05 2015-03-05 Massachusetts Institute Of Technology Tuning microbial populations with programmable nucleases
EP2848690A1 (en) 2012-12-12 2015-03-18 The Broad Institute, Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20150079680A1 (en) 2013-09-18 2015-03-19 Kymab Limited Methods, cells & organisms
WO2015048690A1 (en) 2013-09-27 2015-04-02 The Regents Of The University Of California Optimized small guide rnas and methods of use
WO2015048577A2 (en) 2013-09-27 2015-04-02 Editas Medicine, Inc. Crispr-related methods and compositions
US20150098954A1 (en) 2013-10-08 2015-04-09 Elwha Llc Compositions and Methods Related to CRISPR Targeting
US20150133315A1 (en) 2013-11-07 2015-05-14 Massachusetts Institute Of Technology Cell-based genomic recorded accumulative memory
WO2015068785A1 (en) 2013-11-06 2015-05-14 国立大学法人広島大学 Vector for nucleic acid insertion
WO2015069682A2 (en) 2013-11-05 2015-05-14 President And Fellows Of Harvard College Precise microbiota engineering at the cellular level
WO2015071474A2 (en) 2013-11-18 2015-05-21 Crispr Therapeutics Ag Crispr-cas system materials and methods
US20150159174A1 (en) 2013-12-11 2015-06-11 Regeneron Pharmaceutical, Inc. Methods and Compositions for the Targeted Modification of a Genome
WO2015089354A1 (en) 2013-12-12 2015-06-18 The Broad Institute Inc. Compositions and methods of use of crispr-cas systems in nucleotide repeat disorders
US20150176013A1 (en) 2013-04-04 2015-06-25 President And Fellows Of Harvard College THERAPEUTIC USES OF GENOME EDITING WITH CRISPR/Cas SYSTEMS
EP2898075A1 (en) 2012-12-12 2015-07-29 The Broad Institute, Inc. Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation
US20150225773A1 (en) 2014-02-13 2015-08-13 Clontech Laboratories, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
WO2015123339A1 (en) 2014-02-11 2015-08-20 The Regents Of The University Of Colorado, A Body Corporate Crispr enabled multiplexed genome engineering
WO2015153889A2 (en) 2014-04-02 2015-10-08 University Of Florida Research Foundation, Incorporated Materials and methods for the treatment of latent viral infection
WO2015159087A1 (en) 2014-04-17 2015-10-22 Green Biologics Limited Targeted mutations
WO2015159086A1 (en) 2014-04-17 2015-10-22 Green Biologics Limited Deletion mutations
WO2015179540A1 (en) 2014-05-20 2015-11-26 Regents Of The University Of Minnesota Method for editing a genetic sequence
US20150353917A1 (en) 2014-06-05 2015-12-10 Sangamo Biosciences, Inc. Methods and compositions for nuclease design
US20150353905A1 (en) 2013-01-16 2015-12-10 Emory University Cas9-nucleic acid complexes and uses related thereto
WO2015191693A2 (en) 2014-06-10 2015-12-17 Massachusetts Institute Of Technology Method for gene editing
WO2015195798A1 (en) 2014-06-17 2015-12-23 Poseida Therapeutics, Inc. A method for directing proteins to specific loci in the genome and uses thereof
WO2015198020A1 (en) 2014-06-26 2015-12-30 University Of Leicester Cloning
US20160053304A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Depleting Target Sequences Using CRISPR
US20160053272A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Modifying A Sequence Using CRISPR
US20160076093A1 (en) 2014-08-04 2016-03-17 University Of Washington Multiplex homology-directed repair
WO2016040594A1 (en) 2014-09-10 2016-03-17 The Regents Of The University Of California Reconstruction of ancestral cells by enzymatic recording
US20160102322A1 (en) 2014-10-09 2016-04-14 Life Technologies Corporation Crispr oligonucleotides and gene editing
EP3009511A2 (en) 2015-06-18 2016-04-20 The Broad Institute, Inc. Novel crispr enzymes and systems
WO2016070037A2 (en) 2014-10-31 2016-05-06 Massachusetts Institute Of Technology Massively parallel combinatorial genetics for crispr
WO2016099887A1 (en) 2014-12-17 2016-06-23 E. I. Du Pont De Nemours And Company Compositions and methods for efficient gene editing in e. coli using guide rna/cas endonuclease systems in combination with circular polynucleotide modification templates
WO2016100955A2 (en) 2014-12-20 2016-06-23 Identifygenomics, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
WO2016106239A1 (en) 2014-12-23 2016-06-30 The Regents Of The University Of California Methods and compositions for nucleic acid integration
US9458439B2 (en) 1999-02-03 2016-10-04 Children's Medical Center Corporation Chromosomal modification involving the induction of double-stranded DNA cleavage and homologous recombination at the cleavage site
US20160289675A1 (en) 2014-12-03 2016-10-06 Agilent Technologies, Inc. Guide RNA with chemical modifications
US20160298134A1 (en) 2012-12-06 2016-10-13 Sigma-Aldrich Co. Llc Crispr-based genome modification and regulation
US20160298097A1 (en) 2013-11-19 2016-10-13 President And Fellows Of Harvard College Mutant Cas9 Proteins
WO2016166340A1 (en) 2015-04-16 2016-10-20 Wageningen Universiteit Nuclease-mediated genome editing
US20160333389A1 (en) 2013-08-09 2016-11-17 President And Fellows Of Harvard College Nuclease profiling system
WO2016186953A1 (en) 2015-05-15 2016-11-24 Pioneer Hi Bred International Inc Guide rna/cas endonuclease systems
WO2016196805A1 (en) 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
WO2016205554A1 (en) 2015-06-17 2016-12-22 Poseida Therapeutics, Inc. Compositions and methods for directing proteins to specific loci in the genome
WO2016205749A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
WO2016205764A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
WO2016205613A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Crispr enzyme mutations reducing off-target effects
US20170002339A1 (en) 2014-01-24 2017-01-05 North Carolina State University Methods and Compositions for Sequences Guiding Cas9 Targeting
WO2017004261A1 (en) 2015-06-29 2017-01-05 Ionis Pharmaceuticals, Inc. Modified crispr rna and modified single crispr rna and uses thereof
WO2017015015A1 (en) 2015-07-17 2017-01-26 Emory University Crispr-associated protein from francisella and uses related thereto
WO2017019867A1 (en) 2015-07-28 2017-02-02 Danisco Us Inc Genome editing systems and methods of use
US20170051311A1 (en) 2014-05-02 2017-02-23 Tufts University Methods and apparatus for transformation of naturally competent cells
WO2017031483A1 (en) 2015-08-20 2017-02-23 Applied Stemcell, Inc. Nuclease with enhanced efficiency of genome editing
US20170058272A1 (en) 2015-08-31 2017-03-02 Caribou Biosciences, Inc. Directed nucleic acid repair
US20170080107A1 (en) 2015-09-21 2017-03-23 Arcturus Therapeutics, Inc. Allele selective gene editing and uses thereof
WO2017053713A1 (en) 2015-09-25 2017-03-30 Tarveda Therapeutics, Inc. Compositions and methods for genome editing
WO2017066588A2 (en) 2015-10-16 2017-04-20 Temple University - Of The Commonwealth System Of Higher Education Methods and compositions utilizing cpf1 for rna-guided gene editing
WO2017070605A1 (en) 2015-10-22 2017-04-27 The Broad Institute Inc. Type vi-b crispr enzymes and systems
WO2017068120A1 (en) 2015-10-22 2017-04-27 Institut National De La Sante Et De La Recherche Medicale (Inserm) Endonuclease-barcoding
US20170114334A1 (en) 2014-06-25 2017-04-27 Caribou Biosciences, Inc. RNA Modification to Engineer Cas9 Activity
US20170114369A1 (en) 2015-10-23 2017-04-27 Caribou Biosciences, Inc. Engineered Nucleic-Acid Targeting Nucleic Acids
US20170145425A1 (en) 2014-08-06 2017-05-25 Toolgen Incorporated Genome editing using campylobacter jejuni crispr/cas system-derived rgen
WO2017089767A1 (en) 2015-11-26 2017-06-01 Dnae Group Holdings Limited Single molecule controls
US20170159045A1 (en) 2015-12-07 2017-06-08 Zymergen, Inc. Microbial strain improvement by a htp genomic engineering platform
WO2017100343A1 (en) 2015-12-07 2017-06-15 Arc Bio, Llc Methods and compositions for the making and using of guide nucleic acids
WO2017099494A1 (en) 2015-12-08 2017-06-15 기초과학연구원 Genome editing composition comprising cpf1, and use thereof
WO2017100377A1 (en) 2015-12-07 2017-06-15 Zymergen, Inc. Microbial strain improvement by a htp genomic engineering platform
WO2017109167A2 (en) 2015-12-24 2017-06-29 B.R.A.I.N. Ag Reconstitution of dna-end repair pathway in prokaryotes
US20170191123A1 (en) 2014-05-28 2017-07-06 Toolgen Incorporated Method for Sensitive Detection of Target DNA Using Target-Specific Nuclease
US20170198302A1 (en) 2015-11-17 2017-07-13 The Chinese University Of Hong Kong Methods and systems for targeted gene manipulation
US20170204407A1 (en) 2014-07-14 2017-07-20 The Regents Of The University Of California Crispr/cas transcriptional modulation
WO2017127807A1 (en) 2016-01-22 2017-07-27 The Broad Institute Inc. Crystal structure of crispr cpf1
US20170218349A1 (en) 2016-02-02 2017-08-03 Sangamo Biosciences, Inc. Compositions for linking dna-binding domains and cleavage domains
US20170226533A1 (en) 2014-08-13 2017-08-10 E I Du Pont De Nemours And Company Genetic targeting in non-conventional yeast using an rna-guided endonuclease
US20170233752A1 (en) 2011-12-16 2017-08-17 Targetgene Biotechnologies Ltd. Compositions and Methods for Modifying a Predetermined Target Nucleic Acid Sequence
US20170233756A1 (en) * 2016-02-15 2017-08-17 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US20170369870A1 (en) * 2016-06-24 2017-12-28 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries

Family Cites Families (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US1028757A (en) 1910-07-28 1912-06-04 Charles Margerison Combined curtain-pole and shade-support.
US1035187A (en) 1910-08-04 1912-08-13 Crescent Machine Company Frame for wood-planing machines.
US1024016A (en) 1910-10-29 1912-04-23 Emma R Bowne Gas-burner.
US1029447A (en) 1910-11-22 1912-06-11 Burtren Alexander Holden Lifting-jack.
US1001776A (en) 1911-01-12 1911-08-29 Augustin Scohy Railroad switch and frog.
US1001184A (en) 1911-04-20 1911-08-22 Charles M Coover Non-slipping device.
US1036444A (en) 1911-08-07 1912-08-20 Albert Burger Binder-truck.
US1026684A (en) 1911-09-16 1912-05-21 Emil A Lauer Lamp.
US1033702A (en) 1911-11-13 1912-07-23 Frederick Johnson Bed-spring tightener.
US6391582B2 (en) 1998-08-14 2002-05-21 Rigel Pharmaceuticals, Inc. Shuttle vectors
SE9900530D0 (en) 1999-02-15 1999-02-15 Vincenzo Vassarotti A device for concentrating and / or purifying macromolecules in a solution and a method for manufacturing such a device
US6986993B1 (en) 1999-08-05 2006-01-17 Cellomics, Inc. System for cell-based screening
WO2002010183A1 (en) 2000-07-31 2002-02-07 Menzel, Rolf Compositions and methods for directed gene assembly
US20020139741A1 (en) 2001-03-27 2002-10-03 Henry Kopf Integral gasketed filtration cassette article and method of making the same
US7166443B2 (en) 2001-10-11 2007-01-23 Aviva Biosciences Corporation Methods, compositions, and automated systems for separating rare cells from fluid samples
EP1476547B1 (en) 2002-01-23 2006-12-06 The University of Utah Research Foundation Targeted chromosomal mutagenesis using zinc finger nucleases
CA2534874A1 (en) 2002-08-13 2004-02-19 National Jewish Medical And Research Center Method for identifying mhc-presented peptide epitopes for t cells
US20040138154A1 (en) 2003-01-13 2004-07-15 Lei Yu Solid surface for biomolecule delivery and high-throughput assay
EP2155868A2 (en) 2007-04-19 2010-02-24 Codon Devices, Inc Engineered nucleases and their uses for nucleic acid assembly
US9017966B2 (en) 2007-05-23 2015-04-28 Nature Technology Corporation E. coli plasmid DNA production
GB0724860D0 (en) 2007-12-20 2008-01-30 Heptares Therapeutics Ltd Screening
DK2279253T3 (en) 2008-04-09 2017-02-13 Maxcyte Inc Construction and application of therapeutic compositions of freshly isolated cells
US9845455B2 (en) 2008-05-15 2017-12-19 Ge Healthcare Bio-Sciences Ab Method for cell expansion
US20100076057A1 (en) 2008-09-23 2010-03-25 Northwestern University TARGET DNA INTERFERENCE WITH crRNA
JP5771147B2 (en) 2008-09-26 2015-08-26 トカジェン インコーポレーテッド Gene therapy vector and cytosine deaminase
EP2206723A1 (en) 2009-01-12 2010-07-14 Bonas, Ulla Modular DNA-binding domains
WO2010093966A2 (en) 2009-02-12 2010-08-19 Fred Hutchinson Cancer Research Center Generation of a dna nicking enzyme that stimulates site-specific gene conversion from a homing endonuclease
GB0922434D0 (en) 2009-12-22 2010-02-03 Ucb Pharma Sa antibodies and fragments thereof
PL2816112T3 (en) 2009-12-10 2019-03-29 Regents Of The University Of Minnesota Tal effector-mediated DNA modification
SG185481A1 (en) 2010-05-10 2012-12-28 Univ California Endoribonuclease compositions and methods of use thereof
EP2395087A1 (en) 2010-06-11 2011-12-14 Icon Genetics GmbH System and method of modular cloning
US9361427B2 (en) 2011-02-01 2016-06-07 The Regents Of The University Of California Scar-less multi-part DNA assembly design automation
US8332160B1 (en) 2011-11-17 2012-12-11 Amyris Biotechnologies, Inc. Systems and methods for engineering nucleic acid constructs using scoring techniques
WO2014004393A1 (en) 2012-06-25 2014-01-03 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
EP2877490B1 (en) 2012-06-27 2018-09-05 The Trustees of Princeton University Split inteins, conjugates and uses thereof
ES2757623T3 (en) 2012-07-25 2020-04-29 Broad Inst Inc Inducible DNA binding proteins and genomic disruption tools and applications thereof
BR112015013784A2 (en) 2012-12-12 2017-07-11 Massachusetts Inst Technology application, manipulation and optimization of systems, methods and compositions for sequence manipulation and therapeutic applications
EP2931899A1 (en) 2012-12-12 2015-10-21 The Broad Institute, Inc. Functional genomics using crispr-cas systems, compositions, methods, knock out libraries and applications thereof
US9499855B2 (en) 2013-03-14 2016-11-22 Elwha Llc Compositions, methods, and computer systems related to making and administering modified T cells
US9816088B2 (en) 2013-03-15 2017-11-14 Abvitro Llc Single cell bar-coding for antibody discovery
EP3013460A1 (en) 2013-06-25 2016-05-04 Tetra Laval Holdings & Finance SA Membrane filtration device having a hygienic suspension arrangement
US9322037B2 (en) 2013-09-06 2016-04-26 President And Fellows Of Harvard College Cas9-FokI fusion proteins and uses thereof
WO2015059690A1 (en) 2013-10-24 2015-04-30 Yeda Research And Development Co. Ltd. Polynucleotides encoding brex system polypeptides and methods of using same
US10627411B2 (en) 2014-03-27 2020-04-21 British Columbia Cancer Agency Branch T-cell epitope identification
JP2017509350A (en) 2014-04-03 2017-04-06 マサチューセッツ インスティテュート オブ テクノロジー Methods and compositions for the generation of guide RNA
EP3137601B1 (en) 2014-04-29 2020-04-08 Illumina, Inc. Multiplexed single cell gene expression analysis using template switch and tagmentation
BR112017001567A2 (en) 2014-07-25 2017-11-21 Novogy Inc promoters derived from yarrowia lipolytica and arxula adeninivorans, and methods of using them
BR112017007923B1 (en) 2014-10-17 2023-12-12 The Penn State Research Foundation METHOD FOR PRODUCING GENETIC MANIPULATION MEDIATED BY MULTIPLEX REACTIONS WITH RNA IN A RECEIVING CELL, CONSTRUCTION OF NUCLEIC ACID, EXPRESSION CASSETTE, VECTOR, RECEIVING CELL AND GENETICALLY MODIFIED CELL
WO2016110453A1 (en) 2015-01-06 2016-07-14 Dsm Ip Assets B.V. A crispr-cas system for a filamentous fungal host cell
AU2016226077B2 (en) 2015-03-03 2021-12-23 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases with altered PAM specificity
WO2016145416A2 (en) 2015-03-11 2016-09-15 The Broad Institute, Inc. Proteomic analysis with nucleic acid identifiers
WO2016146618A1 (en) 2015-03-16 2016-09-22 Max-Delbrück-Centrum für Molekulare Medizin Method of detecting new immunogenic t cell epitopes and isolating new antigen-specific t cell receptors by means of an mhc cell library
WO2016168275A1 (en) 2015-04-13 2016-10-20 Maxcyte, Inc. Methods and compositions for modifying genomic dna
KR20180021137A (en) 2015-06-25 2018-02-28 아이셀 진 테라퓨틱스 엘엘씨 Chimeric antigen receptor (CAR), compositions and methods for their use
US20180200342A1 (en) 2015-07-13 2018-07-19 Institut Pasteur Improving sequence-specific antimicrobials by blocking dna repair
US9926546B2 (en) 2015-08-28 2018-03-27 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
US9512446B1 (en) 2015-08-28 2016-12-06 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases
WO2017053902A1 (en) 2015-09-25 2017-03-30 Abvitro Llc High throughput process for t cell receptor target identification of natively-paired t cell receptor sequences
US11092607B2 (en) 2015-10-28 2021-08-17 The Broad Institute, Inc. Multiplex analysis of single cell constituents
WO2017075294A1 (en) 2015-10-28 2017-05-04 The Broad Institute Inc. Assays for massively combinatorial perturbation profiling and cellular circuit reconstruction
US11261435B2 (en) 2015-11-05 2022-03-01 Agency For Science, Technology And Research Chemical-inducible genome engineering technology
EP3374494A4 (en) 2015-11-11 2019-05-01 Coda Biotherapeutics, Inc. Crispr compositions and methods of using the same for gene therapy
US11085057B2 (en) 2015-12-02 2021-08-10 The Regents Of The University Of California Compositions and methods for modifying a target nucleic acid
WO2017106414A1 (en) 2015-12-18 2017-06-22 Danisco Us Inc. Methods and compositions for polymerase ii (pol-ii) based guide rna expression
EP3402890A1 (en) 2016-01-12 2018-11-21 SQZ Biotechnologies Company Intracellular delivery of complexes
EP3199632A1 (en) 2016-01-26 2017-08-02 ACIB GmbH Temperature-inducible crispr/cas system
SG11201808014SA (en) 2016-03-18 2018-10-30 Qt Holdings Corp Compositions, devices, and methods for cell separation
US11802281B2 (en) 2016-04-04 2023-10-31 Eth Zurich Mammalian cell line for protein production and library generation
EP3445856A1 (en) * 2016-04-19 2019-02-27 The Broad Institute Inc. Novel crispr enzymes and systems
US11499168B2 (en) 2016-04-25 2022-11-15 Universitat Basel Allele editing and applications thereof
EP3452499A2 (en) 2016-05-06 2019-03-13 Juno Therapeutics, Inc. Genetically engineered cells and methods of making the same
GB2552861B (en) 2016-06-02 2019-05-15 Sigma Aldrich Co Llc Using programmable DNA binding proteins to enhance targeted genome modification
WO2017212400A2 (en) 2016-06-06 2017-12-14 The University Of Chicago Proximity-dependent split rna polymerases as a versatile biosensor platform
JP2019522481A (en) 2016-06-22 2019-08-15 アイカーン スクール オブ メディシン アット マウント サイナイ Viral delivery of RNA using self-cleaving ribozymes and its CRISPR-based application
US20190264193A1 (en) 2016-08-12 2019-08-29 Caribou Biosciences, Inc. Protein engineering methods
US20200199599A1 (en) 2016-09-23 2020-06-25 Dsm Ip Assets B.V. A guide-rna expression system for a host cell
EP3526326A4 (en) 2016-10-12 2020-07-29 The Regents of The University of Colorado, A Body Corporate Novel engineered and chimeric nucleases
EP3535290B1 (en) 2016-11-07 2024-01-10 Genovie AB An engineered two-part cellular device for discovery and characterisation of t-cell receptor interaction with cognate antigen
AU2018221730B2 (en) 2017-02-15 2024-06-20 Novo Nordisk A/S Donor repair templates multiplex genome editing
US11739335B2 (en) 2017-03-24 2023-08-29 CureVac SE Nucleic acids encoding CRISPR-associated proteins and uses thereof
WO2018191715A2 (en) 2017-04-14 2018-10-18 Synthetic Genomics, Inc. Polypeptides with type v crispr activity and uses thereof
BR112019021719A2 (en) 2017-04-21 2020-06-16 The General Hospital Corporation CPF1 VARIANT (CAS12A) WITH CHANGED PAM SPECIFICITY
RU2769475C2 (en) 2017-06-23 2022-04-01 Инскрипта, Инк. Nucleic acid-directed nucleases
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) * 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
DK3645719T3 (en) 2017-06-30 2022-05-16 Inscripta Inc Automated cell processing methods, modules, instruments and systems
CA3075532A1 (en) 2017-09-15 2019-03-21 The Board Of Trustees Of The Leland Stanford Junior University Multiplex production and barcoding of genetically engineered cells
US20200263197A1 (en) 2017-10-12 2020-08-20 The Jackson Laboratory Transgenic selection methods and compositions
US20190225928A1 (en) 2018-01-22 2019-07-25 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems comprising filtration devices
WO2019200004A1 (en) 2018-04-13 2019-10-17 Inscripta, Inc. Automated cell processing instruments comprising reagent cartridges
US10508273B2 (en) 2018-04-24 2019-12-17 Inscripta, Inc. Methods for identifying selective binding pairs
US10227576B1 (en) 2018-06-13 2019-03-12 Caribou Biosciences, Inc. Engineered cascade components and cascade complexes
EP3813974A4 (en) 2018-06-30 2022-08-03 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
SG11202100320QA (en) 2018-07-26 2021-02-25 Ospedale Pediatrico Bambino Gesù Opbg Therapeutic preparations of gamma-delta t cells and natural killer cells and methods for manufacture and use
CN112955540A (en) 2018-08-30 2021-06-11 因思科瑞普特公司 Improved detection of nuclease edited sequences in automated modules and instruments
GB201816522D0 (en) 2018-10-10 2018-11-28 Autolus Ltd Methods and reagents for analysing nucleic acids from single cells
CA3139122C (en) 2019-06-06 2023-04-25 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US10927385B2 (en) 2019-06-25 2021-02-23 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
US10704033B1 (en) 2019-12-13 2020-07-07 Inscripta, Inc. Nucleic acid-guided nucleases

Patent Citations (152)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6322969B1 (en) 1998-05-27 2001-11-27 The Regents Of The University Of California Method for preparing permuted, chimeric nucleic acid libraries
US20170037434A1 (en) 1999-02-03 2017-02-09 Children's Medical Center Corporation Chromosomal modification involving the induction of double-stranded dna cleavage and homologous recombination at the cleavage site
US9458439B2 (en) 1999-02-03 2016-10-04 Children's Medical Center Corporation Chromosomal modification involving the induction of double-stranded DNA cleavage and homologous recombination at the cleavage site
US6562594B1 (en) 1999-09-29 2003-05-13 Diversa Corporation Saturation mutagenesis in directed evolution
US20080287317A1 (en) 2001-08-15 2008-11-20 Charles Boone Yeast arrays, methods of making such arrays, and methods of analyzing such arrays
US20090176653A1 (en) 2001-08-17 2009-07-09 Toolgen, Inc. Zinc finger domain libraries
WO2003106654A2 (en) 2002-06-14 2003-12-24 Diversa Corporation Xylanases, nucleic acids encoding them and methods for making and using them
US20140089681A1 (en) 2004-06-30 2014-03-27 Fujitsu Semiconductor Limited Secure processor and a program for a secure processor
US20140199767A1 (en) 2005-08-26 2014-07-17 Dupont Nutrition Biosciences Aps Use
WO2007144770A2 (en) 2006-06-16 2007-12-21 Danisco A/S Bacterium
US20100034924A1 (en) 2006-06-16 2010-02-11 Christophe Fremaux Bacterium
US20150201634A1 (en) 2006-06-16 2015-07-23 Dupont Nutrition Biosciences Aps Bacterium
US8569041B2 (en) 2006-10-25 2013-10-29 President And Fellows Of Harvard College Multiplex automated genome engineering
US8153432B2 (en) 2006-10-25 2012-04-10 President And Fellows Of Harvard College Multiplex automated genome engineering
US20100305001A1 (en) 2007-08-28 2010-12-02 The Johns Hopkins University Functional assay for identification of loss-of-function mutations in genes
US20140121118A1 (en) 2010-11-23 2014-05-01 Opx Biotechnologies, Inc. Methods, systems and compositions regarding multiplex construction protein amino-acid substitutions and identification of sequence-activity relationships, to provide gene replacement such as with tagged mutant genes, such as via efficient homologous recombination
US20150368639A1 (en) 2011-04-14 2015-12-24 Ryan T. Gill Compositions, methods and uses for multiplex protein sequence activity relationship mapping
WO2012142591A2 (en) 2011-04-14 2012-10-18 The Regents Of The University Of Colorado Compositions, methods and uses for multiplex protein sequence activity relationship mapping
US20170067046A1 (en) 2011-04-14 2017-03-09 The Regents Of The University Of Colorado, A Body Corporate Compositions, methods and uses for multiplex protein sequence activity relationship mapping
US20170233752A1 (en) 2011-12-16 2017-08-17 Targetgene Biotechnologies Ltd. Compositions and Methods for Modifying a Predetermined Target Nucleic Acid Sequence
EP2828386A1 (en) 2012-03-20 2015-01-28 Vilnius University RNA-DIRECTED DNA CLEAVAGE BY THE Cas9-crRNA COMPLEX
WO2013176772A1 (en) 2012-05-25 2013-11-28 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
WO2013176915A1 (en) 2012-05-25 2013-11-28 Roman Galetto Methods for engineering allogeneic and immunosuppressive resistant t cell for immunotherapy
US20140068797A1 (en) 2012-05-25 2014-03-06 University Of Vienna Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20170051310A1 (en) 2012-05-25 2017-02-23 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20160068864A1 (en) 2012-05-25 2016-03-10 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20160060654A1 (en) 2012-05-25 2016-03-03 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20160060653A1 (en) 2012-05-25 2016-03-03 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
WO2014022702A2 (en) 2012-08-03 2014-02-06 The Regents Of The University Of California Methods and compositions for controlling gene expression by rna processing
WO2014065596A1 (en) 2012-10-23 2014-05-01 Toolgen Incorporated Composition for cleaving a target dna comprising a guide rna specific for the target dna and cas protein-encoding nucleic acid or cas protein, and use thereof
US20160298135A1 (en) 2012-12-06 2016-10-13 Sigma-Aldrich Co. Llc Crispr-based genome modification and regulation
US20170073705A1 (en) 2012-12-06 2017-03-16 Sigma-Aldrich Co. Llc Crispr-based genome modification and regulation
US20160298134A1 (en) 2012-12-06 2016-10-13 Sigma-Aldrich Co. Llc Crispr-based genome modification and regulation
US20160298138A1 (en) 2012-12-06 2016-10-13 Sigma-Aldrich Co. Llc Crispr-based genome modification and regulation
EP2840140A1 (en) 2012-12-12 2015-02-25 The Broad Institute, Inc. Crispr-Cas component systems, methods and compositions for sequence manipulation
US20150247150A1 (en) 2012-12-12 2015-09-03 The Broad Institute Inc. Engineering of systems, methods and optimized guide compositions for sequence manipulation
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
WO2014093661A2 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Crispr-cas systems and methods for altering expression of gene products
EP2764103A2 (en) 2012-12-12 2014-08-13 The Broad Institute, Inc. Crispr-cas systems and methods for altering expression of gene products
EP3144390A1 (en) 2012-12-12 2017-03-22 The Broad Institute, Inc. Engineering of systems, methods and optimized guide compositions for sequence manipulation
US8906616B2 (en) 2012-12-12 2014-12-09 The Broad Institute Inc. Engineering of systems, methods and optimized guide compositions for sequence manipulation
WO2014093595A1 (en) 2012-12-12 2014-06-19 The Broad Institute, Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
EP2848690A1 (en) 2012-12-12 2015-03-18 The Broad Institute, Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20160115489A1 (en) 2012-12-12 2016-04-28 The Broad Institute Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20160115488A1 (en) 2012-12-12 2016-04-28 The Broad Institute Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20140273232A1 (en) 2012-12-12 2014-09-18 The Broad Institute, Inc. Engineering of systems, methods and optimized guide compositions for sequence manipulation
EP2898075A1 (en) 2012-12-12 2015-07-29 The Broad Institute, Inc. Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation
EP3064585A1 (en) 2012-12-12 2016-09-07 The Broad Institute, Inc. Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation
EP2825654A1 (en) 2012-12-12 2015-01-21 The Broad Institute, Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20150031134A1 (en) 2012-12-12 2015-01-29 The Broad Institute Inc. Crispr-cas component systems, methods and compositions for sequence manipulation
US20160160210A1 (en) 2012-12-17 2016-06-09 President And Fellows Of Harvard College RNA-Guided Human Genome Engineering
WO2014099744A1 (en) 2012-12-17 2014-06-26 President And Fellows Of Harvard College Rna-guided human genome engineering
US20170044569A9 (en) 2012-12-17 2017-02-16 President And Fellows Of Harvard College RNA-Guided Human Genome Engineering
WO2014110006A1 (en) 2013-01-10 2014-07-17 Ge Healthcare Dharmacon, Inc. Templates, libraries, kits and methods for generating molecules
US20150353905A1 (en) 2013-01-16 2015-12-10 Emory University Cas9-nucleic acid complexes and uses related thereto
WO2014143381A1 (en) 2013-03-09 2014-09-18 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple crispr/cas selections of recombineering events
US20160024529A1 (en) 2013-03-09 2016-01-28 Agilent Technologies, Inc. Methods of in vivo engineering of large sequences using multiple crispr/cas selections of recombineering events
WO2014150624A1 (en) 2013-03-14 2014-09-25 Caribou Biosciences, Inc. Compositions and methods of nucleic acid-targeting nucleic acids
US20170051276A1 (en) 2013-03-14 2017-02-23 Caribou Biosciences, Inc. Compositions And Methods Of Nucleic Acid-Targeting Nucleic Acids
US20140273226A1 (en) 2013-03-15 2014-09-18 System Biosciences, Llc Crispr/cas systems for genomic modification and gene modulation
US20160024523A1 (en) 2013-03-15 2016-01-28 The General Hospital Corporation Using Truncated Guide RNAs (tru-gRNAs) to Increase Specificity for RNA-Guided Genome Editing
US20140295557A1 (en) 2013-03-15 2014-10-02 The General Hospital Corporation Using Truncated Guide RNAs (tru-gRNAs) to Increase Specificity for RNA-Guided Genome Editing
US20150176013A1 (en) 2013-04-04 2015-06-25 President And Fellows Of Harvard College THERAPEUTIC USES OF GENOME EDITING WITH CRISPR/Cas SYSTEMS
WO2014191128A1 (en) 2013-05-29 2014-12-04 Cellectis Methods for engineering t cells for immunotherapy by using rna-guided cas nuclease system
WO2015006290A1 (en) 2013-07-09 2015-01-15 President And Fellows Of Harvard College Multiplex rna-guided genome engineering
US20160168592A1 (en) 2013-07-09 2016-06-16 President And Fellows Of Harvard College Multiplex RNA-Guided Genome Engineering
US20160367702A1 (en) 2013-07-11 2016-12-22 Moderna Therapeutics, Inc. COMPOSITIONS COMPRISING SYNTHETIC POLYNUCLEOTIDES ENCODING CRISPR RELATED PROTEINS AND SYNTHETIC SGRNAs AND METHODS OF USE
WO2015006747A2 (en) 2013-07-11 2015-01-15 Moderna Therapeutics, Inc. Compositions comprising synthetic polynucleotides encoding crispr related proteins and synthetic sgrnas and methods of use.
US20150031133A1 (en) 2013-07-26 2015-01-29 President And Fellows Of Harvard College Genome Engineering
WO2015013583A2 (en) 2013-07-26 2015-01-29 President And Fellows Of Harvard College Genome engineering
WO2015017866A1 (en) 2013-08-02 2015-02-05 Enevolv, Inc. Processes and host cells for genome, pathway, and biomolecular engineering
US20160186168A1 (en) 2013-08-02 2016-06-30 Enevolv, Inc. Processes and host cells for genome, pathway, and biomolecular engineering
US20160333389A1 (en) 2013-08-09 2016-11-17 President And Fellows Of Harvard College Nuclease profiling system
US20150064138A1 (en) 2013-09-05 2015-03-05 Massachusetts Institute Of Technology Tuning microbial populations with programmable nucleases
US20150079680A1 (en) 2013-09-18 2015-03-19 Kymab Limited Methods, cells & organisms
WO2015048690A1 (en) 2013-09-27 2015-04-02 The Regents Of The University Of California Optimized small guide rnas and methods of use
US20160289673A1 (en) 2013-09-27 2016-10-06 The Regents Of The University Of California Optimized small guide rnas and methods of use
WO2015048577A2 (en) 2013-09-27 2015-04-02 Editas Medicine, Inc. Crispr-related methods and compositions
US20150098954A1 (en) 2013-10-08 2015-04-09 Elwha Llc Compositions and Methods Related to CRISPR Targeting
WO2015069682A2 (en) 2013-11-05 2015-05-14 President And Fellows Of Harvard College Precise microbiota engineering at the cellular level
US20160264995A1 (en) 2013-11-06 2016-09-15 Hiroshima University Vector for Nucleic Acid Insertion
WO2015068785A1 (en) 2013-11-06 2015-05-14 国立大学法人広島大学 Vector for nucleic acid insertion
US20150133315A1 (en) 2013-11-07 2015-05-14 Massachusetts Institute Of Technology Cell-based genomic recorded accumulative memory
WO2015070062A1 (en) 2013-11-07 2015-05-14 Massachusetts Institute Of Technology Cell-based genomic recorded accumulative memory
WO2015071474A2 (en) 2013-11-18 2015-05-21 Crispr Therapeutics Ag Crispr-cas system materials and methods
US20160298096A1 (en) 2013-11-18 2016-10-13 Crispr Therapeutics Ag Crispr-cas system materials and methods
US20160298097A1 (en) 2013-11-19 2016-10-13 President And Fellows Of Harvard College Mutant Cas9 Proteins
US20150159174A1 (en) 2013-12-11 2015-06-11 Regeneron Pharmaceutical, Inc. Methods and Compositions for the Targeted Modification of a Genome
WO2015089354A1 (en) 2013-12-12 2015-06-18 The Broad Institute Inc. Compositions and methods of use of crispr-cas systems in nucleotide repeat disorders
US20170002339A1 (en) 2014-01-24 2017-01-05 North Carolina State University Methods and Compositions for Sequences Guiding Cas9 Targeting
WO2015123339A1 (en) 2014-02-11 2015-08-20 The Regents Of The University Of Colorado, A Body Corporate Crispr enabled multiplexed genome engineering
US20170240922A1 (en) 2014-02-11 2017-08-24 The Regents Of The University Of Colorado, A Body Corporate Crispr enabled multiplexed genome engineering
US20150225773A1 (en) 2014-02-13 2015-08-13 Clontech Laboratories, Inc. Methods of depleting a target molecule from an initial collection of nucleic acids, and compositions and kits for practicing the same
WO2015153889A2 (en) 2014-04-02 2015-10-08 University Of Florida Research Foundation, Incorporated Materials and methods for the treatment of latent viral infection
WO2015159087A1 (en) 2014-04-17 2015-10-22 Green Biologics Limited Targeted mutations
WO2015159086A1 (en) 2014-04-17 2015-10-22 Green Biologics Limited Deletion mutations
US20170051311A1 (en) 2014-05-02 2017-02-23 Tufts University Methods and apparatus for transformation of naturally competent cells
WO2015179540A1 (en) 2014-05-20 2015-11-26 Regents Of The University Of Minnesota Method for editing a genetic sequence
US20170175143A1 (en) 2014-05-20 2017-06-22 Regents Of The University Of Minnesota Method for editing a genetic sequence
US20170191123A1 (en) 2014-05-28 2017-07-06 Toolgen Incorporated Method for Sensitive Detection of Target DNA Using Target-Specific Nuclease
US20150353917A1 (en) 2014-06-05 2015-12-10 Sangamo Biosciences, Inc. Methods and compositions for nuclease design
WO2015191693A2 (en) 2014-06-10 2015-12-17 Massachusetts Institute Of Technology Method for gene editing
WO2015195798A1 (en) 2014-06-17 2015-12-23 Poseida Therapeutics, Inc. A method for directing proteins to specific loci in the genome and uses thereof
US20170114334A1 (en) 2014-06-25 2017-04-27 Caribou Biosciences, Inc. RNA Modification to Engineer Cas9 Activity
WO2015198020A1 (en) 2014-06-26 2015-12-30 University Of Leicester Cloning
US20170204407A1 (en) 2014-07-14 2017-07-20 The Regents Of The University Of California Crispr/cas transcriptional modulation
US20160053304A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Depleting Target Sequences Using CRISPR
US20160053272A1 (en) 2014-07-18 2016-02-25 Whitehead Institute For Biomedical Research Methods Of Modifying A Sequence Using CRISPR
US20160076093A1 (en) 2014-08-04 2016-03-17 University Of Washington Multiplex homology-directed repair
US20170145425A1 (en) 2014-08-06 2017-05-25 Toolgen Incorporated Genome editing using campylobacter jejuni crispr/cas system-derived rgen
US20170226533A1 (en) 2014-08-13 2017-08-10 E I Du Pont De Nemours And Company Genetic targeting in non-conventional yeast using an rna-guided endonuclease
WO2016040594A1 (en) 2014-09-10 2016-03-17 The Regents Of The University Of California Reconstruction of ancestral cells by enzymatic recording
US20160102322A1 (en) 2014-10-09 2016-04-14 Life Technologies Corporation Crispr oligonucleotides and gene editing
WO2016070037A2 (en) 2014-10-31 2016-05-06 Massachusetts Institute Of Technology Massively parallel combinatorial genetics for crispr
US20160289675A1 (en) 2014-12-03 2016-10-06 Agilent Technologies, Inc. Guide RNA with chemical modifications
WO2016099887A1 (en) 2014-12-17 2016-06-23 E. I. Du Pont De Nemours And Company Compositions and methods for efficient gene editing in e. coli using guide rna/cas endonuclease systems in combination with circular polynucleotide modification templates
WO2016100955A2 (en) 2014-12-20 2016-06-23 Identifygenomics, Llc Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using crispr/cas system proteins
WO2016106239A1 (en) 2014-12-23 2016-06-30 The Regents Of The University Of California Methods and compositions for nucleic acid integration
WO2016166340A1 (en) 2015-04-16 2016-10-20 Wageningen Universiteit Nuclease-mediated genome editing
WO2016186953A1 (en) 2015-05-15 2016-11-24 Pioneer Hi Bred International Inc Guide rna/cas endonuclease systems
WO2016186946A1 (en) 2015-05-15 2016-11-24 Pioneer Hi-Bred International, Inc. Rapid characterization of cas endonuclease systems, pam sequences and guide rna elements
WO2016196805A1 (en) 2015-06-05 2016-12-08 The Regents Of The University Of California Methods and compositions for generating crispr/cas guide rnas
WO2016205554A1 (en) 2015-06-17 2016-12-22 Poseida Therapeutics, Inc. Compositions and methods for directing proteins to specific loci in the genome
WO2016205764A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
WO2016205613A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Crispr enzyme mutations reducing off-target effects
WO2016205749A1 (en) 2015-06-18 2016-12-22 The Broad Institute Inc. Novel crispr enzymes and systems
US20160208243A1 (en) 2015-06-18 2016-07-21 The Broad Institute, Inc. Novel crispr enzymes and systems
EP3009511A2 (en) 2015-06-18 2016-04-20 The Broad Institute, Inc. Novel crispr enzymes and systems
WO2017004261A1 (en) 2015-06-29 2017-01-05 Ionis Pharmaceuticals, Inc. Modified crispr rna and modified single crispr rna and uses thereof
WO2017015015A1 (en) 2015-07-17 2017-01-26 Emory University Crispr-associated protein from francisella and uses related thereto
WO2017019867A1 (en) 2015-07-28 2017-02-02 Danisco Us Inc Genome editing systems and methods of use
WO2017031483A1 (en) 2015-08-20 2017-02-23 Applied Stemcell, Inc. Nuclease with enhanced efficiency of genome editing
US20170058272A1 (en) 2015-08-31 2017-03-02 Caribou Biosciences, Inc. Directed nucleic acid repair
US20170080107A1 (en) 2015-09-21 2017-03-23 Arcturus Therapeutics, Inc. Allele selective gene editing and uses thereof
WO2017053713A1 (en) 2015-09-25 2017-03-30 Tarveda Therapeutics, Inc. Compositions and methods for genome editing
WO2017066588A2 (en) 2015-10-16 2017-04-20 Temple University - Of The Commonwealth System Of Higher Education Methods and compositions utilizing cpf1 for rna-guided gene editing
WO2017068120A1 (en) 2015-10-22 2017-04-27 Institut National De La Sante Et De La Recherche Medicale (Inserm) Endonuclease-barcoding
WO2017070605A1 (en) 2015-10-22 2017-04-27 The Broad Institute Inc. Type vi-b crispr enzymes and systems
US20170211142A1 (en) 2015-10-22 2017-07-27 The Broad Institute, Inc. Novel crispr enzymes and systems
US20170114369A1 (en) 2015-10-23 2017-04-27 Caribou Biosciences, Inc. Engineered Nucleic-Acid Targeting Nucleic Acids
US20170198302A1 (en) 2015-11-17 2017-07-13 The Chinese University Of Hong Kong Methods and systems for targeted gene manipulation
WO2017089767A1 (en) 2015-11-26 2017-06-01 Dnae Group Holdings Limited Single molecule controls
WO2017100377A1 (en) 2015-12-07 2017-06-15 Zymergen, Inc. Microbial strain improvement by a htp genomic engineering platform
WO2017100343A1 (en) 2015-12-07 2017-06-15 Arc Bio, Llc Methods and compositions for the making and using of guide nucleic acids
US20170159045A1 (en) 2015-12-07 2017-06-08 Zymergen, Inc. Microbial strain improvement by a htp genomic engineering platform
WO2017099494A1 (en) 2015-12-08 2017-06-15 기초과학연구원 Genome editing composition comprising cpf1, and use thereof
WO2017109167A2 (en) 2015-12-24 2017-06-29 B.R.A.I.N. Ag Reconstitution of dna-end repair pathway in prokaryotes
WO2017127807A1 (en) 2016-01-22 2017-07-27 The Broad Institute Inc. Crystal structure of crispr cpf1
US20170218349A1 (en) 2016-02-02 2017-08-03 Sangamo Biosciences, Inc. Compositions for linking dna-binding domains and cleavage domains
US20170233756A1 (en) * 2016-02-15 2017-08-17 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US20170369870A1 (en) * 2016-06-24 2017-12-28 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries
WO2017223538A1 (en) 2016-06-24 2017-12-28 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries

Non-Patent Citations (132)

* Cited by examiner, † Cited by third party
Title
Dickinson, et al. Engineering the Caenorhabditis elegans Genome Using Cas9-Triggered Homologous Recombination. Nat Methods. Oct. 2013;10(10):1028-1034. doi: 10.1038/nmeth.2641.
Withers, et al. Identification of isopentenol biosynthetic genes from Bacillus subtilis by a screening method based on isoprenoid precursor toxicity. Appl Environ Microbiol. Oct. 2007;73(19):6277-83. Epub Aug. 10, 2007.
Abudayyeh, et al. C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector. Science. Jun. 2, 2016. DOI: 10.1126/science.aaf557.
Agresti, et al. Ultrahigh-throughput screening in drop-based microfluidics for directed evolution. Proc Natl Acad Sci U S A. Mar. 2, 2010;107(9):4004-9. doi: 10.1073/pnas.0910781107. Epub Feb. 8, 2010.
Alper, et al. Engineering yeast transcription machinery for improved ethanol tolerance and production. Science. Dec. 8, 2006;314(5805):1565-8.
Alper, et al. Global transcription machinery engineering: a new approach for improving cellular phenotype. Metab Eng. May 2007;9(3):258-67. Epub Jan. 8, 2007.
Baba, et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol. 2006;2:2006.0008. Epub Feb. 21, 2006.
Bakan, et al. ProDy: protein dynamics inferred from theory and experiments. Bioinformatics. Jun. 1, 2011;27(11):1575-7. doi: 10.1093/bioinformatics/btr168. Epub Apr. 5, 2011.
Basak, et al. Enhancing E. coli tolerance towards oxidative stress via engineering its global regulator cAMP receptor protein (CRP). PLoS One. 2012;7(12):e51179. doi: 10.1371/journal.pone.0051179. Epub Dec. 14, 2012.
Bateman, et al. The Pfam protein families database. Nucleic Acids Res. Jan. 1, 2004;32(Database issue):D138-41.
Bhabha, et al. Divergent evolution of protein conformational dynamics in dihydrofolate reductase. Nat Struct Mol Biol. Nov. 2013;20(11):1243-9. doi: 10.1038/nsmb.2676. Epub Sep. 29, 2013.
Bhaya, et al. CRISPR-Cas systems in bacteria and archaea: versatile small RNAs for adaptive defense and regulation. Annu Rev Genet. 2011;45:273-97. doi: 10.1146/annurev-genet-110410-132430.
Bikard, et al. CRISPR interference can prevent natural transformation and virulence acquisition during in vivo bacterial infection. Cell Host Microbe. Aug. 16, 2012;12(2):177-86. doi: 10.1016/j.chom.2012.06.003.
Boehr, et al. The dynamic energy landscape of dihydrofolate reductase catalysis. Science. Sep. 15, 2006;313(5793):1638-42.
Brouns, et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. Aug. 15, 2008;321(5891):960-4. doi: 10.1126/science.1159689.
Browning, et al. Modulation of CRP-dependent transcription at the Escherichia coli acsP2 promoter by nucleoprotein complexes: anti-activation by the nucleoid proteins FIS and IHF. Mol Microbiol. Jan. 2004;51(1):241-54.
Campbell, et al. Structural mechanism for rifampicin inhibition of bacterial RNA polymerase. Cell. Mar. 23, 2001;104(6):901-12.
Chang, et al. Structural systems biology evaluation of metabolic thermotolerance in Escherichia coli. Science. Jun. 7, 2013;340(6137):1220-3. doi: 10.1126/science.1234012.
Chen, et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell. Mar. 12, 2015;160(6):1246-60. doi: 10.1016/j.cell.2015.02.038. Epub Mar. 5, 2015.
Chiang, et al. Regulators of oxidative stress response genes in Escherichia coli and their functional conservation in bacteria. Arch Biochem Biophys. Sep. 15, 2012;525(2):161-9. doi: 10.1016/j.abb.2012.02.007. Epub Feb. 20, 2012.
Cong, et al. Multiplex genome engineering using CRISPR/Cas systems. Science. Feb. 15, 2013;339(6121):819-23. doi: 10.1126/science.1231143. Epub Jan. 3, 2013.
Co-pending U.S. Appl. No. 15/116,616, filed Aug. 4, 2016.
Co-pending U.S. Appl. No. 15/630,909, filed Jun. 22, 2017.
Co-pending U.S. Appl. No. 15/632,001, filed Jun. 23, 2017.
Co-pending U.S. Appl. No. 15/632,222, filed Jun. 23, 2017.
Costantino, et al. Enhanced levels of lambda Red-mediated recombinants in mismatch repair mutants. Proc Natl Acad Sci U S A. Dec. 23, 2003;100(26):15748-53. Epub Dec. 12, 2003.
Datta, et al. A set of recombineering plasmids for gram-negative bacteria. Gene. Sep. 1, 2006;379:109-15. Epub May 4, 2006.
Dicarlo, et al. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. Apr. 2013;41(7):4336-43. doi: 10.1093/nar/gkt135. Epub Mar. 4, 2013.
Dwyer, et al. Role of reactive oxygen species in antibiotic action and resistance. Curr Opin Microbiol. Oct. 2009;12(5):482-9. doi: 10.1016/j.mib.2009.06.018. Epub Jul. 31, 2009.
Ebright, et al. Consensus DNA site for the Escherichia coli catabolite gene activator protein (CAP): CAP exhibits a 450-fold higher affinity for the consensus DNA site than for the E. coli lac DNA site. Nucleic Acids Res. Dec. 25, 1989;17(24):10295-305.
Edgar. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. Oct. 1, 2010;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub Aug. 12, 2010.
Eklund, et al. Altered target site specificity variants of the I-PpoI His-Cys box homing endonuclease. Nucleic Acids Res. 2007;35(17):5839-50. Epub Aug. 24, 2007.
European Search Report dated Jun. 26, 2017 for EP Application No. 15749644.9.
Examination Report dated Jun. 27, 2017 for GB Application No. 1615434.6.
Farasat, et al. Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria. Mol Syst Biol. Jun. 21, 2014;10:731. doi: 10.15252/msb.20134955.
Findlay, et al. Saturation editing of genomic regions by multiplex homology-directed repair. Nature. Sep. 4, 2014;513(7516):120-3. doi: 10.1038/nature13695.
Firth, et al. GLUE-IT and PEDEL-AA: new programmes for analyzing protein diversity in randomized libraries. Nucleic Acids Res. Jul. 1, 2008;36(Web Server issue):W281-5. doi: 10.1093/nar/gkn226. Epub Apr. 28, 2008.
Fisher, et al. Enhancing tolerance to short-chain alcohols by engineering the Escherichia coli AcrB efflux pump to secrete the non-native substrate n-butanol. ACS Synth Biol. Jan. 17, 2014;3(1):30-40. doi: 10.1021/sb400065q. Epub Sep. 13, 2013.
Foo, et al. Directed evolution of an E. coli inner membrane transporter for improved efflux of biofuel molecules. Biotechnol Biofuels. May 21, 2013;6(1):81. doi: 10.1186/1754-6834-6-81.
Gao, et al. DNA-guided genome editing using the Natronobacterium gregoryi Argonaute. Nat Biotechnol. May 2, 2016. doi: 10.1038/nbt.3547.
Garst, et al., Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nature Biotechnology 35, 48-55 (2017) doi:10.1038/nbt.3718.
Garst, et al., Strategies for the multiplex mapping of genes to traits. Microbial Cell Factories 2013, 12:99.
Glebes, et al. Comparison of genome-wide selection strategies to identify furfural tolerance genes in Escherichia coli. Biotechnol Bioeng. Jan. 2015;112(1):129-40. doi: 10.1002/bit.25325. Epub Sep. 2, 2014.
Gutierrez-Rios, et al. Regulatory network of Escherichia coli: consistency between literature knowledge and microarray profiles. Genome Res. Nov. 2003;13(11):2435-43.
Hamady, et al. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods. Mar. 2008;5(3):235-7. doi: 10.1038/nmeth.1184. Epub Feb. 10, 2008.
HHMI. The Fields lab homepage. Cloning Vectors. Available at http://depts.washington.edu/sfields/protocols/pOAD.html. Accessed on Jan. 3, 2017.
Ho, et al. Efficient Reassignment of a Frequent Serine Codon in Wild-Type Escherichia coli. ACS Synth Biol. Feb. 19, 2016;5(2):163-71. doi: 10.1021/acssynbio.5b00197. Epub Nov. 20, 2015.
Hsu, et al., DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology. Jul. 21, 2013; 31(9): 827-834.
Hung, et al. Crystal structure of AcrB complexed with linezolid at 3.5 Å resolution. J Struct Funct Genomics. Jun. 2013;14(2):71-5. doi: 10.1007/s10969-013-9154-x. Epub May 15, 2013.
Hwang, et al. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nat Biotechnol. Mar. 2013;31(3):227-9. doi: 10.1038/nbt.2501. Epub Jan. 29, 2013.
Ibanez, et al. Mass spectrometry-based metabolomics of single yeast cells. Proc Natl Acad Sci U S A. May 28, 2013;110(22):8790-4. doi: 10.1073/pnas.1209302110. Epub May 13, 2013.
International search report and written opinion dated Jul. 28, 2015 for PCT/US2015/015476.
International search report and written opinion dated Nov. 5, 2012 for PCT/US2012/033799.
International Search Report dated Dec. 26, 2017 for International Patent Application No. PCT/US2017/056344.
International Search Report dated Nov. 29, 2017 for International Patent Application No. PCT/US2017/039146.
Isaacs, et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science. Jul. 15, 2011;333(6040):348-53. doi: 10.1126/science.1205822.
Iwakura, et al. Evolutional design of a hyperactive cysteine- and methionine-free mutant of Escherichia coli dihydrofolate reductase. J Biol Chem. May 12, 2006;281(19):13234-46. Epub Mar. 1, 2006.
Jiang, et al. Multigene editing in the Escherichia coli genome via the CRISPR-Cas9 system. Appl Environ Microbiol. Apr. 2015;81(7):2506-14. doi: 10.1128/AEM.04023-14. Epub Jan. 30, 2015.
Jiang, et al. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat Biotechnol. Mar. 2013;31(3):233-9. doi: 10.1038/nbt.2508. Epub Jan. 29, 2013.
Jinek, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science. Aug. 17, 2012;337(6096):816-21. doi: 10.1126/science.1225829. Epub Jun. 28, 2012.
Kersten, et al. A mass spectrometry-guided genome mining approach for natural product peptidogenomics. Nat Chem Biol. Oct. 9, 2011;7(11):794-802. doi: 10.1038/nchembio.684.
Kim, et al. A guide to genome engineering with programmable nucleases. Nat Rev Genet. May 2014;15(5):321-34. doi: 10.1038/nrg3686. Epub Apr. 2, 2014.
Kim, et al. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc Natl Acad Sci U S A. Feb. 6, 1996;93(3):1156-60.
Kleinstiver, et al., Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. Jul. 23, 2015; 523 (7561): 481-5. doi: 10.1038/nature14592.
Kohanski, et al. A common mechanism of cellular death induced by bactericidal antibiotics. Cell. Sep. 7, 2007;130(5):797-810.
Kosuri, et al. Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips. Nat Biotechnol. Dec. 2010;28(12):1295-9. doi: 10.1038/nbt.1716. Epub Nov. 28, 2010.
Kwon, et al. Crystal structure of the Escherichia coli Rob transcription factor in complex with DNA. Nat Struct Biol. May 2000;7(5):424-30.
Lajoie, et al. Genomically recoded organisms expand biological functions. Science. Oct. 18, 2013;342(6156):357-60. doi: 10.1126/science.1241459.
Li, et al. Identification of factors influencing strand bias in oligonucleotide-mediated recombination in Escherichia coli. Nucleic Acids Res. Nov. 15, 2003;31(22):6674-87.
Li, et al. Metabolic engineering of Escherichia coli using CRISPR-Cas9 meditated genome editing. Metab Eng. Sep. 2015;31:13-21. doi: 10.1016/j.ymben.2015.06.006. Epub Jun. 30, 2015.
Liu, et al. Efficient genome editing in filamentous fungus Trichoderma reesei using the CRISPR/Cas9 system. Cell Discovery. 2015; 1:15007. doi:10.1038/celldisc.2015.7.
Makarova et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol 13:722-736 (2015).
Makarova, et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol. Jun. 2011;9(6):467-77. doi: 10.1038/nrmicro2577. Epub May 9, 2011.
Mali, et al. RNA-guided human genome engineering via Cas9. Science. Feb. 15, 2013;339(6121):823-6. doi: 10.1126/science.1232033. Epub Jan. 3, 2013.
Maruyama, et al. Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat Biotechnol. May 2015;33(5):538-42. doi: 10.1038/nbt.3190. Epub Mar. 23, 2015.
Mills, et al. Cellulosic hydrolysate toxicity and tolerance mechanisms in Escherichia coli. Biotechnol Biofuels. Oct. 15, 2009;2:26. doi: 10.1186/1754-6834-2-26.
Molodtsov, et al. X-ray crystal structures of the Escherichia coli RNA polymerase in complex with benzoxazinorifamycins. J Med Chem. Jun. 13, 2013;56(11):4758-63. doi: 10.1021/jm4004889. Epub May 31, 2013.
Murakami, et al. Structural basis of transcription initiation: RNA polymerase holoenzyme at 4 Å resolution. Science. May 17, 2002;296(5571):1280-4.
Nakashima, et al. Structural basis for the inhibition of bacterial multidrug exporters. Nature. Aug. 1, 2013;500(7460):102-6. doi: 10.1038/nature12300. Epub Jun. 30, 2013.
Nakashima, et al. Structures of the multidrug exporter AcrB reveal a proximal multisite drug-binding pocket. Nature. Nov. 27, 2011;480(7378):565-9. doi: 10.1038/nature10641.
NCBI (https://www.ncbi.nlm.nih.gov/protein/WP_055225123?report=genbank&log$=protalign&blast_rank=1&RID=5FPJ4MN4014, Type V CRISPR-associated protein Cpf1 Eubacterium rectale, 2016).
NCBI. Basic Local Alignment Search Tool. Available at https://blast.ncbi.nlm.nih.gov/Blast.cgi. Accessed on Jan. 3, 2017.
Neylon, Cameron., Chemical and biochemical strategies for the randomization of protein encoding DNA sequences: library construction methods for directed evolution. Nucleic Acids Research, 2004, vol. 32, No. 4. 1448-1459.
Office Action dated Dec. 9, 2016 for U.S. Appl. No. 14/110,072.
Office Action dated Jan. 19, 2018 for U.S. Appl. No. 15/632,001.
Office Action dated Jun. 16, 2016 for U.S. Appl. No. 14/110,072.
Office Action dated Jun. 21, 2017 for U.S. Appl. No. 14/110,072.
Office Action dated Nov. 20, 2017 for U.S. Appl. No. 15/632,222.
Office Action dated Nov. 8, 2017 for U.S. Appl. No. 15/630,909.
Office Action dated Sep. 28, 2017 for U.S. Appl. No. 15/632,001.
Oh, et al. CRISPR-Cas9-assisted recombineering in Lactobacillus reuteri. Nucleic Acids Res. 2014;42(17):e131. doi: 10.1093/nar/gku623. Epub Jul. 29, 2014.
Pines, et al. Codon compression algorithms for saturation mutagenesis. ACS Synth Biol. May 15, 2015;4(5):604-14. doi: 10.1021/sb500282v. Epub Oct. 30, 2014.
Plagens, et al., DNA and RNA interference mechanisms by CRISPR-Cas surveillance complexes, FEMS Microbiology Reviews, May 1, 2015; 39(3): 442-463.
Prior, et al. Broad-host-range vectors for protein expression across gram negative hosts. Biotechnol Bioeng. Jun. 1, 2010;106(2):326-32. doi: 10.1002/bit.22695.
Qi, et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell. Feb. 28, 2013;152(5):1173-83. doi: 10.1016/j.cell.2013.02.022.
Raman, et al. Evolution-guided optimization of biosynthetic pathways. Proc Natl Acad Sci U S A. Dec. 16, 2014;111(50):17803-8. doi: 10.1073/pnas.1409523111. Epub Dec. 1, 2014.
Reynolds, et al. Quantifying Impact of Chromosome Copy Number on Recombination in Escherichia coli. ACS Synth Biol. Jul. 17, 2015;4(7):776-80. doi: 10.1021/sb500338g. Epub Mar. 19, 2015.
Rhee, et al. A novel DNA-binding motif in MarA: the first structure for an AraC family transcriptional activator. Proc Natl Acad Sci U S A. Sep. 1, 1998;95(18):10413-8.
Rice, et al. Crystal structure of an IHF-DNA complex: a protein-induced DNA U-turn. Cell. Dec. 27, 1996;87(7):1295-306.
Rodriguez-Verdugo, et al. Evolution of Escherichia coli rifampicin resistance in an antibiotic-free environment during thermal stress. BMC Evol Biol. Feb. 22, 2013;13:50. doi: 10.1186/1471-2148-13-50.
Ronda, et al. CRMAGE: CRISPR Optimized MAGE Recombineering. Sci Rep. Jan. 22, 2016;6:19452. doi: 10.1038/srep19452.
Ross, et al. A third recognition element in bacterial promoters: DNA binding by the alpha subunit of RNA polymerase. Science. Nov. 26, 1993;262(5138):1407-13.
Sandoval, et al. Strategy for directing combinatorial genome engineering in Escherichia coli. Proc Natl Acad Sci U S A. Jun. 26, 2012;109(26):10540-5. doi: 10.1073/pnas.1206299109. Epub Jun. 11, 2012.
Sawitzke, et al. Probing cellular processes with oligo-mediated recombination and using the knowledge gained to optimize recombineering. J Mol Biol. Mar. 18, 2011;407(1):45-59. doi: 10.1016/j.jmb.2011.01.030. Epub Jan. 19, 2011.
Semenova, et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A. Jun. 21, 2011;108(25):10098-103. doi: 10.1073/pnas.1104144108. Epub Jun. 6, 2011.
Shalem, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. Jan. 3, 2014;343(6166):84-7. doi: 10.1126/science.1247005. Epub Dec. 12, 2013.
Shendure. Life after genetics. Genome Med. Oct. 29, 2014;6(10):86. doi: 10.1186/s13073-014-0086-2. eCollection 2014.
Shmakov et al. Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems. Mol Cell 60(3):385-397 (2015).
Smanski, et al. Functional optimization of gene clusters by combinatorial design and assembly. Nat Biotechnol. Dec. 2014;32(12):1241-9. doi: 10.1038/nbt.3063. Epub Nov. 24, 2014.
Stearns, et al., Manipulating yeast genome using plasmid vectors. Methods in Enzymology. 1990, 185:280-297.
Steinmetz, et al. Maximizing the potential of functional genomics. Nat Rev Genet. Mar. 2004;5(3):190-201.
Stoebel, et al. Compensatory evolution of gene regulation in response to stress by Escherichia coli lacking RpoS. PLoS Genet. Oct. 2009;5(10):e1000671. doi: 10.1371/journal.pgen.1000671. Epub Oct. 2, 2009.
Swarts, et al. Argonaute of the archaeon Pyrococcus furiosus is a DNA-guided nuclease that targets cognate DNA. Nucleic Acids Res. May 26, 2015;43(10):5120-9. doi: 10.1093/nar/gkv415. Epub Apr. 29, 2015.
Swarts, et al. DNA-guided DNA interference by a prokaryotic Argonaute. Nature. Mar. 13, 2014;507(7491):258-61. doi: 10.1038/nature12971. Epub Feb. 16, 2014.
Tenaillon, et al. The molecular diversity of adaptive convergence. Science. Jan. 27, 2012;335(6067):457-61. doi: 10.1126/science.1212986.
Toprak, et al. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet. Dec. 18, 2011;44(1):101-5. doi: 10.1038/ng.1034.
Waaijers, et al. CRISPR/Cas9-targeted mutagenesis in Caenorhabditis elegans. Genetics. Nov. 2013;195(3):1187-91. doi: 10.1534/genetics.113.156299. Epub Aug. 26, 2013.
Wang, et al. Engineering furfural tolerance in Escherichia coli improves the fermentation of lignocellulosic sugars into renewable chemicals. Proc Natl Acad Sci U S A. Mar. 5, 2013;110(10):4021-6. doi: 10.1073/pnas.1217958110. Epub Feb. 19, 2013.
Wang, et al. Genome-scale promoter engineering by coselection MAGE. Nat Methods. Jun. 2012;9(6):591-3. doi: 10.1038/nmeth.1971. Epub Apr. 8, 2012.
Wang, et al. Multiplexed in vivo His-tagging of enzyme pathways for in vitro single-pot multienzyme catalysis. ACS Synth Biol. Feb. 17, 2012;1(2):43-52.
Wang, et al. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell. May 9, 2013;153(4):910-8. doi: 10.1016/j.cell.2013.04.025. Epub May 2, 2013.
Wang, et al. Programming cells by multiplex genome engineering and accelerated evolution. Nature. Aug. 13, 2009;460(7257):894-8. doi: 10.1038/nature08187. Epub Jul. 26, 2009.
Warner, et al. Rapid profiling of a microbial genome using mixtures of barcoded oligonucleotides. Nat Biotechnol. Aug. 2010;28(8):856-62. doi: 10.1038/nbt.1653. Epub Jul. 18, 2010.
Watson, et al. Directed evolution of trimethoprim resistance in Escherichia coli. FEBS J. May 2007;274(10):2661-71. Epub Apr. 19, 2007.
Wetmore, et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio. May 12, 2015;6(3):e00306-15. doi: 10.1128/mBio.00306-15.
White, et al. Role of the acrAB locus in organic solvent tolerance mediated by expression of marA, soxS, or robA in Escherichia coli. J Bacteriol. Oct. 1997;179(19):6122-6.
Wolfe. The acetate switch. Microbiol Mol Biol Rev. Mar. 2005;69(1):12-50.
Zeitoun, et al. Multiplexed tracking of combinatorial genomic mutations in engineered cell populations. Nat Biotechnol. Jun. 2015;33(6):631-7. doi: 10.1038/nbt.3177. Epub Mar. 23, 2015.
Zetsche, et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell. Oct. 22, 2015;163(3):759-71. doi: 10.1016/j.cell.2015.09.038. Epub Sep. 25, 2015.
Zhang, et al., Efficient editing of malaria parasite genome using the CRISPR/Cas9 system. mBio. Jul. 2014; vol. 5 Art. e01414-14.
Zhao, et al. Activity and specificity of the bacterial PD-(D/E)XK homing endonuclease I-Ssp6803I. J Mol Biol. Feb. 6, 2009;385(5):1498-510. doi: 10.1016/j.jmb.2008.10.096. Epub Nov. 12, 2008.
Zheng, et al. Metabolic engineering of Escherichia coli for high-specificity production of isoprenol and prenol as next generation of biofuels. Biotechnol Biofuels. Apr. 24, 2013;6:57. doi: 10.1186/1754-6834-6-57. eCollection 2013.

Cited By (114)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11639511B2 (en) 2014-02-11 2023-05-02 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10435715B2 (en) 2014-02-11 2019-10-08 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US10711284B2 (en) 2014-02-11 2020-07-14 The Regents Of The University Of Colorado CRISPR enabled multiplexed genome engineering
US10731180B2 (en) 2014-02-11 2020-08-04 The Regents Of The University Of Colorado CRISPR enabled multiplexed genome engineering
US11795479B2 (en) 2014-02-11 2023-10-24 The Regents Of The University Of Colorado CRISPR enabled multiplexed genome engineering
US11078498B2 (en) 2014-02-11 2021-08-03 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US11702677B2 (en) 2014-02-11 2023-07-18 The Regents Of The University Of Colorado CRISPR enabled multiplexed genome engineering
US11345933B2 (en) 2014-02-11 2022-05-31 The Regents Of The University Of Colorado CRISPR enabled multiplexed genome engineering
US10669559B2 (en) 2014-02-11 2020-06-02 The Regents Of The University Of Colorado, A Body Corporate CRISPR enabled multiplexed genome engineering
US11293021B1 (en) 2016-06-23 2022-04-05 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
US10294473B2 (en) 2016-06-24 2019-05-21 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries
US11584928B2 (en) 2016-06-24 2023-02-21 The Regents Of The University Of Colorado, A Body Corporate Methods for generating barcoded combinatorial libraries
US11142788B2 (en) 2017-06-13 2021-10-12 Genetics Research, Llc Isolation of target nucleic acids
US10527608B2 (en) 2017-06-13 2020-01-07 Genetics Research, Llc Methods for rare event detection
US10370700B2 (en) 2017-06-13 2019-08-06 Genetics Research, Llc Detection of targeted sequence regions
US10947599B2 (en) 2017-06-13 2021-03-16 Genetics Research, Llc Tumor mutation burden
US11421263B2 (en) 2017-06-13 2022-08-23 Genetics Research, Llc Detection of targeted sequence regions
US11697826B2 (en) 2017-06-23 2023-07-11 Inscripta, Inc. Nucleic acid-guided nucleases
US11220697B2 (en) 2017-06-23 2022-01-11 Inscripta, Inc. Nucleic acid-guided nucleases
US11306327B1 (en) 2017-06-23 2022-04-19 Inscripta, Inc. Nucleic acid-guided nucleases
US10337028B2 (en) 2017-06-23 2019-07-02 Inscripta, Inc. Nucleic acid-guided nucleases
US20200231987A1 (en) * 2017-06-23 2020-07-23 Inscripta, Inc. Nucleic acid-guided nucleases
US11130970B2 (en) 2017-06-23 2021-09-28 Inscripta, Inc. Nucleic acid-guided nucleases
US10435714B2 (en) 2017-06-23 2019-10-08 Inscripta, Inc. Nucleic acid-guided nucleases
US11408012B2 (en) 2017-06-23 2022-08-09 Inscripta, Inc. Nucleic acid-guided nucleases
US11597921B2 (en) 2017-06-30 2023-03-07 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
US11920140B2 (en) 2017-08-22 2024-03-05 Napigen, Inc. Organelle genome modification using polynucleotide guided endonuclease
US10737271B1 (en) 2018-04-13 2020-08-11 Inscripta, Inc. Automated cell processing instruments comprising reagent cartridges
US10995424B2 (en) 2018-04-24 2021-05-04 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US11555184B2 (en) 2018-04-24 2023-01-17 Inscripta, Inc. Methods for identifying selective binding pairs
US11085131B1 (en) 2018-04-24 2021-08-10 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US11396718B2 (en) 2018-04-24 2022-07-26 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US11236441B2 (en) 2018-04-24 2022-02-01 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US10774446B1 (en) 2018-04-24 2020-09-15 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US11332850B2 (en) 2018-04-24 2022-05-17 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US10711374B1 (en) 2018-04-24 2020-07-14 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US11542633B2 (en) 2018-04-24 2023-01-03 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
US11473214B2 (en) 2018-04-24 2022-10-18 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US11293117B2 (en) 2018-04-24 2022-04-05 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US11046928B2 (en) 2018-08-14 2021-06-29 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11268061B2 (en) 2018-08-14 2022-03-08 Inscripta, Inc. Detection of nuclease edited sequences in automated modules and instruments
US10723995B1 (en) 2018-08-14 2020-07-28 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10760043B2 (en) 2018-08-14 2020-09-01 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11739290B2 (en) 2018-08-14 2023-08-29 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10801008B1 (en) 2018-08-14 2020-10-13 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10844344B2 (en) 2018-08-14 2020-11-24 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11214781B2 (en) 2018-10-22 2022-01-04 Inscripta, Inc. Engineered enzyme
WO2020086475A1 (en) * 2018-10-22 2020-04-30 Inscripta, Inc. Engineered enzymes
US11345903B2 (en) 2018-10-22 2022-05-31 Inscripta, Inc. Engineered enzymes
AU2019368215B2 (en) * 2018-10-22 2023-05-18 Inscripta, Inc. Engineered enzymes
US10876102B2 (en) 2018-10-22 2020-12-29 Inscripta, Inc. Engineered enzymes
US10655114B1 (en) 2018-10-22 2020-05-19 Inscripta, Inc. Engineered enzymes
US11066663B2 (en) * 2018-10-31 2021-07-20 Zymergen Inc. Multiplexed deterministic assembly of DNA libraries
CN114045303A (en) * 2018-11-07 2022-02-15 中国农业科学院植物保护研究所 Artificial gene editing system for rice
WO2020097360A1 (en) * 2018-11-07 2020-05-14 The Regents Of The University Of Colorado, A Body Corporate Methods and compositions for genome-wide analysis and use of genome cutting and repair
CN114045303B (en) * 2018-11-07 2023-08-29 中国农业科学院植物保护研究所 Artificial gene editing system for rice
US12024727B2 (en) 2019-02-14 2024-07-02 Metagenomi, Inc. Enzymes with RuvC domains
US10913941B2 (en) * 2019-02-14 2021-02-09 Metagenomi Ip Technologies, Llc Enzymes with RuvC domains
US10982200B2 (en) * 2019-02-14 2021-04-20 Metagenomi Ip Technologies, Llc Enzymes with RuvC domains
US11136572B2 (en) 2019-03-25 2021-10-05 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11306299B2 (en) 2019-03-25 2022-04-19 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11149260B2 (en) 2019-03-25 2021-10-19 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11746347B2 (en) 2019-03-25 2023-09-05 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11274296B2 (en) 2019-03-25 2022-03-15 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11279919B2 (en) 2019-03-25 2022-03-22 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US10815467B2 (en) 2019-03-25 2020-10-27 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11001831B2 (en) 2019-03-25 2021-05-11 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11034945B2 (en) 2019-03-25 2021-06-15 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
US11634719B2 (en) 2019-06-06 2023-04-25 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US11053507B2 (en) 2019-06-06 2021-07-06 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US10837021B1 (en) 2019-06-06 2020-11-17 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US11254942B2 (en) 2019-06-06 2022-02-22 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
US10920189B2 (en) 2019-06-21 2021-02-16 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced lysine production in E. coli
US11078458B2 (en) 2019-06-21 2021-08-03 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced lysine production in E. coli
US10927385B2 (en) 2019-06-25 2021-02-23 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
US11066675B2 (en) 2019-06-25 2021-07-20 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
US20220136014A1 (en) * 2019-10-03 2022-05-05 Artisan Development Labs, Inc. Crispr systems with engineered dual guide nucleic acids
US20230235363A1 (en) * 2019-10-03 2023-07-27 Artisan Development Labs, Inc. Crispr systems with engineered dual guide nucleic acids
US11891609B2 (en) 2019-11-19 2024-02-06 Inscripta, Inc. Methods for increasing observed editing in bacteria
US11203762B2 (en) 2019-11-19 2021-12-21 Inscripta, Inc. Methods for increasing observed editing in bacteria
US11319542B2 (en) 2019-11-19 2022-05-03 Inscripta, Inc. Methods for increasing observed editing in bacteria
US11193115B2 (en) 2019-12-10 2021-12-07 Inscripta, Inc. Mad nucleases
US11174471B2 (en) 2019-12-10 2021-11-16 Inscripta, Inc. Mad nucleases
US11053485B2 (en) 2019-12-10 2021-07-06 Inscripta, Inc. MAD nucleases
US11085030B2 (en) 2019-12-10 2021-08-10 Inscripta, Inc. MAD nucleases
US10883095B1 (en) 2019-12-10 2021-01-05 Inscripta, Inc. Mad nucleases
US10724021B1 (en) 2019-12-13 2020-07-28 Inscripta, Inc. Nucleic acid-guided nucleases
US10745678B1 (en) 2019-12-13 2020-08-18 Inscripta, Inc. Nucleic acid-guided nucleases
US10704033B1 (en) * 2019-12-13 2020-07-07 Inscripta, Inc. Nucleic acid-guided nucleases
US11008557B1 (en) 2019-12-18 2021-05-18 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11359187B1 (en) 2019-12-18 2022-06-14 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11104890B1 (en) 2019-12-18 2021-08-31 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11286471B1 (en) 2019-12-18 2022-03-29 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11198857B2 (en) 2019-12-18 2021-12-14 Inscripta, Inc. Cascade/dCas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US11667932B2 (en) 2020-01-27 2023-06-06 Inscripta, Inc. Electroporation modules and instrumentation
US11946039B2 (en) 2020-03-31 2024-04-02 Metagenomi, Inc. Class II, type II CRISPR systems
WO2021207651A3 (en) * 2020-04-09 2021-11-18 Verve Therapeutics, Inc. Chemically modified guide rnas for genome editing with cas12b
US11407994B2 (en) 2020-04-24 2022-08-09 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells via viral delivery
US11591592B2 (en) 2020-04-24 2023-02-28 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells using microcarriers
US11845932B2 (en) 2020-04-24 2023-12-19 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells via viral delivery
US11268088B2 (en) 2020-04-24 2022-03-08 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells via viral delivery
US11787841B2 (en) 2020-05-19 2023-10-17 Inscripta, Inc. Rationally-designed mutations to the thrA gene for enhanced lysine production in E. coli
US11299731B1 (en) 2020-09-15 2022-04-12 Inscripta, Inc. CRISPR editing to embed nucleic acid landing pads into genomes of live cells
US11597923B2 (en) 2020-09-15 2023-03-07 Inscripta, Inc. CRISPR editing to embed nucleic acid landing pads into genomes of live cells
US11512297B2 (en) 2020-11-09 2022-11-29 Inscripta, Inc. Affinity tag for recombination protein recruitment
US11306298B1 (en) 2021-01-04 2022-04-19 Inscripta, Inc. Mad nucleases
US11965186B2 (en) 2021-01-04 2024-04-23 Inscripta, Inc. Nucleic acid-guided nickases
US11332742B1 (en) 2021-01-07 2022-05-17 Inscripta, Inc. Mad nucleases
US11884924B2 (en) 2021-02-16 2024-01-30 Inscripta, Inc. Dual strand nucleic acid-guided nickase editing
US20230235362A1 (en) * 2021-02-25 2023-07-27 Artisan Development Labs, Inc. Compositions and methods for targeting, editing, or modifying genes
WO2023028521A1 (en) 2021-08-24 2023-03-02 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced cellobiohydrolase i production in s. cerevisiae
CN113846075A (en) * 2021-11-29 2021-12-28 科稷达隆(北京)生物技术有限公司 MAD7-NLS fusion protein, nucleic acid construct for site-directed editing of plant genome and application thereof
WO2023150637A1 (en) 2022-02-02 2023-08-10 Inscripta, Inc. Nucleic acid-guided nickase fusion proteins
WO2023164636A1 (en) 2022-02-25 2023-08-31 Vor Biopharma Inc. Compositions and methods for homology-directed repair gene modification

Also Published As

Publication number Publication date
US20180371497A1 (en) 2018-12-27
US20190390226A1 (en) 2019-12-26
US20210388391A1 (en) 2021-12-16
US11220697B2 (en) 2022-01-11
US20220195464A1 (en) 2022-06-23
US11306327B1 (en) 2022-04-19
US10435714B2 (en) 2019-10-08
US10626416B2 (en) 2020-04-21
US11130970B2 (en) 2021-09-28
US20210180090A1 (en) 2021-06-17
US20200231987A1 (en) 2020-07-23

Similar Documents

Publication Title
US11306327B1 (en) Nucleic acid-guided nucleases
US11408012B2 (en) Nucleic acid-guided nucleases
AU2018289077B2 (en) Nucleic acid-guided nucleases
US20190359976A1 (en) Novel engineered and chimeric nucleases
JP7083364B2 (en) Optimized CRISPR-Cas dual nickase system, method and composition for sequence manipulation
JP6625971B2 (en) Delivery, engineering and optimization of tandem guide systems, methods and compositions for array manipulation
DK2931898T3 (en) CONSTRUCTION AND OPTIMIZATION OF SYSTEMS, PROCEDURES AND COMPOSITIONS FOR SEQUENCE MANIPULATION WITH FUNCTIONAL DOMAINS
US11976308B2 (en) CRISPR DNA targeting enzymes and systems
US20190292568A1 (en) Genomic editing in automated systems

Legal Events

Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4