Genome

Source: Wikipedia, the free encyclopedia.

An image of the 46 chromosomes making up the diploid genome of a human male. (The mitochondrial chromosome is not shown.)

In the fields of molecular biology and genetics, a genome is all the genetic information of an organism.[1] It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding genes, other functional regions of the genome such as regulatory sequences (see non-coding DNA), and often a substantial fraction of 'junk' DNA with no evident function.[2][3] Almost all eukaryotes have mitochondria and a small mitochondrial genome.[2] Algae and plants also contain chloroplasts with a chloroplast genome.

The study of the genome is called genomics. The genomes of many organisms have been sequenced and various regions have been annotated. The International Human Genome Project reported the sequence of the genome for Homo sapiens in 2004, although the initial "finished" sequence was missing 8% of the genome, consisting mostly of repetitive sequences.

With advances in technology that could handle the many repetitive sequences in human DNA that the original Human Genome Project had not fully resolved, scientists reported the first end-to-end human genome sequence in March 2022.[4]

Origin of term

The term genome was created in 1920 by Hans Winkler,[5] professor of botany at the University of Hamburg, Germany. The Oxford Dictionary and the Online Etymology Dictionary suggest the name is a blend of the words gene and chromosome.[6][7][8][9] However, see omics for a more thorough discussion. A few related -ome words already existed, such as biome and rhizome, forming a vocabulary into which genome fits systematically.[10]

Defining the genome

It is very difficult to come up with a precise definition of "genome". The term usually refers to the DNA (or sometimes RNA) molecules that carry the genetic information in an organism, but it is sometimes difficult to decide which molecules to include in the definition. For example, bacteria usually have one or two large DNA molecules (chromosomes) that contain all of the essential genetic material, but they also contain smaller extrachromosomal plasmid molecules that carry important genetic information. The definition of "genome" that is commonly used in the scientific literature is usually restricted to the large chromosomal DNA molecules in bacteria.[11]

Eukaryotic genomes are even more difficult to define because almost all eukaryotic species contain nuclear chromosomes plus extra DNA molecules in the mitochondria. In addition, algae and plants have chloroplast DNA. Most textbooks make a distinction between the nuclear genome and the organelle (mitochondria and chloroplast) genomes so when they speak of, say, the human genome, they are only referring to the genetic material in the nucleus.[2][12] This is the most common use of 'genome' in the scientific literature.

Most eukaryotes are diploid, meaning that there are two copies of each chromosome in the nucleus, but the "genome" refers to only one copy of each chromosome. Some eukaryotes have distinctive sex chromosomes, such as the X and Y chromosomes of mammals, so the technical definition of the genome must include both sex chromosomes. The standard reference genome of humans, for example, consists of one copy of each of the 22 autosomes plus one X chromosome and one Y chromosome.[13]
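The chromosome counts in this paragraph can be checked with a little arithmetic. The following Python sketch is illustrative only: the function and variable names are assumptions made for this example, and the only figures used are the 22 autosomes and the X and Y chromosomes mentioned in the text.

```python
# Illustrative sketch; all names here are hypothetical, chosen for this example.
HUMAN_AUTOSOMES = 22  # distinct human autosomes, as stated in the text


def reference_chromosome_count(autosomes, sex_chromosomes):
    """Nuclear chromosomes in a reference genome: one copy of each
    autosome plus one copy of each distinct sex chromosome."""
    return autosomes + len(set(sex_chromosomes))


def diploid_chromosome_count(autosomes):
    """Nuclear chromosomes in a diploid cell: two copies of each
    autosome plus two sex chromosomes."""
    return 2 * autosomes + 2


reference_count = reference_chromosome_count(HUMAN_AUTOSOMES, ["X", "Y"])
diploid_count = diploid_chromosome_count(HUMAN_AUTOSOMES)
print(reference_count, diploid_count)  # prints: 24 46
```

The reference genome therefore covers 24 distinct nuclear chromosomes, while a diploid human cell carries 46.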

Sequencing and mapping

A genome sequence is the complete list of the

In 1976, Walter Fiers at the University of Ghent (Belgium) was the first to establish the complete nucleotide sequence of a viral RNA-genome (Bacteriophage MS2). The next year, Fred Sanger completed the first DNA-genome sequence: Phage Φ-X174, of 5386 base pairs.[14] The first bacterial genome to be sequenced was that of Haemophilus influenzae, completed by a team at The Institute for Genomic Research in 1995. A few months later, the first eukaryotic genome was completed, with sequences of the 16 chromosomes of budding yeast Saccharomyces cerevisiae published as the result of a European-led effort begun in the mid-1980s. The first genome sequence for an archaeon, Methanococcus jannaschii, was completed in 1996, again by The Institute for Genomic Research.

The development of new technologies has made genome sequencing dramatically cheaper and easier, and the number of complete genome sequences is growing rapidly. The US National Institutes of Health maintains one of several comprehensive databases of genomic information.[15] The thousands of completed genome sequencing projects include those for rice, a mouse, the plant Arabidopsis thaliana, the puffer fish, and the bacterium E. coli. In December 2013, scientists first sequenced the entire genome of a Neanderthal, an extinct species of humans. The genome was extracted from the toe bone of a 130,000-year-old Neanderthal found in a Siberian cave.[16][17]

New sequencing technologies, such as massively parallel sequencing, have also opened up the prospect of personal genome sequencing as a diagnostic tool, as pioneered by Manteia Predictive Medicine. A major step toward that goal was the completion in 2007 of the full genome of James D. Watson, one of the co-discoverers of the structure of DNA.[18]

Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome. The Human Genome Project was organized to map and to sequence the human genome. A fundamental step in the project was the release of a detailed genomic map by Jean Weissenbach and his team at the Genoscope in Paris.[19][20]
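The distinction between a sequence and a map can be sketched as two data structures. The Python example below is a toy illustration under stated assumptions: the sequence is invented, the landmark names (`marker_A`, `marker_B`) are hypothetical, and real genome maps may record landmarks in other ways (for example, by genetic distance rather than base coordinates).

```python
# Toy data invented for illustration; not a real genomic sequence.
sequence = "ATGCGTACGTTAGCCGATAA"

# A map records only named landmarks and where they lie
# (0-based start, end-exclusive), not every base.
genome_map = [
    ("marker_A", 0, 3),
    ("marker_B", 10, 14),
]


def landmark_bases(sequence, genome_map, name):
    """Use the map to navigate to a landmark, then read its bases
    from the full sequence."""
    for landmark, start, end in genome_map:
        if landmark == name:
            return sequence[start:end]
    raise KeyError(name)


print(landmark_bases(sequence, genome_map, "marker_B"))  # prints: TAGC
```

The map is much smaller than the sequence and is enough to locate a region, but only the sequence can say which bases are found there.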

Reference genome sequences and maps continue to be updated, removing errors and clarifying regions of high allelic complexity.[21] The decreasing cost of genomic mapping has permitted genealogical sites to offer it as a service,[22] to the extent that one may submit one's genome to crowdsourced scientific endeavours such as DNA.LAND at the New York Genome Center,[23] an example both of the economies of scale and of citizen science.[24]

Viral genomes

Viral genomes can be composed of either RNA or DNA. The genomes of RNA viruses can be either single-stranded RNA or double-stranded RNA, and may contain one or more separate RNA molecules (segments: monopartite or multipartite genome). DNA viruses can have either single-stranded or double-stranded genomes. Most DNA virus genomes are composed of a single, linear molecule of DNA, but some are made up of a circular DNA molecule.[25]

Prokaryotic genomes

Prokaryotes and eukaryotes have DNA genomes. Archaea and most bacteria have a single circular chromosome;[26] however, some bacterial species have linear or multiple chromosomes.[27][28] If the DNA is replicated faster than the bacterial cells divide, multiple copies of the chromosome can be present in a single cell. If the cells divide faster than the DNA can be replicated, multiple rounds of chromosome replication are initiated before division occurs, allowing daughter cells to inherit complete genomes and already partially replicated chromosomes. Most prokaryotes have very little repetitive DNA in their genomes.[29] However, some symbiotic bacteria (e.g. Serratia symbiotica) have reduced genomes and a high fraction of pseudogenes: only ~40% of their DNA encodes proteins.[30][31]

Some bacteria have auxiliary genetic material, also part of their genome, which is carried in plasmids. For this reason, the word genome should not be used as a synonym of chromosome.

Eukaryotic genomes

In a typical human cell, the genome is contained in 22 pairs of autosomes, two sex chromosomes (the female and male variants shown at bottom right), as well as the mitochondrial genome (shown to scale as "MT" at bottom left).

Eukaryotic genomes are composed of one or more linear DNA chromosomes. The number of chromosomes varies widely, from Jack jumper ants and an asexual nematode,[32] which each have only one pair, to a fern species that has 720 pairs.[33] Eukaryotic genomes contain a surprisingly large amount of DNA compared to other genomes. The amount exceeds what is necessary for protein-coding and noncoding genes, and eukaryotic genomes show as much as 64,000-fold variation in their sizes.[34] This variation is caused by the presence of repetitive DNA and transposable elements (TEs).

A typical human cell has two copies of each of 22 autosomes, one inherited from each parent, plus two sex chromosomes, making it diploid. Gametes, such as ova, sperm, spores, and pollen, are haploid, meaning they carry only one copy of each chromosome. In addition to the chromosomes in the nucleus, organelles such as the chloroplasts and mitochondria have their own DNA. Mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome". The DNA found within the chloroplast may be referred to as the "plastome". Like the bacteria they originated from, mitochondria and chloroplasts have a circular chromosome.

Unlike prokaryotes, where exon-intron organization of protein-coding genes exists but is rather exceptional, eukaryotes generally have these features in their genes, and their genomes contain variable amounts of repetitive DNA. In mammals and plants, the majority of the genome is composed of repetitive DNA.[35] Genes in eukaryotic genomes can be annotated using FINDER.[36]

Coding sequences

DNA sequences that carry the instructions to make proteins are referred to as coding sequences. The proportion of the genome occupied by coding sequences varies widely. A larger genome does not necessarily contain more genes, and the proportion of non-repetitive DNA decreases along with increasing genome size in complex eukaryotes.
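The proportion of a genome occupied by coding sequences can be computed from annotated intervals. The Python sketch below is a minimal illustration, not a standard tool: the function name `coding_fraction`, the interval representation (0-based start, end-exclusive), and the example figures are all assumptions invented for this example.

```python
def coding_fraction(genome_length, coding_intervals):
    """Fraction of bases covered by at least one coding interval.

    Intervals are (start, end) pairs, 0-based and end-exclusive.
    Overlapping intervals are counted only once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_end = 0  # end of the region already counted
    for start, end in sorted(coding_intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


# Hypothetical toy genome of 1,000 bases with three annotated
# coding intervals, two of which overlap.
toy_intervals = [(0, 100), (50, 150), (400, 500)]
print(coding_fraction(1000, toy_intervals))  # prints: 0.25
```

Merging overlaps before counting keeps bases shared by two annotations from being counted twice, which would otherwise inflate the proportion.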