As for the number of atoms, this depends on the composition. A and T are smaller molecules than G and C. The structure of the molecule is the beef, though, not its atomic composition, so this isn't really a very useful calculation. For what it's worth, e. See also biostars.
Except for the answers by users slayton, Paul Armstrong and rauchen, all other answers given are dead wrong in essence or far from complete. The answers either fail to mention compression methods or explain them poorly.
See my answer to clarify the 4 times downsizing of the genome as seen in many answers.
I'm voting to close this question as off-topic because it is off-topic here; it should be on bioinformatics.
Vote to reopen because this is definitely not opinion based. – Jonathan
Just to add some biological commentary, "haploid" here means only one copy of each chromosome. The human reference assembly is haploid and a mosaic of multiple people. An actual individual genome will be diploid (2 copies of each chromosome, except X and Y), but the two copies differ at only a small subset of sites.
Thought about it for a day, and realized this: if you stored some base-case human DNA, any subsequent human's DNA would only need to be stored as the diff between it and the base case. For same sex examples DNA is And across sexes it's like
Also worth remembering: not all information is encoded within DNA base pairs; there is also epigenetic information.
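The diff idea in the comment above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: both sequences have the same length and differ only by single-base substitutions, with no insertions or deletions. The function names `diff_against_reference` and `apply_diff` are hypothetical, and the sequences are toy data, not real genomic sequence.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where the sample differs from the reference.

    Assumption: equal-length sequences and substitutions only.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def apply_diff(reference: str, diff: list[tuple[int, str]]) -> str:
    """Rebuild the sample from the reference plus the stored differences."""
    bases = list(reference)
    for position, base in diff:
        bases[position] = base
    return "".join(bases)


# Toy data for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

diff = diff_against_reference(reference, sample)
print(diff)  # [(4, 'T'), (9, 'A')]
print(apply_diff(reference, diff) == sample)  # True
```

Only the two differing positions need to be stored for the second sequence, which is where the saving comes from when individuals differ at a small fraction of sites.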
You do not store all the DNA in one stream; rather, most of the time it is stored by chromosome. A large chromosome takes about MB and a small one about 50 MB. Edit: I think the first reason why it is not saved at 2 bits per base pair is that it would be a hurdle to work with the data. Another point is that the data isn't as simple as you are told.
However, I have no clue what "large" or "small" chromosome means.
These numbers don't tally with what Wikipedia says; see the table at en. It looks like he is quoting Mbp (millions of base pairs, each base pair being a single position in the genome) rather than MB, which can assume a 2-bit encoding of each position. – Alex Stoddard
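To make the Mbp versus MB distinction concrete, here is a small conversion sketch. The helper `mbp_to_megabytes` is hypothetical, and the chromosome lengths are rounded approximations added for illustration (roughly 248 Mbp for chromosome 1 and 47 Mbp for chromosome 21); they are not figures taken from the answer above. MB here means 10^6 bytes.

```python
def mbp_to_megabytes(mbp: float, bits_per_base: int = 2) -> float:
    """Convert a length in millions of base pairs to megabytes (10^6 bytes)."""
    bases = mbp * 1_000_000
    total_bytes = bases * bits_per_base / 8
    return total_bytes / 1_000_000


# Rounded, approximate lengths; assumptions for illustration only.
APPROX_CHR1_MBP = 248.0
APPROX_CHR21_MBP = 47.0

print(mbp_to_megabytes(APPROX_CHR1_MBP))      # 62.0  (2 bits per base)
print(mbp_to_megabytes(APPROX_CHR21_MBP))     # 11.75 (2 bits per base)
print(mbp_to_megabytes(APPROX_CHR1_MBP, 8))   # 248.0 (one byte per base, as in plain text)
```

At one byte per base the number of MB equals the number of Mbp, which may explain how the two units get mixed up; at 2 bits per base the size is a quarter of that.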
Some of a genome's DNA methylation changes over the lifetime of the organism. Including DNA methylation data for a human genome would be more like a detailed snapshot of a person at a particular moment, rather than a generic description of the individual.
Although, the OP didn't specify which they wanted. Why would you store the whole thing for every individual?
Realistically, more than 2 bits are required, as there are other bases stored in sequence information (N, for example, where data is not mappable and therefore unknown).
The IUPAC nucleotide codes include more than the standard four, and this can increase storage overhead. – AlexReynolds
If we could read a genome perfectly, it would be just 2 bits per base. Females carry only the X sex chromosome; males have the Y chromosome in addition. You use binary, so your number is lower.
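As a rough illustration of the 2-bits-per-base idea, the sketch below packs four bases into each byte. The bit assignment (A=00, C=01, G=10, T=11) and the function names `pack_2bit` and `unpack_2bit` are arbitrary choices for this example, not the layout of any particular file format. Anything outside A, C, G and T, such as N, raises a `KeyError`, which is exactly the limitation the comments above point out.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # arbitrary assignment
BASES = "ACGT"  # index in this string matches the 2-bit code


def pack_2bit(seq: str) -> bytes:
    """Pack a sequence of A/C/G/T into 2 bits per base, 4 bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


packed = pack_2bit("GATTACA")
print(len(packed))             # 2 bytes for 7 bases
print(unpack_2bit(packed, 7))  # GATTACA
```

The sequence length has to be stored separately, because the last byte may be only partly filled. Handling N or the other IUPAC ambiguity codes would need a wider code per base or a separate record of the unknown positions.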
You can as well store it as a picture or audio recording, or even video, and it will take terabytes to store. But this is not required, and it is not minimal, as was asked.
I'm missing the point you are trying to make.
It is not a whole human genome in the real world: the genome that is necessary to make a human being and that is found inside the cells of the body. That human genome (the real, physical human genome) is diploid; in other words, it has two copies of each chromosome, one from each parent.
Everyone has a chromosome 1 from mom and another from dad; and a chromosome 2 from mom and another from dad; and so forth.
The total is 46 chromosomes, or two pairs of Thus, 6. Note: two paired bases in the double helix provide redundant information and count as one unit.