Bioinformatics
Adapted from Wikipedia
Bioinformatics is a field of science that combines many different areas of learning. It helps scientists understand large amounts of information about living things, especially when there is too much data to handle by hand. This field uses ideas from biology, chemistry, physics, computer science, and math to study and explain biological data.
People use bioinformatics to study the genes inside living things and learn how they work. This can help us learn about diseases, why some plants or animals are special, or what makes different groups of people unique. Bioinformatics also studies proteins, which are tiny parts of our bodies that do important jobs.
It helps scientists read and organize large sets of information, like the instructions inside our cells. By doing this, bioinformatics helps us understand how living things grow, change, and stay healthy. It even helps us see how small parts of our bodies work together in complex ways.
History
The term bioinformatics was first used in 1970 by Paulien Hogeweg and Ben Hesper. It describes the study of information in living systems, similar to how biochemistry studies chemical processes in living things.
The field grew quickly in the mid-1990s, helped by the Human Genome Project and new technology for reading DNA. To understand biological data like DNA and protein sequences, scientists use special computer programs. These programs use ideas from many areas of science and math.
Since the Human Genome Project, sequencing (reading DNA) has become faster and cheaper. Some labs can now read more than 100,000 billion DNA letters each year, and a whole genome (a full set of DNA) can be read for $1,000 or less.
Computers became very important in biology after the first protein sequences were worked out in the 1950s. Comparing these sequences by hand was too hard, so scientists created databases and ways to compare them easily. Early leaders in this field, such as Margaret Dayhoff, helped build the first databases and methods to study these sequences.
Goals
Bioinformatics helps us learn how cells work, especially when something goes wrong and makes people sick. Scientists study lots of information about living things. Bioinformatics uses computers and math to make this easier.
One big goal of bioinformatics is to understand how living things work. It creates programs and tools to manage big amounts of data. For example, scientists use these tools to find genes in DNA, guess what proteins will look like, and see how pieces of DNA are related. Other tasks include designing medicines, learning how genes work together, and studying how cells grow and change.
Sequence analysis
Main articles: Sequence alignment, Sequence database, and Alignment-free sequence analysis
Since 1977, scientists have found the DNA codes of thousands of living things and stored them in databases. These codes help us learn about genes that make proteins and other important parts of life. By comparing genes from the same or different species, we can discover how proteins work or how species are related.
Because there is so much data, we use computer programs like BLAST to search through the codes. These programs help scientists find useful information from many different organisms.
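The idea of searching many sequences for the best match can be shown with a small sketch. This is not how BLAST itself is written; it is a simplified illustration that counts how many short "words" of three letters a query shares with each stored sequence. The function names, the sequences, and the word length of 3 are all made up for this example.

```python
def kmers(seq, k):
    """Return the set of all words of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def rank_by_shared_words(query, database, k=3):
    """Rank stored sequences by how many k-letter words they share with the query."""
    query_words = kmers(query, k)
    scores = {name: len(query_words & kmers(seq, k)) for name, seq in database.items()}
    # Highest score first: the more shared words, the more alike the sequences look.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# A tiny made-up "database" of DNA codes.
database = {
    "seq_a": "ATGGCGTACGTT",
    "seq_b": "TTTTCCCCGGGG",
    "seq_c": "ATGGCGTTCGTA",
}
ranking = rank_by_shared_words("ATGGCGTACG", database)
print(ranking)
```

Real search programs use shared words only as a first step and then check the promising matches much more carefully.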
DNA sequencing
Main article: DNA sequencing
Before scientists can study a DNA sequence, they usually get it from a database like GenBank. Reading DNA codes is tricky because the raw data from sequencing machines can be messy. Special computer methods help make sense of this data.
Sequence assembly
Main article: Sequence assembly
Most methods for reading DNA give us short pieces of the code that need to be put together to make complete sets of genes or genomes. One common way to do this is called shotgun sequencing, where many tiny pieces of DNA are read. These pieces overlap each other, and computer programs help line them up to rebuild the full genome. This method is quick but can be hard for very large genomes.
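Lining up overlapping pieces can be sketched in a few lines of code. The example below is a very simple "greedy" approach, assumed here only for illustration: it keeps joining the two pieces with the longest overlap. Real assembly programs are far more careful, and the function names, the pieces, and the minimum overlap of 3 letters are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of a that matches the start of b (0 if under min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly join the pair of pieces with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left, so the remaining pieces stay separate
        merged = reads[i] + reads[j][size:]
        reads = [r for n, r in enumerate(reads) if n not in (i, j)] + [merged]
    return reads

# Three short made-up pieces that overlap each other.
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])
print(contigs)
```

This sketch compares every piece with every other piece, which is one reason the job gets hard for very large genomes with millions of pieces.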
See also: sequence analysis, sequence mining, sequence profiling tool, and sequence motif
Genome annotation
Main article: Gene prediction
In genomics, annotation means marking the start and end points of genes and other features in a DNA sequence. Many genomes are too big to annotate by hand, and this has become a big challenge in bioinformatics.
Genome annotation can be done at three levels: looking at the DNA building blocks, the proteins they make, and the processes they are part of. Finding genes is a big part of the DNA level. For complicated genomes, scientists use both computer predictions and comparisons with other organisms to find genes. At the protein level, the goal is to figure out what each protein does. Databases of protein codes help with this, but many proteins in new genomes still have unknown functions.
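One very simple computer prediction for finding genes is to look for an "open reading frame": a start signal (ATG) followed later by a stop signal (TAA, TAG, or TGA) when the DNA is read three letters at a time. The sketch below only reads one strand and ignores many details of real genes, so treat it as an illustration. The function name, the DNA, and the minimum length are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for simple open reading frames on one strand.

    min_codons counts every three-letter word, including the start and stop signals.
    """
    orfs = []
    for frame in range(3):  # reading can begin at letter 0, 1, or 2
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember where the possible gene begins
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # keep looking for another start signal
    return orfs

orfs = find_orfs("CCATGGCGTACTAAGG")
print(orfs)
```

Positions are counted from 0, and the end position is one past the last letter of the stop signal.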
Understanding how genes and proteins work together in cells and organisms is the aim of the process level. The first full system for describing a genome was published in 1995 for the bacterium Haemophilus influenzae. After the Human Genome Project ended in 2003, the ENCODE project began, using new technologies to find more details about the human genome.
Gene function prediction
While genome annotation often looks at how genes are similar, other features of genes can also help predict their functions. Scientists also use information about when genes are active and how proteins are shaped to understand gene functions better.
Computational evolutionary biology
Further information: Computational phylogenetics
Evolutionary biology studies how species change over time. Computers help scientists in this field by allowing them to:
- follow the changes in many organisms by looking at their DNA,
- compare whole genomes to study complex events like gene duplication and transfer between species,
- create models to predict how populations will change over time,
- keep track of information about many species.
Comparative genomics
Main article: Comparative genomics
Comparative genomics looks at the differences and similarities between genomes of different organisms. Scientists make maps to see how genomes have changed over time. Studying these changes helps scientists understand how life has developed and changed.
Pan genomics
Main article: Pan-genome
Pan genomics is a way to look at all the genes in a group of related organisms. It includes a core set of genes found in every organism and a flexible set that varies between them. Tools like BPGA can help study these gene sets in bacteria.
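The core set and the flexible set can be worked out with simple set operations. The sketch below uses three made-up strains with made-up gene names. It shows the idea only and is not how BPGA or any other real tool works inside.

```python
# Each made-up strain is described by the set of genes it carries.
genomes = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB", "geneC", "geneE"},
}

# Pan-genome: every gene seen in at least one strain.
pan_genome = set.union(*genomes.values())

# Core set: genes found in every strain.
core_genome = set.intersection(*genomes.values())

# Flexible set: genes that only some strains have.
flexible_genome = pan_genome - core_genome

print(sorted(core_genome))
print(sorted(flexible_genome))
```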
Genetics of disease
Main article: Genome-wide association studies
With new technologies, scientists can now find the causes of many human disorders. Some diseases follow simple patterns passed down in families, while others are more complex. Studies have found many small pieces of DNA linked to diseases like breast cancer and Alzheimer’s.
Analysis of mutations in cancer
Main article: Oncogenomics
In cancer, the DNA of affected cells changes in many ways. Scientists use special tools to find pieces of DNA that have been copied too many or too few times, and to look for small changes that cause cancer. These methods create huge amounts of data, which can be messy, so scientists use computer models to tell real changes in the number of DNA copies apart from noise.
Two key ideas help identify cancer through DNA changes. First, cancer happens because of changes that build up in genes. Second, only some of these changes, called driver mutations, push the cancer to grow, so they need to be told apart from the changes that do not.
Better bioinformatics tools could help classify cancer types by looking at these DNA changes. Scientists are also studying common damage found in many tumors to learn more about cancer.
Gene and protein expression
Analysis of gene expression
We can learn which genes are active by measuring tiny messages called mRNA. Tools for this include microarrays, expressed sequence tag (EST) sequencing, serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS), and RNA-Seq. These tools help scientists study many genes at once. Researchers use computer programs to understand the data and find which genes are involved in different conditions.
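One common way to compare gene activity between two conditions is the "log2 fold change", which counts how many times a measurement doubled or halved. The sketch below uses made-up mRNA counts and assumes every count is greater than zero. Real studies also repeat measurements and use statistics before deciding that a gene has truly changed.

```python
import math

# Made-up mRNA counts: gene -> (healthy sample, sick sample).
counts = {"geneA": (100, 400), "geneB": (200, 50), "geneC": (80, 80)}

def log2_fold_change(before, after):
    """Number of doublings (positive) or halvings (negative) between two counts."""
    return math.log2(after / before)

fold_changes = {gene: log2_fold_change(b, a) for gene, (b, a) in counts.items()}

# Keep genes whose activity at least doubled or at least halved.
changed = sorted(gene for gene, fc in fold_changes.items() if abs(fc) >= 1)

print(fold_changes)
print(changed)
```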
Analysis of protein expression
Tools like protein microarrays and mass spectrometry let scientists see which proteins are present in a sample. Scientists can also learn where proteins are located in tissues using special staining techniques and tissue microarrays.
Analysis of regulation
Gene regulation is how the body controls which genes are active. Signals like hormones can turn genes on or off. Bioinformatics helps study how genes are controlled, for example by looking at parts of DNA near genes called sequence motifs that affect how much mRNA is made. Scientists also study how distant parts of DNA called enhancer elements influence genes. By comparing data from different conditions, scientists can find groups of genes that act together.
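Looking for a sequence motif can be as simple as searching a piece of DNA for a short pattern. The sketch below searches for a pattern similar to the "TATA box", a motif found near the start of many genes. The exact pattern, the DNA, and the function name are chosen only for illustration.

```python
import re

def find_motif(dna, pattern):
    """Return the 0-based start position of every match, including overlapping ones."""
    # The lookahead (?=...) lets matches overlap instead of skipping ahead.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna)]

# Made-up DNA from just before a gene.
upstream = "GGCTATAAAAGGCGTATATATGC"

# [AT] means "either A or T" at that position.
positions = find_motif(upstream, "TATA[AT]A[AT]")
print(positions)
```

Real motif tools usually score how well each position fits, rather than requiring an exact pattern match.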
Analysis of cellular organization
Scientists have ways to study where things like organelles (the tiny working parts of cells), genes, and proteins are located inside cells. The Gene Ontology has a category called "cellular component" that is used to organize this information.
Microscopes help us see where tiny parts of cells and molecules are. This can show us clues about diseases.
Knowing where proteins are helps us guess what they do. For example, proteins in the nucleus might help control genes, while proteins in mitochondria might help with energy production. There are tools and databases to help figure out where proteins are located.
Main article: Nuclear organization
Experiments like Hi-C and ChIA-PET give us details about how DNA is arranged inside the nucleus. This helps us understand how parts of the genome are grouped together.
Structural bioinformatics
Main articles: Structural bioinformatics and Protein structure prediction
See also: Structural motif and Structural domain
Finding the shape of proteins is an important part of bioinformatics. There is a contest called the Critical Assessment of Protein Structure Prediction (CASP) where research teams try to guess the shapes of unknown proteins.
Proteins are made from building blocks called amino acids. The order of these amino acids is called the primary structure. We can find this order from the DNA of the gene that codes for the protein. In most proteins, this order decides the protein's 3D shape, which helps it do its job. For example, hemoglobin in humans and leghemoglobin in legumes (plants such as peas and beans) both carry oxygen. Even though these two proteins have very different amino acid orders, their shapes are nearly the same because they share the same job and ancestor.
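Finding the amino acid order from DNA works by reading the DNA three letters at a time and looking each three-letter word (a codon) up in a table. The sketch below builds the standard table and translates a short made-up piece of DNA. It assumes the DNA uses only the letters A, C, G, and T and that reading starts at the first letter.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for all 64 codons in TCAG order; "*" marks a stop signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Turn DNA into a string of amino acid letters, stopping at the first stop signal."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

protein = translate("ATGGCGTACTAA")
print(protein)
```

Here ATG gives M, GCG gives A, TAC gives Y, and TAA is a stop signal, so the result is the three-letter chain MAY.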
Ways to guess protein shapes include using shapes from related proteins and physics-based modeling. In 2021, a tool called AlphaFold, made by Google's DeepMind, became much better than earlier tools at guessing protein shapes. Its predicted shapes for hundreds of millions of proteins have been shared in a public database.
Network and systems biology
Main articles: Computational systems biology, Biological network, and Interactome
Network analysis helps us learn how different parts in living things are connected. This includes how proteins work together or how genes interact. These connections can be between many types of molecules, such as proteins and small chemicals, all working together.
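A network like this can be stored in a computer as a list of connected pairs. One simple question to ask is which part has the most connections, since a highly connected part (a "hub") is often worth a closer look. The sketch below uses made-up protein names and connections.

```python
from collections import defaultdict

# Made-up pairs of proteins that work together.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

# Build the network: each protein maps to the set of its partners.
network = defaultdict(set)
for a, b in interactions:
    network[a].add(b)
    network[b].add(a)

# Count how many partners each protein has.
degree = {protein: len(partners) for protein, partners in network.items()}

# The hub is the protein with the most partners.
hub = max(degree, key=degree.get)
print(degree)
print(hub)
```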
Systems biology uses computer simulations to study small parts of cells. This includes how chemicals move and change, or how genes turn on and off. This helps scientists see how everything in a cell is connected. Some scientists even use computers to create simple artificial life to better understand how real life works.
Molecular interaction networks
Main articles: Protein–protein interaction prediction and interactome
Scientists have discovered the shapes of many proteins using special tools. One big question is whether we can predict how proteins will interact just by looking at their shapes, without doing experiments. There are many ways to try to solve this, but there is still more work to do.
Other important interactions include how proteins bind to small molecules, such as medicines, or to short pieces of protein called peptides. Scientists study these interactions with computer programs that simulate how atoms move.
Biodiversity informatics
Main article: Biodiversity informatics
Biodiversity informatics helps us learn about different plants, animals, and tiny living things, like those in microbiome data. Scientists use this information to understand how these species are related and where they might live. They can even identify them using parts of their DNA. This field also studies how living things are affected by big changes in the world, like climate change.
Others
Literature analysis
Main articles: Text mining and Biomedical text mining
There is so much scientific writing that one person cannot read it all. Literature analysis uses computers to help find important information. This can include recognizing biological terms, finding names of genes, and figuring out which proteins work together.
High-throughput image analysis
Computers help us study many medical pictures. These tools make analysis more accurate, fair, and fast. They are used for diagnosing diseases and for research. Examples include measuring tiny parts inside cells, studying shapes and sizes, and watching how air moves in animals' lungs.
High-throughput single cell data analysis
Main article: Flow cytometry bioinformatics
Computers study data from single cells, like information from flow cytometry. These methods help find groups of cells that are important for understanding diseases or experiments.
Ontologies and data integration
Biological ontologies are special lists of agreed terms that help organize biological ideas so computers can study them easily. The OBO Foundry worked to make some of these lists standard. One well-known list is the Gene Ontology, which describes what genes do. There are also lists for describing traits of living things.
Databases
Main articles: List of biological databases and Biological database
Databases help scientists study living things using computers. They store information about DNA, proteins, and other parts of life. Some databases hold real data from experiments, while others hold predictions worked out from existing data.
Some well-known databases include:
- For studying DNA and proteins: GenBank, UniProt
- For looking at protein shapes: Protein Data Bank (PDB)
- For finding groups of proteins and special patterns: InterPro, Pfam
- For storing raw data from newer, faster DNA-reading machines: Sequence Read Archive
- For studying how parts of cells work together: Metabolic Pathway Databases (KEGG, BioCyc), Interaction Analysis Databases, Functional Networks
- For designing new genetic circuits (sets of genes built to work together inside cells): GenoCAD
Software and tools
Software tools for bioinformatics help scientists study living things. They range from simple programs run by typing commands, to larger programs with windows and buttons, to online services. These tools are made by companies or public groups.
Many tools are free and open for anyone to use. They have been around since the 1980s. These tools help scientists find new ways to study biology and share their work easily. Some well-known tools include Bioconductor, BioPerl, Biopython, BioJava, and BioJS.
Scientists can also use online services to run analyses on computers in other parts of the world and to share data. These services make it easier for everyone to access important tools without having to manage complicated software themselves. Some platforms that help scientists organize their work include Galaxy and UGENE.
Education platforms
Bioinformatics can be studied through online courses and special programs, not just in classrooms at universities. Many websites and tools help people learn bioinformatics, such as Rosalind and courses from the Swiss Institute of Bioinformatics Training Portal. The Canadian Bioinformatics Workshops share videos and slides from their workshops for free.
There are also big online classes called MOOCs that give certificates in bioinformatics. Examples include Coursera’s Bioinformatics Specialization at the University of California, San Diego, the Genomic Data Science Specialization at Johns Hopkins University, and edX’s Data Analysis for Life Sciences XSeries at Harvard University. Some projects, like 4273π, use simple computers such as the Raspberry Pi to teach these topics to students and adults alike.
Conferences
There are many big meetings where people talk about bioinformatics. Some important ones are the European Conference on Computational Biology (ECCB), Intelligent Systems for Molecular Biology (ISMB), Pacific Symposium on Biocomputing (PSB), and Research in Computational Molecular Biology (RECOMB). These conferences help scientists share their ideas and discoveries.
This article is a child-friendly adaptation of the Wikipedia article on Bioinformatics, available under CC BY-SA 4.0.