🔬 AlphaFold 2 Unveils Isoform Diversity 🔄 | Protein BLAST: Past vs Future? 🤔 | ANDES: Revolutionizing Gene Set Analysis 🧬🔍
Bioinformer Weekly Roundup
Stay Updated with the Latest in Bioinformatics!
Issue: 25 | Date: 23 February 2024
👋 Welcome to the Bioinformer Weekly Roundup!
In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!
🔬 Featured Research
Using AlphaFold 2, this study examines the structural impact of alternative splicing across 3,000 human genes, revealing changes in coils and beta-sheets. It finds an abundance of loops and exposed residues in these regions, hinting at evolutionary implications. It also explores potential connections between alternative splicing and the Septin-9 protein, as well as Tau mutations linked to Alzheimer's disease, shedding light on the role of splicing in human disease.
This study presents "Dfold", a deep learning method for predicting RNA 3D structures solely from nucleic acid sequences. By combining deep generative models with Monte Carlo tree search, it achieves atom-resolution predictions and competitive performance in the RNA-Puzzles challenges. Notably, "Dfold" does so without relying on structural contact information or additional experimental data, highlighting its effectiveness in blind predictions.
This study proposes WMAGT, a framework that combines Graph Transformer Networks with multi-aggregate graph convolutional networks to predict drug-disease associations. By integrating these components, WMAGT learns representations of heterogeneous information graphs, leading to accurate predictions. Rigorous validation and comparison with state-of-the-art methods demonstrate WMAGT's potential for drug repositioning and safety research.
This study investigates how genetic variants affect gene expression, with particular attention to alternative splicing. It compares "isoform-aware" methods with traditional approaches, showing that the former are better at detecting genes influenced by variants. The study also highlights limitations of standard methods, offering insights from real data analyses that may inform large-scale genetic studies such as GEUVADIS and GTEx.
This study compares the outcomes of adenine base editors (ABEs) and cytosine base editors (CBEs) in mammalian cells at endogenous genomic sites and at genome-integrated targets, finding differences influenced by factors such as epigenetic features and transcriptional activity. A deep-learning algorithm, BE_Endo, accurately predicts base editing outcomes. These findings advance base editor applications in research and gene therapy.
🛠️ Latest Tools
A user-friendly GUI application that simplifies converting multiple sequence alignments into a format compatible with the STRUCTURE software. Developed using Tkinter and Biopython, the tool identifies variable sites, converts sequences to binary format, and concatenates them for easy review. It provides an efficient solution for researchers engaged in population structure and genetic analysis, streamlining data formatting while minimizing errors.
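The core transformation can be sketched in a few lines of Python. This is not the tool's own code: it assumes the aligned sequences are held in a plain dictionary, encodes each variable site as 0 for the most common residue and 1 for any other residue, and writes gaps as -9, a missing-data code often used with STRUCTURE (STRUCTURE lets you choose this value). All function names are hypothetical.

```python
from collections import Counter

GAPS = "-?"    # characters treated as missing data (assumption)
MISSING = -9   # missing-data code written to the output (assumption; match STRUCTURE's MISSING setting)


def variable_sites(seqs):
    """Return indices of alignment columns with more than one residue, ignoring gaps."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    sites = []
    for i in range(length):
        residues = {s[i].upper() for s in seqs.values() if s[i] not in GAPS}
        if len(residues) > 1:
            sites.append(i)
    return sites


def encode_binary(seqs):
    """Encode variable sites as 0 (most common residue) or 1 (any other residue)."""
    encoded = {name: [] for name in seqs}
    for i in variable_sites(seqs):
        counts = Counter(s[i].upper() for s in seqs.values() if s[i] not in GAPS)
        major = counts.most_common(1)[0][0]  # ties go to the residue encountered first
        for name, s in seqs.items():
            if s[i] in GAPS:
                encoded[name].append(MISSING)
            else:
                encoded[name].append(0 if s[i].upper() == major else 1)
    return encoded


def structure_lines(encoded):
    """Concatenate each sample's encoded sites into one whitespace-separated row."""
    return [" ".join([name] + [str(v) for v in values]) for name, values in encoded.items()]
```

For example, with `{"s1": "ACGT", "s2": "ATGA", "s3": "A-GA"}` only columns 1 and 3 vary, and the row for `s3` becomes `s3 -9 0`. Restricting the output to variable sites keeps the file small, since invariant columns carry no information about population structure.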
CombFold is an algorithm that uses deep learning predictions to model large protein complexes. It achieves a TM-score above 0.7 for 72% of benchmark complexes among its top-10 predictions and increases structural coverage by 20% compared with Protein Data Bank entries. Notably, it supports high-confidence predictions for Complex Portal complexes with known stoichiometry but unknown structure, potentially expanding structural coverage beyond monomeric proteins.
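For readers less familiar with the metric, TM-score compares a predicted structure with a reference by summing a distance-dependent term over aligned residues and normalizing by the reference length L: TM = (1/L) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. The sketch below evaluates this for a single, already-computed superposition; tools such as TM-align report the maximum over superpositions, which this sketch omits. The 0.5 Å floor on d0 follows TM-align's convention, and the function names are hypothetical.

```python
def d0(target_length):
    """Distance scale (angstroms) from the TM-score definition, floored at 0.5."""
    return max(1.24 * max(target_length - 15, 0) ** (1 / 3) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (angstroms) between aligned residue pairs after superposition.
    target_length: residue count of the reference structure, used for normalization.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length
```

Reference residues with no aligned partner contribute nothing to the sum, so a model that covers only half of a 100-residue reference perfectly scores 0.5. Scores above the 0.7 threshold quoted above therefore require both accurate placement and broad coverage of the complex.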
"magpie" introduces a statistical power assessment tool for epitranscriptome studies with m6A sequencing data. It aids in experimental design decisions, considering factors like sample size and statistical power. Utilizing simulation-based approaches and real pilot data, "magpie" evaluates key parameters including sequencing depth and effect size, offering a flexible and comprehensive solution for researchers in this field.
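Simulation-based power analysis comes down to repeating a simulated experiment many times and counting how often the test detects a true effect. The sketch below illustrates only that idea: it replaces magpie's m6A-specific modelling and pilot data with normally distributed per-site differences of known standard deviation and a two-sided z-test, and `simulated_power` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(n_per_group, effect_size, sd=1.0, alpha=0.05, n_sims=2000, seed=1):
    """Fraction of simulated two-group experiments in which a two-sided z-test rejects H0.

    Assumes normally distributed measurements with a known, shared standard deviation.
    """
    rng = random.Random(seed)                       # seeded for reproducible estimates
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
    se = sd * (2 / n_per_group) ** 0.5              # standard error of the difference in means
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, sd) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims
```

Increasing `n_per_group` until the estimate reaches a target such as 0.8 gives a rough sample-size recommendation. Because the standard deviation is known here, the estimate can also be checked against the closed-form power of the z-test; simulation earns its keep when, as with real m6A data, no such formula exists.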
CytoPipeline and CytoPipelineGUI are R packages for building, comparing, and evaluating pre-processing pipelines for flow cytometry data. They address the challenges that growing data dimensionality poses for automated analysis, offering benchmarking capabilities and visual assessment tools. Overall, the packages aim to improve productivity and make pre-processing pipelines easier to evaluate.
Proformer is an end-to-end transformer encoder architecture that decodes expression values from high-throughput measurements of cis-regulatory activity. It introduces multiple expression heads with mask filling to keep the transformer from collapsing when trained on relatively small amounts of data.
InSilicoSeq 2.0 is a Python-based tool for simulating realistic Illumina-like sequencing reads, including amplicon-based sequencing. It offers pre-built error models for multiple platforms as well as customizable options. The authors show that simulated reads closely resemble real sequencing data and advocate testing experimental designs through simulation before sequencing.
Source code is freely available under the MIT license.
SNVstory is a method for objectively inferring sub-continental genetic ancestry, addressing the limitations of relying on self-reported ancestry. Using machine learning models and unique feature-importance schemes, it achieves high accuracy across 36 populations when evaluated on clinical exome sequencing data. SNVstory advances ancestry assignment methods and could support ancestry-informed healthcare decisions.
SNVstory is packaged as a Docker container for reliability and interoperability.
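To make the idea of a read simulator concrete, here is a deliberately simple sketch: reads are sampled uniformly from a reference and each base is substituted with a fixed probability. InSilicoSeq's error models are learned from real runs and capture much more (position-dependent quality scores and paired-end reads, among other properties), so treat this as an illustration of the concept rather than of the tool. The function names and the Q40 quality cap are assumptions.

```python
import math
import random

BASES = "ACGT"


def phred_char(error_rate):
    """Phred+33 quality character for a per-base error probability, capped at Q40 (assumption)."""
    q = min(40, round(-10 * math.log10(max(error_rate, 1e-4))))
    return chr(q + 33)


def simulate_reads(reference, n_reads, read_length, error_rate=0.01, seed=0):
    """Yield FASTQ records sampled from `reference` with uniform substitution errors."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    qual = phred_char(error_rate) * read_length  # same quality for every base in this sketch
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        read = list(reference[start:start + read_length])
        for j, base in enumerate(read):
            if rng.random() < error_rate:
                # substitute with one of the three other bases
                read[j] = rng.choice([b for b in BASES if b != base])
        yield f"@read_{i} start={start}\n{''.join(read)}\n+\n{qual}\n"
```

Running a planned analysis on reads like these, with a known true origin recorded in each header, is the kind of pre-sequencing design test the authors advocate; a realistic error model simply makes the rehearsal more faithful.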
Embedding methods compress complex data into lower-dimensional spaces, with gene embeddings capturing gene relationships effectively. ANDES proposes a best-match approach for comparing gene sets in embedding spaces, enhancing gene set enrichment analysis. It also eases knowledge transfer across organisms.
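The best-match idea can be illustrated with a minimal sketch: each gene is matched to its most similar gene in the other set (cosine similarity of embedding vectors), and these best-match scores are averaged in both directions. The published method includes further steps, such as assessing significance, which this sketch omits; the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def best_match_score(set_a, set_b, embeddings):
    """Symmetric best-match similarity between two gene sets in an embedding space.

    For each gene, keep its highest cosine similarity to any gene in the other set,
    then average these best matches in each direction and combine the two averages.
    """
    sims = {(a, b): cosine(embeddings[a], embeddings[b]) for a in set_a for b in set_b}
    a_best = [max(sims[(a, b)] for b in set_b) for a in set_a]
    b_best = [max(sims[(a, b)] for a in set_a) for b in set_b]
    return (sum(a_best) / len(a_best) + sum(b_best) / len(b_best)) / 2
```

Averaging over all gene pairs instead would let unrelated pairs dilute the signal when a set contains several distinct functional groups; keeping each gene's strongest match preserves those relationships, which is the motivation for a best-match approach.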
📰 Community News
Scientists have discovered tiny bits of RNA, even smaller than viruses, that colonize the bacteria inside human guts and mouths¹. Too minimal to be considered a standard life form, these scraps of genetic material are among the smallest known elements to transfer information that can be read by a cell, and the sequences that they encode are new to science.
A recent study by Shi et al., published in Cell Research, investigates gene expression changes during human spinal cord development. The authors introduce a new spatial transcriptomics method focused on transcription factors. The study reveals key features of spinal cord development, offering insights for future research in this area.
The availability of predicted protein structures from tools like AlphaFold has prompted discussion on whether protein structure search could replace protein sequence search methods such as BLAST. This summary explores the promise of structure search for remote homology detection and suggests incorporating structural information into protein BLAST, while remaining neutral on the comparative utility of the two approaches.
📅 Upcoming Events
Boost your skills in scRNA-seq analysis with this educational webinar! Learn to explore tissue heterogeneity, cell types, and pathogenic mechanisms. Navigate studies with QIAGEN Omicsoft Single-Cell Lands, analyse data using t-SNE, UMAP, and violin plots, and identify key pathways with QIAGEN Ingenuity Pathway Analysis (IPA).
Join this webinar for an introduction to metagenomics and modern microbial-omics studies. Explore topics like microbial composition and functional gene analysis. Suitable for all levels of interest, the webinar is part of a series on microbial ecosystems' impact, covering data interpretation and database navigation.
The webinar introduces UniProt, focusing on resources for proteomics scientists and users with large protein datasets. Topics include the proteomes service, ID mappings, and peptide searches, along with the UniProt Proteins API for accessing UniProt data alongside genomic, proteomic, and variation data. Suitable for scientists without programming knowledge. Join and unlock the mysteries of UniProt!
This course offers a refresher on the basics of statistical analysis for scientists of all levels. It emphasizes understanding the principles of statistical testing, selecting and executing the most suitable test for your data, and interpreting the results. It is ideal for those who studied statistics formally but have not used it recently.
📚 Educational Corner
This blog compares trimming in next-generation sequencing (NGS) to a sculptor's meticulous work. It stresses the need for precision to ensure data accuracy and quality, highlighting the risks of both over-trimming and under-trimming. Overall, it underscores the importance of careful trimming for optimal NGS data analysis.
This article guides readers through differential gene expression analysis with the limma package. It takes a step-by-step approach to analysing gene expression data, with detailed instructions and explanations that offer valuable insight into the analytical process.
This course teaches essential NGS bioinformatics skills: quality control, read mapping, visualization, and DNA variant analysis. Participants will learn about NGS technology, algorithms, and data formats, and how to apply bioinformatics tools to sequencing data. Ideal for beginners in NGS bioinformatics.
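One widely used way to balance over- and under-trimming is the 3' quality-trimming rule from BWA, which cutadapt also applies for its quality cutoff. Walking in from the 3' end, it accumulates (cutoff − quality) and cuts where that running sum peaks, stopping as soon as the sum turns negative. The sketch below is an independent Python rendering of that rule, not code from either tool, and assumes Phred+33-encoded qualities; the function names are hypothetical.

```python
def quality_trim_index(qualities, cutoff=20):
    """Return the index at which to cut the 3' end (len(qualities) means no trimming)."""
    running = 0
    best = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]   # low-quality bases push the sum up
        if running < 0:                    # good bases have outweighed the bad tail: stop
            break
        if running > best:                 # remember where the low-quality tail is largest
            best = running
            cut = i
    return cut


def trim_read(seq, qual_string, cutoff=20, offset=33):
    """Trim a read and its quality string using the rule above."""
    qualities = [ord(c) - offset for c in qual_string]
    cut = quality_trim_index(qualities, cutoff)
    return seq[:cut], qual_string[:cut]
```

For qualities `[30, 30, 30, 10, 5, 2]` with a cutoff of 20 the read is cut after three bases, while `[30, 30, 5, 30, 30]` is left intact: a single poor base followed by good ones does not trigger trimming. This illustrates how the rule avoids over-trimming while still removing a degraded tail.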
🔗 Connect with Us
Stay connected and engage with us on social media for daily updates, discussions, and more!
📬 Subscribe
Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.
We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!
---------------------------------------------------------------------------------------------
Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.
Contact: bioinformatics@zifornd.com