🔬 AlphaFold 2 Unveils Isoform Diversity 🔄 | Protein BLAST: Past vs Future? 🤔 | ANDES: Revolutionizing Gene Set Analysis 🧬🔍

Bioinformer Weekly Roundup

Stay Updated with the Latest in Bioinformatics!

Issue: 25 | Date: 23 February 2024

👋 Welcome to the Bioinformer Weekly Roundup!

In this newsletter, we curate and bring you the most captivating stories, developments, and breakthroughs from the world of bioinformatics. Whether you're a seasoned researcher, a student, or simply curious about the intersection of biology and data science, we've got you covered. Subscribe now to stay ahead in the exciting realm of bioinformatics!

🔬 Featured Research

Systematic characterization of protein structural features of alternative splicing isoforms using AlphaFold 2 | bioRxiv

Using AlphaFold 2, the study examines alternative splicing in 3,000 human genes, revealing structural changes in coils and beta-sheets. It highlights an abundance of loops and exposed residues in these regions, hinting at evolutionary implications. Additionally, it explores potential connections between alternative splicing and the Septin-9 protein, as well as Tau mutations linked to Alzheimer's disease, shedding light on their role in human disease.

De novo prediction of RNA 3D structures with deep generative models | PLOS ONE

This study presents a Deep Learning method, "Dfold", for predicting RNA 3D folding structures solely from nucleic acid sequences. By combining Deep Generative Models and Monte Carlo Tree Search, it achieves atom-resolution prediction, showcasing competitive performance in RNA-Puzzles challenges. Notably, "Dfold" achieves these results without relying on structural contact information or additional experimental data, highlighting its effectiveness in blind predictions.

Fusing graph transformer with multi-aggregate GCN for enhanced drug–disease associations prediction | BMC Bioinformatics

This study proposes WMAGT, a framework that combines Graph Transformer Networks with multi-aggregate graph convolutional networks to predict drug-disease associations. By integrating these components, WMAGT learns representations of heterogeneous information graphs, leading to accurate predictions. Rigorous validation and comparison with state-of-the-art methods demonstrate WMAGT's potential in drug repositioning and safety research.

Accounting for isoform expression increases power to identify genetic regulation of gene expression | PLOS Computational Biology

This study investigates genetic variants' impact on gene expression, emphasizing alternative splicing considerations. It compares "isoform-aware" methods with traditional approaches, showing the former's superior ability to detect variant-influenced genes. The study also highlights limitations in standard methods, offering insights from real data analyses that may inform large-scale genetic studies like GEUVADIS and GTEx.

Deep learning models incorporating endogenous factors beyond DNA sequences improve the prediction accuracy of base editing outcomes | Cell Discovery

This study compares the outcomes of adenine base editors (ABEs) and cytosine base editors (CBEs) in mammalian cells at endogenous genomic sites and at genome-integrated targets, finding differences influenced by factors such as epigenetics and transcriptional activity. A deep learning algorithm, BE_Endo, accurately predicts base editing outcomes. These findings advance base editor applications in research and gene therapy.

🛠️ Latest Tools

Fasta2Structure: a user-friendly tool for converting multiple aligned FASTA files to STRUCTURE format | BMC Bioinformatics

A user-friendly GUI application that simplifies converting multiple sequence alignments into a format compatible with the STRUCTURE software. Developed using Tkinter and Biopython, the tool identifies variable sites, converts sequences to binary format, and concatenates them for easy review. It provides an efficient solution for researchers engaged in population structure and genetic analysis, streamlining data formatting while minimizing errors.
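The steps described for Fasta2Structure (find variable sites, then encode them in binary) can be sketched roughly as follows. This is an illustrative sketch only, not the tool's actual code: the function names, the set of characters treated as missing, the choice of the most common allele as 0, and the missing-data code of -9 are all assumptions made for this example.

```python
from collections import Counter

MISSING_CHARS = {"-", "N", "?"}  # assumed symbols for gaps / unknown bases
MISSING_CODE = -9                # assumed missing-data value for the output


def variable_sites(alignment):
    """Return indices of columns with more than one distinct non-missing character."""
    seqs = [s.upper() for s in alignment.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        col for col in range(len(seqs[0]))
        if len({s[col] for s in seqs} - MISSING_CHARS) > 1
    ]


def encode_binary(alignment, sites):
    """Encode each variable site as 0 (most common allele) or 1 (any other allele)."""
    encoded = {name: [] for name in alignment}
    for col in sites:
        counts = Counter(
            seq[col].upper() for seq in alignment.values()
            if seq[col].upper() not in MISSING_CHARS
        )
        # Most common allele; ties are broken alphabetically for determinism.
        major = min(counts, key=lambda allele: (-counts[allele], allele))
        for name, seq in alignment.items():
            base = seq[col].upper()
            if base in MISSING_CHARS:
                encoded[name].append(MISSING_CODE)
            else:
                encoded[name].append(0 if base == major else 1)
    return encoded


# Hypothetical toy alignment: sample name -> aligned sequence.
alignment = {"s1": "ACGTA", "s2": "ACGTT", "s3": "A-GTA", "s4": "ATGTA"}
sites = variable_sites(alignment)
matrix = encode_binary(alignment, sites)
```

In this toy alignment only the second and fifth columns vary, so each sample is reduced to two binary values, with the gap in `s3` recorded as missing.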

CombFold: predicting structures of large protein assemblies using a combinatorial assembly algorithm and AlphaFold2 | Nature Methods

CombFold is introduced as an algorithm that uses deep learning predictions to model large protein complexes. It achieves a TM-score >0.7 for 72% of top-10 predictions and enhances structural coverage by 20% compared to Protein Data Bank entries. Notably, it supports high-confidence predictions for complexes from Complex Portal with known stoichiometry but unknown structure, potentially expanding structural coverage beyond monomeric proteins.

magpie: A power evaluation method for differential RNA methylation analysis in N6-methyladenosine sequencing | PLOS Computational Biology

"magpie" introduces a statistical power assessment tool for epitranscriptome studies with m6A sequencing data. It aids in experimental design decisions, considering factors like sample size and statistical power. Utilizing simulation-based approaches and real pilot data, "magpie" evaluates key parameters including sequencing depth and effect size, offering a flexible and comprehensive solution for researchers in this field.

CytoPipeline and CytoPipelineGUI: a Bioconductor R package suite for building and visualizing automated pre-processing pipelines for flow cytometry data | BMC Bioinformatics

CytoPipeline and CytoPipelineGUI are R packages for constructing, comparing, and evaluating flow cytometry data pre-processing pipelines. They address challenges in automated analysis due to increased data dimensionality, offering benchmarking capabilities and visual assessment techniques. Overall, these packages aim to enhance productivity and provide intuitive tools for evaluating pre-processing pipelines.

Proformer: a hybrid macaron transformer model predicts expression values from promoter sequences | BMC Bioinformatics

Proformer is an end-to-end transformer encoder architecture that predicts expression values from promoter sequences, using high-throughput measurements of cis-regulatory activity as training data. It introduces multiple expression heads with mask filling to prevent the transformer model from collapsing when trained on a relatively small amount of data.

InSilicoSeq 2.0: Simulating realistic amplicon-based sequence reads | bioRxiv

InSilicoSeq 2.0 is a Python-based tool for simulating realistic Illumina-like sequencing reads, including amplicon-based sequencing. It offers pre-made error models for multiple platforms and customizable options, enhancing efficiency. Demonstrations show similarity to actual sequencing data, advocating for pre-sequencing experimental design testing through simulations.

The source code is freely available under the MIT license.

SNVstory: inferring genetic ancestry from genome sequencing data | BMC Bioinformatics

SNVstory introduces a method for objectively inferring sub-continental genetic ancestry, addressing the limitations of relying on self-reported ancestry. Leveraging machine learning models and unique feature-importance schemes, it shows high accuracy across 36 populations when evaluated on clinical exome sequencing data. SNVstory represents an advancement in ancestry assignment methods, enhancing the potential for ancestry-informed healthcare decisions.

SNVstory is packaged as a Docker container for enhanced reliability and interoperability.

ANDES: a novel best-match approach for enhancing gene set analysis in embedding spaces | bioRxiv

Embedding methods compress complex data into lower-dimensional spaces, with gene embeddings capturing gene relationships effectively. ANDES proposes a best-match approach for comparing gene sets in embedding spaces, enhancing gene set enrichment analysis. It also eases knowledge transfer across organisms.
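To make the idea of a best-match comparison concrete, here is a minimal sketch. It is not the ANDES implementation: the symmetric averaging, the use of cosine similarity, the function names, and the toy gene names and vectors are all assumptions chosen for illustration. Each gene in one set is paired with its most similar gene in the other set, and those best-match similarities are averaged in both directions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_score(set_a, set_b, embeddings):
    """Average best-match cosine similarity between two gene sets, in both directions."""
    if not set_a or not set_b:
        raise ValueError("both gene sets must be non-empty")

    def directed(src, dst):
        # For every gene in src, keep only its single best match in dst.
        return sum(
            max(cosine(embeddings[g], embeddings[h]) for h in dst) for g in src
        ) / len(src)

    return 0.5 * (directed(set_a, set_b) + directed(set_b, set_a))


# Hypothetical 2-D embeddings; real gene embeddings have many more dimensions.
embeddings = {
    "g1": (1.0, 0.0),
    "g2": (0.0, 1.0),
    "g3": (1.0, 0.0),
    "g4": (0.0, 1.0),
    "g5": (-1.0, 0.0),
}
similar = best_match_score(["g1", "g2"], ["g3", "g4"], embeddings)
dissimilar = best_match_score(["g1", "g2"], ["g5"], embeddings)
```

Because only the best match for each gene counts, a set can score highly against another set even when most pairwise similarities between them are low, which is the intuition behind a best-match rather than an all-pairs average.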

📰 Community News

‘Wildly weird’ RNA bits discovered infesting the microbes in our guts | Nature News

Scientists have discovered tiny bits of RNA, even smaller than viruses, that colonize the bacteria inside human guts and mouths. Too minimal to be considered a standard life form, these scraps of genetic material are among the smallest known elements to transfer information that can be read by a cell, and the sequences that they encode are new to science.

It’s me, hi, I solved the problem, it’s TF-seqFISH | Cell Research

In a recent study published in Cell Research, Shi et al. investigate gene expression changes during human spinal cord development. They introduce a new spatial transcriptomics method focused on transcription factors. The study reveals key features of spinal cord development, offering insights for future research in this area.

Is Protein BLAST a thing of the past? | Nature Communications

The emergence of protein structure search tools, made practical by AlphaFold's predicted structures, has prompted discussion of whether they could replace protein sequence search methods such as BLAST. The article explores the promise of structure search for remote homology detection and suggests incorporating structural information into protein BLAST, while remaining neutral on the comparative utility of the two approaches.

📅 Upcoming Events

Public single-cell RNA-seq data investigation using QIAGEN Omicsoft and Ingenuity Pathway Analysis | QIAGEN

Boost your skills in scRNA-seq data analysis with this educational webinar! Learn to explore tissue heterogeneity, cell types, and pathogenic mechanisms. Navigate studies with QIAGEN Omicsoft Single-Cell Lands, analyse data using t-SNE, UMAP, and violin plots, and identify key pathways with QIAGEN Ingenuity Pathway Analysis (IPA).

Exploring metagenomes to assess microbiomes across the globe | EMBL-EBI

Join this webinar for an introduction to metagenomics and modern microbial-omics studies. Explore topics like microbial composition and functional gene analysis. Suitable for all levels of interest, the webinar is part of a series on microbial ecosystems' impact, covering data interpretation and database navigation.

UniProt for Proteomics Scientists | EMBL-EBI

The webinar introduces UniProt, focusing on resources for proteomics scientists and users with large protein datasets. Topics include the proteomes service, ID mappings, and peptide searches, along with the UniProt Proteins API for accessing data alongside genomic, proteomics, and variation data. Suitable for scientists without programming knowledge. Join and unlock the mysteries of UniProt!

Bioinformatics training course in Statistical Analysis of Biological Data in R | University of Sheffield

This course offers a refresher on the basics of statistical analysis, targeting scientists of all levels. It emphasizes understanding the principles of statistical testing, selecting and executing the most suitable test for your data, and interpreting the results. Ideal for individuals whose formal education covered statistics but who haven't used it recently.

📚 Educational Corner

Understanding Trimming Accuracy in NGS: A Closer Look at Precision and Tool Performance | Adina Nadeem

This blog compares trimming in next-generation sequencing (NGS) to a sculptor's meticulous work. It stresses the need for precision to ensure data accuracy and quality, highlighting the risks of over-trimming and under-trimming. Overall, it underscores the importance of careful trimming for optimal NGS data analysis.

Differential gene expression analysis using Limma-step by step | Data Goat

This article guides readers through differential gene expression analysis using the limma package. It takes a step-by-step approach to analysing gene expression data, with detailed instructions and explanations that offer valuable insights into the analytical process.

Online Course - A Practical Introduction to NGS Data Analysis | ECSEQ Bioinformatics

This course teaches essential NGS bioinformatics skills: quality control, read mapping, visualization, and DNA variant analysis. Participants will learn NGS technology, algorithms, and data formats, and how to use bioinformatics tools for sequencing data. Ideal for beginners in NGS bioinformatics.

🔗 Connect with Us

Stay connected and engage with us on social media for daily updates, discussions, and more!

📬 Subscribe

Don't miss an issue! Subscribe to the Bioinformer Weekly Roundup and receive the latest insights directly in your inbox.

Subscribe Now

We hope you enjoyed this week's edition of the Bioinformer Weekly Roundup. Feel free to share it with your colleagues and friends who share your passion for bioinformatics!

---------------------------------------------------------------------------------------------

Disclaimer: The information provided in this newsletter is for educational and informational purposes only and does not constitute professional advice.

Contact: bioinformatics@zifornd.com
