Using grammars to analyze the relation between protein folding and sequence in low complexity regions
PhD Candidate: Mariane Gonçalves
Supervisors: M. Andrade (Biology) & F. Schmid (Physics)
Regions with low amino acid diversity in proteins, known as low complexity regions (LCRs), do not follow the rules that define the secondary structure of globular, unbiased regions. For this reason, studying their structure and biological functions is difficult. Increasing evidence suggests that some LCRs may facilitate the adoption of structure in flexible regions of the protein, called intrinsically disordered regions (IDRs). This project targets these flexible regions with LCRs in the human proteome and seeks the structural properties and function of similar proteins that have already been studied and deposited in the Protein Data Bank (PDB). Mariane was able to extract the best candidates and is now preparing a manuscript with her recent findings on the structural characteristics of these similar sequences, with basic rules and patterns to support structure in IDRs.
The next step will be to apply these rules as constraints for a toy physical model to analyze the statistical properties of the resulting heteropolymers. As the targets are much shorter than complete protein sequences and are directed at specific, pre-explored patterns, she expects to optimize processing times and to direct efforts towards effective and biologically relevant targets.
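The notion of "low amino acid diversity" used above can be made concrete with a simple composition measure. The following is a minimal sketch, not the method used in the project: it assumes Shannon entropy over a sliding window as the diversity measure, and the function names, window length and threshold are illustrative choices only.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence: str, window: int = 12, threshold: float = 2.2):
    """Return merged (start, end) spans whose windows fall below the entropy threshold.

    The window length and threshold are hypothetical defaults chosen for
    illustration; they are not taken from the project.
    """
    spans = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent low-entropy windows are merged into one span.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

A homopolymeric stretch such as twelve consecutive glutamines has an entropy of zero and is reported as a single span, whereas a window containing twelve different residues has an entropy of log2(12) bits and is not reported.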
You can download the original proposal here.
Understanding specific regulation by disordered biomolecules and liquid-liquid phase separation by multi-scale simulations
PhD Candidate: Arya Changiarath
Supervisors: L. Stelzl (Biophysics), F. Schmid (Physics)
This project aims to understand specific regulation by disordered biomolecules and liquid-liquid phase separation using multi-scale simulations, combining biology and physics. Liquid-liquid phase separation plays an important role in the formation of localized nuclear hubs of RNA polymerase II (RNAP II) during transcription. Recent experimental studies revealed that the carboxy-terminal domain (CTD) of the largest subunit of RNAP II is a low complexity domain with a very strong tendency to phase separate. This project mainly focuses on understanding the molecular basis of CTD phase separation using multi-scale molecular dynamics simulation methods. The CTD is conserved in eukaryotes and consists of repeats of a heptapeptide sequence. However, there are small differences between the CTD sequences of different species. Arya investigated how such differences affect CTD phase separation using coarse-grained molecular dynamics simulations based on the hydropathy scale (HPS) model. Initial results indicate that sequences deviating from the ideal heptapeptide repeat have a weaker tendency to phase separate. The effect of temperature on CTD phase behavior and the influence of polymer length on the critical temperature are as expected.
In addition, Arya is looking at other factors, such as phosphorylation of the CTD and the presence of other biomolecules, that can influence CTD phase behavior and regulate gene transcription. Simulations of phosphorylated CTD with coarse-grained models indicate that phosphorylation prevents phase separation, as the negatively charged phosphate groups repel each other. She is also exploring how the presence of other biomolecules, such as transcription factors, affects the phase behavior, and what roles they play in different stages of the transcription process. To explore this further, she studied the phase behavior of CTD and phosphorylated CTD in the presence of HRD. The results show that they co-phase separate into a large cluster but do not mix, which may help to physically distinguish between the initiation and elongation stages of transcription. Atomistic simulations could provide a precise understanding of the molecular basis of the interactions that lead to phase separation and will, in turn, lead to improved coarse-grained simulation models. The current task is to simulate two CTD chains using GROMACS and to study the inter- and intra-chain interactions between residues. In collaboration with Prof. Markus Zweckstetter (Göttingen), Arya is studying the possible structure of human and yeast CTD by performing atomistic simulations that integrate experimental information where possible. The atomistic simulations have highlighted interesting features to study in further coarse-grained simulations.
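To illustrate how a hydropathy scale enters a coarse-grained model such as the HPS model mentioned above, the sketch below implements the Ashbaugh-Hatch pair potential, the functional form commonly used for short-range interactions in HPS-type models. It is an illustration under stated assumptions, not the project's simulation code: the function names are hypothetical, the default interaction strength of 0.2 kcal/mol is a commonly quoted value assumed here, and `sigma` and `lam` stand for the averaged diameter and averaged hydropathy of a residue pair.

```python
import math


def lennard_jones(r: float, sigma: float, epsilon: float) -> float:
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def ashbaugh_hatch(r: float, sigma: float, lam: float, epsilon: float = 0.2) -> float:
    """Ashbaugh-Hatch potential between two residues at distance r.

    lam is the pair hydropathy in [0, 1]: lam = 1 recovers the full
    Lennard-Jones attraction, lam = 0 leaves a purely repulsive interaction.
    epsilon = 0.2 (kcal/mol) is an assumed default for illustration.
    """
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # position of the Lennard-Jones minimum
    if r <= r_min:
        return lennard_jones(r, sigma, epsilon) + (1.0 - lam) * epsilon
    return lam * lennard_jones(r, sigma, epsilon)
```

Both branches give a value of -lam * epsilon at the Lennard-Jones minimum, so the potential is continuous, and the depth of the attractive well scales directly with the hydropathy of the residue pair.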
You can download the original proposal here.
Chiral structures across scales
PhD Candidate: Stanislav Sys
Supervisors: S. Berger (Biology), K. Everschor-Sitte (Physics)
DNA is the main carrier of heritable information in living organisms. As such, it plays a crucial role in human development, disease pathogenesis and many other vital processes. While we can sequence DNA at the single-nucleotide level, analyzing the three-dimensional structure and folding mechanisms of the chromatin fiber remains a challenging task. Current experimental methods for uncovering the folding state of chromatin, such as Hi-C or Pore-C, involve complicated and expensive procedures. The main goal of this project is to provide an alternative computational approach that predicts chromatin folding mechanisms without the need for such experiments. Reaching this goal involves several milestones. On the one hand, it must be determined which parameters need to be included and how to construct a biologically meaningful dataset that allows the folding mechanisms to be predicted from the sequence combined with other publicly available meta-information, such as genetic linkage between loci and methylation states. On the other hand, a sufficiently accurate model has to be developed that integrates all these multi-scale information layers. One big hurdle was to understand and model the uncertainty of high-throughput sequencing technologies, a task addressed in the early stages of the project and published in 2021 (doi: 10.1186/s12864-020-07362-8). The next step was to tackle uncertainty and biases in experimental procedures by developing and benchmarking different machine learning methods for experimental data containing a certain degree of bias and mislabeling (doi: 10.3389/frai.2021.739432; and a draft in revision in Methods in Ecology and Evolution).
Currently, Stanislav focuses on data mining. He is developing and implementing a type of information-maximizing generative adversarial network (InfoGAN) that integrates different information layers as control variables to directly forecast chromatin folding in specific cell lines. This approach aims to ensure the extensibility of the model, so that it can work with different data types on multiple scales.
You can download the original proposal here.