Saturday, April 29, 2017

Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005428


So, this caught my eye. Another neat use of the awesome TCGA cancer database...

oncodomains - families of protein domains in which somatic variants from one or more genes containing the same domain form a hotspot.

Oncodomain hotspots are defined as protein domain positions where somatic variants for a specific cancer type occur more frequently than expected by chance
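
To make the "more frequently than expected by chance" idea concrete, here is a rough sketch of how one might flag hotspot positions within a domain. To be clear, this is not the paper's method: it assumes a naive null model in which variants fall uniformly across the positions of a domain, uses a one-sided binomial test, and applies a Bonferroni correction. The function names, the toy data, and the 0.05 threshold are all made up for illustration.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_hotspots(variant_positions, domain_length, alpha=0.05):
    """Flag domain positions with more variants than a uniform null predicts.

    variant_positions: domain-relative positions of somatic variants,
                       pooled across all genes that contain the domain.
    domain_length:     number of positions in the domain (assumed known).
    Returns {position: Bonferroni-adjusted p-value} for significant positions.
    """
    counts = Counter(variant_positions)
    n = len(variant_positions)
    p_uniform = 1.0 / domain_length  # assumed null: every position equally likely
    hotspots = {}
    for pos, k in counts.items():
        # Correct for testing every position in the domain.
        p_adj = min(1.0, binomial_tail(k, n, p_uniform) * domain_length)
        if p_adj < alpha:
            hotspots[pos] = p_adj
    return hotspots


# Toy data (hypothetical): 20 variants in a 50-position domain, 8 at position 12.
positions = [12] * 8 + [3, 7, 20, 25, 31, 40, 44, 48, 5, 18, 29, 36]
print(find_hotspots(positions, domain_length=50))
```

On this toy input only position 12 should be reported. A real analysis would need a better background model than a uniform one (mutation rates vary by sequence context and cancer type), which is presumably where most of the hard work lies.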


From their methods:

Their results are interesting. I struggle to believe much of the work that looks for specific variants associated with cancer. We have a long, long way to go to get the statistical power we need to really draw some substantial conclusions. Moreover, as this paper discusses, many mutations are rare somatic variants. If there is anything that modern cancer genomics has shown us, it’s that the mutations are highly heterogeneous. So, this is a cool approach: let’s broaden our focus a bit, and identify mutations that are significant at the protein domain level. Nice work.

Thursday, April 27, 2017

Research about to resume…

As I mentioned in my other blog, I am about to take my first full year sabbatical. I’m very excited about this. I thoroughly enjoy teaching, but damn, it (along with excessive service contributions) has wiped me out. I’ve lost the time to continue a research focus. So, I’m very much looking forward to being able to just focus on research...


I still believe my general area of expertise is in sequential pattern mining, particularly with large-scale data (large in either numerosity or dimensionality). Though I originally spent a large amount of time in biological sequence analysis, I have since branched off to word prediction modeling, and more recently, eye tracking data, thanks to some great projects with my students.

More recently, I’ve started embracing deep learning models (yes, I know, I know… who hasn’t?!?!). Whenever I want to learn something, I usually work with a bright undergraduate and we work on a related, motivating project together. We all know that deep learning has made some amazing strides with respect to object identification and recognition in large sets of images. However, I have not seen quite as much use of deep learning in genomic data, so I’m hopeful there may be some opportunities to explore some new approaches there. Don’t get me wrong, it has indeed been done! After all, let’s not forget that neural nets in general made some huge strides decades ago with biological sequence processing, particularly with secondary structure prediction models (thanks in large part to the early work of Burkhard Rost and Chris Sander in the late 80s and 90s). So, it’s not new, per se. The part of deep learning that has me most intrigued is the visualization of deep learning models. Most of us who are investigating deep learning have seen dozens of examples of very cool visualizations, mostly showing how the different layers learn increasingly complex discriminatory features in the images the deeper into the model you go. For example:


I want to know what has been done recently with deep learning to help those who are investigating the extraction of interesting patterns from biological sequence data. In particular, in what ways can we visualize the intermediate layers of a deep learning model that are meaningful for sequential data?

More another time...

Monday, February 01, 2016

Github: compomics-utilities: an open-source Java library for computational proteomics

https://github.com/compomics/compomics-utilities

This is a pretty cool project that provides tools for visualizing spectra and chromatograms, along with objects for representing peptides, proteins, etc.

If I ever get back into proteomics, this will be a must...

Wednesday, August 04, 2010

Monday, June 14, 2010

A Decade Later, Human Genome Project Yields Few New Cures - NYTimes.com

Who would have thought that sifting through thousands of 3-billion character strings, one from each human, for statistically significant variations related to a wide range of diseases would have been difficult! We've come a long way in suggesting some mutations that are related to disease. I think that it is wonderful to be able to inform people that they may have certain genetic propensities to various ailments in their future. I think it's good to have your head up and be aware of these possibilities down the road, particularly for some cancers that are so difficult to catch early in their onset.

It's one thing to be able to detect known mutations. That's simple. It's substantially more difficult to be able to detect new mutations not yet discovered. And, it is infinitely more difficult to figure out how to use the genome to solve these ailments! This is a great field to be in. There are so many opportunities ahead to do some great things.

Finally, on a related note, I want to see the data inflow STOP, and give us data miners a chance to try and understand the enormous amount of data we already have. Just my opinion.


Sunday, May 09, 2010

Going from where to why--interpretable prediction of protein subcellular localization -- Briesemeister et al. 26 (9): 1232 -- Bioinformatics

Abstract | Predicting conserved protein motifs with Sub-HMMs

At NIH? You’ve got tutorials! | The OpenHelix Blog

Genome Biology | Full text | Bioconductor: open software development for computational biology and bioinformatics

Genome Biology | Full text | Searching for SNPs with cloud computing

Genome Biology | Full text | The case for cloud computing in genome informatics

Wednesday, April 07, 2010