Saturday, April 29, 2017

Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005428


So, this caught my eye. Another neat use of the awesome TCGA cancer database...

oncodomains - families of protein domains in which somatic variants from one or more genes containing the same domain form a hotspot.

Oncodomain hotspots are defined as protein domain positions where somatic variants for a specific cancer type occur more frequently than expected by chance
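To make "more frequently than expected by chance" concrete for myself, here is a minimal sketch of the general idea. To be clear, this is not the paper's actual statistical method. It assumes a deliberately naive null model in which variants fall uniformly across the positions of a domain, uses a one-sided binomial tail test per position, and applies a Bonferroni correction across positions. The function names and the toy counts are hypothetical, made up purely for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_hotspots(counts, alpha=0.05):
    """Return indices of domain positions with more variants than expected.

    counts[i] is the number of somatic variants mapped to domain position i,
    pooled over every gene that contains the domain (hypothetical input).
    """
    n = sum(counts)
    length = len(counts)
    if n == 0:
        return []
    p = 1.0 / length            # assumed null: variants land uniformly
    threshold = alpha / length  # Bonferroni correction over positions
    return [
        i
        for i, k in enumerate(counts)
        if k > 0 and binomial_tail(k, n, p) < threshold
    ]


# Toy data: 10 domain positions, 20 variants, 12 of them at position 4.
toy_counts = [1, 0, 2, 1, 12, 0, 1, 1, 0, 2]
print(find_hotspots(toy_counts))
```

On the toy counts, only position 4 stands out. A realistic analysis would need a much better background model (mutation rates vary by gene, sequence context, and cancer type), which is exactly where the statistical power concerns come in.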


From their methods:

Their results are interesting. I struggle to believe in the validity of much of the work that looks for associations between specific variants and cancer. We have a long, long way to go to get the statistical power we need to really draw substantial conclusions. Moreover, as this paper discusses, many mutations are rare somatic variants. If there is anything that modern cancer genomics has shown us, it’s that the mutations are highly heterogeneous. So, this is a cool approach: let’s broaden our specificity a bit, and identify mutations that are significant at the protein domain level. Nice work.

Thursday, April 27, 2017

Research about to resume…

As I mentioned in my other blog, I am about to take my first full-year sabbatical. I’m very excited about this. I thoroughly enjoy teaching, but damn, it (along with excessive service contributions) has wiped me out. I’ve lost the time to maintain a research focus. So, I’m very much looking forward to being able to just focus on research...


I still believe my general area of expertise is in sequential pattern mining, particularly with large-scale data (large in either numerosity or dimensionality). Though I originally spent a large amount of time in biological sequence analysis, I have since branched off to word prediction modeling, and more recently, eye tracking data, thanks to some great projects with my students.

More recently, I’ve started embracing deep learning models (yes, I know, I know… who hasn’t?!?!). Whenever I want to learn something, I usually team up with a bright undergraduate and we work on a related, motivating project together. We all know that deep learning has made some amazing strides with respect to object identification and recognition in large sets of images. However, I have not seen quite as much use of deep learning with genomic data, so I’m hopeful there may be some opportunities to explore some new approaches there. Don’t get me wrong, it has indeed been done! After all, let’s not forget that neural nets in general made some huge strides decades ago with biological sequence processing, particularly with secondary structure prediction models (thanks in large part to the early work of Burkhard Rost and Chris Sander in the late 80s and 90s). So, it’s not new, per se. The part of deep learning that has me most intrigued is the visualization of deep learning models. Most of us who are investigating deep learning have seen dozens of examples of very cool visualizations, mostly showing how the different layers learn increasingly complex discriminatory features in the images the deeper into the model you go. For example:


I want to know what has been done recently with deep learning to help those who are investigating the extraction of interesting patterns from biological sequence data. In particular, how can we visualize the intermediate layers of a deep learning model in a way that is meaningful for sequential data?

More another time...