Thursday, April 27, 2017

Research about to resume…

As I mentioned in my other blog, I am about to take my first full-year sabbatical. I’m very excited about this. I thoroughly enjoy teaching, but damn, it (along with excessive service contributions) has wiped me out. I’ve lost the time to maintain a research focus. So, I’m very much looking forward to being able to just focus on research...


I still believe my general area of expertise is in sequential pattern mining, particularly with large-scale data (large in either numerosity or dimensionality). Though I originally spent a large amount of time in biological sequence analysis, I have since branched off to word prediction modeling and, more recently, eye tracking data, thanks to some great projects with my students.

More recently, I’ve started embracing deep learning models (yes, I know, I know… who hasn’t?!?!). Whenever I want to learn something, I usually team up with a bright undergraduate and we work on a related, motivating project together. We all know that deep learning has made some amazing strides with respect to object identification and recognition in large sets of images. However, I have not seen quite as much use of deep learning with genomic data, so I’m hopeful there may be some opportunities to explore some new approaches there. Don’t get me wrong, it has indeed been done! After all, let’s not forget that neural nets in general made some huge strides decades ago with biological sequence processing, particularly with secondary structure prediction models (thanks in large part to the early work of Burkhard Rost and Chris Sander in the late 80s and 90s). So, it’s not new, per se.

The part of deep learning that has me most intrigued is the visualization of deep learning models. Most of us who are investigating deep learning have seen dozens of examples of very cool visualizations, mostly showing how the different layers learn increasingly complex discriminatory features in the images the deeper into the model you go.


I want to know what has been done recently with deep learning to help those who are investigating the extraction of interesting patterns from biological sequence data. In particular, how can we visualize the intermediate layers of a deep learning model in ways that are meaningful for sequential data?

More another time...