I have made as much progress on Rappel as a jam-packed two-month period allowed. Ultimately, I was unable to produce a prototype capable of running on the Sunway TaihuLight supercomputer. This is unfortunate, but developing a full read mapping program in less than two months was an extremely ambitious goal, the best kind.

A quick rundown of what was accomplished. I first wrote a single-core k-mer hashing and matching program in Python. It outputs exact k-mer matches and then performs inexact read mapping on the segments of each read that do not map exactly. During this process I learned many new algorithmic concepts, such as the Burrows-Wheeler transform, and the situations in which they are applicable.

I then began programming in C and learning the concepts required to use this new tool. The amount of programming experience I gained in such a short time over the summer is difficult to describe. The result is that I now feel comfortable in a language I had never used before this program, and I expect to use C (or maybe C++) almost exclusively for my future program development. Following this, I implemented the same k-mer matching algorithm in C, complete with the k-mer hashing function. The program works as intended and produces read match locations.

From there, I worked on a parallel version of Rappel and learned the importance of accurately passing data between processing cores using MPI. This has been an incredible learning experience, and I can't see myself developing programs in the future without some form of parallelism. It is simply too powerful to ignore in this age of readily available RAM and processing cores. While I wasn't able to complete a parallel version of Rappel that could run on the Sunway TaihuLight supercomputer, I learned enough to be confident that I will be able to implement one easily in the future.
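To make the exact k-mer matching step described above more concrete, here is a minimal Python sketch of the general idea. It is not Rappel's actual code: the 2-bit base encoding and the names `hash_kmer`, `build_index` and `exact_kmer_matches` are assumptions chosen for illustration, and the inexact mapping of unmatched read segments is left out.

```python
from collections import defaultdict

# 2-bit encoding per base. This is an assumed scheme for illustration,
# not necessarily the hashing function Rappel uses.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def hash_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base.

    Returns None if the k-mer contains a character outside A/C/G/T
    (for example an ambiguous base such as N).
    """
    value = 0
    for base in kmer:
        code = BASE_CODE.get(base)
        if code is None:
            return None
        value = (value << 2) | code
    return value


def build_index(reference, k):
    """Map each hashed k-mer to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        h = hash_kmer(reference[pos:pos + k])
        if h is not None:
            index[h].append(pos)
    return index


def exact_kmer_matches(read, index, k):
    """Return (read_offset, reference_position) pairs for every k-mer
    of the read that occurs exactly in the indexed reference."""
    matches = []
    for offset in range(len(read) - k + 1):
        h = hash_kmer(read[offset:offset + k])
        if h is None:
            continue
        # .get avoids inserting empty lists into the defaultdict.
        for ref_pos in index.get(h, []):
            matches.append((offset, ref_pos))
    return matches
```

For example, indexing the toy reference `"ACGTACGTGA"` with `k = 4` and querying the read `"ACGTG"` yields `[(0, 0), (0, 4), (1, 5)]`: the first k-mer of the read occurs at reference positions 0 and 4, and the second at position 5. Read segments with no entry in the index would be the ones handed to the inexact mapping stage.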
I began this program with only a rudimentary understanding of suffix arrays and read mapping algorithms. I am finishing it able to write in a new programming language, with an understanding of many algorithms commonly used in read mapping software and a beginner-to-intermediate understanding of passing information between processing cores. That is a huge step in my progress as a bioinformatician. I learned more in the last two months of concentrated study, research and implementation than I have in an entire semester of program development. Granted, the previous semester's development work was aimed at different goals. Still, this has been an exceptional experience.

The core functionality of Rappel is demonstrably working at this point. With another month or two of development, I could produce a program that utilizes any number of cores and allocates the appropriate amount of data to each core before performing a very efficient read mapping algorithm. This is incredibly exciting, and I look forward to using this experience to produce chapter two of my dissertation. Additionally, this experience has reinforced my confidence that I can learn any computer science concept I need for my research by simply putting in the requisite time. This is an exceptionally invigorating feeling at this point in my PhD program.