Public Health Research into Action: Machine Learning: Train to Win

Machine learning is a discipline that combines modern statistics and computer science to achieve artificial intelligence, with an emphasis on extracting knowledge from large amounts of data. As the term itself suggests, machine learning aims to train “machines” to “learn”. Here, “machines” means computing tools, ranging from a basic Mac or PC to network servers and supercomputers. “Learning” is a knowledge discovery endeavor. Conventionally, to learn is to get better, after substantial training, at recognizing patterns, summarizing rules and making predictions. Translated into the area of computer intelligence, learning can be defined strictly: a computer program learns if its performance on certain tasks (T), as judged by corresponding measurements (M), improves after it is trained on experiences (E) (Tom Mitchell, 1997).
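To make the definition concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the task (T) is deciding whether a number is greater than 5, the measurement (M) is accuracy on a fixed test set, and the experience (E) is a short list of labeled examples. The function names `train` and `accuracy` are hypothetical, and the "learner" is just a cutoff placed halfway between the two class averages.

```python
def train(experience):
    """Learn a cutoff halfway between the mean of each class.

    Assumes the experience contains at least one example of each class.
    """
    lows = [x for x, label in experience if label == 0]
    highs = [x for x, label in experience if label == 1]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


def accuracy(cutoff, test_set):
    """Measurement M: fraction of test items classified correctly."""
    correct = sum(1 for x, label in test_set if (x > cutoff) == (label == 1))
    return correct / len(test_set)


# Task T: decide whether a number belongs to class 1 (here: greater than 5).
test_set = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# Experience E: labeled examples, seen in this order.
experience = [(0, 0), (1, 0), (6, 1), (4, 0), (5, 0), (9, 1), (10, 1)]

early = accuracy(train(experience[:3]), test_set)  # trained on 3 examples
later = accuracy(train(experience), test_set)      # trained on all 7
print(early, later)  # prints: 0.875 1.0
```

With only three examples the cutoff lands too low and one test item is misclassified; with all seven it lands between 5 and 6 and the score rises. That rise in M on T after more E is what the definition calls learning.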

The abstract notions of measurements (M), tasks (T) and experiences (E) are better illustrated with real-world cases. One of the best examples is search engines such as Google, Bing and Yahoo. For these web-based search providers, the first goal is to return accurate results. That is, the items users receive after typing in keywords should be relevant, informative and inspiring. At the same time, junk such as floods of commercial advertisements should be filtered out. These basic requirements are the measurements (M) for the task (T) of discovering useful information. Meeting them alone, however, does not make a search engine an example of machine learning. With the development of machine learning algorithms, search providers can now customize results based on an individual’s search history. That history is the experience (E) used to train the search engine. After training, the engine can offer personalized search, a more advanced task that is judged by much more complicated measurements, whose standards differ dramatically from person to person.

Apart from personalized search, the rapid development of machine learning has benefited various fields in both academia and industry. One notable commercial success is Netflix, a website that combines online video streaming with DVD rental. By employing efficient machine learning algorithms, Netflix can recommend new movies or TV series to its users based on their browsing and rental history (i.e., the training experience for the algorithm). Most importantly, recommendations that match users’ tastes help increase streaming and rental sales. At the same time, customers come to rely on Netflix to find movies in their favorite genres, so they are more willing to pay the monthly membership fee. In 2006, Netflix even launched the "Netflix Prize" competition, offering an award for a machine learning program that could predict user preferences better than the existing recommendation algorithm by at least 10%. A team named "BellKor's Pragmatic Chaos", which included researchers from AT&T Labs, won the prize in 2009 and was awarded one million dollars for their remarkable achievement.
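The 10% threshold can be made concrete with a little arithmetic. The Netflix Prize scored entries by root-mean-square error (RMSE) on held-out ratings. The sketch below shows how a relative improvement of that kind could be computed. The ratings, predictions and variable names are all invented for illustration; this is not Netflix's code or data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


actual_ratings = [4, 3, 5, 2]      # invented star ratings from users
baseline_preds = [3, 3, 4, 3]      # hypothetical existing algorithm
challenger_preds = [4, 3, 4, 2]    # hypothetical competing algorithm

baseline_error = rmse(baseline_preds, actual_ratings)
challenger_error = rmse(challenger_preds, actual_ratings)

# Relative improvement: how much smaller the challenger's error is.
improvement = (baseline_error - challenger_error) / baseline_error
qualifies = improvement >= 0.10
print(round(improvement, 3), qualifies)
```

A lower RMSE means predictions sit closer to what users actually rated, so the improvement is measured as the fraction by which the error shrinks.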

In academic research, machine learning is also driving progress in different areas. A machine learning tool called the “support vector machine” (SVM) has been applied to diagnosis based on medical imaging. Given imaging data collected from patients with neurodegenerative diseases such as Alzheimer’s and from normal controls, an SVM fits the parameters of a statistical model that separates the two groups. After this training procedure, the model can be used to diagnose new individuals automatically from their brain scans. The performance of such diagnostic tools is impressive: one study that used an SVM to identify autism spectrum disorder reported a sensitivity of up to 90% (Ecker C, Marquand A, et al., 2010).
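The train-then-diagnose idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not the method of the cited study: the "scans" are invented two-number feature vectors, the model is a plain linear SVM fitted by subgradient descent on the hinge loss (real imaging studies use far more features and often other kernels), and the names `train_linear_svm` and `predict` are hypothetical.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss (a simple linear SVM)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: step toward the point.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (patient) or -1 (control) for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented training data: +1 = patient, -1 = control.
scans = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
diagnoses = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(scans, diagnoses)
new_scan = (2.5, 2.5)  # a hypothetical individual not seen in training
print(predict(w, b, new_scan))
```

Training adjusts `w` and `b` until the two groups fall on opposite sides of a boundary with some margin to spare; diagnosing a new individual is then a single weighted sum.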

Another application of machine learning in academia is the study of the human genome. The complexity of the human genome makes it challenging for conventional computing tools to make sense of the data. Machine learning algorithms have changed genomics by helping to identify functional fragments of the genome related to certain traits, especially inherited diseases, cancers and aging. By learning from gene regions with known functions and comparing them with junk regions that have no clear function, modern machine learning algorithms can recognize the potential roles of newly sequenced regions. As a result, researchers in biomedical science can use the gene functions predicted by these computational tools to guide individualized therapy and drug delivery.

Although machine learning is bringing about changes in different areas, offering people a variety of conveniences and generating wealth in diverse ways, its need for large data sets to serve as the training experience (E) limits it in many situations. The most significant limitation concerns information security and privacy. For example, by training machine learning methods on personal e-mail contact records or even e-mail content, internet companies like Google could serve targeted advertisements, which, of course, would be profitable. Personal information could also be retrieved illegally, by crawling websites such as Facebook, Blogger and Twitter, to train machine learning models for commercial use. What’s more, some medical institutions might even sell patients’ clinical data to information technology companies. The enormous potential profits from applying machine learning might thus lead to more serious leaks of personal information and illegal trade in personal data. This is something people should become aware of in the days to come.

References:

Mitchell, T. (1997). Machine Learning. McGraw Hill. ISBN 0-07-042807-7, p. 2.

Ecker, C., Marquand, A., et al. (2010). Describing the Brain in Autism in Five Dimensions: Magnetic Resonance Imaging-Assisted Diagnosis of Autism Spectrum Disorder Using a Multiparameter Classification Approach. The Journal of Neuroscience, 11 August 2010, 30(32):10612-10623; doi:10.1523

Ran Shi is a first-year PhD student in the Department of Biostatistics and Bioinformatics at Rollins School of Public Health, Emory University. He loves mathematics. He is a Juventino, a Beatlemaniac and a big fan of Stanley Kubrick. Btw, happy birthday, Woody Allen!

4 comments:

Emory Translational Public Health Research, December 2, 2011 at 6:18 PM
Your blog is very informative and I shudder to think how much of my personal information is widely available on the internet now (funny how your personal information is just below your concern about information security, but I'm in no position to laugh at you, either).

- Will Zhu

Anonymous, December 2, 2011 at 6:28 PM
Your topic of Machine Learning is quite interesting and relevant to modern science, but I fear that your communication on the topic is too advanced and abstract for most people to understand. A simple on-line reading calculator such as that found here: http://www.online-utility.org/english/readability_test_and_improve.jsp estimates that the reader of your blog would have to have more than 16 years of education to understand the writing. The long, complex sentences (on average 21 words) make my mind spin--and not in a good way. It is important to remember that even the most brilliant scientists would have a hard time convincing granting agencies (like the NIH) of the importance of their work if they can't explain their thoughts clearly and simply. Remember--the most powerful mathematical equations are often the most beautiful and simple (E=mc^2). Keep writing!!

Thanks,
Casey Hall, MD
Emory University Hospital

Ariela M. Freedman, December 4, 2011 at 4:31 PM
I agree with Casey's comments above -- very fascinating topic but a little hard to understand for those of us without this type of background. : )

I like how you discuss the medical connections here. Perhaps a better way to frame this post would be to go from here: How does machine learning affect your health?

Great topic to keep in the discussion of public health!

kminer, December 5, 2011 at 8:20 AM
I find this discussion fascinating because it brought me back to my earlier days as a secondary educator when I was challenged to make sure my students passed standardized tests. At that time (as is true with No Child Left Behind), the measurement/success of learning was determined by the outcomes of test scores. Even with complex multilayered questions included in the tests, the final question choices were linear ones (A, B, C, D, E). Most educational theorists would say that functional learning at every level is not linear. It uses more than one part of the brain, more than one sense, more than one time period, and more than one emotion, etc. Thus, an effective learning environment requires more than a linear approach/machine orientation to instruction, and the same is true of the measurement of its attainment. This is why storytelling, case studies and interactive classroom environments are preferred strategies by learners. However, there will always be those who want the standardized test to segment and/or assess the quality of the education delivered. They will also use these same tests to "seemingly" separate the gifted teachers or learners from the less so. In the final analysis, the processes of learning are complex. It is true neurons are the biologic "pods" of information and science can mechanize some of this. Yet, learning goes beyond the biologic. It is the love of learning, the passion for reading, and commitment to one's profession that comes from the rest of the story. This is why teachers need to be storytellers.

Thursday, December 1, 2011
