PMI Results

Our PMI results are informative. Very frequent words show weak association with any particular partner: “should” occurs so many times that it is only weakly correlated with any specific joining term. Conversely, high PMI values are associated with terms that rarely occur on their own, such as “infantile spasms,” at least in our specific corpus of guideline recommendations. The statistical significance of these results needs to be investigated further.
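To make the effect concrete, here is a toy calculation with hypothetical counts (not taken from our corpus): in a 1,000-token corpus where “should” occurs 100 times, “be” occurs 50 times, and the bigram “should be” occurs 10 times, the PMI is low, whereas a pair like “infantile spasms” whose words only ever occur together scores much higher.

    # Toy PMI calculation in Julia; all counts are hypothetical, for illustration only
    pmi(c_xy, c_x, c_y, n) = log2((c_xy / n) / ((c_x / n) * (c_y / n)))

    pmi(10, 100, 50, 1000)   # "should be": ≈ 1.0  (frequent word, weak association)
    pmi(2, 2, 2, 1000)       # "infantile spasms": ≈ 8.97  (rare pair, strong association)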

Pointwise Mutual Information

Pointwise Mutual Information (log base 2) was calculated from our unigram and bigram tables: PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), where P(x, y) is the relative frequency of the bigram and P(x) and P(y) are relative frequencies taken from the unigram table. Our corpus is the set of conditional recommendation terms (1682 tokens). The results indicate that “should” is too ubiquitous to be meaningfully correlated with any single term. A more extensive set of PMIs could be generated by parsing the raw text, since our n-gram table is highly filtered.
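The computation itself is straightforward. Below is a minimal Julia sketch, assuming the unigram and bigram tables are dictionaries mapping tokens (and token pairs) to raw counts; the names and table format here are illustrative, not our actual schema.

    # Compute PMI (log base 2) for every bigram, given count tables.
    function pmi_table(unigrams::Dict{String,Int}, bigrams::Dict{Tuple{String,String},Int})
        n = sum(values(unigrams))      # total tokens (1682 in our corpus)
        scores = Dict{Tuple{String,String},Float64}()
        for ((x, y), c_xy) in bigrams
            p_xy = c_xy / n            # P(x, y): bigram relative frequency
            p_x  = unigrams[x] / n     # P(x) and P(y): unigram relative frequencies
            p_y  = unigrams[y] / n
            scores[(x, y)] = log2(p_xy / (p_x * p_y))
        end
        return scores
    end

Sorting the resulting dictionary by value then surfaces the strongly associated pairs first.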

Probability Theory

Probability theory is ubiquitous in machine learning. As we develop more complex programs, it seems natural to turn to a programming language built for probabilistic computation. We are investigating Gen, a probabilistic programming system written in the Julia programming language. Julia is a powerful yet user-friendly language that builds on the past successes of other languages, which makes it well suited to our purposes. Although Gen is pre-Alpha and may not prove to be the future we seek, Julia certainly has the potential to be an important part of it, and efforts to adopt it will surely be worth pursuing.
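As a first taste of what Gen offers, here is a minimal generative model written against Gen's documented @gen syntax; since Gen is pre-Alpha, the exact syntax may well differ by the time we adopt it, so treat this as a sketch rather than working project code.

    using Gen

    # A trivially small generative model: a coin with an unknown bias.
    @gen function coin_model()
        p ~ beta(1, 1)           # uniform prior over the coin's bias
        heads ~ bernoulli(p)     # one flip conditioned on that bias
        return heads
    end

    # Draw a trace, which records both the random choices and the return value.
    trace = Gen.simulate(coin_model, ())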