Guideline documents are edited using GEM Cutter to form GEM documents. These are then uploaded to a repository using a program that extracts the essential elements. The next step is to run Apache cTAKES, a UIMA-based NLP processor for clinical documents, which creates annotations for the guideline text. We adopt the YTEX version so that the results are stored in a relational database. The final step is to create SVM classifiers based on training sets created by clinical experts. See Action Types below.
An important initiative that we are pursuing is the automated association of UMLS Concept Unique Identifier (CUI) and ICD-10 codes with recommendation text. The challenge is to optimize the set of codes that are related to the recommendation text. Using the RESTful interface from the UMLS, a single concept submitted to the search engine can return a wide range of CUIs, many of them irrelevant (false positives). If we submit bi-grams and tri-grams, we receive a different but related set of codes. We have used similarity measures to reduce the result set while limiting the loss of information (false negatives). Currently, we employ YTEX to produce a canonical set of concepts and UMLS codes, methods to generate n-grams for search terms, and similarity measures (cosine similarity and the Sørensen-Dice coefficient) to reduce the matches generated from search results.
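The n-gram generation and the two similarity measures named above can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names are hypothetical, cosine similarity is computed over word counts, and the Sørensen-Dice coefficient is computed over character bigrams, which is one common choice among several.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text, each joined by single spaces."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dice_coefficient(a, b):
    """Sørensen-Dice coefficient over the sets of character bigrams."""
    def char_bigrams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def rank_candidates(term, candidates, measure):
    """Score each candidate concept name against term, best match first."""
    scored = [(measure(term, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A returned concept name would then be kept only if its score clears a threshold chosen on held-out data. Note that a character-level measure scores "should" and "shoulder" as highly similar, so the choice of measure and tokenization matters for this kind of filtering.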
Another way to approach the problem is to keep the top returned result only and then determine its suitability in matching the code with the text. For example, if we send the common deontic term “should” to the UMLS, the first returned code is Concept: [C0882088] Multisection:Finding:Point in time:Shoulder:Narrative:MRI, which is not a code that we want to associate with the recommendation. So the challenge is to determine if this one result is sufficient to describe the submitted term. Limiting terms that are not significant in capturing the meaning of the content will also improve results.
N-grams are useful in that they allow for the calculation of probabilities, since we can convert frequency counts to probabilities after normalizing. If an expression that has a high probability of occurring in recommendations for a given clinical task is then found in some unidentified text, that is evidence that the text is related to the task. Combined with the use of other similarity measures, we hope to form an optimized system for identifying recommendations that are related to a clinical task or action.
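As a minimal sketch of the normalization step described above, the following counts word n-grams in a small corpus and divides each count by the total to obtain relative frequencies. The function names and the example sentences used to exercise them are illustrative assumptions, not part of our pipeline.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count word n-grams (as tuples of tokens) across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def to_probabilities(counts):
    """Normalize raw counts so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}
```

For instance, `to_probabilities(ngram_counts(corpus, 2))` yields a table of bi-gram probabilities that can be looked up when scoring unseen text.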
Action Types have been used to create a classifier. The goal is to identify recommendations using these categories, which should improve the identification of recommendations related to specific clinical decision support tasks. Action Types are based on work done by Essaihi et al. They are used explicitly in Bridge-Wiz to aid guideline developers in choosing actions from a controlled vocabulary for the development of guideline recommendations. You can find more detailed information on the implementation of Bridge-Wiz at Glides. Bridge-Wiz has also been adapted as a web application.
The classifier we tried first was a Naive Bayes classifier. (See below for a better approach, FastText.)
These classifiers will enable users to categorize recommendations based on Action Type.
The Action Types are:
- Gather Data
- Draw Conclusion
- Perform Therapeutic Procedure
The benefit of classifying with Action Types is that it gives a clear indication of where in the course of a clinical encounter the recommendation is relevant. From there, we may further classify recommendations based on a more clinically specific vector space model, such as organ or disease type. The approach we take is an iterative one, so that we build a clear picture of a recommendation without the need to rely on manual development.
Our work with Action Types demonstrated some interesting results. Common terms, such as “receive,” are used in different settings and can throw the model off, turning a Procedure into a Prescribe action type. Clearly, Naive Bayes will need more training examples to overcome this hurdle. Also, a phrase such as “A focused exam of the hair” is asserted to be a Monitor rather than a Test activity, which is somewhat understandable. Another example of interest is this recommendation: “The presence of maternal thyroid disease is important information for the pediatrician to have at the time of delivery.” The experts said it was a Test, but the model said it was Educate/Counsel at 72% confidence. There is an inherent inference that this model cannot make.
We have furthered this activity by implementing the FastText classifier. This is a C++ program that we compiled for our Linux platform and ran from a Java program using our experts’ Action Types data set. The results are displayed in the AI section for comparison with our Naive Bayes approach. We have not implemented multi-labels, although FastText has this feature; the lack of multi-label support was the primary source of error. We used word n-grams of size 2, a learning rate of 1, and 30 epochs. The results are very encouraging. If you wish to try it out you can use this data set, where we have added a “__label__” tag to work with FastText.
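To illustrate the data format and settings described above, here is a small sketch that formats labeled examples with the “__label__” prefix and builds the argument list for the fastText command-line tool. The helper names and the file names passed to them are hypothetical, and replacing spaces in a label with underscores is our assumption here (a label must be a single whitespace-free token); the `-wordNgrams`, `-lr`, and `-epoch` options correspond to the n-gram size, learning rate, and epoch count we used.

```python
def to_fasttext_line(action_type, recommendation):
    """Format one example as '__label__<ActionType> <recommendation text>'."""
    # Assumption: spaces inside a label become underscores so it stays one token.
    label = "__label__" + action_type.strip().replace(" ", "_")
    # Collapse newlines and repeated spaces; fastText reads one example per line.
    text = " ".join(recommendation.split())
    return f"{label} {text}"


def training_command(train_path, model_prefix):
    """Build the argument list for 'fasttext supervised' with our settings."""
    return [
        "fasttext", "supervised",
        "-input", train_path,
        "-output", model_prefix,
        "-wordNgrams", "2",
        "-lr", "1.0",
        "-epoch", "30",
    ]
```

The returned list can be passed to `subprocess.run` on a machine where the compiled `fasttext` binary is on the path.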
Acronyms are always a stumbling block for NLP; capturing them in a reference database table should help reduce these errors.
Our next activity will be to examine ways to use n-grams to identify recommendations that are applicable in a clinical setting. We will convert the n-gram frequency results to probabilities and conditional probabilities. We can then assign a likelihood that a given recommendation will be relevant to a clinical context source, such as an order set or EHR clinical note. We are also going to see how to build on our FastText results.
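One way this could work, offered only as a sketch of planned work and not as an implemented method, is a bi-gram model trained on text from a clinical context source. The conditional probability of a word given its predecessor is estimated from counts, and a recommendation is scored by summing the log probabilities of its bi-grams. The class name, the add-one smoothing, and the whitespace tokenization are all assumptions made for illustration.

```python
import math
from collections import Counter


class BigramModel:
    """Bi-gram model estimated from a list of context sentences."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Guard against an empty corpus so the denominator is never zero.
        self.vocab_size = max(len(self.unigrams), 1)

    def conditional(self, prev, word):
        """Estimate P(word | prev) with add-one smoothing."""
        numerator = self.bigrams[(prev, word)] + 1
        denominator = self.unigrams[prev] + self.vocab_size
        return numerator / denominator

    def log_likelihood(self, text):
        """Sum of log P(word | prev) over the consecutive word pairs in text."""
        tokens = text.lower().split()
        return sum(math.log(self.conditional(p, w))
                   for p, w in zip(tokens, tokens[1:]))
```

A recommendation whose score under the model for a given order set is higher than its score under models for other contexts would be a candidate match for that order set. Scores for texts of different lengths are not directly comparable without normalizing by the number of bi-grams.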
We have been reviewing the work of Wessam Gad El-Rab, “Clinical Practice Guideline Formalization: Translating Clinical Practice Guidelines to Computer Interpretable Guidelines,” and have considered the use of the Action Palette in that work. Given our encouraging FastText results, we wish to explore whether the transitive verbs in the Action Palette can be used to produce an even stronger mechanism for categorizing recommendations. Are the verbs consistent with our FastText results? Are they necessary and/or sufficient? We plan to explore this connection and review its impact.