Guideline documents are edited with GEM Cutter to produce GEM documents. These are then uploaded to a repository by a program that extracts the essential elements. The next step is to run Apache cTAKES, a UIMA-based NLP processor for clinical documents, which creates annotations for the guideline text. We adopt the YTEX version so that the results are stored in a relational database. The final step is to create SVM classifiers from training sets built by clinical experts. See Action Types below.
An important initiative that we are pursuing is the automated association of concept unique identifiers (CUIs) and ICD-10 codes with recommendation text. The challenge is to optimize the set of codes that are related to the recommendation text. Using the RESTful interface from the UMLS, a single concept submitted to the search engine can produce a wide range of returned CUIs, many of them false positives. If we submit bi-grams and tri-grams, we receive a different but related set of codes. We have used similarity measures to narrow the result set while trying to limit the loss of relevant codes (false negatives). Currently, we employ YTEX to produce a canonical set of concepts and UMLS codes, methods to generate n-grams for search terms, and similarity measures (cosine similarity and the Sørensen-Dice coefficient) to reduce the matches generated from search results.
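The n-gram generation and similarity filtering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the token-level comparison, the 0.5 threshold, and the placeholder identifiers `CUI-A` and `CUI-B` are all assumptions made for the example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def dice_coefficient(a_tokens, b_tokens):
    """Sørensen-Dice coefficient on token sets: 2|A∩B| / (|A| + |B|)."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def filter_candidates(query, candidates, threshold=0.5):
    """Keep (cui, name) pairs whose name is similar enough to the query.

    The threshold is an arbitrary illustrative value.
    """
    q = query.lower().split()
    kept = []
    for cui, name in candidates:
        score = dice_coefficient(q, name.lower().split())
        if score >= threshold:
            kept.append((cui, name, score))
    return sorted(kept, key=lambda item: -item[2])

# Placeholder search results; the identifiers are not real CUIs.
results = [("CUI-A", "Thyroid Disease"), ("CUI-B", "Shoulder MRI")]
bigrams = ngrams("maternal thyroid disease".split(), 2)
kept = filter_candidates("thyroid disease", results)
```

Raising the threshold removes more false positives but risks discarding relevant codes, which is the trade-off noted above.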
Another way to approach the problem is to keep only the top returned result and then determine whether that code is a suitable match for the text. For example, if we send the common deontic term “should” to the UMLS, the first returned code is Concept: [C0882088] Multisection:Finding:Point in time:Shoulder:Narrative:MRI, which is not a code that we want to associate with the recommendation. So the challenge is to determine whether this one result is sufficient to describe the submitted term. Excluding search terms that contribute little to the meaning of the content will also improve results.
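One simple way to check the top result, shown here only as a sketch, is to require that the submitted term appear as a whole token in the returned concept name, and to drop deontic and function words before searching. The word list and both helper functions are hypothetical and are not part of the UMLS interface.

```python
import re

# Hypothetical list of deontic and function words to drop before searching.
NON_CONTENT_TERMS = {"should", "must", "may", "shall",
                     "the", "a", "an", "of", "be", "is", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def content_terms(text):
    """Return the tokens that are worth submitting as search terms."""
    return [t for t in tokenize(text) if t not in NON_CONTENT_TERMS]

def top_result_is_suitable(term, concept_name):
    """Accept the top hit only if every token of the term occurs
    as a whole token in the returned concept name."""
    term_tokens = set(tokenize(term))
    name_tokens = set(tokenize(concept_name))
    return bool(term_tokens) and term_tokens <= name_tokens

top_name = "Multisection:Finding:Point in time:Shoulder:Narrative:MRI"
suitable = top_result_is_suitable("should", top_name)
terms = content_terms("Clinicians should examine the thyroid")
```

With whole-token matching, “should” does not match “Shoulder”, so the example result from the text is rejected. A substring match would wrongly accept it.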
N-grams are useful in that they allow for the calculation of probabilities, since we can convert frequency counts to probabilities after normalizing. When an expression that has a high probability of occurring in a known body of text is also found in some unidentified text, that is a strong indicator that the unidentified text is related to it. Combined with the use of other similarity measures, we hope to form an optimized system for identifying recommendations that are related to a clinical task or action.
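The conversion from frequency counts to probabilities can be illustrated with a short sketch. The function name and the two sample sentences are invented for the example. Each n-gram count is divided by the total number of n-grams observed.

```python
from collections import Counter

def ngram_probabilities(sentences, n=2):
    """Count n-grams across sentences and normalize counts to probabilities."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}

# Invented sample text, used only to show the arithmetic.
sample = ["clinicians should prescribe", "clinicians should monitor"]
probs = ngram_probabilities(sample, n=2)
# ("clinicians", "should") occurs in 2 of the 4 bigrams, so its probability is 0.5.
```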
Action Types have been used to create a classifier. The goal is to identify recommendations using these categories, which should improve the identification of recommendations related to specific clinical decision support tasks. Action types are based on work done by Essaihi et al. They are used explicitly in Bridge-Wiz to help guideline developers choose actions from a controlled vocabulary when developing guideline recommendations. You can find more detailed information on the implementation of Bridge-Wiz at Glides. Bridge-Wiz has also been adapted as a web application.
The classifier we tried first was a Naive Bayes classifier. (See below for a better approach, fastText.)
These classifiers will enable users to categorize recommendations based on Action Type.
The Action Types are:
- Gather Data
- Draw Conclusion
- Perform Therapeutic Procedure
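As a rough illustration of how a Naive Bayes classifier assigns an Action Type, here is a minimal multinomial Naive Bayes written from scratch with add-one smoothing. It is a sketch under stated assumptions, not the classifier used in this work. The class, its parameters, and the six training sentences are invented for the example and do not come from the expert-built training sets.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes over whitespace tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing constant
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            log_scores[label] = score
        # Normalize the log scores into probabilities.
        top = max(log_scores.values())
        exp = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)

# Invented toy training data, for illustration only.
texts = [
    "obtain a history of symptoms",
    "measure blood pressure at each visit",
    "diagnose hypertension when readings are elevated",
    "conclude that the infection is bacterial",
    "perform surgical drainage of the abscess",
    "perform tonsillectomy for recurrent infection",
]
labels = ["Gather Data", "Gather Data",
          "Draw Conclusion", "Draw Conclusion",
          "Perform Therapeutic Procedure", "Perform Therapeutic Procedure"]

model = NaiveBayes().fit(texts, labels)
prediction = model.predict("perform drainage")
confidence = model.predict_proba("perform drainage")
```

Because the model treats every token independently, a single frequent word can dominate the score, and this is one reason such a classifier is sensitive to common terms shared across categories.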
The benefit of classifying with Action Types is that it gives a clear indication of where in the course of a clinical encounter the recommendation is relevant. From there, we may further classify recommendations based on a more clinically specific vector space model, such as organ or disease type. The approach we take is an iterative one, so that we build a clear picture of a recommendation without the need to rely on manual development.
Our work with Action Types demonstrated some interesting results. Common terms, like “receive,” are used in different settings and can throw the model off, turning a Procedure into a Prescribe action type. Clearly, Naive Bayes will need more training examples to overcome this hurdle. Also, a phrase such as “A focused exam of the hair” is classified as a Monitor rather than a Test activity, which is somewhat understandable. Another example of interest is this recommendation: “The presence of maternal thyroid disease is important information for the pediatrician to have at the time of delivery.” The experts said it was a Test, but the model said it was Educate/Counsel at 72% confidence. The expert label depends on an inference that this model cannot make.