Guideline documents are edited using GEM Cutter to form GEM Documents. These are then uploaded to a repository using a program to extract the essential elements. The next step is to run Apache cTAKES, which is an UIMA-based NLP processor for clinical documents that creates annotations for the guideline text. We adopt the YTEX version so that the results are stored in a relational database. The final step is to create SVM classifiers based on training sets created by clinical experts. See Action Types below.
An important initiative that we are pursuing is the automated identification of clinical unique identifier (CUI) and ICD10 codes with recommendation text. The challenge is to optimize the set of codes that are related to the recommendation text. Using a restful interface from the UMLS, a single concept submitted to the search engine can produce a wide range of returned CUI’s (False Positives). If we submit bi-grams and tri-grams, we receive a different but related set of codes. We have utilized similarity measures to help reduce the result set with the goal of reducing a loss of information (False Negatives). Currently, we have employed YTEX to produce a canonical set of concepts and UMLS codes, methods to generate n-grams for search terms, and similarity measures (Cosine and Sorenson-Dice coefficient) to reduce the matches generated from search results.
N-grams are also useful in that they allow for the calculation of probabilities, since we can convert frequency counts to probabilities after normalizing. An expression that has a high probability of occurring, which is then found to occur in some unidentified text, will be a clear indicator of its likelihood to be related to the text in question. Combined with the use of other similarity measures, we hope to form an optimized system for identifying recommendations that are related to a clinical task or action.
Action Types will be used to create a classifier. The goal will be to identify recommendations using these categories. This will work to improve identifying recommendations related to specific clinical decision support tasks. Action types will be based on work done by Essaihi et al. Action types are used explicitly in Bridge-Wiz to aid guideline developers in choosing actions from a controlled vocabulary for the development of guideline recommendations. You can find more detailed information on the implementation of Bridge-Wiz at Glides.
The classifiers we will try first are Random Forest and Support Vector Machine. These will enable users to categorize recommendations based on Action Type. The Action Types are:
- Gather Data
- Draw Conclusion
- Perform Therapeutic Procedure
The benefit of classifying with Action Types is that it gives a clear indication of where in the course of a clinical encounter the recommendation is relevant. From there, we may further classify recommendations based on a more clinically specific vector space model, such as organ or disease type.