Another question we wish to pursue is whether the similarity of recommendations (measured, for example, by cosine similarity) can be used to rank them within a guideline. That is, are highly similar recommendations more important than recommendations that are less similar? This would appear to be true prima facie, but we wish to verify it and then compare the resulting ranking with the actual ordering in the guideline. We expect that being able to sort recommendations by degree of relevance will be a useful tool.
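As a rough illustration of the ranking idea, the sketch below scores each recommendation by its mean cosine similarity to the other recommendations in the same guideline and sorts on that score. It assumes plain bag-of-words term counts as the vector representation and mean pairwise similarity as the ranking criterion; both are illustrative choices, not necessarily the representation or criterion we will settle on. The function names and the three example texts are invented for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_mean_similarity(recommendations):
    """Return (index, score) pairs sorted by mean cosine similarity
    to the other recommendations, highest first."""
    vectors = [Counter(tokenize(r)) for r in recommendations]
    scores = []
    for i, v in enumerate(vectors):
        others = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        mean = sum(others) / len(others) if others else 0.0
        scores.append((i, mean))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical recommendation texts, invented for illustration only.
guideline = [
    "Do not order imaging for low back pain without red flags.",
    "Avoid imaging for low back pain in the first six weeks.",
    "Offer annual influenza vaccination to adults.",
]
ranking = rank_by_mean_similarity(guideline)
```

In this toy guideline the vaccination recommendation shares no words with the other two, so it receives a score of zero and is ranked last. The indices in `ranking` can then be compared against the original order of the recommendations in the guideline.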
Are Choosing Wisely recommendations appropriate to implement without their complete context? That context is established by relevant metadata, such as the intended audience or user and the target population, and is typically contained in the complete guideline or guideline summary. Should one implement a recommendation with only a partial or incomplete context? In a recent article [Zadro, J.R., Farey, J., Harris, I.A. et al. Do choosing wisely recommendations about low-value care target income-generating treatments provided by members? A content analysis of 1293 recommendations. BMC Health Serv Res 19, 707 (2019) doi:10.1186/s12913-019-4576-1] the authors provide a corpus of recommendations for which they have determined a number of properties: test or treatment, income generating or not, for or against, qualified or not, and member focused. Our interest is in using this corpus as a gold standard for each of these properties. Choosing Wisely recommendations are not typical guideline recommendations, but it is worth considering which machine learning models we can build with this data and whether the properties can be extended to our corpus. We have created a model for the test or treatment property and it has produced excellent results. The authors' focus is on income-generating recommendations; it is not clear that this property can be determined from the text alone, but we will see whether it is possible. The for or against property is fairly trivial, so we expect a good model for it, and for the remaining properties as well.
Our work on Recommendation Strength was not conclusive and so we have moved on from it. An interesting concept that has come up is Lexical Density: how do recommendations rank when measured by the number of clinical terms over the total number of terms? This measure could be computed as the ratio of the number of terms that return a code set from the UMLS to the total number of terms in the text of interest, either a recommendation or a guideline. If the ratio is high, the text is lexically dense. Does this impact the message? Does the text convey a clear and unambiguous message? Is this appropriate for the Intended Audience?
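The ratio described above can be sketched in a few lines. The example below assumes single-word tokens and takes the term lookup as a function argument; a small hand-made vocabulary stands in for the UMLS query, which is not implemented here. The names `lexical_density`, `toy_lookup`, and `TOY_CLINICAL_VOCAB` are hypothetical, and a real lookup would also need to handle multi-word clinical terms.

```python
import re


def lexical_density(text, is_clinical_term):
    """Ratio of tokens recognised as clinical terms to all tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    clinical = sum(1 for t in tokens if is_clinical_term(t))
    return clinical / len(tokens)


# Stand-in for a UMLS lookup: a tiny hand-made vocabulary (hypothetical).
TOY_CLINICAL_VOCAB = {"imaging", "pain", "influenza", "vaccination", "antibiotics"}


def toy_lookup(token):
    """Return True if the token is in the toy clinical vocabulary."""
    return token in TOY_CLINICAL_VOCAB


# Two of the six tokens ("imaging", "pain") are in the toy vocabulary.
density = lexical_density("Avoid imaging for low back pain.", toy_lookup)
```

Passing the lookup as a parameter keeps the ratio independent of how clinical terms are recognised, so the toy vocabulary could later be replaced by an actual UMLS query without changing the measure itself.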
Recommendation Strength is a critical element in establishing the degree to which the recommended action should be followed. We have created a training set that contains a recommendation strength label (S for Strong or W for Weak) along with the recommendation text. The goal is to be able to judge whether the language used in a given recommendation is consistent with the corpus of recommendations and, if not, to identify recommendations that may not convey the intended meaning of the developer.
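One way to make the consistency check concrete is to train a text classifier on the labelled recommendations and flag any recommendation whose assigned label disagrees with the label the classifier predicts from its wording. The sketch below uses a small multinomial naive Bayes model with add-one smoothing; this is an illustrative choice of model, not a description of the one we built. The training sentences are invented for the example and are not drawn from our training set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def flag_inconsistent(model, labelled):
    """Return the (text, label) pairs whose wording the model assigns to another class."""
    return [(t, y) for t, y in labelled if predict(model, t) != y]


# Hypothetical training examples, invented for illustration only.
train_set = [
    ("We recommend that clinicians prescribe statins.", "S"),
    ("Clinicians should offer vaccination.", "S"),
    ("We recommend against routine imaging.", "S"),
    ("We suggest that clinicians consider physiotherapy.", "W"),
    ("Clinicians may consider a short course of therapy.", "W"),
    ("We suggest considering watchful waiting.", "W"),
]
model = train(train_set)

# A recommendation labelled Strong but written in tentative language.
candidates = [("We suggest clinicians may consider screening.", "S")]
flags = flag_inconsistent(model, candidates)
```

Here the candidate is labelled S but uses wording that, in the toy training data, appears only in W recommendations, so it is returned in `flags` for review. A flagged item is only a prompt for a human to check whether the wording conveys the developer's intent; it is not evidence that the label is wrong.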