- WHEN – Under what circumstances
- WHO – Guideline’s intended audience
- OUGHT – Level of obligation
- To do WHAT – Action
- To WHOM – Receiver of the Action
- HOW – More details of the Action
- WHY – What the evidence base is
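As a minimal sketch, the seven slots above could be held in a small Python record so that unfilled slots are easy to spot. The class name `Recommendation` and the helper `missing_slots` are hypothetical and not part of any tool we used. The example values are filled in by hand from our own reading of one recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    """One recommendation split into the seven slots (hypothetical schema)."""
    when: Optional[str] = None   # under what circumstances
    who: Optional[str] = None    # intended audience
    ought: Optional[str] = None  # level of obligation
    what: Optional[str] = None   # the action
    whom: Optional[str] = None   # receiver of the action
    how: Optional[str] = None    # more details of the action
    why: Optional[str] = None    # evidence base

    def missing_slots(self) -> List[str]:
        """Return the names of slots that are still empty, in field order."""
        return [name for name, value in vars(self).items() if value is None]


# Hand-filled example (our reading, not tool output).
example = Recommendation(
    who="Caregivers",
    ought="should",
    what="be trained to assess for pain",
    whom="neonates",
    how="using multidimensional tools",
)
print(example.missing_slots())  # ['when', 'why']
```

Keeping every slot optional reflects the fact that many recommendations do not state all seven parts.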
We will start by focusing on Who – Does What, using a tool called ReVerb from the University of Washington. ReVerb is a program that automatically identifies and extracts binary relationships from English sentences. We will put each recommendation in our corpus through this tool and see how well it parses them. Besides part-of-speech (POS) tags, it produces “triples” of Arg-Relation-Arg, or Who-Did What-To Whom.
We have put our corpus of recommendations through the OpenIE 5 tool and have found some very useful information. When trying to discern Who-Did What-To Whom, the structure of the text can create problems: the more complex the structure, the less accurate the results. Imperative statements are not handled well:
“Educate caregivers to assist in their ability to care for the wanderer.”
0.84 (Educate; to assist; in their ability)
The tool also does not address the missing Who in statements such as:
“Effects on the breast, with or without estrogen replacement, should be measured.”
0.83 (Effects on the breast\, with or without estrogen replacement\; should be measured; )
This is not a triple, and we do not know Who is supposed to measure something. This structure (“Something should be considered or measured”) is common in recommendations and perhaps needs special attention.
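The output lines quoted above can be checked mechanically for incomplete triples. The following is a rough sketch, assuming each line has the shape `confidence (field; field; ...)` as printed in this post, and assuming the backslashes are escaping residue from our corpus rather than part of the tool's format. The names `Extraction`, `parse_extraction` and `is_complete_triple` are our own.

```python
import re
from typing import List, NamedTuple


class Extraction(NamedTuple):
    confidence: float
    fields: List[str]


# A confidence score, then everything between the outermost parentheses.
LINE = re.compile(r"^\s*([01](?:\.\d+)?)\s*\((.*)\)\s*$")


def parse_extraction(line: str) -> Extraction:
    """Split one printed extraction line into its confidence and fields."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not an extraction line: {line!r}")
    # Assumption: backslashes are escaping residue, so drop them.
    body = match.group(2).replace("\\", "")
    fields = [part.strip() for part in body.split(";")]
    return Extraction(float(match.group(1)), fields)


def is_complete_triple(extraction: Extraction) -> bool:
    """True when the first three fields (arg, relation, arg) are all non-empty."""
    return len(extraction.fields) >= 3 and all(extraction.fields[:3])


imperative = parse_extraction("0.84 (Educate; to assist; in their ability)")
passive = parse_extraction(
    r"0.83 (Effects on the breast\, with or without estrogen replacement\; should be measured; )"
)
print(is_complete_triple(imperative))  # True
print(is_complete_triple(passive))     # False
```

Note that the imperative example passes this check even though its content is wrong: the check only catches empty arguments, such as the missing object in the passive sentence, not a misread Who.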
Here is an interesting case:
“Caregivers should be trained to assess neonates for pain using multidimensional tools.”
0.88 (Caregivers; to assess; neonates for pain)
0.89 (Caregivers; should be trained; to assess neonates for pain)
Neither is quite right. We have lost a key aspect of the recommendation: “multidimensional tools” is dropped. The caregiver is to be trained to use multidimensional tools to assess neonates. But it could easily have been a sentence with the structure “to assess neonates for development using their hands to point”, where “using” attaches to the neonates and not to the action.
Of course, our corpus is not a fair test, since it is somewhat fragmented, but this also shows that identifying what to do is not a simple task. Forming an If-Then-Else statement from these triples remains a challenge in all but the simplest cases.
Another example of missing data:
“If a patient has GFR <30 mL/min/1.73”
0.94 (a patient; has; GFR)
Units are always a challenge, but the tool we are using failed badly here: the comparator, the value, and the units are all dropped.
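One possible workaround, sketched here under our own assumptions, is to run a separate regular-expression pass over the sentence to recover the comparator, value, and unit that the triple leaves out. The pattern and the names `Threshold` and `find_threshold` are hypothetical, and the pattern only covers simple “measure, comparator, number, unit” phrases like the one above.

```python
import re
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    measure: str
    comparator: str
    value: float
    unit: str


# measure name, comparator, number, then an optional unit with no spaces in it
PATTERN = re.compile(
    r"(?P<measure>[A-Za-z][A-Za-z0-9]*)\s*"
    r"(?P<comparator><=|>=|<|>|=|≤|≥)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-z%][^\s,;]*)?"
)


def find_threshold(sentence: str) -> Optional[Threshold]:
    """Return the first 'measure comparator value unit' phrase, or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return Threshold(
        measure=match.group("measure"),
        comparator=match.group("comparator"),
        value=float(match.group("value")),
        unit=match.group("unit") or "",
    )


print(find_threshold("If a patient has GFR <30 mL/min/1.73"))
# Threshold(measure='GFR', comparator='<', value=30.0, unit='mL/min/1.73')
```

The unit is kept as an uninterpreted string, so anything separated from it by a space is not captured.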
“For individuals without symptoms or a history of cancer\, the guideline developers recommend against the use of serial chest x-rays (CXRs) to screen for the presence of lung cancer.”
0.91 (the guideline developers; recommend; against the use of serial chest x-rays; to screen; for the presence of lung cancer)
individuals without symptoms or a history of cancer
recommend against the use of serial chest x-rays (CXRs)
WHAT to screen for the presence of lung cancer
Or even better – IF patient does not have symptoms or a history of cancer THEN do not use serial chest x-rays WHAT to screen for the presence of lung cancer
So we should have
(individuals without symptoms or a history of cancer, recommend against the use of serial chest x-rays (CXRs), to screen for the presence of lung cancer)
rather than
(the guideline developers; recommend; against the use of serial chest x-rays; to screen; for the presence of lung cancer)
To be fair, the tool is not built for these statements, since recommendations are frequently self-referential. Still, we clearly do not need “guideline developers” in our rule making, even though the extraction is technically correct. The negation also needs to be emphasized: telling someone NOT to do something does not tell them what they should do instead, so it is vital that negative statements be kept separate in our reasoning.
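To make the two points above concrete, here is a narrow sketch that strips the self-referential frame and records the negation as an explicit flag. It is plain pattern matching written for this one sentence shape, not a general solution, and the names `Rule`, `FRAME` and `to_rule` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    population: Optional[str]
    polarity: str            # "do" or "do_not"
    action: str
    purpose: Optional[str]


# Assumption: the only self-referential frame handled is the one in our example.
FRAME = re.compile(
    r"the guideline developers\s+recommend\s+(?P<against>against\s+)?",
    re.IGNORECASE,
)


def to_rule(sentence: str) -> Rule:
    """Turn 'For <population>, the guideline developers recommend ...' into a Rule."""
    text = sentence.replace("\\", "").strip().rstrip(".")
    population = None
    lead = re.match(r"For (?P<pop>[^,]+),\s*", text)
    if lead:
        population = lead.group("pop")
        text = text[lead.end():]
    polarity = "do"
    frame = FRAME.match(text)
    if frame:
        if frame.group("against"):
            polarity = "do_not"  # keep negative statements explicitly separate
        text = text[frame.end():]
    # Split at the first " to " so the purpose clause is kept separately.
    action, sep, purpose = text.partition(" to ")
    return Rule(population, polarity, action.strip(),
                "to " + purpose.strip() if sep else None)


rule = to_rule(
    "For individuals without symptoms or a history of cancer\\, the guideline "
    "developers recommend against the use of serial chest x-rays (CXRs) "
    "to screen for the presence of lung cancer."
)
print(rule.polarity)  # do_not
print(rule.purpose)   # to screen for the presence of lung cancer
```

Storing the polarity as its own field means a “do not” rule can never be mistaken for a positive instruction further down the line.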
All in all, I would say this approach has demonstrated significant problems that we would need to address before pursuing this line of reasoning in future work. Perhaps a decision tree approach with many labeled examples would support the goal we have in mind, at least to verify its validity before scaling things up.
We are refocusing our efforts on mechanisms to auto-generate codes for recommendations. Rx has proven challenging even when writing rules, so we will focus on our previous work before directing our efforts to adding Rx codes.