Augmenting WordNet for Deep Understanding of Text
Peter Clark, Phil Harrison, Bill Murray, John Thompson (Boeing)
Christiane Fellbaum (Princeton Univ)
Jerry Hobbs (ISI/USC)

“Deep Understanding”
• Not (just) parsing + word senses
• Construction of a coherent representation of the scene the text describes
• Challenge: much of that representation is not in the text
  “A soldier was killed in a gun battle”
  → “The soldier died”, “The soldier was shot”, “There was a fight”, …
• Because:
  – A battle involves a fight
  – Soldiers use guns
  – Guns shoot
  – Guns can kill
  – If you are killed, you are dead
  – …
• How do we get this knowledge into the machine? How do we exploit it?
• Several partially useful resources exist
  – WordNet is already used a lot… can we extend it?

The Initial Vision
• Rapidly expand WordNet to be more of a knowledge base
• Question-answering software to demonstrate its use

The Evolution of WordNet: lexical resource → knowledge base?
• v1.0 (1986) – synsets (concepts) + hypernym (isa) links
• v1.7 (2001) – add additional relationships:
  – has-part
  – causes
  – member-of
  – entails-doing (“subevent”)
• v2.0 (2003) – introduce the instance/class distinction
  – Paris isa Capital-City is-type-of City
  – add some derivational links, e.g. explode related-to explosion
• …
• v10.0 (200?) – ?????
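The inference chain on the “Deep Understanding” slides above can be illustrated with a toy forward-chaining sketch. All facts, rules, and predicate names here are illustrative stand-ins for the kind of world knowledge the talk argues WordNet should supply; none of them come from WordNet itself.

```python
# Toy forward chaining over hand-written world-knowledge rules.
# Each rule says: if all premises hold, the conclusion holds.
RULES = [
    ({"gun_battle"}, "fight"),          # a battle involves a fight
    ({"gun_battle"}, "guns_present"),   # soldiers use guns
    ({"guns_present"}, "shooting"),     # guns shoot
    ({"killed"}, "dead"),               # if you are killed, you are dead
    ({"killed", "shooting"}, "shot"),   # killed amid shooting -> shot
]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# "A soldier was killed in a gun battle"
derived = forward_chain({"killed", "gun_battle"}, RULES)
print(sorted(derived))
```

With these rules the system derives `dead`, `shot`, and `fight`, matching the paraphrases “The soldier died”, “The soldier was shot”, and “There was a fight” on the slide.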
Augmenting WordNet
• World knowledge
  – Sense-disambiguate the glosses (by hand)
  – Convert the glosses to logic (similar to LCC’s Extended WordNet attempt)
  – Axiomatize “core theories”
• WordNet links
  – Morphosemantic links
  – Purpose links
• Experiments

Converting the Glosses to Logic
“ambition#n2: a strong drive for success”
• Convert the gloss to the form “word is gloss”
• Parse (Charniak)
• LFToolkit: generate logical form fragments
  – Lexical output rules turn “a strong drive for success” into logical form fragments:
    strong(x1) & drive(x2) & for(x3,x4) & success(x5)
  – Composition rules identify variables: x1=x2, x2=x3, x4=x5
  – Identify equalities, add senses

Successful Examples with the Glosses
• A (somewhat) good example: 56.H3
  T: The administration managed to track down the perpetrators
  H: The perpetrators were being chased by the administration
  WN: hunt_v1 (“hunt”, “track down”): pursue for food or sport
  T′: The administration managed to pursue the perpetrators [for food or sport!]
  H: The perpetrators were being chased by the administration
  → ENTAILED (correct)

Unsuccessful Examples with the Glosses
• More common: being “tantalizingly close”
• 16.H3
  T: Satomi Mitarai bled to death
  H: His blood flowed out of his body
  WN: bleed_v1 (“shed blood”, “bleed”, “hemorrhage”): lose blood from one’s body
  So close! We also need: “lose liquid from container” usually → “liquid flows out of container”
• 20.H2
  T: The National Philharmonic orchestra draws large crowds
  H: Large crowds were drawn to listen to the orchestra
  WN: orchestra = collection of musicians
  WN: musician plays musical instrument
  WN: music = sound produced by musical instruments
  WN: listen = hear = perceive sound
  So close!

Success with Morphosemantic Links
• Good example: 66.H100
  T: The Zoopraxiscope was invented by Mulbridge
  H*: Mulbridge was the invention of the Zoopraxiscope [NOT entailed]
• A bare derivation link (“invent” / “invention”) is too permissive:
  invent_v1 ↔ invention_n1
• For “X verb Y → X is the agent-noun of Y” we need an agent noun,
  but we got a result noun (“invention” is the result of “invent”)
• So no entailment is predicted (correct!)
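The agent-noun versus result-noun check described above can be sketched as a role-tagged derivation table. The role inventory and the lookup function are illustrative assumptions for this sketch, not WordNet’s actual representation of morphosemantic links.

```python
# Sketch: derivational links tagged with a semantic role. A plain
# "invent" <-> "invention" link licenses too much; the role tag lets
# the system block the bad paraphrase. Illustrative entries only.

# (verb, noun) -> role of the noun relative to the verb's event
MORPHOSEMANTIC_LINKS = {
    ("invent", "inventor"): "agent",    # the one who invents
    ("invent", "invention"): "result",  # the thing invented
}

def licenses_agent_paraphrase(verb, noun):
    """'X <verb>ed Y' entails 'X was the <noun> of Y' only if the
    noun names the agent of the verb's event, not its result."""
    return MORPHOSEMANTIC_LINKS.get((verb, noun)) == "agent"

# T:  "The Zoopraxiscope was invented by Mulbridge"
# H*: "Mulbridge was the invention of the Zoopraxiscope"
print(licenses_agent_paraphrase("invent", "invention"))  # result noun: blocked
print(licenses_agent_paraphrase("invent", "inventor"))   # agent noun: licensed
```

Here the result-noun link correctly blocks the 66.H100 hypothesis, while a genuine agent noun (“inventor”) would license it.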
Successful Examples with DIRT
• Good example: 54.H1
  T: The president visited Iraq in September
  H: The president traveled to Iraq
  DIRT: IF Y is visited by X THEN X flocks to Y
  WN: “flock” is a type of “travel”
  → ENTAILED (correct)

Unsuccessful Examples with DIRT
• Bad rule: 55.H100
  T: The US troops stayed in Iraq although the war was over
  H*: The US troops left Iraq when the war was over [NOT entailed]
  DIRT: IF Y stays in X THEN Y leaves X
  → ENTAILED (incorrect)

Overall Results
• Note: eschewing statistics!
• BPI test suite (61% correct); RTE3: 55%

  When H or ¬H is predicted by:        Correct  Incorrect  Notes
  Simple syntax manipulation              11               useful
  WordNet taxonomy + morphosemantics      14               useful
  WordNet logicalized glosses                              occasionally useful
  DIRT paraphrase rules                   27       20      often useful but unreliable
  When H or ¬H is not predicted:          97       72

Summary
• “Understanding”
  – Constructing a coherent model of the scene being described
  – Much is implicit in text → need lots of world knowledge
• Augmenting WordNet
  – Made some steps forward: more connectivity, logicalized glosses
  – But still need a lot more knowledge!

Thank you!
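As a closing sketch, the DIRT-plus-taxonomy reasoning from example 54.H1 above can be put together as one small pipeline. The rule format, the predicate names, and the toy hypernym table are assumptions made for illustration; the real systems operate over parsed dependency paths and the full WordNet taxonomy.

```python
# Sketch: apply one DIRT paraphrase rewrite, then generalize the
# rewritten predicate up a toy hypernym (isa) chain to match H.

DIRT_RULES = {
    "visit": "flock_to",   # IF Y is visited by X THEN X flocks to Y
}

# toy stand-in for WordNet's verb taxonomy: child -> hypernym
HYPERNYMS = {
    "flock_to": "travel_to",
}

def entails(t_pred, h_pred):
    """Does the text predicate entail the hypothesis predicate,
    via one DIRT rewrite plus walking up the hypernym chain?"""
    p = DIRT_RULES.get(t_pred, t_pred)
    while p is not None:
        if p == h_pred:
            return True
        p = HYPERNYMS.get(p)
    return False

# T: "The president visited Iraq"  H: "The president traveled to Iraq"
print(entails("visit", "travel_to"))  # True: visit -> flock_to isa travel_to
```

The 55.H100 failure above is a reminder that this machinery is only as good as its rules: with a bad rewrite like stay → leave in the table, the same pipeline would happily derive the wrong entailment.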