Distributional semantics represents what an expression means as a vector that summarizes the contexts where it occurs. This approach has successfully extracted semantic relations such as similarity and entailment from large corpora. However, it remains unclear how to take advantage of syntactic structure, pragmatic context, and multiple information sources to overcome data sparsity. These issues also confront language models used for statistical parsing, machine translation, and text compression.
Thus, we seek guidance by converting language models into distributional semantics. We propose to convert any probability distribution over expressions into a denotational semantics in which each phrase denotes a distribution over contexts. Exploratory data analysis led us to hypothesize that the more accurate the expression distribution is, the more accurate the resulting distributional semantics tends to be. We tested this hypothesis on two expression distributions that can be estimated from a tiny corpus: a bag-of-words model and a lexicalized probabilistic context-free grammar à la Collins.
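To make the proposed conversion concrete, here is a minimal Haskell sketch of one possible reading of it. It assumes that an expression is a word sequence, that a context is the pair of word sequences to the left and right of a phrase, and that the phrase denotes the context distribution obtained by weighting each context with the probability of the expression it completes. The names `Dist`, `occurrences`, `denote` and `toyDist`, and the toy probabilities, are hypothetical illustrations and are not taken from the work described here.

```haskell
import qualified Data.Map.Strict as M
import Data.List (inits, isPrefixOf, tails)

type Token   = String
type Expr    = [Token]
type Context = (Expr, Expr)      -- assumed: words to the left and to the right

-- A finite distribution, stored as a map from outcomes to probabilities.
type Dist a = M.Map a Double

-- Every way to split an expression as left ++ phrase ++ right.
occurrences :: Expr -> Expr -> [Context]
occurrences phrase expr =
  [ (left, drop (length phrase) rest)
  | (left, rest) <- zip (inits expr) (tails expr)
  , phrase `isPrefixOf` rest ]

-- Assumed conversion: weight each context by the probability of the
-- expression it completes, then normalize over all contexts of the phrase.
-- A phrase that never occurs denotes the empty map.
denote :: Dist Expr -> Expr -> Dist Context
denote exprDist phrase = M.map (/ total) weights
  where
    weights = M.fromListWith (+)
      [ (ctx, p)
      | (expr, p) <- M.toList exprDist
      , ctx <- occurrences phrase expr ]
    total = sum (M.elems weights)

-- Hypothetical toy distribution over three two-word expressions.
toyDist :: Dist Expr
toyDist = M.fromList
  [ (["dogs", "bark"],  0.5)
  , (["dogs", "sleep"], 0.25)
  , (["cats", "sleep"], 0.25) ]
```

Loaded into GHCi, `denote toyDist ["sleep"]` assigns equal weight to the contexts `(["dogs"], [])` and `(["cats"], [])`, since both completing expressions have probability 0.25 in the toy distribution. A more accurate expression distribution would change these weights, which is the dependence the hypothesis above is about.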
Abstract Categorial Grammar (ACG) is a grammar formalism based on typed lambda-calculus. The syntactic side of ACGs (parsing algorithms, generative power and other formal-language-theoretical properties) is under active investigation. ACGs can also elegantly model the syntax-semantics interface and compute truth conditions. This semantic side of ACGs has received much less attention.
We describe a generalization of ACGs that lets us give the standard dynamic-logic account of anaphora and analyze quantifier strength, quantifier ambiguity and scope islands. Most of these ACG analyses have not been possible before; prior ACG analyses of quantifier ambiguity required type lifting, hence higher-order ACGs with very high parsing complexity.
Our generalization of ACGs affects only the mapping from the abstract language to semantics. We retain all the ACG benefits of parsing from the surface form. By avoiding type lifting, we keep the order of the abstract signature low, so that parsing remains tractable.
The generalization relies on ‘applicative functors’, which extend function application. Because applicative functors compose, we can take full advantage of the modularity and compositionality of ACGs. We assemble the semantic mapping from separate, independently developed components, each responsible for a single phenomenon such as anaphora, coordination, or a universal or indefinite operator. We have implemented the generalized ACGs in a ‘semantic calculator’, which is just the ordinary Haskell interpreter. The calculator lets us write grammar derivations in a linguist-readable form and see their yields, types and truth conditions. It is easy to extend fragments with more lexical items and operators, and to experiment with different assemblies of the semantic mapping.
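The claim that applicative functors compose can be illustrated with a small Haskell sketch that uses `Data.Functor.Compose` from the `base` library. It is a toy under stated assumptions and not the authors' calculator: the two components (a reader applicative standing in for anaphora resolution, and the list applicative standing in for lexical ambiguity), the lexical items, and the helpers `liftOuter` and `liftInner` are all hypothetical.

```haskell
import Data.Functor.Compose (Compose (..))

type Entity = String
type Prop   = String             -- toy truth conditions, rendered as text

-- Component 1 (assumed): meanings that depend on a salient referent,
-- modeled by the reader applicative ((->) Entity).
he :: Entity -> Entity
he = id

-- Component 2 (assumed): ambiguous meanings, modeled by the list applicative.
see, cut :: Entity -> Entity -> Prop
see s o = "see(" ++ s ++ "," ++ o ++ ")"
cut s o = "cut(" ++ s ++ "," ++ o ++ ")"

saw :: [Entity -> Entity -> Prop]
saw = [see, cut]

-- The assembled semantic domain is the composition of the two components.
type Sem = Compose ((->) Entity) []

-- Embed a meaning from either component into the composition.
liftOuter :: (Functor f, Applicative g) => f a -> Compose f g a
liftOuter = Compose . fmap pure

liftInner :: Applicative f => g a -> Compose f g a
liftInner = Compose . pure

-- "He saw Mary": plain applicative application, with no type lifting
-- of the lexical items themselves.
sentence :: Sem Prop
sentence = liftInner saw <*> liftOuter he <*> pure "mary"

-- Supply the salient referent and collect every reading.
readings :: Sem a -> Entity -> [a]
readings = getCompose
```

In GHCi, `readings sentence "john"` evaluates to `["see(john,mary)","cut(john,mary)"]`. Neither component mentions the other; the `Applicative` instance of `Compose` combines them, which is the kind of modularity the paragraph above describes.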
Last modified: 2012-05-29 12:47:39 JST