      Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks


          Abstract

          Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English, in which voiceless stops surface as aspirated word-initially before stressed vowels, except if preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations. The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.
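
As a concrete illustration of the latent-manipulation technique the abstract describes, below is a minimal sketch (not the paper's published code) of probing a trained WaveGAN-style generator by fixing a single latent dimension to values inside and outside the training interval. The checkpoint path, the probed dimension index, and the use of PyTorch are assumptions for illustration only.

import torch

LATENT_DIM = 100   # assumption: WaveGAN's default 100-dim uniform latent space
PROBE_DIM = 5      # hypothetical index of a latent variable tied to [s]

# Hypothetical checkpoint of a trained generator (an nn.Module saved whole).
G = torch.load("generator.pt")
G.eval()

# Sample a base latent vector from the training interval, uniform on [-1, 1].
z = torch.rand(1, LATENT_DIM) * 2 - 1

with torch.no_grad():
    for value in (-1.0, 0.0, 1.0, 2.0, 4.0):   # 2.0 and 4.0 exceed the interval
        z_probe = z.clone()
        z_probe[0, PROBE_DIM] = value
        audio = G(z_probe)   # raw waveform output, e.g., ~1 s at 16 kHz
        # Crude proxy for output amplitude; in practice one would measure
        # energy in the frication band of the generated spectrum.
        print(f"z[{PROBE_DIM}] = {value:+.1f} -> mean |amplitude| = "
              f"{audio.abs().mean().item():.4f}")

If the network uses such a latent variable as an approximation of a phonological representation, the presence and frication amplitude of [s] in the generated outputs should track the manipulated value, including at values beyond the training interval, as the paper reports.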


                Author and article information

Journal
Frontiers in Artificial Intelligence (Front. Artif. Intell.)
Frontiers Media S.A.
ISSN: 2624-8212
Published: 08 July 2020
Volume 3, Article 44
Affiliations
1. Department of Linguistics, University of California, Berkeley, Berkeley, CA, United States
2. Department of Linguistics, University of Washington, Seattle, WA, United States
                Author notes

                Edited by: Kemal Oflazer, Carnegie Mellon University in Qatar, Qatar

                Reviewed by: David R. Mortensen, Carnegie Mellon University, United States; Robert Malouf, San Diego State University, United States; Kevin Tang, University of Florida, United States

*Correspondence: Gašper Beguš begus@berkeley.edu

                This article was submitted to Language and Computation, a section of the journal Frontiers in Artificial Intelligence

Article
DOI: 10.3389/frai.2020.00044
PMCID: PMC7861218
PMID: 33733161
                Copyright © 2020 Beguš.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History
Received: 28 January 2020
Accepted: 19 May 2020
                Page count
                Figures: 17, Tables: 2, Equations: 0, References: 132, Pages: 25, Words: 19135
                Categories
                Artificial Intelligence
                Original Research

Keywords
generative adversarial networks, deep neural network interpretability, language acquisition, speech, voice onset time, allophonic distribution
