The Use of ASR-Equipped Software in the Teaching of Suprasegmental Features of Pronunciation

A Critical Review

Authors

  • Tim Kochem, Iowa State University
  • Jeanne Beck, Iowa State University
  • Erik Goodale, Iowa State University

DOI:

https://doi.org/10.1558/cj.19033

Keywords:

Language teaching, automatic speech recognition (ASR) tools, computer-assisted pronunciation training, suprasegmentals, pronunciation instruction

Abstract

Technology has paved the way for new modalities in language learning, teaching, and assessment. However, a great deal of work remains to develop such tools for oral communication, specifically tools that address suprasegmental features in pronunciation instruction. This critical literature review therefore examines how researchers have used automatic speech recognition (ASR) systems to create computer-assisted pronunciation training tools that aid language learners in the perception and production of suprasegmental features. We examined 30 texts published between 1990 and 2020 to explore how technologies have been and are currently being used to help learners develop their proficiency with suprasegmental features. Our thematic analysis indicates that a persistent gap still exists between the ASR-equipped software available to participants in research studies and what is available to university and classroom teachers and students. Additionally, development has concentrated on speech software for language assessment, whereas the translation of these tools into instructional tools for individualized learning remains almost non-existent. Moving forward, we recommend that more commercialized pronunciation systems utilizing ASR be made publicly available, building on the technologies that are currently developed, or in development, for the purposes of oral proficiency judgments.

Author Biographies

  • Tim Kochem, Iowa State University

    Tim Kochem is a PhD candidate in the applied linguistics and technology program at Iowa State University. His primary research areas include L2 pronunciation, teacher cognitions, classroom-based research, and technology-enhanced language learning. He has worked as a Graduate Peer Mentor and Supervisor at the Center for Communication Excellence in the Graduate College for over four years. He has also taught multiple global online courses for the Online Professional English Network (OPEN), including “Using educational technology in the English language classroom,” as well as introductory courses in public speaking and linguistics at Iowa State University.

  • Jeanne Beck, Iowa State University

    Jeanne Beck is a PhD student in the applied linguistics and technology program at Iowa State University. Her research interests include L2 assessment, project-based learning, CALL, and English learner policy. She has experience teaching English learners and technology at the K–12 level in the USA and Japan, as well as experience teaching college-level English learners and public speaking courses in the USA and South Korea. She mentors English teachers worldwide through the OPEN course “Using educational technology in the English language classroom,” and assists Iowa State Department of English instructors with technology and LMS needs.

  • Erik Goodale, Iowa State University

    Erik Goodale is a PhD student in the applied linguistics and technology program at Iowa State University. His research interests include L2 pronunciation instruction, oral communication, and online learning environments. He works as an English-speaking consultant and interpersonal communication consultant for the Center for Communication Excellence.

References

References marked with an asterisk indicate studies included in the review.

*Al-Qudah, F. Z. M. (2012). Improving English pronunciation through computer-assisted programs in Jordanian universities. Journal of College Teaching & Learning (TLC), 9(3), 201–208. https://doi.org/10.19030/tlc.v9i3.7085

*Anderson-Hsieh, J. (1992). Using electronic visual feedback to teach suprasegmentals. System, 20(1), 51–62. https://doi.org/10.1016/0346-251X(92)90007-P

Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42(4), 529–555. https://doi.org/10.1111/j.1467-1770.1992.tb01043.x

Besacier, L., Barnard, E., Karpov, A., & Schultz, T. (2014). Automatic speech recognition for under-resourced languages: A survey. Speech Communication, 56, 85–100. https://doi.org/10.1016/j.specom.2013.07.008

Chapelle, C. A., & Chung, Y. R. (2010). The promise of NLP and speech processing technologies in language assessment. Language Testing, 27(3), 301–315. https://doi.org/10.1177/0265532210364405

*Chen, L., Zechner, K., Yoon, S.-Y., Evanini, K., Wang, X., Loukina, A., Tao, J., Davis, L., Lee, C. M., Ma, M., Mundkowsky, R., Lu, C., Leong, C. W., & Gyawali, B. (2018). Automated scoring of nonnative speech using the SpeechRaterSM v. 5.0 Engine. ETS Research Report Series, 2018(1), 1–31. https://doi.org/10.1002/ets2.12198

Chun, D. M. (1989). Teaching tone and intonation with microcomputers. CALICO Journal, 7(1), 21–46. https://doi.org/10.1558/cj.v7i1.21-46

*Cox, T., & Davies, R. (2012). Using automated speech recognition technology with elicited oral response testing. CALICO Journal, 29(4), 601–618. https://doi.org/10.11139/cj.29.4.601-618

*Cucchiarini, C., Strik, H., & Boves, L. (1997). Automatic evaluation of Dutch pronunciation by using speech recognition technology. In 1997 IEEE workshop on automatic speech recognition and understanding proceedings (pp. 622–629). New York: IEEE.

*Delmonte, R. (2000). SLIM prosodic automatic tools for self-learning instruction. Speech Communication, 30(1), 145–166. https://doi.org/10.1016/S0167-6393(99)00043-6

*Delmonte, R. (2002). Feedback generation and linguistic knowledge in “SLIM” automatic tutor. ReCALL, 14(2), 209–234. https://doi.org/10.1017/S0958344002000320

Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for pronunciation instruction. Language Learning, 48(3), 393–410. https://doi.org/10.1111/0023-8333.00047

*Ding, S., Liberatore, C., Sonsaat, S., Lučić, I., Silpachai, A., Zhao, G., Chukharev-Hudilainen, E., Levis, J., & Gutierrez-Osuna, R. (2019). Golden speaker builder—an interactive tool for pronunciation training. Speech Communication, 115, 51–66. https://doi.org/10.1016/j.specom.2019.10.005

Dixon, D. H. (2018). Use of technology in teaching pronunciation skills. In J. I. Liontas (Ed.), The TESOL encyclopedia of English language teaching (pp. 1–7). Hoboken: Wiley. https://doi.org/10.1002/9781118784235.eelt0692

*Evanini, K., & Wang, X. (2013). Automated speech scoring for nonnative middle school students with multiple task types. In Proceedings of Interspeech (pp. 2435–2439). 14th Annual Conference of the ISCA, Lyon. http://evanini.com/papers/evaniniWang2013toefljr.pdf; https://doi.org/10.21437/Interspeech.2013-566

*Fergadiotis, G., Gorman, K., & Bedrick, S. (2016). Algorithmic classification of five characteristic types of paraphasias. American Journal of Speech-Language Pathology, 25, S776–S787. https://doi.org/10.1044/2016_AJSLP-15-0147

*Holland, M., Kaplan, J., & Sabol, M. (1999). Preliminary tests of language learning in a speech-interactive graphics microworld. CALICO Journal, 16(3), 339–359. https://doi.org/10.1558/cj.v16i3.339-359

Johnson, D. O., & Kang, O. (2016). Automatic detection of Brazil’s prosodic tone unit. In Proceedings of speech prosody (pp. 287–291). Boston: ISCA. https://doi.org/10.21437/SpeechProsody.2016-59

*Johnson, W. L., & Valente, A. (2009). Tactical language and culture training systems: Using AI to teach foreign languages and cultures. AI Magazine, 30(2), 72. https://doi.org/10.1609/aimag.v30i2.2240

*Kang, O., & Johnson, D. (2018). The roles of suprasegmental features in predicting English oral proficiency with an automated system. Language Assessment Quarterly, 15(2), 150–168. https://doi.org/10.1080/15434303.2018.1451531

Kang, O., Rubin, D. O. N., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. Modern Language Journal, 94(4), 554–566. https://doi.org/10.1111/j.1540-4781.2010.01091.x

*Komatsu, T., Utsunomiya, A., Suzuki, K., Ueda, K., Hiraki, K., & Oka, N. (2005). Experiments toward a mutual adaptive speech interface that adopts the cognitive features humans use for communication and induces and exploits users’ adaptations. International Journal of Human-Computer Interaction, 18(3), 243–268. https://doi.org/10.1207/s15327590ijhc1803_1

Lee, J., Jang, J., & Plonsky, L. (2015). The effectiveness of second language pronunciation instruction: A meta-analysis. Applied Linguistics, 36(3), 345–366. https://doi.org/10.1093/applin/amu040

Levis, J. (2007). Computer technology in teaching and researching pronunciation. Annual Review of Applied Linguistics, 27, 184–202. https://doi.org/10.1017/S0267190508070098

Levis, J. (2016). Research into practice: How research appears in pronunciation teaching materials. Language Teaching, 49(3), 423–437. https://doi.org/10.1017/S0261444816000045

*Liu, Y., Chawla, N. V., Harper, M. P., Shriberg, E., & Stolcke, A. (2006). A study in machine learning from imbalanced data for sentence boundary detection in speech. Computer Speech and Language, 20(4), 468–494. https://doi.org/10.1016/j.csl.2005.06.002

*Mansour, S. (2014). Generation of suprasegmental information for speech using a recurrent neural network and binary gravitational search algorithm for feature selection. Applied Intelligence, 40, 772–790. https://doi.org/10.1007/s10489-013-0505-x

*Masmoudi, A., Bougares, F., Ellouze, M., Estève, Y., & Belguith, L. (2018). Automatic speech recognition system for Tunisian dialect. Language Resources and Evaluation, 52(1), 249–267. https://doi.org/10.1007/s10579-017-9402-y

McCrocklin, S. M. (2016). Pronunciation learner autonomy: The potential of automatic speech recognition. System, 57, 25–42. https://doi.org/10.1016/j.system.2015.12.013

*Ming, Y., Ruan, Q., & Gao, G. (2013). A Mandarin edutainment system integrated virtual learning environments. Speech Communication, 55(1), 71–83. https://doi.org/10.1016/j.specom.2012.06.007

Mora, J., & Levkina, M. (2017). Task-based pronunciation teaching and research: Key issues and future directions. Studies in Second Language Acquisition, 39, 381–399. https://doi.org/10.1017/S0272263117000183

Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2002). The pedagogy–technology interface in computer assisted pronunciation training. Computer Assisted Language Learning, 15(5), 441–467. https://doi.org/10.1076/call.15.5.441.13473

Pearson Education, Inc. (2015). Versant English test. https://www.versanttest.com/products/english.jsp

Pennington, M. (1999). Computer-aided pronunciation pedagogy: Promise, limitations, directions. Computer Assisted Language Learning, 12(5), 427–440. https://doi.org/10.1076/call.12.5.427.5693

Probst, K., Ke, Y., & Eskenazi, M. (2002). Enhancing foreign language tutors—in search of the golden speaker. Speech Communication, 37(3–4), 423–441. https://doi.org/10.1016/S0167-6393(01)00009-7

Saito, K. (2012). Effects of instruction on L2 pronunciation development: A synthesis of 15 quasi-experimental intervention studies. TESOL Quarterly, 46(4), 842–854. https://doi.org/10.1002/tesq.67

Saito, K., & Plonsky, L. (2019). Effects of second language pronunciation teaching revisited: A proposed measurement framework and meta-analysis. Language Learning, 69(3), 652–708. https://doi.org/10.1111/lang.12345

*Scherrer, Y., Samardzic, T., & Glaser, E. (2019). Digitising Swiss German: How to process and study a polycentric spoken language. Language Resources & Evaluation, 53, 735–769. https://doi.org/10.1007/s10579-019-09457-5

*Setter, J., & Jenkins, J. (2005). State-of-the-art review article. Language Teaching, 38(1), 1–17. https://doi.org/10.1017/S026144480500251X

*Shahin, I. M. A. (2012). Speaker identification investigation and analysis in unbiased and biased emotional talking environments. International Journal of Speech Technology, 15(3), 325–334. https://doi.org/10.1007/s10772-012-9156-2

*Shahin, I. M. A. (2013). Gender-dependent emotion recognition based on HMMs and SPHMMs. International Journal of Speech Technology, 16(2), 133–141. https://doi.org/10.1007/s10772-012-9170-4

*Shahin, I., & Nassif, A. B. (2018). Three-stage speaker verification architecture in emotional talking environments. International Journal of Speech Technology, 21(4), 915–930. https://doi.org/10.1007/s10772-018-9543-4

*Soonklang, T., Damper, R., & Marchand, Y. (2008). Multilingual pronunciation by analogy. Natural Language Engineering, 14(4), 527–546. https://doi.org/10.1017/S1351324908004737

Surface, E., & Dierdorff, E. (2007). Special operations language training software measurement of effectiveness study: Tactical Iraqi study final report. Tampa, FL: U.S. Army Special Operations Forces Language Office.

*Tamburini, F., & Caini, C. (2005). An automatic system for detecting prosodic prominence in American English continuous speech. International Journal of Speech Technology, 8, 33–44. https://doi.org/10.1007/s10772-005-4760-z

Tanaka, R. (2000). Automatic speech recognition and language learning. Journal of Wayo Women’s University, 40, 53–62.

Taylor, J., & Kochem, T. (2020). Access and empowerment in digital language learning, maintenance, and revival: A critical literature review. Diaspora, Indigenous, and Minority Education, 1–12. https://doi.org/10.1080/15595692.2020.1765769

Thomson, R. I., & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A narrative review. Applied Linguistics, 36(3), 326–344. https://doi.org/10.1093/applin/amu076

Van Compernolle, D. (2001). Recognizing speech of goats, wolves, sheep and ... nonnatives. Speech Communication, 35(1–2), 71–79. https://doi.org/10.1016/S0167-6393(00)00096-0

*Vojtech, J. M., Noordzij, J. P., Cler, G. J., & Stepp, C. E. (2019). The effects of modulating fundamental frequency and speech rate on the intelligibility, communication efficiency, and perceived naturalness of synthetic speech. American Journal of Speech-Language Pathology, 28, 875–886. https://doi.org/10.1044/2019_AJSLP-MSC18-18-0052

*Walker, N., Trofimovich, P., Cedergren, H., & Gatbonton, E. (2011). Using ASR technology in language training for specific purposes: A perspective from Quebec, Canada. CALICO Journal, 28(3), 721–743. https://doi.org/10.11139/cj.28.3.721-743

*Wang, F., Sahli, H., Gao, J., Jiang, D., & Verhelst, W. (2015). Relevance units machine based dimensional and continuous speech emotion prediction. Multimedia Tools and Applications, 74, 9983–10000. https://doi.org/10.1007/s11042-014-2319-1

*Ward, M. (2015). I’m a useful NLP tool—get me out of here. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL—proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 553–557). Dublin: Research-publishing.net. https://doi.org/10.14705/rpnet.2015.000392

*Witt, S. M., & Young, S. J. (2000). Phone-level pronunciation scoring and assessment for interactive language learning. Speech Communication, 30(2–3), 95–108. https://doi.org/10.1016/S0167-6393(99)00044-8

Published

2022-10-20

Issue

Section

Articles

How to Cite

Kochem, T., Beck, J., & Goodale, E. (2022). The Use of ASR-Equipped Software in the Teaching of Suprasegmental Features of Pronunciation: A Critical Review. CALICO Journal, 39(3), 306–325. https://doi.org/10.1558/cj.19033
