Background
Standard assessments of autism spectrum disorders (ASD) rely primarily on negative markers for detection
of the disorders. Research has indicated that certain vocal characteristics of children diagnosed with ASD may differ
consistently from those of typically developing children, suggesting the presence of a positive marker for ASD in child
vocalization activity. However, investigations into the potential clinical utility of such a marker have been limited by two
key challenges: 1) the difficulty of obtaining sample data of sufficient quantity and quality; and 2) the identification of
consistent discriminative vocalization patterns. The LENA System addresses both challenges, providing an automated
approach to the detection of ASD whose accuracy is reported in the Performance section below.
Development
The LENA System comprises two distinct components: recording hardware and processing software.
The LENA Digital Language Processor (DLP) is a small, lightweight digital recorder that fits into the front pocket of
specially designed children’s clothing and records up to 16 hours of continuous, high-quality audio. Recordings include
all vocalizations produced by the key child (i.e., the child wearing the DLP) and all externally sourced sounds and speech
activity within a radius of approximately 4-6 feet. This unobtrusive approach to data sampling permits the collection of
naturalistic full-day recordings from a child’s home language environment with relative ease, which largely resolves the
first challenge of obtaining adequate child vocalization data. The second challenge, identifying
consistent patterns in child vocalizations that can be used to discriminate a child with ASD from a typically developing
child, is addressed by the processing software as described below.
The LENA System software divides the audio recording into segments ranging from several seconds to several minutes in
duration, assigning a sound category (e.g., key child vocalizations, adult male speech, TV/electronic sound, silence) to
each segment based on previously developed acoustic models. Key child vocalization segments are further processed
to determine the probability that the child’s vocal output is consistent with a pre-defined classification model for ASD.
We have developed two complementary methods for detecting unique and discriminating patterns in the vocalizations
produced by children with ASD and deriving these classification probabilities.
The first method, here called phone-based (PB), defines a unique acoustic feature set using a quantitative approach that incorporates modified components of the open-source Sphinx automatic speech recognition (ASR) software. Child vocalization segment data are processed by this software into 46 unique categories that include 39 “phone” and 7 “nonphone” categories. Note that these “phones” are more broadly defined acoustic approximations of commonly accepted phoneme categories. Sequential pairs of these “phones” are grouped into “biphones” that are then linearly recombined and reduced to 50 dimensions following a previously derived principal components analysis. For a more detailed description of the phone-based approach described here, please see LENA Technical Report LTR-08-1, "The LENA™ Automatic Vocalization Assessment" (http://www.lenafoundation.org/TechReport.aspx/AVA/LTR-08-1).
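The biphone step can be illustrated with a minimal sketch. It is not the LENA implementation: it assumes that "sequential pairs" means overlapping adjacent labels, uses a toy inventory of 3 labels rather than 46, and stands in a made-up 2-component projection for the previously derived 50-dimension principal components analysis. The function names `biphone_frequencies` and `project` are hypothetical.

```python
from collections import Counter
from itertools import product


def biphone_frequencies(labels, inventory):
    """Relative frequency of every ordered pair of adjacent labels."""
    # Assumption: biphones are overlapping adjacent pairs of labels.
    counts = Counter(zip(labels, labels[1:]))
    total = sum(counts.values())
    return [counts[pair] / total if total else 0.0
            for pair in product(inventory, repeat=2)]


def project(features, mean, components):
    """Center the features and project them onto precomputed components."""
    centered = [f - m for f, m in zip(features, mean)]
    return [sum(c * x for c, x in zip(row, centered)) for row in components]


# Toy inventory of 3 labels instead of the 46 categories described above.
inventory = ["a", "b", "sil"]
labels = ["a", "b", "a", "b", "sil"]
freqs = biphone_frequencies(labels, inventory)  # 3 * 3 = 9 frequencies

# Hypothetical stand-in for the previously derived projection
# (2 components here instead of 50).
mean = [0.0] * 9
components = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0],  # picks out the ("a", "b") biphone
    [1, 1, 1, 1, 1, 1, 1, 1, 1],  # sums all biphone frequencies
]
reduced = project(freqs, mean, components)
print(freqs)
print(reduced)
```

Under the same ordered-pair assumption, an inventory of 46 categories would give 46 x 46 = 2,116 possible biphones before reduction.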
The second method, here called cluster-based (CB), applies an unsupervised k-means clustering routine directly
to child vocalization segment data. This self-organized approach uses 64 phone-like clusters generated from
mel-frequency cepstrum (MFC) acoustic features. For this method, as for the phone-based method, the goal
is not to recognize or translate speech, so the resulting clusters or dimensions need not be identifiable as
specific phones; the processing need only provide reliable, consistent results.
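As an illustration of the cluster-based idea, the sketch below runs a plain k-means (Lloyd's algorithm) over feature vectors and summarizes a recording as the share of frames falling into each cluster. It is a sketch under stated assumptions, not the LENA routine: the 2-dimensional points stand in for MFC feature frames, k is 2 rather than 64, the first k points are used as initial centroids so the result is deterministic, and the function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic initial centroids."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid if a group is empty.
        updated = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_histogram(points, centroids):
    """Fraction of points assigned to each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]


# Toy stand-in for MFC feature frames: two well-separated groups.
frames = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids = kmeans(frames, k=2)
histogram = cluster_histogram(frames, centroids)
print(centroids)
print(histogram)
```

The histogram, rather than the identity of any single cluster, is what serves as a feature here, which matches the point that clusters need only be consistent and not interpretable as phones.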
Ultimately, a previously derived linear discriminant analysis (LDA) function is applied to the combined PB and CB feature
sets to determine the probability of classification to the ASD pattern. For convenience and to enhance interpretability,
LDA classification probabilities are reduced to seven ordinal categories using a variable threshold based on sensitivity
and specificity for our development data.
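The final scoring step can be sketched as follows. For a two-class LDA with a shared covariance matrix, the posterior probability is a logistic function of a linear score, so the sketch applies fixed weights and a bias and then bins the probability into seven ordinal categories. The weights, bias, and six cut points below are purely hypothetical; the actual LDA function and thresholds were derived from the development data and are not given in this document.

```python
import math
from bisect import bisect_right


def lda_posterior(features, weights, bias):
    """Posterior probability of the target class from a linear discriminant."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def ordinal_category(probability, cut_points):
    """Map a probability to a category from 1 to len(cut_points) + 1."""
    return bisect_right(cut_points, probability) + 1


# Hypothetical cut points: six thresholds give seven ordinal categories.
CUT_POINTS = [0.05, 0.15, 0.35, 0.65, 0.85, 0.95]

# Hypothetical weights and a toy 2-dimensional feature vector.
weights = [1.0, -2.0]
bias = 0.0
features = [2.0, 1.0]

probability = lda_posterior(features, weights, bias)
category = ordinal_category(probability, CUT_POINTS)
print(probability, category)
```

Unevenly spaced cut points are used here only to mirror the idea of a variable threshold; choosing them from sensitivity and specificity would require labeled development data.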
Performance
Classification performance was assessed for a sample of 190 children ages 24–48 months, based on each child’s first
recording after age 24 months and using leave-one-out cross-validation (LOOCV) to maximize data usage and
enhance generalizability. The sample included 75 children diagnosed with ASD, 34 children diagnosed with a language
delay (LD), and 81 typically developing (TD) children. The ASD sample was recruited nationwide, and families were
required to provide documented confirmation of the ASD diagnosis from a professional or team of professionals. In
addition, parents completed two self-report symptom questionnaires, the Modified Checklist for Autism in Toddlers
(M-CHAT) and the Social Communication Questionnaire (SCQ); the average parent score was 9.5 for the M-CHAT
(SD = 4.8; range 0-19) and 18.7 for the SCQ (SD = 5.7; range 7-32). The performance metric presented here is
based on the Equal Error Rate (EER; see note 1). The following table summarizes EER performance across three comparisons:
ASD vs. non-ASD (TD & LD); ASD vs. LD; and ASD vs. TD.
The LENA Automatic Autism Screen compares favorably to other widely used, non-automated measures. Shown below are
some of the reliability statistics reported for other well-known measures.
Summary
The combined phone-based and cluster-based detection method detailed above demonstrates relatively low classification
error rates, reinforcing the viability of an automated detector for ASD based on child vocalization activity. The DLP
provides researchers the means to collect comprehensive naturalistic language environment data in a simple and
unobtrusive manner, and the automated processing software enables the assessment of ASD-specific vocal
characteristics using completely objective measures.
The LENA Foundation is exploring other approaches that incorporate additional information derived from recording
data, which has the potential to improve accuracy. In addition, the Foundation is seeking to increase the sample size,
recruit a more diverse sample, and include younger children with ASD, with the hope that the screen could be extended
down to 18 months and perhaps even younger.
1. In any classification problem, it is necessary to set a threshold probability value for detection of the target
group of interest. This threshold value determines not only the number of correct detections but also the number of
false acceptances (false positives) and false rejections (false negatives). There is a trade-off between these two types
of error; for example, as the false positive rate decreases, the false negative rate increases. A generally accepted
measure of classification performance is the EER, or the classification error using a threshold at which the
false positive rate equals the false negative rate. The lower the EER, the fewer classification errors overall.
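The EER described in this note can be computed by sweeping candidate thresholds and picking the one at which the false positive and false negative rates are closest. The sketch below assumes higher scores indicate the target group, uses the observed scores as candidate thresholds, and reports the mean of the two rates at the best threshold (the rates may not be exactly equal on a finite sample). The scores and labels are made-up illustration data, not results from the study.

```python
def error_rates(scores, labels, threshold):
    """False positive rate and false negative rate at a given threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # A case is detected when its score is at or above the threshold.
    fnr = sum(s < threshold for s in positives) / len(positives)
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    return fpr, fnr


def equal_error_rate(scores, labels):
    """Return (EER estimate, threshold) where FPR and FNR are closest."""
    best = None
    for threshold in sorted(set(scores)):
        fpr, fnr = error_rates(scores, labels, threshold)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, threshold)
    return best[1], best[2]


# Made-up scores: label 1 is the target group, label 0 is the comparison group.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

eer, threshold = equal_error_rate(scores, labels)
print(eer, threshold)
```

In this toy data one target case scores below one comparison case, so at the balancing threshold one of four cases in each group is misclassified.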