The EU Framework 5 project IST-2001-37599 'PF-STAR' involved
ITC-irst (Italy), RWTH (Germany), the University of Erlangen-Nürnberg (Germany),
the University of Karlsruhe (Germany), KTH (Sweden), the University of Birmingham (UK)
and CNR ISTC-SPFD (Italy).
A multi-lingual corpus of recordings of British, German, Italian
and Swedish children speaking their native languages and English was created as part of
WP5 "Speech Technologies for Children".
The PF-STAR British English Children's Speech Corpus
A collection of speech from 158
British children aged between 4 and 14.
The PF-STAR British English children's speech corpus was collected as part of the FP5 project
'PF-STAR' by researchers at the
University of Birmingham's Department of Electronic, Electrical and Computer Engineering.
Recordings were made at three locations: a university laboratory and
two primary schools. The corpus contains speech from 158 children aged 4 to 14 years. Most of
the children (excluding some of the younger ones) recorded 20 'SCRIBE' sentences, a list of
40 isolated words, a list of 10 'phonetically rich' sentences, 20 'generic phrases', an 'accent
diagnostic' passage (the 'sailor passage') and a list of 20 digit triples. The recordings are
divided into a training set (86 speakers, 703 recorded speech files, 7 hrs 29 mins 49 secs
including non-speech), an evaluation set (12 speakers, 97 recorded speech files, 53 mins 58 secs
including non-speech) and a test set (60 speakers, 510 recorded speech files, 5 hrs 49 mins 47
secs including non-speech). Full documentation is available as a
PDF file.
The corpus has previously been used in research on automatic recognition of children's speech
(see D'Arcy et al., "An investigation of read and spontaneous children's speech using two new
databases", Proc. ICSLP 2004), and on the effects of age and bandwidth on human recognition of
children's speech (see D'Arcy and Russell, "A comparison of human and computer recognition
accuracy for children's speech", Proc. Interspeech 2005).
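The partition figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the names PARTITIONS and corpus_totals are hypothetical and not part of the corpus distribution, and the numbers are copied from the description rather than read from the data. It confirms that the three sets together account for all 158 speakers and totals the file counts and durations.

```python
from datetime import timedelta

# Figures copied from the corpus description above:
# (speakers, recorded speech files, duration including non-speech).
PARTITIONS = {
    "training": (86, 703, timedelta(hours=7, minutes=29, seconds=49)),
    "evaluation": (12, 97, timedelta(minutes=53, seconds=58)),
    "test": (60, 510, timedelta(hours=5, minutes=49, seconds=47)),
}


def corpus_totals(partitions):
    """Sum speakers, files and durations across all partitions."""
    speakers = sum(p[0] for p in partitions.values())
    files = sum(p[1] for p in partitions.values())
    # sum() needs an explicit timedelta start value to add durations.
    duration = sum((p[2] for p in partitions.values()), timedelta())
    return speakers, files, duration


speakers, files, duration = corpus_totals(PARTITIONS)
print(speakers, files, duration)  # 158 1310 14:13:34
```

On these figures the corpus amounts to 1310 recorded speech files and roughly 14 hours 14 minutes of audio, including non-speech.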
Pricing
The full database is available to academic institutions for 'research
purposes only' at £300 and for 'commercial use' at £6,000. (Prices correct at time
of going to press, 2nd January 2007.)
For further information please contact The Speech Ark using our enquiry form.