The EU Framework 5 project IST-2001-37599 'PF-STAR' involved ITC-irst (Italy), RWTH (Germany), University of Erlangen-Nurnberg (Germany), University of Karlsruhe (Germany), KTH (Sweden), University of Birmingham (UK) and CNR ISTC-SPFD (Italy).

A multi-lingual corpus of recordings of British, German, Italian and Swedish children speaking their native languages and English was created as part of WP5, "Speech Technologies for Children".

The PF-STAR British English Children's Speech Corpus

A collection of speech from 158 British children aged between 4 and 14.

The PF-STAR British English children's speech corpus was collected as part of the FP5 project 'PF-STAR' by researchers from the Department of Electronic, Electrical and Computer Engineering at the University of Birmingham. Recordings were made at three locations: a university laboratory and two primary schools. The corpus contains speech from 158 children aged 4 to 14 years. Most of the children (excluding some of the younger ones) recorded 20 'SCRIBE' sentences, a list of 40 isolated words, a list of 10 'phonetically rich' sentences, 20 'generic phrases', an 'accent diagnostic' passage (the 'sailor passage') and a list of 20 digit triples. The recordings are divided into a training set (86 speakers, 703 recorded speech files, 7 hrs 29 mins 49 secs including non-speech), an evaluation set (12 speakers, 97 recorded speech files, 53 mins 58 secs including non-speech) and a test set (60 speakers, 510 recorded speech files, 5 hrs 49 mins 47 secs including non-speech). Full documentation is available as a PDF file.
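As a quick consistency check on the figures quoted for the three subsets, the short Python sketch below adds up the speakers, files and durations. The numbers are copied from the description above; the names `SPLITS` and `corpus_totals` are illustrative only and are not part of the corpus or its documentation.

```python
from datetime import timedelta

# Figures copied from the corpus description: (speakers, files, duration).
# Durations include non-speech, as stated in the description.
SPLITS = {
    "training": (86, 703, timedelta(hours=7, minutes=29, seconds=49)),
    "evaluation": (12, 97, timedelta(minutes=53, seconds=58)),
    "test": (60, 510, timedelta(hours=5, minutes=49, seconds=47)),
}


def corpus_totals(splits):
    """Return total speakers, total files and total duration over all subsets."""
    speakers = sum(s for s, _, _ in splits.values())
    files = sum(f for _, f, _ in splits.values())
    duration = sum((d for _, _, d in splits.values()), timedelta())
    return speakers, files, duration


total_speakers, total_files, total_duration = corpus_totals(SPLITS)
print(total_speakers, total_files, total_duration)
```

Under these figures the three subsets account for all 158 speakers, with 1,310 recorded speech files and 14 hrs 13 mins 34 secs of audio in total.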

The corpus has previously been used in research on automatic recognition of children's speech (see D'Arcy et al., "An investigation of read and spontaneous children's speech using two new databases", Proc. ICSLP 2004) and on the effects of age and bandwidth on human recognition of children's speech (see D'Arcy and Russell, "A comparison of human and computer recognition accuracy for children's speech", Proc. Interspeech 2005).

Pricing


The full database is available to academic institutions for 'research purposes only' at £300, and for 'commercial use' at £6,000 (prices correct at time of going to press, 2nd January 2007).


For further information, please contact The Speech Ark using our enquiry form.