Title
The British Sign Language (BSL) Corpus Project: Impact Report
Description
Impact report for BSL Corpus Project, June 2012
Language
English
Director
Kearsy Cormier
Contributor
Frances Elton
Contributor
Rachel Sutton-Spence
Contributor
Margaret Deuchar
Contributor
Donall O'Baoill
Contributor
Graham Turner
Contributor
Adam Schembri
Contributor
Bencie Woll
Summary of scientific impacts
The British Sign Language (BSL) Corpus Project has established a permanent and representative record of BSL via a digital video dataset that has been partly annotated using ELAN software, has received standard metadata descriptions, and has been made accessible online to researchers worldwide, serving as a model system for openly accessible video archiving. This is the first and most significant step towards a searchable, machine-readable, and fully accessible language corpus which can be used as a reference corpus of BSL data. Studies on sociolinguistic variation and change and lexical frequency have also had scientific impact. Findings linked to pointing signs as part of the phonological variation study on the ‘1’ handshape have led to an application with the Australian Research Council to compare the use of pointing signs in the BSL Corpus and similar corpora in Australia and the Netherlands with non-signers’ pointing gestures, which will be important for understanding linguistic and gestural aspects of pointing. The lexical variation studies form the core component of project-linked student Rose Stamp’s PhD dissertation (due for completion in 2012). This also includes the first-ever dialect accommodation study in a sign language, a follow-on project from the BSL Corpus. The lexical frequency study led to the creation of a lexical database for BSL, which under the UCL Deafness, Cognition & Language Research Centre (DCAL, 2011-2015) will become BSL SignBank, an online dictionary of BSL and the first corpus-based dictionary for any sign language. The experience of PI Cormier and project researcher Fenlon with the BSL Corpus greatly contributed to their status as UK representatives in the European COST Action network SignGram: “Unravelling the grammars of European sign languages” (2011-2014). Several SignGram members are leaders of sign language corpus projects in other EU countries, which will enhance the future prospects of these and other sign language corpora.
Findings and Outputs
The primary scientific output of the BSL Corpus Project is the open-access online video dataset itself. Several outputs (including conference proceedings and chapters in edited volumes) have described linguistic, ethical and technological issues involved in creating and annotating a sign language corpus. The phonological variation study examined variation in BSL signs produced with the ‘1’ handshape (i.e., a typical pointing handshape with extended index finger). Analysis of 2110 signs indicates that variation in the ‘1’ handshape is conditioned by linguistic factors including phonological environment, grammatical category (particularly pointing signs), and lexical frequency. A paper on this study has been submitted to a high-impact interdisciplinary journal and is under review as of June 2012. In the study on lexical variation and change in signs for numbers, colours, and placenames, we found that a) younger signers, b) signers who attended school outside of their region, and c) signers with non-signing parents were all less likely to use traditional signs (specific to the region they lived in), compared to a) older signers, b) signers who attended school in their region and c) signers with signing parents. A paper on variation in number signs has been submitted to a high-impact sociolinguistics journal and is under review as of June 2012. The study on lexical frequency (Cormier et al. 2011, LDLT3 proceedings) includes 24,920 signs from the conversation data. Results indicate that 60% of the data consists of signs from the core lexicon, with high frequency items including more content words than is typical of spoken languages. This provides the first lexical frequency data for a sign language based on a large conversational dataset. The lexical database that was created as part of the lexical frequency study, and its development into BSL SignBank under DCAL (2011-2015), are described in Cormier et al. (2012, LREC proceedings).
How these impacts were achieved
These impacts were achieved via invited presentations and refereed conference presentations at a wide range of academic institutions in the UK and Europe, North America, Australia, and Asia. Conference and workshop topics covered areas not only specific to sign language research but also language resources and evaluation, sustainable digital data in the humanities, language variation and change, dialectology, linguistics, language documentation, and gesture studies. In addition, the BSL Corpus becoming available online in mid-2011 was reported as a news item on the university websites for UCL and La Trobe University (where former project PI Schembri is now based).
Who these findings impact
The BSL Corpus is one of very few large sign language corpus projects (along with projects in Australia, the Netherlands and Germany) and only the second to have video data available online (after the Netherlands). The impacts from the BSL Corpus Project have been experienced by academics in the UK and worldwide. The BSL Corpus team was involved in the Sign Language Corpora Network (2009-2010) and maintains strong links with researchers working on corpus projects (partly through regular conference and workshop series such as Language Resources and Evaluation (LREC) and partly through the SignGram COST Action network). The BSLCP-hosted 2009 workshop “Sign Language Corpora: Linguistic Issues” had international attendance (>100 academics), and many attendees were researchers starting or planning sign language documentation in their own countries. Since then, the BSL Corpus team has worked and consulted with such researchers, including those from the USA, Italy, India, Japan, Spain and central Australia (the latter for documentation of ‘alternate’ sign languages used by indigenous Arandic speakers). The BSL Corpus team is clearly seen as one of the leading teams on sign language documentation and corpora in the world. In addition to connections with academics via networks and collaborations, we have also granted user licences to academics as registered researchers so that they may access the restricted (conversation and interview) data. As of June 2012, 13 user licences have been granted to researchers: 4 to UK academic staff, 4 to research students in the UK, and 5 to academic staff and research students in Sweden, Japan, Australia and the Netherlands working on corpora in their own countries.
Potential future impacts
Once the BSL Corpus video dataset is annotated and translated so that it is machine-readable and searchable, it will serve as an invaluable resource in sign language classrooms across the country. Access to more information about the structure and use of BSL will lead to improved BSL teaching resources. This will in turn enable us to create more reliable and valid assessment instruments for evaluating the signing of deaf children and adults, and the progress of students learning BSL or in interpreter training programmes. Academically, once the online BSL Corpus archive is updated with annotations and translations of the BSL Corpus data, these resources will have immediate practical application as a valuable reference that can be consulted by researchers in sign language linguistics, as well as others in spoken language linguistics, gesture studies, psychology, neuroscience, and related fields, in the UK and abroad. Work on automated analyses of sign language (e.g. signing avatars, sign language recognition, automatic translation) will greatly benefit from having a large, representative, annotated corpus, as it will give researchers in these areas new tools with which to conduct their research and verify existing work. This will help computational sign linguists move beyond the state of the art, and will also allow this technology to become commercially viable. The greater understanding of BSL and improved resources for BSL teaching, learning and research will provide an evidence base for policy-makers in supporting appropriate education, training and services for deaf children and adults. This will help close the gap in education, employment, and health between deaf people throughout their lifespan and their hearing peers. Deaf people who can become more highly qualified and trained will be in a better position to contribute to society in different ways, and will be able to achieve greater recognition and access in the wider community.
Unexpected impacts
It was unexpected that the lexical database created under the BSL Corpus Project would form the basis for BSL SignBank, the first-ever usage-based dictionary of BSL. The initial plan for the lexical frequency study included creation of a lexical database which is required for lexical-level annotation to determine lexical frequency. Existing BSL dictionaries were not suitable because they had not been systematically lemmatised (i.e. with phonological and morphological variants grouped together) to allow for lexical-level annotation. The project team began by taking advantage of the fact that a lemmatised lexical database for Australian Sign Language (Auslan, a sign language variety closely related to BSL which shares much of the same lexicon) already existed, created by Trevor Johnston. In collaboration with Johnston, the Auslan lexical database was adapted for use as a BSL lexical database. Annotation of BSL signs for the lexical frequency study took advantage of shared vocabulary between BSL and Auslan, and new entries for BSL were added to the database as needed. Because Johnston’s lexical database also exists concurrently as an online dictionary (Auslan SignBank), it became clear that the BSL lexical database could eventually (similarly) become BSL SignBank. The creation of BSL SignBank then became part of the workplan of DCAL during its second phase of funding (2011-2015). The scientific and societal impact of BSL SignBank will be similar to the impact of the BSL Corpus more generally. But the timeframe of the impacts of BSL SignBank (and thus of the BSL Corpus) will be faster than was expected of the BSL Corpus itself. Initial launch of SignBank with around 2500 lexical signs is planned for 2013. At that time, researchers, students, teachers, and interpreters will benefit from SignBank as both a lexical database of BSL and as a BSL/English dictionary. BSL SignBank will grow and develop as more annotation of the BSL Corpus data is undertaken in future.
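As a minimal sketch of why lemmatisation matters for lexical frequency, the following Python example groups variant glosses under a single lemma before counting tokens. It is illustrative only and is not the project's annotation workflow or the SignBank data model: the glosses, the `VARIANT_TO_LEMMA` table and both function names are hypothetical and are not taken from the BSL Corpus.

```python
from collections import Counter

# Hypothetical variant-to-lemma table: each annotated variant gloss maps to
# one lemma. All glosses here are invented for illustration.
VARIANT_TO_LEMMA = {
    "SIX-a": "SIX",
    "SIX-b": "SIX",
    "GREEN-a": "GREEN",
    "GREEN-b": "GREEN",
    "PT": "PT",
}


def lemma_frequencies(tokens, variant_to_lemma):
    """Count tokens per lemma.

    Glosses missing from the table are returned separately, mirroring the
    idea that new entries are added to the database as needed.
    """
    counts = Counter()
    unknown = []
    for gloss in tokens:
        lemma = variant_to_lemma.get(gloss)
        if lemma is None:
            unknown.append(gloss)
        else:
            counts[lemma] += 1
    return counts, unknown


def relative_frequencies(counts):
    """Convert raw lemma counts into proportions of all counted tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lemma: n / total for lemma, n in counts.items()}


# Toy token sequence (hypothetical annotation output).
tokens = ["PT", "SIX-a", "GREEN-b", "PT", "SIX-b", "HOUSE", "PT"]
counts, unknown = lemma_frequencies(tokens, VARIANT_TO_LEMMA)
```

Without the lemma table, `SIX-a` and `SIX-b` would be counted as two unrelated items, which is the problem the text attributes to dictionaries that have not been systematically lemmatised.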
Limited scientific impacts
The initial BSL Corpus Project plan included a project-linked PhD studentship on language contact, but we were unable to recruit any suitable candidates. Because this student was meant to undertake annotation of the corpus data as part of their studies, less annotation was done than was initially intended. Because only a limited number of annotations of the BSL Corpus were undertaken, and none of them so far are suitable for making publicly available, the BSL Corpus video dataset is not as accessible to researchers as it could be; in particular, it is not yet searchable. Until we are able to undertake substantially more annotation and translation and to make these available publicly, the BSL Corpus will not be of use to individuals who do not know BSL themselves. Furthermore, because the BSL Corpus is not yet machine-readable, it is not of use to computer scientists. Researchers from the UK (Aberdeen and Sheffield) and the Czech Republic have contacted us seeking such annotated corpora, so we know the demand is there. We have explained that transforming the online video dataset into a language corpus that is searchable and machine-readable is our highest priority. Similarly, in order for psycholinguists to study language processing data, lexical frequency data is needed on a large sample. Currently, subjective norms of familiarity (based on signers’ subjective familiarity judgements of BSL signs) must be used instead of direct measures of lexical frequency, simply because not enough data about lexical frequency in BSL exists. The lexical frequency study conducted under the BSL Corpus Project (based on approximately 25,000 sign tokens) has not provided enough data for use in psycholinguistic studies. However, when considerably more annotation of the BSL Corpus data is undertaken in the future, it will be possible for direct frequency measures to be used for the first time in psycholinguistic studies on BSL.
Limited economic impacts
The BSL Corpus is already of interest to BSL teachers, students learning BSL, BSL/English interpreters, and teachers of deaf children; we have been contacted directly by individuals from each of these groups who are interested in accessing or using the BSL Corpus data. However, the uptake has not been as great as it might be, primarily because the Corpus data are not yet searchable (as noted in other sections, this has also limited the scientific impact to date). Additionally, while the narrative and lexical elicitation data are openly accessible to anyone, the conversation and interview data are restricted to researchers only. This could be changed if we get permission from all participants to show their conversation and interview data online. We think that this should not be problematic for the bulk of the data, and in future we will seek funds to attempt to make all the data open access, which, in addition to annotations and translations, would greatly enhance the impact of the corpus data on teachers, students and other professionals working with the Deaf community.
Harvard
Cormier, Kearsy et al. The British Sign Language (BSL) Corpus Project: Sociolinguistic variation, language change, language contact and lexical frequency in BSL: ESRC Impact Report, RES-062-23-0825. Swindon: ESRC.
Vancouver
Cormier Kearsy et al. The British Sign Language (BSL) Corpus Project: Sociolinguistic variation, language change, language contact and lexical frequency in BSL: ESRC Impact Report, RES-062-23-0825. Swindon: ESRC.