JSL Colloquial Corpus
Data Collection, Translation and Annotation Conventions ( Version 1.1 )

Digital Content and Media Sciences Research Division, National Institute of Informatics, Japan, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430

Contact : jsl-corpus-mem@nii.ac.jp

1 Acknowledgements

Japan Society for the Promotion of Science (JSPS), M. Bono (PI), Y. Osugi, K. Kikuchi, Y. Horiuchi, and N. Harada, A Colloquial Corpus of Japanese Sign Language: The Growth of Linguistic Awareness between Deaf and Hearing, JSPS, Grants-in-Aid for Scientific Research, Category B, 2011-2014; Y. Osugi (PI), M. Bono, W. Takei, R. Kikusawa, Description, Documentation, and Analysis of Lexicon in Japanese Sign Language from Deaf Perspectives, JSPS, Grants-in-Aid for Scientific Research, Category B, 2013-2016; M. Bono (PI), Proposal for Linguistic Description System in Signed Interaction Analysis, JSPS, Grants-in-Aid for Young Scientist, Category A, 2014-2017.

Participants in and collaborators on the JSL Colloquial Corpus Project from the deaf communities in Gunma, Nara, Nagasaki, and Fukuoka.

Japanese Federation of the Deaf.

The National Research Association for Sign Language Interpretation (Zen-tsu-ken).

Prof. Adam Schembri, La Trobe University.

Prof. Robert Adam, University College London (UCL).

Prof. Kearsy Cormier, University College London (UCL).

BSL Corpus Project.

2 Introduction

This document describes the data collection, translation and annotation conventions used to build and share the JSL Colloquial Corpus. In addition, it presents our linguistic motivations for building sign language corpora by explaining several work procedures. We began building a corpus of Japanese Sign Language (JSL) in April 2011 with the support of the Japan Society for the Promotion of Science. This is the first JSL corpus developed for academic and public use.

In 2011, we invited Prof. Adam Schembri, then principal investigator of the BSL Corpus Project, to Japan to help us create such a corpus. Our initial steps in building a JSL corpus were based on advice from him and his colleagues.

3 Data collection

This chapter consists of two sections: 1) Area, participants and tasks, and 2) Procedures for dialogue tasks.

3.1 Area, participants and tasks

figure1.png

Figure 1: Regions and prefectures from which data were collected

The first stage of this project was funded by the Japan Society for the Promotion of Science (JSPS) as a Category B grant of 1,911,000 JPY (10,462 GBP) (PI: M. Bono) from 2011 to 2014. From May to July 2012, we videoed 40 deaf subjects in two prefectures, Gunma and Nara (colored in yellow in Fig. 1), each of which has one school for the deaf. We obtained data from an age-balanced sample of individuals aged 30–70 in each prefecture, and each age group was divided into same-sex pairs. Participants from Gunma and Nara were in their 30s, 40s, 50s, 60s and 70s and formed both male pairs and female pairs. We used three methods to collect data: 1) interviews, in which field workers (assistants who were native signers living in the same area and who knew the procedures in advance) asked participants about their language life, environment and so on (for introductory purposes only, not open access); 2) dialogues about animation, in which one participant memorized the story of "Canary Row" and explained it to the other participant; and 3) lexical elicitation, in which participants produced the corresponding signs for 100 slides of pictures and words shown on a monitor.

3.2 Procedures for dialogue tasks

This section consists of six parts: 1) Devices and settings, 2) Synchronizing and cropping of multiple video clips, 3) Prefecture ID, 4) Participant ID, 5) File names, and 6) Tier names.

3.2.1 Devices and settings

We used three high-definition cameras, four lighting devices, blue panels, and blue chairs for the recordings (Fig.2 & Fig.3). During the dialogue task, camera A showed the two participants from the knees up; camera B focused on the participant on the left, also showing the back of the other participant; and camera C focused on the participant on the right, also showing the back of the other participant. The camera angles and spatial configuration were designed to enable spatial reproducibility in the service of annotating gaze direction and pointing during the dialogues. We added a sound cue made by a clapperboard to each file for timing synchronization later.

figure2.png

Figure 2: Camera and lighting setup

3.2.2 Synchronizing and cropping of multiple video clips

figure3.png

Figure 3: Two versions of the three camera angles used for data collection: Original (upper), Cropped (lower)

The three independent files were synchronized using Final Cut Pro. The original combined-angles image includes the interlocutor's back as recorded by cameras B and C, and it contains dead space, which appears as black areas. The cropped combined-angles image includes neither the interlocutor's back nor dead space. Video images from all camera angles were enlarged to make them easy to see for detailed analysis. All clips were made available on our website.

Table 1: Prefecture ID
No. Region Kanji Romanized Alphabet Abbr. Local region Alphabet Abbr.
1 Hokkaido 北海道 Hokkaido HK
2 Tohoku 青森 Aomori AM
3 岩手 Iwate IT
4 宮城 Miyagi MG
5 秋田 Akita AT
6 山形 Yamagata YG
7 福島 Fukushima FS
8 Kanto 茨城 Ibaraki IK
9 栃木 Tochigi TG
10 群馬 Gunma GM
11 埼玉 Saitama ST
12 千葉 Chiba CB
13 東京 Tokyo TO
14 神奈川 Kanagawa KN
15 Chubu 新潟 Niigata NG
16 富山 Toyama TY Takaoka-shi
Toyama-shi
17 石川 Ishikawa IS
18 福井 Fukui FI
19 山梨 Yamanashi YN
20 長野 Nagano NN
21 岐阜 Gifu GF
22 静岡 Shizuoka SO
23 愛知 Aichi AC
24 Kinki 三重 Mie ME
25 滋賀 Shiga SI
26 京都 Kyoto KT
27 大阪 Oosaka OS
28 兵庫 Hyogo HG
29 奈良 Nara NR
30 和歌山 Wakayama WK
31 Chugoku 鳥取 Tottori TT
32 島根 Shimane SN
33 岡山 Okayama OY
34 広島 Hiroshima HS
35 山口 Yamaguchi YC
36 Shikoku 徳島 Tokushima TK
37 香川 Kagawa KW
38 愛媛 Ehime EH
39 高知 Kouchi KC
40 Kyushu 福岡 Fukuoka FO Fukuoka-shi FK
Kitakyusyu-shi KT
Nogata-shi NG
Kurume-shi KR
41 佐賀 Saga SG
42 長崎 Nagasaki NS North NH
South SH
43 熊本 Kumamoto KM
44 大分 Ooita OI
45 宮崎 Miyazaki MZ
46 鹿児島 Kagoshima KS
47 沖縄 Okinawa ON

3.2.3 Prefecture ID

There are 47 prefectures in Japan. Following the BSL Corpus Project, we assigned each prefecture an ID in abbreviated form (Table 1).

In the first stage of this project, we carried out one-site filming in Nara and Gunma prefectures; in the second stage, we carried out several-site filming in Nagasaki and Fukuoka prefectures. The difference between one-site filming and several-site filming is whether the prefecture is divided into two or more regions. Where a prefecture has one Deaf school, we filmed at one site. Where a prefecture has two or more Deaf schools, we divided it into regions corresponding to the area of each Deaf school.
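The division into local regions can be written down as a small lookup table. The following Python sketch uses only the abbreviations listed in Table 1 for the four prefectures named above; the names `LOCAL_REGIONS` and `is_several_site` are hypothetical and are not part of any corpus tool.

```python
# Local-region abbreviations per prefecture ID, copied from Table 1.
# An empty dict means the prefecture was filmed at one site.
LOCAL_REGIONS = {
    "GM": {},  # Gunma: one-site filming
    "NR": {},  # Nara: one-site filming
    "NS": {"NH": "North", "SH": "South"},  # Nagasaki
    "FO": {  # Fukuoka
        "FK": "Fukuoka-shi",
        "KT": "Kitakyusyu-shi",
        "NG": "Nogata-shi",
        "KR": "Kurume-shi",
    },
}


def is_several_site(prefecture_id: str) -> bool:
    """Return True if the prefecture was divided into two or more regions."""
    return len(LOCAL_REGIONS[prefecture_id]) >= 2


print(is_several_site("GM"))  # False
print(is_several_site("FO"))  # True
```

Note that the local abbreviations KT and NG in Fukuoka coincide with the prefecture IDs for Kyoto and Niigata in Table 1, so a local-region code is only meaningful together with its prefecture ID.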

3.2.4 Participant ID

In the animation task, the narrator who had watched "Canary Row" sat on the right (as we viewed the stage). The recipient who had not seen it sat on the left (as we viewed the stage). We placed numbers on the back of each chair to identify each participant for the purposes of data analysis. We used the prefecture abbreviations shown in Table 1. The camera operator put the prefecture ID and the number of each participant on the back of the chair in order of appearance, e.g. GM01, narrator of the first pair in the animation task in Gunma Prefecture; GM02, recipient of the first pair in the animation task in Gunma Prefecture.

figure324.png

Figure 4: Participant ID tag

(Participant IDs were assigned temporarily during filming by the camera crew. They were modified after filming, e.g. G-01 to GM0150.)

3.2.5 File names

Video files, Word files and ELAN eaf files were named as per the following example. In ***, we put the following task names: Ani (animation task), Cur (Curry recipe task), Pro (Proud of your country task), ReS (Regional Signing), Lex (lexical elicitation), and Int (Interview).

figure325.png

Figure 5: File name

Gunma prefecture_01 (right)_02 (left)_Animation task
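As a rough illustration, the naming scheme can be sketched in Python. The task abbreviations come from the list above and the `GM_01-02` pair format from Table 2. The position of the task abbreviation and its underscore separator are assumptions based on the legend of Figure 5, and `file_stem` is a hypothetical helper, not a project tool.

```python
# Task abbreviations as listed in section 3.2.5.
TASKS = {
    "Ani": "animation task",
    "Cur": "Curry recipe task",
    "Pro": "Proud of your country task",
    "ReS": "Regional Signing",
    "Lex": "lexical elicitation",
    "Int": "Interview",
}


def file_stem(prefecture_id: str, right: int, left: int, task: str) -> str:
    """Build a file-name stem such as 'GM_01-02_Ani'.

    The '<pair>_<task>' layout is an assumption, not a documented rule.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task abbreviation: {task}")
    return f"{prefecture_id}_{right:02d}-{left:02d}_{task}"


print(file_stem("GM", 1, 2, "Ani"))  # GM_01-02_Ani
```

The same stem would then be shared by the video, Word and ELAN eaf files of one session, differing only in file extension.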

3.2.6 Tier names

figure326-one-site.png

Figure 6: Tier name (one-site filming)

figure326-several-site.png

Figure 7: Tier name (several-site filming)

Table 2: List of file names and tier names
Prefecture File name Tier name
Right Left
Gunma 1 GM_01-02 GM_01_50F GM_02_50F
2 GM_03-04 GM_03_70M GM_04_70M
3 GM_05-06 GM_05_30M GM_06_30M
4 GM_07-08 GM_07_30F GM_08_30F
5 GM_09-10 GM_09_60F GM_10_60F
6 GM_11-12 GM_11_70F GM_12_70F
7 GM_13-14 GM_13_40F GM_14_40F
8 GM_15-16 GM_15_40M GM_16_40M
9 GM_17-18 GM_17_50M GM_18_50M
10 GM_19-20 GM_19_60M GM_20_60M
Prefecture File name Tier name
Right Left
Nara 1 NR_01-02 NR_01_70M NR_02_70M
2 NR_03-04 NR_03_50F NR_04_50F
3 NR_05-06 NR_05_60M NR_06_60M
4 NR_07-08 NR_07_70F NR_08_70F
5 NR_09-10 NR_09_60M NR_10_60M
6 NR_11-12 NR_11_40M NR_12_40M
7 NR_13-14 NR_13_40F NR_14_40F
8 NR_15-16 NR_15_30M NR_16_30M
9 NR_17-18 NR_17_50M NR_18_50M
10 NR_19-20 NR_19_40M NR_20_40M
Prefecture File name Tier name
Right Left
Nagasaki 1 NS_01-02 NS_01_SH_70F NS_02_SH_70F
2 NS_03-04 NS_03_SH_30F NS_04_SH_50F
3 NS_05-06 NS_05_NH_70F NS_06_SH_70F
4 NS_07-08 NS_07_SH_50M NS_08_NH_30M
5 NS_09-10 NS_09_SH_60M NS_10_SH_70M
6 NS_11-12 NS_11_SH_40F NS_12_SH_40F
7 NS_13-14 NS_13_NH_70M NS_14_NH_70M
8 NS_15-16 NS_15_SH_40M NS_16_SH_40M
Prefecture File name Tier name
Right Left
Fukuoka 1 FO_01-02 FO_01_KT_70F FO_02_KT_60F
2 FO_03-04 FO_03_NG_50F FO_04_NG_40F
3 FO_05-06 FO_05_FK_80F FO_06_FK_70F
4 FO_07-08 FO_07_FK_40F FO_08_FK_50F
5 FO_09-10 FO_09_KR_50M FO_10_KR_50M
6 FO_11-12 FO_11_KT_70M FO_12_KT_70M
7 FO_13-14 FO_13_NG_70M FO_14_NG_70M
8 FO_15-16 FO_15_KR_70M FO_16_KR_70M
Prefecture File name Tier name
Right Left
Ishikawa 1 IS_01-02 IS_01_70M IS_02_70M
2 IS_03-04 IS_03_20M IS_04_20M
3 IS_05-06 IS_05_50M IS_06_50M
4 IS_07-08 IS_07_80M IS_08_80M
5 IS_09-10 IS_09_60M IS_10_60M
6 IS_11-12 IS_11_30M IS_12_30M
7 IS_13-14 IS_13_40M IS_14_40M
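The tier names in Table 2 follow a regular pattern, so they can be split into their parts mechanically. The sketch below reads a name such as NS_05_NH_70F as prefecture ID, participant number, optional local region, age group (decade) and sex. This reading is inferred from Table 2 and the participant description in section 3.1, and `parse_tier_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class TierName(NamedTuple):
    prefecture: str        # e.g. "NS" (Table 1)
    participant: int       # number on the chair tag
    region: Optional[str]  # local region, only for several-site filming
    age_group: int         # decade, e.g. 70 for participants in their 70s
    sex: str               # "F" or "M"


_TIER_RE = re.compile(r"^([A-Z]{2})_(\d{2})(?:_([A-Z]{2}))?_(\d{2})([FM])$")


def parse_tier_name(name: str) -> TierName:
    """Split a tier name from Table 2 into its components."""
    m = _TIER_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised tier name: {name!r}")
    pref, num, region, age, sex = m.groups()
    return TierName(pref, int(num), region, int(age), sex)


print(parse_tier_name("GM_01_50F"))
print(parse_tier_name("NS_05_NH_70F"))
```

For one-site prefectures the `region` field is `None`, which makes it easy to filter several-site recordings.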

4 Translation and Annotation Conventions

This chapter consists of four sections: 1) practical steps, 2) representation of time, 3) translation conventions, 4) annotation conventions.

4.1 Practical steps

We perform two basic steps when making annotations. 1) Translation into text: sign language interpreters translate the signing into written Japanese, creating the Gloss, Word-order translation, and Idiomatic translation in Microsoft Word. They then ask native signers who live in different regions of Japan to check the translations against their native intuition. 2) Annotation: referring to the Word text created in step 1, native signers annotate the features of hand movements for each Gloss unit and each utterance unit in ELAN, in order to observe the temporal relationships between and within them.

figure41.png

Figure 8: Practical steps in translation and annotation

After making the translations and annotations, we revise each file iteratively, feeding the findings noted while working on one file back into the other, as shown in Fig. 8. We take care to keep the information in these files exactly the same.

4.2 Representation of time

figure42-schematic-of-translation.png

Figure 9: Schematic of translation in Microsoft Word

figure42-schematic-image.png

Figure 10: Schematic image of annotation in ELAN

In the Word translation (Fig. 9), each line has an independent time axis, so if line 01 and line 02 overlap, it is difficult to represent their temporal relationship. Conversely, because annotations in ELAN (Fig. 10) share one time axis, it is easy to represent a micro-pause or to show that two lines overlap. However, because ELAN requires scrolling to see the following lines, it can be difficult to find sequential relationships, e.g. question and answer pairs as studied in speech act theory and conversation analysis (CA), or the discourse structure of narratives. In this project, we are trying not only to collect data but also to build a hybrid notation system like the 'transcript' in CA (Jefferson, 1986).
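The advantage of a single time axis can be illustrated with a short Python sketch. Given the start and end times of two annotations, one subtraction shows whether they overlap or are separated by a pause. The times, in milliseconds, and the `Annotation` type are hypothetical values chosen for illustration, not corpus data.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    tier: str
    start_ms: int
    end_ms: int


def overlap_ms(a: Annotation, b: Annotation) -> int:
    """Positive: length of the overlap. Negative: length of the gap (pause)."""
    return min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms)


# Hypothetical timings for two lines by different signers.
line01 = Annotation("GM_01_50F", 0, 2400)
line02 = Annotation("GM_02_50F", 2100, 3900)
line03 = Annotation("GM_01_50F", 4100, 5000)

print(overlap_ms(line01, line02))  # 300  (the two lines overlap)
print(overlap_ms(line02, line03))  # -200 (a pause between the lines)
```

A line-by-line Word transcript has no shared axis from which such values could be read, which is one motivation for combining the two representations.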

4.3 Translation Conventions

4.3.1 Gloss

Translation at the Gloss level is a common method both for beginners learning sign language and for advanced training in interpretation skills.

EXAMPLE
(Gloss-ENG)	NOW/PT/COMIC1(M:ma-n-ga)/THEATER-PLAY (M:a-ni-me)/PT/=
			=FS: A(M:a)NI(M:ni)ME(M:me)/SIGN-LANGUAGE/WHAT/PT:G02/

Using the example above, we explain each symbol in the translation conventions.

NOW : Gloss (word), which is usually listed in a lexical database

/ : a Gloss boundary

PT : Pointing by hands. Here, we do not specify which hand points. When the direction and reference of pointing are ambiguous, we do not label anything after PT.

COMIC1 : When there are several representations of this meaning, we add a number to distinguish one from others.

THEATER-PLAY : A hyphen joins two words when a sign consists of one movement or one Gloss but the spoken language needs two words to represent the same meaning.

M:a-ni-me : Mouthing of a-ni-me (abbreviation of 'animation'), represented in morae; here it is a combination of three morae, a+ni+me. 'COMIC1 (M:ma-n-ga)' means the signer moves her/his lips as in ma-n-ga while signing COMIC1. Where there are no parentheses around M:a-ni-me, the mouth movement has a syntagmatic relation with other Glosses.

= : Sign production continues across the line break.

FS: A-NI-ME : Finger-spelling of A-NI-ME, represented in morae in the same way as mouthing.

PT:G02 : The signer points with the hand at the interlocutor in front of her or him. This is usually PT2 in previous studies of the grammatical aspects of signing. We try to avoid encoding the function of indexicality at the translation level.

HS:KI : The signer uses a specific hand shape when producing a sign. There are cases where two phonetically distinct signs are represented with the same gloss, and this description is used in order to specify which sign is being employed.

The following symbols were also established during the working process.

MG: pa : Mouth gesture that has a grammatical function (e.g., tense, pronoun, etc.)

CLIMBING-UP (CL: drainpipe) : The Gloss includes a classifier (CL) representation. In this case, the CL representation is glossed by the verb type, climbing up, and the parentheses note that the movement takes place inside a drainpipe.

CL : Although the annotators cannot determine what kind of CL it is, some kind of CL is present.

? : impossible to read.

?: CAT : Although it is impossible for the annotators to read clearly, they have some candidates in mind (in this case “cat”).

D : Although it is impossible to read, some signing is present (roughly equivalent to a disfluency).

D: CAT : Although it is impossible to read, some signing is present (roughly equivalent to a disfluency). The annotators have some candidates in mind (in this case “cat”).

PT: object name : Pointing to a concrete object.

PT: CL : Pointing to a CL that is being signed with the other hand

PT: CL-lost : Pointing to the space where CL was previously represented

PT: RU : Pointing in an upper right direction (from the signer’s perspective)

PT: RU (PET BOTTLE) : Pointing in an upper right direction with the meaning of PET bottle in context

PT: RU (person’s name) : Pointing in an upper right direction with the meaning of a 3rd person
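To show how these symbols combine, the following Python sketch splits the Gloss example from the start of this section into tokens. It handles only three of the conventions: `/` as a Gloss boundary, `=` as a continuation marker, and `(M:...)` as mouthing. All other symbols are left inside the label. The function `tokenize_gloss` is a hypothetical illustration, not a tool used in the project.

```python
import re
from typing import List, NamedTuple


class GlossToken(NamedTuple):
    label: str           # gloss text with any mouthing removed
    mouthing: List[str]  # contents of every (M:...) attached to the token


_MOUTHING = re.compile(r"\(M:\s*([^)]*)\)")


def tokenize_gloss(lines: List[str]) -> List[GlossToken]:
    """Split Gloss lines into tokens at '/' boundaries."""
    # '=' marks continuation across a line break, so join the lines first.
    joined = "".join(line.strip().strip("=") for line in lines)
    tokens = []
    for raw in joined.split("/"):
        raw = raw.strip()
        if not raw:
            continue  # empty piece after the final '/'
        mouthing = _MOUTHING.findall(raw)
        label = _MOUTHING.sub("", raw).strip()
        tokens.append(GlossToken(label, mouthing))
    return tokens


example = [
    "NOW/PT/COMIC1(M:ma-n-ga)/THEATER-PLAY (M:a-ni-me)/PT/=",
    "=FS: A(M:a)NI(M:ni)ME(M:me)/SIGN-LANGUAGE/WHAT/PT:G02/",
]
for token in tokenize_gloss(example):
    print(token.label, token.mouthing)
```

In this sketch the finger-spelled token keeps its three mouthed morae as a list, so the alignment between finger-spelling and mouthing is not lost when the label is cleaned.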

4.3.2 Word-order translation (WOT)

The word-order translation (WOT) tier serves to maintain the original word order; at this level, the translated text is deliberately written in a way that is ungrammatical in the target language. This kind of translation gives the audience a strong impression of how much signed language differs from spoken language.

EXAMPLE
(WOT-ENG) 	Now, comic, theater play, animation... Animation, how do you sign?

4.3.3 Idiomatic translation (IT)

The idiomatic translation (IT) tier gives natural, well-formed sentences in the target language, in this case English.

EXAMPLE
(IT-ENG)	I just watched a cartoon, uhm, how do you sign "cartoon"?

4.3.4 Multilingual environment, line numbers and participant ID

English and Japanese translations were prepared to establish a multilingual environment for language research and communication studies in sign languages. The lines in the two languages correspond to each other one to one.

EXAMPLE: 
01>
GM01: 
(Gloss-JPN)	今/PT/まんが1 (M:ma-n-ga)/劇 (M:a-ni-me)/PT/=
			=FS:A-(M:a)NI-(M:ni)ME-(M:me)/手話/何/PT:G02/

(Gloss-ENG)	NOW/PT/COMIC1(M:ma-n-ga)/THEATER-PLAY (M:a-ni-me)/PT/=
			=FS: A(M:a)NI(M:ni)ME(M:me)/SIGN-LANGUAGE/WHAT/PT:G02/ 

(WOT-JPN)	今 まんが アニメを…アニメって手話は何 あなたは?

(WOT-ENG) 	Now, comic, theater play, animation... Animation, how do you sign?

(IT-JPN)	今、アニメを見たんだけど、ねえ、アニメって手話はどうやる?

(IT-ENG)	I just watched a cartoon, uhm, how do you sign "cartoon"?

02>

GM02:

(Gloss-JPN)	まんが1(M:a-ni-me)/まんが2(M:cont.)/まんが1(M:a-ni-me)/=

			=まんが1(M:cont.)

(Gloss-ENG)	COMIC1(M:a-ni-me)/COMIC2 (M:cont.)/COMIC1 (M:a-ni-me)/= 

			=COMIC1(M:cont.)/

(WOT-JPN)	まんが まんが まんが

(WOT-ENG)	Comic, comic, comic

(IT-JPN)	うーん、こうかな?

(IT-ENG)	Uhm, like this (I guess)?

03>

GM01: 

(Gloss-JPN)	まんが1+まんが2 (M:a-ni-me)/(.)/まんが2/まんが1(M:a-ni-me)/=

			=PT/見た/PT/見た/PT/

(Gloss-ENG)	COMIC1+COMIC2(M:a-ni-me) /(.)/COMIC2/COMIC1 (M:a-ni-me)/=

			=PT/WATCHED/PT /WATCHED/PT/

(WOT-JPN)	まんが まんが まんが まんが…まんがを見た 見た。

(WOT-ENG)	Comic, comic, comic, comic…, (I) watched a comic, watched.

(IT-JPN)	こうか…、で、アニメを見たの。

(IT-ENG)	Okay, like this. So, I just watched a cartoon.
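The layout of the example above (a line number such as `02>`, a participant ID such as `GM02:`, then one labelled line per tier) is regular enough to be read by a program. The Python sketch below collects each numbered line into a dictionary. It assumes the layout is followed exactly and that every block starts with a line-number marker. `parse_transcript` is a hypothetical helper, not part of the project workflow.

```python
import re
from typing import Dict, List, Optional

_LINE_NO = re.compile(r"^(\d+)>$")
_SPEAKER = re.compile(r"^([A-Z]{2}\d{2}):$")
_TIER = re.compile(r"^\((Gloss|WOT|IT)-(JPN|ENG)\)\s*(.*)$")


def parse_transcript(text: str) -> List[Dict[str, str]]:
    """Group tier lines by transcript line number."""
    records: List[Dict[str, str]] = []
    current_tier: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := _LINE_NO.match(line):
            records.append({"line": m.group(1)})
            current_tier = None
        elif m := _SPEAKER.match(line):
            records[-1]["participant"] = m.group(1)
        elif m := _TIER.match(line):
            current_tier = f"{m.group(1)}-{m.group(2)}"
            records[-1][current_tier] = m.group(3)
        elif line.startswith("=") and current_tier:
            # Continuation line: drop the paired '=' markers and append.
            prev = records[-1][current_tier].rstrip().rstrip("=")
            records[-1][current_tier] = prev + line.lstrip("=")
    return records


sample = """02>
GM02:
(Gloss-ENG)\tCOMIC1(M:a-ni-me)/COMIC2 (M:cont.)/COMIC1 (M:a-ni-me)/=
\t\t\t=COMIC1(M:cont.)/
(WOT-ENG)\tComic, comic, comic
(IT-ENG)\tUhm, like this (I guess)?
"""
for record in parse_transcript(sample):
    print(record)
```

Because the Japanese and English tiers share the same line numbers, the same record can hold both languages side by side.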

4.4 Annotation conventions

Applying gesture phases to signing movements

One of our original contributions is a unit of physical hand movement smaller than the Gloss, which we call a Movement Unit. We applied the concept of the gesture unit (GU) proposed by Kendon (1972, 1980, 2004) to annotate the beginning and end points of signed turns. The GU is the interval between successive rests of the limbs, that is, between rest positions or home positions. A GU consists of one or several gesture phrases. A gesture phrase is what we intuitively call a “gesture,” and it, in turn, consists of up to five phases: preparation (optional), stroke (obligatory in the sense that a gesture is not said to occur in the absence of a stroke), retraction (optional), and pre- and post-stroke hold phases (optional). When analyzing overlaps in conversation, it is important to note the timing of the expressions of both the signer and the recipient. In signed conversations, articulation involves hand signs that appear in front of the participants; this process of articulation is comparable to the visible lip movements made by those involved in spoken conversations. Using this methodology, we can observe how participants engage in an articulation phase in which signers move their hands from the home position to the signing space as a signal for the start of turn-taking in interactions.

4.4.1 Setting tiers in ELAN

We prepared 17 tiers per participant in ELAN. In the case of a dialogue task, we have 34 tiers in total (Fig. 11).

figure441-TiersinELAN.png

Figure 11: Tiers in ELAN

4.4.2 Movement tiers

Hand signing is divided into right hand and left hand per signer (RH and LH). These tiers have several labels for annotating each movement in signing, including preparation (prep), pre-stroke hold (pre-s-h), stroke (str), post-stroke hold (post-s-h), retraction (ret), and more. We regard the preparation and stroke movements as the meaningful components that constitute a signed token.

prep : Preparation phase of signing. Signers raise their hands from the home or rest position to the signing space.

pre-s-h : Pre-stroke hold. The phase in which the hand shape and the hand position are sustained before the next stroke phase.

str : Stroke. The phase in which the core part of a sign is presented, with the hand changing shape and moving within the signing space.

post-s-h : Post-stroke hold. The phase in which the hand shape and the hand position are sustained after the previous stroke phase.

ret : Retraction. The phase in which the hands are returned to the home or rest position.

hold : An independent holding phase.

4.4.3 Gloss tiers

The Gloss, called Word_for_Word in ELAN (Fig. 11), basically consists of at least one pair of prep and stroke. It is a signed token and may include non-lexical elements, e.g. disfluencies and truncated signs. Because there is no well-structured lexical database for JSL, we do not link Glosses to a lexical database as the BSL corpus does (Johnston 2008).

We assign one Gloss tier per signer; that is, we do not separate the hands as the BSL corpus does. Where each hand has a different meaning at the same time, we note this as ‘R:/PT+L:/3,’ which means that the right hand represents PT (pointing) and the left hand represents the number 3.

EXAMPLE

L:/3 + R:/PT:L-3

figure443-Listbuoy.png

Figure 12: List buoy

The start point of a Gloss coincides with the beginning of the prep phase of the dominant or meaningful hand at that time, and the end point of the Gloss coincides with the end of str.
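This rule can be expressed as a small Python sketch that derives the extent of a Gloss from the movement phases of the dominant hand. The phase labels are those of section 4.4.2; the times, in milliseconds, are hypothetical. Taking the first prep and the last str when a Gloss contains several prep and stroke pairs is an assumption made here, and `gloss_span` is not a project tool.

```python
from typing import List, Tuple

Phase = Tuple[str, int, int]  # (label, start_ms, end_ms)


def gloss_span(phases: List[Phase]) -> Tuple[int, int]:
    """Return (start, end) of a Gloss: start of prep to end of str."""
    prep_starts = [start for label, start, _ in phases if label == "prep"]
    str_ends = [end for label, _, end in phases if label == "str"]
    if not prep_starts or not str_ends:
        raise ValueError("a Gloss needs at least one prep and one str phase")
    # Assumption: with several pairs, span from the first prep to the last str.
    return min(prep_starts), max(str_ends)


# Hypothetical phases of the dominant hand for one signed token.
phases = [
    ("prep", 1000, 1200),
    ("pre-s-h", 1200, 1260),
    ("str", 1260, 1600),
    ("post-s-h", 1600, 1750),
]
print(gloss_span(phases))  # (1000, 1600)
```

In this example the post-stroke hold lies outside the Gloss, since the Gloss ends where the stroke ends.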

We use three writing systems: Japanese in hiragana and kanji, Romanized Japanese, and English. The tier for Japanese in hiragana and kanji is assigned the role of parent and the other two are assigned the role of child, so that the range of a Gloss can be changed in all three in one action in ELAN.

4.4.4 Utterance tiers

Applying the turn constructional unit to signed interactions

One purpose of this study was to apply the concepts of CA (e.g., turn-taking systems (Sacks et al., 1974), repair sequences (Schegloff et al., 1977), etc.) to signed dialogues and signed conversations. CA is the study of naturally occurring speech in social interactions. Sacks, Schegloff, and Jefferson (1974; SSJ) proposed several concepts related to turn-taking systems to analyze spoken conversational data.

We argue that these theoretical and methodological frameworks can be applied to the analysis of signed conversations. SSJ proposed the concept of a turn constructional unit (TCU), which is a fundamental unit that differs from a sentence. SSJ assumed that the participants in a conversation are able to anticipate whether the ongoing TCU will be closed by the current speaker. A TCU sometimes has several possible completion points; phrasal boundaries, intonation unit boundaries, and so on at the end of a TCU are considered transition-relevance places (TRPs).

An utterance consists of one or several Glosses. The annotators, who are native signers, segment utterances using their native intuition. The resulting unit is close to both the utterance and the GU mentioned above. The TCU is represented by seven tiers, including three notations: WordOrderTranslation, Grammatical Gloss and IdiomaticTranslation in ELAN (Fig. 11). Like the Gloss tiers, the translation tiers are multilingual.

4.4.5 NMA tiers

We prepared four tiers for non-manual actions: gaze, mouth, NMA (nodding), and NMA (others). These modalities are sometimes used at the same time; that is, they are not mutually exclusive.


Date: 2015/10/20

Author: Mayumi Bono, Yutaka Osugi, Kouhei Kikuchi, Tomohiro Okada, Paul Cibulka and Mami Kaneko

Created: 2015-11-06 Fri 16:52
