JSL Colloquial Corpus
Data Collection, Translation and Annotation Conventions ( Version 1.1 )

1. Acknowledgements
2. Introduction
3. Data collection
- 3.1. Area, participants and tasks
- 3.2. Procedures for dialogue tasks
4. Translation and Annotation Conventions

Digital Content and Media Sciences Research Division, National Institute of Informatics, Japan, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430

Contact : jsl-corpus-mem@nii.ac.jp

1 Acknowledgements

• Japan Society for the Promotion of Science (JSPS), M. Bono (PI), Y. Osugi, K. Kikuchi, Y. Horiuchi, and N. Harada, A Colloquial Corpus of Japanese Sign Language: The Growth of Linguistic Awareness between Deaf and Hearing, JSPS, Grants-in-Aid for Scientific Research, Category B, 2011-2014; Y. Osugi (PI), M. Bono, W. Takei, R. Kikusawa, Description, Documentation, and Analysis of Lexicon in Japanese Sign Language from Deaf Perspectives, JSPS, Grants-in-Aid for Scientific Research, Category B, 2013-2016; M. Bono (PI), Proposal for Linguistic Description System in Signed Interaction Analysis, JSPS, Grants-in-Aid for Young Scientists, Category A, 2014-2017.

• JSL Colloquial Corpus Project Participants and Collaborators from the deaf community in Gunma, Nara, Nagasaki, and Fukuoka for being involved in the JSL Colloquial Corpus Project.

• Japanese Federation of the Deaf.

• The National Research Association for Sign Language Interpretation (Zen-tsu-ken).

• Prof. Adam Schembri, La Trobe University.

• Prof. Robert Adam, University College London (UCL).

• Prof. Kearsy Cormier, University College London (UCL).

• BSL Corpus Project.

2 Introduction

This document describes the data collection, translation and annotation conventions used to build and share the JSL Colloquial Corpus. In addition, it presents our linguistic motivations for building sign language corpora by explaining several work procedures. We began building a corpus of Japanese Sign Language (JSL) in April 2011 with the support of the Japan Society for the Promotion of Science. This is the first JSL corpus developed for academic and public use.

In 2011, we invited Prof. Adam Schembri, then principal investigator of the BSL corpus project, to Japan to help us create such a corpus. Our initial steps in building a JSL corpus were based on advice from him and his colleagues.

3 Data collection

This chapter consists of two sections: 1) Area, participants and tasks, and 2) Procedures for dialogue tasks.

3.1 Area, participants and tasks

Figure 1: Regions and prefectures from which data were collected

The first stage of this project was funded as Category B, 1,911,000 JPY (10,462 GBP) (PI: M. Bono), from 2011 to 2014 by the Japan Society for the Promotion of Science (JSPS). From May to July 2012, we filmed 40 deaf participants in two prefectures, Gunma and Nara (colored in yellow in Fig. 1), each of which has one school for the deaf. In each prefecture we obtained data from an age-balanced sample of individuals in their 30s, 40s, 50s, 60s and 70s, and each age group was divided into same-sex pairs, both male and female. We used three methods to collect data: interviews, in which field workers (assistants who were native signers living in the same area and who knew the procedures in advance) asked participants about their language life, environment and so on (for introductory purposes only, not open access); dialogues about an animation, in which one participant memorized the story of "Canary Row" and explained it to the other participant; and lexical elicitation, in which participants produced the signs corresponding to 100 slides of pictures and words shown on a monitor.

3.2 Procedures for dialogue tasks

This section consists of six parts: 1) Devices and settings, 2) Synchronizing and cropping of multiple video clips, 3) Prefecture ID, 4) Participant ID, 5) File names, and 6) Tier names.

3.2.1 Devices and settings

We used three high-definition cameras, four lighting devices, blue panels, and blue chairs for the recordings (Fig.2 & Fig.3). During the dialogue task, camera A showed the two participants from the knees up; camera B focused on the participant on the left, also showing the back of the other participant; and camera C focused on the participant on the right, also showing the back of the other participant. The camera angles and spatial configuration were designed to enable spatial reproducibility in the service of annotating gaze direction and pointing during the dialogues. We added a sound cue made by a clapperboard to each file for timing synchronization later.

Figure 2: Camera and lighting setup

3.2.2 Synchronizing and cropping of multiple video clips

Figure 3: Two versions of the three camera angles used for data collection: Original (upper), Cropped (lower)

The three independent files were synchronized using Final Cut Pro. The original combined-angles image includes the interlocutor’s back recorded by cameras B and C; there is also dead space, shown as black areas. The cropped combined-angles image does not include the interlocutor’s back and has no dead space. Video images from all camera angles were enlarged to make them easy to see for detailed analysis. All clips were made available on our website.

Table 1: PrefectureID
	Region	Kanji	Romanized Alphabet	Abbr.	Local/regional Alphabet	Abbr.
1	Hokkaido	北海道	Hokkaido	HK
2	Tohoku	青森	Aomori	AM
3		岩手	Iwate	IT
4		宮城	Miyagi	MG
5		秋田	Akita	AT
6		山形	Yamagata	YG
7		福島	Fukushima	FS
8	Kanto	茨城	Ibaraki	IK
9		栃木	Tochigi	TG
10		群馬	Gunma	GM
11		埼玉	Saitama	ST
12		千葉	Chiba	CB
13		東京	Tokyo	TO
14		神奈川	Kanagawa	KN
15	Chubu	新潟	Niigata	NG
16		富山	Toyama	TY	Takaoka-shi
					Toyama-shi
17		石川	Ishikawa	IS
18		福井	Fukui	FI
19		山梨	Yamanashi	YN
20		長野	Nagano	NN
21		岐阜	Gifu	GF
22		静岡	Shizuoka	SO
23		愛知	Aichi	AC
24	Kinki	三重	Mie	ME
25		滋賀	Shiga	SI
26		京都	Kyoto	KT
27		大阪	Oosaka	OS
28		兵庫	Hyogo	HG
29		奈良	Nara	NR
30		和歌山	Wakayama	WK
31	Chugoku	鳥取	Tottori	TT
32		島根	Shimane	SN
33		岡山	Okayama	OY
34		広島	Hiroshima	HS
35		山口	Yamaguchi	YC
36	Shikoku	徳島	Tokushima	TK
37		香川	Kagawa	KW
38		愛媛	Ehime	EH
39		高知	Kouchi	KC
40	Kyushu	福岡	Fukuoka	FO	Fukuoka-shi	FK
					Kitakyusyu-shi	KT
					Nogata-shi	NG
					Kurume-shi	KR
41		佐賀	Saga	SG
42		長崎	Nagasaki	NS	North	NH
					South	SH
43		熊本	Kumamoto	KM
44		大分	Ooita	OI
45		宮崎	Miyazaki	MZ
46		鹿児島	Kagoshima	KS
47		沖縄	Okinawa	ON

3.2.3 Prefecture ID

There are 47 prefectures in Japan, and we assigned an ID to each prefecture using an abbreviated form by referring to the BSL corpus project.

In the first stage of this project we used one-site filming in Nara and Gunma prefectures, and in the second stage we used several-site filming in Nagasaki and Fukuoka prefectures. The difference between the two is whether we divide the prefecture into two or more regions. Where a prefecture has one Deaf school, we used one-site filming. Where a prefecture has two or more Deaf schools, we divided it into regions aligned with each Deaf school's area.

3.2.4 Participant ID

In the animation task, the narrator who had watched "Canary Row" sat on the right (as we viewed the stage). The recipient who had not seen it sat on the left (as we viewed the stage). We placed numbers on the back of each chair to identify each participant for the purposes of data analysis. We set the abbreviation for the prefecture as shown in Table 1. The camera operator put the prefecture ID and the number of each participant on the back of the chair in order of appearance, e.g. GM01, narrator of the first pair in the animation task, and GM02, recipient of the first pair in the animation task, in Gunma Prefecture.

Figure 4: Participant ID tag

(Participant IDs were assigned temporarily by the camera crew during filming and were modified afterwards, e.g. G-01 to GM_01_50.)

3.2.5 File names

Video files, Word files and ELAN eaf files were named as per the following example. In ***, we put the following task names: Ani (animation task), Cur (Curry recipe task), Pro (Proud of your country task), ReS (Regional Signing), Lex (lexical elicitation), and Int (Interview).

Figure 5: File name

Gunma prefecture_01 (right)_02 (left)_Animation task

3.2.6 Tier names

Figure 6: Tier name (one-site filming)

Figure 7: Tier name (several-site filming)

Table 2: List of file name and tier name
Prefecture	Pair	File name	Tier name (right)	Tier name (left)
Gunma	1	GM_01-02	GM_01_50F	GM_02_50F
	2	GM_03-04	GM_03_70M	GM_04_70M
	3	GM_05-06	GM_05_30M	GM_06_30M
	4	GM_07-08	GM_07_30F	GM_08_30F
	5	GM_09-10	GM_09_60F	GM_10_60F
	6	GM_11-12	GM_11_70F	GM_12_70F
	7	GM_13-14	GM_13_40F	GM_14_40F
	8	GM_15-16	GM_15_40M	GM_16_40M
	9	GM_17-18	GM_17_50M	GM_18_50M
	10	GM_19-20	GM_19_60M	GM_20_60M

Prefecture	Pair	File name	Tier name (right)	Tier name (left)
Nara	1	NR_01-02	NR_01_70M	NR_02_70M
	2	NR_03-04	NR_03_50F	NR_04_50F
	3	NR_05-06	NR_05_60M	NR_06_60M
	4	NR_07-08	NR_07_70F	NR_08_70F
	5	NR_09-10	NR_09_60M	NR_10_60M
	6	NR_11-12	NR_11_40M	NR_12_40M
	7	NR_13-14	NR_13_40F	NR_14_40F
	8	NR_15-16	NR_15_30M	NR_16_30M
	9	NR_17-18	NR_17_50M	NR_18_50M
	10	NR_19-20	NR_19_40M	NR_20_40M

Prefecture	Pair	File name	Tier name (right)	Tier name (left)
Nagasaki	1	NS_01-02	NS_01_SH_70F	NS_02_SH_70F
	2	NS_03-04	NS_03_SH_30F	NS_04_SH_50F
	3	NS_05-06	NS_05_NH_70F	NS_06_SH_70F
	4	NS_07-08	NS_07_SH_50M	NS_08_NH_30M
	5	NS_09-10	NS_09_SH_60M	NS_10_SH_70M
	6	NS_11-12	NS_11_SH_40F	NS_12_SH_40F
	7	NS_13-14	NS_13_NH_70M	NS_14_NH_70M
	8	NS_15-16	NS_15_SH_40M	NS_16_SH_40M

Prefecture	Pair	File name	Tier name (right)	Tier name (left)
Fukuoka	1	FO_01-02	FO_01_KT_70F	FO_02_KT_60F
	2	FO_03-04	FO_03_NG_50F	FO_04_NG_40F
	3	FO_05-06	FO_05_FK_80F	FO_06_FK_70F
	4	FO_07-08	FO_07_FK_40F	FO_08_FK_50F
	5	FO_09-10	FO_09_KR_50M	FO_10_KR_50M
	6	FO_11-12	FO_11_KT_70M	FO_12_KT_70M
	7	FO_13-14	FO_13_NG_70M	FO_14_NG_70M
	8	FO_15-16	FO_15_KR_70M	FO_16_KR_70M

Prefecture	Pair	File name	Tier name (right)	Tier name (left)
Ishikawa	1	IS_01-02	IS_01_70M	IS_02_70M
	2	IS_03-04	IS_03_20M	IS_04_20M
	3	IS_05-06	IS_05_50M	IS_06_50M
	4	IS_07-08	IS_07_80M	IS_08_80M
	5	IS_09-10	IS_09_60M	IS_10_60M
	6	IS_11-12	IS_11_30M	IS_12_30M
	7	IS_13-14	IS_13_40M	IS_14_40M
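
The tier names in Table 2 follow a regular pattern: prefecture abbreviation, participant number, an optional regional abbreviation (used for several-site filming), and an age decade followed by M or F. The following is a minimal Python sketch of how such names could be split into their parts. The pattern is inferred from the table rows only, and the names parse_tier_name and TierName are hypothetical, not part of the corpus tooling.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from Table 2 (an assumption, not an official specification):
# PREFECTURE_NUMBER[_REGION]_DECADE+SEX, e.g. GM_01_50F or NS_05_NH_70F.
TIER_NAME = re.compile(
    r"^(?P<prefecture>[A-Z]{2})_(?P<number>\d{2})"
    r"(?:_(?P<region>[A-Z]{2}))?"
    r"_(?P<decade>\d{2})(?P<sex>[MF])$"
)


class TierName(NamedTuple):
    prefecture: str        # prefecture abbreviation from Table 1, e.g. "GM"
    number: int            # participant number within the prefecture
    region: Optional[str]  # regional abbreviation, None for one-site filming
    decade: int            # age decade, e.g. 50 for a participant in their 50s
    sex: str               # "M" or "F"


def parse_tier_name(name: str) -> TierName:
    """Split a Table 2 tier name into its components."""
    m = TIER_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised tier name: {name!r}")
    return TierName(
        m["prefecture"], int(m["number"]), m["region"], int(m["decade"]), m["sex"]
    )
```

For example, parse_tier_name("FO_01_KT_70F") yields the prefecture FO, participant 1, region KT, decade 70 and sex F, while a one-site name such as "GM_01_50F" has no region.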

4 Translation and Annotation Conventions

This chapter consists of four sections: 1) practical steps, 2) representation of time, 3) translation conventions, 4) annotation conventions.

4.1 Practical steps

We perform two basic steps when making annotations. 1) Translation into text: sign language interpreters translate sign language into written Japanese, creating Gloss, Word-order translations, and Idiomatic translations in Microsoft Word. They then ask native signers who live in different regions of Japan to check the translations using their native intuitions. 2) Annotation: referring to the Word text created in step 1, native signers annotate the features of hand movements for each Gloss unit and each utterance unit in ELAN to observe the temporal relationships between or within them.

Figure 8: Practical steps in translation and annotation

After making the translations and annotations, we revise each file iteratively, using the findings noted while working on the other, as shown in Fig. 8. We take care to keep the information in these files exactly the same.

4.2 Representation of time

Figure 9: Schematic of translation in Microsoft Word

Figure 10: Schematic image of annotation in ELAN

When translated into Word (Fig. 9), each line has an independent time axis. If line 01 and line 02 were to overlap, it would be difficult to represent their temporal relationship. Conversely, because annotations in ELAN (Fig. 10) share one time axis, it is easy to represent a micro-pause or show that two lines overlap. However, because ELAN requires scrolling to see the following lines, it can be difficult to find a sequential relationship, e.g. question and answer in speech act theory, conversation analysis (CA), and the discourse structure of narratives. In this project we are trying not only to collect data but also to build a hybrid notation system like a 'transcript' in CA (Jefferson, 1986).

4.3 Translation Conventions

4.3.1 Gloss

Translation at the Gloss level is a common method both for beginners learning sign language and for training interpretation skills at an advanced level.

EXAMPLE
（Gloss-ENG）	NOW/PT/COMIC1(M:ma-n-ga)/THEATER-PLAY (M:a-ni-me)/PT/=
			=FS: A(M:a)NI(M:ni)ME(M:me)/SIGN-LANGUAGE/WHAT/PT:G02/

We explain how each symbol in the translation conventions is aligned, using the example above.

NOW : Gloss (word), which is usually in a lexical database

/ : a Gloss boundary

PT : Pointing by hands. Here, we do not specify which hand points. When the direction and reference of pointing are ambiguous, we do not label anything after PT.

COMIC1 : When there are several representations of this meaning, we add a number to distinguish one from others.

THEATER-PLAY : A hyphen joins the words when the sign consists of one movement or one Gloss but the spoken language needs two words to represent the same meaning.

M:a-ni-me : Mouthing of a-ni-me (an abbreviation of 'animation'), represented in mora units; here it is a combination of three morae, a+ni+me. ‘COMIC1 (M:ma-n-ga)’ means the signer moves her/his lips as ma-n-ga while signing COMIC1. Where there are no parentheses around M:a-ni-me, the mouth movement has a syntagmatic relation with other Glosses.

= : Sign production continues across a line break.

FS: A-NI-ME : Finger-spelling of A-NI-ME, represented in mora units in the same way as mouthing.

PT:G02 : Signer points at interlocutor in front of her or him by hand. Usually this is PT2 in previous studies of the grammatical aspect of signing. We try to avoid putting the function of indexicality in the translation level.

HS:KI : The signer uses a specific hand shape when producing a sign. There are cases where two phonetically distinct signs are represented with the same gloss, and this description is used in order to specify which sign is being employed.

Here we explain other symbols that have already been defined during the working process.

MG: pa : Mouth gesture that has a grammatical function (e.g., tense, pronoun, etc.)

CLIMBING-UP (CL: drainpipe) : The Gloss includes a classifier (CL) representation. In this case, the CL representation is classified by the verb type, climbing up, inside a drainpipe.

CL : Although it is impossible for the annotators to analyze what kind of CL it is, there is some kind of CL here.

? : Impossible to read.

?: CAT : Although it is impossible for the annotators to read clearly, they have some candidates in mind (in this case “cat”).

D : Although it is impossible to read, there is some signing (roughly equivalent to a disfluency).

D: CAT : Although it is impossible to read, there is some signing (roughly equivalent to a disfluency). The annotators have some candidates (in this case “cat”).

PT: object name : Pointing to a concrete object.

PT: CL : Pointing to CL, which is signing with the other hand

PT: CL-lost : Pointing to the space where CL was previously represented

PT: RU : Pointing in an upper right direction (from the signer’s perspective)

PT: RU (PET BOTTLE) : Pointing in an upper right direction with the meaning of PET bottle in context

PT: RU (person’s name) : Pointing in an upper right direction with the meaning of 3rd person
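
The symbols above are regular enough to be processed mechanically. The following is a minimal Python sketch, under the assumption that a Gloss line uses only the conventions described in this section: '/' as the Gloss boundary, '=' to mark continuation across a line break, and '(M:...)' for mouthing. The function names split_glosses and parse_gloss are hypothetical and are not part of the project's tools.

```python
import re

# Mouthing annotation such as (M:ma-n-ga); the captured group is the mouthed morae.
MOUTHING = re.compile(r"\(M:\s*([^)]*)\)")


def split_glosses(lines):
    """Join continuation lines (marked with '=') and split on the '/' boundary."""
    joined = "".join(line.strip().strip("=") for line in lines)
    return [token.strip() for token in joined.split("/") if token.strip()]


def parse_gloss(token):
    """Return (label, mouthings) for one Gloss token.

    The label is the token with every (M:...) annotation removed;
    mouthings lists the mouthed strings in order of appearance.
    """
    mouthings = MOUTHING.findall(token)
    label = MOUTHING.sub("", token).strip()
    return label, mouthings
```

Applied to the two-line example at the start of this section, split_glosses returns nine Gloss tokens, and parse_gloss("COMIC1(M:ma-n-ga)") separates the label COMIC1 from the mouthing ma-n-ga.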

4.3.2 Word-order translation (WOT)

The word-order translation (WOT) tier serves to maintain the original word order; at this level, the text in translation is very consciously written in a grammatically inaccurate manner. This kind of translation always conveys a strong impression to the audience, illustrating how much signed language differs from spoken language.

EXAMPLE
（WOT-ENG） 	Now, comic, theater play, animation... Animation, how do you sign?

4.3.3 Idiomatic translation (IT)

The idiomatic translation (IT) tier gives the ideal, well-formed sentence in the target language, in this case English.

EXAMPLE
（IT-ENG）	I just watched a cartoon, uhm, how do you sign "cartoon"?

4.3.4 Multilingual environment, line numbers and participant ID

English and Japanese translations were prepared to establish a multilingual environment for language research and communication studies of sign languages. Lines in the two languages correspond one to one.

EXAMPLE: 
01>
GM01: 
（Gloss-JPN）	今/PT/まんが1 (M:ma-n-ga)/劇 (M:a-ni-me)/PT/=
			=FS:A-(M:a)NI-(M:ni)ME-(M:me)/手話/何/PT:G02/

（Gloss-ENG）	NOW/PT/COMIC1(M:ma-n-ga)/THEATER-PLAY (M:a-ni-me)/PT/=
			=FS: A(M:a)NI(M:ni)ME(M:me)/SIGN-LANGUAGE/WHAT/PT:G02/ 

（WOT-JPN）	今　まんが　アニメを…アニメって手話は何　あなたは？

（WOT-ENG） 	Now, comic, theater play, animation... Animation, how do you sign?

（IT-JPN）	今、アニメを見たんだけど、ねえ、アニメって手話はどうやる？

（IT-ENG）	I just watched a cartoon, uhm, how do you sign "cartoon"?

02>

GM02:

（Gloss-JPN）	まんが1(M:a-ni-me)/まんが2(M:cont.)/まんが1(M:a-ni-me)/=

			=まんが1(M:cont.)

（Gloss-ENG）	COMIC1(M:a-ni-me)/COMIC2 (M:cont.)/COMIC1 (M:a-ni-me)/= 

			=COMIC1(M:cont.)/

（WOT-JPN）	まんが　まんが　まんが

（WOT-ENG）	Comic, comic, comic

（IT-JPN）	うーん、こうかな？

（IT-ENG）	Uhm, like this (I guess)?

03>

GM01: 

（Gloss-JPN）	まんが1+まんが2 (M:a-ni-me)/(.)/まんが2/まんが1(M:a-ni-me)/=

			=PT/見た/PT/見た/PT/

（Gloss-ENG）	COMIC1+COMIC2(M:a-ni-me) /(.)/COMIC2/COMIC1 (M:a-ni-me)/=

			=PT/WATCHED/PT /WATCHED/PT/

（WOT-JPN）	まんが　まんが　まんが　まんが…まんがを見た　見た。

（WOT-ENG）	Comic, comic, comic, comic…, (I) watched a comic, watched.

（IT-JPN）	こうか…、で、アニメを見たの。

（IT-ENG）	Okay, like this. So, I just watched a cartoon.
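
The layout of the example above (a line number such as "01>", a participant ID such as "GM01:", and tier lines introduced by labels such as （Gloss-ENG）) can be read into a simple data structure. The following Python sketch assumes exactly this layout, including the full-width parentheses around tier labels; parse_transcript and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

LINE_NO = re.compile(r"^(\d+)>\s*$")            # e.g. "01>"
SPEAKER = re.compile(r"^([A-Z]{2}\d{2}):\s*$")  # e.g. "GM01:"
TIER = re.compile(r"^（(Gloss|WOT|IT)-(JPN|ENG)）\s*(.*)$")


def parse_transcript(text):
    """Return one record per numbered line: line number, speaker, and tiers.

    Each tier maps to a list of text lines, so that a Gloss continued
    across a line break (marked with '=') keeps its original line split.
    """
    records = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := LINE_NO.match(line):
            current = {"line": m.group(1), "speaker": None, "tiers": {}}
            records.append(current)
            last_key = None
        elif current is None:
            continue  # text before the first numbered line, e.g. "EXAMPLE:"
        elif m := SPEAKER.match(line):
            current["speaker"] = m.group(1)
        elif m := TIER.match(line):
            last_key = f"{m.group(1)}-{m.group(2)}"
            current["tiers"][last_key] = [m.group(3).strip()]
        elif last_key is not None:
            current["tiers"][last_key].append(line)  # continuation line
    return records
```

Because the Japanese and English lines coincide, a record built this way holds both languages of the same line side by side, for example under the keys "IT-JPN" and "IT-ENG".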

4.4 Annotation conventions

Applying gesture phases to signing movements

One original aspect of our approach was to establish a physical hand movement unit smaller than the Gloss, called a Movement Unit. We applied the concept of the gesture unit (GU) proposed by Kendon (1972, 1980, 2004) to annotate the beginning and end points of signed turns. The GU is the interval between successive rests of the limbs, rest positions, or home positions. A GU consists of one or several gesture phrases. A gesture phrase is what we intuitively call a “gesture,” and it, in turn, consists of up to five phases: preparation (optional), stroke (obligatory in the sense that a gesture is not said to occur in the absence of a stroke), retraction (optional), and pre- and post-stroke hold phases (optional). When analyzing overlapping communications in conversations, it is important to note the timing of the expressions of both the signer and the recipient. In signed conversations, articulation involves hand signs that appear in front of the participants; this process of articulation is comparable to the visible lip movements made by those involved in spoken conversations. Using this methodology, we can observe how participants engage in an articulation phase in which signers move their hands from the home position to the signing space as a signal for the start of turn-taking in interactions.

4.4.1 Setting tiers in ELAN

We prepared 17 tiers per participant in ELAN. In the case of a dialogue task, we have 34 tiers in total (Fig. 11).

Figure 11: Tiers in ELAN

4.4.2 Movement tiers

Hand signing is divided into right hand and left hand per signer (RH and LH). These tiers have several labels for annotating each movement in signing, including preparation (prep), pre-stroke hold (pre-s-h), stroke (str), post-stroke hold (post-s-h), retraction (ret), and more. We regard the preparation and stroke movements as the meaningful components that constitute a signed token.

prep : Preparation phase of signing. Signers raise their hands from the home or rest position to the signing space.

pre-s-h : Pre-stroke hold. The phase in which the hand shape and the hand position are sustained before the next stroke phase.

str : Stroke. The phase in which the core part of a sign is presented, with the hand changing shape and moving within the signing space.

post-s-h : Post-stroke hold. The phase in which the hand shape and the hand position are sustained after the previous stroke phase.

ret : Retraction. The phase in which the hands are returned to the home position or rest.

hold : An independent holding phase.

4.4.3 Gloss tiers

The Gloss, called Word_for_Word in ELAN (Fig. 12), basically consists of at least one pair of prep and stroke. It is a signed token and may include non-lexical elements, e.g. disfluencies and truncated signs. Because there is not yet a well-structured lexical database for JSL, we have not linked Glosses to a lexical database as the BSL corpus does (Johnston 2008).

We assign one Gloss tier per signer; that is, we do not separate the hands as the BSL corpus does. Where each hand has a different meaning at the same time, we write ‘R:/PT+L:/3’, which means the right hand represents PT (pointing) and the left hand represents the number 3.

EXAMPLE

L:/3 + R:/PT:L-3

Figure 12: List buoy

The start point of Gloss is coincident with the beginning of prep of the dominant or meaningful hand at that time, and the end point of Gloss is coincident with the ending of str.
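
The alignment rule above can be illustrated with a short Python sketch. It assumes that the movement annotations of the dominant or meaningful hand are available as (label, start, end) triples in milliseconds, using the labels from section 4.4.2. When a Gloss contains more than one prep or str, the sketch takes the earliest prep and the latest str, which is an assumption made for illustration; gloss_span is a hypothetical name.

```python
from typing import List, Tuple

# One movement annotation: (label, start_ms, end_ms), e.g. ("prep", 1000, 1200).
Phase = Tuple[str, int, int]


def gloss_span(phases: List[Phase]) -> Tuple[int, int]:
    """Return (start_ms, end_ms) of a Gloss from its movement phases.

    The Gloss starts where prep begins and ends where str ends;
    hold and retraction phases do not extend the Gloss.
    """
    prep_starts = [start for label, start, _ in phases if label == "prep"]
    str_ends = [end for label, _, end in phases if label == "str"]
    if not prep_starts or not str_ends:
        raise ValueError("a Gloss needs at least one prep and one str phase")
    return min(prep_starts), max(str_ends)
```

For the phases prep (1000 to 1200), pre-s-h (1200 to 1250), str (1250 to 1600) and post-s-h (1600 to 1800), the Gloss runs from 1000 to 1600, so the post-stroke hold falls outside it.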

We use three writing systems: Japanese in hiragana and kanji, Romanized Japanese, and English. We assign the role of parent to Japanese in hiragana and kanji and the role of child to the other two systems, so that the range of a Gloss can be changed in one action in ELAN.

4.4.4 Utterance tiers

Applying the turn constructional unit to signed interactions

One purpose of this study was to apply the concepts of CA (e.g., turn-taking systems (Sacks et al., 1974), repair sequences (Schegloff et al., 1977), etc.) to signed dialogues and signed conversations. CA is the study of naturally occurring speech in social interactions. Sacks, Schegloff, and Jefferson (1974; SSJ) proposed several concepts related to turn-taking systems to analyze spoken conversational data.

We argue that these theoretical and methodological frameworks can be applied to the analysis of signed conversations. SSJ proposed the concept of a turn-constructional unit (TCU), a fundamental unit that differs from a sentence. SSJ assumed that the participants in a conversation are able to anticipate whether the ongoing TCU will be closed by the current speaker. One TCU sometimes has several possible completion points, such as phrasal boundaries and intonation unit boundaries; those at the end of a TCU are considered transition-relevance places (TRPs).

An utterance consists of one or several Glosses. The annotators, who are native signers, segment utterances using their native intuitions. This unit is close to the utterance and the GU mentioned above. A TCU is represented by seven tiers, including three notations: Word_Order_Translation, Grammatical Gloss and Idiomatic_Translation in ELAN (Figure 3). Like the Gloss, the translations are multilingual.

4.4.5 NMA tiers

We prepared four tiers for non-manual actions (NMA): gaze, mouth, NMA (nodding), and NMA (others). These modalities are sometimes used at the same time; that is, they are not mutually exclusive.
