System Description Form

Japanese English Information Retrieval Task

[JAPANESE] [NTCIR Home] [Workshop Home] [Task Description] [Data Sample]
(last modified on June 26, 2000)



Note:

1. Each group participating in the Japanese-English IR task is
   asked to complete this form and submit it by 18th September
   2000.

   If you have any problem submitting the result via this
   CGI form, please download the text version of this system
   description form. After filling in the form, please upload
   the file to ftp://falcon.rd.nacsis.ac.jp (136.187.19.31).
   We would also like you to send the file to ntc-admin
   by e-mail.

2. If your group submits more than one search result file for
   the Japanese-English IR task, please submit a system
   description form for each run.

3. If the participation type of your group is "A. Full", please
   complete every item in this form. When the appropriate data
   for a specific item is not available to you, please state
   "data is not available".

4. If the participation type of your group is "B. Results only",
   you do not have to describe details of the system for items
   that may cause problems because of trade secrets, patents,
   or the like. Please complete the items that are not
   problematic for you or your project.


(a) Group's ID
(b) Run ID
(c) Task
J-J J-E J-JE E-E E-J E-JE

(d) Search result filename, as transferred to the ftp site
(e) Query term list?
yes no
(f) Order of priority
(g) Document files used
ntc1-j1.mod ntc2-j0g ntc2-j0k ntc1-e1.mod ntc2-e0g
ntc2-e0k ntc1-j1-wakachi ntc2-w0g ntc2-w0k
(h) Search topics used
topics-j101-150 topics-e101-150 topic-w101-150
(i) Fields of search topics used in the run
TITLE DESCRIPTION NARRATIVE CONCEPT FIELD
(j) Method used in building queries
automatic interactive
* NTCIR-1 = NACSIS Test Collection 1, NTCIR-2 = NII Test Collection 2


1 Indexing

1.1 Indexing

1) Index units for Japanese text
uni-gram bi-gram other n-gram word phrase word+phrase
other

2) Index units for English text
n-gram word phrase word+phrase
other

3) Index units for English terms embedded in Japanese sentences
n-gram word phrase word+phrase
other

4) Method(s) used in indexing
NTCIR's segmented data lexicon morphological analysis
other

5) Method(s) used in selection of index terms
stop word type of characters part of speech
other

6) Standardizing terms (characters)?
yes Please specify
no

7) Stemming Algorithm?
yes Please specify
no

8) Term Weighting?
yes Please specify
no

9) Phrase identification?
yes no
- Method used
statistical syntactic other
- Method to construct phrases
Please specify

10) Syntactic Parsing?

yes Please specify
no

11) Thesaurus and/or lexical resources?
yes Please specify
no

12) Word sense disambiguation?
yes Please specify
no

13) Spelling checking (including manual checking)?
yes Please specify
no

14) (If 13 is yes) Correcting the errors?
yes Please specify
no

15) Proper noun identification?
yes Please specify
no

16) Method(s) used in selecting index terms?
Please specify

17) Use "YOMI" of Japanese text?
yes no

18) (If 17 is yes) Method(s) used to generate "YOMI"
Please specify

19) Other method(s) used in indexing (Please specify)
Please specify

1.2 Index data structures built from NTCIR-1&2

1) Kind of index structures
inverted index clusters signature files pat-tree knowledge bases
other (Please specify)

2) Summary of index
- Total storage [MB]
- Total time to build [minutes]
- Automatic process? (If not, manual time in minutes)
yes no [minutes]
- Use of positional information (off-set)?
yes no


1.3 Data built from sources other than NTCIR-1&2

1) Internally-built auxiliary files
- Type of file
thesaurus knowledge base lexicon other
- Total storage [MB]
- Number of concepts represented
- Total computer time to build [hours]
- Use of manual labor?
yes no
- Total manual time to build [hours]

2) Externally-built auxiliary files (including commercial products)
Please specify

2 Query construction

2.1 Automatically constructed queries

1) Average computer time to build query [in CPU seconds]
[seconds]

2) Method(s) used in building queries

- Tokenizing
uni-gram bi-gram other n-gram word phrase word+phrase
other Please specify
- Phrase identification from topics?
yes Please specify
no
- Syntactic parsing?
yes Please specify
no
- Word sense disambiguation?
yes Please specify
no

- Proper noun identification?
yes Please specify
no

- Automatic expansion of queries?
yes Please specify
no
- Automatic addition of Boolean/proximity operators?
yes Please specify
no

- Other(s)
Please specify

2.2 Manually constructed queries

1) Average time to construct query
[minutes]

2) Person constructing queries

- Domain expert
yes no
- Member of the group which has developed the system
yes no
- Computer system expert
yes no
- Other
yes Please specify
no

3) IR system experience
less than once a month more than once a month more than once a week every day

4) Understanding of the retrieval system used
good understanding fair understanding cannot tell poor understanding no understanding

5) Advice from a specialist in the field
%

6) Tools used in building query

- Word frequency list?
yes Please specify
no
- Knowledge base?
yes Please specify
no
- Other lexical tools (e.g. thesaurus, lexicon)?
yes Please specify
no
7) Method used in query construction

- Term weighting?
yes Please specify
no
- Boolean operators (AND, OR, NOT)?
yes Please specify
no
- Proximity operators?
yes Please specify
no
- Addition of terms not included in topics?
yes no
- Other (Please specify)
yes Please specify
no

2.3 Interactive queries

1) Initial query constructed automatically or manually
auto manual

2) Person doing interaction

- Domain expert
yes no in part
- Member of the group which has developed the system
yes no
- Computer system expert
yes no
- Other
yes Please specify
no

3) IR system experience
less than once a month more than once a month more than once a week every day

4) Understanding of the retrieval system used
good understanding fair understanding cannot tell poor understanding no understanding

5) Advice from a specialist in the field
%

6) Average time to complete an interaction

[minutes]

7) What determines the end of an interaction

Please specify

8) Method(s) used in interaction

- Automatic term reweighting from relevant documents (relevance feedback)?
yes no
- Query expansion from relevant documents (relevance feedback)?
yes no
- Manual query modification?
yes no

3 Searching

3.1 Search times

1) Computer time to search (average per query, in CPU seconds)
[seconds]

3.2 Searching methods

1) Vector space model?
yes Please specify
no

2) Probabilistic model?
yes Please specify
no

3) Other
Please specify

3.3 Factors in ranking

1) TF (Term Frequency)?
yes no

2) IDF (Inverse document frequency)?
yes no

3) Other term weights?
yes Please specify
no

4) Semantic closeness?
yes Please specify
no

5) Positional information in the document?
yes Please specify
no

6) Syntactic clues?
yes Please specify
no

7) Proximity of terms?
yes Please specify
no

8) Document length?
yes Please specify
no

9) Other
Please specify

3.4 Machine information

1) Machine type for the experiment

2) Was the machine dedicated or shared?
dedicated shared

3) Amount of hard disk storage [GB]

4) Amount of RAM [MB]

5) Clock rate of CPU [MHz]

3.5 Others

1) Brief description of features of your system not answered by above questions (Please specify)


2) Others (Please specify)


3) Your group has

- Japanese native speaker(s)
yes no
- Member(s) who can understand Japanese language
yes no


[In case of J-E, E-J, J-JE, or E-JE runs]

4 Overall Approach of Cross-Lingual IR task

1) What basic approach do you take to Cross-Lingual Retrieval?

- Query Translation
yes no
- Document Translation
yes no
- Corpus based approach
yes Please specify
no
- Other
yes Please specify
no

2) (If constructed manually) What is the searcher's level of understanding of Japanese?

- Native Japanese speaker
yes no
- Using dictionaries, he/she can write an academic paper in Japanese
yes no
- Using dictionaries, he/she can read an academic paper in Japanese
yes no
- He/she has studied Japanese for more than three months
yes no
- He/she cannot understand Japanese
yes no
- Other
yes Please specify
no

3) (If constructed manually) What is the searcher's level of understanding of English?

- Native English speaker
yes no
- Using dictionaries, he/she can write an academic paper in English
yes no
- Using dictionaries, he/she can read an academic paper in English
yes no
- He/she has studied English for more than three months
yes no
- He/she cannot understand English
yes no
- Other
yes Please specify
no

4) Spelling checking (including manual checking)?
yes Please specify
no

5) (If 4 is yes) Correcting the errors?
yes Please specify
no

6) Methods used in query translation

6-1 Multilingual dictionary

- Externally-constructed one(s)
yes no
- Internally-constructed one(s)
yes no

6-2 Machine translation system

- Externally-constructed system
yes no
- Internally-constructed system
yes no

6-3 Other(s)
Please specify

6-4 Manual effort involved in translation?
yes Please specify
no

6-5 Query expansion:

- Before query translation
yes no
- After query translation
yes no
- No query expansion
yes no

6-6 Methods used in query expansion
- Relevance feedback
yes no
- Automatic relevance feedback (local context analysis)
yes no
- Global relevance feedback
yes no
- Thesaurus, lexicon, etc.
yes no
- Other
yes Please specify
no

6-7 Disambiguation when translating?
yes Please specify
no

5 Any comments or questions?


