Masao Utiyama and Hitoshi Isahara
- Published in print: 2008
- Published Online: August 2013
- ISBN: 9780262072977
- eISBN: 9780262255097
- Item type: chapter
- Publisher: The MIT Press
- DOI: 10.7551/mitpress/9780262072977.003.0002
- Subject: Computer Science, Machine Learning
Large-scale parallel corpora are indispensable language resources for machine translation (MT). However, there are only a few publicly available large-scale parallel corpora. This chapter describes a Japanese-English patent parallel corpus created from patent families filed in Japan and the United States. The parallel corpus contains about 2 million sentence pairs that were aligned automatically. This is the largest Japanese-English parallel corpus and will be available to the public after the NTCIR-7 workshop meeting.