Building And Using Comparable Corpora

Author: Serge Sharoff
Publisher: Springer Science & Business Media
ISBN: 3642201288
Size: 78.51 MB
Format: PDF, ePub, Mobi
View: 5677

Building And Using Comparable Corpora from the Author: Serge Sharoff. The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Building And Using Comparable Corpora

Author: Serge Sharoff
Publisher: Springer
ISBN: 9783642201271
Size: 33.47 MB
Format: PDF, ePub, Docs
View: 1435


Computational Linguistics And Intelligent Text Processing

Author: Alexander Gelbukh
Publisher: Springer
ISBN: 3319181114
Size: 34.79 MB
Format: PDF, Mobi
View: 1676

Computational Linguistics And Intelligent Text Processing from the Author: Alexander Gelbukh. The two volumes LNCS 9041 and 9042 constitute the proceedings of the 16th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2015, held in Cairo, Egypt, in April 2015. The total of 95 full papers presented was carefully reviewed and selected from 329 submissions. They were organized in topical sections on grammar formalisms and lexical resources; morphology and chunking; syntax and parsing; anaphora resolution and word sense disambiguation; semantics and dialogue; machine translation and multilingualism; sentiment analysis and emotion detection; opinion mining and social network analysis; natural language generation and text summarization; information retrieval, question answering, and information extraction; text classification; speech processing; and applications.

Explorations In Automatic Thesaurus Discovery

Author: Gregory Grefenstette
Publisher: Springer Science & Business Media
ISBN: 1461527104
Size: 77.71 MB
Format: PDF, ePub, Mobi
View: 6537

Explorations In Automatic Thesaurus Discovery from the Author: Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated and a thesaurus is created showing important common terms and their relation to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even on the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results are presented for each in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest. The results are shown to converge to a stable state as the corpus grows. The similarities calculated are compared to those produced by psychological testing. A method of evaluation using Artificial Synonyms is tested. Gold Standards evaluations show that the techniques significantly outperform non-linguistic-based techniques for the most important words in corpora. Explorations in Automatic Thesaurus Discovery includes applications to the fields of information retrieval using established testbeds, existing thesaural enrichment, and semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.

Web As Corpus

Author: Maristella Gatto
Publisher: A&C Black
ISBN: 1441134131
Size: 65.32 MB
Format: PDF, Kindle
View: 1913

Web As Corpus from the Author: Maristella Gatto. Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the "web as corpus". It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.

The Routledge Handbook Of Corpus Linguistics

Author: Anne O'Keeffe
Publisher: Routledge
ISBN: 1135153620
Size: 10.43 MB
Format: PDF, Kindle
View: 5809

The Routledge Handbook Of Corpus Linguistics from the Author: Anne O'Keeffe. The Routledge Handbook of Corpus Linguistics provides a timely overview of a dynamic and rapidly growing area with a widely applied methodology. Through the electronic analysis of large bodies of text, corpus linguistics demonstrates and supports linguistic statements and assumptions. In recent years it has seen an ever-widening application in a variety of fields: computational linguistics, discourse analysis, forensic linguistics, pragmatics and translation studies. Bringing together experts in the key areas of development and change, the handbook is structured around six themes which take the reader through building and designing a corpus to using a corpus to study literature and translation. A comprehensive introduction covers the historical development of the field and its growing influence and application in other areas. Structured around five headings for ease of reference, each contribution includes further reading sections with three to five key texts highlighted and annotated to facilitate further exploration of the topics. The Routledge Handbook of Corpus Linguistics is the ideal resource for advanced undergraduates and postgraduates.

Hybrid Approaches To Machine Translation

Author: Marta R. Costa-jussà
Publisher: Springer
ISBN: 3319213113
Size: 53.34 MB
Format: PDF, Docs
View: 4523

Hybrid Approaches To Machine Translation from the Author: Marta R. Costa-jussà. This volume provides an overview of the field of Hybrid Machine Translation (MT) and presents some of the latest research conducted by linguists and practitioners from different multidisciplinary areas. Nowadays, most important developments in MT are achieved by combining data-driven and rule-based techniques. These combinations typically involve hybridization of different traditional paradigms, such as the introduction of linguistic knowledge into statistical approaches to MT, the incorporation of data-driven components into rule-based approaches, or statistical and rule-based pre- and post-processing for both types of MT architectures. The book is of interest primarily to MT specialists, but also – in the wider fields of Computational Linguistics, Machine Learning and Data Mining – to translators and managers of translation companies and departments who are interested in recent developments concerning automated translation tools.

Multimedia And Network Information Systems

Author: Aleksander Zgrzywa
Publisher: Springer
ISBN: 3319439820
Size: 58.68 MB
Format: PDF
View: 6098

Multimedia And Network Information Systems from the Author: Aleksander Zgrzywa. Recent years have seen remarkable progress on both advanced multimedia data processing and intelligent network information systems. The objective of this book is to contribute to the development of multimedia processing and intelligent information systems and to provide researchers with the essentials of current knowledge, experience and know-how. Although many aspects of such systems have already been under investigation, there are many new ones that wait to be discovered and defined. The book contains a selection of 36 papers based on original research presented during the 10th International Conference on Multimedia & Network Information Systems (MISSI 2016) held on 14–16 September 2016 in Wrocław, Poland. The papers provide an overview of the achievements of researchers from several countries in three continents. The volume is divided into five parts: (a) Images and Videos - Virtual and Augmented Reality, (b) Voice Interactions in Multimedia Systems, (c) Tools and Applications, (d) Natural Language in Information Systems, and (e) Internet and Network Technologies. The book is an excellent resource for researchers, those working in multimedia, Internet, and Natural Language technologies, as well as for students interested in computer science and other related fields.

Cross Language Information Retrieval

Author: Gregory Grefenstette
Publisher: Springer Science & Business Media
ISBN: 1461556619
Size: 39.82 MB
Format: PDF, Kindle
View: 4804

Cross Language Information Retrieval from the Author: Gregory Grefenstette. Most of the papers in this volume were first presented at the Workshop on Cross-Linguistic Information Retrieval that was held August 22, 1996 during the SIGIR'96 Conference. Alan Smeaton of Dublin University and Paraic Sheridan of the ETH, Zurich, were the two other members of the Scientific Committee for this workshop. SIGIR is the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval, and they have held conferences yearly since 1977. Three additional papers have been added: Chapter 4, Distributed Cross-Lingual Information Retrieval, describes the EMIR retrieval system, one of the first general cross-language systems to be implemented and evaluated; Chapter 6, Mapping Vocabularies Using Latent Semantic Indexing, which originally appeared as a technical report in the Laboratory for Computational Linguistics at Carnegie Mellon University in 1991, is included here because it was one of the earliest, though hard-to-find, publications showing the application of Latent Semantic Indexing to the problem of cross-language retrieval; and Chapter 10, A Weighted Boolean Model for Cross Language Text Retrieval, describes a recent approach to solving the translation term weighting problem, specific to Cross-Language Information Retrieval. Gregory Grefenstette. CONTRIBUTORS: Lisa Ballesteros and W. Bruce Croft (Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts); David Hull and Gregory Grefenstette (Xerox Research Centre Europe, Grenoble Laboratory); Thomas K. Landauer (Department of Psychology and Institute of Cognitive Science, University of Colorado, Boulder); Mark W. Davis (Computing Research Lab, New Mexico State University); Michael L. Littman; Bonnie J.