Building And Using Comparable Corpora

Author: Serge Sharoff
Publisher: Springer Science & Business Media
ISBN: 3642201288

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Web As Corpus

Author: Maristella Gatto
Publisher: A&C Black
ISBN: 1441134131

Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the "web as corpus". It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.

Applications Of Finite State Language Processing

Author: Tamás Váradi
Publisher: Cambridge Scholars Publishing
ISBN: 1443826030

NooJ is both a corpus processing tool and a linguistic development environment: it allows linguists to formalize several levels of linguistic phenomena: orthography and spelling, lexicons for simple words, multiword units and frozen expressions, inflectional, derivational and productive morphology, local, structural syntax and transformational syntax. For each of these levels, NooJ provides linguists with one or more formal tools specifically designed to facilitate the description of each phenomenon, as well as parsing tools designed to be as computationally efficient as possible. This approach distinguishes NooJ from most computational linguistic tools, which provide a single formalism that should describe everything. As a corpus processing tool, NooJ allows users to apply sophisticated linguistic queries to large corpora in order to build indices and concordances, annotate texts automatically, perform statistical analyses, etc. NooJ is freely available and linguistic modules can already be downloaded for Acadian, Arabic, Armenian, Bulgarian, Catalan, Chinese, Croatian, French, English, German, Hebrew, Greek, Hungarian, Italian, Polish, Portuguese, Spanish and Turkish. The present volume contains papers from the 2008 International NooJ conference which was held 8–10 June 2008 in Budapest. While the focus of the Budapest conference was on making NooJ compatible with other applications, the papers vary with respect to whether they regard Natural Language Processing (NLP) as a research goal or as a tool. However, they all present a slightly different problem either in the field of NLP, or in one that can be solved using NLP, or present a new development in the tool itself. The range of problems dealt with in the volume is quite varied, which will hopefully enable the readers to find contributions that are relevant to their field of interest.


Treebanks

Author: A. Abeillé
Publisher: Springer Science & Business Media
ISBN: 9401002010

This book presents the state of the art in work with parsed corpora. It gathers 21 papers on building and using parsed corpora, raising many relevant questions, and deals with a variety of languages and a variety of corpora. It is for those working in linguistics, computational linguistics, natural language, syntax, and grammar.

Corpora In Translator Education

Author: Federico Zanettin
Publisher: Routledge
ISBN: 1317641353

The use of language corpora as a resource in linguistics and language-related disciplines is now well-established. One of the many fields where the impact of corpora has been growing in recent years is translation, both at a descriptive and a practical level. The papers in this volume, which grew out of presentations at the conference Cult2k (Bertinoro, Italy, 2000), the second in the series Corpus Use and Learning to Translate, are principally concerned with the use of corpora as resources for the translator and as teaching and learning aids in the context of the translation classroom. This book offers a cross-section of research by some leading scholars in the field, who offer accounts of first-hand experience and theoretical insights into the various ways of building and using appropriate corpora in translation teaching, for the benefit of teachers and learners alike. The various contributions provide a rich source of inspiration for other researchers and practitioners concerned with 'corpora in translator education'. Contributors include Stig Johansson, Tony McEnery, Kirsten Malmkjær, Jennifer Pearson, Lynne Bowker, Krista Varantola, Belinda Maia and a number of other scholars.

Multilingual And Multimodal Information Access Evaluation

Author: Maristella Agosti
Publisher: Springer Science & Business Media
ISBN: 3642159974

In its first ten years of activities (2000-2009), the Cross-Language Evaluation Forum (CLEF) played a leading role in stimulating investigation and research in a wide range of key areas in the information retrieval domain, such as cross-language question answering, image and geographic information retrieval, interactive retrieval, and many more. It also promoted the study and implementation of appropriate evaluation methodologies for these diverse types of tasks and media. As a result, CLEF has been extremely successful in building a wide, strong, and multidisciplinary research community, which covers and spans the different areas of expertise needed to deal with the spread of CLEF tracks and tasks. This constantly growing and almost completely voluntary community has dedicated an incredible amount of effort to making CLEF happen and is at the core of the CLEF achievements. CLEF 2010 represented a radical innovation of the “classic CLEF” format and an experiment aimed at understanding how “next generation” evaluation campaigns might be structured. We had to face the problem of how to innovate CLEF while still preserving its traditional core business, namely the benchmarking activities carried out in the various tracks and tasks. The consensus, after lively and community-wide discussions, was to make CLEF an independent four-day event, no longer organized in conjunction with the European Conference on Research and Advanced Technology for Digital Libraries (ECDL) where CLEF has been running as a two-and-a-half-day workshop. CLEF 2010 thus consisted of two main parts: a peer-reviewed conference – the first two days – and a series of laboratories and workshops – the second two days.

Introducing Corpora In Translation Studies

Author: Maeve Olohan
Publisher: Routledge
ISBN: 1134492219

The use of corpora in translation studies, both as a tool for translators and as a way of analyzing the process of translation, is growing. This book provides a much-needed assessment of how the analysis of corpus data can make a contribution to the study of translation. Introducing Corpora in Translation Studies: traces the development of corpus methods within translation studies; defines the types of corpora used for translation research, discussing their design and application and presenting tools for extracting and analyzing data; examines research potential and methodological limitations; considers some uses of corpora by translators and in translator training; and features research questions, case studies and discussion points to provide a practical guide to using corpora in translation studies. Offering a comprehensive account of the use of corpora by today's translators and researchers, Introducing Corpora in Translation Studies is the definitive guide to a fast-developing area of study.

Corpus Based Translation And Interpreting Studies From Description To Application Estudios Traductológicos Basados En Corpus De La Descripción A La Aplicación

Author: María Teresa Sánchez Nieto
Publisher: Frank & Timme GmbH
ISBN: 3732900843

The contributions in this volume illustrate some noteworthy tendencies in current Corpus-based Translation and Interpreting Studies: the reflection on the state of research on the characteristics of translated language, the extension of descriptive proposals into minority languages, the diversification of applied proposals and the growing importance of corpora for the study of interpreting.

Multilingual Processing In Eastern And Southern EU Languages

Author: Cristina Vertan
Publisher: Cambridge Scholars Pub

This volume draws attention to many specific challenges of multilingual processing within the European Union, especially after the recent successive enlargements. Most of the languages considered herein are not only less resourced in terms of processing tools and training data, but also have features which are different from the well-known international language pairs. The 16 contributions address specific problems and solutions for languages from south-eastern and central Europe in the context of multilingual communication, translation and information retrieval.

Building And Exploring Web Corpora WAC3 2007

Author: Cédrick Fairon
Publisher: Presses univ. de Louvain
ISBN: 9782874630828

WAC: More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and on the use of crawled Web data for NLP purposes.

CLEANEVAL: Any use of Web data requires that it be cleaned in order to get rid of unwanted material including, for example, HTML markup, navigation bars, advertisements. To date there has been no sharing of resources or expertise in this particular domain and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.