Divvuns & Giellateknos techdoc

UiT Norgga árktalaš universitehta
Copyright © 2004-2019 UiT Norgga árktalaš universitehta
giellalt@uit.no

Documentation for corpus work

Contents:

  • Overview and important links
  • Corpus collection and maintenance
  • Sentence alignment
  • Korp

Overview and important links

  • Introduction
  • The corpus tools
  • Corpus repositories
  • Metadata files

Corpus collection and maintenance

  • Korpussamlerens 1-2-3 ("The corpus collector's 1-2-3"; legal introduction for contacting text producers)
  • Corpus collector's manual (how to work)
  • Corpus maintenance
  • Corpus analysis (for Korp)
  • Corpus conversion (technical)
  • Language recognition
  • Unicode normalisation (how to fix decomposed Sami letters)
  • OCR
  • Wikipedia as a corpus

Sentence alignment

  • Overview
  • Workflow
  • Improving PDF files
  • TCA2 parameters
  • Graphical interface
  • Alternatives

Meetings

2017: 06-21 // 07-04

2012: 03-22 // 03-12 // 02-29 // 02-17 // 02-13 // 02-07 // 02-01 // 01-25 // 01-19 // 01-12

2011: 12-20 // 12-14 // 12-08 // 11-28 // 11-25 // 04-07

Korp

  • Korp installations and interface
  • Links to our Korp installations
    • Saami languages
    • Kven, Meänkieli, Veps, Võro
    • Komi-Zyrian, Komi-Permyak, Udmurt, Moksha, Erzya, Hill Mari, Meadow Mari
    • Artlab: Plains Cree

Ordbilde

  • Ordbild (DeepDict)