The service is available at http://acdc.linguateca.pt/acesso/. It was launched on 23 September 1999.
Internet access: Web interface to a corpus workbench, in the present case the IMS Corpus Workbench.
Why should one provide Internet access to corpora?
(Values computed on 13 January 2004)
| Corpus (raw, annotated) | Size in units (raw, annotated) | Size in words (raw, annotated) | Size in sentences (raw, annotated) | Short description |
| natura natpanot | 7.257.175 7.321.642 | 6.257.950 6.268.817 | 225.673 225.734 | Newspaper text of PÚBLICO, Portugal, 1991-1994, 2 paragraphs a day |
| enpcpub enpcanot | 89.864 90.574 | 72.244 72.392 | 4.369 4.369 | Translated fiction from five novels in English, from the ENPC |
| minho minhanot | 2.083.761 2.107.826 | 1.738.475 1.747.274 | 53.040 53.185 | Newspaper text of a local periodical, Diário do Minho, full articles before proofreading |
| eci-ebr ebranot | 891.687 898.542 | 722.012 723.007 | 45.530 44.689 | Brazilian text: fiction, non-fiction, from the Borba-Ramsey corpus |
| eci-ee eeanot | 30.157 31.127 | 26.515 27.140 | 780 780 | Call of ESPRIT program in European Portuguese, from the ECI |
| saocarlos scanot | 41.372.943 41.948.319 | 32.091.996 32.385.765 | 1.955.166 1.952.829 | Brazilian text, mainly from newspapers, but also didactic material and business letters |
| frasespp fppanot | 19.340 19.542 | 16.225 16.208 | 594 594 | Sentence corpus in European Portuguese |
| frasespb fpbanot | 22.486 22.730 | 19.155 19.165 | 651 651 | Sentence corpus in Brazilian Portuguese |
| cetempublicoprmi cpprmianot | 1.198.015 1.202.938 | 997.695 995.851 | 38.151 38.251 | Newspaper text from PÚBLICO, Portugal, 1991-1998, extracts of two paragraphs in a random order |
| ancib ancibanot | 811.739 828.475 | 650.045 660.045 | 25.798 25.596 | Brazilian e-mail corpus - traffic in the ANCIB list (libraries and information science in Brazil) |
| diaclav diaclavanot | 7.441.109 7.529.495 | 6.488.273 6.549.823 | 228.856 210.741 | Newspaper text of local periodicals, Diário de Coimbra, Diário de Leiria, Diário de Aveiro, Viseu Diário, Portugal, 1999-2000 |
| avante avantanot | 7.607.651 7.685.242 | 6.488.201 6.512.510 | 204.686 204.833 | Newspaper text, political party weekly newspaper, Avante, Portugal, 1997-2002 |
| amostra amostranot | 124.655 124.836 | 98.444 98.505 | 4.925 4.965 | Selection of texts from the NILC corpus, in Brazilian Portuguese, including texts from the didactic, journalistic and literary styles |
| classlppe | 1.872.381 | 1.307.334 | 74.174 | Literary text (prose, drama and poetry) of Portuguese "classical" 16th to 19th century writers |
| Total raw Total tagged | 70.822.776 69.811.288 | 56.974.564 56.076.502 | 2.862.393 2.767.217 | All raw corpora except for CETEMPúblico |
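The figures in the table use a dot as the thousands separator, and most cells hold two values. As a minimal sketch, the Python snippet below turns one such cell into plain integers. It assumes that the first value refers to the raw corpus and the second to the annotated one, as the "Total raw Total tagged" row suggests; the function name `parse_size_cell` is hypothetical and is not part of the AC/DC service.

```python
from typing import List


def parse_size_cell(cell: str) -> List[int]:
    """Convert a table cell such as '7.257.175 7.321.642' into integers.

    Dots are treated as thousands separators (an assumption based on the
    Portuguese number format used in the table), so they are simply removed.
    """
    return [int(token.replace(".", "")) for token in cell.split()]


# Cell copied from the 'natura natpanot' row, column "Size (units)".
# Assumption: first value = raw corpus, second value = annotated corpus.
raw_units, annotated_units = parse_size_cell("7.257.175 7.321.642")
print(raw_units, annotated_units)

# A corpus listed with a single value (e.g. classlppe) yields a one-element list.
print(parse_size_cell("1.872.381"))
```

Returning a list rather than a fixed pair keeps the sketch usable for rows that list only one corpus version.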
We provide more extensive documentation and information, in Portuguese, about the raw corpora, the annotated corpora and the actual processing and encoding of the several kinds of information present in the corpora (tokenization, sentence separation and annotation).