Águia, a tool for searching the Floresta treebank

Floresta sintá(c)tica project



Search in

Bosque, a subset of Floresta fully revised by the linguistic team (version 7.4, 22 December 2005): 9,431 trees, corresponding to 1,962 extracts of CETEMPúblico and CETENFolha, 9,368 distinct sentences, 215,003 tokens and ca. 184,773 words.
Floresta virgem, the unrevised Floresta (version 2.1, 16 March 2005): 78,246 trees automatically created from the CG output of the PALAVRAS parser, corresponding to the first million words of each of the CETEMPúblico and CETENFolha corpora. NB: Floresta virgem includes the contents of Bosque without manual revision.

Kind of result

Concordance
Lemma distribution
(Word's) function distribution
Part of speech distribution
Phrase distribution
Phrase distribution of immediate constituents
(Phrase's) function distribution
Function distribution of immediate constituents
Text distribution
Size distribution

Look for:

Help

We are still experimenting with the user interface and warmly encourage user feedback. We are also developing a guided tour to make the tool easier to understand.

Use the tables below for an idea of the kinds of search criteria already available. Note that, for the moment, the functions whose names start with /ass require exactly one space at the end of their regular expression argument. We hope to improve the usability of this interface soon.
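If you build queries programmatically, the trailing-space requirement is easy to get wrong. The following is a minimal Python sketch of one way to normalize an argument so that it ends in exactly one space. The helper name `ass_argument` is hypothetical and is not part of Águia; how the argument is then passed to an /ass function is not shown here.

```python
def ass_argument(pattern: str) -> str:
    """Return `pattern` so that it ends in exactly one space.

    Hypothetical helper, not part of Águia: it only normalizes the
    string, stripping any trailing spaces and appending a single one.
    """
    return pattern.rstrip(" ") + " "


if __name__ == "__main__":
    # repr() makes the trailing space visible when printed.
    print(repr(ass_argument("np")))
    print(repr(ass_argument("np   ")))
```

Only spaces are stripped, so a pattern that ends in a tab or another whitespace character is left unchanged apart from the appended space.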

Concordance request

Distribution request

When you ask for a distribution, you are actually searching a different corpus, one whose terminals are phrases. Your search expressions should therefore look for things like "np" or [funcao="ACC"], while the kind of result is determined by the kind of distribution you selected.

For example, you can find out what kinds of phrasal subjects (in terms of their constituents) there are in the treebank by selecting "phrase distribution" and entering [funcao="SUBJ"] in the query window. If you had selected "size distribution" instead, you would get the size distribution.

Conversely, you can look at the functions of PPs in the corpus by selecting "function distribution" and simply entering "pp" in the query window. Or you can look at the actual words in the PPs by choosing "text distribution".
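The idea behind a distribution request can be illustrated with a small Python sketch: the query selects phrases, and the chosen kind of distribution decides which property of the selected phrases is counted. This is not how Águia is implemented. The `Phrase` record, the `distribution` function and the sample phrases are all invented for illustration, and the function labels other than SUBJ and ACC are placeholders.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Phrase(NamedTuple):
    form: str    # phrase type, e.g. "np" or "pp"
    funcao: str  # syntactic function, e.g. "SUBJ" or "ACC"
    text: str    # the words covered by the phrase


# Toy data invented for this example; not taken from the treebank.
PHRASES = [
    Phrase("np", "SUBJ", "o governo"),
    Phrase("pp", "ADVL", "em Lisboa"),
    Phrase("np", "ACC", "a proposta"),
    Phrase("pp", "N<", "de Portugal"),
    Phrase("pp", "ADVL", "em 1995"),
    Phrase("np", "SUBJ", "a equipa"),
]


def distribution(
    phrases: Iterable[Phrase],
    query: Callable[[Phrase], bool],
    kind: Callable[[Phrase], object],
) -> Counter:
    """Count the values of `kind` over the phrases selected by `query`."""
    return Counter(kind(p) for p in phrases if query(p))


# Query "pp" with function distribution: which functions do PPs have?
pp_functions = distribution(PHRASES, lambda p: p.form == "pp", lambda p: p.funcao)

# Query [funcao="SUBJ"] with phrase distribution: which phrase types are subjects?
subj_forms = distribution(PHRASES, lambda p: p.funcao == "SUBJ", lambda p: p.form)

# Query "pp" with text distribution: which actual words occur in PPs?
pp_texts = distribution(PHRASES, lambda p: p.form == "pp", lambda p: p.text)

# Query [funcao="SUBJ"] with size distribution: size is taken here as word count.
subj_sizes = distribution(
    PHRASES, lambda p: p.funcao == "SUBJ", lambda p: len(p.text.split())
)
```

Keeping the query and the kind of result as two separate arguments mirrors the interface, where the same search expression can be combined with any of the distribution types.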

To be added


Detailed quantitative data on current Bosque

clauses                                  21,931
    finite                               15,566
    non-finite                            5,602
    averbal                                 763
noun phrases                             43,096
prepositional phrases                    32,210
adjectival phrases                        1,780
adverbial phrases                           833
conjuncts                                 5,448
trees                                     9,431
sentences with more than one tree            64
    sentences with exactly two trees         61
    sentences with exactly three trees        2


Last update: 8 September 2006.
Comments and suggestions about the Floresta treebank