
	IMS Open Corpus Workbench (CWB)
	Release 3.5 (PRE-RELEASE)


The IMS Open Corpus Workbench (CWB) is a highly specialised database and query
processor for large text corpora with linguistic annotations.  The CWB uses a
proprietary read-only format to store corpora with token-level annotations
(such as POS tags, lemmata, morphological features, etc.) and shallow
structural markup (sentences and paragraphs, as well as mildly recursive
syntactic chunks and phrases).  The read-only approach allows corpus data to
be fully indexed and compressed efficiently, so that the CWB scales easily to
corpora of several hundred million words.

The main components of the CWB are a set of command-line utilities for
encoding, indexing and compressing annotated text corpora (using a simple "one
word per line" input format), tools for decoding and accessing frequency data
(cwb-decode, cwb-lexdecode, cwb-scan-corpus), and most importantly the
powerful and versatile query processor CQP.  Using the CWB in combination with
the CWB/Perl package (available separately) is strongly recommended: it
simplifies the encoding process and provides a comprehensive and well-tested
API to the functionality of the CWB.
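
As an illustration of the "one word per line" input format mentioned above,
the following Python sketch turns tokenised sentences into such a file body.
It is not part of the CWB: the function name to_vertical, the column layout
(word form, POS tag, lemma), the TAB separator and the <s> ... </s> sentence
markup are assumptions made for this example.  Consult the encoding
documentation for the layout expected by your CWB release.

```python
from typing import Iterable, List, Tuple

# Assumed column layout for this sketch: (word form, POS tag, lemma).
Token = Tuple[str, str, str]


def to_vertical(sentences: Iterable[List[Token]]) -> str:
    """Render sentences as one-token-per-line text (hypothetical helper).

    Each token becomes one line with TAB-separated annotation columns;
    each sentence is wrapped in <s> ... </s> lines (assumed markup).
    """
    lines = []
    for sentence in sentences:
        lines.append("<s>")
        for token in sentence:
            lines.append("\t".join(token))
        lines.append("</s>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [[
        ("The", "DT", "the"),
        ("cats", "NNS", "cat"),
        ("sleep", "VBP", "sleep"),
        (".", ".", "."),
    ]]
    # Write the result to a file and pass that file to the encoding tools.
    print(to_vertical(example), end="")
```

The helper only builds the text; which attributes are declared when the file
is encoded is decided separately, at encoding time.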


For more information, see the CWB homepage:

    http://cwb.sourceforge.net/


Installation instructions are detailed in the file "INSTALL" (in this directory).


The IMS Open Corpus Workbench is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
details.

A copy of the GNU General Public License is included in the file "COPYING".
It is also available via WWW at http://www.gnu.org/copyleft/gpl.html, or you
can obtain it by writing to the Free Software Foundation, Inc., 51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301 USA.

See the file "COPYING" for additional licensing information concerning third-party
open-source software packages used by the IMS Open Corpus Workbench.

