| Metadata.
Comparative linguists typically swim in a sea of data from many
different sources. Wordcorr keeps information about all this
at five levels: data about the linguist, the data collection, each
of the speech varieties in the collection, each of the linguist's
views of the collection, and the data themselves.
"Metadata" has come to be the standard name for data
about data.
In addition to working with the
actual data, sooner or later you will need to look up or share information about the data. Metadata is
like the information in the card catalog or database of a library,
as opposed to the contents of the library itself. Think of it as
cataloguing information.
Information
about you, the linguist (or, in generic Computerese,
the "user"), is the same as what you would attach to a
published article: your full name, email address, and institutional
affiliation. Wordcorr also uses a short identification like
"JG" on the Web site for the Wordcorr community, to
distinguish data collections that might have the same name but
originate with different people. JG-Austronesian is not the same as
CH-Austronesian. You enter your user information on the User panel
of the Wordcorr window.
Information
about the data collection is like a library entry for
something that isn't a book yet, but is a well organized assemblage
of information that other scholars may be interested in. It follows
standard library categories. Each collection has a creator
whose identification is prefixed to the collection name. It may
have collaborators: people who have contributed data,
assisted in transcribing, or taken part in the analysis.
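The way a creator's short identification keeps same-named collections apart can be sketched in a few lines of Python. This is only an illustration, not Wordcorr's own code: the class names, the field names, and the people and addresses are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Linguist:
    """User-level metadata (hypothetical field names)."""
    full_name: str
    email: str
    affiliation: str
    short_id: str  # short identification such as "JG"


@dataclass
class Collection:
    """Collection-level metadata (hypothetical field names)."""
    name: str
    creator: Linguist
    collaborators: List[Linguist] = field(default_factory=list)

    def qualified_name(self) -> str:
        # The creator's identification is prefixed to the collection name.
        return f"{self.creator.short_id}-{self.name}"


# Placeholder people, invented for this sketch only.
jg = Linguist("J. G. Placeholder", "jg@example.org", "Example University", "JG")
ch = Linguist("C. H. Placeholder", "ch@example.org", "Example Institute", "CH")

jg_collection = Collection("Austronesian", jg)
ch_collection = Collection("Austronesian", ch)

print(jg_collection.qualified_name())  # JG-Austronesian
print(ch_collection.qualified_name())  # CH-Austronesian
```

Both collections are called "Austronesian", yet their qualified names differ because each carries its creator's identification.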
Information
about speech varieties identifies each speech variety
precisely, using the proposed Universal Language Code based on the
Ethnologue codes [1] for living
languages and the Linguist List codes for extinct languages. It
also tells where the data came from, whether from published
sources, other linguists, or your own field work.
Information
about views is not about the data as such; it is about
each analytical view of common data. Different investigators may
work on analyzing a common set of data, each one as a contributor
to the same data collection. Wordcorr makes it easy for them to
pass their analyses back and forth to each other for comment, as
part of the linguistic dialogue. But it also keeps each set of
observations identified so that they don't get mixed up with other
views. A single investigator may develop more than one view of the
same data at the same time, in order to follow out conflicting
hypotheses.
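The idea of several views over one shared body of data, each with its own observations, might be modeled as below. This is a minimal sketch under assumptions: the class names, field names, and sample values are hypothetical and do not describe how Wordcorr stores views internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class View:
    """One analytical view of a shared collection (hypothetical structure)."""
    name: str
    investigator: str
    observations: List[str] = field(default_factory=list)


@dataclass
class SharedCollection:
    """Common data plus any number of independent views of it."""
    entries: List[str]
    views: Dict[str, View] = field(default_factory=dict)

    def add_view(self, name: str, investigator: str) -> View:
        # Each view gets its own observation list, so notes never mix.
        view = View(name, investigator)
        self.views[name] = view
        return view


data = SharedCollection(entries=["entry-1", "entry-2"])  # placeholder data

# One investigator following two conflicting hypotheses at the same time.
view_a = data.add_view("hypothesis-A", "JG")
view_b = data.add_view("hypothesis-B", "JG")
view_a.observations.append("note recorded under hypothesis A")

print(len(view_a.observations), len(view_b.observations))  # 1 0
```

Both views read the same entries, but an observation added to one view does not appear in the other.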
[1] The
new codes can be viewed on the World Wide Web at ethnologue.org. Grimes, who
designed Wordcorr, also wrote the computer program that assigned
the individual language codes back around 1972. In the process of
acceptance as the international standard ISO/DIS 639-3, there have
been some changes to provide compatibility with earlier standards.
The Ethnologue Web site contains tables showing the changes.
|
Worldwide, the growth of available information is
staggering. Libraries, newspapers, and archives are all bursting at
the seams and struggling to keep track of where everything is.
And how about you? Is your
personal information organization better now than it was five years
ago? Can you put your finger on the good stuff?
People who are committed to making
information available take metadata organization very seriously.
That's why Wordcorr has jumped in with OLAC and EMELD, to make sure
all linguists can find the Wordcorr collections they need.
|