Arches Lingo is a software application purpose-built for cultural heritage organizations and their vocabulary editors and data managers. It provides a single environment for loading, browsing, editing, and publishing thesauri, with support for the standards and workflows that heritage professionals rely on.

View a demonstration by software developer Rob Gaston of Farallon Geographics that walks through the core features of Lingo from the perspective of a FISH vocabulary editor. This persona is responsible for maintaining and enriching the FISH vocabularies, a set of thesauri used across the cultural heritage sector in the United Kingdom and managed by the Forum on Information Standards in Heritage.

 

Lingo & Arches

Across the cultural heritage field, organizations are documenting objects, sites, and places of immense significance, but the data often exists in isolation. Organizations are not always using the same words, and even when they are, those words do not always have the same meaning.

Arches Lingo is a newly released open-source application built to address this problem. Developed by the Getty Conservation Institute with input from knowledge organization specialists at Historic England and other organizations, Lingo provides cultural heritage organizations and professionals with a dedicated tool for creating, managing, and publishing vocabularies and thesauri that make data meaningful, discoverable, and shareable between information systems and across institutions and borders.

Lingo is built on the Arches platform, which to date has had over 130 known implementations worldwide and more than a decade of software development behind it. Rather than starting from scratch, we leveraged the robust data management capabilities of Arches to create a purpose-built vocabulary management tool informed by the needs of the cultural heritage field. 

Vocabularies can be exported from Lingo in multiple formats, including CSV, SKOS XML, RDF XML, and JSON-LD, allowing for integration with Arches implementations as well as non-Arches platforms.

Being part of the open-source Arches ecosystem brings practical benefits beyond the software itself. Implementers can draw on the expertise of a global network of peers with similar goals, accessing bug fixes, code, and resource models shared across the community. Visit the Open-Source Software & the Arches Project webpage to learn more.

 

Why Vocabulary Management Matters

The cultural heritage sector is particularly good at building data sets, but there is a big difference between collecting data and being able to understand the information and knowledge encoded in that data. What we end up with are data sets that are like islands: tantalizingly close to each other, but separated by deep waters. Vocabularies are one of the most important tools we have for bridging those gaps.

When heritage professionals describe objects and sites, they rely on language to categorize and record what they find. That language has to be shared to be useful. But the same word can mean very different things to different people. In English, “gift” conjures a birthday or holiday occasion; in German, the same word means poison. The word “mortar” could refer to a building material, a vessel used for grinding, or a military weapon that fires explosive shells. In both cases, matching the letters is not enough. What matters is matching the concept behind them, and the context that makes the meaning clear. For cultural heritage data, where terminology is rich, specialized, and often multilingual, this is not a minor inconvenience. It is a structural barrier to research, collaboration, and discovery.

Matching the ideas behind the words, not just the words themselves, is exactly what Lingo was built to do. A concept is more than its name: it needs scope notes to define its meaning, a hierarchy to show how it relates to broader and narrower ideas, and a record of the full range of terms used to describe it, with one designated as preferred. Together, these elements capture not just what something is called, but what it actually means.
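To make the idea concrete, here is a minimal sketch in Python of a concept as described above: a set of labels with one marked as preferred per language, a scope note, and links to broader concepts. The class names, fields, and identifiers are illustrative assumptions, not Lingo's actual data model. The example shows two concepts that share the English term "mortar" but remain distinct concepts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Label:
    text: str
    language: str            # language code such as "en" or "de"
    preferred: bool = False  # True for the designated preferred term


@dataclass
class Concept:
    identifier: str  # hypothetical local ID, not a real Lingo URI
    labels: List[Label] = field(default_factory=list)
    scope_note: str = ""
    broader: List[str] = field(default_factory=list)  # IDs of broader concepts

    def preferred_label(self, language: str) -> Optional[str]:
        """Return the preferred term in the given language, if one exists."""
        for label in self.labels:
            if label.preferred and label.language == language:
                return label.text
        return None


# Two concepts with the same English spelling but different meanings.
mortar_material = Concept(
    identifier="c1",
    labels=[
        Label("mortar", "en", preferred=True),
        Label("Mörtel", "de", preferred=True),
    ],
    scope_note="A mixture used to bind building blocks such as stone or brick.",
    broader=["building-materials"],
)
mortar_vessel = Concept(
    identifier="c2",
    labels=[
        Label("mortar", "en", preferred=True),
        Label("grinding bowl", "en"),
    ],
    scope_note="A vessel in which substances are ground with a pestle.",
    broader=["vessels"],
)

# The letters match, but the concepts do not.
same_spelling = mortar_material.preferred_label("en") == mortar_vessel.preferred_label("en")
same_concept = mortar_material.identifier == mortar_vessel.identifier
print(same_spelling, same_concept)
```

Comparing identifiers rather than label text is the design choice that matters here: the scope note and hierarchy position travel with the identifier, so two records can agree on a word while still pointing at different meanings.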

Language, however, can only go so far. For an unfamiliar object, it is often not until you see an image that the term and the concept truly come together. Lingo supports attaching images to concepts for exactly this reason.

Screenshot of the concept “Thurible” with an image uploaded in the Arches Lingo application.

Good vocabulary management supports principles that are increasingly central to the cultural heritage sector. FAIR data principles—Findable, Accessible, Interoperable, and Reusable—set the standard for what well-structured, well-linked data should look like. CARE principles—Collective Benefit, Authority to Control, Responsibility, and Ethics—address how that data may need to be governed, particularly when it concerns communities whose heritage is being documented. Lingo is built to support both. The result is a shift away from growing isolated data sets toward building integrated ones that adhere to appropriate guidelines, are built for the collective benefit of the community, and are managed responsibly and ethically.

Beyond supporting human researchers, vocabularies are becoming critical to how machines interpret and retrieve information. Ontologies and semantically structured data give machines a common framework for connecting concepts across systems, data sets, and organizations. As artificial intelligence tools become more common in heritage contexts, semantically structured, well-defined vocabulary data, along with images, provides the precise, contextual information that helps AI systems work accurately. This reduces errors and improves the quality of automated search, retrieval, and analysis. Lingo is built with this expanding role for vocabulary data in mind.

 

What Lingo Does

Lingo offers features designed to support core tasks for vocabulary management, including:

Browsing and navigation. Lingo can hold multiple thesauri simultaneously. Users can browse them through a hierarchy view that makes it easy to see how concepts are organized and related to one another, or search across all loaded schemes at once using any term in any language. The application is fully internationalized, so it can display both interface labels and concept terms in the user’s preferred language where translations are available, with support for right-to-left scripts such as Arabic and for non-Latin scripts such as Chinese.

Advanced search. Beyond simple term lookup, Lingo includes a faceted search interface that allows users to build complex queries combining label content, scope notes, hierarchical position, language, and other criteria. Searches can be saved and recalled, and results can be collected into named concept sets for ongoing reference during editorial work.

Editorial tools. When a scheme is placed in edit mode, vocabulary editors can add and modify preferred labels, alternative labels, and scope notes, with the ability to record source and contributor information for every element. New concepts can be created and placed within the hierarchy, and images can be attached to concepts to support understanding, which is particularly useful for concepts that are easier to recognize visually than to describe in words. Concepts can also be linked to equivalent or related entries in external thesauri using standard match relationships, enabling the kind of cross-dataset connections that make linked open data valuable.

Collaborative editing. Lingo is a web-based application, meaning multiple people can work on vocabularies simultaneously within the same system. Rather than managing vocabularies through shared spreadsheets or passing files between colleagues, teams can log in together, make edits, and see each other’s contributions in a shared environment.

Publishing and export. Schemes and concepts move through defined lifecycle states, starting in draft and moving to published once editorial work is complete. Publishing generates stable identifiers and URIs for each concept. Published data can be exported in standard formats including SKOS XML, RDF XML, CSV, and JSON-LD, making it straightforward to share vocabularies with other systems and partners.
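As a rough illustration of the publishing and export steps described above, the following Python sketch models a concept that starts in draft, receives a URI when published, and can then be serialized as JSON-LD using SKOS properties. The SKOS namespace is the standard W3C one; everything else (the class, the base URI, the external match URI, and the exact shape of the output) is an assumption made for this example and may differ from what Lingo actually produces.

```python
import json
from enum import Enum

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"  # W3C SKOS namespace
BASE_URI = "https://example.org/vocab/"            # hypothetical base URI


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"


class ConceptRecord:
    """Hypothetical concept record; not Lingo's internal model."""

    def __init__(self, local_id, pref_label, language, scope_note, exact_matches=()):
        self.local_id = local_id
        self.pref_label = pref_label
        self.language = language
        self.scope_note = scope_note
        self.exact_matches = list(exact_matches)  # URIs in external thesauri
        self.state = State.DRAFT
        self.uri = None  # assigned on publication

    def publish(self):
        """Move from draft to published and assign a URI exactly once."""
        if self.state is State.PUBLISHED:
            return  # already published: keep the existing URI stable
        self.uri = BASE_URI + self.local_id
        self.state = State.PUBLISHED

    def to_jsonld(self):
        """Return a JSON-LD dictionary using SKOS properties."""
        if self.state is not State.PUBLISHED:
            raise ValueError("only published concepts can be exported")
        return {
            "@context": {"skos": SKOS_NS},
            "@id": self.uri,
            "@type": "skos:Concept",
            "skos:prefLabel": {"@value": self.pref_label, "@language": self.language},
            "skos:scopeNote": {"@value": self.scope_note, "@language": self.language},
            "skos:exactMatch": [{"@id": uri} for uri in self.exact_matches],
        }


thurible = ConceptRecord(
    local_id="thurible",
    pref_label="thurible",
    language="en",
    scope_note="A censer in which incense is burned.",  # illustrative wording
    exact_matches=["https://example.org/other-thesaurus/thurible"],  # placeholder URI
)
thurible.publish()
print(json.dumps(thurible.to_jsonld(), indent=2))
```

Calling publish() a second time leaves the URI unchanged, which reflects the point of stable identifiers: once other systems link to a concept, its address should not move.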

The demonstration video above, by software developer Rob Gaston of Farallon Geographics, walks through these core features from the perspective of a vocabulary editor responsible for maintaining and enriching the FISH (Forum on Information Standards in Heritage) vocabularies, a set of thesauri used across the cultural heritage sector in the United Kingdom. It illustrates a realistic editorial workflow: using the Getty Art and Architecture Thesaurus (AAT) as a reference source to identify gaps in the FISH vocabularies, enriching and expanding existing concepts, establishing links between the two thesauri, and publishing the results.

 

What Comes Next

Version 1.0.0 of Lingo is out and is already robust. But it’s just the beginning. 

The Lingo roadmap lists future development goals, including more streamlined workflows for reviewing and approving candidate concepts, dedicated tools for merging and splitting concepts, expanded support for additional hierarchy types (people, groups, places, and periods), versioning capabilities, and a SPARQL endpoint for advanced querying.

Continued collaboration with the cultural heritage sector is what will drive Lingo forward. Because open-source development is agile, feedback received today can result in improvements in days or weeks, not years down a corporate roadmap.

If you are working with heritage vocabularies as a data manager, a researcher, or an implementer, your experience with Lingo matters. Share your feedback and questions using the arches-lingo tag on the Arches Project Community Forum. For installation and configuration instructions, visit https://pypi.org/project/arches-lingo/.