Self-hosting language


self-hosting language noun
A self-hosting language is a language whose grammar and lexicon can be described in the language itself.


I use the term "self-hosting language" to describe a level of maturity that a language can reach. For both natural and constructed languages, it means that the language has the technical terms and syntactic structures needed to describe its own grammar and lexicon using only the language itself.

The concept of self-hostedness is a generalization from computer science, where the term refers to programming languages whose compiler is written in the language itself. Such a compiler can compile its own source code into machine code, i.e. without the need for a compiler developed in a previously existing programming language.

In the development of many modern programming languages, an existing language is used to "bootstrap" the new one. In the first stage, a compiler that translates the new language into machine code is written in an existing language. In the second stage, once the new language is mature enough, another compiler for the new language is written in the new language itself, and this compiler is translated into machine code using the first compiler. Now that a compiler written in the new language is available, the third stage begins: the new language can be compiled with the compiler written in the new language itself, and it no longer relies on any previously existing language.
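The three stages can be traced in a toy model. The Python sketch below is a simplification under stated assumptions: a compiler is reduced to a record of the language its source is written in and the language it translates, no actual code is translated, and the language names "Old" and "New" as well as all classes and functions are hypothetical placeholders rather than any real toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerSource:
    """Source code of a compiler (toy model: no real code inside)."""
    written_in: str   # language the compiler's own source is written in
    translates: str   # language the compiler accepts as input


@dataclass(frozen=True)
class CompilerBinary:
    """A compiler available as machine code, ready to run."""
    translates: str   # language this binary can compile into machine code
    stage: str        # label describing how this binary was produced


def compile_compiler(tool: CompilerBinary, src: CompilerSource, stage: str) -> CompilerBinary:
    """Use an existing binary to turn compiler source code into a new binary."""
    # A compiler can only read source written in the language it translates.
    if src.written_in != tool.translates:
        raise ValueError(
            f"compiler for {tool.translates} cannot compile source written in {src.written_in}"
        )
    return CompilerBinary(translates=src.translates, stage=stage)


# Pre-existing toolchain for the established language (hypothetical name "Old").
old_compiler = CompilerBinary(translates="Old", stage="pre-existing")

# Stage 1: a compiler for "New", written in "Old", built with the old toolchain.
stage1 = compile_compiler(old_compiler, CompilerSource("Old", "New"), "stage 1")

# Stage 2: a compiler for "New" written in "New" itself, built with the stage-1 compiler.
self_hosted_src = CompilerSource("New", "New")
stage2 = compile_compiler(stage1, self_hosted_src, "stage 2")

# Stage 3: the self-hosted compiler recompiles its own source; "Old" is no longer involved.
stage3 = compile_compiler(stage2, self_hosted_src, "stage 3")
print(stage3)  # CompilerBinary(translates='New', stage='stage 3')

# Without stage 1, the old toolchain cannot build the self-hosted source directly.
try:
    compile_compiler(old_compiler, self_hosted_src, "shortcut")
except ValueError as err:
    print(err)  # compiler for Old cannot compile source written in New
```

The failed shortcut at the end marks the gap that the stage-1 compiler, written in the existing language, is there to bridge; after stage 3, every binary in the chain is produced by a compiler for the new language.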

The transition to self-hosting serves as a kind of maturity test for programming languages. The reasoning is that compilers are fairly complex computer programs: if a language can be used to develop a compiler, it is probably fit for any other meaningful programming task.

The process of language creation follows a similar pattern. Conlangers usually write a grammar for a newly created language in a language they are already familiar with, for example English. This corresponds to stage one above, "bootstrapping". The second stage would be writing a grammar of the constructed language in the constructed language itself. Once that grammar is published, the third stage is reached: the constructed language now qualifies as "self-hosted". Esperanto, for example, has reached this stage: a learning website is available entirely in Esperanto (as well as in other languages).

There is a broad and a narrow definition of "self-hosting language". In the broad sense, it refers to any language that is theoretically capable of describing its own grammar and lexicon. In the narrow sense, it refers only to languages for which a grammar and lexicon written in the language itself have actually been published. Between the two lies a spectrum of languages that are in the process of becoming self-hosted.


It would be interesting to see more conlangers attempt to write grammars of their languages in those languages themselves, and to learn from their experiences in the process. If the computer science analogy holds for constructed languages as well, this method might provide another way to test a language's "completeness", in addition to the well-known "Conlang Syntax Test Cases".

For natural languages, the tradition of writing reference grammars in the languages themselves dates back to Pāṇini's grammar of Sanskrit, composed several centuries BCE. Some languages, however, do not even have a writing system that could be used to produce a written reference grammar. "Self-hostedness" may also be relevant to efforts to save endangered languages: speakers of an endangered language will probably feel that their language is more valued when they are empowered to document it in the language itself, instead of only having it documented in another language.

Copyright © 2021 by Thomas Heller [ˈtoːmas ˈhɛlɐ]