The TXM platform is a Unicode, XML and TEI compatible text and corpus analysis environment and graphical client based on the CQP search engine and the R statistical environment. It is created and developed by UMR 5191, “Interactions, Corpus, Learning Representations”, CNRS / Ecole Normale Supérieure de Lyon, Université Lyon 2.
TXM is textometric software developed as part of the equipement d’excellence Matrice. It is free to download and use from the SourceForge platform under the open-source GNU General Public License version 3.
As textometric software, it provides both a global and a local linguistic view of the large corpora of digital texts submitted to it. Its development is part of the general framework of Digital Humanities and responds to the growing demand for the analysis of texts and speech in the Humanities and Social Sciences (History, Sociology, Linguistics, etc.).
TXM allows the user to combine qualitative and quantitative readings of a body of text through different types of functionality.
First, documentary functionalities provide hypertext navigation in the corpus, extraction of text spans or concordances, and full-text search for words or discourse units. The user can build a scientific interpretation and check it at any time through accurate and comprehensive consultation of the textual data.
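To make the idea of concordance extraction concrete, here is a minimal keyword-in-context sketch. It is only an illustration under simple assumptions (whitespace tokenization, a hypothetical `kwic` helper, an invented sample sentence) and does not reflect how TXM itself builds concordances.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of tokens kept on each side of the keyword.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, split on whitespace for simplicity.
text = "the war ended but the memory of the war remained"
for left, key, right in kwic(text.split(), "war", width=2):
    print(f"{left:>15} | {key} | {right}")
```

Each returned line keeps the keyword aligned between its left and right contexts, which is what lets a reader go back from a search hit to the surrounding text.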
Secondly, statistical functionalities: calculation of the words specific to a text or to a part of the corpus (representing a period or a type of speaker, for example), calculation of co-occurrences to identify specific attractions between words, factorial view of the texts or words of the corpus, etc.
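As an illustration of what calculating the words specific to a part of a corpus can involve, the sketch below scores how strongly a word is over-represented in one part using a hypergeometric model. The choice of model is an assumption made for this example, not a description of TXM's internal implementation; the function name `overuse_probability` and the toy corpus are hypothetical.

```python
from math import comb


def overuse_probability(k, f, t, T):
    """P(X >= k) under a hypergeometric model (assumed for illustration).

    T: total number of tokens in the corpus
    t: number of tokens in the part under study
    f: total frequency of the word in the corpus
    k: observed frequency of the word in the part
    A small value suggests the word is over-represented in the part.
    """
    total = comb(T, t)
    favourable = sum(
        comb(f, i) * comb(T - f, t - i) for i in range(k, min(f, t) + 1)
    )
    return favourable / total


# Toy corpus divided into two parts (invented data).
parts = {
    "part_a": ["war", "war", "war", "peace", "the"],
    "part_b": ["peace", "the", "the", "war", "peace"],
}
word = "war"
T = sum(len(tokens) for tokens in parts.values())
f = sum(tokens.count(word) for tokens in parts.values())
for name, tokens in parts.items():
    k = tokens.count(word)
    p = overuse_probability(k, f, len(tokens), T)
    print(f"{name}: {k} occurrences, P(X >= {k}) = {p:.4f}")
```

The same counts (k, f, t, T) can be computed for a part defined by a period or a type of speaker, which is how such a score links a statistical result back to a subdivision of the corpus.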
The richness of the interpretive trail that the analyst is invited to explore with the software lies in the interplay between these two types of reading, qualitative and quantitative.
In practice, TXM takes as input classical text formats such as TXT or XML, but it also takes output files from lemmatization and morpho-syntactic tagging software, so that its processing applies to both graphical (or raw) text and enriched text. Whether from a documentary or a statistical point of view, queries can then become more complex, involving graphical words as well as lemmas or grammatical codes, such as a specific verb tense or a specific syntactic pattern. Text structures can also be taken into account in the calculations, such as segmentation into paragraphs, speech turns in an interview, or the division of a book into chapters.
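The following sketch shows, in simplified form, what querying enriched text can look like: each token carries a graphical form, a lemma and a grammatical code, and a query is a sequence of constraints on those properties. In CQP's query language such a pattern is written along the lines of [lemma="be"] [pos="VBN"]. The property names, the tag values and the `find_matches` helper below are assumptions for illustration; actual names depend on the tagger and on how the corpus was imported.

```python
import re


def find_matches(tokens, pattern):
    """Return the start index of every run of consecutive tokens
    that satisfies the sequence of constraints in `pattern`.

    Each constraint maps a token property to a regular expression
    that must match the whole property value.
    """
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(
            re.fullmatch(regex, tok[prop])
            for tok, constraint in zip(window, pattern)
            for prop, regex in constraint.items()
        ):
            hits.append(start)
    return hits


# Hypothetical tagged sentence; the tag values are assumed, not prescribed.
tokens = [
    {"word": "The", "lemma": "the", "pos": "DT"},
    {"word": "letters", "lemma": "letter", "pos": "NNS"},
    {"word": "were", "lemma": "be", "pos": "VBD"},
    {"word": "written", "lemma": "write", "pos": "VBN"},
    {"word": "and", "lemma": "and", "pos": "CC"},
    {"word": "one", "lemma": "one", "pos": "PRP"},
    {"word": "is", "lemma": "be", "pos": "VBZ"},
    {"word": "sent", "lemma": "send", "pos": "VBN"},
]

# Any form of "be" followed by a past participle.
pattern = [{"lemma": "be"}, {"pos": "VBN"}]
for start in find_matches(tokens, pattern):
    print(" ".join(tok["word"] for tok in tokens[start:start + len(pattern)]))
```

Because the constraint on the first token targets the lemma rather than the graphical form, the same query finds both "were written" and "is sent", which a plain search on word forms would treat as unrelated.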
TXM has a mailing list that federates its network of users, in the spirit of its open and modular development and to facilitate its use by a thriving community. Import and export functions allow the reuse of results and interoperability with other open-source software developed by other communities.
To meet the diverse needs and practices of the Humanities and Social Sciences, TXM exists both as desktop software for use on a personal computer (Windows, Mac or Linux) and as a web portal giving shared and controlled access to corpora through a simple web browser.