Semantic technologies: How text creates knowledge

Whether the challenge is to make internal archives user-friendly or to monetise the content of commercial websites as effectively as possible, semantic processes are a key technology for putting content to use. They enable information extraction, so that text can serve as raw material for knowledge. Semantic technologies are thus a cornerstone of advanced knowledge management. The goal: to automatically extract as much knowledge from texts as possible. For this reason, semantic processes are highly relevant for the publishing world, but also for numerous other industries. With the current release of Retresco’s semantic API, interested parties can get a free overview of the potential and capabilities of semantic technologies, for more knowledge and more valuable data.

How do semantic technologies work?

In the digital world, semantic technologies automate the classification of text and allow content to be assigned to specific thematic areas.
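As a rough illustration of assigning content to thematic areas, the following minimal sketch picks the topic whose keyword list overlaps most with a text. The topics, the keywords and the function name classify are assumptions made for this example only; production systems typically rely on trained models rather than hand-written keyword lists.

```python
import re

# Hypothetical thematic areas with hand-picked keywords (illustrative only).
TOPIC_KEYWORDS = {
    "sport": {"match", "goal", "league", "coach"},
    "politics": {"election", "parliament", "minister", "vote"},
    "business": {"market", "shares", "profit", "merger"},
}


def classify(text):
    """Return the topic sharing the most keywords with the text, or None."""
    # Lower-case the text and reduce it to a set of distinct words.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Count how many keywords of each topic occur in the text.
    scores = {topic: len(words & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # A text without any keyword hit is left unclassified.
    return best if scores[best] > 0 else None


print(classify("The coach praised the late goal in the league match"))
```

A text that matches none of the keywords is left unclassified instead of being forced into a topic.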

The origin of content does not play a decisive role in the analysis of meaning. Algorithms for semantic analysis process content from any available internal or external source. These include content management systems (CMS) and customer relationship management (CRM) solutions, as well as files from the company’s own intranet and openly accessible sources on the internet.

Text, audio, video or office formats: in principle, any digital file format is suitable for semantic enrichment. But even though images and sound can be analysed, the semantic enrichment of text is the most developed. Semantic text enrichment analyses the headlines, teasers, body texts and metadata of all content. An algorithm searches the content for specific keywords and identifies what are referred to as entities, i.e. relevant persons, places, organisations, events and general keywords.

The algorithm then calculates a relevance score, which indicates how important an entity is for the meaning of a text. In a third step, semantic enrichment marks the recognised entities in the text.
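The three steps described here (finding entities, weighting them, marking them) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Retresco's actual method: the small lookup table GAZETTEER, the constant HEADLINE_WEIGHT, the weighting formula, the [type:name] marker format and all sample names are invented for the example.

```python
import re
from collections import Counter

# Hypothetical entity dictionary: surface form -> entity type.
GAZETTEER = {
    "Acme Media": "organisation",
    "Springfield": "place",
    "Jane Doe": "person",
}

# One pattern for all entries; longer names first so they win over substrings.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(GAZETTEER, key=len, reverse=True))
    + r")\b"
)

# Assumed heuristic: a mention in the headline counts three times.
HEADLINE_WEIGHT = 3


def find_entities(text):
    """Step 1: count how often each known entity occurs in the text."""
    return Counter(PATTERN.findall(text))


def relevance(headline, body):
    """Step 2: weighted share of mentions per entity (scores sum to 1)."""
    weighted = Counter()
    for name, n in find_entities(headline).items():
        weighted[name] += HEADLINE_WEIGHT * n
    for name, n in find_entities(body).items():
        weighted[name] += n
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()} if total else {}


def mark(text):
    """Step 3: wrap each recognised entity in a [type:name] marker."""
    return PATTERN.sub(lambda m: f"[{GAZETTEER[m.group(1)]}:{m.group(1)}]", text)


headline = "Acme Media opens office in Springfield"
body = "Jane Doe will lead the Springfield office. Springfield is the second site."
print(relevance(headline, body))
print(mark(body))
```

In this invented sample, the place that appears in both headline and body receives the highest score, which mirrors the idea that relevance reflects how central an entity is to the text.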

In this way, unstructured texts are turned into structured, machine-readable data, and the resulting content can be used digitally by any company. The functions of semantic solutions for companies and institutions can be summarised as follows:

  • Extracting structured data from unstructured content
  • Automatically tagging entities and linking content internally
  • Classifying and curating content
  • Applying machine learning to evaluate historical data
  • Integrating easily into any CMS

In practice, semantic enrichment supports, for example, the editors of news portals in the automatic indexing of articles. Tools for creating topic pages, which aggregate all contributions of a news website on a subject in a user-friendly way, are also based on semantic enrichment. It also has a significant effect on the visibility of information in search engines, on user-friendliness and on the economic use of digital content.
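Once articles carry entity tags, a topic page is essentially a grouping of articles by tag. The sketch below illustrates the idea; the article structure (a dictionary with "title" and "tags"), the function name build_topic_pages and the sample data are assumptions for the example and do not describe any specific product.

```python
from collections import defaultdict


def build_topic_pages(articles):
    """Map each entity tag to the titles of all articles that carry it."""
    pages = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            pages[tag].append(article["title"])
    return dict(pages)


# Invented sample articles that have already been tagged with entities.
articles = [
    {"title": "New office opens", "tags": ["Acme Media", "Springfield"]},
    {"title": "Interview with the team lead", "tags": ["Jane Doe"]},
    {"title": "City plans new tram line", "tags": ["Springfield"]},
]

pages = build_topic_pages(articles)
print(pages["Springfield"])
```

Because the grouping is driven entirely by the tags, a newly tagged article appears on the matching topic page without further editorial work.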

Where are semantic technologies used?

The industries and application areas in which semantic processes can be used are as varied as they are exciting: semantic technologies are worth considering wherever knowledge is to be gained from internal data troves or information from different sources is to be aggregated and (re)used. Companies and organisations benefit in particular from the following advantages:

  • Gain knowledge from internal stores of data
  • Increase visibility and relevance of their own topics
  • Aggregate and utilise content from various sources

Semantic methods are therefore particularly useful where managing a constantly growing volume of text content is important for the strategic goals of an organisation. Publishing houses, archives and service departments are potential beneficiaries of semantic solutions. But companies from other industries, for example e-commerce retailers with large amounts of product data and texts or companies in the field of knowledge management, also use semantic technologies to classify their product range.

For example, the media library of the Stasi document archive was able to organise the multimedia contents of its online database more efficiently by using semantic technologies. The result: the diversity of the media library has become more visible, and access to the documents in the database has been improved.