Modelling of Product Knowledge in the Framework of Multilingual Technical Documentation

Dietmar Rösner
Björn Höfling
Thorsten Liebig
Otto-von-Guericke-Universität Magdeburg
Institut für Informations- und Kommunikationssysteme
P.O.Box 41 20, D-39016 Magdeburg, Germany
Email: (roesner,hoefling,tliebig)@iik.cs.uni-magdeburg.de

Abstract:

Multilingual technical documentation is increasingly important due to the rising complexity of technical products. Our main point in this paper is the fact that it is possible to model product knowledge in such a way that this representation, together with additional linguistic resources, can be used for the knowledge based generation of multilingual technical documentation. The advantages of this approach are availability, timeliness and consistency between the state of the product and its documentation in multiple languages. For the different text types in technical documents, different kinds of knowledge are necessary. Parts of the product knowledge can be obtained by reusing data which have already been gained by the designers and engineers during the product life cycle. For the other sources, authoring tools and workbenches can be designed to interactively build up the necessary knowledge.

1 Introduction

In manufacturing enterprises, technical documentation is not merely a minor addition to the product itself. Instead, it plays a key role because a product, especially a complex technical object, cannot be used without detailed documentation. In addition, it is even legally required, e.g. by EU legislation.

To illustrate the extent that technical documentation can reach: [LCM91] mention that the printed documentation of an aeroplane can sometimes weigh more than the aeroplane itself. Moreover, documenting a product can be so time-consuming and cost-intensive for an enterprise that optimization efforts are worthwhile.

Some important properties and requirements of technical documentation are:

a short time lag between product development and documentation
conformity of the product and its documentation
availability in all necessary languages with an acceptable delay
version consistency (changes in the product have to be reflected in all versions of its documentation in the different languages)
consistent terminology
tailoring to different users and different purposes

The process of creating technical documentation is still less automated than other aspects of the product development process. Over the last few years, we have developed a prototypical software system [SR94] that aims at automatically generating at least parts of multilingual technical documentation from a language-independent representation of form and content. Our research concentrates on knowledge-based natural language generation as a feasible technique to fulfill the requirements listed above.

In the next section, we focus on the different text types in technical documentation. We then describe a prototype that has shown the practicability of our general approach of generating instruction texts from plans. The section after that describes approaches to modelling knowledge for different text types and purposes. Finally, we discuss how the necessary domain knowledge can be obtained and conclude with some open issues.

2 Text Types in Technical Documentation

There is a variety of text types in technical documentation.

Some differences are due to different target groups: end consumers typically get other manuals than technicians who are responsible for maintenance or repair. Other distinctions can be made with respect to the type of content (which in turn is related to differences in usage):

structured technical data in tabular or other schematic format
instruction texts: How to perform maintenance, checking or repair actions?
functional explanation: What is the purpose of a technical device? How does it function in normal operation?
guidance for fault diagnosis: How to perform troubleshooting?

For our approach it is central to discuss these different text types under the following perspectives:

What type of knowledge about a product has to be available as a basis for the automatic generation of the texts?
What techniques have to be employed to map from the product related knowledge onto a representation of the intended text (a process termed `text planning' in natural language generation)?
What (language dependent) linguistic resources have to be employed for the actual production of the surface version of the text (a process termed `realisation' in natural language generation)?

From the viewpoint of knowledge representation and of natural language generation, structured technical data are not very demanding. An attribute-value pair representation will suffice in many cases; no sophisticated techniques for text planning and realisation are necessary. For multilingual delivery, however, a multilingual terminology base is an indispensable linguistic resource.
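
The following Python fragment is a minimal sketch of this simplest case: technical data are held as language-independent attribute-value pairs and rendered with the help of a multilingual terminology base. The concept identifiers, terms, values and the function name `render_data_sheet` are assumptions made for illustration only; they are not taken from the TECHDOC system.

```python
# Hypothetical terminology base: concept identifier -> term per language.
TERMINOLOGY = {
    "engine-oil-capacity": {
        "en": "Engine oil capacity",
        "de": "Motorölfüllmenge",
        "fr": "Capacité d'huile moteur",
    },
    "spark-plug-gap": {
        "en": "Spark plug gap",
        "de": "Elektrodenabstand",
        "fr": "Écartement des électrodes",
    },
}

# Language-independent technical data: (attribute concept, value, unit).
# The values are invented for the example.
TECHNICAL_DATA = [
    ("engine-oil-capacity", "3.5", "l"),
    ("spark-plug-gap", "0.8", "mm"),
]


def render_data_sheet(data, language):
    """Render each attribute-value pair as one line in the given language."""
    lines = []
    for concept, value, unit in data:
        term = TERMINOLOGY[concept][language]  # terminology lookup only
        lines.append(f"{term}: {value} {unit}")
    return lines


for language in ("en", "de", "fr"):
    print("\n".join(render_data_sheet(TECHNICAL_DATA, language)))
```

Adding a further language only requires new entries in the terminology base; the technical data themselves remain untouched. Language-specific number formatting is deliberately left out of this sketch.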

The other text types are more demanding both with respect to the knowledge to be modelled as well as to the processes in generation.

3 A Prototype: From Plans to Instruction Texts

In the TECHDOC project [SR94], initial insights into the representation of technical documents were gained from a comparative analysis of parts of manuals in different languages, with the aim of separating the content of a text from its linguistic form [RHH96].

For any adequate modelling of physical products, it is important to identify the relevant phenomena and the appropriate level of detail at which to model each phenomenon for the application in focus, multilingual text generation. To identify these, we considered the objects to be represented from a functional and a structural perspective.

Our technical model is also divided into a part with abstract technical knowledge (about connections, containers etc.) and a part describing a concrete product in detail.

The underlying knowledge structure of instruction and maintenance texts can be well formalized by means of plans [RHH96]. These plans are represented in the knowledge base and consist of one or more steps or other plans. The technical editor has to specify the contents of a maintenance or instruction plan. The plan can include links to additional information sources, e.g. tables, pictures or animations. A discourse planner converts the plan into a textual representation by mapping it onto discourse relations. In the next step, these relations are linearized and sent to the language-specific generators.
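
As an illustration of such a plan structure, the following Python sketch represents a plan as a sequence of steps and nested sub-plans and flattens it into the order in which the instructions would be presented. The class names `Step` and `Plan`, their fields and the oil-change example are hypothetical. The mapping onto discourse relations is omitted here; the sketch only shows the recursive plan structure and a simple linearization.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Step:
    action: str  # language-independent action concept
    obj: str     # object the action is applied to
    media: List[str] = field(default_factory=list)  # links to tables, pictures, ...


@dataclass
class Plan:
    goal: str
    body: List[Union[Step, "Plan"]] = field(default_factory=list)


def linearize(plan):
    """Depth-first traversal returning the elementary steps in order."""
    steps = []
    for element in plan.body:
        if isinstance(element, Plan):
            steps.extend(linearize(element))  # a plan may contain other plans
        else:
            steps.append(element)
    return steps


# Hypothetical maintenance plan specified by a technical editor.
change_oil = Plan("change-oil", [
    Step("warm-up", "engine"),
    Plan("drain-oil", [
        Step("remove", "drain-plug", media=["figure-drain-plug"]),
        Step("drain", "engine-oil"),
    ]),
    Step("refill", "engine-oil"),
])

for step in linearize(change_oil):
    print(step.action, step.obj)
```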

Figure 1: The TECHDOC Architecture

The general architecture of the TECHDOC system is given in Figure 1. Starting from an instance of the concept plan, a document structure is generated. The elements of the document structure are complex RST relations. Details of this transformation are described in [RS92b].

In the next step the discourse structures are broken down into clause sequences. This transformation must determine

clause borders,
the structure of single clauses,
the theme of the clauses,
appropriate referential expressions (pronouns, definite or indefinite descriptions) within the clauses and
a linguistic realisation (e.g. function words or phrases) that expresses the rhetorical relation.

We used a pattern based algorithm that incorporates linguistic and semantic constraints and preferences. Details of this transformation are described in [RS92a]. As a result of this transformation, a sequence of SPL terms is created.

SPL terms can be used as input to the sentence generator PENMAN [Man83]. In the TECHDOC project we enlarged the grammar coverage such that an English, a German and a French version of the clause can be generated from an SPL term. In a final step the document structure is exploited for an automatic formatting of the output text. This step takes the output medium (screen or print) and the chosen formatting device (e.g. ASCII, SGML, LaTeX, ...) into account.
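
The final formatting step can be pictured with a small Python sketch that renders the same sequence of generated sentences for different formatting devices. The function name `format_instructions`, the SGML element names and the example sentences are assumptions for illustration; a real SGML output would have to follow a concrete DTD, and no escaping of special characters is attempted here.

```python
def format_instructions(title, sentences, device):
    """Render a titled list of instruction sentences for one output device."""
    if device == "ascii":
        lines = [title, "=" * len(title)]
        lines += [f"{number}. {text}" for number, text in enumerate(sentences, 1)]
    elif device == "latex":
        lines = [f"\\section*{{{title}}}", "\\begin{enumerate}"]
        lines += [f"\\item {text}" for text in sentences]
        lines.append("\\end{enumerate}")
    elif device == "sgml":
        # Element names are hypothetical, not taken from any DTD.
        lines = ["<section>", f"<title>{title}</title>", "<list>"]
        lines += [f"<item>{text}</item>" for text in sentences]
        lines += ["</list>", "</section>"]
    else:
        raise ValueError(f"unknown formatting device: {device}")
    return "\n".join(lines)


sentences = ["Warm up the engine.", "Remove the drain plug."]
print(format_instructions("Changing the oil", sentences, "ascii"))
```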

4 Beyond Instructions

As mentioned above, the underlying knowledge structure of instruction and maintenance texts can be well formalized by means of plans. These plans consist of one or more actions or other plans. Furthermore, a detailed representation of actions allows the qualitative simulation of the instruction steps in order to verify the completeness of a plan or to generate hints and warnings automatically. Consider, for example, the consequences of the preparatory instruction `warming up the engine' of a car for the maintenance process of changing oil: all parts and fluids directly connected to the engine become hot. This implicit knowledge is inferred by the reasoning system from the physical model of the represented product. It can trigger an automatically generated warning whenever hot objects (e.g. the oil) are involved in later instruction steps. This example illustrates that for those kinds of reasoning tasks one will need a domain representation based on a model of physical features as well as on qualities and status properties.
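
The oil-change example can be sketched in a few lines of Python. This is a strongly simplified stand-in for the qualitative simulation described above, not the reasoning system itself: the connection table, the plan and the single rule "warming up an object heats everything directly connected to it" are assumptions made for the illustration.

```python
# Hypothetical structural model: parts and fluids directly connected to a part.
CONNECTED_TO = {
    "engine": {"engine-oil", "drain-plug", "oil-filter"},
}

# Hypothetical plan as a sequence of (action, object) pairs.
PLAN = [
    ("warm-up", "engine"),
    ("remove", "drain-plug"),
    ("drain", "engine-oil"),
    ("check", "tyre-pressure"),
]


def simulate(plan, connected_to):
    """Track which objects are hot and flag steps that involve a hot object."""
    hot = set()
    annotated = []
    for action, obj in plan:
        needs_warning = obj in hot  # state before the step is executed
        annotated.append((action, obj, needs_warning))
        if action == "warm-up":
            # Effect of the action: the object and everything directly
            # connected to it becomes hot.
            hot.add(obj)
            hot |= connected_to.get(obj, set())
    return annotated


for action, obj, needs_warning in simulate(PLAN, CONNECTED_TO):
    note = "  [warning: hot]" if needs_warning else ""
    print(f"{action} {obj}{note}")
```

The warning is derived from the state reached by earlier steps rather than stored with the individual instruction, which is the point of simulating the plan on a product model.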

We propose to combine structural and functional information about a complex object (e.g. a device) in the representation [LRn96]. As discussed in [Keu91], functional structuring is useful because, to understand the functioning of a complex device, problem-solving mechanisms must often decompose the device's function into the functions of its components. The functional specification describes the device's goals at a level of abstraction that is of interest at the object level. The function of a device is its intended purpose, which is achieved by behaviours [Keu91]. Our model represents behaviour as transitions of partial states/predicates. In the case of an Electrical-Power-Supply-Facility, for example, the intended states are either ``power-supplied'' or ``not-power-supplied''. In contrast, the part-whole relation [AFGP96] is an example of a structuring based on the physical organisation of components, which can be used for the schematic presentation of structured technical data.

The structural organisation based on functional components supports the process of functional explanation and fault diagnosis. As discussed in [Mey88], explicit modelling is one of the requirements which have an impact on reusability, understandability and extendibility, which are considered as crucial modelling quality factors.

According to [AFGP96], additional minimal requirements for a conceptual model that is able to capture the ontological nature of both parts and wholes are the capabilities to express ``vertical'' and ``horizontal'' relationships and constraints. Vertical relationships hold between the whole and its parts. For instance, certain locative properties of the whole also hold for its parts. An example of a property which the whole inherits from its parts is the status of being defective (an electrical device with one defective component is not expected to work normally). Horizontal relationships are composed of constraints among parts which characterize the integrity of the whole. Although they are important for capturing the notion of a whole, they find little space in current modelling formalisms [AFGP96].
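
The two directions of vertical property propagation can be made concrete with a short Python sketch. The class `Component`, its fields and the example device are hypothetical; horizontal constraints among parts are not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    name: str
    parts: List["Component"] = field(default_factory=list)
    location: Optional[str] = None  # stated for the whole, holds for its parts
    defective: bool = False         # stated for a part, inherited by the whole


def is_defective(component):
    """Upward propagation: a whole is defective if any of its parts is."""
    return component.defective or any(is_defective(p) for p in component.parts)


def locate(component, inherited=None):
    """Downward propagation: parts take over the location of their whole."""
    location = component.location or inherited
    result = {component.name: location}
    for part in component.parts:
        result.update(locate(part, location))
    return result


# Hypothetical device with one defective part.
fuse = Component("fuse", defective=True)
transformer = Component("transformer")
power_supply = Component("power-supply", parts=[fuse, transformer],
                         location="control cabinet")

print(is_defective(power_supply))
print(locate(power_supply))
```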

5 Obtaining the Necessary Domain Knowledge

In the TECHDOC project, the necessary knowledge for the generation of instruction manuals was gained by analysing a number of written manuals and representing their content in a language-independent way [RHH96]. This was done for research purposes and to show the practicability of the general approach. In real applications, however, the acquisition of the knowledge has to be integrated as closely as possible into the product life cycle in order to benefit from the reuse of product knowledge and data that are already necessary for other purposes.

Reconsider the structured technical data mentioned in the second section: the integration goal can be reached, for example, by representing both the geometric and the additional product data by means of ISO 10303 (STEP) so that they can be shared among different applications. Many CAD and CAE software tools used by product developers already have the ability to exchange information using this standard. These data should also be reused for the creation of technical documents.

For the other text types mentioned in the second section additional knowledge has to be obtained. One possibility for getting this knowledge is by using authoring tools. [RHH96] outline a scenario for authoring product specific knowledge and integrating it into a corporate memory. This should at best be carried out by the designers and product engineers as part of their activities. Thus, the acquisition of product specific knowledge will be tightly integrated into the overall engineering process.

In addition to this reuse of product knowledge, [GR] describe a workbench which enables the technical editor to specify the content of documents starting from the definition of their macro structure. The three main components of this workbench are an authoring tool for the interactive planning and structuring of a documentation, a document generator which takes this specification as input and generates the formatted multilingual documentation, and a module to administer the relevant knowledge bases. The workbench also takes into account the integration of external knowledge sources.

One major research area for the sharing and integration of knowledge is ontologies. By formally defining concepts and their interrelationships, they can allow for a shared understanding of a specific domain. With the help of ontologies, external knowledge sources that use this vocabulary can be integrated because the semantics of the entities is known. [WSJ94] describe three formalisms which are especially suited for describing ontologies. For detailed descriptions of the structure of technical systems they propose EXPRESS, the object-oriented data definition language of STEP. In their toolkit named VOID, they even offer translators from EXPRESS to other formalisms such as Ontolingua and vice versa. This at least facilitates the use of ontologies at the syntactic level in the industrial environment because of their compatibility with other product knowledge modelling efforts.

6 Open Issues

There are a number of open issues, of which we want to mention the following. In the STEP community the problem of integrating the technical documentation into the product life cycle has also been taken into consideration. In the field of documentation, the SGML family of standards is designed to model textual information for publishing purposes. There are approaches to combine the STEP and the SGML world [Swe] which have been successful in proposing a method for using SGML text strings as STEP objects. STEP product models can thus contain text which can be used to produce SGML documents. Even with a tighter integration of STEP and SGML capabilities, for example the referencing of STEP entities in a uniform way from SGML documents, there is still the risk that this approach will remain at the level of canned text. We doubt that it will fulfill the real needs of multilingual technical documentation.

Two research projects following the approach of document creation based on STEP are DOCSTEP (Technical documentation creation and management using STEP) at RPK Karlsruhe/Germany and the recently started VOLVEX (Validation of specifications by natural language generation for VOLVO expressed in EXPRESS/STEP) at DSV Stockholm/Sweden. As these projects and others reuse technical product information formalised in STEP/EXPRESS, which is already available in the enterprises, the question arises whether this formalism already offers the necessary modelling capabilities. Our impression is that current product models like STEP lack the ability to model additional knowledge such as instruction plans or the functionality of complex products. There are initiatives to extend the expressiveness of EXPRESS that could hopefully also take into account the requirements coming from the area of technical documentation. Another possibility would be to extend the entities modelled in STEP/EXPRESS by providing additional external knowledge or to enlarge the definition of future application protocols with this knowledge.

References

AFGP96: Alessandro Artale, Enrico Franconi, Nicola Guarino, and Luca Pazzi, Part-whole relations in object-centered systems: An overview, Data & Knowledge Engineering Journal - North-Holland, Elsevier; special issue on Modelling Parts and Wholes (1996), no. 20, 347 - 383.
GR: B. Grote and T. Rose, A Workbench for the Production of Multilingual Technical Documentation, submitted to the German Conference on Artificial Intelligence (KI'97).
Keu91: Anne M. Keuneke, Device representation, the significance of functional knowledge, IEEE EXPERT (1991), 22 - 25.
LCM91: John Levine, Alison Cawsey, Chris Mellish, Lawrence Poynter, Ehud Reiter, Paul Tyson, and John Walker, IDAS: Combining hypertext and natural language generation, Proc. of the Third European Workshop on Natural Language Generation (Judenstein, Austria), 1991, pp. 55-62.
LRn96: Thorsten Liebig and Dietmar Rösner, Modelling of reusable product knowledge in terminological logics: a case study, Proceedings of the First International Conference on Practical Aspects of Knowledge Management (PAKM '96) (Michael Wolf and Ulrich Reimer, eds.), vol. 2, October 1996.
Man83: William C. Mann, An overview of the PENMAN text generation system, Proc. of the National Conference on Artificial Intelligence, AAAI, August 1983, pp. 261-265.
Mey88: B. Meyer, Object-oriented software construction, Prentice Hall, New York, 1988.
RHH96: Dietmar Rösner, Björn Höfling, and Knut Hartmann, From natural language documents to sharable product knowledge, Proceedings of the First International Conference on Practical Aspects of Knowledge Management (Michael Wolf and Ulrich Reimer, eds.), vol. 2, 1996.
RS92a: Dietmar Rösner and Manfred Stede, Customizing RST for the Automatic Production of Technical Manuals, Aspects of Automated Natural Language Generation - Proc. of the 6th International Workshop on Natural Language Generation (R. Dale, E. Hovy, D. Rösner, and O. Stock, eds.), Springer, Berlin/Heidelberg, 1992, pp. 199-214.
RS92b: Dietmar Rösner and Manfred Stede, TECHDOC: A System for the Automatic Production of Multilingual Technical Documents, Reihe Informatik aktuell, Springer, Berlin/Heidelberg, 1992.
SR94: Manfred Stede and Dietmar Rösner, Generating multilingual documents from a knowledge base: The TECHDOC project, COLING-94, Proceedings (Kyoto), 1994.
Swe: Swedish Association for CALS, Interoperability between STEP and SGML, White Paper, http://www.admin.kth.se/SGML/Bibliotek/Litteratur/whitep/wp.html.
WSJ94: B. Wielinga, G. Schreiber, W. Jansweijer, A. Anjewierden, and F. van Harmelen, Framework and Formalism for Expressing Ontologies, Deliverable D01b1, ESPRIT Project 8145 KACTUS, 1994, http://www.swi.psy.uva.nl/projects/Kactus/abstracts/KACTUS-D01b1.html.

Bjoern Hoefling
Fri Apr 25 13:27:25 MET DST 1997