A data-oriented model for software classification

Last content update: 1999/01/05

Table of contents

Basic ideas
Glossary
The Nature field
Possible handling of file-formats
Possible handling of communication protocols
Documentation autoconversion
Emulators
Programming interfaces
Annex: History

1. Basic ideas

The basic idea is quite simple: many programs manipulate some sort of typed data (images, sound, HTML, plain text, etc.). It is currently (late 1999) not easy to search the Debian archive for an application able to read or produce files in a particular file-format. And while that task is hard enough with only single-application solutions in mind, it is even harder to identify all possible multi-application solutions (e.g. solutions that first convert a file into another format suitable for a viewer). I propose an approach to those problems.

Thinking again about this approach, I realized it could be generalized to other areas, such as handling binary packages for use by emulators, or documentation auto-conversion, or other types of application interactions, such as client/server networking.

2. Glossary

Data class
Examples of data classes are data-file formats, executable file-formats, network protocols.
Data producer
An executable item (or maybe a library?) that is able to produce data of some well-defined class(es).
Data consumer
An executable item (or maybe a library?) that is able to use data of some well-defined class(es) (also referred to as a virtual machine or interpreter in earlier, less formalized versions of this model).
Data translator
An executable that is able to consume data of some well-defined class(es) to produce data of some other well-defined class(es). Both producers and consumers may be seen as particular cases of translators.

3. The Nature field

Although not directly part of the model described here, this proposed field may be useful in conjunction with it. This field is intended to describe the nature of things in a package, and its proposed possible values are quite correlated with the following sections.

Proposed possible values for the Nature field include: application, library, development-kit, documentation, server.

4. Possible handling of file-formats

Targets Nature: application.

In this case we can see packages providing translators (including producers and consumers), as well as packages containing data of some class.

... for the packages

This can be handled using 2 new control-file tags.

Dataclasses:
declares classes of provided data files. Classes may be described in a hierarchical way to help selection (eg: file/image/png, file/text/xml, file/archive/rpm).
Translators:
declares a list of available translations using the package. Here are examples of possible syntax I thought about:
Translators: file/image/png:file/image/gif
Translators: file/image/png:gif
Translators: file/image/(png,gif,xpm):pnm, file/image/pnm:(png,gif,xpm)
Translators: file/source-code/fweb:(c,f77,latex)
Translators: file/source-code/c:file/executable/x86/linux

Package: eeyes
Translators: file/image/(png,gif,xpm):

Package: fractals-generator
Translators: :file/image/gif
    

You'll note possible problems with allowing implicit parts of the target dataclass name. Probably only the full name or just the last component should be allowed, as exemplified above.
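To make the proposed syntax more concrete, here is a minimal Python sketch of how a frontend might expand a Translators value into explicit (source, target) pairs. It assumes the conventions suggested by the examples above: entries are separated by commas outside parentheses, a parenthesized group lists alternatives for the last component, and a target given as a last component only is completed from the parent of the source class. The function names are hypothetical and the syntax itself is only a proposal.

```python
import re

def split_top_level(text, sep=","):
    """Split on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def expand_group(name):
    """Expand 'file/image/(png,gif)' into one full class name per alternative."""
    match = re.fullmatch(r"(.*?)\(([^()]*)\)", name)
    if match is None:
        return [name] if name else []
    prefix, alternatives = match.groups()
    return [prefix + alt.strip() for alt in alternatives.split(",")]

def parse_translators(value):
    """Return (source, target) pairs; None marks a missing side.

    A missing source means a pure producer, a missing target a pure consumer.
    """
    pairs = []
    for item in split_top_level(value):
        source_text, _, target_text = item.partition(":")
        sources = expand_group(source_text.strip()) or [None]
        targets = expand_group(target_text.strip()) or [None]
        for source in sources:
            for target in targets:
                # Assumption: a target without '/' is a last component only,
                # so it is completed with the parent of the source class.
                if target and source and "/" not in target:
                    target = source.rsplit("/", 1)[0] + "/" + target
                pairs.append((source, target))
    return pairs
```

With this sketch, parse_translators("file/image/png:gif") yields the single pair ("file/image/png", "file/image/gif"), and the eeyes declaration above yields three pairs whose target is None.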

... for the frontends

Frontends can present the user with a selection mechanism using a hierarchical interface like the following:

Consumers
+- Image viewers
|  +- PNG viewers
|  |  +- pngviewer
|  |  +- xv
|  |  +- via GIF
|  |  |  +- translators
|  |  |  |  `- gif2png
|  |  |  +- imagemagick
|  |  |  `- gifviewer
|  |  +- via JPEG
|  |  |  +- ...
|  `- ...
+- XML viewers
+- RPM installers
...
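The "via" entries in such a tree amount to a search for chains of translators ending in a consumer. The following Python sketch shows one possible way to compute them with a breadth-first search, so that the shortest chains come first. The package table is invented for illustration (png2gif and xpm2png are hypothetical names), and the length limit is an arbitrary assumption to keep the search bounded.

```python
from collections import deque

# Hypothetical declarations, as (package, source class, target class).
# A target of None marks a pure consumer such as a viewer.
TRANSLATIONS = [
    ("pngviewer", "file/image/png", None),
    ("gifviewer", "file/image/gif", None),
    ("png2gif", "file/image/png", "file/image/gif"),
    ("xpm2png", "file/image/xpm", "file/image/png"),
]

def find_chains(start, translations, max_length=3):
    """Return package chains that end in a consumer, starting from `start` data."""
    chains = []
    queue = deque([(start, [])])
    while queue:
        data_class, path = queue.popleft()
        if len(path) >= max_length:
            continue  # do not extend chains beyond the chosen limit
        for package, source, target in translations:
            if source != data_class or package in path:
                continue  # wrong input class, or package already used
            if target is None:
                chains.append(path + [package])
            else:
                queue.append((target, path + [package]))
    return chains
```

For example, find_chains("file/image/xpm", TRANSLATIONS) returns the chain xpm2png, pngviewer first, followed by xpm2png, png2gif, gifviewer.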
    

5. Possible handling of communication protocols

Targets Nature: server and Nature: application (ie. clients).

Communication protocols are not unlike file formats, but differ in several ways, among which:

The term communication protocol is purposely generic, as it may include UNIX sockets as well as networking protocols; even other communication protocols such as SYSV message queues or shared memory may be specified if ever needed.

As for translators, we will mostly have producers and consumers here. For client/server communications we may simply consider that the server is the producer and the client is the consumer, even though the client also sends some data to the server.

... for the packages

The Translators field may also be used, although its name may not be the best one. Examples would be:

Package: telnetd
Translators: :protocol/ip/tcp/telnet

Package: telnet
Translators: protocol/ip/tcp/telnet:

    

... for the frontends

The sysadmin could be offered a choice among a list of network interfaces he wants available for local use on his machine (or network, should we have a frontend able to administer a set of machines), and then among a list of matching servers and clients.

A similar frontend could be used on a server to express things like "I want to export an SMTP service (with the restriction: not sendmail), but no TELNET service". This can make it mostly trivial to get a custom server up and running, especially when combined with debconf to set up the server configuration.
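As an illustration of the server-side selection just described, here is a small Python sketch. The package table, the protocol/ip/tcp/smtp class name and the function name are assumptions made for the example; only the telnet entries follow the declarations shown earlier.

```python
# Hypothetical declarations, as (package, consumed class, produced class).
PACKAGES = [
    ("telnetd", None, "protocol/ip/tcp/telnet"),
    ("telnet", "protocol/ip/tcp/telnet", None),
    ("sendmail", None, "protocol/ip/tcp/smtp"),
    ("exim", None, "protocol/ip/tcp/smtp"),
]

def plan_services(packages, export, refuse, excluded=()):
    """Return (candidate servers per exported protocol, packages to avoid)."""
    offered = {
        protocol: [
            name
            for name, _, produced in packages
            if produced == protocol and name not in excluded
        ]
        for protocol in export
    }
    # Any package producing a refused protocol should not be installed.
    to_avoid = [name for name, _, produced in packages if produced in refuse]
    return offered, to_avoid
```

Asking for SMTP without sendmail and without TELNET, that is plan_services(PACKAGES, ["protocol/ip/tcp/smtp"], ["protocol/ip/tcp/telnet"], excluded=["sendmail"]), leaves exim as the only SMTP candidate and flags telnetd as a package to avoid.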

6. Documentation autoconversion

Targets Nature: documentation.

Some time ago there was a discussion about shipping only the source code for documentation, and having it autoconverted at install time into locally defined preferred doc formats.

This could again be formalized by having a frontend ask the sysadmin to choose his preferred formats, and allowing him to select the relevant converters or chains of converters, exactly as described in the section about file-formats handling.

7. Emulators

Targets Nature: application.

In much the same way as data-file formats can be described, binary executables can be classified, for use by various emulator programs.

Example data classes here would be:

file/executable/x86/linux
file/executable/java
    

Interpreters could perhaps be described this way too, but since scripting languages usually have only one interpreter, there may not be much interest in doing so.
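Because class names are hierarchical, a frontend could select emulators by any prefix of the class name. The Python sketch below shows one way to do that matching. The emulator package names are placeholders, not real packages, and matching on whole path components is an assumption about how the hierarchy should behave.

```python
def in_class(data_class, ancestor):
    """True if data_class equals ancestor or lies below it in the hierarchy."""
    return data_class == ancestor or data_class.startswith(ancestor.rstrip("/") + "/")

# Placeholder emulator packages and the executable classes they consume.
EMULATORS = {
    "some-jvm": ["file/executable/java"],
    "some-x86-emulator": ["file/executable/x86/linux"],
}

def emulators_for(ancestor, emulators):
    """List the packages consuming at least one class under `ancestor`."""
    return sorted(
        name
        for name, classes in emulators.items()
        if any(in_class(c, ancestor) for c in classes)
    )
```

Selecting file/executable/x86 keeps only the x86 entry, while selecting file/executable lists both placeholders.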

8. Programming interfaces

Targets Nature: development-kit.

Development kits are not unlike the other concepts presented here: they use and provide a dataclass that is better named an API.

... for the packages

Here are possible examples:

Package: perl
Translators: :api/perl

Package: libnet-perl
Translators: api/perl:api/perl/libnet

Package: wxgtk2.1-dev
Translators: api/xlib/gtk:api/wxwindows

Package: libwxxt-dev
Translators: api/xlib/xt:api/wxwindows
    

... for the frontends

The user may select which type of development platform they will use (e.g. api/perl).

Annex: History

This is the history of this model as I remember it. I still have to look through the old Debian mailing-list archives to provide copies of the original emails; these are probably too old and do not seem to show up in HTML format.

I first presented this model a long time ago on the Debian mailing lists, speaking about a virtual machine model able to handle on-the-fly document-format translation on installation, and of possible use for describing binary formats usable by emulators. Other members of the Debian project showed virtually no interest, so I stopped advocating it.

I raised this issue again later but don't remember the circumstances well.

I raised it again on debian-project and then debian-devel when discussing an overhaul of the sectioning of the distribution (this discussion was triggered by my request to get rid of section base). Reference: Message-Id: <199810250832.JAA04703@my.mygale.org>; X-UIDL: 3ce6c35d260d84fa22f9cd534728bbc7.


Yann Dirson