This article touches upon generalising the terms metadata and metacontent so as to be useful in everyday life.
What is Content?
1. A collection of data.
Metacontent is a collection of data about data, or about a collection of data.
What is Data?
1. From Wikipedia “Values of qualitative or quantitative variables, belonging to a set of items.”
What is the difference between metacontent and metadata?
Metadata refers to data about data or content, whilst metacontent refers to content about data or content.
E.g. Your age is metadata whilst a picture of you is metacontent.
Note: Both data and content are fuzzy; there are no absolutes. Your age could also be interpreted as content, as it includes two or more numbers, each of which has a shape, history and meaning.
How do you store metacontent? For this, there is the Open Metadata specification.
Open Metadata is a specification governed by the following set of rules, based on "Towards a Theory of Meta-content" by R. V. Guha:
1. The representation, manipulation and storage of meta-content should not be tied to that of the content it describes.
2. The metacontent language should be very expressive. This means that it should support referencing (cross-channel, user, time etc.) as well as many types of data formats, including text, images, video, key/value pairs etc.
3. The authoring and publication of meta-content should be separable from its consumption.
4. The metacontent language should have reflective abilities. It should be possible, from within the language, to view metacontent itself as content, so that it can be combined into hierarchies.
5. It should be possible to aggregate two or more channels into a single channel or into a channel of channels.
Metadata/content is stored as files underneath a parent folder.
This allows any type of format to be stored, thus fulfilling both requirement nr 1 and 2. The alternative would be to store metadata within an existing file format, such as Autodesk Maya’s .ma ASCII file. That would limit what you can store (ASCII only) and break consistency with other formats (other file formats may be binary rather than text-based).
Reading and writing of metacontent are separated via the API, and more so via the user interfaces that build upon it. This is similar to how Acrobat comes with a more limited version called Reader, separating the authoring process from the consumption process. This covers requirement nr 3.
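The separation of authoring from consumption could be sketched roughly as below. This is only an illustration: the class names ChannelReader and ChannelWriter and their methods are hypothetical and are not taken from the Open Metadata API. The only thing assumed from the article is that a channel is a folder containing files.

```python
from pathlib import Path


class ChannelReader:
    """Read-only view of a channel folder (the consumption side).

    Hypothetical name; it deliberately exposes no write method.
    """

    def __init__(self, channel_dir):
        self._dir = Path(channel_dir)

    def names(self):
        # File names stored in the channel, in sorted order.
        return sorted(p.name for p in self._dir.iterdir() if p.is_file())

    def read(self, name):
        # Return the raw bytes so any file format can be consumed.
        return (self._dir / name).read_bytes()


class ChannelWriter:
    """Authoring side, kept separate so consumers never need write access.

    Hypothetical name, for illustration only.
    """

    def __init__(self, channel_dir):
        self._dir = Path(channel_dir)

    def write(self, name, data):
        # Create the channel folder on first write, then store the bytes.
        self._dir.mkdir(parents=True, exist_ok=True)
        (self._dir / name).write_bytes(data)
```

A viewing tool would then only ever be handed a ChannelReader, in the same spirit as the Acrobat and Reader split.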
E.g. we could store a picture of you under:
__images__.jpg is a folder. The double underscores signify that a set of data is contained within; this is referred to as a channel or stream of data. We could then nest another set of images within this channel, fulfilling requirement nr 4:
Within text-based channels it is possible to refer to neighbouring channels via the reference operator (@), thus fulfilling another aspect of requirement nr 4.
Using the reference operator, a separate channel may be constructed that only references other channels, and thus provides a composite channel. This is useful when providing an additional overview channel for a very large amount of metadata, and it fulfils requirement nr 5.
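As a rough sketch of how a composite channel might be assembled from references: the article only states that the operator is @, so the exact syntax used here ("@" followed directly by a channel name, e.g. "@referenceImages") is an assumption, as are the function names.

```python
import re

# Assumed syntax: "@" followed by a channel name, e.g. "@referenceImages".
# The real Open Metadata reference syntax may differ.
REFERENCE = re.compile(r"@(\w+)")


def referenced_channels(text):
    """Return the channel names referenced in text, in order, without repeats."""
    seen = []
    for name in REFERENCE.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def build_composite(overview_text, channels):
    """Collect the contents of every known channel that the overview references.

    channels maps a channel name to a list of its file names.
    References to unknown channels are skipped rather than raising.
    """
    return {
        name: channels[name]
        for name in referenced_channels(overview_text)
        if name in channels
    }
```

The composite holds nothing but pointers to other channels, so the overview stays small however much metadata it points to.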
Additionally, all metadata could be stored under a hidden .metacontent folder, which removes the need to deal with the hidden property of individual files when reading or writing.
/folder/.metacontent/__referenceImages__.jpg/image1.jpg
/folder/.metacontent/__referenceImages__.jpg/image2.jpg
/folder/.metacontent/__documentation__.txt/1_intro.txt
/folder/.metacontent/__documentation__.txt/2_high-level-overview.txt
/folder/.metacontent/__documentation__.txt/3_api-documentation
/folder/.metacontent/__properties__.json/properties.json
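A layout like the one above can be walked with ordinary file-system calls. The following is a minimal sketch, assuming that every channel folder is named `__<name>__.<extension>` and lives directly under .metacontent. The helper names parse_channel and list_channels are made up for this example.

```python
import re
from pathlib import Path

# Assumed naming pattern for a channel folder: __<name>__.<extension>
CHANNEL_PATTERN = re.compile(r"^__(?P<name>.+)__\.(?P<ext>[^.]+)$")


def parse_channel(folder_name):
    """Return (name, extension) for a channel folder name, or None."""
    match = CHANNEL_PATTERN.match(folder_name)
    if match is None:
        return None
    return match.group("name"), match.group("ext")


def list_channels(folder):
    """Map each channel name to the sorted file names found in that channel."""
    root = Path(folder) / ".metacontent"
    channels = {}
    if not root.is_dir():
        return channels
    for child in sorted(root.iterdir()):
        parsed = parse_channel(child.name)
        # Ignore anything that is not a folder named like a channel.
        if child.is_dir() and parsed is not None:
            name, _ext = parsed
            channels[name] = sorted(p.name for p in child.iterdir())
    return channels
```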
The file format contained within any given channel is determined solely by the channel’s extension.
/folder/.metacontent/__imagechannel__.jpg/all.jpg
/folder/.metacontent/__imagechannel__.jpg/images.jpg
/folder/.metacontent/__imagechannel__.jpg/are.jpg
/folder/.metacontent/__imagechannel__.jpg/jpeg.jpg
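The rule that a channel's extension defines the format of its files could be checked along these lines. This is a sketch rather than part of the specification: the function names are hypothetical, and comparing extensions case-insensitively is an assumption made here.

```python
from pathlib import Path


def channel_extension(channel_dir):
    """Extension declared by the channel folder name, e.g. '.jpg'."""
    return Path(channel_dir).suffix.lower()


def mismatched_files(channel_dir):
    """Return the names of files whose extension differs from the channel's.

    Extensions are compared case-insensitively (an assumption).
    """
    channel_dir = Path(channel_dir)
    expected = channel_extension(channel_dir)
    return sorted(
        p.name
        for p in channel_dir.iterdir()
        if p.is_file() and p.suffix.lower() != expected
    )
```

An empty result means every file in the channel carries the extension that the channel declares.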
What are the disadvantages to this approach when compared with databases?
- Performance. Data must be accessed via file-system mechanisms, which are typically not as fast as a database query.
- Large hierarchies that must be maintained.
What are the advantages to this approach, compared to databases?
- No separation of data means no synchronization and no duplication of data.
- Consistency. All formats and types kept together.
- Logical. The file-system maps well to how we perceive the world around us. (The ball is in the box, the box is in the room, etc.)