Another Freenet Index Format
The idea behind this format is to be better suited to binary files (video, audio, etc.) while still being able to index text files by word.
All suggestions are, of course, welcome :)
IMPORTANT NOTICE
This format is not final! If you want to use it, you are strongly encouraged to subscribe to the RSS feed attached to this page.
Why XML ?
- Flexibility : A program using this format can add its own tags without disturbing others (I must admit I'm not sure that's a good thing ...)
- Readability : XML is more readable for humans
- Character encoding : We can hope that XML parsers will help avoid most character-encoding issues
(comment:
Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems. )
Sample
Index format
<?xml version='1.0' encoding='utf-8'?>
<index>
<header>
<title>Index title</title>
<owner>Index owner nickname</owner><!-- optional -->
<date>YYYYMMDD</date><!-- insertion date, optional -->
<email>email</email><!-- optional -->
<client>software (for example Thaw 0.7 rxxx)</client><!-- optional -->
<!-- Security note: If you already know a privateKey for the given index, don't overwrite it with the one found in the index;
otherwise a bad guy could easily do some nasty things, like blocking a public index ... -->
<!-- Re-security note: If you already have a privateKey, check that both keys match. If they don't, don't use the new one and don't republish the privateKey! -->
<!-- Re-re-note: If you're using FCP, beware of possible FCP injection -->
<privateKey>SSK@[...]</privateKey><!-- Optional, of course :) -->
<!-- Optional -->
<!-- category[/subCategory[/subsubCategory[...]]] -->
<!-- Used for the auto-sorting in Thaw -->
<!-- No case sensitivity -->
<category>freenet/thaw</category>
</header>
<indexes>
<!-- Users can use this to link to other indexes -->
<!-- Category is optional -->
<link key="USK@[...]" category="freenet/thaw" />
<link key="USK@[...]" />
[...]
</indexes>
<files>
<!-- a file size == 0 means that the file size is unknown -->
<file id="0"
key="CHK@[...]/thisIsAFile.avi"
size="5242880"
mime="video/x-msvideo">
<!-- Options are defined by file format filters or by the user himself -->
<option name="length" value="300" /><!-- In seconds -->
<option name="category" value="reportage" />
<option name="lastDownload" value="19700411" />
</file>
<file id="1"
key="USK@[...]/thisIsAnHTMLFile.html"
size="10240"
mime="text/html">
<option name="title" value="This is a file in HTML" />
<option name="author" value="Someone" />
<option name="lastDownload" value="20060603" />
</file>
<file id="2"
key="USK@[...]/thisIsAFile.odt"
size="20480"
mime="application/vnd.oasis.opendocument.text">
<option name="title" value="This is an OpenDocument" />
<option name="author" value="Someone else" />
<option name="lastDownload" value="20060603" />
</file>
[...]
</files>
<keywords>
<!-- v = value -->
<!-- <file id="file_id">position, position</file> -->
<!-- Negative position means it's in the filename -->
<!-- Positions are always counted starting from 1 -->
<word v="ThisIsAWord">
<file id="1">2,3</file>
<file id="3">1,8</file><!-- values inside <file></file> are word positions inside the given file -->
<file id="7">12,1</file>
</word>
<word v="HTML">
<file id="1">-4</file>
</word>
[...]
<!-- Sub-indexes list, for index splitting -->
<!-- Splitting is based on the first letters of the words -->
<subIndex key="CHK@[...]">
<wordsStartingWith>a</wordsStartingWith>
</subIndex>
<subIndex key="CHK@[...]">
<wordsStartingWith>bcd</wordsStartingWith>
<wordsStartingWith>brouzouf</wordsStartingWith>
</subIndex>
</keywords>
<!-- If comments are activated: -->
<comments publicKey="SSK@[...]/" privateKey="SSK@[...]/">
<!-- When you insert a comment, insert it as USK@[...]/comment/0/comment.xml ; the node will automatically pick the latest revision -->
<!-- See the format below -->
<!-- To get the comments, fetch them manually with SSK@[...]comment-[rev]/comment.xml -->
<!-- Always start fetching the comments from 0, even if you already know them : this avoids losing comments over time -->
<!-- Some comments may be missing, so you can try to fetch 5 comments at once right away -->
<!-- If the index owner changes the keys, purge all the messages that your client knows -->
<blackListed rev="5" /> <!-- ignore this comment -->
<blackListed rev="7" /> <!-- ignore this comment -->
</comments>
</index>
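To give an idea of what a client has to do with the index format above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration, not how Thaw does it: the function name parse_index, the shape of the returned dictionary and the tag someFutureTag are made up for the example. It assumes that a size of 0 should be reported as "unknown" and that negative positions are simply kept apart from positive ones.

```python
import xml.etree.ElementTree as ET

# Cut-down sample; keys stay elided exactly as in the format description.
SAMPLE = """<?xml version='1.0' encoding='utf-8'?>
<index>
  <header>
    <title>Index title</title>
    <category>Freenet/Thaw</category>
    <someFutureTag>ignored</someFutureTag>
  </header>
  <files>
    <file id="1" key="USK@[...]/thisIsAnHTMLFile.html" size="10240" mime="text/html">
      <option name="author" value="Someone" />
    </file>
    <file id="2" key="CHK@[...]/thisIsAFile.avi" size="0" />
  </files>
  <keywords>
    <word v="HTML">
      <file id="1">-4,2</file>
    </word>
  </keywords>
  <comments publicKey="SSK@[...]/" privateKey="SSK@[...]/">
    <blackListed rev="5" />
    <blackListed rev="7" />
  </comments>
</index>
"""

KNOWN_HEADER_TAGS = {"title", "owner", "date", "email", "client", "privateKey", "category"}


def parse_index(xml_text):
    """Hypothetical helper: turn an index document into plain dictionaries."""
    root = ET.fromstring(xml_text.encode("utf-8"))

    # Header: every tag is optional, and unknown tags are silently ignored.
    header = {}
    for child in root.findall("./header/*"):
        if child.tag in KNOWN_HEADER_TAGS:
            header[child.tag] = (child.text or "").strip()
    if "category" in header:
        header["category"] = header["category"].lower()  # categories are not case sensitive

    # Files: a size of 0 means "unknown", reported here as None (our choice).
    files = {}
    for f in root.findall("./files/file"):
        size = int(f.get("size", "0"))
        files[f.get("id")] = {
            "key": f.get("key"),
            "size": size if size > 0 else None,
            "mime": f.get("mime"),
            "options": {o.get("name"): o.get("value") for o in f.findall("option")},
        }

    # Keywords: positive positions are in the file content, negative ones in the filename.
    keywords = {}
    for w in root.findall("./keywords/word"):
        per_file = {}
        for f in w.findall("file"):
            positions = [int(p) for p in (f.text or "").split(",") if p.strip()]
            per_file[f.get("id")] = {
                "content": [p for p in positions if p > 0],
                "filename": [p for p in positions if p < 0],
            }
        keywords[w.get("v")] = per_file

    # Comment revisions the index owner asked clients to ignore.
    blacklisted = {int(b.get("rev")) for b in root.findall("./comments/blackListed")}

    return {"header": header, "files": files, "keywords": keywords, "blacklisted": blacklisted}
```

Calling parse_index(SAMPLE) returns the header, the files keyed by id, the word positions per file and the set of blacklisted comment revisions. The privateKey checks described in the security notes are deliberately left out of this sketch.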
Sub-index format
This format is deliberately similar to the previous one, in the hope that it helps developers reuse their code.
<?xml version='1.0' encoding='utf-8'?>
<index>
<files>
<!-- Lazy mode on : *** Put a file list specific to this sub-index here *** (see the format used in the main index) -->
<!-- these files don't need to already be in the main index -->
</files>
<keywords>
<!-- v = value -->
<!-- <file id="file_id">position,position</file> -->
<!-- Negative position means it's in the filename -->
<!-- Positions are always counted starting from 1 -->
<!-- File ids must correspond to files defined in this sub-index -->
<word v="AWord">
<file id="1">2,3</file>
<file id="3">1,8</file>
<file id="7">12,1</file>
</word>
<word v="Another">
<file id="1">-5</file>
</word>
[...]
<!-- Sub-indexes list, for index splitting -->
<!-- Splitting is based on the first letters of the words -->
<subIndex key="CHK@[...]">
<wordsStartingWith>abc</wordsStartingWith>
</subIndex>
<subIndex key="CHK@[...]">
<wordsStartingWith>acd</wordsStartingWith>
<wordsStartingWith>azz</wordsStartingWith>
</subIndex>
</keywords>
</index>
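When a word is not found in the current (sub-)index, a client has to decide which subIndex to fetch next. The format does not say exactly how wordsStartingWith values are matched, so the Python sketch below is only one possible reading: it assumes each value is a literal prefix, that matching is not case sensitive, and that the longest matching prefix wins. The function name sub_index_for and the keys CHK@placeholder-1 and CHK@placeholder-2 are made up so that the two sub-indexes can be told apart.

```python
import xml.etree.ElementTree as ET

# Placeholder keys (hypothetical); a real index would carry full CHK keys here.
SAMPLE = """<keywords>
  <subIndex key="CHK@placeholder-1">
    <wordsStartingWith>a</wordsStartingWith>
  </subIndex>
  <subIndex key="CHK@placeholder-2">
    <wordsStartingWith>bcd</wordsStartingWith>
    <wordsStartingWith>brouzouf</wordsStartingWith>
  </subIndex>
</keywords>"""


def sub_index_for(keywords_el, word):
    """Return the key of the sub-index to fetch for `word`, or None.

    Assumptions (not stated by the format): literal prefix matching,
    no case sensitivity, longest matching prefix wins.
    """
    word = word.lower()
    best_key, best_len = None, 0
    for sub in keywords_el.findall("subIndex"):
        for prefix_el in sub.findall("wordsStartingWith"):
            prefix = (prefix_el.text or "").strip().lower()
            if prefix and word.startswith(prefix) and len(prefix) > best_len:
                best_key, best_len = sub.get("key"), len(prefix)
    return best_key
```

With this reading, sub_index_for(ET.fromstring(SAMPLE), "Brouzoufs") selects the second sub-index, while a word matching no prefix returns None and the client can stop looking.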
Comment format
<?xml version='1.0' encoding='utf-8'?>
<comment>
<author>putANickNameHere</author> <!-- when displaying, Thaw will append "@"+Base64.encode(SHA256.digest(y)), where y is the public key -->
<text>putACommentHere</text>
<signature><!-- All comments must be signed. NO EXCEPTIONS -->
<!-- The signature in Thaw is generated using the RSA implementation class from Frost (itself using BouncyCastle) -->
<!-- The signature is generated from the following content : -->
<!-- comment publicKey (starting with 'SSK@' and ending at the first '/', inclusive)
nickname of the user (without the "@"+hashOfThePublicKey)
text of the comment
-->
<!-- element1 + "-" + element2 + "-" + [...] -->
<!-- The recommendation is to use the signature to determine whether a message is already in the database -->
<sig>[...]</sig>
<publicKey>[...]</publicKey>
</signature>
</comment>
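The signature notes above say what goes into the signed content but not in code, so here is a small Python sketch of how that string could be assembled. It only builds the text to sign (comment publicKey up to and including the first '/', then the nickname, then the comment text, joined with "-"); the RSA signing itself is not shown. The function name signed_content and the key SSK@abc in the usage note are made up for the example.

```python
def signed_content(comment_public_key, nickname, text):
    """Build the string that gets signed for a comment (hypothetical helper).

    comment_public_key: the comments publicKey of the index, starting with 'SSK@'.
    nickname: the author's nickname, without the "@"+hashOfThePublicKey suffix.
    text: the text of the comment.
    """
    slash = comment_public_key.find("/")
    if not comment_public_key.startswith("SSK@") or slash == -1:
        raise ValueError("expected a key starting with 'SSK@' and containing '/'")
    # Keep everything up to the first '/', slash included.
    key_part = comment_public_key[: slash + 1]
    # element1 + "-" + element2 + "-" + element3
    return "-".join([key_part, nickname, text])
```

For instance, signed_content("SSK@abc/comment-3/comment.xml", "nick", "hello") gives "SSK@abc/-nick-hello". Since the recommendation is to use the signature to spot already-known messages, a client would typically compute this once per comment before verifying.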
Recommendations
- One SSK key pair per index (Thaw may have problems otherwise)
- Video / Audio lengths always in seconds
- File sizes always in bytes
- Programs should ignore unknown tags
- Programs should not use home-made tags (ask if you want a tag to be added officially)
- Attributes should be set.
- All the tags are optional
- Date format is 'YYYYMMDD' (ok, it may create problems near year 10000, but I'm not sure it will really bother us ;)
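As a small illustration of the date recommendation, the following Python sketch reads and writes 'YYYYMMDD' values. The function names parse_index_date and format_index_date are made up; returning None for a malformed date (rather than failing) is just one reasonable choice given that all the tags are optional.

```python
from datetime import date, datetime


def parse_index_date(value):
    """Parse a 'YYYYMMDD' string; return None if it is missing or malformed."""
    value = (value or "").strip()
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:  # e.g. month 13 or day 41
        return None


def format_index_date(d):
    """Format a date as 'YYYYMMDD', zero-padded."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"
```

The same helpers could be applied to date-like option values such as lastDownload, which use the same 'YYYYMMDD' layout in the sample.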