Another Freenet Index Format
The idea behind this format is to provide an index better suited to binary files (video, audio, etc.) while still being able to index text files by word.
All suggestions are, of course, welcome :)
IMPORTANT NOTICE
This format is not final! If you want to use it, you're strongly encouraged to subscribe to the RSS feed attached to this page.
Why XML?
- Flexibility: a program using this format can add its own tags without disturbing other programs (I must admit I'm not sure that's a good thing ...)
- Readability: XML is more readable for humans
- Character encoding: we can hope that XML parsers will help avoid most character-encoding issues
(comment: Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems. )
Sample
Index format
<?xml version='1.0' encoding='utf-8'?>
<index>
  <header>
    <title>Index title</title>
    <owner>Index owner nickname</owner><!-- optional -->
    <date>YYYYMMDD</date><!-- insertion date --><!-- optional -->
    <email>email</email><!-- optional -->
    <client>software (for example Thaw 0.7 rxxx)</client><!-- optional -->

    <!-- Security note: if you already know a privateKey for the given index, don't overwrite it with the one found in the index; otherwise a bad guy could easily do nasty things such as blocking a public index ... -->
    <!-- Re-security note: if you already have a privateKey, check that both match. If they don't, don't use it and don't republish the privateKey! -->
    <!-- Re-re-note: if you're using FCP, beware of possible FCP injection -->
    <privateKey>SSK@[...]</privateKey><!-- Optional, of course :) -->

    <!-- Optional -->
    <!-- category[/subCategory[/subsubCategory[...]]] -->
    <!-- Used for the auto-sorting in Thaw -->
    <!-- Not case-sensitive -->
    <category>freenet/thaw</category>
  </header>

  <indexes>
    <!-- Users can use this to link to other indexes -->
    <!-- Category is optional -->
    <link key="USK@[...]" category="freenet/thaw" />
    <link key="USK@[...]" />
    [...]
  </indexes>

  <files>
    <!-- a file size of 0 means that the file size is unknown -->
    <file id="0" key="CHK@[...]/thisIsAFile.avi" size="5242880" mime="video/x-msvideo">
      <!-- Options are defined by file format filters or by the user himself -->
      <option name="length" value="300" /><!-- In seconds -->
      <option name="category" value="reportage" />
      <option name="lastDownload" value="19700411" />
    </file>
    <file id="1" key="USK@[...]/thisIsAnHTMLFile.html" size="10240" mime="text/html">
      <option name="title" value="This a file in HTML" />
      <option name="author" value="Someone" />
      <option name="lastDownload" value="20060603" />
    </file>
    <file id="2" key="USK@[...]/thisIsAFile.odt" size="20480" mime="application/vnd.oasis.opendocument.text">
      <option name="title" value="This an OpenDocument" />
      <option name="author" value="Someone else" />
      <option name="lastDownload" value="20060603" />
    </file>
    [...]
  </files>

  <keywords>
    <!-- v = value -->
    <!-- <file id="file_id">position, position</file> -->
    <!-- A negative position means the word is in the filename -->
    <!-- Positions are always counted starting from 1 -->
    <word v="ThisIsAWord"><!-- w == word -->
      <file id="1">2,3</file><!-- f == file -->
      <file id="3">1,8</file><!-- values inside <f></f> are word positions inside the given file -->
      <file id="7">12,1</file>
    </word>
    <word v="HTML">
      <file id="1">-4</file>
    </word>
    [...]

    <!-- Sub-indexes list, for index splitting -->
    <!-- Splitting is based on the first letters of the words -->
    <subIndex key="CHK@[...]">
      <wordsStartingWith>a</wordsStartingWith>
    </subIndex>
    <subIndex key="CHK@[...]">
      <wordsStartingWith>bcd</wordsStartingWith>
      <wordsStartingWith>brouzouf</wordsStartingWith>
    </subIndex>
  </keywords>

  <!-- If comments are activated: -->
  <comments publicKey="SSK@[...]/" privateKey="SSK@[...]/">
    <!-- When you insert a comment, insert it as USK@[...]/comment/0/comment.xml ; the node will automagically put the latest revision -->
    <!-- See the format below -->
    <!-- To get the comments: do it manually with SSK@[...]comment-[rev]/comment.xml -->
    <!-- Always start fetching the comments from 0, even if you already know them: this avoids losing comments over time -->
    <!-- Some comments may be missing, so you can try to fetch 5 comments at once right away -->
    <!-- If the index owner changes the keys, purge all the messages that your client knows -->
    <blackListed rev="5" /> <!-- ignore this comment -->
    <blackListed rev="7" /> <!-- ignore this comment -->
  </comments>
</index>
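As a rough illustration of how a client might read the sample above, here is a minimal Python sketch that looks a word up in the keywords section and resolves the file ids to keys. The function name `lookup`, the trimmed-down `SAMPLE_INDEX`, the exact (case-sensitive) word match and the choice to return filename positions as positive numbers are all assumptions made for the example, not part of the format.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample index (hypothetical test data).
SAMPLE_INDEX = b"""<?xml version='1.0' encoding='utf-8'?>
<index>
  <files>
    <file id="0" key="CHK@[...]/thisIsAFile.avi" size="5242880" mime="video/x-msvideo" />
    <file id="1" key="USK@[...]/thisIsAnHTMLFile.html" size="10240" mime="text/html" />
  </files>
  <keywords>
    <word v="ThisIsAWord">
      <file id="1">2,3</file>
    </word>
    <word v="HTML">
      <file id="1">-4</file>
    </word>
  </keywords>
</index>"""


def lookup(index_xml, word):
    """Return (key, body_positions, filename_positions) for each file listing `word`."""
    root = ET.fromstring(index_xml)
    # Map file ids to keys; only direct children of <files> are file definitions.
    keys = {f.get("id"): f.get("key") for f in root.findall("./files/file")}
    hits = []
    for w in root.findall("./keywords/word"):
        if w.get("v") != word:  # assumption: exact, case-sensitive match
            continue
        for ref in w.findall("file"):
            positions = [int(p) for p in (ref.text or "").split(",") if p.strip()]
            body = [p for p in positions if p > 0]
            # Negative positions refer to the filename; reported here as positive numbers.
            filename = [-p for p in positions if p < 0]
            hits.append((keys.get(ref.get("id")), body, filename))
    return hits
```

Tags the sketch does not know about are simply never queried, which is one easy way to follow the "ignore unknown tags" recommendation.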
Sub-index format
This one is quite similar to the previous one, in the hope that it will help developers reuse their code.
<?xml version='1.0' encoding='utf-8'?>
<index>
  <files>
    <!-- Lazy mode on: *** Put a file list specific to this sub-index here *** (see the format used in the main index) -->
    <!-- these files don't need to already be in the main index -->
  </files>

  <keywords>
    <!-- v = value -->
    <!-- <file id="file_id">position,position</file> -->
    <!-- A negative position means the word is in the filename -->
    <!-- Positions are always counted starting from 1 -->
    <!-- File ids must correspond to files defined in this sub-index -->
    <word v="AWord">
      <file id="1">2,3</file>
      <file id="3">1,8</file>
      <file id="7">12,1</file>
    </word>
    <word v="Another">
      <file id="1">-5</file>
    </word>
    [...]

    <!-- Sub-indexes list, for index splitting -->
    <!-- Splitting is based on the first letters of the words -->
    <subIndex key="CHK@[...]">
      <wordsStartingWith>abc</wordsStartingWith>
    </subIndex>
    <subIndex key="CHK@[...]">
      <wordsStartingWith>acd</wordsStartingWith>
      <wordsStartingWith>azz</wordsStartingWith>
    </subIndex>
  </keywords>
</index>
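To make the splitting rule more concrete, here is a small Python sketch of how a client could decide which sub-indexes to fetch for a given word. It assumes that a `wordsStartingWith` value is a plain, case-sensitive prefix of the word; the format does not spell this out. The function name `sub_index_keys` and the `CHK@placeholder-N` keys are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical keywords section; the keys are placeholders, not real CHKs.
SAMPLE_KEYWORDS = """<keywords>
  <subIndex key="CHK@placeholder-1">
    <wordsStartingWith>a</wordsStartingWith>
  </subIndex>
  <subIndex key="CHK@placeholder-2">
    <wordsStartingWith>bcd</wordsStartingWith>
    <wordsStartingWith>brouzouf</wordsStartingWith>
  </subIndex>
</keywords>"""


def sub_index_keys(keywords_xml, word):
    """Return the keys of the sub-indexes that may contain `word`."""
    root = ET.fromstring(keywords_xml)
    matches = []
    for sub in root.findall("subIndex"):
        prefixes = [(p.text or "").strip() for p in sub.findall("wordsStartingWith")]
        # Assumption: a sub-index covers every word starting with one of its prefixes.
        if any(prefix and word.startswith(prefix) for prefix in prefixes):
            matches.append(sub.get("key"))
    return matches
```

Since a sub-index can itself list further sub-indexes, a real client would probably apply the same function again to each sub-index it fetches.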
Comment format
<?xml version='1.0' encoding='utf-8'?>
<comment>
  <author>putANickNameHere</author>
  <!-- at display, Thaw will add "@"+Base64.encode(SHA256.digest(y)) -->
  <text>putACommentHere</text>

  <signature><!-- All the comments must be signed. NO EXCEPTION -->
    <!-- signature in Thaw is generated using Frost's RSA implementation class (itself using BouncyCastle) -->
    <!-- Signature is generated from the following content:
           comment publicKey (starting with 'SSK@' and ending at the first '/' (included))
           nickname of the user (without the "@"+hashOfThePublicKey)
           text of the comment -->
    <!-- element1 + "-" + element2 + "-" + [...] -->
    <!-- The recommendation is to use the signature to know whether a message is already in the database -->
    <sig>[...]</sig>
    <publicKey>[...]</publicKey>
  </signature>
</comment>
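The content that gets signed can be sketched in Python as follows. This only builds the string described in the comments of the sample (the three elements joined with "-"); it does not perform the RSA signature itself, and the byte encoding Thaw uses before signing is not stated here. The function name `content_to_sign` is hypothetical.

```python
def content_to_sign(comment_public_key, nickname, text):
    """Join the three signed elements with '-' as described in the comment format."""
    if not comment_public_key.startswith("SSK@"):
        raise ValueError("comment public key must start with 'SSK@'")
    slash = comment_public_key.find("/")
    if slash == -1:
        raise ValueError("comment public key must contain a '/'")
    # Keep everything from 'SSK@' up to and including the first '/'.
    key_part = comment_public_key[: slash + 1]
    # The nickname is expected without the "@"+hash suffix added at display time.
    return "-".join([key_part, nickname, text])
```

For example, `content_to_sign("SSK@[...]/comment-0/comment.xml", "putANickNameHere", "putACommentHere")` gives `SSK@[...]/-putANickNameHere-putACommentHere`.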
Recommendations
- One SSK key pair per index (otherwise Thaw may have problems)
- Video / Audio lengths always in seconds
- File sizes always in bytes
- Programs should ignore unknown tags
- Programs should not use home-made tags (ask if you want a tag to be added officially)
- Attributes should be set.
- All the tags are optional
- Date format is 'YYYYMMDD' (ok, it may create problems near year 10000, but I'm not sure it will really bother us ;)
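As an example of the date recommendation, a client written in Python could validate and parse 'YYYYMMDD' values along these lines. The function name `parse_index_date` is hypothetical, and rejecting anything that is not exactly eight ASCII digits is a choice made for this sketch rather than something the format requires.

```python
from datetime import date, datetime


def parse_index_date(value):
    """Parse a 'YYYYMMDD' string (as used in <date> and lastDownload) into a date."""
    value = value.strip()
    # strptime alone would accept some shorter inputs, so check the shape first.
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected a date in YYYYMMDD format, got %r" % value)
    return datetime.strptime(value, "%Y%m%d").date()
```

With this, `parse_index_date("20060603")` returns `date(2006, 6, 3)`, while a value such as `"20061301"` raises `ValueError`.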