Google Summer of Code/2012


Past experience

We have been involved in 2006, 2007, 2008, 2009, 2010, 2011, 2012 and 2013; in 2006, 2007, 2009 and 2010 we had lots of students, most of whom achieved something of value, and a few of whom became volunteer developers for Freenet. In 2008, we had only one student, who failed because of external issues. This may be partly due to us having a poor ideas list that year; many ideas were not feasible, too big, etc.

Students are much more likely to be accepted if they demonstrate an ability to contribute by e.g. contributing a bug fix or minor feature; judging students based purely on their applications is not viable. Having said that, a student's application is important, because it specifies what they are building and therefore how we can determine whether they have succeeded; the initial application is a starting point, and we will help to fill in the detail and ensure that it is feasible by the time you are accepted.

Students will be required to communicate publicly on the mailing lists, and where appropriate via IRC, the wiki and so on; your mentor is there for you, but talking only to your mentor is inefficient and ultimately bad for the project; we want to treat you as a developer (which means a valued volunteer), while giving whatever help is possible and needed. Also note that almost all of Freenet is written in Java.

Example Proposal Ideas

Please do not be limited by the below list. Students' own proposals certainly will be considered. You might also want to look at the uservoice page. In many cases, detailed proposals (which may be wrong but will be useful references) can be found on the bug tracker or the mailing lists. Remember that you can make up to 20 applications to us with different proposals; we encourage you to make as many applications as you want!

Web interface

Improve the web interface generally
There is much to do to make it more user friendly. We will hopefully have a set of mock-up designs soon, but there are also many small things, some linked from this bug report and many not; please have a look around, and also check the mailing list archives. A recent, professional (though not very detailed) mockup focusing on the homepage is here: [1]. An older detailed suggestion might also be an inspiration.
Get the web-pushing mode working well
A previous Summer of Code student built the web-pushing ajax/comet/GWT system, which does things like showing progress on pages with lots of images to speed things up when browser connections are limited, and live updating of the downloads page etc. This is rather buggy especially on slow browsers, and tends to lock up, so fixing this, improving it and then extending it to other areas (e.g. Freetalk) would be interesting. *Note:* We have recently had someone volunteer to rewrite our entire web interface using GWT, which would obviate the need for web-pushing mode - please consult the mailing list about this.

Code infrastructure and API

For some projects in this section, the best solution would be to use an external library to do most of the work and integrate it into the current Freenet code, so you should research existing solutions first. This principle applies to all projects, but the ones where suitable libraries are most likely to actually exist are tagged prefer-external.

Switch from the current code to JCA

  • Mentors: nextgens
  • Difficulty: 5/10

This project is about changing how we do cryptography in Freenet. At the moment we use our own implementations of standard algorithms; the goal is to switch to JCA and BouncyCastle.

The goal of the project is:

  • Review and question the usage of cryptography throughout the whole codebase
  • Switch from our own java (or JNI) implementation to a JCA equivalent
  • Benchmark or profile hot codepaths and propose enhancements
  • Enable users to use crypto-accelerators where possible
  • Write unit tests to ensure some form of backward compatibility with the existing code

This project can be divided into several subparts, come talk to me on IRC about it :)
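As a rough illustration of what "switching to JCA" means in practice, here is a minimal sketch using only the standard JCA entry points (MessageDigest, Cipher). The class name and the choice of AES/CTR are illustrative; BouncyCastle would simply be registered as an additional JCA provider rather than called directly.

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class JcaSketch {
    // Hash via JCA instead of a hand-rolled digest implementation.
    public static byte[] sha256(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // Symmetric encryption via JCA; the transform string selects the
    // algorithm, and the provider (SunJCE, BouncyCastle, a hardware
    // accelerator) can be swapped without touching this code.
    public static byte[] aesCtr(int mode, SecretKey key, byte[] iv, byte[] data)
            throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, key, new IvParameterSpec(iv));
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("digest length: " + sha256("hello".getBytes("UTF-8")).length);

        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        byte[] plain = "freenet".getBytes("UTF-8");
        byte[] ct = aesCtr(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] rt = aesCtr(Cipher.DECRYPT_MODE, key, iv, ct);
        System.out.println("round trip ok: " + Arrays.equals(plain, rt));
    }
}
```

Because every algorithm is looked up by name, benchmarking alternative providers (one of the goals above) becomes a configuration change rather than a code change.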

Auto-update system

  • Mentors: infinity0
  • Difficulty: 5/10

The current updater makes some very specific assumptions as to the locations of the jars, and this method doesn't generalise to non-jar files. It also assumes wrapper.conf is in the current directory. This is at odds with attempts to package Freenet according to FHS. (There is already a debian package but currently this disables auto-update. Even though this is likely to remain the case due to Debian policy, we can use this package as an example to test out the enhanced auto-updater on.)

The auto-updater should be able to:

  • update jars, freenet.jar and freenet-ext.jar, and the various 3rd-party dependency libraries. (Currently said libraries are packaged inside freenet-ext.jar, but the newer version separates them out. The current auto-updater can't handle this, so we can't yet deploy the newer freenet-ext.)
  • update any native binaries/libraries (e.g. the wrapper files)
  • the above probably means having some sort of manifest that lists which files are actually installed. If this is the case, all of our installers need to be changed to include such a manifest too.
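One possible shape for such a manifest is a plain text file of hash/path pairs, one installed file per line. The format and the UpdateManifest class below are purely illustrative assumptions, not an existing Freenet format:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UpdateManifest {
    // One installed file: relative path plus expected SHA-256 (hex).
    public static final class Entry {
        public final String path, sha256;
        Entry(String path, String sha256) { this.path = path; this.sha256 = sha256; }
    }

    // Parse lines of the form "<sha256-hex>  <relative/path>",
    // skipping blank lines and # comments.
    public static List<Entry> parse(BufferedReader in) throws IOException {
        List<Entry> entries = new ArrayList<Entry>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\\s+", 2);
            if (parts.length != 2) throw new IOException("Bad manifest line: " + line);
            entries.add(new Entry(parts[1], parts[0]));
        }
        return entries;
    }
}
```

The updater could then diff the installed manifest against the one shipped with a new release to decide which files (jars, native binaries, wrapper files) to replace, independent of where FHS-style packaging puts them.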

Events framework and API

  • Mentors: infinity0
  • Difficulty: 9/10
  • Tags: prefer-external

Freenet uses a custom-written framework for running callbacks and scheduled tasks, which is non-standard and lacks a lot of advanced functionality. Converting everything to use java.util.concurrent and/or com.google.common.util.concurrent would help this a lot. Ideally, the framework would be separated from the rest of freenet, possibly even to the point of being an external library, and provide a clean API for freenet and its plugins to use. This would also allow plugins to do much more advanced tasks (e.g. most of the things in the "plugins and applications" section further below) without having to implement their own framework.

The events framework should be able to, and provide a good API for plugins to:

  • run tasks in the background, or according to a particular schedule
  • define task+thread matching policies e.g. all database tasks must run in the "database" thread
  • cancel running tasks, if the task defines that it's cancellable
  • handle dependencies between tasks, so that e.g. if A depends on B and we cancel/complete B, A is automatically cancelled/completed.
  • attach callbacks to events such as task completion/cancellation (e.g. using com.google.common.util.concurrent.ListenableFuture)
  • group tasks into related categories so one group doesn't affect the other. (e.g. tasks for the group "Library/b-tree-write/index-<URL>" won't starve the resources of group "WoT/background-trust-calculation") (this is possibly logically equivalent to the "matching policies" point above but I haven't thought too much about it)
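A minimal sketch, using only java.util.concurrent, of the grouped, cancellable task API described above. The TaskGroups class and its method names are hypothetical; a real design would add scheduling policies, dependencies and completion callbacks on top:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TaskGroups {
    // One executor per named group, so e.g. "Library/b-tree-write" tasks
    // cannot starve the resources of "WoT/background-trust-calculation".
    private final ConcurrentMap<String, ScheduledExecutorService> groups =
            new ConcurrentHashMap<String, ScheduledExecutorService>();

    private ScheduledExecutorService group(String name) {
        ScheduledExecutorService ex = groups.get(name);
        if (ex == null) {
            ex = Executors.newSingleThreadScheduledExecutor();
            ScheduledExecutorService prev = groups.putIfAbsent(name, ex);
            if (prev != null) { ex.shutdown(); ex = prev; }
        }
        return ex;
    }

    // Run a task in the background within a group; the returned Future
    // supports cancellation out of the box.
    public <T> Future<T> submit(String groupName, Callable<T> task) {
        return group(groupName).submit(task);
    }

    // Run a task repeatedly on a fixed schedule.
    public ScheduledFuture<?> schedule(String groupName, Runnable task, long periodMs) {
        return group(groupName).scheduleAtFixedRate(task, periodMs, periodMs,
                TimeUnit.MILLISECONDS);
    }

    public void shutdown() {
        for (ScheduledExecutorService ex : groups.values()) ex.shutdownNow();
    }
}
```

A single-thread executor per group also gives the task+thread matching policy from the list above for free (e.g. a "database" group means all database tasks run on one thread).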

Config management

  • Mentors: infinity0
  • Difficulty: 6/10
  • Tags: prefer-external

Currently, the config management code is very untidy. Code for reading/writing the config is embedded inside the classes it controls, spread out over many files not related to config management. This creates lots of implicit dependencies that are not directly apparent in the syntax/structure of the code or the semantics of the language. (In other words, spaghetti code.)

Config code should be separated from the actual application. It would also be nice if there was an API to expose this to plugins.

The config system should be able to:

  • support everything the current config system supports
  • use "includes" to separate different sections into different files - e.g. keep vital system-specific settings in a separate read-only file
  • read and write values, preserving comments in the file.
    With regards to mutability, there are three types of values:
    • immutable
    • mutable (takes effect after a restart)
    • mutable (affects the running daemon immediately)
    Note that the actual code for doing the setting should be implemented outside of the config system.

Using the Google Guice framework would probably help this a lot as well.
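To make the three mutability types concrete, here is a hedged sketch of a config registry. The ConfigSketch class, the Mutability enum and the Listener interface are all illustrative assumptions, not the existing Freenet config API; note that the code reacting to a live change lives in the listener, outside the config system, as required above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigSketch {
    public enum Mutability { IMMUTABLE, RESTART_REQUIRED, LIVE }

    // Callback supplied by the owning component; the config system itself
    // never contains the code that applies a setting.
    public interface Listener { void changed(String key, String newValue); }

    private final Map<String, String> values = new HashMap<String, String>();
    private final Map<String, Mutability> policy = new HashMap<String, Mutability>();
    private final Map<String, Listener> listeners = new HashMap<String, Listener>();

    public void register(String key, String defaultValue, Mutability m, Listener l) {
        values.put(key, defaultValue);
        policy.put(key, m);
        if (l != null) listeners.put(key, l);
    }

    public String get(String key) { return values.get(key); }

    // Apply a change according to the option's mutability policy.
    public void set(String key, String value) {
        Mutability m = policy.get(key);
        if (m == Mutability.IMMUTABLE)
            throw new IllegalStateException(key + " cannot be changed");
        values.put(key, value);
        if (m == Mutability.LIVE && listeners.containsKey(key))
            listeners.get(key).changed(key, value);
        // RESTART_REQUIRED: value is persisted but only read at next startup.
    }
}
```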

Plugins and applications

Integrating DVCS via Infocalypse (Mercurial) with the Web of Trust

  • Mentors: ArneBab
  • Difficulty: 4/10

Currently code sharing with infocalypse is still a bit inconvenient. Cloning and sharing a repository requires: `hg init; hg fn-pull --uri USK@…; hg up; (hack); hg fn-create --uri USK@/mycode.R1/ ; (share the public key somehow)`

This should be simplified to: `hg clone freenet:ID/reponame; cd repo; (hack); hg push freenet:ID/reponame`

The ID should be the name of a given WoT ID. If that name is not unique, any number of letters from its public key can be added. The URI scheme is an extension of what hg-git offers, so it is doable in finite time. The bigger task would be to add clean integration with the WoT.

Also infocalypse needs some additions to its USK handling, to ease discovery of new revisions.

A further simplified URI scheme would be `hg clone freenet:reponame`: if exactly one ID has the repo, get it; if multiple IDs have a repo of that name, ask which one to take.
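The proposed freenet: scheme could be parsed along these lines; the RepoUri class is a hypothetical sketch, not existing Infocalypse code (which is a Mercurial extension):

```java
public class RepoUri {
    public final String id;   // WoT nickname or USK key; null for "freenet:reponame"
    public final String repo;

    private RepoUri(String id, String repo) { this.id = id; this.repo = repo; }

    // Accepts both "freenet:reponame" and "freenet:ID/reponame".
    public static RepoUri parse(String uri) {
        if (!uri.startsWith("freenet:"))
            throw new IllegalArgumentException("Not a freenet: URI: " + uri);
        String rest = uri.substring("freenet:".length());
        int slash = rest.indexOf('/');
        if (slash < 0)
            return new RepoUri(null, rest);   // no ID: search all known IDs
        return new RepoUri(rest.substring(0, slash), rest.substring(slash + 1));
    }
}
```

Resolving the ID part (a non-unique WoT name, possibly disambiguated by a public-key prefix) against the Web of Trust is the harder half of the task described above.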

The overall goal is turning Freenet with Infocalypse into a decentralized, anonymous, community hacking platform by connecting it over the Web of Trust.

Deliverables:

  • Clone, pull and push via the freenet:key/repo scheme.
  • As key allow simple USK keys or WoT IDs.
  • If a WoT ID is used, advertise the repo info in the WoT ID.
  • Optionally provide additional convenience measures, like automatically creating freesites with information about the repository. See the site extension for an example.
  • Optionally add pull requests and anything else needed for a full fledged community hacking platform.
  • Optionally: “find forks of my repo in freenet”. Infocalypse would have to register a WoT context, which contains the public key as well as the public key the repo was cloned from (if it was cloned from freenet). Then any plugin could connect the forks again. It could actually offer the option `hg path --add-forks`, which gets all public forks in freenet.

Further Ideas

A good filesharing/file search system
This should tie in with the Web of Trust, allowing users to publish indexes and search those of their anonymous friends, rate others' indexes, merge them into their own, set up long-term file searches, preload indexes for faster searches, and so on. It might also integrate with Freetalk to help with discussions on labelling or rating. The problems of spam/deliberately corrupt content are very similar on Freenet to on traditional p2p, although the solutions may be different, especially as it isn't possible to trace spammers; trusted community maintained indexes have developed as a working means of solving these problems on web-based filesharing. Note that we already have a scalable forkable on-freenet btree search system to use as a backend, but it is not yet used for anything, and it is not distributed or WoT-compatible.
Another interesting area for filesharing is a distributed, WoT-based way to download data by conventional hashes rather than CHKs, which could tie in with other networks; this is also related to the weird stuff (backups) at the bottom.
Secure reinsert-on-demand filesharing, to improve the volume of content that is available. This is a lot harder than it sounds, but in any case we need searching first. (Reinserts using the same keys are seriously insecure, although getting the downloaders to do random inserts may help significantly)
Data retention plugin
Some very small changes at the node layer will allow probing for whether a key is available from a random node. Users could then maintain a list of content that they care about; the plugin would download it all as a binary blob, regularly probe for reachability, and reinsert it when needed (possibly reinserting only single random blocks that are not fetchable). For this to work really well we might need selective reinsert support in the client layer, but that's not necessary for a basic implementation. What is important is supporting both files and sites, and letting users publish their lists and subscribe to others'. This could even evolve into a full-blown distributed backup system, with configurable encryption for files and the ability to recognise which files are popular and avoid inserting them (might need fixing GetCHKOnly).
A microblogging and/or real-time chat system
Both of these things would actually be implemented in a fairly similar way. Evan has done a fair amount of work on how to efficiently implement microblogging over Freenet. Sone does something like this but is fairly greedy with network resources. FLIP does IRC.
FCP libraries
Good FCP libraries in more languages.
Easy-to-use tools for inserting freesites (freenet-hosted web sites) and files
We already have a blogging tool, but it needs more work, and tools to make it easy to insert existing content etc would also be useful. This should support uploading files of any size, should avoid re-uploading larger files on every update, but should be configurable to do so on a schedule, should work from within the freenet web interface as a plugin, and may support WebDAV uploads direct from authoring software. The ability to mirror stuff from the web would also be useful.
Scalable fork-and-merge distributed revision control over Freenet
This would integrate the new scalable on-Freenet b-trees from the new Library format by infinity0, in order to scale up to at least Wikipedia scales (to implement a wiki over Freenet using a fork-and-merge model). It would tie in closely with the Web of Trust (the trust network backing Freetalk), integrating with its identities and announcing forks, and allowing users to easily see changes in other forks and integrate them. The most obvious use for this is a wiki-over-freenet (note that because of spam and denial of service attacks, instant anonymous editing of a wiki on freenet is not possible), it might also be useful for distributing spidering Freenet, for source code (e.g. if we want to deploy a new build only after a certain number of people we trust have signed it, and then build it from source), or for anything that needs a forkable database over Freenet. You might also need to optimise the btrees' data persistence by e.g. including the top level metadata for each chunk in the next layer up. Scalable fork and merge databases are closely related to this. Infocalypse already provides a working plugin for efficient sharing of Mercurial repositories, but it needs integration with the Web of Trust, and added efficiency through btrees would be useful.
Better freesite searching
Lots of work has been done on this, but more could be done: Using the new library format, rewriting the indexes on the fly after gathering a few hours' data rather than writing it from the database over a week, support for long-term searches, web of trust integration, better support for stop-words (maybe aggregating them with common before/after words), tokenisation for tricky languages (Chinese, Japanese), distributing spidering across multiple users (as scaling is getting to be a serious problem now), etc.
Wiki over Freenet
A wiki over Freenet would be really awesome. In fact it could be a killer app. But it is not easy to implement, as there are several challenges. You can learn more there. There have been many attempts; some are hard to use and based on DSCMs, some are easier to use and not scalable.
WebOfTrust optimization
Currently the Web of Trust plugin is quite slow. Optimizations at the database level and in synchronization strategies should make it much faster. A first step would be profiling to find the real performance bottlenecks. Care will be needed to ensure scalable algorithms are used.

Client layer

More content filters
We have to "filter" HTML, images, etc to ensure that they are safe for the web browser, and won't give away the user's IP address via inline images, scripting etc. Finishing the SVG filter written for 2009, implementing support for SVG embedded in XHTML embedded in ATOM (we have an ATOM filter but it is not integrated yet), maybe an RSS filter, would be useful. More audio and video formats would be very helpful (particularly WebM and H.264), and with HTML5-based video playback support could make embedded video almost viable. Last year's SoC included the beginnings of a javascript player but it is far from what it could be: Making it really viable would require deeper changes related to fetching data in order, access to partially downloaded content, and possibly an applet to show which parts have been downloaded and maybe to display those formats that we support (likely ogg) in browsers that don't support them. See here for more on embedded video playback: [2]. PDF would be very valuable but the spec is huge, however it is believed that minimal sufficient functionality is not *so* huge. ODF is similarly a possibility but again is gigantic. Javascript is an option for JS geniuses (create a safe API and then force the JS sent to the browser to only use that API; please talk to us in detail as there are many side-issues with any sort of "safe scripting", we have some ideas about safe APIs though, either based on only being able to access keys related to specific users which are manually approved, or fixing the fetch times).
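To illustrate the basic idea of content filtering, here is a toy sketch that rewrites img src attributes so the browser never fetches from the open web and leaks the user's IP address. All names here are illustrative, and a real filter must parse HTML properly rather than rely on a regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageSrcFilter {
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Replace any src attribute that does not point back into Freenet
    // (a relative /CHK@... or /USK@... path) with a safe placeholder.
    public static String filter(String html) {
        Matcher m = SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(1);
            String safe = url.startsWith("/") ? url : "/static/blocked.png";
            m.appendReplacement(out,
                    Matcher.quoteReplacement("src=\"" + safe + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The same whitelist-and-rewrite principle extends to the other formats listed above (SVG, ATOM, audio/video containers): parse the format, keep only constructs known to be safe, and strip or rewrite everything else.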

Low to mid level stuff

Network Probes, Statistics, and Analysis

  • Mentors: Mentor requested
  • Difficulty: ?/10

operhiem1 intends to address this for Google Summer of Code 2012.

Freenet is intended to form a topology which allows efficient local path selection. Unfortunately, currently next to nothing is known about the actual link length distribution. This makes it very difficult to improve as it is only measured indirectly. It is therefore proposed to expose over FCP the probe functionality which is currently undocumented and only available on TMCI, expand probe functionality, and write tools to more thoroughly and quickly present information gathered by the probes.

  • Clean up the existing probe code: document it, improve readability, and correct errors and annoyances:
    • Reported peer location overlap: peer locations are reported in an ambiguous overloaded fashion:
      • 1 means either location 0 not backed off or an unknown location
      • 0 means either location 0 or an unknown location backed off.
      • Use a Location object rather than a raw double, since the value must encode two booleans (backed off or not, known or not) as well as a double.
    • Probe traces can be received after a probe completion message is sent. Waiting a while for more traces would be nice.
  • Document probe functionality and make it available over FCP.
    • Documentation would make it easier to use and allow its function to be better defined, which would also help in writing and improving it.
  • Plot network attributes and make graphs of network topology revealed through probe requests.
    • Node churn
    • Network size
    • Link length distribution
    • Graph topology
  • Implement improved probe and trace requests as in bugs #3568 and #3550 to provide more detailed information on network structure, health, and behavior.
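The suggested Location object could look something like this minimal sketch (the class name and factory methods are assumptions, not existing code); it makes the unknown/backed-off states explicit instead of overloading the values 0 and 1:

```java
public final class ProbeLocation {
    private final double location;   // only meaningful when known is true
    private final boolean known;
    private final boolean backedOff;

    private ProbeLocation(double location, boolean known, boolean backedOff) {
        this.location = location;
        this.known = known;
        this.backedOff = backedOff;
    }

    // A reported location in the keyspace [0.0, 1.0).
    public static ProbeLocation known(double loc, boolean backedOff) {
        if (loc < 0.0 || loc >= 1.0)
            throw new IllegalArgumentException("bad location " + loc);
        return new ProbeLocation(loc, true, backedOff);
    }

    // A peer whose location was not disclosed.
    public static ProbeLocation unknown(boolean backedOff) {
        return new ProbeLocation(Double.NaN, false, backedOff);
    }

    public boolean isKnown() { return known; }
    public boolean isBackedOff() { return backedOff; }

    public double getLocation() {
        if (!known) throw new IllegalStateException("location unknown");
        return location;
    }
}
```

With this, "location 0, not backed off" and "location unknown" are distinct values rather than both being reported as the same double.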
Flag to cache only, not store
would prevent transient messages such as IRC (FLIP) from taking up long-term storage space.
No RecentlyFailed on realtime queue
Much more transport layer stuff
The current transport layer has been improved significantly but leaves much room for improvement. Ideally we'd like to detect available bandwidth automatically. We have nothing remotely like Path MTU detection; we should automatically adapt to find the right packet size for the connection, both finding what will work at all, and what gives the best bandwidth/loss tradeoff. We tend to get transfer failures on slow connections (not low bandwidth limit, low bandwidth available on that specific connection). We probably should use something like cumulative acks, currently all packets are acked once, it should be possible to ack the same packet twice with 0 bandwidth cost in many cases using ranges. We may want to divide up blocks differently depending on how fast the connection is. We may want to make tradeoffs between fairness to all peers (the current policy) and allowing individual peers more bandwidth for a short period (e.g. because they have requested a bunch of fproxy pages), or have "idle priority" traffic which is only sent when *no* peer wants to send anything (e.g. bloom filter sharing), which may also impact on packet size. And so on. Generally, the transport layer needs to be more robust, especially on slow connections, and it needs to feed information into the load management layer more quickly so that we only accept requests that we can complete in a reasonable time, given the current state of the connection. There are various bugs about this on the bug tracker. Running as well as possible on fairly slow connections is particularly useful in some of the places where Freenet may be needed most.
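The cumulative-ack idea, acking runs of consecutive packets as ranges so repeated acks cost essentially no bandwidth, can be sketched as follows (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

public class AckRanges {
    // Collapse a sorted set of received sequence numbers into inclusive
    // [start, end] ranges, so acking many consecutive packets (and
    // re-acking earlier ones) costs a single pair per run.
    public static List<int[]> toRanges(SortedSet<Integer> acked) {
        List<int[]> ranges = new ArrayList<int[]>();
        int start = -1, prev = -1;
        for (int seq : acked) {
            if (start == -1) { start = prev = seq; continue; }
            if (seq == prev + 1) { prev = seq; continue; }
            ranges.add(new int[] { start, prev });   // run ended, emit it
            start = prev = seq;
        }
        if (start != -1) ranges.add(new int[] { start, prev });
        return ranges;
    }
}
```

For example, the received sequence numbers {1, 2, 3, 7, 8, 10} collapse to the ranges [1,3], [7,8], [10,10]; the gaps immediately tell the sender which packets to retransmit.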
Transport plugins
Currently Freenet only supports UDP. Make it able to use TCP, HTTP, various steganographic transports (e.g. VoIP). Freenet should provide all the heavy lifting crypto etc, it should be *EASY* to write a transport plugin, just register it with the appropriate type, give block size and so on, and Freenet will do the rest. Last year's work on new packet format should really help although some transports (really small packets e.g. pretending to be Skype) will still need to do their own splitting/reassembly (this should probably happen within the node too, although it should be possible to turn it off).
Simulators
Simulating different load management mechanisms would be particularly useful. Simulations of various attacks are also a very important area.
Bandwidth
Different bandwidth limits at different times of the day/week would help many users. We also need support for monthly transfer limits (separate from the existing peak per second limits), and autodetection of the connection's capacity, possibly with options for very latency sensitive behaviour for e.g. gamers (like some Bittorrent clients do). All this must be really easy to use.
Support for networking protocols with huge delays, for use with packet radio.

Friend to friend stuff

More F2F functionality
Hopefully by the time of the SoC we will have much easier darknet peer adding, invites etc, but we need more functionality: Various forms of easy to use chat, possibly realtime, allowing conversations across nodes, both within the web interface using javascript/ajax and via external clients e.g. Jabber/XMPP, possibly with voice support; easy to use, reliable file transfers; labelling downloaded files and bookmarks so that they are visible to your friends of a particular trust level; searching these file lists and transferring the files; possibly automatically fetching files both from friends and freenet; virtual LAN (hamachi style) functionality; social networking style functionality, with very careful privacy protections - after all the friend-to-friend darknet is literally a social network, we should make it as useful as possible without jeopardising privacy.

The Tahrir Project

Tahrir is a sister project to Freenet, incorporating some of the same ideas, although focussed more on "twitter"-like functionality (or, more generally, microblogging). Much of Tahrir's infrastructure is complete, including a powerful but easy to use low-level transport layer together with crypto functionality; however, the components need to be brought together into a working prototype. More information at The Tahrir Project.

How to sign up

You can sign up as a student until the April 6th deadline here: http://www.google-melange.com/gsoc/homepage/google/gsoc2012
