
Replied to a post on github.com:

For syndication and outgoing webmentions, I think it would be useful to bake in some techniques to mitigate mass interception, especially if withknown.com provides a hosted service, as has previously been discussed. As I have mentioned before, this is now IETF best practice.

We have previously discussed, as part of #203, that it might be advantageous to obfuscate a user's social graph, since this is often more valuable to an attacker than message content and is harder to protect. (It should be noted that the graph can still be extracted; it's just not as easy.) We could, for example, route mentions over Tor.

Trouble is, anyone watching that network can perform statistical analysis on the traffic being sent: packet size, arrival time, and so on. This is called a confirmation attack (also known as a traffic correlation attack).

This kind of attack is very easy if you've got a centralised node, e.g. the proposed withknown.com message server. If I wanted to see whether Alice and Bob are friends, I could do so by watching that server, even if all the traffic was encrypted. All I'd do is watch for a packet from Alice hitting that server, and then a packet of similar size leaving the server towards Bob. If I wanted to get clever about it, I could confirm this by watching for Bob making something that looks like a GET request, shortly followed by Alice sending a page and Bob receiving something of a similar size. This would effectively establish that Alice and Bob are talking, regardless of whether the traffic was encrypted (and, in some situations, even if it was sent over Tor). Yes, this could probably be done anyway, but a centralised server makes it trivial to automate.
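
To make the point concrete, here is a minimal sketch of how little an observer needs. It assumes they can only see the remote host, a timestamp and the ciphertext size of each packet; the `Packet` type, the `correlate` function and both thresholds are invented for illustration, not taken from any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    host: str    # remote endpoint talking to the central server
    time: float  # seconds since the observer started watching
    size: int    # bytes on the wire (ciphertext length, content unknown)


def correlate(inbound, outbound, max_delay=2.0, size_tolerance=64):
    """Pair each inbound packet with outbound packets that leave the
    server shortly afterwards and are of a similar size.

    max_delay and size_tolerance are arbitrary illustrative thresholds.
    """
    pairs = []
    for p_in in inbound:
        for p_out in outbound:
            delay = p_out.time - p_in.time
            similar = abs(p_out.size - p_in.size) <= size_tolerance
            if 0 <= delay <= max_delay and similar:
                pairs.append((p_in.host, p_out.host))
    return pairs


# Made-up capture: nothing here requires decrypting anything.
inbound = [Packet("alice", 10.0, 512), Packet("carol", 11.0, 2048)]
outbound = [Packet("bob", 10.4, 530), Packet("dave", 40.0, 2050)]

print(correlate(inbound, outbound))
```

With this made-up capture only the Alice-to-Bob pair is reported; Dave's packet is the right size but leaves far too late to match. That is exactly the kind of gap that random delays and padding try to create.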

So, here are some thoughts, in no particular order:

* Queues should be asynchronous, and messages from the queue should not be sent as fast as possible. I would suggest that each message is sent at a random time between 0 and n minutes after it is queued, where n is the maximum number of minutes a message is permitted to sit in the queue before it must be sent immediately.
* Queues should not be FIFO; I'd suggest randomly shuffling on insertion.
* For mentions, explore the possibility of obfuscating the message size. It should be safe to pad the content with nonsense: for example, if the message is ```?source=https://foo&target=https://bar```, could we not add ```&buffer=_randomamountofrandomcrap_``` ? Unless their mention server is very badly written, the extra parameter should be ignored. Padding assumes the message is sent over TLS, so an observer can't just pull the raw form data, but if you're not using TLS for everything at this point, you're an idiot, and there's no technical fix for that.
* Parsing of incoming mentions should be similarly queued, so when Bob receives Alice's mention ping, the retrieval of that page is also queued and happens some time later.
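
A rough sketch of the first three points, to show how little code is involved. All the names here (`MentionQueue`, `padded_mention_body`, `MAX_DELAY_SECONDS`) and the 15 minute and 2048 byte limits are hypothetical; this is not Known's actual code, and the `buffer` parameter is just the made-up padding field from the example above.

```python
import random
import secrets
import time
from urllib.parse import urlencode

MAX_DELAY_SECONDS = 15 * 60  # hypothetical "n": longest a message may wait


class MentionQueue:
    """Non-FIFO queue held in a plain in-memory list.

    Each message gets a random send deadline between 0 and max_delay
    seconds from now, and is inserted at a random position.
    """

    def __init__(self, max_delay=MAX_DELAY_SECONDS, clock=time.monotonic):
        self._items = []  # list of (due_time, payload)
        self._max_delay = max_delay
        self._clock = clock
        self._rng = random.SystemRandom()  # OS entropy, not a seeded PRNG

    def enqueue(self, payload):
        due = self._clock() + self._rng.uniform(0, self._max_delay)
        position = self._rng.randint(0, len(self._items))  # shuffle on insert
        self._items.insert(position, (due, payload))

    def pop_due(self):
        """Remove and return payloads whose deadline has passed,
        in their (shuffled) queue order."""
        now = self._clock()
        due = [payload for (t, payload) in self._items if t <= now]
        self._items = [(t, p) for (t, p) in self._items if t > now]
        return due


def padded_mention_body(source, target, max_pad=2048):
    """Build a form-encoded mention body with a random-length junk field.

    The padding uses only URL-safe characters, so its encoded length
    equals its raw length. Assumes the body is then sent over TLS.
    """
    pad_len = secrets.randbelow(max_pad + 1)
    padding = secrets.token_urlsafe(pad_len)[:pad_len]
    return urlencode({"source": source, "target": target, "buffer": padding})
```

A worker would call `pop_due()` every few seconds and send whatever comes back, wrapping each outgoing mention with `padded_mention_body(...)`. The same queue could hold the "go and fetch Alice's page" jobs on the receiving side. Random-length padding only blurs the size; rounding every message up to a fixed bucket size would be another option.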

For a message queue service, there are additional operational security considerations. I'm using withknown.com's potential offering as a reference, but these apply to any hosted messaging service and any implementation, which is why I'm raising them as part of the open source discussion:

* Zero knowledge is probably not possible, for vanilla mentions at least (if we can guarantee Known-to-Known delivery we could extend the protocol), but it should nonetheless be the goal. So, for example, I'd suggest not retaining a log of the transaction.
* I'd also suggest never writing a queue to disk. Data loss is probably not a problem, since queued messages are largely fire-and-forget anyway.

Nothing is perfect, and these are some random thoughts on a rainy Sunday afternoon while I procrastinate over doing my accounts. I should also add that I am by no means an expert, and professional attackers are probably able to get around this sort of thing. Thoughts?