From the server's point of view, a censored communication is indistinguishable from one the client never initiated. Similarly, from the user's point of view, a censored website is indistinguishable from one that does not exist or is down. The solution in both cases is to find a back channel through which communication does succeed, and use that channel to report the unavailability of the original one. In the absence of such a channel, the most the server can do is infer that a certain segment of its users/visitors is having problems, by monitoring for abrupt changes in its access patterns.
Note: the most reliable way to avoid partial censorship would be to use HTTPS, because then the entire content of the communication, including query strings, would be confidential between client and server. However, there are cases where this is not possible (e.g., China, which has banned the use of HTTPS as far as I know) or where it can be circumvented (e.g., an institutional MITM attack, where the censoring agent has control over the client code or its CAs).
In principle this could happen in Brazil today, but the root CA of ICP-Brasil is not recognized by browsers by default (which is why everyone sees security warnings when accessing secure government websites). On the other hand, programs that perform institutional MITM by default, such as Opera Mini and Nokia's "accelerator", allow this to be done without drawing attention...
Alternative Channels
In the case of a partially censored application, one approach would be to have an error code that identifies the communication failure, but constructed in such a way that the code itself passes through unimpeded. That is, upon sending http://example.com/foo?texto=palavra-censurada and receiving a timeout (or other error), the client code should immediately send a second request to http://example.com/bar?cod=251, where the code 251 would mean "request foo failed". From there, the server could use data mining/clustering techniques or similar to try to identify what the failing clients have in common.
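As a rough sketch of this fallback (the error-code table, paths and function names here are my assumptions, not part of the answer), the client-side logic might look like this, with the network calls injected as callables so the flow can be exercised without a network:

```python
# Hypothetical table mapping each sensitive endpoint to the innocuous
# code the client reports when that request fails (251 = "request foo
# failed", as in the text above).
ERROR_CODES = {"/foo": 251}

def fetch_with_report(fetch, report, path, query):
    """Try the main request; on failure, signal it through the
    innocuous side channel /bar?cod=N. `fetch` and `report` are
    callables (e.g. thin wrappers around an HTTP client)."""
    try:
        return fetch(path + "?" + query)
    except IOError:
        code = ERROR_CODES.get(path)
        if code is not None:
            try:
                report("/bar?cod=%d" % code)
            except IOError:
                pass  # the side channel itself may be censored too
        return None
```

The server would then aggregate the hits on the side channel as input for the mining step mentioned above.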
In the case of a fully blocked application, the only solution (for clients) is to try to use a proxy (for example, "Down for Everyone or Just Me?"). This will tell them whether the server is down or whether they are the only ones who cannot access it. If the server itself can maintain such an unblocked proxy, then it would be feasible to maintain a browser extension that pings this proxy whenever an attempt to access the main site fails (although the mere existence of this extension would draw the attention of whoever censored the site, who would then censor the extension too...).
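The decision such an extension would make can be sketched as a tiny function (a sketch under the assumption that the proxy is trusted and itself unblocked; the names are hypothetical):

```python
def diagnose(check_main, check_via_proxy):
    """Distinguish 'server down' from 'blocked for me'. Both
    arguments are callables returning True when the site answered;
    they are injected so the logic runs without a network."""
    if check_main():
        return "reachable"
    if check_via_proxy():
        return "possibly censored"  # the proxy sees the site, we do not
    return "server down"            # nobody sees the site
```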
Peer-to-peer
When it is impossible to designate a single server to act as the alternative channel (for example, because that server keeps getting blocked), one possibility would be to use peer-to-peer (P2P) communication, so that users exchange information with each other about the server's state. In the browser, the [proposed] WebRTC standard can be of great help once it is widely supported (it is already available in Firefox, Chrome and Opera).
However, WebRTC alone does not solve the problem of NAT traversal: the ability of computers on different subnets to communicate with each other. This problem may disappear with the adoption of IPv6, where there are enough addresses for every imaginable device without the need for subnets (or rather, every device can have a global IP even while being part of a subnet); but while most systems still use IPv4, NAT traversal remains necessary.
The serverless-webrtc project (WebRTC without a [signaling] server), for example, eliminates the need for any server except STUN (and there are several public servers for this purpose), provided the peers wanting to connect cooperate through a protocol of their own. I have not studied this protocol in depth, but I believe the system could offer facilities for this to be arranged in advance (i.e., negotiate the connection proactively, and only use it if censorship occurs).
Unusual Access Patterns
In general, since in the web architecture it is always the client that initiates communication, there is no way to know for sure whether a block of this kind has occurred. However, if your service has a reasonable number of registered visitors/users, you can apply data mining/clustering to help detect failures.
Let's say several pre-existing users stopped using the site simultaneously, from [approximately] the same date. In a clustering, this would be an interesting characteristic (dimension? metric? sorry, my knowledge of DM is only superficial) that would group a set of users together. If you have information about the geographic location of these users (or other kinds of information, such as IP addresses) that also groups them into clusters, then the system can correlate one feature with the other (i.e., place and date of last access).
Such a correlation would indicate a problem: either censorship, as in the present discussion, or a link going down, as in the case of a network split (see the answer by @Alexandre Marcondes). Auxiliary measures could then be taken to confirm or refute that censorship is indeed in progress.
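A minimal stand-in for that correlation (a heuristic sketch, not real clustering; the log format, thresholds and function name are my assumptions) could simply count inactive users who share the same region and last-access date:

```python
from collections import Counter
from datetime import date

def suspicious_clusters(last_access, today, inactive_days=7, min_users=3):
    """last_access: iterable of (user_id, region, last_access_date).
    Flags any (region, last-seen date) pair shared by many users who
    have been inactive for at least `inactive_days`."""
    counts = Counter()
    for user, region, last in last_access:
        if (today - last).days >= inactive_days:
            counts[(region, last)] += 1
    return {key: n for key, n in counts.items() if n >= min_users}
```

A real system would cluster on more dimensions (IP ranges, ASNs, access frequency), but the shape of the signal is the same: many users, one place, one cutoff date.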
Censored Content
Finally, a brief note about the case where certain content (say, keywords) is being censored, but the rest of the application works fine. It is very difficult for the client to tell the server what is being censored, since the attempt to "circumvent" the censorship (via an alternative channel, alternative encoding, etc.) could itself be censored as well. A widely used resource in situations where encryption is unavailable is steganography: a form of security through obscurity that consists of sending apparently innocuous messages containing a second, "hidden" message inside them.
There are tools that provide steganography with an acceptable degree of confidentiality (requiring a password to access the secret part) and plausible deniability (the variations in the original message that hide the secret are indistinguishable from random variations, resisting the most common statistical analysis techniques). However, the distribution of these tools has to happen privately (otherwise the censoring agent will also have access to them), which makes them impractical to use in the context of web applications. In addition, their "throughput" is well below what one would want, since the "visible" message has to be orders of magnitude larger than the "invisible" one.
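To make the idea concrete, here is a toy whitespace scheme (purely illustrative, with none of the confidentiality or deniability of real tools): each bit of the secret is encoded as a single or double space between words of the cover text. It also shows the size penalty mentioned above: eight cover-word gaps are consumed per secret byte.

```python
import re

def hide(cover_words, secret):
    """Encode each bit of `secret` in the gap after a cover word:
    one space for 0, two spaces for 1."""
    bits = "".join(format(b, "08b") for b in secret.encode())
    if len(bits) > len(cover_words) - 1:
        raise ValueError("cover text too short for this secret")
    out = [cover_words[0]]
    for i, word in enumerate(cover_words[1:]):
        sep = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(sep + word)
    return "".join(out)

def reveal(stego_text, n_chars):
    """Read the gaps back as bits and decode the first n_chars bytes."""
    gaps = re.findall(r" +", stego_text)
    bits = "".join("1" if g == "  " else "0" for g in gaps)[: n_chars * 8]
    return bytes(int(bits[i:i + 8], 2)
                 for i in range(0, len(bits), 8)).decode()
```

A scheme this naive is trivially detected by statistics (double spaces are rare in real text), which is exactly the weakness the serious tools work to avoid.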
Using a VPN solves this problem.
– user314
Man, some regions of Brazil have ISPs that are confusing... you can't tell whether they are blocking or running a man-in-the-middle operation... I ran into a problem where they were messing with the requests... I think it was incompetence, but just in case, switching to HTTPS solved the dilemma!
– Eduardo Xavier