
Some of your website browsing may be being captured to a greater extent than you think. Source: Shutterstock

Be careful what you type: you’re being watched

MOST web users are used to their movements on the web and on individual websites being tracked by a combination of scripts and cookies. Which pages we visit, what we search for, which products we show interest in – we expect those things to reappear at some point or in some way affect our browsing in the future.

Most of the time such tracking is pretty harmless, and there are several ways in which we think we can opt out of it. Security-conscious Internet users can turn off JavaScript in the browser, use an “incognito” mode or similar, employ ad-blocking, or even opt to use DNS servers that filter out ad servers and other commercial elements.

More sites, however, now use “session replay” scripts, which record every detail of our site visits: mouse movements, page scrolling, and even individual keystrokes, all in real time. The result is, in effect, a virtual movie of our visit.

Most analytics services traditionally available to site owners aggregate results to provide broad trend data, but session replay scripts record individual sessions in real time, for later playback by a website owner who wants to learn more about how customers behave.

These scripts are available from suppliers such as Yandex, UserReplay, Smartlook and Clicktale (the clues are in the names) and, according to a recent report by Freedom-To-Tinker.com, feature on 482 of the top 50,000 sites as ranked by Alexa.

The scripts, which can be “Set up […] in a matter of seconds” (according to one of the suppliers), attempt to exclude passwords from their capture. In some instances, however, such as text input fields on some mobile-optimized sites, passwords are captured in the clear.

Additionally, site owners can see information that was entered during the visit but never actually submitted to the site. It’s easy to see that several attempts at entering data, such as details of existing medical conditions, will give the website owner more information than one might ideally like to give. For instance, the line between “clinical depression” and “mild schizophrenia” may be blurred in purely medical terms, but to a medical insurance company, there’s a world of difference.

The session capture script creators are aware that website owners should not necessarily have access to all data entered on a page. Passwords, for instance, are commonly retained but are stored and validated in encrypted form, and are not retrievable verbatim when we forget them (which is why we have to reset passwords).

Many companies also do not hold credit card information, but rather rely on third-party payment gateways whose code is embedded into pages, in order to process financial data.

In these ways, perfectly reputable sites operate securely.

But if an entire session is captured, this data can be “leaked” to the site via the script. To combat this, certain data is automatically redacted by the script, and so is theoretically never passed on.

However, automated redaction depends very much on the way the web pages are individually coded. For instance, if a form field is not labelled with “cc-number” (the HTML autocomplete value for a credit card number), then any card data entered escapes redaction and will be captured as part of the recorded session.
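The dependence on page markup can be illustrated with a minimal sketch. It assumes a recorder that masks a field only when its `type` is `password` or its `autocomplete` value is one of a few card-related values. The function names and the rule set are hypothetical and do not describe any particular vendor's logic.

```python
# Hypothetical sketch of attribute-based redaction; not any vendor's actual logic.
SENSITIVE_AUTOCOMPLETE = {"cc-number", "cc-exp", "cc-csc"}


def should_redact(field: dict) -> bool:
    """Return True if this (assumed) recorder would mask the field's value."""
    if field.get("type") == "password":
        return True
    return field.get("autocomplete") in SENSITIVE_AUTOCOMPLETE


def record_value(field: dict, value: str) -> str:
    """Mask redacted fields; pass everything else through unchanged."""
    return "*" * len(value) if should_redact(field) else value


# The same card field, with and without the autocomplete label.
labelled = {"name": "card", "type": "text", "autocomplete": "cc-number"}
unlabelled = {"name": "card", "type": "text"}

print(record_value(labelled, "4111111111111111"))    # masked
print(record_value(unlabelled, "4111111111111111"))  # captured in the clear
```

The two fields look identical to the person typing. Only the markup decides whether the number reaches the recording.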


Some data you input on the web (even in error) may be captured, before you hit ‘Submit’. Source: Shutterstock

On Bonobos, an online shopping site and one of those surveyed by Freedom-To-Tinker.com, users’ full credit card number, card expiry date, CVV code, name and billing address are thus available, letter by letter.

Site owners can also configure session recording scripts to redact specific fields manually, in addition to those withheld automatically (at least, in theory). However, if sensitive information is entered by mistake into an un-redacted field, it is available to anyone with access to the session playback.

While leaking data via the scripting companies may alarm some, what is also of concern is that end-users of such scripts (website owners) can access session playbacks via an unprotected, unencrypted web connection (HTTP not HTTPS) to the script capture dashboard. This unsafe connection can play back sessions that were originally created via an HTTPS connection, and that users therefore undertook with the assumption of at least some security.
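A site owner could check for this mismatch with something like the sketch below, which compares the scheme of the page where a session was recorded with the scheme of the playback dashboard. The URLs and function names are placeholders invented for illustration.

```python
from urllib.parse import urlparse


def is_encrypted(url: str) -> bool:
    """True when the URL uses HTTPS."""
    return urlparse(url).scheme == "https"


def playback_downgrades_security(recorded_on: str, dashboard: str) -> bool:
    """True when a session recorded over HTTPS is replayed over plain HTTP."""
    return is_encrypted(recorded_on) and not is_encrypted(dashboard)


# Placeholder URLs, not real services.
checkout_page = "https://shop.example.test/checkout"
dashboard_page = "http://dashboard.example.test/playback/123"

print(playback_downgrades_security(checkout_page, dashboard_page))
```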

And while tracking protection such as the EasyList and EasyPrivacy filter lists used by ad-blocking software blocks some of the session capture scripts in question, it does not block all known session capture scripts, and only partially blocks sensitive data in several cases.
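Why list-based blocking is only ever partial can be seen in a simplified sketch of how a domain rule is matched. It handles only rules of the form `||domain^` and ignores every other rule type that real filter lists contain. The vendor domains are invented for illustration.

```python
from urllib.parse import urlparse


def is_blocked(script_url: str, rules: list) -> bool:
    """Check a script URL against simple '||domain^' rules only."""
    host = urlparse(script_url).hostname or ""
    for rule in rules:
        if not (rule.startswith("||") and rule.endswith("^")):
            continue  # this sketch ignores every other rule type
        domain = rule[2:-1]
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical list that knows about vendor A but not vendor B.
rules = ["||replay-vendor-a.test^"]

print(is_blocked("https://cdn.replay-vendor-a.test/record.js", rules))
print(is_blocked("https://replay-vendor-b.test/record.js", rules))
```

A script served from a domain the list has never seen loads as normal, so coverage is only as good as the list's maintenance.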

Even without the use of a third-party ad-blocker, some users may be browsing with settings they assume protect them. One of the tracking scripts, from UserReplay, can be set to honor users’ “Do Not Track” (DNT) browser setting, but according to the survey, not one of the top one million sites ranked by Alexa which use UserReplay actually honored the DNT signal.
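Honoring the signal is a small check. Browsers with the setting enabled send a `DNT: 1` request header, and a recorder configured to respect it would simply decline to record. The sketch below assumes a hypothetical `honor_dnt` option and is not UserReplay's actual configuration.

```python
def recording_allowed(headers: dict, honor_dnt: bool) -> bool:
    """Decide whether to record, given request headers and a site-level option."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    if honor_dnt and normalised.get("dnt") == "1":
        return False
    return True


request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}

print(recording_allowed(request_headers, honor_dnt=True))   # option on: no recording
print(recording_allowed(request_headers, honor_dnt=False))  # option off: signal ignored
```

With the option left off, the user's preference is sent but has no effect on what is recorded.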

The website owners who use session capture scripts are clearly not actively seeking to gather sensitive information; their motives are commercial, in that they are attempting to learn from users’ behavior in order to offer a “better experience”, or similar. However, the scripts employed can easily cause a serious security breach of consumers’ data in an entirely unregulated way.

Caveat emptor (buyer beware) should now perhaps be a mantra from the point that a browser is launched, rather than as a reminder at the point of actual purchase.

Big Brother may not be watching your every move, but the websites you’re visiting might be. (Not this one, of course.)