Who is visiting your site, and why: the two great mysteries of web design. Source: Shutterstock

New tool uses big data to help websites attract specific audiences

A BIG DATA MINING TOOL developed during a hackathon will enable website owners to determine how attractive sites are to specific demographics compared to similar sites chosen from a pool of a million web pages.

The tool allows web publishers to determine reasons behind particular visitor spikes, site interactions, social buzz and other effects caused by a site’s contents.

Without the type of data mining available only to Internet giants like Google, Twitter, and Facebook, many website owners struggle to determine why or how their online efforts are successful, or otherwise.

The tool – Know Your Audience (the working title says it all) – was developed during a week-long project, Hackweek XLIII, organized by LiveRamp. The technology combines an Acxiom-developed data pool that has been growing since the 1960s with an online dataset held by LiveRamp.

Given access to both, the tool can match online browsing history with demographic information including gender, education, ethnicity and many more attributes.

Any number of variables can be drawn from more than 8,000 demographic and purchase-habit data types. Results can likewise be broken down by demographic so that, for instance, the attractiveness of a particular keyword can be compared across the educational levels of visitors.

The tool provides a way to check whether visitors to a website are, in fact, the desired audience. Websites can then be altered, especially in their text content, to attract a different, more desirable online population.

The data was gathered by a large, lightweight, distributed web crawling engine (similar to the ones used by search engines), which fetched and parsed keywords from a million web pages using LiveRamp’s services.

Demographic information can then be associated with each web user and, through the pages that user visits, with keywords. Keywords were processed with natural language processing algorithms, and by statistically examining word frequencies, a linear model was trained for each demographic variable. These models can predict the proportions of different demographics among a website’s visitors according to the keywords appearing on that website.
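
To give a feel for the approach, here is a minimal sketch in Python of one such model: keyword frequencies go in, a predicted share of visitors in one demographic group comes out. It is not LiveRamp's code. The function names, the two-word vocabulary, the four toy pages and their visitor shares are all invented for illustration, and plain gradient descent is simply one assumed way of fitting a linear model.

```python
import re
from collections import Counter


def keyword_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word among the words of a page."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid dividing by zero on an empty page
    return [counts[word] / total for word in vocabulary]


def train_linear_model(features, targets, epochs=5000, learning_rate=0.5):
    """Fit weights and a bias by gradient descent on mean squared error."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            error = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(model, x):
    """Predicted share of visitors in the demographic group for one page."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical vocabulary and toy pages. Each target is an invented share of
# visitors holding a college degree, standing in for one demographic variable.
vocabulary = ["research", "deals"]
pages = [
    ("research research research paper", 0.8),
    ("deals deals deals today", 0.2),
    ("research deals news update", 0.5),
    ("research research deals shop", 0.6),
]
features = [keyword_frequencies(text, vocabulary) for text, _ in pages]
targets = [share for _, share in pages]

# One model per demographic variable; only one variable is shown here.
model = train_linear_model(features, targets)

new_page = keyword_frequencies("research research deals", vocabulary)
print("predicted share:", predict(model, new_page))
```

In this toy setup, a page heavy on "research" is predicted to draw a larger share of the group than one heavy on "deals". A real system would need a far larger vocabulary and a separate model for each of the thousands of demographic variables.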

The project’s roots in a hackathon still show: a command-line query language is currently the only way to draw data out of the data lake. But the real-world uses for this type of technology, especially if wrapped in an attractive graphical user interface, are obvious.

It is unclear whether LiveRamp or Acxiom wishes to monetize this technology, but any entrepreneur worth his or her salt would undoubtedly seize upon it, if available, to market it to a social media and digital marketing industry whose practices are based mainly on guesswork.


Technologies do exist which purport to track individual users to websites, but these rely on factors which are becoming less common year-on-year, such as the use of fixed IP addresses, land-based connections to the Internet (i.e., not via mobile) and the continuing prevalence of IP version 4.

Acxiom, LiveRamp’s parent company, is a US-based, publicly traded marketing technology company which specializes in the study of demographics (Demographics was the company’s original name at its inception in 1969). The company suffered a massive data breach in 2003, when more than 1.6 billion customer records were stolen.