Web Mining


Fool’s Gold or Uncut Diamond?

In the early ‘90s, so just twenty years ago, Tim Berners-Lee was running the first web server at CERN in Switzerland. Since then the internet has grown into a data repository of enormous dimensions, and keeps on growing every second.

WEB 2.0 Emergence

With the emergence of Web 2.0 and all manner of user-generated content – ranging from blogs via social networking websites to product ratings and discussion forums – the idea of harvesting all this information in some reliable and scalable form becomes very appealing. Consumers will talk about product experience on the web but they also use the web as an information source when planning a purchase. When it comes to trusted information about products, other consumers’ recommendations tend to clearly beat the manufacturers’ web sites.

So web mining – the exploration and analysis of relevant user-generated web content – is of great interest to marketers and market researchers. A key output of web mining activities is the share of voice (SOV) a brand or product gains in relation to its competitors. SOV is usually based on counting how often a certain query appears across the observed parts of the social web in a specific time interval. The descriptive SOV is often enhanced by adding the element of positive, negative or neutral sentiment to the analysis. Even further significance can be added by assessing the influence of the user posting comments on a specific brand. Modern web mining systems also often allow researchers to engage with selected bloggers or users.

Web Mining and Search Engine Marketing

This gives brand managers the opportunity to move from passive monitoring towards active participation. Advanced analytical approaches go further than basic sentiment analysis and try to identify the themes of postings and conversations. The overall trend in web mining is moving towards ‘understanding’ what is being said. The attempt to answer the ultimate question – whether web mining is just fool’s gold or should be considered an uncut diamond – requires a closer look at the underlying technical procedures. The process of a web mining project is highly iterative and has five constituent steps: the discovery of relevant sources, the extraction of data from these sources, followed by extensive data cleaning, analysis and finally the generation of insightful reports.


The discovery stage is handled in various ways, but the two most popular approaches can be characterised as wide – covering as many sources as possible – or focussed. A key element of the discovery stage is the definition of appropriate queries.

The latter can go terribly wrong if the brands or products of interest have many semantic neighbours or twins. For example, the ‘Mars’ chocolate bar; Mars is a planet, a Greek god and a chocolate bar. Which one a consumer refers to can be very hard to identify, even if sophisticated Boolean logic is applied. The challenge of query definition is to reduce the false hits, whilst at the same time capturing all the buzz about the product of interest. Assuming the discovery phase goes well and all data has made its way to the researcher’s lab, it’s still a challenge to remove irrelevant information. The blog posts and web conversations still have to be separated from all the noise that comes with them.

The analysis phase is then all about understanding the sentiment in which a brand is mentioned: positive, neutral or even negative. Getting this right is not an easy task either, since slang, abbreviations (like LOL), irony, sarcasm, complex sentences and other difficulties can lead to misclassifications.

A major challenge to analysis is the fact that the vast majority of mentions may all turn out to have the same sentiment – be it neutral, positive or negative. Factoring all these challenges in, there is still good reason for optimism. However, web mining as a valuable tool for marketers comes at a price. First of all, quality control is the key! It is very risky to draw conclusions from fully automated systems that come without various quality controls – preferably in the form of experts checking the results of each process step, refining queries, optimising the valence analysis, etc. Depending on the objectives of a project, a decision has to be made as to whether broad “listening” to the social web or targeted “monitoring” of specific sources (e.g. recommendation sites) is the right approach to take. Last, but not least, whilst web mining can provide a very useful supplementary analysis, it should never be confused with research data derived from representative samples. Although it may sounds odd, given the size of the social web, for some categories for which there’s still just not enough data out there to provide robust insights. If expectations are realistic and objectives are clear, a thorough process makes web mining the uncut diamond but it can easily turn into fool’s gold, if not managed properly.

Where is The Connection Between Search Engine Marketing and Web Mining?

Search engine marketing is closely connected with web mining. Web mining gives the needed analytical insights needed to push marketing campaigns even further. Most of the time the mining campaigns areas oriented to collect data from customers which offers clear insights in their behaviour. The marketing campaign which uses this data creates adds and other materials to target the predefined audience. This system works like a charm if both parts are working closely together.

If you want to learn more about how to use pre-mined data and create your own SEM business around that you should check out our latest post called Parallel Profits Review. Parallel profits is an online training course by Aidan Booth and Steve Clayton which reveals how you can start your own marketing agency making over 100k/year. Get more information about Parallel Profits here  or check out our review page here.

Get more information about search engine marketing in the video below:

We will be posting more information about web mining as well as the latest news from the marketing industry on our homepage as this industry is developing really fast. If you need any additional information make sure you send us an email or contact us through our contact us page.

Update: 26.4.2019

Search engine marketing as getting more and more momentum in the last months. We have seen that huge bump while the promotion for a brand new course called Knowledge Business Blueprint by Tony Robbins and Dean Graziosi was running. They have published a brand new course about how you can leverage your knowledge by running masterminds for people who want to learn more about your expertise. The course is launching on the 30th April 2019 and will be closed on the 10th May. You can get more information on the official website: www.theknowledgebusinessblueprint.net

gfkamerica Administrator

Be the first to comment

Leave a Reply

Your email address will not be published.