The Internet contains a huge amount of information, and that amount is growing at an ever-increasing pace. People, organizations and corporations from all over the world continuously add different types of information to the web in various languages. The web therefore contains potentially very interesting and valuable information. This course will investigate various techniques for processing the web in order to extract such information, refine it and make it more structured, thus making it both more valuable and more accessible. These techniques are often referred to as web mining techniques.
The domains within the Internet that we will study are databases, e-commerce web sites, wikis, virtual communities and blogs. The Semantic Web and Web 2.0 are two other concepts that are relevant for the course. Web mining is considered to comprise three main areas, namely web content mining, web structure mining and web usage mining. Web structure mining is closely related to information search techniques, and web usage mining to opinion mining or sentiment analysis. Also related is the automatic construction of sociograms. Web content mining can, for example, be used to find the cheapest airline tickets by monitoring the web-based databases of all airlines and comparing their fares to find the lowest price.
The web mining techniques explored in the course include human language technology, machine learning, statistics, information retrieval and extraction, text mining, text summarization, automatic classification, clustering, wrapper induction, data normalization, matching the cardinality of data in different databases, interface matching, schema matching, sentiment analysis, opinion mining, extraction of comparatives and forensic linguistics.
Read more here: Web Mining/Web-mining, WEBMIN/IV2038, 7.5 hp.