Link-Based Methods for Web Information Retrieval
Clive Nettey

Abstract:
Although commercial search engine companies have reported considerable
success in adopting link-based methods, these methods have
struggled to demonstrate significant performance improvements over
content-only retrieval methods in several off-line Web IR
evaluations. In this thesis the effectiveness of link-based methods is
assessed against content-only retrieval baselines. Algorithms
embodying established HITS, in-degree, realised in-degree, and sibling
score propagation techniques are evaluated alongside variants of those
algorithms. The variant algorithms are devised to aid in three
secondary lines of investigation relating to link-based methods: the
effects of link randomisation, the utility of sibling relationships
and the influence of link densities.
All of the established link-based algorithms are shown to improve on
the content-only retrieval baselines for several performance metrics,
with the realised in-degree algorithm proving particularly effective
across all metrics considered. In relation to the other lines of
investigation, the experimentation reveals that leveraging sibling
relationships does not lead to significant performance improvements,
that higher link densities do not afford performance improvements, and
that the algorithms are susceptible to link randomisation.
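
For orientation, two of the established techniques named above, in-degree
and HITS, can be sketched on a toy link graph. This is only a minimal
illustration of the textbook formulations, not the implementation
evaluated in the thesis. The function names, the dictionary-based graph
representation, the fixed iteration count and the choice to ignore
self-links and duplicate links are all assumptions made for the example.
The realised in-degree and sibling score propagation algorithms are not
sketched because the abstract does not define them.

```python
import math


def edges(links):
    """Yield each distinct (source, target) link once, skipping self-links."""
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                yield source, target


def all_pages(links):
    """Collect every page that appears as a link source or a link target."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    return pages


def in_degree(links):
    """Count the distinct inbound links of every page."""
    counts = {page: 0 for page in all_pages(links)}
    for _, target in edges(links):
        counts[target] += 1
    return counts


def normalise(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Iterate the mutually reinforcing hub and authority updates of HITS."""
    pages = all_pages(links)
    edge_list = list(edges(links))
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {page: 0.0 for page in pages}
        for source, target in edge_list:
            new_auth[target] += hub[source]
        auth = normalise(new_auth)
        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for source, target in edge_list:
            new_hub[source] += auth[target]
        hub = normalise(new_hub)
    return hub, auth


# Hypothetical four-page graph: a and b link to c, and c links to d.
toy_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
degrees = in_degree(toy_links)
hub, auth = hits(toy_links)
```

In this toy graph, page c receives the most inbound links and also ends
up with the highest authority score, while a and b, which both point to
c, share the highest hub score.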