Hyperlink Network Geographies

Researcher: Harald Meier


Project Overview

The Internet offers an overwhelming pool of data that can be used for network research. Because a hyperlink is the connecting edge between two websites, it is a possible link between two actors and thus a possible spatial link between two locations. Analysing a large number of hyperlinks from a group of websites helps to reveal group-related networking patterns that can be evaluated from different geographic angles. As directed flows of information, hyperlinks represent one dimension of Castells' conception of the space of flows (1996).

The main objective of this project is to conduct a variety of quantitative network studies with the hyperlink method presented below. It is a simple methodology that helps to gather basic network knowledge from a group of websites. It delivers two major levels of evaluation that build on one another. The first level focuses on identifying central websites, their responsible actors and important groups within a network. The second level explores the underlying spatial configurations, which offer a variety of geographic insights into information flows between and within continents, countries and cities. By mapping the locations of the connected websites and their responsible actors, network-specific clusters may be discovered. Finally, it is also possible to observe network dynamics by collecting hyperlink data at periodic intervals.

A case study on the hyperlink networks of international NGOs has already shown the potential of the method for analysing online networks in the non-profit sector, identifying New York, London, Geneva and Washington DC as the most important cities of global civil society (see RB 439). Since the method can be applied to any group of websites, it bears great potential for the social sciences in general and world city network research in particular. It is therefore reasonable to conduct further studies to underpin the findings of the case study and to apply the method to websites from the economic sector.

While the method itself is straightforward, the data collection process is rather sophisticated and time-consuming because a vast amount of network data needs to be collected and processed. Thus, a crucial part of this research project is concerned with automation in order to raise the productivity of the method. A continuously growing network database with thousands of datasets has been created to enrich the hyperlink data with the information needed for deeper insights into a network. The ultimate goal is to create research software that delivers a variety of network evaluations with only a few mouse clicks. Anyone interested in this method may feel free to contact the Digital Space Lab with proposals for studies, questions and discussions. Financial supporters are also very welcome.

Method Overview

This is a simplified overview of the method. A detailed research manual will be available in the near future.

Step 1: Seed List

Any hyperlink network analysis begins with a list of websites, the so-called seed list, which contains basic information for each website such as the URL, the name of the responsible actor and address information.
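
As an illustration, a seed list could be held as a small list of records. This is only a sketch: the class name SeedEntry, its field names and the example entries are invented here and are not part of the project's actual data model.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SeedEntry:
    """One website in the seed list (field names are illustrative assumptions)."""
    url: str      # URL of the website
    actor: str    # name of the responsible actor
    city: str     # address information, reduced here to city and country
    country: str

    @property
    def host(self) -> str:
        """Hostname, used later to match crawled links back to this entry."""
        return urlparse(self.url).netloc.lower()


# Hypothetical entries, for illustration only.
seed_list = [
    SeedEntry("https://www.example.org/", "Example NGO", "Geneva", "Switzerland"),
    SeedEntry("https://example.net/about", "Example Network", "London", "United Kingdom"),
]

# Index by hostname so that hyperlinks can be attributed to actors.
hosts = {entry.host: entry for entry in seed_list}
```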

Step 2: Webcrawl

The seed list is entered into a webcrawler which will extract all outgoing hyperlinks from the websites in that list. It is also possible to include incoming hyperlinks from the web with the help of a web service. Once the hyperlink data collection is completed, a table is created which shows all hyperlinks as directed edges between two URLs.
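
The extraction step can be sketched with the Python standard library. The example below does not download anything; it parses an HTML string that is assumed to have been fetched already and returns the outgoing hyperlinks as directed edges between hostnames. The names LinkCollector and outgoing_edges are hypothetical, and a real webcrawler would also need to handle fetching, multiple pages per site and crawl depth.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outgoing_edges(page_url, html):
    """Return directed (source_host, target_host) edges for external links."""
    parser = LinkCollector()
    parser.feed(html)
    source = urlparse(page_url).netloc.lower()
    edges = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        target = urlparse(urljoin(page_url, href)).netloc.lower()
        if target and target != source:  # skip internal and non-web links
            edges.append((source, target))
    return edges


sample_html = (
    '<a href="/about">About</a> '
    '<a href="https://partner.example.net/x">Partner</a>'
)
edge_table = outgoing_edges("https://www.example.org/", sample_html)
```

Each tuple in edge_table corresponds to one row of the edge table described above, with the linking site first and the linked site second.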

Step 3: Statistical Evaluation

In the next step, network statistics are calculated for each website to find out which nodes are important to the network. The subsequent data collection steps focus only on those sites. The calculations can be done with software such as NodeXL.
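
The text does not say which statistics are used. As one simple possibility, the sketch below counts incoming and outgoing links per website from the edge table and ranks the sites by incoming links. The function name degree_statistics and the sample edges are assumptions made for illustration; tools such as NodeXL offer many further measures.

```python
from collections import Counter


def degree_statistics(edges):
    """Count distinct incoming and outgoing links for every website."""
    unique_edges = set(edges)  # count each source-target pair only once
    out_degree = Counter(src for src, _ in unique_edges)
    in_degree = Counter(dst for _, dst in unique_edges)
    nodes = set(out_degree) | set(in_degree)
    return {n: {"in": in_degree[n], "out": out_degree[n]} for n in nodes}


# Hypothetical edge table: (linking site, linked site).
edges = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("b.example", "a.example"),
    ("a.example", "b.example"),  # duplicate, ignored by the set above
]

stats = degree_statistics(edges)
# Rank by incoming links, breaking ties alphabetically.
ranking = sorted(stats, key=lambda n: (-stats[n]["in"], n))
```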

Step 4: Linkscaping

The sum of all networked websites that appear in a hyperlink analysis is termed the hyperlinkscape of a network. Because the content of these websites varies widely, a simple categorization system is needed to allow differentiated analysis. Helpful categories (linkscapes) are “Economy”, “State” and “Society”, which are derived from the organizational form of the responsible actors behind the websites.
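
A categorization of this kind might be implemented as a lookup from organizational form to linkscape. The three category names come from the text; the list of organizational forms and their assignment below are assumptions for illustration and would need to be defined for each study.

```python
from collections import Counter

# Assumed mapping from organizational form to linkscape (illustrative only).
LINKSCAPE_BY_FORM = {
    "company": "Economy",
    "ministry": "State",
    "ngo": "Society",
    "foundation": "Society",
}


def linkscape(organizational_form):
    """Return the linkscape for an actor, or 'Unclassified' if unknown."""
    key = organizational_form.strip().lower()
    return LINKSCAPE_BY_FORM.get(key, "Unclassified")


# Hypothetical websites with the organizational form of their actors.
sites = {
    "shop.example": "Company",
    "gov.example": "Ministry",
    "aid.example": "NGO",
    "fund.example": "Foundation",
    "misc.example": "Unknown form",
}

# Size of each linkscape within the hyperlinkscape.
counts = Counter(linkscape(form) for form in sites.values())
```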

Step 5: Spatial Analysis

To allow spatial analysis of a network, address information needs to be collected as well. The data can then be projected onto the city level and a number of rankings can be created. The connectivity value of a city is calculated by adding up the connectivity values of all websites located in that city. By comparing the location of the linking site with that of the linked site, the scope of each hyperlink can be identified easily. It is possible to take only intercity, international or intercontinental links into account. When the intracity links are left aside, the rankings are based solely on the external relations of a city. In this way one can determine the extent to which a city is embedded in intracity, national or international networks, and how it relates to other cities.
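
The scope comparison and the city ranking can be sketched as follows. Two assumptions are made that the text does not confirm: each link is labelled with its widest scope only (so "intercity" here means different cities within one country), and the connectivity value of a website is taken to be the number of non-intracity links it sends or receives. The locations and edges are invented.

```python
from collections import Counter

# Hypothetical locations: host -> (city, country, continent).
locations = {
    "a.example": ("London", "United Kingdom", "Europe"),
    "b.example": ("London", "United Kingdom", "Europe"),
    "c.example": ("Geneva", "Switzerland", "Europe"),
    "d.example": ("New York", "United States", "North America"),
    "e.example": ("Manchester", "United Kingdom", "Europe"),
}


def link_scope(source, target):
    """Classify a hyperlink by comparing linking and linked locations."""
    s_city, s_country, s_continent = locations[source]
    t_city, t_country, t_continent = locations[target]
    if s_continent != t_continent:
        return "intercontinental"
    if s_country != t_country:
        return "international"
    if s_city != t_city:
        return "intercity"
    return "intracity"


def city_connectivity(edges):
    """Sum website connectivity per city, leaving intracity links aside."""
    site_connectivity = Counter()
    for source, target in edges:
        if link_scope(source, target) != "intracity":
            site_connectivity[source] += 1
            site_connectivity[target] += 1
    city_totals = Counter()
    for site, value in site_connectivity.items():
        city_totals[locations[site][0]] += value
    return city_totals


edges = [
    ("a.example", "b.example"),  # intracity, excluded from the ranking
    ("a.example", "c.example"),
    ("d.example", "a.example"),
    ("e.example", "b.example"),
    ("c.example", "d.example"),
]
ranking = city_connectivity(edges).most_common()
```

Restricting the ranking to international or intercontinental links would only require changing the condition in city_connectivity to test for those scopes.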


For results of this project, see GaWC Research Bulletin 439.