Description

LAB 10 – NETWORKS AND EIGENVECTOR CENTRALITY

1. Introduction

Last week we learned how to use iterative techniques to find the dominant eigenvector for a system, and used this to analyze a Markov process. This week we are going to further explore these techniques and see how they can be used to analyze networks. Simply put, a network consists of a collection of nodes and edges, where each edge can be thought of as a connection which joins two nodes. We often represent networks graphically, by drawing a dot for each node, and a line for each edge as shown below in Figure 1. Networks are sometimes referred to as graphs in mathematics, and the nodes are sometimes referred to as vertices.

Networks are incredibly useful in real world applications, as they allow us to represent relationships between objects in a wide variety of different systems. For example, a data scientist might use networks to represent connections on social media, by displaying each Facebook user as a node of the network, with an edge joining two nodes when the corresponding users are Facebook friends (see Figure 2). Urban planners may use networks to design important infrastructure, by representing water pump station as nodes, and water mains as edges. Law enforcement officials may use networks to understand criminal organizations, with gang members represented by nodes, and known affiliations between members represented by edges. Finally, we could use networks to represent Hollywood actors and actresses, connecting two actors/actresses if they’ve starred in a movie together. This would allow us to answer definitively whether all of Hollywood really is 6 degrees from Kevin Bacon. By introducing a slight variation on the idea of a network as defined above, we can capture other types of relationships. For example, we could add a direction to each of our edges, and think of them as providing a connection from one node, to another node. We indicate the direction of each edge by adding an arrow, and call the resulting network a directed network (see Figure 5). Directed networks allow us to model relationships that are not symmetric. For example, the food chain could be represented as a directed network, with each node representing a different species, and a directed edge added which travels from each predator to the prey it is known to eat. In this network there would be an edge from the node representing lions to the node representing gazelles, because lions are known to eat gazelles, but not one going the other way.

Perhaps one of the most ubiquitous directed networks in our everyday world is the internet, or the world wide web. Understanding this network is crucial to a number of tasks that we do everyday, often without thinking twice about them. It is this network that we will focus on in this lab. There are several different ways in which the internet can be modeled as a network, each of which captures a different aspect of the web. For example, we could model the physical infrastructure of the internet (wires, hubs, and internet service providers) as a network, to model the way in which data is sent from one computer to another over the internet. Another way to represent the internet is by assigning a node to each of the estimated 1.8 billion webpages on the internet, and then adding a directed edge from webpage A to webpage B when there is a link on webpage A which redirects the user to webpage B. It is this second viewpoint which we will be using in this lab. To understand the importance of being able to study this network, consider the problem of designing an internet search engine, like Google, Yahoo, Bing, or AltaVista. A user comes to LAB 10 – NETWORKS AND EIGENVECTOR CENTRALITY 3 the search engine website and types in a search term, say “Spider-Puppy”, hoping to find a website with an image like the one below: