[BANANA] Berkeley Lab - Scientific Computing Seminar - May 9, 2008, 1:00pm

Esmond G. Ng EGNg at lbl.gov
Fri May 2 10:40:53 PDT 2008


Berkeley Lab - Scientific Computing Seminar

Date:  Friday, May 9, 2008

Time:  1:00pm-2:00pm  

Location:  Building 50F, 1647 Conference Room

Seminar Speaker:
    Michael W. Mahoney
    Yahoo Research

Title:  Community Structure in Large Social and Information Networks

Abstract:

The concept of a community is central to social network analysis, and 
thus a large body of work has been devoted to identifying community 
structure. For example, a community may be thought of as a set of web 
pages on related topics, a set of people who share common interests, or 
more generally as a set of nodes in a network more similar amongst 
themselves than with the remainder of the network. Motivated by 
difficulties we experienced in actually finding meaningful communities 
in large real-world networks, we have performed a large-scale analysis 
of a wide range of social and information networks. Our main methodology 
uses local spectral methods, which are a novel application of ideas from 
scientific computation to internet data analysis. Our empirical results 
suggest a significantly more refined picture of community structure than 
has been appreciated previously. Our most striking finding is that in 
nearly every network dataset we examined, we observe tight but almost 
trivial communities at very small size scales, and at larger size 
scales, the best possible communities gradually "blend in" with the 
rest of the network and thus become less "community-like." This 
behavior is not explained, even at a qualitative level, by any of the 
commonly used network generation models. Moreover, this behavior is 
exactly the opposite of what one would expect based on experience with 
and intuition from expander graphs, from graphs that are well-embeddable 
in a low-dimensional structure, and from small social networks that have 
served as testbeds of community detection algorithms. Possible 
mechanisms for reproducing our empirical observations will be discussed, 
as will implications of these findings for clustering, classification, 
and more general data analysis in modern large social and information 
networks.

Sponsor of Seminar:  Esmond G. Ng


