Since it is something I don’t know a whole lot about, I recently got the bright idea to start working with social network analysis in infection transmission. A search of the literature turned up a few interesting gems, mainly on infection transmission through sexual networks, but little in the way of actual data. There were plenty of boiled-down examples of other people’s data, but the authors don’t post the data for people to play with. I could easily simulate some data: UCINET, a network analysis software package, has a feature to create a random network. However, that felt like cheating, and I wanted to get my hands on something real.
In a rare moment of spontaneity, I posted a call for study subjects through the School of Public Health’s open student mailing list. Surprisingly, I got about 65 responses from people willing to expose their contact networks for a day. Picking a single day, I asked the group of volunteers to fill out a form stating whether they had contact with any of the other volunteers on the list and how many people they may have had contact with who were not on the list. I was intrigued by how many people were willing to answer the survey and return it to me without compensation of any kind. I was also surprised at how difficult it is to create a survey that provides you with exactly the type of data you want in the format you wish.
The basic network of people who were on the list looks like this:
At first I was worried that the data might be worthless, either because of too little overlap among the volunteers or because of too much, as might be the case if everyone on the list had a class together on the study day. However, the network appears intuitive, and knowing the individuals on the list, the clustering present is logical. The circles in the top corner are isolates who had no contact with people in the group. The red dot in the center is me. I had a wide variety of contacts since I was the one doing the survey. Although scientifically it might not make sense to include myself in the contact network, I do have contact with many of the people on the list regularly, so I count as a member. The clump to the left of me is primarily Epid PhD students, of which I’m a member. It has to be said that they provided the most concise data.
Including the contacts people had that were not on the list, we can see that the results get a little more interesting:
In addition to the cool-looking patterns, we can see that many people have contacts well outside the immediate study group. In fact, the people in this study had a mean of 20 contacts per person for the single day. The contact distribution was highly skewed, with some people having as many as 150 contacts for the single day. Contact rates varied by department and by degree. The colored circles represent what are called “K-cores”: groups in which every member is connected to at least k other members of the same group, so they are more tightly connected to one another than to the rest of the network. In this case, the K-cores roughly correspond to the different departments represented in the study group. In fact, it is fairly surprising how well it pegged individuals into their respective groups. It even positioned me right between Epid and Biostat. I am one of the larger blue dots up top, and Epid (blue) and Biostat (green) are to the left and right of me. HBHE is mostly scattered black dots. The size of the dot represents the relative proportion that the member constitutes of the K-core.
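For anyone curious how K-cores are found, here is a minimal sketch of the usual "peeling" approach in plain Python: repeatedly remove the person with the fewest remaining contacts and record the highest minimum seen so far. This is not what UCINET does internally (I don't know its implementation), and the function name core_numbers and the tiny five-person network are made up purely for illustration. They are not the study data.

```python
from collections import defaultdict


def core_numbers(edges, nodes=()):
    """Return {node: core number} for an undirected contact network.

    A node has core number k if it belongs to the k-core (every member
    has at least k contacts inside the group) but not to the (k+1)-core.
    """
    adj = defaultdict(set)
    for n in nodes:
        adj[n]  # touching the key registers isolates with no contacts
    for u, v in edges:
        if u != v:  # ignore self-contacts
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the fewest contacts among those left.
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


# Hypothetical example: A, B, C all met each other, D only met A,
# and E met nobody on the list (an isolate).
example_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
example_cores = core_numbers(example_edges, nodes=["E"])
print(sorted(example_cores.items()))
# [('A', 2), ('B', 2), ('C', 2), ('D', 1), ('E', 0)]
```

The triangle of A, B and C forms a 2-core, D hangs off it with a core number of 1, and the isolate E gets 0, which matches the circles sitting alone in the corner of the first plot.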
Most of the study group were folks from the School of Public Health. HBHE was by far the most connected of all the departments. Other (a mixture of departments spread around campus) was the least connected among the people who bothered to report their affiliation. Not surprisingly, PhD students reported a lower level of contact than Master’s students, a difference confirmed by a Wilcoxon test with a p-value of 0.033.
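For the curious, the comparison between two groups of contact counts can be sketched in plain Python as below. I am assuming the rank-sum (two independent samples) version of the Wilcoxon test, using a normal approximation with a tie correction and no continuity correction, so a stats package may give a slightly different p-value, especially for small samples. The function name rank_sum_test and the contact counts are invented for illustration and are not the survey data.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using a normal approximation.

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    Tied values receive the average of the ranks they span.
    Assumes the pooled values are not all identical.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    n1, n2 = len(x), len(y)
    n = n1 + n2

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = midrank
        size = j - i
        tie_term += size**3 - size
        i = j

    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p


# Made-up daily contact counts, NOT the real survey responses.
phd_contacts = [5, 8, 10, 12, 15, 7, 9]
masters_contacts = [14, 20, 25, 18, 30, 150, 22]
w, z, p = rank_sum_test(phd_contacts, masters_contacts)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

A rank-based test is a sensible fit here because the contact counts are so skewed: someone reporting 150 contacts only counts as the highest rank rather than dragging a mean around.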