Wednesday, January 18, 2012

Mining Facebook Data: Most "Liked" Status and Friendship Network


UPDATE 05/2014: This text is now obsolete. Use the Rfacebook package instead; see examples here and there.

Professional R Enthusiast published a quick guide on how to use the Facebook Graph API. I particularly like the trick of obtaining an access token through the Graph API Explorer.

Now you can easily use R to find your most "Liked" Facebook status ever. For me it was this photo, followed by a lot of posts about my kids. The same code can be applied to a Facebook Group or Page. For example, the most popular videos that appeared on the TED Page last year were the following:
See the code; it is not very long.
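
To make the ranking step concrete, here is a minimal sketch. It assumes the feed has already been downloaded and parsed into an R list of posts; the field names (`message`, `likes$count`) and the sample posts are made up for illustration and are not a documented response schema.

```r
# Hypothetical parsed feed: each post has an id, a message and a like count.
# The field names (message, likes$count) are assumptions for illustration only.
posts <- list(
  list(id = "1", message = "Holiday photo",   likes = list(count = 42)),
  list(id = "2", message = "Kids at the zoo", likes = list(count = 17)),
  list(id = "3", message = "New R post")      # a post with no likes field
)

# Extract the like count of one post, treating a missing field as zero.
like_count <- function(post) {
  n <- post$likes$count
  if (is.null(n)) 0 else as.numeric(n)
}

counts <- vapply(posts, like_count, numeric(1))

# Order the posts from most to least liked and keep the top one.
ranked <- posts[order(counts, decreasing = TRUE)]
most_liked <- ranked[[1]]
most_liked$message
```

Treating a missing like count as zero is a design choice of this sketch; posts nobody liked may simply lack the field.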

Now let us try something more sophisticated. Before Christmas, many of my friends tested the MyFnetwork app to visualize their friendship networks (see my network below). Surely, this is not the first app to do this. However, it might be the first really useful one. I can see the groups of my friends clearly separated (colleagues vs. friends of my wife vs. high school classmates vs. university classmates). The highlighting tries to emphasize the key people in each group, but unfortunately it does not adjust for each person's total number of friends (Facebook enthusiasts like PetrC or LenkaZ seem more special than they really are).

Original myFnetwork graph

So how difficult would it be to produce a similar graph with R? Actually, as you can see, it takes just a few lines of code. First I fetched the list of my friends, then for each of them I got the list of our mutual friends, and finally the Rgraphviz package did the plotting.

R/Graphviz plot with initials
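
The matrix-building step can be sketched in base R as follows. The friend ids, the initials and the `mutual` list are made-up stand-ins for the data fetched from Facebook. The names `friends.id`, `friends.initial`, `mutualfriends` and `friendship.matrix` follow those mentioned in the comments below, while `mutual` is a hypothetical helper introduced only for this example.

```r
# Made-up stand-ins for the data fetched from Facebook.
friends.id      <- c("101", "102", "103", "104")
friends.initial <- c("AB", "CD", "EF", "GH")

# Hypothetical result of the per-friend "mutual friends" query:
# for each friend id, the ids of the friends we have in common.
mutual <- list(
  "101" = c("102", "103"),
  "102" = c("101"),
  "103" = c("101"),
  "104" = character(0)
)

# Square 0/1 matrix: entry [i, j] is 1 when friends i and j know each other.
N <- length(friends.id)
friendship.matrix <- matrix(0, N, N,
                            dimnames = list(friends.initial, friends.initial))

for (i in seq_len(N)) {
  mutualfriends <- mutual[[friends.id[i]]]
  friendship.matrix[i, friends.id %in% mutualfriends] <- 1
}

# Friendship is mutual, so the matrix should be symmetric.
stopifnot(isSymmetric(friendship.matrix))

# Number of mutual friends per person.
rowSums(friendship.matrix)
```

A matrix of this shape is what a graph package then takes as input for layout and plotting; that step is left out here because it depends on the package used.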


As you can see, the graphs are pretty similar (most likely, MyFnetwork also uses some port of the Graphviz code). Of course, the list of possible modifications is endless. For example, you can first download your friends' profile pictures and then use a custom node-plotting function to produce something like the following:
R/Graphviz plot with profile pictures
Now you can guess who is my wife and who is the problematic friend from the previous post :-) Anyway, myFnetwork claims to have gained over 1.3 million users in 6 weeks. How difficult could it be to make an R Facebook app?


Romain Francois: Crawling facebook with R

Update: I am getting comments about installation problems with the RCurl and Rgraphviz packages. Honestly, I am not the administrator of my Ubuntu Linux server and I have only limited knowledge of the possible issues. RCurl seems to be OK even on my Win32 machine (read the FAQ). Rgraphviz is a bit more tricky: see "How to install it under Windows", but I would recommend a decent Linux distribution for this work.

43 comments:

  1. Replies
    1. Thank you. Without R-bloggers, this post would not have been read by more than 100 readers in the first 24 hours.

  2. Cool. Let's make a "facebook" R package.

    Replies
    1. Interesting idea. I am terribly busy until next Thursday, but then we can talk about it! Does anybody want to join?

    2. I'm happy to help.
      I wrote the RGoogleAnalytics package on http://code.google.com/p/r-google-analytics/

      I guess we can reuse a lot of the structure there.

    3. I'm interested in the R/Facebook package and hope I can contribute.
      P.S. I posted "Any twitteR type package for Facebook in R Programming?" on Stack Overflow.

    4. I am in... very new but have enough experience in some other packages...

    5. So many people are interested in developing an R/Facebook package. Perfect! On Friday I will try to mail all of you so we can exchange ideas and divide the work and responsibilities.

      If anybody else wants to be involved, please let me know by the end of the week. Insatiable Warrior - I am unable to find your email.

  3. I can't run this code in RStudio; I get this error message:
    "Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
    SSL certificate problem, verify that the CA cert is OK. Details:
    error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"

    I made sure to use an active access token. Can you please help me fix this? I am really looking forward to exploring this feature.

    Replies
    1. In a situation like this - please, Google your error first. I am no expert in RCurl but here is the solution:

      http://stackoverflow.com/questions/8122879/roauth-on-windows-using-r

    2. The other (more dangerous) approach is to use "ssl.verifypeer = FALSE" within each getURL call.

    3. The problem was discussed further and solved by btibert3 on Data Twirling:

      http://www.brocktibert.com/blog/2012/01/19/358/

  4. This is very cool, thank you.

    My understanding is that Rgraphviz is no longer available through CRAN. For those who like to stay with CRAN packages, you can get a similar graph using:

    require(network)
    net1<- as.network(friendship.matrix)
    plot(net1, label=friends.initial)

    Replies
    1. Great! I like it. I would just remove the arrowheads and use plot(net1, label=friends.initial, arrowhead.cex=0).

      Anyway, as Andrew points out below, Rgraphviz is still on Bioconductor.

    2. Rgraphviz is available through Bioconductor:

      source('http://bioconductor.org/biocLite.R')
      biocLite('Rgraphviz', depend=T, type='source')

  5. Good post. I made a quick attempt at the Facebook Graph API myself for fun ( http://tonybreyal.wordpress.com/2011/11/10/facebook-graph-api-explorer-with-r ) to get the contents of posts on a wall feed. One of the problems with the Graph API is that it often returns a smaller subset than you'd expect from visual inspection of the Facebook.com website (because privacy settings are strongly enforced at all endpoints of the API). I ended up just web scraping the data via XPath expressions, using the XML package.

    Replies
    1. Interesting.

      I did my first Facebook web scraping with RCurl in January 2011. I was even able to log into my Facebook account automatically. The first problem is that every time Facebook changes its webpage, the code stops working. The second problem is the AJAX features (yes, there are some workarounds, like using m.facebook.com or pretending that you are an old web browser). Lastly, I heard that Facebook terminates your account if you do this too much.

      I have not experienced missing items in feeds, but I will test it. Anyway, you are right that there are things you cannot do with the Graph API, like listing the friends of somebody who is not your friend.

    2. Very true, that's the problem with web-scraping in general: if the HTML structure changes you're screwed until you find time to adapt your code.

      Keep up the good work :)

  6. It turns out that you need to install the Rgraphviz package from Bioconductor with:
    source("http://www.bioconductor.org/biocLite.R")
    biocLite("Rgraphviz")

  7. Really enjoyed this post. :)
    For even more fun, you might try using the Tulip visualization program to do things like force directed maps, or even 3D maps. You can import the graphviz dot code directly into Tulip for rendering.
    Check it out: http://tulip.labri.fr/TulipDrupal/
    ~Alex

  8. When I try to plot the graph, I get this error message:

    > plot(g, "neato", attrs=attrs, nodeAttrs=nAttrs)
    Error in as.double(x) :
    cannot coerce type 'S4' to vector of type 'double'

    Could anyone help me?!

  9. I second Djongs: I get the same error. Any suggestions?

  10. Congratulations! Great job! I followed your example, but I used the "igraph" library to plot my graphs.

    You can get the code here: https://github.com/sciruela/facebookFriends

  11. What about finding the most active users of our Facebook page, i.e. who posts the most comments? Any suggestions?

  12. Hey, this is a great post. Can I use a portion of it on my site? I would obviously link back to your page so people could view the complete post if they wanted to. Thanks either way.

  13. Impressive and amazing to me, i am impressed by this site.

  14. Can we do this without the Rgraphviz package?
    I couldn't install Rgraphviz from CRAN.

    http://cran.r-project.org/web/packages/Rgraphviz/index.html

    Replies
    1. Yes, you can. Try the igraph package, as Sciruela suggested above.

  15. Hi, I am working on my thesis and I would like to use this code. I would cite you, of course, but I would like to get your permission first. If you are interested in answering some questions on the potential of R and data mining in social media, please let me know! Thanks.

    Replies
    1. Hi Kristine! Of course you can use the code. I believe you do not even need my permission. I am no expert in social media; I am a bioinformatician.

  16. I'm receiving an error

    "An active access token must be used to query information about the current user."

    I tried to refresh the token, and to clear it and get it again, but I still can't make this work.

    Any ideas?

    Replies
    1. I got the same error.
      So I tried using a fixed facebook function (see http://stackoverflow.com/questions/15046111/how-to-get-most-popular-facebook-post-in-r), and that solved the problem.

      Hope this helps a bit...

      # A fixed facebook function in http://stackoverflow.com/questions/15046111/how-to-get-most-popular-facebook-post-in-r
      require(RCurl)
      require(rjson)

      facebook <- function(path = "me", access_token = token, options) {
        if (!missing(options)) {
          options <- sprintf(
            "?%s&",
            paste(names(options), "=", unlist(options), collapse = "&", sep = "")
          )
        } else {
          options <- "?"
        }

        urlTemplate <- "https://graph.facebook.com/%s%saccess_token=%s"
        data <- getURL(sprintf(urlTemplate, path, options, access_token))
        fromJSON(data)
      }

  17. I am glad to read this article; it explains the topic very nicely. Thanks for the nice info.

    facebook application development companies

  18. I have a problem when I try to create the friendship.matrix. When I run the for-loop, I get an error saying:
    `[<-`(`*tmp*`, i, friends.id %in% mutualfriends, value = 1) : Indexing beyond the limits (I translated it from the German message).
    I hope that somebody can help me.

  19. so where is the R package for Facebook ?

    Replies
    1. # Type the following in R, or just copy and paste it into the console:

      # Install the packages
      install.packages("Rfacebook")
      install.packages("Rook")

      # Load the libraries
      library(Rfacebook)
      library(Rook)

  20. I'm stuck at "tmp <- facebook( path=paste("me/mutualfriends", friends.id[i], sep="/") , access_token=token)"
    how is it possible to get every friend's access token!?
    please help ><
    I'm really interested in this application
    thanks in advance

  21. I got this error message:

    SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

    and I solved the problem by adding an option to the getURL call:

    getURL("your web address", ssl.verifypeer = FALSE)

  22. This is good. How do you plot if you have a huge friend list? Is there a way to filter, let's say with a vector containing just the names of the people whose network we want to see?

    Thanks

  23. Please note that Facebook removed access to the friends endpoint for people not using your app in Graph API v2.0. Therefore, this code most likely no longer works. :(
