Showing posts with label R. Show all posts
Showing posts with label R. Show all posts

Wednesday, April 26, 2017

Making your plots interactive - update

Two years ago I have been experimenting with Shiny and interactive plots and I have published here a post that remains to be one of the most read pages of this blog. I feel guilty about it because I was wrong about almost everything. A few weeks later, the free use of shinyapps.io was over. Meanwhile, plotly package, I did not appreciate that time so much, has matured and added more R functionality.

If you are into ggplot2, it is now super-easy to add interactivity to your graph. Just take your static ggplot object pl like the following

mpg dataset
and run

library(plotly)
ggplotly(pl)

That's it. You can configure your tooltip text with ggplotly(pl, tooltip = "text"), see my Rmarkdown example. And it works not only for a simple scatterplot but for more complex graphs as well, as @cpsievert tweeted today.


There are now also many easy-to-use htmlwidgets. And with the ggiraph package, you get now your click as Shiny input.

Tuesday, January 31, 2017

Blogging from RStudio with blogdown package

In my previous post, I described how to make a simple personal Github page with Jekyll. You can blog in R/Markwown this way, you just need to ask rmarkdown to keep .md files and push them (with images and other assets) to Github.

The blogdown package automates this process and open the resulting page in RStudio viewer. You just provide .Rmd files and any time you make a change, blogdown crunches all the R chunks, generates .md files and uses pandoc to translate them into HTML pages.

I am going to use it mainly for code snippets that are too short for this blog but too long for a tweet. Plus I do not wish to forget them. See my blogdown blog on the first figure. Here, I will provide a step-by-step guide how to create a similar one.


My blogdown blog at http://simecek.github.io/blog


1) Create a Github repository:

Go to your Github account, click the big green button "New Repository" and name your blog (it will later appear at http://USERNAME.github.io/BLOGNAME where BLOGNAME is a name of your repository).

Do not create the "README" or "LICENCE" file. If you did, just delete it because blogdown requires to start with an empty directory and throws a message "Warning message: In blogdown::new_site() : The directory '.' is not empty" otherwise.

New repository

2) Create RStudio Project:

Open RStudio, select File -> New Project -> Version Control -> Git. Copy URL address from your browser (http://github.com/USERNAME/BLOGNAME) into Repository URL field. Click "Create Project" button.

New RStudio project

3) Install blogdown and create an empty skeleton

You can install blogdown from Github with devtools:

devtools::install_github('rstudio/blogdown')

After that you might need to install HUGO, a platform that blogdown stands on, with blogdown::install_hugo(). Finally, a skeleton of a new blog can be generated by blogdown::new_site().

4) Customize config.toml file

Open the file "config.toml" and add a new line with publishDir = "docs" (preferably somewhere at the beginning of the file). This makes your HTML page to be generated into "./docs" folder and setting of your Github Pages will be later much easier. If you prefer hard ways, Amber Thomas and Jente Hidskes describe how to keep default "publishDir" and push the "./public" folder into a branch with git subtree.

You might also want to add the links to your Twitter, Github... accounts, maybe even Google Analytics and Disqus. For Disqus, I did not manage to use a default template and was forced to create my own version of footer. Solved - see Update.

Finally, call blogdown::serve_site() and view the blog. Any change you do immediately propagates into your RStudio viewer.

If you do not like the default look, you can choose and customize a theme from Hugo Theme gallery (use blogdown::install_theme function).

5) Commit Changes, Push Them to Github and Github Page Setting

Finally, go into your RStudio "Git" panel, select all files and commit them with and an appropriate message ("initial commit of my first blogdown blog, hurray!"). Of course, if you prefer git in a terminal to RStudio "Git" panel, you can commit and push to Github from a command line (git push -u origin master).

Next, you need to tell Github where to look for your Github pages. In the repo "Settings" (http://github.com/USERNAME/BLOGNAME/settings), find "GitHub Pages -> Source", select "master branch /docs folder" and push the "Save" button.

Setting Github Pages

6) Adding a content

Finally, your blog is online. You can access it at http://USERNAME.github.io/BLOGNAME.

To add content use either new_post() function or RStudio add-in "New post" as shown on a figure below. Use format "R Markdown" if your post contains R code that needs to be run and "Markdown" otherwise. If you are new to markdown, Help -> Markdown quick reference (from RStudio menu) contains a lot of useful tips.

New post addin

UPDATE 2/1/2016: Regarding Disqus problem: it is caused by blogdown::serve_site() function that ignores some settings in config.toml file, namely baseurl option. If you want to use Disqus, set URL of your blog to baseurl and run blogdown::build_site(local=FALSE) just before committing and pushing changes to Github. See build_site() help for more information.

Thursday, January 26, 2017

Simple Personal Github Page (with Jekyll)

I remember when I was creating my first Github Page. I was still quite new to git, did not understand branches and I was lost in Jekyll configuration. This step-by-step guide is intended for people in such situation that are, however, users of a) Github b) RStudio (if not, read this).

1. Create Github Repository

Go to your Github account and click the big green button "New Repository", name your new repository USERNAME.github.io (for example simecek.github.io).

New Github repo

2. Create RStudio Project

Open RStudio, select File -> New Project -> Version Control -> Git. Copy URL address from your browser (http://github.com/USERNAME/USERNAME.github.io) into Repository URL field (see figure below). Click "Create Project" button.

New version controlled project

Locate the folder (USERNAME.github.io) on a disk. You will need it in the following step.

3. Select and Customize Jekyll Template

If you wish, you can use any static web page as your Github Page. For example, you can just have a redirecting script. Here, I will show how to build a simple landing page with Jekyll.

First of all, select a template you want to use, for example from http://jekyllthemes.org/

Jekyll Themes gallery


Save and extract template ZIP file into your RStudio project folder USERNAME.github.io, so it should now look something like this (the folder on a disk, NOT yet Github repository).

Read extracted README file. Typically, you need to customize _config.yml and index.html files. The page itself is generated by Jekyll (ruby gem). Github supports Jekyll, so you might let generation of the actual HTML pages on him.

4. Commit Changes and Push Them to Github

Finally, go into your RStudio "Git" panel, select all files and commit them with and an appropriate message ("initial commit of my first Github Page, hurray!").

Git panel is just next to Environment and History


You might want to check that Github Pages are turned on in your repo's setting (https://github.com/USERNAME/USERNAME.github.io/settings). You might need to wait for a minute or two before the page is generated. Do not like it? Modify - commit - push (- make a tea - rinse - and repeat).

Thursday, July 14, 2016

One more reason to use Feather

It was quiet here for some time. Not because I have nothing to blog but because I have no time to do so properly. One of those almost forgotten posts was about Feather package / module.

Feather is a fast, lightweight binary format for data frames with R and Python implementation. The original RStudio announcement is here. And sure, the speed improvement is impressive. See my numbers for saving ~ 100 million probabilities:

# haplotype probs: 192 animals x 8 x 64000 markers
> format(object.size(probs), units="Mb")
[1] "750 Mb"

# saveRDS or save needs almost a minute to write probs to disk
> system.time(saveRDS(dprobs, file="DO192_probs.rds"))
   user  system elapsed
 50.701   0.574  51.678

# write_feather needs 6-7 seconds
> system.time(write_feather(dprobs, file="DO192_probs.feather"))
   user  system elapsed  
  1.344   1.051   6.272

Feather is even better if you compare it to traditional text formats like CSV. As David Smith explains in his blog, one of the reasons is traditional formats are row-oriented while internal R's storage is column-oriented.

Diagram credit: Hadley Wickham

I have one more reason to use Feather. If you have datasets with many columns (e.g. genes in human/mouse genome) and you need fast access to just one column (e.g. Shiny app), then Feather is ideal because its columns are automatically indexed.

read_feather("DO192_probs.feather", column = "19_48310898")
   user  system elapsed
 0.068   0.000   0.069

Sure, there are other solutions, like rhdf5 or RSQLite, but Feather is the easiest to use, at least for me, at least in R. See David Smith (Microsoft R) for more details: http://blog.revolutionanalytics.com/2016/05/feather-package.html

Wednesday, May 20, 2015

Teaching R course? Use analogsea to run your customized RStudio in Digital Ocean!

Two years ago I taught an introductory R/Shiny course here at The Jackson Lab. We all learnt a lot. Unfortunately not about Shiny itself, but rather about incompatibilities between its versions and trouble with its installation to some machines.

And it is not only my experience. If you look into forums of Rafael Irizarry MOOC courses, so many questions are just about installation / incompatibilities of R packages. The solution exists for a long time: run your R in a cloud. However, customization of virtual machines (like Amazon EC2) used to be a nontrivial task.

In this post I will show how a few lines of R code can start a customized RStudio docklet in a cloud and email login credentials to course participants. So, the participant do not need to install R and the required packages. Moreover, it is guaranteed they all run exactly the same software. All they need is a decent web browser to access RStudio server.

RStudio server login

Running RStudio in Digital Ocean with R/analogsea

So how complicated is it today to start your RStudio on clouds? It is (almost) a one-liner:
  1. If you do not have  Digital Ocean account, get one. You should receive a promotional credit $10 (= 1 regular machine running without interruption for 1 month):
    https://www.digitalocean.com/
    (full disclosure: if you create your account using the link above I might get an extra credit)
  2. Install analogsea package from Github. Make sure to create Digital Ocean personal access token and in R set DO_PAT environment variable. Also create your personal SSH key and upload it to Digital Ocean.
  3. And now it is really easy:

    library(analogsea)
    # Sys.setenv(DO_PAT = "*****") set access token

    # start your machine in Digital Ocean
    d <- docklet_create(size = getOption("do_size", "512mb"))
    # run RStudio on machine 'd' (rocker/rstudio docker image)
    d %>% docklet_rstudio()
The last line should open your browser with RStudio login page (user "rstudio", password "rstudio"). If not, use summary(d) to get the IP address of your machine and go to http://your_machine:8787

It will cost you ~$0.01 per hour ($5 per month, May 2015). When you are done, do not forget to stop your Digital Ocean machine (droplet_delete(d)). At the end, make sure that you successfully killed all your machines - either log in to Digital Ocean or by calling droplets() in R.

Wednesday, February 11, 2015

Make your R plots interactive

THIS POST IS NOW OBSOLETE, USE PLOTLY OR GGIRAPH, SEE MY CODE EXAMPLE

As a part of my daily job, I draw scatterplots, lots of them. And because there are thousands of genes expressed in any mouse or human tissue, my typical plot looks something like this (code). (Actually, it is a comparison of variance that can be attributed to "sex" factor in mRNA vs. protein expression.)

The question is - what is a gene in the top right corner? And what is the one next to him? And this one? And that?

Wednesday, November 5, 2014

Tweeting at #IMGC14 conference

The 28th annual International Mammalian Genome Conference was held over the last week in Bar Harbor, MA. For the first time, the official conference hashtag #IMGC14 was introduced. Twitter shares plummeted 9% next day. Pure coincide? I do not think so!

Totally, 79 participants contributed 1546 tweets. Guess who was the Twitter evangelist?


The distribution of tweets in time reveals when the lobster was served as a conference dinner.

Tuesday, May 13, 2014

RStudio: Pushing to Github with ssh-authentication

If RStudio prompts you for a username and password every time you try to push your project to Github, open the shell (Git menu: More/Shel...) and do the following:

1) Set username and email (if you did not do that before)
git config --global user.name "your_username"
git config --global user.email "your_email@example.com"

2) Create SSH key
ssh-keygen -t rsa -C "your_email@example.com" 

In RStudio, go to menu Tools / Global options / Git SVN / View public key and copy the key to your Github account setting (Edit profile / SSH keys / Add SSH key).

To check that ssh-authentication works, try to run
ssh -T git@github.com

and you should get something like

Hi your_username! You've successfully authenticated, but GitHub does not provide shell access. 

3) Change remote.origin.url from HTTPS to HTTP 

It might be Windows specific, but after 1)+2) RStudio still asks me for user name and password. After a long Google search, I have found a solution here and that is
git config remote.origin.url git@github.com:your_username/your_project.git

Hip, Hip, Hurrah!



If it was trivial for you, I do apologize. I am still very bad in guessing what could be useful for somebody and what not so much. That is why I have this blog and Github account in the first place.

One example, last year I published a paper in JSPI journal that improves a test for interaction in some very specific 2-way ANOVA situation (just one observation per group). The paper submission was an odyssey, mostly because of me. In one moment I doubted whether to retract the paper or not and I even did not upload the package to CRAN at first, just put it on Github.

Then I discovered that some guys found it and had built their package using it. They presented the results at UseR! 2013 conference. I might have met one of those biologists but I am sure I never mentioned my package to them. Finally, - and this is a bit embarrassing - I received an email from Fernando Tusell that I misspelled his name in one of my functions.

In summary, even if you see your work as non-essential from your perceptive, the others may have different view. Just do your best and share your results. Github is a perfect place for this.


Wednesday, January 1, 2014

Shiny Year 2014

http://glimmer.rstudio.com/simecek/pf2014en/ (source on Github)

"Pour Felicitér" is a French phrase used by Czech and Slovak to wish a happy new year, but not in French speaking countries. It is dating back to the beginning of 19. century when French was popular in Czech/Austrian city population similar way as English is today.

I originally wanted to make it a snow flake to share a plenty of snow we have in Maine. But then I discovered this Xmas R post and made it Shiny.

Let your 2014 be shiny as well!


Wednesday, August 8, 2012

Get a path to your Dropbox folder

I am currently designing my RStudio - Dropbox - Mardown/Knitter/Wordpress - Github workflow. One problem is that working on multiple machines with different version of Windows means I somehow need to tell R where my Dropbox folder is located.

I used to set the working directory at the beginning of my R scripts but it became tedious to change the path all the time. The solution might be to add definition of 'dropbox.folder.path' variable to .Rprofile files. Or there is a hard way - to write a script detecting Dropbox location automatically.

Wednesday, January 18, 2012

Mining Facebook Data: Most "Liked" Status and Friendship Network


UPDATE 05/2014: The text is now obsolete. Use Rfacebook package instead, see examples here and there.

Professional R Enthusiast published a quick manual how to use Facebook Graph API. I particularly like a trick to obtain an access-token using Graph API Explorer.

Now, you can easily employ R to get your most "Liked" Facebook status ever. For me it was this photo followed by a lot of posts about my kids. The same code can be applied to Facebook Group or Page. For example the most popular videos, that appeared on TED Page last year, were the following:
See the code, it is not so long.

Now let us try something more sophisticated. Before Xmas a lot of my friends tested MyFnetwork app to visualize their friendship networks (see my network below). Surely, this is not the first app doing this. However, this might be the first one really useful. I can see groups of my friends separated (colleagues vs. friends of my wife vs. high school classmates vs. university classmates). Highlighting tries to emphasize the key persons in each group but unfortunately it misses an adjustment for total number of friends (Facebook enthusiasts like PetrC or LenkaZ seem to be more special than they really are). 

Original myFnetwork graph

So how difficult would it be to produce a similar graph with R? Actually, as you can see it is just a few lines of code. First I scraped the list of friends, then for each of them I got the list of mutual friends and finally Rgraphviz package does the plotting stuff. 

R/Graphviz plot with initials


As you can see the graphs are pretty similar (most likely, MyFnetwork also uses some port of Graphviz code). Of course, there exists endless list of modifications. For example, you can first download friends' profile pictures and then use custom node plotting function to produce something like the following:
R/Graphviz plot with profile pictures
Now you can guess who is my wife and who is the problematic friend from the previous post :-) Anyway, myFnetwork claims to get over 1.3 million users in 6 weeks. How difficult could it be make R Facebook/app?


Romain Francois: Crawling facebook with R

Update: I am getting comments about your installation problems with RCurl and Rgraphviz packages. Honestly, I am not administrator of my Ubuntu Linux Server and I have only a limited knowledge about possible issues. RCurl seems to be ok even on my Win32 machine - read the FAQ. Rgraphviz is a bit more tricky: see How to install it under Windows but I would recommend you a decent linux distribution for this work.

Wednesday, January 11, 2012

Toying with Google Apps Script

Google offers an access to its services with Apps Scripts (JavaScript). That gives you a possibility to connect your spreadsheet to a fascinating variety of tools like geocoder, stock info, language translator, or email.

My java-scripting abilities are rather limited but just playing with tutorial examples I was quickly able to produce a script analyzing time distribution of received emails. It looks through your Gmail for the given contact and record the times of emails sent by it.

So this is email behaviour of my wife. You can see a peak at the noon when babies are having a nap and another local maximum at the end of the day when they are finally sleeping in beds.

Wednesday, December 7, 2011

UseR! 2011 slides are now available

I have just realized that UseR! 2011 presentation slides are now available from the conference web site.

Unfortunately, no big surprise this year. Or maybe this is good news as it means that I have all the important stuff in my RSS Reader. And by the way, this blog is now listed on www.r-bloggers.com.

To be fair there was a couple of interesting talks (reaching my attention before the slides were published), namely

Paul Marrel: Vector Image Processing
Paul is converting static PDF image into dynamic SVG graphics. This is not so much about R but it is really cool.

Markus Gesmann: Using the Google Visualisation API with R
Easy way how to put dynamic images on your web page. Particularly useful, if you work with GPS tracking data. Verified :-)

Susan Ranney: It's a Boy! An Analysis of Tens of Millions of Birth Records Using R
RevoScaleR functionality demonstrated on 3.1GB dataset of birth records.

Ulrike Grömping: Design of Experiments in R
This is a bit personal. I have very little (read "no") experience with designing experiments (DoE). Back in 2008 I met my wife's supervisor that was an author of an old commercial DoE software. We both realized that R missed a lot of DoE methods. The problem was my lack of theoretical knowledge in this area and his style of work. I gave up but I am happy that they finally finished the job without me - the book was published and the package is available on CRAN. Looking into DoE CRAN Task View - we were not alone.

Wednesday, November 23, 2011

Prague Half Marathon Ranking: 2% or 25% missing?


I am a regular participant of Prague International Half Marathon. In a mass event like this the horde of runners needs a long time to reach the starting line. To make the times mutually comparable the “start time” is measured and afterwards subtracted from the “finish time”. Also the crowd is organized to corridors in such a way that faster runners are ahead of the slower ones.

Sometimes everything goes wrong and that was the case of the year 2010. Imagine yourself to train for months then make your best – just to discover that your time of start was not recorded. Organizers apologized but claimed that only less than 2% of the participants were affected. Really?

Wednesday, October 26, 2011

How to display R code on a web page

Starting to write a blog I need a way how to publish my R codes. One possibility would be to just add some formatting with Pretty R. Nice, but I miss a repository with all codes ever submitted and possibility to make corrections.

The final solution was to create an account on Github and submitting codes via Gist. The only problem was you cannot see the code in the Google Reader. I believe I can live with it.

Getting Genetics Done: Syntax Highlighting R Code, Revisited

Tuesday, October 25, 2011