To follow along, you will need the datasets downloaded, a spreadsheet program (e.g., Excel), and a programming language (e.g., R) installed. See the sections below for how to get everything set up.
You can find all the datasets needed for the workshop on the book’s GitHub page:
You can click on the “Code” drop-down and select “Download Zip” to download the data and files for the lesson materials.
Microsoft Excel and Google Sheets are the most commonly used spreadsheet programs. If you do not have Excel, there are other free options available:
- LibreOffice (Free and open source): https://www.libreoffice.org/
- FreeOffice (Free but closed source): https://www.freeoffice.com/en/
All of these programs can read and write Excel and comma-separated values (CSV) files, which you can use to follow along.
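To make the CSV format concrete, here is a minimal sketch in Python (one of the languages set up later on this page) that writes a small table to a CSV file and reads it back using only the standard library. The file name `example.csv` and the column names are made up for illustration and are not part of the workshop datasets.

```python
import csv
import os
import tempfile

# Hypothetical rows; not taken from the workshop datasets.
rows = [
    {"patient_id": "1", "age": "34"},
    {"patient_id": "2", "age": "58"},
]

# Write to a temporary folder so nothing on your computer is overwritten.
path = os.path.join(tempfile.mkdtemp(), "example.csv")

with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["patient_id", "age"])
    writer.writeheader()
    writer.writerows(rows)

# A CSV file is plain text: one line per row, values separated by commas.
with open(path, newline="") as f:
    raw_text = f.read()

# Reading it back gives the same rows (all values come back as strings).
with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))

print(raw_text)
print(loaded == rows)
```

Opening the resulting file in any of the spreadsheet programs above should show the same two columns and two rows.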
Below you will find the installation instructions for setting up the programming language we will be using (e.g., R, Python) along with its programming environment (e.g., RStudio, Jupyter) which can be downloaded for free.
We will be using R and RStudio for the workshop. If you would like a video installation tutorial, please see the R section of The Carpentries workshop template.
The links to install R can be found here: https://cloud.r-project.org/. Navigate to the correct operating system.
For Mac users, download the .pkg file under the “Latest release” section.
For Windows users, please install both the base version as well as
After you have installed R, you can install RStudio. We will use RStudio as the integrated development environment (IDE) to write and work with R code. RStudio can be downloaded from the following location: https://rstudio.com/products/rstudio/download/
Installing R packages
Once you have installed RStudio, open it. Within RStudio, there is a “Packages” tab in the bottom right panel. Click on the “Install” button.
In the pop-up window type in “tidyverse remotes” and click “Install”.
The Console section of RStudio will begin installing the tidyverse package we will be using. The remotes package will be used to install the medicaldata package for some of the medical datasets we will be using.
On the left side you should see a “Console” tab.
To install the medicaldata package, type in:
Testing your R installation
When the installation is finished, you can check if the package was installed properly and load the package by scrolling down the “Packages” tab and clicking the checkbox next to “tidyverse”.
We will be using the Anaconda distribution for Python. The download links and instructions can be found here: https://docs.anaconda.com/anaconda/install/.
You can accept all the default options for the installation. The Anaconda installation instructions have this to say about the advanced options presented during the installation:
Choose whether to add Anaconda to your PATH environment variable. We recommend not adding Anaconda to the PATH environment variable, since this can interfere with other software. Instead, use Anaconda software by opening Anaconda Navigator or the Anaconda Prompt from the Start Menu.
Choose whether to register Anaconda as your default Python. Unless you plan on installing and running multiple versions of Anaconda or multiple versions of Python, accept the default and leave this box checked.
The end of the installer gives you the option to install the PyCharm Integrated Development Environment. We will not be using PyCharm for the workshop, but you can install and use that instead of JupyterLab, if you wish.
Testing your Python installation
On Windows and Mac, Anaconda comes with the “Anaconda Navigator” application. It’s normal for it to take a while the first time you launch it. Once it opens, you should be given the option to launch JupyterLab.
Once you have opened JupyterLab, you can use the file browser on the left side to open the ds4biomed folder that you have downloaded and unzipped/extracted. Click the “Python 3” button under “Notebook” to launch a Python notebook.
To check that the packages we need are working, type the following commands into the open cell block:
import pandas as pd
import seaborn as sns
import statsmodels
You can then execute the code and load the packages by pressing the right-triangle “play” button towards the top. You should see a number appear (and increment) to the left of the code block you typed in. If you see a “*” instead, the code is still executing.
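If one of the imports fails, it can help to see which package is missing and which Python installation the notebook is actually using. The following is a minimal sketch of such a check using only the standard library; the helper name `check_packages` is hypothetical and not part of the workshop materials.

```python
import importlib.util
import sys

# The three packages imported in the cell above.
required = ["pandas", "seaborn", "statsmodels"]


def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(required)

# With Anaconda, this path is expected to point inside the Anaconda folder.
print("Python executable:", sys.executable)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Unlike a plain `import`, this check does not stop at the first missing package, so a single run lists everything that still needs installing.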
Binder (For Installation Issues)
If you are having installation issues, there is a Binder instance with all the packages and datasets set up that you can use in a browser. You can click one of the Binder badges to launch a programming environment for you in the cloud.
Note that these instances can take a very long time to start up and install everything: anywhere from 1 minute to over 30 minutes, depending on how recently someone last used a Binder instance.
One caveat with Binder is that each instance does not persist any data. So if you are idle for too long, or close out of the tab, all the work you did within Binder will be lost.
You will need to either copy/paste your work to a place on your computer, or export the files out of Binder if you wish to have files saved.
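One way to export several files at once is to bundle them into a single zip file from a notebook cell and then download that one file. The sketch below uses Python's standard `shutil.make_archive`; the helper `bundle_folder`, the file `notes.txt`, and the archive name `my_binder_work` are hypothetical, and the demonstration runs on a throwaway temporary folder rather than on real workshop files.

```python
import os
import shutil
import tempfile


def bundle_folder(folder, archive_name):
    """Zip everything inside `folder` and return the path of the zip file."""
    # make_archive appends ".zip" to archive_name itself.
    return shutil.make_archive(archive_name, "zip", root_dir=folder)


# Demonstration on a throwaway folder containing one hypothetical file.
work_dir = tempfile.mkdtemp()
with open(os.path.join(work_dir, "notes.txt"), "w") as f:
    f.write("my workshop notes\n")

# Write the archive outside the folder being zipped so it does not include itself.
out_base = os.path.join(tempfile.mkdtemp(), "my_binder_work")
zip_path = bundle_folder(work_dir, out_base)
print(zip_path)
```

In JupyterLab, the resulting zip file can usually be downloaded by right-clicking it in the file browser and choosing “Download”, provided the archive is written to a folder that the file browser can see.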