ICOS Big Data Summer Camp

University of Michigan

Room R0210 - Ross School of Business - 701 Tappan Street, Central Campus
May 9-12 & May 19th, 2016
9:00 am - 5:00 pm

General Information

Social and organizational life is increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as statistical analysis is a standard part of such training today. The ICOS Big Data Camp aims to make big data accessible for people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.

Organizing committee: Jerry Davis, H. V. Jagadish, Cliff Lampe, and Brian Noble

Coordinators: Teddy Dewitt, Todd Schifeling

Instructors: Reed Coke, Jerry Davis, Teddy Dewitt, Julian Katz-Samuels, Gareth Keeves, Eun Woo Kim, Yong Hyun Kim, Colleen Van Lent, Brian Noble, Eric Seymour, Todd Schifeling

Guides: Nivi Karki, Deepak Kumar, Jordan Liu, Lawrence Yong

Who: The course is aimed at graduate students and other researchers.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).

Contact: Please email schifelt@umich.edu for more information.

Resources: Go here for example papers and data sources.


Schedule

Monday
09:00 Introduction and Overview with Jerry Davis (Intro: ppt, pdf. Assignment: ppt, pdf)
10:15 Break
10:30 Journey into Coding with Yong Hyun Kim
11:15 My Coding Addiction: Or How I Got from Camp Attendee to Here with Todd Schifeling
12:00 Lunch break
1:00 The Setup & Command Line with Todd Schifeling (Directions: install, command line)
2:00 Group formation & How to learn in groups: lessons from design teams with Brian Noble (ppt)

Tuesday
09:00-10:45 Introduction to SQL with Teddy Dewitt (Main slides: pdf, slides on joins: pdf)
10:45-11:00 Coffee Break
11:00-12:00 Using SQL with Eric Seymour (Materials: email for link)
12:00-1:00 Lunch break
1:00-1:20 Using SQL to construct network datasets
1:20-4:00 Group Work (play data)
4:00-5:00 Check-in and end of day discussion

Wednesday
09:00-10:45 Introduction to Python with Colleen Van Lent (HTML, Notebook)
10:45-11:00 Coffee Break
11:00-12:00 Using Python with Eun Woo Kim & Reed Coke (Slides: ppt, pdf)
12:00-1:00 Lunch break
1:00-1:20 Using text re-use analysis to track legislative influence with Julian Katz-Samuels
1:20-4:00 Group Work (scraping links)
4:00-5:00 Check-in and end of day discussion

SIGN IN SHEET

Thursday
09:00-10:45 Introduction to APIs with Todd Schifeling (Install, Code, Lecture)
10:45-11:00 Coffee Break
11:00-12:00 Using APIs with Gareth Keeves
12:00-1:00 Lunch break
1:00-1:20 Now What? with Mariana Carrasco-Teja, Assistant Director of MICDE (Slides)
1:20-4:00 Group Work & Python + SQL (slides, code)
4:00-5:00 Check-in and end of day discussion

Thursday, May 19th
1:00-4:00 Final Session with Group Presentations (Ross R0210)
4:00-5:00 Dominick's!

Setup

To participate in the ICOS Big Data Summer Camp, you will need working copies of the software described below. Please make sure to install everything (or at least to download the installers) before the start of the camp.

Overview of the tools

Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords.

The Bash Shell

Bash is a commonly-used shell. Using a shell gives you more power to do more tasks more quickly with your computer.

Python

Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. We teach with Python version 2.7, since it is still the most widely used. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer.
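
Once the installer has finished, a short script can confirm which Python you are running and whether the scientific packages import cleanly. The following is a minimal sketch, not part of the official camp materials: the helper name check_packages and the package list (numpy, pandas, sqlite3) are assumptions chosen for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
import importlib
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Shows which interpreter is actually being run (useful if several are installed).
    print("Python executable: " + str(sys.executable))
    print("Python version: " + sys.version.split()[0])
    # Assumed package list; adjust it to whatever your sessions actually use.
    results = check_packages(["numpy", "pandas", "sqlite3"])
    for name in sorted(results):
        print("{:<10} {}".format(name, "ok" if results[name] else "MISSING"))
```

If any package is reported as MISSING, reinstalling with the all-in-one installer is usually simpler than installing packages one by one.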

IPython Notebook

The IPython Notebook is a web-based interface for interactive computing with Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The IPython Notebook comes pre-loaded on many all-in-one Python installers like Anaconda CE.

SQL

SQL is a specialized programming language used with databases. SQL is a declarative language: you describe (declare) the data you want from the database rather than the steps for retrieving it. We use a Firefox plugin called SQLite Manager for the lessons.
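
To see what "declarative" means in practice, here is a small sketch that uses Python's standard-library sqlite3 module rather than the SQLite Manager plugin used in the lessons. The table name attendees and its rows are made up for illustration. The query states which result is wanted (a count per department); SQLite decides how to compute it.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE attendees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO attendees VALUES (?, ?)",
    [("Ana", "Sociology"), ("Ben", "Business"), ("Chen", "Sociology")],
)

# Declarative: we describe the rows we want, not how to loop over the table.
query = """
    SELECT department, COUNT(*) AS n
    FROM attendees
    GROUP BY department
    ORDER BY department
"""
rows = conn.execute(query).fetchall()
for department, n in rows:
    print("{}: {}".format(department, n))

conn.close()
```

The same SELECT statement can be pasted into SQLite Manager's query window once a table with these columns exists.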

Windows Installation

Python

  • Download and install Anaconda CE.
  • Use all of the defaults for installation, except make sure to check "Make Anaconda the default Python".

Editor

Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). Please ask your instructor to help you do this.
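
One way to check whether a program can be launched from the command line is to look for it in each directory listed in the PATH environment variable, which is what the shell does. The sketch below illustrates that lookup; the function name find_on_path is hypothetical, and the executable name notepad++.exe is an assumption about a default Notepad++ install.

```python
import os


def find_on_path(program, path=None):
    """Return the first full path to `program` found in the PATH directories, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        # Windows PATH entries are sometimes wrapped in double quotes.
        candidate = os.path.join(directory.strip('"'), program)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Assumed executable name; adjust it if your installation differs.
    location = find_on_path("notepad++.exe")
    print(location if location else "notepad++.exe was not found on the PATH")
```

If the script reports that the program was not found, its installation directory still needs to be added to the system path.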

Firefox SQLite Plugin

Windows doesn't have sqlite3 available on the command line, so we will use this plugin for Firefox instead. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Depending on your Firefox version, either 1) select "SQLite Manager" from the "Tools" menu or 2) go to "Customize" in the main menu and drag SQLite Manager into the menu.

Mac OS X Installation

Python

  • Download and install Anaconda CE.
  • Use all of the defaults for installation, except make sure to check "Make Anaconda the default Python".

Editor

We recommend TextWrangler or Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.

Firefox SQLite Plugin

Instead of using sqlite3 from the command line, we will use this plugin for Firefox. To install it:

  • Start Firefox.
  • Go to the plugin homepage.
  • Click the "Add Now" button.
  • Click "Install Now" on the dialog that appears after the download completes.
  • Restart Firefox when prompted.
  • Depending on your Firefox version, either 1) select "SQLite Manager" from the "Tools" menu or 2) go to "Customize" in the main menu and drag SQLite Manager into the menu.