COMP/STAT 112: Introduction to Data Science

James Normington, PhD

While we wait…

  • Sit next to people you don’t know!

Introductions

Introduce yourself to your neighbors!


Here are some things you can share, if you choose:

  • Your preferred name (+ pronunciation tips)
  • Aspects of who you are and have been (e.g. pronouns, geographical identity, cultural identity, hobbies/passions)
  • Aspects of who you’d like to be (e.g. personal/professional/academic goals)
  • How you are feeling about COMP/STAT 112 (is it a requirement for your degree, an academic interest, or both?)
  • Anything else?

What is data science?

  • Data science is the process of gaining insights from data. Its practitioners (data scientists) combine statistics, programming, and domain knowledge to collect, clean, and analyze data, and to present results to stakeholders.

What is the difference between data science and statistics?

  • Data scientists are concerned with the entire life cycle of a data analytic question: from data collection/acquisition, through data storage, cleaning, and analysis, to the presentation of results.

  • Statisticians are mostly concerned with data analysis. Statisticians use linear algebra, probability theory, and statistical modeling to best mimic a data-generating process and explain relationships between variables.

What is the difference between data science and statistics?

  • Data scientists often rely on models and algorithms created by statisticians (and, more recently, computer scientists!). Statisticians often rely on data scientists to provide clean, analytic-ready data.

  • “Data scientist” and “statistician” are not mutually exclusive titles, either. One title may suit you better, depending on who you are talking to.

Data science as an intersection

Image credit: user BowTied_Raptor on substack.com

Big Data


The Four V’s of Big Data (Image credit: Shalina Patidar, DZone.com)

Data Science in Liberal Arts

The liberal arts setting provides an opportunity to synthesize lenses for data developed in the social and hard sciences, humanities, and fine arts.

  • Data science applies these lenses to extract knowledge from data within particular domains of inquiry and contexts, such as
    • educational policy making,
    • ecological modeling,
    • journalism,
    • computational linguistics, etc.

Data Science Skills

Data Science skills mapped to Job Titles (image credit: Dave Holtz)

Data Science Jobs

  • Government agencies (e.g., NSA, CIA)
  • Science institutions (e.g., NASA, NIH)
  • Companies/divisions specializing in data analysis (e.g., IBM)
  • Consumer-facing companies that have huge amounts of data and analyze it to drive business decisions (e.g., Amazon, Netflix, Target, Etsy, General Mills)
  • Other sectors: journalism, healthcare, biotech/genomics, NGOs, finance, insurance, gaming and hospitality, energy/utilities, manufacturing, pharmaceuticals

Data Science Projects

Data Journalism

Public Policy

More Examples

Data Science Projects

Who am I?

Prof. James Normington

https://jamesnormington.github.io/

Pronouns: he/him/his

  • Wife: Angela, married since 2018

  • Daughter: Phoebe, born Feb 2022

Introductions

Now, take a turn introducing another person in the class.

Course Details

Syllabus

  • Learning Goals
  • Community of Learners
  • Course Components
  • Communication
  • Environment You Deserve

Learning Goals

Overall Learning Goal

Gain confidence in carrying out the entire data science pipeline,

  • from research question formulation,
  • to data collection/scraping,
  • to wrangling,
  • to modeling,
  • to visualization,
  • to presentation and communication

Learning Goals

Overall Learning Goal

Gain confidence in carrying out the entire data science pipeline.

One visual representation of the pipeline, using Legos. The first step is data collection, shown as a pile of Legos. The second step is data preparation, with the Legos separated by color. The third step is data visualization, with the Legos organized by color and type so you can more easily understand what’s happening. The fourth step is data analysis, in which Legos are assembled into creations such as a roof or a lawn. The final step is data storytelling, in which someone is playing with a Lego house.

Learning Goals

Overall Learning Goal

Gain confidence in carrying out the entire data science pipeline.

Cute fuzzy monsters putting rectangular data tables onto a conveyor belt. Along the conveyor belt line are different automated “stations” that update the data, reading “WRANGLE”, “VISUALIZE”, and “MODEL”. A monster at the end of the conveyor belt is carrying away a table that reads “Complete analysis.”

Art by Allison Horst

Learning Goals

By the end of the course, you’ll be able to:

  • Appreciate the role of data science in a wide range of disciplines
  • Identify, collect, and wrangle data from multiple sources
  • Visualize a variety of types of data
  • Find code online and adapt it to your given task
  • Using iterative refinement and teamwork, take a data science project from concept to reality
  • Communicate your results so that they’re reproducible and accessible for a broad audience

Community of Learners

Target Audience.

  • No matter your statistics and coding background (no experience to expert) or major and interests, this course is for YOU!

Learn by doing.

  • Learning by doing entails getting stuck, making mistakes, asking questions, and getting feedback.

Community of Learners

Collaboration.

  • Working effectively in a group setting is an essential life skill that requires practice and demonstrably improves your learning.

Community building.

  • People learn best in community when they feel safe, seen, and cared about.

Course Components

Activities & Assignments

  • In-class activities (notes + exercises) -> finish after class and turn in as assignments
  • Opportunity to practice skills and dig deeper

Tidy Tuesday & Iterative Viz

  • Regular visualization practice on new, real data
  • Opportunities to iterate based on feedback
  • Opportunity to engage with the wider data science community

Course Components

Midterm Assessment (currently scheduled for Tues. March 14th)

  • Part 1: Conceptual, 20 minutes. No notes.
  • Part 2: Application, 40 minutes. Open notes and computer.

Final Project & Presentation

  • Group data science project
  • Opportunity to showcase skills and learn new things on a real data set

Communication

Slack Channels: class-wide messaging platform for content-related questions

  • General channel: class-wide announcements
  • Content-specific class-wide channels: data-viz, troubleshooting, etc.
  • Section-specific channels: section-1
  • Study-group channel (optional): class-wide channel to seek classmates to work together outside of class

Email or DM in Slack: for anything personal in nature (e.g., illness, feeling overwhelmed, feedback)

Environment You Deserve

Macalester College values diversity and inclusion.

  • I’m committed to creating a diverse, inclusive, and equitable learning environment. There will never be space in my classroom for racism, sexism, transphobia, homophobia, ableism, or any other fear or belief which attacks the humanity of my students.

  • MSCS Community Guidelines.

  • These guidelines were created by the MSCS faculty and staff in our ongoing efforts to create a community that is more welcoming, supportive, and inclusive.

Environment You Deserve

  • Respect: regard the feelings, wishes, experiences, and traditions of others as individuals

  • Empathy: try to sense and understand others’ emotions and feelings

  • Start with Curiosity: don’t assume; instead, ask a question

  • Supportive Community: you are not learning in isolation, but in a community whose members are ready to help each other

Let’s Get Started

Go to https://jamesnormington.github.io/112_spring_2023/course-schedule.html

After Class

Continue working through the activity:

  • Try each line of code in the console (copy and paste), then check the solutions in the online activity
  • Complete the Practice section at the end and turn it in on Moodle (Assignment 1) by next Wednesday at 11:59pm