COMP/STAT 112: Introduction to Data Science
While we wait…
- Sit next to people you don’t know!
Introductions
Introduce yourself to your neighbors!
Here are some things you can share, if you choose:
- Your preferred name (+ pronunciation tips)
- Aspects of who you are and have been (e.g. pronouns, geographical identity, cultural identity, hobbies/passions)
- Aspects of who you’d like to be (e.g. personal/professional/academic goals)
- How you are feeling about COMP/STAT 112. Is this a requirement for your degree, an academic interest, or both?
- Anything else?
What is data science?
- Data science is the process of gaining insights from data. Its practitioners (data scientists) combine statistics, programming, and domain knowledge to collect, clean, and analyze data, and to present results to stakeholders.
What is the difference between data science and statistics?
Data scientists are concerned with the entire life cycle of a data-analytic question: from data collection/acquisition, through data storage, data cleaning, and data analysis, to presentation of results.
Statisticians are mostly concerned with data analysis. Statisticians use linear algebra, probability theory, and statistical modeling to best mimic a data-generating process and explain relationships between variables.
What is the difference between data science and statistics?
Data scientists often rely on models and algorithms created by statisticians (and, more recently, computer scientists!). Statisticians often rely on data scientists to provide clean, analytic-ready data.
“Data scientist” and “statistician” are not mutually exclusive titles, either. One title may suit you better, depending on who you are talking to.
Data science as an intersection
Image credit: user BowTied_Raptor on substack.com
Big Data
The Four V’s of Big Data (Image credit: Shalina Patidar, DZone.com)
Data Science in Liberal Arts
The liberal arts setting provides an opportunity to synthesize lenses for data developed in the social and hard sciences, humanities, and fine arts.
- Data science applies these lenses to extract knowledge from data within a particular domain of inquiry, in contexts such as
- educational policy making,
- ecological modeling,
- journalism,
- computational linguistics, etc.
Data Science Jobs
- Government agencies (e.g., NSA, CIA)
- Science institutions (e.g., NASA, NIH)
- Companies/divisions specializing in data analysis (e.g., IBM)
- Retail companies that have huge amounts of data and analyze it to drive business decisions (e.g., Amazon, Netflix, Target, Etsy, General Mills)
- Other sectors: journalism, healthcare, biotech/genomics, NGOs, finance, insurance, gaming and hospitality, energy/utilities, manufacturing, pharmaceuticals
Who am I
Prof. James Normington
https://jamesnormington.github.io/
Pronouns: he/him/his
Wife: Angela, married since 2018
Daughter: Phoebe, born Feb 2022
Introductions
Now, take a turn introducing another person in the class.
Course Details
Syllabus
- Learning Goals
- Community of Learners
- Course Components
- Communication
- Environment You Deserve
Learning Goals
Overall Learning Goal
Gain confidence in carrying out the entire data science pipeline,
- from research question formulation,
- to data collection/scraping,
- to wrangling,
- to modeling,
- to visualization,
- to presentation and communication
Learning Goals
By the end of the course, you’ll be able to:
- Appreciate the role of data science in a wide range of disciplines
- Identify, collect, and wrangle data from multiple sources
- Visualize a variety of types of data
- Find code online and adapt it to your given task
- Using iterative refinement and teamwork, take a data science project from concept to reality
- Communicate your results so that they’re reproducible and accessible for a broad audience
Course Components
Activities & Assignments
- In-class activities (notes + exercises) -> finish after class and turn in as assignments
- Opportunity to practice skills and dig deeper
Tidy Tuesday & Iterative Viz
- Regular visualization practice on new, real data
- Opportunities to iterate based on feedback
- Opportunity to engage with wider data science community
Course Components
Midterm Assessment (currently scheduled for Tues. March 14th)
- Part 1: Conceptual, 20 minutes. No notes.
- Part 2: Application, 40 minutes. Open notes and computer.
Final Project & Presentation
- Group data science project
- Opportunity to showcase skills and learn new things on a real data set
Communication
Slack Channels: class-wide messaging platform for content-related questions
- General channel: class-wide announcements
- Content-specific class-wide channels: data-viz, troubleshooting, etc.
- Section-specific channels: section-1
- Study-group channel (optional): class-wide channel for finding classmates to work with outside of class
Email or DM in Slack: for anything personal in nature (e.g., illness, feeling overwhelmed, feedback)
Environment You Deserve
Macalester College values diversity and inclusion.
I’m committed to creating a diverse, inclusive, and equitable learning environment. There will never be space in my classroom for racism, sexism, transphobia, homophobia, ableism, or any other fear or belief which attacks the humanity of my students.
MSCS Community Guidelines.
These guidelines were created by the MSCS faculty and staff in our ongoing efforts to create a community that is more welcoming, supportive, and inclusive.
Environment You Deserve
Respect: regard the feelings, wishes, experiences, and traditions of others as individuals
Empathy: try to sense and understand others’ emotions and feelings
Start with Curiosity: don’t assume; instead, ask a question
Supportive Community: you are not learning in isolation, but rather in a community whose members are ready to help each other
Let’s Get Started
Go to https://jamesnormington.github.io/112_spring_2023/course-schedule.html
After Class
Continue working through the activity:
- Try each line of code (copy and paste) in the console, then check the solutions in the online activity
- Complete the Practice section at the end and turn it in on Moodle (Assignment 1) by next Wednesday at 11:59pm