This is a part-time course.
Skills & Tools: Use Python to mine datasets and predict patterns.
Production Standard: Build statistical models (regression and classification) that generate usable information from raw data.
The Big Picture: Master the basics of machine learning and harness the power of data to forecast what's next.
What You'll Learn:
Unit 1: Research Design and Exploratory Data Analysis
What is Data Science - Describe course syllabus and establish the classroom environment
- Answer the questions: "What is Data Science? What roles exist in Data Science?"
- Define the workflow, tools and approaches data scientists use to analyze data
Research Design and Pandas - Define a problem and identify appropriate data sets using the data science workflow
- Walk through the data science workflow using a case study built with the Pandas library
- Import, format and clean data using the Pandas library
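The import, format and clean steps can be sketched in a few lines of Pandas. This is a minimal illustration only: the CSV text, the `city`/`price`/`sold` columns and the `load_and_clean` helper are hypothetical, and a real dataset will need its own cleaning rules.

```python
import io

import pandas as pd

# Hypothetical raw CSV standing in for a real dataset: stray whitespace,
# inconsistent capitalization, a missing price, and yes/no stored as text.
RAW_CSV = """city,price,sold
 Boston ,450000,yes
boston,,no
Chicago,310000,YES
chicago ,295000,no
"""


def load_and_clean(raw: str) -> pd.DataFrame:
    """Import CSV text, normalize formats, and drop incomplete rows (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(raw))
    # Normalize text: trim whitespace and unify capitalization.
    df["city"] = df["city"].str.strip().str.title()
    # Map yes/no strings to booleans regardless of case.
    df["sold"] = df["sold"].str.lower().map({"yes": True, "no": False})
    # Force price to numeric; anything unparseable becomes NaN.
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    # Drop rows without a price and renumber the index.
    return df.dropna(subset=["price"]).reset_index(drop=True)


clean = load_and_clean(RAW_CSV)
print(clean)
```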
Statistics Fundamentals I - Use the NumPy and Pandas libraries to analyze datasets with basic summary statistics: mean, median, mode, max, min, quartiles, interquartile range, range, variance, standard deviation and correlation
- Create data visualizations (scatter plots, scatter matrices, line graphs, box plots, and histograms) to discern characteristics and trends in a dataset
- Identify a normal distribution within a dataset using summary statistics and visualization
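The summary statistics listed above map directly onto Pandas methods. The sketch below uses a small hypothetical hours-vs-score dataset; note that Pandas' `var()` and `std()` default to the sample versions (`ddof=1`) and that `quantile()` interpolates linearly between observations.

```python
import pandas as pd

# Small hypothetical dataset: hours studied vs. exam score.
df = pd.DataFrame({
    "hours": [1, 2, 2, 3, 4, 5, 6, 8],
    "score": [52, 55, 61, 64, 70, 74, 79, 88],
})

summary = {
    "mean": df["score"].mean(),
    "median": df["score"].median(),
    "mode": df["hours"].mode().tolist(),   # mode() may return several values
    "min": df["score"].min(),
    "max": df["score"].max(),
    "range": df["score"].max() - df["score"].min(),
    "q1": df["score"].quantile(0.25),      # first quartile
    "q3": df["score"].quantile(0.75),      # third quartile
    "variance": df["score"].var(),          # sample variance (ddof=1)
    "std": df["score"].std(),               # sample standard deviation (ddof=1)
    "corr": df["hours"].corr(df["score"]),  # Pearson correlation
}
summary["iqr"] = summary["q3"] - summary["q1"]  # interquartile range

print(df.describe())  # count, mean, std, min, quartiles and max in one table
print(summary)
```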
Statistics Fundamentals II - Explain the difference between causation and correlation
- Test a hypothesis within a sample case study
- Validate your findings using statistical analysis (p-values, confidence intervals)
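The syllabus does not say which test the case study uses, so the following is only one possible sketch: a permutation test for the p-value and a percentile bootstrap for a 95% confidence interval, both swapped in for illustration (a t-test such as `scipy.stats.ttest_ind` is a common alternative). The control and treatment numbers are hypothetical.

```python
import numpy as np


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, p-value): the p-value is the share of random
    relabelings whose mean difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = pooled[: len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return observed, (count + 1) / (n_permutations + 1)


def bootstrap_ci(a, b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diffs = np.array([
        rng.choice(a, size=len(a)).mean() - rng.choice(b, size=len(b)).mean()
        for _ in range(n_boot)
    ])
    tail = (1 - level) / 2 * 100
    return np.percentile(diffs, [tail, 100 - tail])


# Hypothetical case study: did the treatment raise the outcome?
control = [12, 14, 11, 13, 12, 15, 13, 14]
treatment = [16, 18, 15, 17, 19, 16, 18, 17]

diff, p = permutation_test(treatment, control)
low, high = bootstrap_ci(treatment, control)
print(f"difference={diff:.2f}, p={p:.4f}, 95% CI=({low:.2f}, {high:.2f})")
```

A small p-value together with an interval that excludes zero supports rejecting the null hypothesis of no difference; neither says anything about causation on its own.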
Instructor Choice - Focus on a topic selected by the instructor/class in order to provide deeper insight into exploratory data analysis
Unit 2: Foundations of Data Modeling
Introduction to Regression - Define data modeling and linear regression
- Differentiate between categorical and continuous variables
- Build a linear regression model with the scikit-learn library, using a dataset that meets the linearity assumption
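A minimal scikit-learn sketch of the linear regression objective, assuming synthetic data generated from y = 3x + 5 plus noise so that the linearity assumption holds by construction. The fitted slope and intercept should land close to 3 and 5.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data satisfying the linearity assumption: y = 3x + 5 + noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))  # scikit-learn expects a 2-D feature matrix
y = 3 * X[:, 0] + 5 + rng.normal(0, 1, size=100)

# Hold out 20% of rows to check the fit on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("R^2 on held-out data:", model.score(X_test, y_test))
```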
Evaluating Model Fit - Define regularization, bias, and error metrics
- Evaluate model fit using loss functions such as mean absolute error, mean squared error, and root mean squared error
- Select regression methods based on fit and complexity
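The three loss functions are short enough to compute by hand, which makes their differences visible: squaring in MSE punishes large misses more than MAE does, and RMSE returns to the target's units. The predictions below are hypothetical; scikit-learn also provides `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))  # average absolute miss, in the target's units
    mse = np.mean(residuals ** 2)     # squaring weights large misses more heavily
    rmse = np.sqrt(mse)               # square root brings MSE back to the target's units
    return mae, mse, rmse


y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
mae, mse, rmse = regression_errors(y_true, y_pred)
print(mae, mse, rmse)  # MAE 0.75, MSE 0.875, RMSE about 0.935
```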
Introduction to Classification - Define a classification model
- Build a K-Nearest Neighbors classifier using the scikit-learn library
- Evaluate and tune the model using metrics such as classification accuracy/error
Introduction to Logistic Regression - Build a logistic regression classification model using the scikit-learn library
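One way to build and tune a K-Nearest Neighbors classifier, sketched with scikit-learn's bundled iris dataset (chosen here for illustration): each candidate k is scored by 5-fold cross-validated accuracy on the training split, and only the chosen model touches the test split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Tune k using 5-fold cross-validated accuracy on the training set only.
scores = {}
for k in range(1, 16, 2):  # odd k values reduce the chance of voting ties
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X_train, y_train, cv=5, scoring="accuracy").mean()

best_k = max(scores, key=scores.get)
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
accuracy = final.score(X_test, y_test)  # classification error is 1 - accuracy
print(f"best k={best_k}, test accuracy={accuracy:.3f}, test error={1 - accuracy:.3f}")
```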
- Describe the sigmoid function, odds, and odds ratios and how they relate to logistic regression
- Evaluate a model using metrics such as classification accuracy/error, the confusion matrix, ROC curves and AUC, and loss functions
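The logistic regression objectives (sigmoid, odds, odds ratios, and evaluation) can be tied together in one sketch. It assumes scikit-learn's bundled breast cancer dataset and standardized features; with standardization, each exponentiated coefficient is an odds ratio per one-standard-deviation increase in that feature, not per raw unit.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# If p = sigmoid(z), then the odds p / (1 - p) equal e^z.
p = sigmoid(0.0)   # log-odds 0 -> probability 0.5
odds = p / (1 - p)  # -> odds of 1 (even chances)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardizing helps the solver converge and puts coefficients on one scale.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # P(class 1) for each test row
pred = model.predict(X_test)               # class labels at the default 0.5 cutoff

print("accuracy:", accuracy_score(y_test, pred))
print("confusion matrix:\n", confusion_matrix(y_test, pred))
print("ROC AUC:", roc_auc_score(y_test, proba))
print("log loss:", log_loss(y_test, proba))

# Coefficients are changes in log-odds; exponentiating gives odds ratios.
odds_ratios = np.exp(model[-1].coef_[0])
```

For a binary problem, applying `sigmoid` to `model.decision_function(X_test)` reproduces `proba`, which is a quick way to see that the sigmoid is what turns the linear score into a probability.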
Communicate Results from Logistic Regression - Explain the tradeoff between the precision and recall of a model and articulate the cost of false positives vs. false negatives.
- Identify the components of a concise, convincing report and how they relate to specific audiences/stakeholders
- Describe the difference between visualization for presentations vs. exploratory data analysis
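The precision/recall tradeoff comes from the decision threshold. This sketch uses hypothetical labels and model scores and sweeps three thresholds: raising the cutoff produces fewer false positives (higher precision) at the cost of more false negatives (lower recall).

```python
import numpy as np


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & y_true)    # correctly flagged positives
    fp = np.sum(y_pred & ~y_true)   # false alarms (false positives)
    fn = np.sum(~y_pred & y_true)   # missed positives (false negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0  # report 0.0 if nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.7, 0.65, 0.45, 0.4, 0.3, 0.1]
for t in (0.25, 0.5, 0.8):
    p, r = precision_recall_at(y_true, scores, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Which threshold is right depends on the business cost of each error type, which is exactly what a stakeholder report needs to spell out.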
Flexible Class Session - Focus on a topic selected by the instructor/class in order to provide deeper insight into data modeling
Unit 3: Data Science in the Real World
Decision Trees and Random Forest - Describe the difference between classification and regression trees and how to interpret these models
- Explain and communicate the tradeoffs of decision trees vs regression models
- Build decision trees and random forests using the scikit-learn library
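A minimal scikit-learn sketch of a decision tree next to a random forest, assuming the bundled iris dataset. `export_text` prints the tree's split rules, which is what makes a shallow tree easy to interpret; the forest usually gives up that readability in exchange for more stable predictions. For continuous targets, `DecisionTreeRegressor` and `RandomForestRegressor` are the regression counterparts.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

# A shallow tree stays human-readable; max_depth also limits overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(iris.feature_names)))

# A random forest averages many trees, each fit on a bootstrap sample and
# considering a random subset of features at every split.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("tree accuracy:", tree.score(X_test, y_test))
print("forest accuracy:", forest.score(X_test, y_test))
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```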
Natural Language Processing - Demonstrate how to tokenize natural language text using NLTK
- Categorize and tag unstructured text data
- Explain how to build a text classification model using NLTK
Dimensionality Reduction - Explain how to perform dimensionality reduction using topic models
- Demonstrate how to refine data using latent Dirichlet allocation (LDA)
- Extract information from a sample text dataset
Working with Time Series Data - Explain why time series data is different from other data and how to account for it
- Create rolling means and plot time series data using the Pandas library
- Compute the autocorrelation of time series data
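Rolling means and autocorrelation are both one-liners in Pandas. The daily series below is hypothetical: an upward trend plus a weekly cycle. The 7-day rolling mean cancels the cycle, and the lag-7 autocorrelation is close to 1, showing that neighboring observations in a time series are not independent, which is one reason time series need special handling.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: upward trend plus a repeating weekly pattern.
index = pd.date_range("2024-01-01", periods=56, freq="D")
t = np.arange(56)
sales = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 7), index=index)

# A 7-day rolling mean smooths out the weekly cycle; the first 6 values are
# NaN because the window is not yet full.
weekly_mean = sales.rolling(window=7).mean()

# Autocorrelation: correlation of the series with a lagged copy of itself.
for lag in (1, 7):
    print(f"lag {lag}: autocorrelation = {sales.autocorr(lag=lag):.3f}")

# With matplotlib installed, sales.plot() and weekly_mean.plot() draw both lines.
```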
Creating Models with Time Series Data - Decompose time series data into trend and residual components
- Validate and cross-validate data from different data sets
- Use the ARIMA model to forecast and detect trends in time series data
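A simplified additive decomposition into trend and residual can be sketched with a centered moving average in Pandas. This is an assumption-laden illustration on a hypothetical monthly series: the odd window keeps the average symmetric, and `statsmodels.tsa.seasonal.seasonal_decompose` (which also estimates a seasonal component) and `statsmodels.tsa.arima.model.ARIMA` are the usual tools for the fuller decomposition and forecasting steps.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend (2 units per month) plus noise.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(50 + 2.0 * t + rng.normal(0, 3, size=48), index=index)

# Trend: a centered 5-month moving average. An odd window is symmetric around
# each point; the first and last 2 values are NaN where the window is incomplete.
trend = series.rolling(window=5, center=True).mean()

# Residual: what remains after removing the trend (additive model).
residual = series - trend

valid = trend.dropna()
slope = (valid.iloc[-1] - valid.iloc[0]) / (len(valid) - 1)  # rough trend per month
print("estimated slope:", round(slope, 2))
print("residual mean:", residual.mean(), "residual std:", residual.std())
```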
The Value of Databases - Describe the use cases for different types of databases
- Explain differences between relational databases and document-based databases
- Write simple select queries to pull data from a database and use within Pandas
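Pulling a SELECT query into Pandas can be sketched with Python's built-in `sqlite3` module standing in for a production database. The `orders` table and its rows are hypothetical; `pd.read_sql_query` returns the result as a DataFrame, and the `?` placeholder binds the filter value instead of pasting it into the SQL string.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real relational database;
# the "orders" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 120.0), ("ben", 35.5), ("ana", 80.0), ("cho", 210.0)],
)
conn.commit()

# A simple SELECT with a bound parameter for the filter value.
query = "SELECT customer, amount FROM orders WHERE amount >= ? ORDER BY amount DESC"
df = pd.read_sql_query(query, conn, params=(50,))
conn.close()

# Once in Pandas, the usual DataFrame tools apply.
totals = df.groupby("customer")["amount"].sum()
print(df)
print(totals)
```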
Moving Forward with your Data Science Career - Specify common models used within different industries
- Identify the use cases for common models
- Discuss next steps and additional resources for data science learning
Flexible Class Session - Focus on a topic selected by the instructor/class in order to provide deeper insight into data science in the real world
Final Presentations - Deliver a final presentation to peers, the instructor, and guest panelists, who will identify strengths and areas for improvement
School Notes:
For students enrolling in 12-week part-time and immersive classes, it is not recommended to book more than one class at the same time.