Syllabus

Welcome to Data Science Programming II! In this course, we will learn object-oriented programming to create tree and graph data structures to represent hierarchical data and implement algorithms for efficiently searching these structures.

We'll often create our own datasets, using techniques like logging, benchmarking, web scraping, and A/B testing.

In the last third of the semester we'll explore some basic machine learning techniques, including regression, classification, clustering, and decomposition.

Additions To Syllabus Made During Semester

none yet

Course Instructor

Yiyin Shen (graduate student - Department of Computer Sciences) yshen82@wisc.edu

Lecture

Time: MTWR 10:00 AM - 10:50 AM Jun 5 - Aug 13
Zoom Meetings: Canvas - Zoom - Upcoming Meetings - CS320 SU23 Lectures

Lecture recordings will be provided through the schedule page or Canvas - Zoom - Cloud Recordings

Lab

Time: TR 11:00 AM - 11:50 AM Jun 5 - Aug 13
Zoom Meetings: Canvas - Zoom - Upcoming Meetings - CS320 SU23 Labs

We'll post a weekly lab activities document. During the lab, you will be placed in a breakout room with your assigned study group, and you can work on the lab activities individually or with your study group. TAs and PMs will circulate among the breakout rooms to answer questions and check your progress on the lab activities.

If you have extra time at the lab after completing the lab document, you can work with your assigned study group on projects or quizzes. You cannot work on quizzes unless you have finished all lab activities and the current on-going project. There are five attendance points for each lab:

2 for joining the breakout room no later than 5 minutes after the lab begins
3 for sufficient progress on the lab activities: At the end of each lab, you need to submit screenshots of the work (code and/or running results) you have done so far to Canvas. You don't have to finish every lab activity, but you must show sufficient progress.

Communication

We message the class regularly via Canvas announcements. We recommend updating your Canvas settings so that the "Announcement" option is "Notify immediately" so that you don't miss something important.

See the help page for details about how to contact us.

We have various forms for you to leave (optionally anonymous) feedback, submit regrade requests, and thank TAs/mentors.

Grading

Grading breakdown

49% - 7 projects (7% each, no drops)
7% - online quizzes (1% each, 1 drop out of 8 quizzes)
11% - midterm 1
11% - midterm 2
15% - final
4% - lab attendance (6 drops out of 18 labs)
2% - lecture attendance (14 drops out of 38 lectures)
1% - class surveys
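
To see how the breakdown combines into one course percentage, here is a minimal sketch. It is not the staff's grading script: the names `WEIGHTS`, `average_with_drops`, and `course_percentage` are hypothetical, each component score is assumed to be a fraction between 0 and 1, and drops are assumed to remove your lowest scores.

```python
# Weights (in percent) copied from the grading breakdown above.
WEIGHTS = {
    "projects": 49,
    "quizzes": 7,
    "midterm1": 11,
    "midterm2": 11,
    "final": 15,
    "lab_attendance": 4,
    "lecture_attendance": 2,
    "surveys": 1,
}


def average_with_drops(scores, drops=0):
    """Average a list of 0-1 scores after dropping the `drops` lowest ones.

    Assumption: drops remove the lowest scores.
    """
    kept = sorted(scores)[drops:]
    return sum(kept) / len(kept)


def course_percentage(component_scores):
    """Combine per-component fractions (0-1) into a 0-100 course percentage."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Example: seven perfect quizzes and one missed quiz, with 1 drop out of 8.
quiz_fraction = average_with_drops([1, 1, 1, 1, 1, 1, 1, 0], drops=1)
```

In this example the missed quiz is the one dropped, so the quiz component still counts as full credit.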

Letter Grades

At the end of the semester, we will assign final grades based on these thresholds:

93% - 100%: A
88% - 92.99%: AB
80% - 87.99%: B
75% - 79.99%: BC
70% - 74.99%: C
60% - 69.99%: D

Letter grade ranges include decimal points, meaning we will NOT be rounding up scores at the end of the semester. No extra credit is given in this course.
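
The thresholds above amount to a simple lookup with no rounding. The sketch below is one way to express that; `THRESHOLDS` and `letter_grade` are hypothetical names, and because the list above stops at D, the function returns `None` for anything under 60% instead of guessing a letter.

```python
# Lower cutoffs (in percent) taken from the threshold list above, highest first.
THRESHOLDS = [
    (93.0, "A"),
    (88.0, "AB"),
    (80.0, "B"),
    (75.0, "BC"),
    (70.0, "C"),
    (60.0, "D"),
]


def letter_grade(percentage):
    """Return the letter for a course percentage, without rounding up."""
    for cutoff, letter in THRESHOLDS:
        if percentage >= cutoff:
            return letter
    # Below 60%: no letter is listed in the thresholds above.
    return None
```

For example, `letter_grade(92.99)` gives "AB", because 92.99 is not rounded up to 93.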

Graded Component Details

Projects

Submission: Everybody will individually upload a .py file, a .ipynb file, or a zip file (as specified) for each project with the submission tool. Every project has a regular deadline and a hard deadline. The hard deadline will always be 4 days after the regular deadline. Exception: p7's hard deadline will be the same as the regular deadline.

Collaboration: Even though everybody will make their individual submission, every project will have (1) a group part to be optionally done with your assigned study group and (2) an individual part. For the group part, any form of help from anybody in your group is allowed (even looking at each other's code); I recommend you find times for everybody in the group to work at the same time so you can help each other through coding difficulties in this part. You're also welcome to do the "group" part individually, or with a subset of your assigned study group. For the individual part, you may only receive help from course staff (instructor/TAs/mentors); you may not discuss this part with anybody else (in the class or otherwise) or get help from them.

Late Policy:

Students have a bank of 8 late days for the semester.
For a given project, you may use 2 late days without any deduction. After that, a 10% deduction applies per late day for the next 2 days. Projects that are late by more than 4 days will not be accepted.
For a given project, you cannot use more than 2 late days, as deduction kicks in on the 3rd day.
After the bank runs out, 10% deduction will be applied per late day.
You may not use late days on the last project.
Late days only apply to projects. They do not apply to Quizzes.
Late days are automatically applied and do not need to be requested.
Late days are calculated as whole days. That is, even if your project is late by 2 hours, that counts as 1 whole late day.
For calculating late days, we will always consider your latest submission. We will not accept requests to grade a prior submission for the same project.
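
The late-day rules above can be read as a small calculation. The sketch below is one interpretation of them, not the staff's actual grading script; the function `late_outcome`, its parameters, and the `LateOutcome` tuple are hypothetical names introduced for illustration.

```python
import math
from typing import NamedTuple


class LateOutcome(NamedTuple):
    accepted: bool          # whether the submission can still be graded
    bank_days_used: int     # late days taken from the semester bank
    deduction_percent: int  # 10% for each late day not covered by the bank


def late_outcome(hours_late, bank_remaining, is_last_project=False):
    """Apply the late policy to one project (an interpretation, see above)."""
    # Lateness is counted in whole days: 2 hours late counts as 1 late day.
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    if days_late == 0:
        return LateOutcome(True, 0, 0)
    # Assumption: a late last project, or any project more than 4 days late,
    # is not accepted.
    if is_last_project or days_late > 4:
        return LateOutcome(False, 0, 0)
    # At most 2 late days per project are free, limited by what is left in the bank.
    free_days = min(days_late, 2, bank_remaining)
    penalized_days = days_late - free_days
    return LateOutcome(True, free_days, 10 * penalized_days)
```

Under this reading, a project submitted 72 hours late with a full bank uses 2 late days and takes a 10% deduction for the third day.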

Code Review: A TA will give you detailed comments on specific parts of your assignment. This feedback process is called a "code review", and is a common requirement in industry before a programmer is allowed to add her code changes to the main codebase. Read your code reviews carefully; even if you receive 100% on your work, we'll often give you tips to save effort in the future.

Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so you should rarely be too surprised by your grade. Though it shouldn't be common, we may deduct points for serious hardcoding, not following directions, or other issues. Some bugs (called non-deterministic bugs) don't show up every time code is run -- if you have such an issue, the grade from our tester may differ from what you expected based on your own runs. Finally, our tests aren't very good at evaluating whether plots and other visualizations look how they should (a human usually needs to evaluate that).

Auto-grader: The autograder will run hourly after the release of a project. Because of this, we expect you to try submitting your project early and make sure nothing crashes. However, this should not be a substitute for running tester.py locally. You should only try submitting once you pass the tests locally.

Clearing the autograder is a mandatory part of the project submission process.
If your project fails the autograder, it will be your responsibility to use office hours and make an appropriate resubmission. The resubmission will also count towards late day usage.
Project deadlines apply to autograder failures as well. That is, your project submission must clear the autograder by the hard deadline for the project. If it does not, we will be unable to grade your project submission.

Allowed Packages: anything that comes pre-installed with Python may be used. Additionally, you may install and use the following if they're useful: jupyter, pandas, numpy, matplotlib, requests, beautifulsoup4, statistics, recordclass, sklearn, haversine, gitpython, graphviz, pylint, lxml, flask, bs4, html5lib, geopandas, shapely, descartes, click, netaddr, torch==1.4.0+cpu, torchvision==0.5.0+cpu. Using unapproved packages may result in a score of zero when submitted for grading because the autograder won't be able to run your code without those packages.

Quizzes

There will be a short Canvas quiz due at the end of most Thursdays. Make sure you know the rules regarding what is allowed and what is not.

Allowed

however much time you need
discussing answers with members of your assigned study group who are taking the quiz at the same time
referencing texts, notes, or provided course materials
searching online for general information
running code

NOT allowed

taking it more than once
discussing answers with anybody outside of your group
discussing with members of your group who have already completed the quiz when you haven't completed it yourself yet
posting anything online about the quizzes
using such material potentially posted by other 320 students who broke the preceding rule

Midterms and Final

These will be multiple choice exams taken through Canvas - Quizzes with HonorLock.

Midterm 1: Friday, June 30th, 7:00PM - 8:30PM
Midterm 2: Friday, July 21st, 7:00PM - 8:30PM
Final: Thursday, August 10th, 10:00AM - 12:30PM

Readings

We'll sometimes assign readings from the following sources (all free):

Think Python 2nd Edition by Allen B. Downey: Read Online
Automate the Boring Stuff with Python by Al Sweigart: Read Online
Principles and Techniques of Data Science by Sam Lau, Joey Gonzalez, and Deb Nolan: Read Online
Scipy Lecture Notes by many contributors: Read Online

Cheating

Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:

Acceptable

any collaboration with your assigned study group members on the group part of a project
doing worksheets with friends
copying code from online examples that are NOT specific to your project (if project solutions are leaked online, you may not use them). If you copy code, you must cite it in your code with a comment (think of it like citing a quote in an essay -- without the citation, you're plagiarizing).
using ChatGPT to ask simple questions. For example: how do I use "self" inside a class constructor?

NOT Acceptable

using ChatGPT to solve project questions in their entirety - please note that this will lead to your work being flagged by the plagiarism detector
getting project help of any kind for the group part from anybody who is not either (a) on your assigned study group or (b) 320 staff
getting project help of any kind for the individual part from anybody who is not 320 staff
using part or all of project solutions found online
breaking any of the rules listed under the "Quizzes" section
reporting lab attendance for yourself or someone else who didn't actually attend (dropping in for a few minutes is not "attending")
counting lab attendance as merely showing up without spending substantial time on the assigned lab activities
using TopHat while not actually physically present in the room (since we sometimes use this for attendance)
helping somebody else cheat

Citing Code: you can copy small snippets of code from Stack Overflow (and other online references) if you cite them. For example, suppose I need to write some code that gets the median number from a list of numbers. I might search for "how to get the median of a list in python" and find a solution at https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python.

I could (legitimately) paste code from that page into my code, as long as it has a comment as follows:

    # copied/adapted from https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
    def median(lst):
      sortedLst = sorted(lst)
      lstLen = len(lst)
      index = (lstLen - 1) // 2

      if (lstLen % 2):
        return sortedLst[index]
      else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

In contrast, copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.

Similarity Detection: of course, with 400+ students, it's hard for a human TA to notice similar code across two submissions. Thus, we use automated tools to look for similarities across submissions. Such similarity detection is an active area of computer science research, and the result is tools that detect code copying even when students methodically rename all variables and shuffle the order of their code. We take cheating detection seriously to make the course fair to students who put in the honest effort.