Welcome to Data Programming II! We'll be building on what you learned in Data Programming I. In that course, we mostly worried about getting the code correct -- in the first part of 320, we'll start thinking about how to make it efficient. Next, we'll be learning more about the Internet (among other things, you'll build a simple website for distributing a dataset) and advanced visualizations (like maps and animations). We'll conclude the course with a light introduction to machine learning with the sklearn package.
Revisions to Syllabus
- Sep 28: see new "Project Grading and Resubmit Policy" section
Project Grading and Resubmit Policy (added Sep 28)
The autograder will be run periodically during the 2 days prior to a project deadline (from Monday night if the deadline is on Wednesday, and so on). Because of this, we expect you to try submitting your project early and make sure nothing crashes. However, this should not be a substitute for running test.py locally: you should only try submitting once you pass the tests locally. The responsibility for making sure your code does not crash therefore rests with you.
Unexpected Crash: If your code crashed or received a significantly lower score than expected, and you had tried submitting it (and it ran successfully) before the deadline, you will be able to resubmit without penalty for the next two days (until midnight on Friday if the deadline is on Wednesday). For the resubmission not to be counted as late, you will have to coordinate with your TA, so please email your group TA.
Other resubmit: If after the deadline you are not satisfied with the grade the autograder gives you, you may resubmit afterwards but this will eat into your late days.
Note that TAs will usually start grading submissions by the weekend after the deadline. Therefore, any resubmissions that are not made within the two-day window mentioned above might not get feedback.
Special Cases: If there were extenuating circumstances (e.g., documented illness) that prevented you from completing your work on time, and you have insufficient late days remaining, please speak with your instructor (Tyler). In general, if you're struggling to keep up with the course for any reason, please set up an appointment with the instructor (Tyler) to discuss how you can get back on track.
Please use standard packages in python for the assignment.
The following packages are allowed for use in all the projects: jupyter, pandas, numpy, matplotlib, requests, beautifulsoup4, statistics, recordclass, sklearn, haversine, gitpython, graphviz, pylint, lxml, flask, bs4, html5lib, geopandas, shapely, descartes, click, netaddr, torch==1.4.0+cpu, torchvision==0.5.0+cpu.
Using specialized packages will result in a score of zero when submitted for grading. Please contact your TA if you are not sure how to code something without specialized packages, or if you want to ask about permission to use one.
About Being Online...
A major advantage of in-person classes is that it's easy to meet your peers and learn from them. So one of my priorities for this semester is to make sure you get to know at least a few of your fellow students well. To facilitate this, we'll be randomly placing all of you in groups of 4-7. Policies will strongly incentivize collaboration within your group, though all work may still be done individually for those who prefer. Each group will meet together (virtually) at a regular time weekly to discuss course content and projects, starting in week 3. A peer mentor (or sometimes a TA) will join these meetings to help answer tough questions you may have.
Being online can also have advantages, like scheduling flexibility and the ability to self-pace. Aside from the weekly group meeting, everything is asynchronous: lectures, labs, quizzes, and even the final. As long as you keep ahead of the deadlines, you can decide when you spend time on these things.
I encourage you to take control of the pacing of lectures. Say I'm doing a coding demo in the lecture recording, and you've gotten a little lost. You should pause me and give it some thought -- or better, type up the code yourself and play with it until it makes sense! Print more and think less.
Getting Help
There are a few ways to get help:
- Check for "walk"-in office hours (instructor, TA, and mentor): go to "Resources" menu and click "Office Hours"
- Email your group's main TA (they'll reach out to you at the beginning of the semester)
- Email me at firstname.lastname@example.org to schedule an appointment outside of office hour time
Besides email (described above), there are four ways we'll communicate outside of class.
1. Piazza: You can ask questions (and see the other questions) here. Do not post code snippets that are >5 lines long; that's considered cheating.
2. Canvas: We'll make announcements on Canvas and periodically upload grades there (detailed feedback will only be on this site, however)
3. Class Forms: We have various forms for leaving (optionally anonymous) feedback and reporting exam conflicts.
4. Code Review: You will upload projects using this tool. Via the same tool, TAs will leave comments on your code. Even projects scoring 100% often have a lot of room for improvement, so please take these seriously. When submitting, you can ask for specific kinds of feedback, based on what coding skills you're most interested in developing.
"Lab" consists of at-home exercises to help you understand topics and prepare for projects.
You can decide when to do labs each week, but I recommend finding a time when (a) other members of your group are also doing it, and (b) somebody is offering office hours. Then, if you get stuck on something, you'll be able to get unstuck right away.
- 56% - 7 projects (8% each)
- 28% - 14 quizzes (2% each)
- 10% - final
- 6% - participation
At the end, you'll have a score out of 100, and I'll set a curve. This is the first time I'm trying this format for the course, so I can't give a good estimate for what the cutoffs will be for various letter grades yet.
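To make the arithmetic behind the breakdown concrete, here is a minimal sketch of how the weights combine into a score out of 100. The function name and the example fractions are hypothetical, and the sketch ignores the curve, which is set separately.

```python
# Weights come from the grading breakdown above.
WEIGHTS = {
    "projects": 56,       # 7 projects x 8% each
    "quizzes": 28,        # 14 quizzes x 2% each
    "final": 10,
    "participation": 6,
}

# Sanity check: the components should add up to 100%.
assert sum(WEIGHTS.values()) == 100


def course_score(fractions):
    """Return the weighted score out of 100 (before any curve).

    fractions maps each component name to the fraction of credit
    earned in that component, between 0.0 and 1.0.
    """
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student: 90% on projects, 85% on quizzes,
# 80% on the final, full participation credit.
example = {"projects": 0.90, "quizzes": 0.85, "final": 0.80, "participation": 1.0}
print(round(course_score(example), 1))  # prints 88.2
```

Because projects carry more than half the weight, a few points lost per project move the total more than the same fraction lost on quizzes.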
Submission: Everybody will individually upload either a .py file or a .ipynb (as specified) file for each project with the submission tool.
Collaboration: Even though everybody will make their individual submission, every project will have (1) a group part and (2) an individual part. For the group part, any form of help from anybody in your group is allowed (even looking at each other's code); I recommend you find times for everybody in the group to work at the same time so you can help each other through coding difficulties in this part. For the individual part, you may only receive help from course staff (instructor/TAs/mentors); you may not discuss this part with anybody else (in the class or otherwise) or get help from them.
Late Policy: You will have 7 late days, which you can use across projects at your own discretion. You may use all your late days on the same project, if you like. Using late days on a project does not defer the deadline for subsequent projects, so be careful not to let work pile up. You may not use late days on the last project. Late days are automatically applied if a project is turned in late. After late days are exhausted, anything late will receive zero credit by default. Please talk to me if you're falling this far behind.
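The late policy can be read as simple bookkeeping. The sketch below is only an illustration, not the actual grading tool: the function name is hypothetical, lateness is counted in whole days, and it assumes that a submission needing more late days than you have left gets zero credit without using up any of them (the policy above does not spell out that case).

```python
TOTAL_LATE_DAYS = 7  # from the late policy above


def apply_late_days(days_late_per_project):
    """Walk the projects in order and apply late days automatically.

    days_late_per_project: whole days late for each project (0 = on time).
    Returns (results, remaining): results[i] is True if project i still
    earns credit, and remaining is the number of unused late days.
    """
    remaining = TOTAL_LATE_DAYS
    results = []
    last = len(days_late_per_project) - 1
    for i, late in enumerate(days_late_per_project):
        if late == 0:
            results.append(True)       # on time, nothing to deduct
        elif i == last or late > remaining:
            # Late days cannot be used on the last project, and a
            # submission that exceeds the remaining days gets zero
            # credit (assumption: no days are deducted in that case).
            results.append(False)
        else:
            remaining -= late          # late days are applied automatically
            results.append(True)
    return results, remaining


# Hypothetical semester: 2 days late on project 2, 3 days late on
# projects 4 and 6, and 1 day late on the last project.
print(apply_late_days([0, 2, 0, 3, 0, 3, 1]))
# prints ([True, True, True, True, True, False, False], 2)
```

In this example, project 6 gets no credit because only 2 late days remain by then, and project 7 gets no credit because late days cannot be used on the last project.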
Code Review: A TA will give you detailed comments on specific parts of your assignment. This feedback process is called a "code review", and is a common requirement in industry before a programmer is allowed to add her code changes to the main codebase. Read your code reviews carefully; even if you receive 100% on your work, we'll often give you tips to save effort in the future.
Project Grading: Grades will be largely based on automatic tests that we run. We'll share the tests with you before the due date, so you should rarely be too surprised by your grade. Though it shouldn't be common, we may deduct points for serious hardcoding, not following directions, or other issues. Some bugs (called non-deterministic bugs) don't show up every time code is run -- if you have such an issue, the grade the tester gives you may differ from what you expected based on your own runs. Finally, our tests aren't very good at evaluating whether plots and other visualizations look how they should (a human usually needs to evaluate that).
There will be a short quiz at the end of each week, on Canvas. I recommend taking it soon after you've studied the week's content, but the deadline is generally Monday of the following week at 11:59pm (unless Monday is a holiday). Make sure you know the rules regarding what is allowed and what is not.
- however much time you need
- discussing answers with members of your assigned group who are taking the quiz at the same time
- referencing texts, notes, or provided course materials
- searching online for general information
- taking it more than once
- discussing answers with anybody outside of your group
- discussing with members of your group who have already completed the quiz when you haven't completed it yourself yet
- posting anything online about the quizzes
- using such material potentially posted by other 320 students who broke the preceding rule
I'll post more details when this gets closer. But basically, this will be an open-ended mini project on a topic of your choosing. It will be due at the final exam time posted in the student center.
Some of the things that could count towards participation:
- filling out class surveys
- attending weekly group meetings and bringing good questions
- leaving good comments in posted discussions
- completing relevant labs before projects
We'll assign readings from four main sources this semester (all free). Stay on top of them!
- Think Python 2nd Edition by Allen B. Downey: Read Online
- Automate the Boring Stuff with Python by Al Sweigart: Read Online
- Principles and Techniques of Data Science by Sam Lau, Joey Gonzalez, and Deb Nolan: Read Online
- Scipy Lecture Notes by many contributors: Read Online
As a student, you may experience a range of issues that can be barriers to learning. These might include isolation related to COVID-19, strained relationships, anxiety, high levels of stress, alcohol/drug problems, feeling down, loss of loved one, and/or loss of motivation. Services exist on campus to support students who find themselves in these situations, like University Health Services and the Dean of Students Office. You can learn more about free, confidential mental health services at UHS by calling 608-265-5600 Opt. 2 or visiting uhs.wisc.edu. Drop-in staff are available daily at the Dean of Students Office to support students and answer questions. To learn more about the Dean of Students Office, please call 608-263-5700 or visit doso.students.wisc.edu.
In general, if you have any issues keeping up with the course for any reason, it's better to let me know -- I'll do my best to connect you with resources and try to think of accommodations to help out. I want to do whatever I can to help you succeed.
Yeah, of course you shouldn't cheat, but what is cheating? The most common form of academic misconduct in these classes involves copying/sharing code for programming projects. Here's an overview of what you can and cannot do:
- any collaboration with your group members on the group part of a project
- doing worksheets with friends
- copying code from online examples that is NOT specific to your project (if project solutions are leaked online, you may not use them). If you copy code, you must cite it in your code with a comment (think of it like citing a quote in an essay -- without the citation, you're plagiarizing).
- any help from other students on the final (with citation)
- getting project help of any kind for the group part from anybody who is not either (a) in your assigned group or (b) 320 staff
- getting project help of any kind for the individual part from anybody who is not 320 staff
- using part or all of project solutions found online
- breaking any of the rules listed under the "Quizzes" section
- helping somebody else cheat
Similarity Detection: of course, with >100 students, it's hard for a human TA to notice similar code across two submissions. Thus, we use automated tools to look for similarities across submissions. Such similarity detection is an active area of computer science research, and the result is tools that detect code copying even when students methodically rename all variables and shuffle the order of their code. We take cheating detection seriously to make the course fair to students who put in the honest effort.
Citing Code: you can copy small snippets of code from stackoverflow (and other online references) if you cite them. For example, suppose I need to write some code that gets the median number from a list of numbers. I might search for "how to get the median of a list in python" and find a solution at https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python.
I could (legitimately) paste code from that page into my code, as long as it has a comment as follows:
    # copied from https://stackoverflow.com/questions/24101524/finding-median-of-list-in-python
    def median(lst):
        sortedLst = sorted(lst)
        lstLen = len(lst)
        index = (lstLen - 1) // 2
        if (lstLen % 2):
            return sortedLst[index]
        else:
            return (sortedLst[index] + sortedLst[index + 1])/2.0
In contrast, copying from a nearly complete project (that accomplishes what you're trying to do for your project) is not OK. When in doubt, ask us! The best way to stay out of trouble is to be completely transparent about what you're doing.
Earning a recommendation letter is much harder than earning an A in this course. At a minimum, I'll want to see you doing something complex and interesting beyond the assignments. For a typical letter, I'll have collaborated with a student on some project for multiple months, with many iterations of feedback.
Most grad schools require recommenders to fill out long forms rating students on various abilities (see an example below). Make sure that if you're asking me, I would be able to fill out such a form without needing to put "I don't know" as my answer to many of the questions.