CS 2803 Data Manipulation for Computer Scientists
Instructor
- Christopher Simpkins, simpkins@cc.gatech.edu
IMPORTANT:
- You must email me from your official Georgia Tech email address, that is, the email address listed for you in the official course roster in Canvas. I use your official email address to create email filters.
- Include context in your email – which class you’re in, etc.
- Include all of your IDs, e.g., 9-digit GTID, Buzzport login ID, official name. Every system at GT uses a different ID and if I can’t easily look up relevant information about your request because you didn’t give me enough identifying information, I may ignore your email.
- Please understand that professors are slow or unresponsive to email because we are drowning in email. If I don’t respond within 48 hours just send me a gentle reminder.
- Do not send me messages in Canvas.
Course Description
This course will provide background and experience in reading, manipulating, and exporting data for engineering, business and scientific applications. Specific topics include file I/O, string processing, web scraping, writing HTML and basic interfacing with SQL databases (reading / writing data in pre-existing tables). Students will learn to build programs controlled by basic graphical user interfaces. Assignments will be modeled after business, engineering, and scientific problems.
Learning Outcomes
Students in the class will achieve the following learning objectives:
(Competency) Students will be able to:
- Write programs using various data types, and using basic techniques such as assignment, method calls, while loops, for loops, and conditionals.
- Use and manipulate several language-provided data structures such as lists, dictionaries, and strings.
- Read and write data to and from text files, both as plain text and in structured formats (such as CSV).
- Read a textual representation of numerical data and convert it to the appropriate (integer/floating point) data type.
- Load HTML pages with a program, and extract specific pieces of information from the HTML.
- Write a program that can generate a report in text or HTML format which includes elements under program control.
- Connect to existing SQL databases and insert and retrieve data from the database.
- Program interactive graphical user interfaces consisting of a graphically organized set of widgets, including a minimum of one from each of the following classes (Label, Button, Text Field).
- Implement simple business or mathematical algorithms (calculating interest payments, averaging a row of data, calculating standard deviation) into a program.
- Use compound data structures provided by the programming language such as lists, arrays, and dictionaries to hold sequences or sets of data, including two-dimensional (tabular) data.
- Use objects and associated methods provided by the programming language.
- Write programs that are easy to understand so that others may modify and improve them.
(Movement) Students will increase their:
- Familiarity with compound data structures (lists, arrays, dictionaries), including nested data structures (multi-dimensional arrays, etc…) and indexing into multi-dimensional data structures.
- Speed and accuracy in converting problem statements into programs.
- Understanding of and ability to quickly use basic program structures such as iteration, conditionals, and function calls due to repeated practice of these concepts.
- Understanding of the event driven programming model, specifically as applied to graphical user interfaces.
- Ability to break a medium sized problem down into smaller parts and solve each sub-problem individually.
- Ability to test and debug programs.
(Experience) Students will:
- Practice the process of constructing moderately sized (100-300 line) programs from written requirements.
- Deal with data that may include missing elements or malformed representations.
- Work in pairs to solve programming problems.
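As a rough illustration of the kind of program these outcomes describe, the sketch below reads CSV text, converts a numeric column, skips missing or malformed entries, and computes a mean and standard deviation. The sample data, the column name `score`, and the function name `read_scores` are hypothetical and chosen only for this example. Skipping bad values is one possible policy, not a course requirement.

```python
import csv
import io
import statistics

# Hypothetical sample data: "bob" has a missing score and "carol" a malformed one.
SAMPLE = """name,score
alice,90
bob,
carol,n/a
dave,70
"""


def read_scores(text_file, column="score"):
    """Return the values in `column` that parse as floats, skipping the rest."""
    values = []
    for row in csv.DictReader(text_file):
        raw = (row.get(column) or "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            # Missing or malformed entry: skip it (an assumed policy for this sketch).
            continue
    return values


scores = read_scores(io.StringIO(SAMPLE))
mean = statistics.mean(scores)
spread = statistics.pstdev(scores)  # population standard deviation
print(scores, mean, spread)
```

In a real assignment the data would more likely come from a file opened with `open(path, newline="")`. `io.StringIO` is used here only to keep the sketch self-contained.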
Requirements
Grading
- Homework: 20%
- Quizzes: 10%
- Exams: 50%
- Final Exam: 20%
Grade Cutoffs: A: 90, B: 80, C: 70, D: 60. No rounding.
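To make the arithmetic concrete, here is a minimal sketch of how the weights and cutoffs above might combine. It assumes each category score is an average on a 0-100 scale, that "no rounding" means the weighted total is compared to the cutoffs as computed, and that anything below the D cutoff is an F (the syllabus does not list F explicitly). The function names are hypothetical and not part of any course tool.

```python
# Weights from the Grading list above, as whole percentages.
WEIGHTS = {"homework": 20, "quizzes": 10, "exams": 50, "final_exam": 20}

# Cutoffs from the syllabus, highest first.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_total(scores):
    """Combine per-category averages (0-100) into a weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a total to a letter with no rounding; below 60 is assumed to be F."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


example = {"homework": 95, "quizzes": 80, "exams": 88, "final_exam": 90}
total = weighted_total(example)
print(total, letter_grade(total))
```

Under these assumptions a total of 89.99 is still a B, because the comparison uses the unrounded value.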
Assignments
Two or three in-class written midterm exams, a final exam, short in-class quizzes, and 7-12 homework assignments. Your last homework assignment may be due the week preceding final exams. Assignments must be turned in before the date and time indicated as the assignment’s due date.
Class Participation
In-class exercises cannot be made up if you do not attend the class. It’s a violation of the Academic Honor Code to submit work or sign in for other students.
Academic Integrity and Collaboration
We expect academic honor and integrity from students. Please study and follow the academic honor code of Georgia Tech: http://www.honor.gatech.edu/content/2/the-honor-code. You may collaborate on homework assignments, but your submissions must be your own. You may not collaborate on in-class programming quizzes or exams.
Due Dates, Late Work, and Missed Work
- Assignments are due on the day and time listed on Canvas. Multiple resubmissions are allowed, so submit early and often so you aren’t in a rush on the due date. Absolutely no late submissions will be accepted. There is no grace period, so submit your assignments well before the deadline.
- Make-up exams are held at 11:00 on the Tuesday following the exam, unless otherwise announced. If the make-up exam room is not announced before the make-up day, report to the TA lab. Make-up exams are only given to students with special circumstances such as serious illness, hospitalization, death in the family, judicial procedures, military service, or official school functions. Provide us with a copy of your letter from the registrar in advance for official school functions. For other excused absences you must provide documentation to the Dean of Students’ office (in the “flag” building near the ice cream cone statue) within one week of your return from illness/activity. The Dean of Students’ office will verify your excuse and send your instructors a notice. The Dean of Students’ office will also send instructors a request for flexibility in cases which don’t fall within the official excused absences listed above but warrant consideration. In any case, if you believe you should be excused from a scheduled exam and don’t have an excuse from the Registrar, see someone in the Dean of Students’ office. Excusal from coursework and make-up opportunities are granted at the sole discretion of your instructor.
Regrades
To contest any grade you must submit an official regrade form to the Head TA within one week of the assignment’s original return date. The original return date is the date the exam was first made available for students to pick up or the grade was posted online in the case of homework assignments and programming quizzes. Note that a regrade means just that – we will regrade your assignment from scratch, which means you may end up with a lower score after the regrade.
Course Outline
This outline applies to Fall and Spring semesters. The Summer schedule is compressed into 11 instructional weeks.
- Weeks 1 - 5: Programming in Python
- Weeks 6 - 10: Data Formats, Retrieval, and Storage
- Weeks 12 - 15: Data Analytics, Machine Learning and Big Data
Prerequisites
At least one of:
- Undergraduate Semester level CS 1301 Minimum Grade of C
- Undergraduate Semester level CS 1315 Minimum Grade of C
- Undergraduate Semester level CS 1321 Minimum Grade of C
- Undergraduate Semester level CS 1371 Minimum Grade of C
Course Materials
Note: O’Reilly books listed below are available through Georgia Tech’s Safari Online subscription. See http://www.library.gatech.edu/search/ebooks.php
- Required Text: Introducing Python, by Bill Lubanovic, O’Reilly Media, November 2014.
- Print ISBN: 978-1-4493-5936-2, ISBN 10: 1-4493-5936-1
- Ebook ISBN: 978-1-4493-5935-5, ISBN 10: 1-4493-5935-3
- Recommended Books:
- Think Python, 2nd Edition, by Allen B. Downey, O’Reilly Media, December 2015. Available free at http://greenteapress.com/wp/think-python-2e/ and from O’Reilly at http://shop.oreilly.com/product/0636920045267.do
- Python in a Nutshell: http://shop.oreilly.com/product/0636920012610.do
- Fluent Python (Advanced): http://shop.oreilly.com/product/0636920032519.do
- Flask Web Development: http://shop.oreilly.com/product/0636920031116.do
- Tkinter GUI Application Development Blueprints: http://shop.oreilly.com/product/9781785889738.do
- Python Data Science Handbook: http://shop.oreilly.com/product/0636920034919.do
- Programming in Python 3 (2nd edition) : Mark Summerfield - Addison Wesley, ISBN: 0-321-68056-1
- Dive into Python 3 – Mark Pilgrim – Apress ISBN: 978-1430224150
Non-Discrimination
The Institute does not discriminate against individuals on the basis of race, color, religion, sex, national origin, age, disability, sexual orientation, gender identity, or veteran status in the administration of admissions policies, educational policies, employment policies, or any other Institute governed programs and activities. The Institute’s equal opportunity and non-discrimination policy applies to every member of the Institute community.
For more details see http://www.policylibrary.gatech.edu/policy-nondiscrimination-and-affirmative-action