A common task in data management systems is receiving data from an external system in XML, JSON, or CSV files and storing it in a relational database. Python includes libraries for easily reading these files and interacting with databases. We store data in databases for record keeping and so that we can answer questions about the data.
In this assignment you will
write a Python program that parses input data files and inserts the data from them into a database, and
create a SQL script with queries to answer questions about data in the database.
You’re starting a new job as a junior data scientist at the Centers for Disease Control (CDC). In this role you need to manage and analyze data about doctors and patient care. You receive doctor data from outside sources in the form of XML, CSV and JSON files which you need to insert into your database, and you use the database to answer questions about the data.
Write a Python script to import data from files and insert the data into the database. Create your database with this database schema script: doctors-schema.sql. You should read the database schema script to understand the database.
Write a Python script named import_doctors.py
that takes five command line arguments:
""
as command line argument),Your script should insert information from the files above into the appropriate tables in the database with appropriate key and foreign key values.
Write a SQL script, doctors-queries.sql
, that includes queries to answer the following questions:
What are the first names and last names of the patients who have cardiologists for primary care providers (PCP)?
What are the first names and last names of the patients who saw their doctor (PCP) in May 2010?
OPTIONAL BONUS (5 points): What are the first name and last name of the doctor who has the most patients (not the most visits)?
OPTIONAL BONUS (5 points): What are the first names and last names of the doctors who have no patients (not visits)?
Your doctors-queries.sql
should contain only the SELECT
queries requested above.
conda install pymysql
.AUTO_INCREMENT
primary keys, meaning you don’t have to supply a key value when you insert new rows into those tables. You’ll need those ids after inserting the rows, which you can get with cursor.lastrowid
. See the course slides for the Python DB-API or the Python DB-API docs.Submit your import_doctors.py
, doctors-queries.sql
to the assignment on Canvas as attachments.
Practice safe submission! Verify that your HW files were truly submitted correctly, the upload was successful, and that your program runs with no syntax or runtime errors. It is solely your responsibility to turn in your homework and practice this safe submission safeguard.
This procedure helps guard against a few things.