Carnegie Mellon University Department of Computer Science 15-415/615 - Database Applications C. Faloutsos & A. Pavlo, Fall 2016

Carnegie Mellon University
Department of Computer Science
15-415/615 - Database Applications
C. Faloutsos & A. Pavlo, Fall 2016

Homework 2 (by Lu Zhang) - Solutions

Due: hard copy, in class at 3:00pm, on Monday, Sep. 26
Due: tarball, BlackBoard at 3:00pm, on Monday, Sep. 26

Grading Info:
1. Any SQL query that cannot run in Postgres (because of a syntax error, for example) will get a score of 0.
2. To request a regrade, contact the TA assigned to your last name:
    KANG     : A - Gondi
    LIN      : Gupta - Lee
    MENON    : Li - Merigoux
    ZHANG H. : Moesching - Vernet
    ZHANG L. : Wang - Z

Reminders:
* Plagiarism: Homework is to be completed individually.
* Typeset all of your answers. Handwritten answers will get zero points.
* Late homeworks: in that case, please email it to all TAs with subject line: 15-415 Homework Submission (HW 2) and the count of slip-days you are using.

For your information:
* Graded out of 100 points; 2 questions total
* Rough time estimate: 1-2 hours for setting up postgres; approx. 4-6 hours for completing the questions

Revision : 2016/10/13 10:14

15-415/615 Homework 2, DUE: Sep. 26, 3:00pm

Question                                      Points   Score
Bike Sharing in Bay Area: Cities and Bikes    75
Bike Sharing in Bay Area: Weather             25
Total:                                        100

Preliminaries

Cluster machine assignments
Each student is assigned an andrew cluster machine and a specific port for this homework (you might be sharing the machine with other 415/615 students). Your machine and port assignment is on Blackboard, under grade center. You may use some other machine in the range ghc25..86 (say, if your machine is down), but you MUST use your assigned port on any machine that you try, to avoid collisions with other students.

Postgres set-up
Before starting the homework, follow the instructions for setting up PostgreSQL and importing the data we'll be working with, available at http://15415.courses.cs.cmu.edu/fall2016/hws/hw2/.

What to deliver: Check-list
Both hard copy, and soft copy:
1. Hard copy:
    What: hard copy of your SQL queries, plus their output.
    When: Sep. 26, 3:00pm
    Where: in class
    Contrary to HW1, keep all your answers in one document, but still provide (course#, Homework#, Andrew ID, name).
2. Soft copy: tar-file:
    What: A gzip tar file (<your-andrew-id>.tar.gz) with all your SQL code. See the next paragraph on how to easily generate the tarball.
    When: Sep. 26, 3:00pm
    Where: on Blackboard, under Assignments / Homework #2

Details of the tar file. Obtain http://15415.courses.cs.cmu.edu/fall2016/hws/hw2/hw2-data.tar.gz. After tar xvfz, check the directory ./hw2: replace the content of each place-holder hw2/queries/*.sql file with your SQL code. Your SQL queries will be auto-graded by comparing their outputs to the correct outputs. When comparing each query's output, the grading script prints "checking qnn", where qnn is the number of the question; the answer is correct if nothing appears below "----". Only when your answer is wrong does the grading script print the difference (computed by diff) between your query output and the correct output.

The goal is that, with your answers in the directory queries, when the grader loads the correct outputs in the directory outputs and types make, he/she should see no difference. For your queries, the order of the output columns is important; their names are NOT (our grading script turns off the column headers).

Naming Convention. The place-holder queries/*.sql files have the obvious naming convention: for example, the place-holder file for Question 1(a) is named q1a.sql. Each place-holder file contains the trivial query SELECT 'hi'. All outputs just contain 'hi', except outputs/q1a.txt, which contains the output for the (Sample) question.

Sanity Check. Before doing any other question, check your answer to the (Sample) question (Question 1(a)) to ensure your output matches the formatting (in addition to being correct): enter your answer into queries/q1a.sql and run make. The diff script should show no difference.

Easy Packaging. For your convenience, we automated the packaging of the submission: when you are done, type make submission, and this should automatically generate the tar-gzipped file you need to submit. However, it is still your responsibility to make sure that all the SQL query files are included.

FAQs
Q: May we use views?
A: Yes - you may use any feature of SQL that is supported by postgres on the andrew/ghc linux machines. But make sure that no extra output is generated - remember, we'll do a diff against our answers.
Q: What if a question is unclear?
A: Our apologies - please post on blackboard or write down your assumptions, and solve your interpretation of the query. We will accept all reasonable interpretations.
Q: What if my assigned machine is not responding?
A: Our apologies again - as we said earlier, please use another machine in the range ghc25..86, but with your assigned port number, YYYYY.

General Grading Policies

-5 for not following instructions. Examples include:
    * Submitting query code that is not prepared by make submission and won't work with the autograding script
    * Hard copy does not contain the outputs
    * Tarballs are not submitted on Blackboard (except the ones that use slip days)
    * No hard copy submitted nor email submission

No penalty for creating views but not dropping them. Each student's code is graded against a clean database.

Get 0 points for leftover template code (SELECT 'hi').

-2 per query if query code is incorrect but hard copy is correct.

Get 50% of the points if the result is wrong, but the SQL code is close to correct. Close to correct means there is one minor mistake; get 0 points for more than one mistake. Examples of minor mistakes include (also see each question):
    * Missing limit
    * Incorrect column order or missing a column (0 points for missing more than one column)
    * Tuples are not ordered as required.

Question 1: Bike Sharing in Bay Area: Cities and Bikes [75 points]

For this question you will look into bike sharing data collected from five cities in the Bay Area. Several of you will probably find jobs there, and you may like using these bikes. Figure 1 gives the schema of the tables you will use.

For example, the tuple in the trip table
    (5088, 183, '2013-08-29 22:08:00', 'Market at 4th', 76, '2013-08-29 22:12:00', 'Post at Kearney', 47, 309)
means that bike #309 made trip #5088 from station #76 'Market at 4th' at 2013-08-29 22:08:00 to station #47 'Post at Kearney' at 2013-08-29 22:12:00.

The tuple in the station table
    (2, 'San Jose Diridon Caltrain Station', 37.3297, -121.902, 27, 'San Jose', '2013-08-06', '95113')
means that station #2 'San Jose Diridon Caltrain Station' in the city of San Jose, with latitude 37.3297 and longitude -121.902, has 27 docks and was installed on 2013-08-06. Its zip code is 95113.

The tuple in the weather table
    ('2013-08-29', 74, 68, 61, 10, 10, 10, 23, 11, 28, 4, '', 286, '94107')
means that on 2013-08-29, the city with zip code 94107 had max temperature 74, mean temperature 68, minimum temperature 61, max visibility 10 miles, mean visibility 10 miles, max wind speed 23 mph, mean wind speed 11 mph, max gust speed 28 mph, cloud cover 4, no special weather events, and wind direction 286 degrees.

(a) [0 points] (Sample):
Intention: Count the number of cities. The purpose of this query is to make sure that the formatting of your output matches exactly the formatting expected by our autograding script.
Details: Print the number of cities (eliminating duplicates).

    select count(distinct(city)) from station

(b) [5 points] (Warm-up):
Intention: Count the number of stations in each city.
Details: Print city name and number of stations. Sort by number of stations (decreasing), and break ties by city name (increasing).

Figure 1: Tables for bike sharing

    select city, count(station_id) as cnt
    from station
    group by city
    order by cnt desc, city asc

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.

(c) [5 points] (City-name stations):
Intention: List all stations whose name contains the name of their city, for example, station 'Mountain View Caltrain Station' in city 'Mountain View'.
Details: Print station name and city name. Sort by city name (increasing), and break ties by station name (increasing).

    select station_name, city
    from station
    where station_name like '%' || city || '%'
    order by city asc, station_name asc

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.

(d) [5 points] (Self-loops):
Intention: Find the percentage of self-loop trips among all trips. A trip is a self-loop when its start station is the same as its end station.
Details: Print the percentage as a decimal (round to four decimal places using ROUND()).

    select round(self_loop_cnt.cnt * 1.0 / trip_cnt.cnt, 4) as percentage
    from (
        select count(*) as cnt
        from trip
        where start_station_id = end_station_id
    ) as self_loop_cnt, (
        select count(*) as cnt
        from trip
    ) as trip_cnt

a). -2 for incorrect rounding.
b). Get 0 for any other incorrectness.

(e) [5 points] (Best travelled bikes):
Intention: List the top 10 bikes that have been to the largest number of distinct stations. A bike has been to a station as long as this station is the start station or end station in one of its trips. For example, if bike #1000 travelled from station A to station B in a certain trip, then bike #1000 has been to both station A and B.
Details: Print bike id and number of distinct stations that it has been to. Sort by number of distinct stations (decreasing), and break ties by bike id (increasing).

    with bike_and_station(bike_id, station_id) as (
        select bike_id, start_station_id as station_id from trip
        union
        select bike_id, end_station_id as station_id from trip
    )
    select bike_id, count(station_id) as cnt
    from (
        select distinct on (bike_id, station_id) bike_id, station_id
        from bike_and_station
    ) as tmp
    group by bike_id
    order by cnt desc, bike_id asc
    limit 10

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.

c). -1 for each missing/extra/wrong tuple.
For seemingly correct solutions (no diff output), check d and e.
d). -1 for only considering either start station or end station.
e). -1 for using station name instead of station id to represent station (in this case, bike #27 probably has the wrong count 62).

(f) [15 points] (San Jose lover):
Intention: Find all the bikes which have been to all stations in San Jose (and zero or more outside San Jose). As above, a bike has been to a station as long as this station is the start station or end station in one of its trips. For example, if bike #1000 travelled from station A to station B in a certain trip, then bike #1000 has been to both station A and B.
Details: Print the count of such bikes.

    $r \div s = \pi_{R-S}(r) - \pi_{R-S}\big[(\pi_{R-S}(r) \times s) - r\big]$

For easy reading, let $\pi_{R-S}(r)$ be written Pi_(R-S)(r) in the comments below.

    -- solution one: Division
    with bike_in_sj(bike_id, station_id) as ( /* r */
        select distinct on (bike_id, station_id) bike_id, station_id
        from (
            select bike_id, start_station_id as station_id
            from trip, station
            where (trip.start_station_id = station.station_id)
              and (station.city = 'San Jose')
            union
            select bike_id, end_station_id as station_id
            from trip, station
            where (trip.end_station_id = station.station_id)
              and (station.city = 'San Jose')
        ) as bike_union
    ),
    station(station_id) as ( /* s */
        select station_id from station where city = 'San Jose'
    ),
    b(bike_id) as ( /* Pi_(R-S)(r) */
        select distinct(bike_id) from bike_in_sj
    )
    select count(*)
    from (
        select bike_id /* Pi_(R-S)(r) */
        from b
        except /* - */
        (
            select bike_id /* (Pi_(R-S)(r) x s) - r */
            from (
                select bike_id, station_id
                from b, station /* cross join (x) */
                except /* - */
                select * /* r */
                from bike_in_sj
            ) as rule_out
        )
    ) as san_jose_lover

    -- solution two: Count number
    with bike_and_station(bike_id, station_id) as (
        select bike_id, start_station_id as station_id from trip
        union
        select bike_id, end_station_id as station_id from trip
    ),
    station_in_sj(station_id) as (
        select station_id from station where city = 'San Jose'
    )
    select count(*)
    from (
        select count(station_id) as cnt
        from bike_and_station
        where station_id in (select station_id from station_in_sj)
        group by bike_id
    ) as trip_in_sj
    where trip_in_sj.cnt = (select count(*) from station_in_sj)

a). -2 for using station name instead of station id to represent station (in this case, the wrong answer is probably 187).
b). -5 for outputting all valid bike ids instead of the count.
c). Get 0 for any other incorrectness.
For seemingly correct solutions (no diff output), check a and d.
d). -2 for only considering either start station or end station.

(g) [10 points] (Most popular station per city):
Intention: For each city, find the most popular station in that city. Popular means that the station has the highest count of visits. As above, either starting a trip or finishing a trip at a station is counted as one visit to that station. For example, if a trip starts at station A and also ends at station A, then station A has 2 counts of visits.
Details: For each city, print city name, most popular station name and its visit count. Sort by city name, ascending.
Hint: You may use Postgres's built-in function ROW_NUMBER() (faster execution, but less elegant). Either solution will get full points.

    /* Be careful with UNION! UNION will remove duplicate tuples automatically.
       Try: select 1 union select 1 */

    -- solution one: with ROW_NUMBER()
    with visit(station_id, cnt) as (
        select start_cnt.station_id, start_cnt.cnt + end_cnt.cnt as cnt
        from (
            select start_station_id as station_id, count(*) as cnt
            from trip
            group by start_station_id
        ) as start_cnt, (
            select end_station_id as station_id, count(*) as cnt
            from trip
            group by end_station_id
        ) as end_cnt
        where start_cnt.station_id = end_cnt.station_id
    ),
    visit_city(city, station_id, cnt, rank) as (
        select city, visit.station_id, visit.cnt,
               ROW_NUMBER() over (partition by city order by visit.cnt desc) as rank
        from visit, station
        where visit.station_id = station.station_id
    )
    select visit_city.city, station.station_name, visit_city.cnt
    from visit_city, station
    where visit_city.rank = 1
      and station.station_id = visit_city.station_id
    order by city asc

    -- solution two: without ROW_NUMBER()
    with visit(city, station_id, cnt) as (
        select start_cnt.city, start_cnt.station_id, start_cnt.cnt + end_cnt.cnt as cnt
        from (
            select city, start_station_id as station_id, count(*) as cnt
            from trip, station
            where station.station_id = trip.start_station_id
            group by city, start_station_id
        ) as start_cnt, (
            select city, end_station_id as station_id, count(*) as cnt
            from trip, station
            where station.station_id = trip.end_station_id
            group by city, end_station_id
        ) as end_cnt
        where start_cnt.station_id = end_cnt.station_id
    ),
    excluded_station(station) as (
        select distinct(v1.station_id)
        from visit as v1, visit as v2
        where v1.city = v2.city and v1.cnt < v2.cnt
    )
    select visit.city, station.station_name, visit.cnt
    from visit, station
    where station.station_id = visit.station_id
      and visit.station_id not in (select station from excluded_station)
    order by city asc
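The warning about UNION in question (g) can be checked on a toy table. The following is a minimal sketch, not part of the official solution: it uses Python's built-in sqlite3 module instead of Postgres, and the three-row trip table is made up for illustration. It counts visits per station by stacking start and end stations, once with UNION ALL and once with UNION.

```python
import sqlite3

# Hypothetical toy data: (trip id, start station, end station).
# Trip 1 is a self-loop at station 10, so it contributes 2 visits there.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trip (id INTEGER, start_station_id INTEGER, end_station_id INTEGER)"
)
conn.executemany(
    "INSERT INTO trip VALUES (?, ?, ?)",
    [(1, 10, 10), (2, 10, 20), (3, 20, 10)],
)


def visit_counts(set_op):
    """Count visits per station, stacking starts and ends with the given set operator."""
    sql = f"""
        SELECT station_id, COUNT(*) AS cnt
        FROM (SELECT start_station_id AS station_id FROM trip
              {set_op}
              SELECT end_station_id AS station_id FROM trip) AS v
        GROUP BY station_id
        ORDER BY station_id
    """
    return conn.execute(sql).fetchall()


union_all_counts = visit_counts("UNION ALL")  # keeps one row per visit
union_counts = visit_counts("UNION")          # duplicates collapse before counting

print("UNION ALL:", union_all_counts)
print("UNION:    ", union_counts)
```

With UNION every station ends up with a count of 1, because identical rows are merged before COUNT(*) runs. UNION ALL keeps one row per visit; the solutions above avoid the problem differently, by counting starts and ends separately and adding the two counts.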

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.
For seemingly correct solutions (no diff output), check d.
d). -1 for using station name instead of station id to represent station (outputs in this case may be the same as the correct ones).

(h) [10 points] (The cumulative traveling history of bike #697):
Intention: Find the cumulative traveling durations of bike #697.
Details: For each trip of bike #697, print end time and cumulative duration so far. Sort by cumulative duration (increasing). For example, if its traveling history is as shown in Table 1, then the result should be as in Table 2.

    End Time               Duration
    2016/08/01 00:05:00    2400
    2016/09/01 00:06:00    3600
    2016/09/02 00:07:10    2000
    Table 1: Traveling example.

    End Time               Cumulative Duration
    2016/08/01 00:05:00    2400
    2016/09/01 00:06:00    6000
    2016/09/02 00:07:10    8000
    Table 2: Cumulative traveling duration.

    select end_time, (
        select sum(duration)
        from trip as i
        where i.bike_id = 697 and i.end_time <= o.end_time
    ) as ac
    from trip as o
    where o.bike_id = 697
    order by ac asc
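The cumulative duration in (h) can also be written with a window function. The sketch below is an alternative formulation, not the official solution: it runs on Python's built-in sqlite3 module (assuming a bundled SQLite of version 3.25 or later, which is needed for window functions) and loads the rows of Table 1 plus one made-up trip of another bike. It compares the correlated subquery with SUM() OVER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip (bike_id INTEGER, end_time TEXT, duration INTEGER)")
conn.executemany(
    "INSERT INTO trip VALUES (?, ?, ?)",
    [
        (697, "2016-08-01 00:05:00", 2400),  # rows of Table 1
        (697, "2016-09-01 00:06:00", 3600),
        (697, "2016-09-02 00:07:10", 2000),
        (1, "2016-08-15 12:00:00", 999),     # hypothetical other bike, must be ignored
    ],
)

# Correlated subquery, same shape as the query in (h).
correlated = conn.execute("""
    SELECT end_time,
           (SELECT SUM(duration) FROM trip AS i
            WHERE i.bike_id = 697 AND i.end_time <= o.end_time) AS ac
    FROM trip AS o
    WHERE o.bike_id = 697
    ORDER BY ac ASC
""").fetchall()

# Window-function variant: a running sum ordered by end_time.
# The default frame also includes rows tied on end_time, matching the <= above.
windowed = conn.execute("""
    SELECT end_time, SUM(duration) OVER (ORDER BY end_time) AS ac
    FROM trip
    WHERE bike_id = 697
    ORDER BY ac ASC
""").fetchall()

for row in windowed:
    print(row)
```

Both queries return the running totals 2400, 6000 and 8000 of Table 2. The window form scans the bike's trips once, while the correlated subquery re-aggregates for every output row.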

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.

(i) [10 points] (Unrecorded movement of bike #697):
Intention: Usually, but not always, if a bike ends its trip at station X, its next trip starts at station X. Occasionally, a truck collects some bikes from station X and moves them to station Y, to satisfy the higher demand there. This is what we call unrecorded movement. Find all unrecorded movements of bike #697.
Details: For each unrecorded movement, print former trip id, former end time, former end station name, latter trip id, latter start time, and latter start station name. Sort by former trip id (increasing), break ties with latter trip id (increasing).
Hint: You are strongly recommended to use the ROW_NUMBER() built-in function. It should be much faster, but again, we'll give full points for any correct solution.

    with trips_with_id as (
        select row_number() over (order by start_time nulls last) as rownum,
               id, start_time, end_time,
               start_station_name, end_station_name,
               start_station_id, end_station_id
        from trip
        where bike_id = 697
        order by trip.id asc
    )
    select t1.id, t1.end_time, t1.end_station_name,
           t2.id, t2.start_time, t2.start_station_name
    from trips_with_id as t1, trips_with_id as t2
    where t1.rownum + 1 = t2.rownum
      and t1.end_station_id != t2.start_station_id
    order by t1.id asc, t2.id asc

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.
For seemingly correct solutions (no diff output), check d.
d). -1 for using station name instead of station id to represent station (outputs in this case may be the same as the correct ones). An example is using former trip end station name != latter trip start station name.

(j) [10 points] (Data entry errors - overlapping trips):
Intention: One of the possible data-entry errors is to record a bike as being used in two different trips at the same time. Thus, we want to spot pairs of overlapping intervals (start time, end time). To keep the output manageable, we ask you to do this check for bikes with id between 500 and 600 (both inclusive).
Note: Assume that no trip has negative time, i.e., for all trips, start time <= end time.
Details: For each conflict (a pair of conflicting trips), print the bike id, former trip id, former start time, former end time, latter trip id, latter start time, latter end time. Sort by bike id (increasing), break ties with former trip id (increasing) and then latter trip id (increasing).
Hint:
1. Report each conflict pair only once, so that former trip id < latter trip id.
2. We give you the (otherwise tricky) condition for conflicts: start1 < end2 AND end1 > start2

    select t1.bike_id, t1.id, t1.start_time, t1.end_time,
           t2.id, t2.start_time, t2.end_time
    from trip as t1, trip as t2
    where t1.bike_id = t2.bike_id
      and t1.bike_id between 500 and 600
      and t1.id < t2.id
      and t1.start_time < t2.end_time
      and t2.start_time < t1.end_time
    order by t1.bike_id asc, t1.id asc, t2.id asc

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -1 for each missing/extra/wrong tuple.
For seemingly correct solutions (no diff output), check d.
d). If the where clause differs from what we give in the hint (hopefully they will take the hint), check the following boundary cases, -2 for each case missed:
    case a). t1.start_time = t2.start_time = t1.end_time = t2.end_time -> No conflict (multiple trips happening in one single minute)
    case b). t1.start_time = t2.start_time AND t1.end_time = t2.end_time AND t1.start_time != t1.end_time -> Conflict
    case c). t1.start_time = t2.start_time = t2.end_time AND t1.end_time > t1.start_time -> No conflict.
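The three boundary cases can be checked directly against the hinted condition start1 < end2 AND end1 > start2. This is a small illustrative sketch: the function name overlaps and the integer minute values are made up, and only the predicate itself comes from the hint.

```python
def overlaps(start1, end1, start2, end2):
    """Conflict condition from the hint: start1 < end2 AND end1 > start2."""
    return start1 < end2 and end1 > start2


# Times are hypothetical minute offsets; only their relative order matters.

# case a): both trips start and end in the same minute
case_a = overlaps(5, 5, 5, 5)

# case b): identical, non-empty intervals
case_b = overlaps(0, 10, 0, 10)

# case c): trip 2 is instantaneous and sits at the start of trip 1
case_c = overlaps(0, 10, 0, 0)

print(case_a, case_b, case_c)
```

The strict inequalities are what make cases a) and c) come out as no conflict. Replacing either < with <= would flag zero-length trips that merely touch another trip's start time.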

Question 2: Bike Sharing in Bay Area: Weather [25 points]

Here we try to figure out how weather impacts biking in the Bay Area.

(a) [5 points] (The most popular station on special weather days):
Intention: Find which station was visited most as the start station of trips on special weather days. A special weather day is a day that has a non-empty string for weather.events, such as Rain, Fog and so on.
Details: Print the most popular station name and the number of times it was a start station. If there is more than one most popular station, print the alphabetically first station.

    select station.station_name, t.cnt
    from station, (
        select trip.start_station_id, count(*) as cnt
        from trip, station, weather
        where station.zip_code = weather.zip_code
          and trip.start_station_id = station.station_id
          and date(trip.start_time) = weather.date
          and weather.events not like ''
        group by trip.start_station_id
        order by cnt desc, trip.start_station_id asc
        limit 1
    ) as t
    where station.station_id = t.start_station_id

a). -1 for any extra columns.
b). Get 0 for wrong result.
For seemingly correct solutions (no diff output), check c.
c). -1 for using station name instead of station id to represent station (outputs in this case may be the same as the correct ones).

(b) [5 points] (Bad weather - shorter trips?):
Intention: Among all short trips, compute the percentage of short trips that happen on special weather days. (As above, special weather day means that weather.events is not an empty string.) A short trip is a trip whose duration is <= 60 (units).
Note (update after release): Only consider weather of start stations.
Details: Print the percentage as a decimal, rounded to four decimal places using ROUND().

    select round(short_trip_on_special_weather.cnt * 1.0 / short_trip.cnt, 4) as percentage
    from (
        select count(*) as cnt
        from trip, station, weather
        where trip.start_station_id = station.station_id
          and station.zip_code = weather.zip_code
          and date(trip.start_time) = weather.date
          and duration <= 60
          and weather.events not like ''
    ) as short_trip_on_special_weather, (
        select count(*) as cnt
        from trip
        where duration <= 60
    ) as short_trip

This question was not clearly stated in the first version of the released writeup. More specifically, the weather could mean 1. weather at the start station, 2. weather at the end station, or 3. weather at the start station or end station. In all these cases, the result is 0.1600. Students will not lose points when they consider any of the three cases.

a). -1 for any extra columns.
b). Get 0 for wrong result.
For seemingly correct solutions (no diff output), check c.
c). -1 for using station name instead of station id to represent station (outputs in this case may be the same as the correct ones).

(c) [15 points] (Weather on quiet days):
Intention: Are quiet (= low activity) days correlated with bad weather? Find the stations whose least popular day (as start station) had special weather events (non-empty weather events).
Note: popularity in this question means the count of trips that started at that station, which is slightly different from the definitions in previous questions.
Note (update after release):
1. If there is more than one least popular day for a station, take the earliest day.
2. For simplicity, DO NOT take days with zero trips into account.
Details: Print station name, date (in PostgreSQL's date format, instead of timestamp), and weather events. Sort by station name (increasing).

    select station.station_name, fewest.fewest_date, weather.events
    from station, weather, (
        select o.station_id as station_id, (
            select date(start_time)
            from trip as i
            where o.station_id = i.start_station_id
            group by date(start_time)
            order by count(*) asc, date(start_time) asc
            limit 1
        ) as fewest_date
        from station as o
    ) as fewest
    where fewest.station_id = station.station_id
      and station.zip_code = weather.zip_code
      and fewest.fewest_date = weather.date
      and weather.events not like ''
    order by station.station_name asc

This question was not clearly stated in the first version of the released writeup. More specifically, that version considered all least popular days. In that case, the result contains 765 rows. Students will not lose points when considering all least popular days, but the following rubrics still hold.

a). -1 for any extra columns.
b). -2 for wrong order if results are right but order is wrong.
c). -5 for missing one column. Get 0 for missing more than one column.
d). -1 for each missing/extra/wrong tuple (choose the nearer value between 21 and 765 as base).
e). Get half score for considering not only the earliest least popular day.
For seemingly correct solutions (no diff output), check f.
f). -1 for using station name instead of station id to represent station (outputs in this case may be the same as the correct ones).