On this page you will find the weekly homeworks. We will post them approximately one week before they are due.

For each homework that involves code, you must follow these submission guidelines:

  • Submit an archive (tar, zip) containing your solution to the course CMS before the homework deadline. The archive filename must follow this pattern: info4302_hw{homework_number}_{firstname}_{lastname}.(tar.gz|zip)
  • We accept code contributions in Java, Ruby, PHP, and Python. Please make sure that you use recent language versions.
  • Your code must compile and be executable from the command line. If you use Java, you should provide build files using Apache Ant or Apache Maven. You can use IDEs such as Eclipse or NetBeans for development, but please don't send us your IDE project files.
  • Your code should follow directory-layout best practices, e.g., source code in a folder src or lib, executables in a folder bin, etc. This is language-specific and not standardized, but you will find guidelines, for instance here, on the Web.
  • The root directory of your archive file must (!!!) include a README file, which explains how we can compile and run your code and what we should expect to see. There you must also acknowledge every piece of code you borrow. It is fine to use others' code in this class, but it is not fine to represent it as your own, even accidentally. That is plagiarism and a serious offense in both the professional and academic worlds.

HW1: Introductory Reading (Due: 09/04, 11:59pm)

Read the items listed in the introductory class and answer the following questions.

  • What are the visionary aspects of V. Bush's paper with respect to the World Wide Web?
  • Compare Tim Berners-Lee's initial WWW proposal with the Web as we know it today. What are the key differences?
  • What are the key characteristics of the Web that enable the convergence of Social and Technological Networks, as described in Kleinberg's article?
  • After reading "Creating a Science of the Web", how would you explain "Web Science" to your grandmother?

Submit your answers in a single (pdf or txt) document no longer than 2 pages.

HW2: Identification and Interaction on the Web (Due: 09/11, 11:59pm)

The purpose of this assignment is to understand and practice two of the three architectural bases of the Web: Identification and Interaction. You will also learn how to use browser add-ons and cURL to debug your Web Information System. The HTTP specification included in the readings for this week is an important reference for this assignment. You will also find the Web architecture document that is part of the readings useful. Of course, you can also search the web via Google, etc. for help with many of the answers. Please submit your answers as a single text (pdf, txt) document.

Task 1: Identifiers

Consider the following identifier schemes (each nicely described in Wikipedia):

  • International Standard Book Number (ISBN)
  • Digital Object Identifier (DOI)
  • Uniform Resource Identifier (URI)
  • Persistent URL (PURL)
  • Domain Name System (DNS)

Briefly describe each of them in terms of the following characteristics:

  • Persistence: what are the mechanisms and inherent capabilities to last forever?
  • Granularity: what type of entity do they identify (documents, persons, abstract concepts...)?
  • Uniqueness: what are the mechanisms to ensure global uniqueness?
  • Governance: who, if anyone, manages them?
  • Actionability: how are they, or not, tied to an access mechanism?

Task 2: HTTP in your Web browser

Most Web browsers provide tools to monitor their HTTP interactions with Web resources. You can install the Firefox Web Developer add-on, enable Safari's Develop menu, use Chrome's Developer Tools, or use any other browser's development tools. If you haven't done so in the past, either install those tools or enable them in your browser of choice. Also, find a way to clear or empty your browser's cache. For example, in Safari this is done via a menu command; in Chrome this is done via the preferences. Once you discover how to do this, clear or empty the cache in your browser (make sure that you clear the cache rather than your entire web history, which you probably don't want to do). Now dereference http://www.cs.cornell.edu/lagoze and answer the following questions.

  • How many web resources were requested and returned as a result of dereferencing this single URI?
  • What is the nature (content-type) of each resource?
  • What is the meaning of the status code returned for each resource?
  • When you hit your browser's back button and reload the page, what has changed in the HTTP transactions and why? How does this relate to the cache that you cleared at the beginning of this exercise?

Task 3: HTTP with cURL

In this task you will use curl, a command-line HTTP utility, to examine HTTP transactions. If you are running Windows, Mac OS, or Linux, curl should already be available from a terminal window. The curl web page provides versions for all common operating systems. There are tutorials available on the Web that will help you quickly learn to use it. Take note of some useful command-line options, such as -H, which allows you to add arbitrary request headers, and -v, which verbosely displays your request headers and the corresponding response headers.

Use curl to experiment with the following HTTP GET scenarios:

  • Scenario 1: access http://www.google.com to retrieve its versions in French and Spanish.
  • Scenario 2: access http://dbpedia.org/resource/Berlin to retrieve its versions in text/html and application/rdf+xml. Describe what the resource identified as http://dbpedia.org/resource/Berlin denotes. What is the "object of interest" (using the terminology of the web architecture document) that it stands for?
  • Scenario 3: access the content/representation for the URI doi:10.1021/ci050378m through the proxy URI http://dx.doi.org/10.1021/ci050378m (note this will only work at Cornell due to licensing restrictions). Think carefully when you answer the following question. What does each of the resources (and their respective URIs) involved in accessing a representation denote (make sure to consider the DOI, the proxy, and all other URIs)?

For each scenario report the following characteristics:

  • the number of resources involved in the HTTP transaction.
  • the number of representations and their associations with the resource.
  • the role of content negotiation in the relationship between resources and representations.
  • the role of redirection in the relationship between resources and representations.

HW3: XML and XML schema languages (Due: 09/18, 11:59pm)

The purpose of this homework is to become familiar with the third pillar of the Web Architecture: Data Formats. It focuses on XML-based formats and gives you some practice with XML and XML schema languages. For working with XML, you can download trial versions of the oXygen XML Editor or Altova XMLSpy, or simply use your text editor and libxml from the console.

Task 1

Visit your favorite TV station's web site (e.g., BBC One) and look at the online program schedule it provides. Identify relevant data items and create

  • an XML Schema or RELAX NG schema for program information that has meaningful element/type names and makes use of namespaces.
  • a well-formed and valid XML instance document containing some sample data from the station's program information website.

There is obviously no single right solution for this task. Just think logically about which information provided on the program information Web site could be relevant for end users or other applications.

Task 2

Write a short program that parses your XML instance document and changes the start and end times of a single telecast. It should output the resulting information on the console.
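
A minimal sketch of this task in Python's standard library might look as follows. The element names (telecast, start, end) and the inline sample document are assumptions, not a prescribed schema; adapt them to whatever your Task 1 schema actually defines (including namespaces, which are omitted here for brevity).

```python
# Parse a (hypothetical) program-schedule document, change the start and
# end times of one telecast, and print the result to the console.
import xml.etree.ElementTree as ET

SAMPLE = """<schedule>
  <telecast id="t1">
    <title>Evening News</title>
    <start>2012-09-18T18:00</start>
    <end>2012-09-18T18:30</end>
  </telecast>
</schedule>"""

def shift_telecast(xml_text, telecast_id, new_start, new_end):
    """Return the parsed tree with one telecast's times replaced."""
    root = ET.fromstring(xml_text)
    for tc in root.iter("telecast"):
        if tc.get("id") == telecast_id:
            tc.find("start").text = new_start
            tc.find("end").text = new_end
    return root

if __name__ == "__main__":
    root = shift_telecast(SAMPLE, "t1", "2012-09-18T19:00", "2012-09-18T19:30")
    ET.dump(root)  # write the modified document to stdout
```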

What you should turn in

  • An XML schema or RELAX NG file called {your_lastname}_task1.(xsd|rng|rnc)
  • An XML instance document called {your_lastname}_task1.xml
  • The program code for task two, following the guidelines mentioned above.

HW4: XPath and XSLT (Due: 09/25, 11:59pm)

The goal of this assignment is to give you some experience using XPath and XSLT as tools for manipulating XML documents.

Task 1

Define expressions that deliver meaningful results when executed over the XML instance data you created in the previous homework (HW3). These expressions, of course, depend completely on the structure and quality of your schema.

Write an XPath expression that

  • delivers all subtitles of all telecast nodes (e.g., Series 13, Back from the Dead from the telecast Doctors)
  • counts the number of telecasts on a certain day
  • returns all nodes representing the start and end times of telecasts.
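
To prototype such expressions you do not necessarily need an XML editor. Python's ElementTree implements a limited XPath subset, enough to try out the path-and-predicate parts of the expressions above; full XPath 1.0 functions such as count() require a complete engine (e.g., the one in oXygen or lxml). The element names below are assumptions from a hypothetical HW3 schema.

```python
# Illustrative only: exercising XPath-like expressions over a tiny
# hand-written schedule with ElementTree's XPath subset.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<schedule>
  <telecast day="Mon">
    <title>Doctors</title><subtitle>Back from the Dead</subtitle>
    <start>09:00</start><end>09:45</end>
  </telecast>
  <telecast day="Tue">
    <title>News</title>
    <start>18:00</start><end>18:30</end>
  </telecast>
</schedule>""")

subtitles = doc.findall(".//telecast/subtitle")        # all subtitle nodes
monday = len(doc.findall(".//telecast[@day='Mon']"))   # count via len(); real XPath would use count()
times = doc.findall(".//telecast/start") + doc.findall(".//telecast/end")

print([s.text for s in subtitles], monday, len(times))
```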

Task 2

Using the file driving.xml as the source file, transform it into HTML that renders it in a manner similar to that shown in the file driving.pdf. You will find these two files in hw4.zip on the course CMS site, attached to this assignment.

A few additional guidelines:

  • Your output HTML must be valid HTML.
  • Don't sweat over making your output HTML an exact match for the example in driving.pdf. Your output should match the general placement of information in the example, but exact spacing and exact wording can differ.
  • Don't sweat over issues such as CSS formatting of the HTML. I really don't care about this at this point.
  • Your code should use rule-based logic (good XSLT) rather than a procedural or sequential programming style (bad XSLT). Let me make clear what differentiates good XSLT from bad XSLT.
    • BAD XSLT: it is entirely possible to write an XSLT program in purely procedural form using standard for loops and conditional statements. For example, I've seen programs that had a single template pattern-match rule for the root of the input XML tree, and then a body for that rule which iterated through various tree nodes performing manipulations. While such a program works, it represents a misuse of the language, which is a poor procedural language but a really good tree-walking and pattern-matching language.
    • GOOD XSLT: as demonstrated in class, the heart of XSLT is a set of pattern-matching rules (templates) that match during a tree walk (either the default depth-first walk or one directed via attributes on the template and apply-templates statements). A good XSLT program exploits these capabilities and uses procedural statements only where absolutely necessary.
    So, we want you to write a GOOD XSLT program. Full credit will not be awarded if you follow the less preferred, procedural route.

What you should turn in: a single zip file containing:

  • A txt file with your three XPath expressions for task 1.
  • Your xslt file for task 2.

HW5: RESTful Web Services (Due: 10/8, 1pm)

The goal of this homework is to give you experience in building RESTful Web Services and to become familiar with the design principles of Resource Oriented Architectures as discussed in the lectures. In this homework you can reuse results from HW3 and HW4.

Task 1

Design and implement a program information web service that follows the principles of a RESTful architecture and operates on the (if necessary, extended) XML data you created in HW3. The service must deliver XML and implement the following use cases (UC):

  • UC1: Retrieve a list of telecasts from the server. The list should contain just the title and the start and end times for each telecast, not the full details.
  • UC2: Retrieve the details for a certain telecast. These should include all available details (e.g., subtitle, synopsis) and the user ratings for this telecast.
  • UC3: Rate a telecast. The client can rate a certain telecast with 1-5 stars and the server calculates the average of all ratings. For this homework, it is not necessary to store the individual ratings of each user on the server. Just calculate the average.

Follow the design methodology outlined in this tutorial. Think carefully about the resources you need to expose, their relationships with each other and which methods of the uniform HTTP interface they need to implement. Also implement the principle of connectedness in your resource representations.

Persisting data in a database is not necessary for this homework. Just make sure that your service reads some sample data when it starts up and can answer client requests based on that data. You can keep everything in memory and lose data when your service shuts down. Your service should deliver XML resource representations.

Feel free to experiment with any RESTful programming framework (Ruby on Rails, Django, etc.). You can also write a simple service that implements functions/methods for UC1-UC3, calls these functions depending on an HTTP request's URI, and delivers the desired response or performs the requested action. If you use Java, implementing a Java Servlet should be sufficient. If you use PHP, you might need Apache's mod_rewrite module to handle all HTTP requests in a single script file. With Ruby you can use the built-in WEBrick Web Server toolkit for implementing servlets; in Python it should be sufficient to implement a simple HTTPServer and an HTTPRequestHandler.
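
The Python route can be sketched with the standard library alone. The paths, the in-memory sample record, and the XML shapes below are illustrative assumptions, not a required design; a full solution also needs UC2 and UC3 handlers and connected resource representations.

```python
# Minimal routing sketch for a RESTful program-information service
# using Python's built-in http.server. Data and paths are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory sample data (assumption); a real service would load your HW3 XML.
TELECASTS = {"1": {"title": "Evening News", "start": "18:00", "end": "18:30",
                   "ratings": []}}

def telecast_list_xml(data):
    """UC1: titles and times only, no full details."""
    items = "".join(
        f'<telecast id="{i}"><title>{t["title"]}</title>'
        f'<start>{t["start"]}</start><end>{t["end"]}</end></telecast>'
        for i, t in data.items())
    return f"<telecasts>{items}</telecasts>"

def average_rating(ratings):
    """UC3: the server only keeps the running average, not individual votes."""
    return sum(ratings) / len(ratings) if ratings else 0.0

class ProgramHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/telecasts":                       # UC1
            self._reply(200, telecast_list_xml(TELECASTS))
        else:                                               # UC2 would go here
            self._reply(404, "<error>not found</error>")

    def do_POST(self):
        # UC3 (rating a telecast) would be dispatched on the request URI here
        self._reply(501, "<error>not implemented in this sketch</error>")

    def _reply(self, status, body):
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ProgramHandler).serve_forever()
```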

What you should turn in

A zip/tar.gz file info4302_hw5_{firstname}_{lastname}.{zip|tar.gz} containing your program code, the sample data file, and a detailed README file that explains how to run and test UC1-UC3 with your service.

HW6: Basic RDF construction and programming (Due: 10/23, 11:59PM)

The purpose of this homework is to give you experience with using the Resource Description Framework (RDF) to express basic knowledge assertions and to create those assertions using an RDF API. The assignment is based on the following three paragraphs of basic (albeit stilted) natural language statements:

A man has the given name John and the family name Smith. John Smith's birthday is April 5, 1991. John Smith's home page is http://smith.org/me. John Smith knows Mary Jones. John Smith lives in Ithaca, NY.

A woman has the name Mary Jones. Mary Jones attends school at Cornell University. Mary owns the book that has the title Been Down So Long Looks Like Up to Me. Mary Jones owns a BMW. Mary Jones has a blog at http://mary.org/myblog. Mary Jones lives at street 11 Main St., city Anyville, state NY.

A book has the title Been Down So Long Looks Like Up to Me. The book has an author Richard Farina. The book is published by Penguin Classics. The book has the ISBN 978-0140189308. Richard Farina went to school at Cornell University.

The instructions for the homework are as follows:

  1. Model the three paragraphs as RDF statements. In as many cases as possible use elements (classes and properties) from the FOAF and Dublin Core vocabularies. Hint: although most facts can be expressed in these vocabularies, there are a few that can't, in which case you should use a single new namespace of your choosing. Make sure you use the correct namespace URIs for all vocabularies. You will notice that most of the assertions in the paragraphs are binary, but there are a few that are n-ary, which will require the use of b-nodes. In as many cases as possible, associate classes (types) with resources. Also, use resources rather than literals as much as possible, although there are some cases where a literal object in a statement is appropriate.
  2. Write a (Java, PHP, Python, Ruby) program that employs an RDF API (there is a list of APIs on the readings page). Your program should write out your model in RDF/XML on completion. Hint: if you view your RDF/XML in the W3C RDF Validation Service it should be a fully connected graph.
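
To illustrate the shape of the triples the first paragraph implies, here is a standard-library-only sketch. It is not a substitute for the required RDF API (the API also handles b-nodes and RDF/XML serialization for you), and John's URI is an invented placeholder; the FOAF and RDF namespace URIs are the real ones.

```python
# Illustration only: a few of the paragraph-one facts as subject/predicate/
# object triples, serialized by hand in N-Triples form.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
JOHN = "http://example.org/people#john"   # placeholder URI for John Smith

triples = [
    (JOHN, RDF + "type", FOAF + "Person"),
    (JOHN, FOAF + "givenName", '"John"'),      # quoted strings are literals
    (JOHN, FOAF + "familyName", '"Smith"'),
    (JOHN, FOAF + "homepage", "http://smith.org/me"),
]

def to_ntriples(triples):
    """Serialize (s, p, o) tuples; literal objects are already quoted."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```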

Please submit your solution to CMS according to the general instructions for code homework. IMPORTANT: If your program is not runnable according to your instructions and/or you fail to include all required libraries, it will get zero credit.

HW7: SKOS vocabularies and SPARQL queries (Due: 10/30, 11:59PM)

The goal of this homework is to give you hands-on experience with the Simple Knowledge Organization System (SKOS) and the SPARQL query language and protocol.

Task 1: Create a SKOS Vocabulary

Choose a domain of interest and create a controlled vocabulary for that domain using the Simple Knowledge Organization System (SKOS). You can design a vocabulary for recipes, sports, cars, beer, or whatever else comes to mind and could be useful in an information system. Your vocabulary should be of a reasonable size (~ 20-30 concepts) and demonstrate your understanding of SKOS. It should fulfill the following criteria:

  1. Each concept must have a skos:prefLabel and should, if applicable, also have skos:altLabel and skos:hiddenLabel properties
  2. Some concepts should have hierarchical (skos:broader, skos:narrower) and some should have associative (skos:related) relationships
  3. If applicable, concepts should be mapped to similar concepts in other Linked Data sources (see http://www.w3.org/TR/skos-primer/#secnetworking)
  4. For some concepts you should reuse information from other sources (e.g., DBpedia abstracts as value for documentary concept notes)
  5. Your resulting SKOS model must be consistent. Use this (http://demo.semantic-web.at:8080/SkosServices/check) service to validate your result.
  6. You must describe your vocabulary with metadata (title, your name, etc.)

You can write your SKOS vocabulary by hand in any RDF serialization format, but we strongly recommend using the PoolParty demo server we set up for this course. You will receive more detailed access instructions by email, and in our Tuesday class we will also give a brief introduction to PoolParty.

Task 2: SPARQL Queries against DBpedia

Formulate SPARQL queries that, when executed manually against http://dbpedia.org/sparql, answer the following questions:

  1. All films directed by Stanley Kubrick
  2. The English and, if available, Spanish abstracts of ten 1980s horror movies
  3. The human-readable names (in English) and the birth dates of all actors who starred in movies directed by Stanley Kubrick. Note that the result should also contain actors with (according to Wikipedia) unknown birth dates.

Task 3: Programmatic SPARQL execution

Write a small program that executes the SPARQL queries from Task 2 against the DBpedia SPARQL endpoint and prints out the results.
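
One way to do this without any third-party library is to send the query to the endpoint over HTTP and parse the SPARQL JSON results format. The sketch below deliberately uses a trivial placeholder query, not one of the Task 2 answers; result-format selection via the Accept header follows the SPARQL protocol.

```python
# Sketch: programmatic SPARQL execution against DBpedia with the
# standard library only. The query shown is a placeholder.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

def build_request(query, endpoint=ENDPOINT):
    """GET request for a query, asking for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(endpoint + "?" + params)
    req.add_header("Accept", "application/sparql-results+json")
    return req

def run_query(query):
    """Execute the query and return the list of result bindings."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return json.load(resp)["results"]["bindings"]

if __name__ == "__main__":
    q = ("SELECT ?p ?o WHERE "
         "{ <http://dbpedia.org/resource/Berlin> ?p ?o } LIMIT 5")
    for row in run_query(q):
        print(row["p"]["value"], row["o"]["value"])
```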

What you should turn in

  • The RDF-serialized SKOS file and a README-SKOS.TXT file that explains how your vocabulary implements criteria 1-5
  • A file SPARQL.TXT containing your SPARQL queries
  • Your program / script for executing the queries and detailed instructions how we can test it

HW8: Linked Data Publishing (Due: 11/6, 11:59pm)

The goal of this homework is to become familiar with Linked Data Publishing and to learn how to serve RDF using a custom server-side script or servlet. This homework builds on HW3, HW5 and HW6 and you can reuse existing data and code.

Task 1: RDF Program Information Dataset

Write a program that creates RDF representations from your existing program information data. You can do this by creating an XSLT stylesheet that transforms your XML data into RDF/XML, which you can then load into your application using an RDF API such as Jena. Alternatively, you can write a small program that reads your XML document, iterates through your data, and adds triples/statements to an RDF model. For describing the semantics of your data, you can use the terms defined by the BBC Programmes Ontology and/or define your own vocabulary. Serialize your program information dataset to a file using the Turtle syntax.
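
The second (non-XSLT) route can be sketched as follows. Everything except the ontology namespace URIs is a placeholder: the telecast element names come from a hypothetical HW3 schema, the base URI mirrors the service URLs used below, and you should check the BBC Programmes Ontology for the class and property terms that actually fit your data.

```python
# Sketch: walk the HW3 XML and emit Turtle by hand. A real solution may
# instead build a model with an RDF API and let it serialize Turtle.
import xml.etree.ElementTree as ET

PO = "http://purl.org/ontology/po/"                 # BBC Programmes Ontology
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def xml_to_turtle(xml_text, base="http://localhost:8080/myservice/telecasts/"):
    """Emit one Turtle description per telecast element (placeholder terms)."""
    root = ET.fromstring(xml_text)
    lines = [f"@prefix po: <{PO}> .", f"@prefix rdfs: <{RDFS}> .", ""]
    for tc in root.iter("telecast"):
        lines.append(f"<{base}{tc.get('id')}> a po:Broadcast ;")
        lines.append(f'    rdfs:label "{tc.findtext("title", "")}" .')
    return "\n".join(lines)

if __name__ == "__main__":
    sample = ('<schedule><telecast id="1">'
              "<title>Evening News</title></telecast></schedule>")
    print(xml_to_turtle(sample))
```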

Task 2: Linked Data Service

Modify your service from HW5 so that it reads in your program information graph and implements the 303 URI pattern, as described here, for UC2. Please make sure that your service exposes dereferenceable URIs and can serve both RDF and simple HTML representations. We will test your service using cURL and execute HTTP GET requests against the URLs your service provides. For instance:

  • curl -I -H "Accept: application/rdf+xml" http://localhost:8080/myservice/telecasts/1234 -> should show the HTTP response that redirects the client to the RDF document URI
  • curl -I -H "Accept: text/html" http://localhost:8080/myservice/telecasts/1234 -> should show the HTTP response that redirects the client to the HTML document URI
  • curl -H "Accept: application/rdf+xml" http://localhost:8080/myservice/telecasts/1234.rdf -> should deliver an RDF/XML-serialized RDF document
  • curl -H "Accept: text/html" http://localhost:8080/myservice/telecasts/1234.html -> should deliver an HTML document

If you are using custom vocabulary URIs (e.g., http://localhost:8080/myservice/vocab/Telecast) in your data, deliver at least some minimal information, such as an rdfs:label, when they are dereferenced. You can ignore q-values and expect that the HTTP Accept header will always contain only one MIME type.

What you should turn in

  • A file {netid}_{hw8}_data.ttl containing your program information dataset
  • A file {netid}_{hw8}_README.TXT explaining how we can set-up, run, and test your service
  • An archive {netid}_{hw8}_app.{zip|tar} containing your source code

HW9: Ontologies (Due: 12/2, 11:59pm)

The goal of this homework is to become familiar with OWL ontologies and Protege.

Task 1: Explain ontology meaning

  1. Download the file animals.owl from the CMS.
  2. Examine the file using Protege, a text editor, or the RDF validator.
  3. Write, in simple English, a list of the assertions in the ontology. I am looking for statements like the following (for a fragment of a fictional ontology). You don't have to use this exact language, but try to be precise and express all the meaning in the ontology, including distinguishing between necessary and necessary & sufficient conditions, and between universal and existential conditions.
    • A human is a class
    • A man is a class that is a subclass of a human
    • A woman is a class that is a subclass of a human
    • A necessary condition of the class child is that members must have one mother and one father.
    • A necessary and sufficient condition of the class father is that individuals are men and they have at least one hasChild relationship to an individual of class child
    • isMotherOf is a property that has the range child and domain mother
    • Every child that is the object of a hasChild relationship to a mother must be a human.
    • Some child of a ProudParent must attend Cornell
    • hasFather is a property that has the domain child and range father

Task 2: Ontology editing

Enhance the ontology as follows (make sure to read the statements carefully, distinguishing between necessary and necessary & sufficient conditions, and between universal and existential conditions):

  • Define a class meat-eater that is defined as an animal that eats animals.
  • Define a class zebra that is a plant-eater that eats only leaves.
  • Define a class tiger that is an animal that eats only plant-eaters.
  • Define a class yummy-plant that is defined as a plant eaten by both meat-eaters and plant-eaters.
  • Assert that the subject of the eats property is an animal.
  • Assert that if R1 is-part-of R2 and R2 is-part-of R3, then R1 is-part-of R3.
  • Assert a property is-eaten-by so that R1 eats R2 implies R2 is-eaten-by R1.

Task 3: Creating an ontology

The purpose of this task is to give you experience creating an OWL ontology from scratch. An important key to doing this assignment successfully is to tightly scope your ontology. Do not try to model too large a concept space. For example, an ontology modeling the U.S. government is probably too much of an undertaking. However, the following are workable:

  • the fraternity system at Cornell
  • the participants in a single sport (e.g. basketball: forwards, coaches, cheerleaders, centers, tall players, etc.)
  • the clothing you own (outerwear, shirts, dress wear, winter clothing, etc.)

Whatever you choose, your ontology should have the following characteristics:

  • your explicit hierarchy should be at least two levels deep below owl:Thing. It can be deeper, but going more than three levels is probably too complicated.
  • it should include some properties with domains and ranges.
  • It should include at least two of the following property characteristics: transitive, inverse, symmetric, functional, inverse functional.
  • it should include asserted conditions that are sufficient to infer an implicit polyhierarchy from reasoning over your explicit hierarchy.
  • It must be consistent.

What you should turn in

  • A text file {netid}_{hw9}.txt containing your ontology explanation
  • A file {netid}_{hw9-1}.xml containing the serialization of your modified ontology
  • A file {netid}_{hw9-2}.xml containing the serialization of your new ontology