Cornell Information Science


INFO 295: Mathematical Methods for Information Science

Tue/Thu 2:55-4:10 PM Hollister 110
4 credits, S/U Optional
Co-requisite: Math 231 or equivalent

This course teaches basic mathematical concepts in information modeling. Topics to be covered include graph theory, discrete probability, finite automata, Markov models, and hidden Markov models. We will be guided by examples and applications from various areas of information science, such as the structure of the Web, genome sequences, natural languages, and signal processing.


Professor: Paul Ginsparg (301 College Ave, 5-7371, ginsparg@cornell.edu)
Office hours: Wednesdays 2-3 PM (or by appointment)
Course website: http://www.infosci.cornell.edu/Courses/info295/2006fa/ (this page)

Note: This is the course site for Fall 2006
The course site for Fall 2007 is here

Lecture 1 (Thu 24 Aug 06)

Intro, overview of topics to be discussed. Background discussion of Set Theory.
(See also INFO 295 Fall 2005)

Lecture 2 (Tue 29 Aug 06)

Probability and Counting 1 (Probability space, events, binomial coefficients, conditional probability).
See also counting examples.
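As an illustrative sketch (not part of the course materials), the binomial-coefficient count of equally likely outcomes can be checked in a few lines of Python:

```python
from math import comb

def p_exact_heads(n, k):
    """Probability of exactly k heads in n fair coin flips:
    C(n, k) favorable sequences out of the 2^n equally likely ones."""
    return comb(n, k) / 2**n

# e.g. exactly 2 heads in 4 flips: C(4,2) = 6 of the 16 sequences
```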

Lecture 3 (Thu 31 Aug 06)

More on conditional probability: Bayes' Law and examples. (see also Bayesian inference)
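A small illustrative sketch of Bayes' Law (the diagnostic-test numbers below are made up, not taken from lecture):

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Law: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# made-up diagnostic test: 99% sensitivity, 5% false-positive rate,
# 1% prevalence
p_a = 0.01                # P(disease)
p_b_given_a = 0.99        # P(positive | disease)
p_b_given_not_a = 0.05    # P(positive | no disease)
# total probability of a positive result
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = bayes(p_b_given_a, p_a, p_b)  # P(disease | positive)
```

Note that despite the accurate test, the posterior is only about 1/6, the usual base-rate surprise.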

Lecture 4 (Tue 5 Sep 06)

More on independent events, binomial distributions, random variables, expectation and variance (Probability and Counting 2)

Problem Set 1

Lecture 5 (Thu 7 Sep 06)

Cancelled due to conflict with the Cornell presidential inauguration.

Lecture 6 (Tue 12 Sep 06)

Continued Probability and Counting 2 notes on independent events, expectation and variance of Bernoulli distributions

Lecture 7 (Thu 14 Sep 06)

Some comments on normal distributions and the central limit theorem (see also), and review of exponentials and logarithms.

Lecture 8 (Tue 19 Sep 06)

Discuss www-admin spam, introduce A plan for spam (Graham, 2002) (plus "naive Bayes" and combining probabilities).
(See also section 6.3 of Rosen, and "naive Bayes" document classification and Bayesian spam filtering.)
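For reference, Graham's rule for combining per-token spam probabilities (the naive-Bayes odds form with uniform priors described in "A plan for spam") can be sketched as follows; the probabilities below are made up:

```python
from math import prod

def combine(probs):
    """Combine per-token spam probabilities p1..pn a la Graham:
    p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn))."""
    num = prod(probs)
    return num / (num + prod(1 - p for p in probs))

# two fairly "spammy" tokens push the combined estimate near 1
```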

Problem Set 2

Lecture 9 (Thu 21 Sep 06)

Completed A plan for spam (Graham, 2002) and follow-up (Graham, 2003), using more details on "naive Bayes".

Lecture 10 (Tue 26 Sep 06)

Some final comments on combining probabilities (p(L or R | O) vs p(O | L and R)).
Start Graph Theory 1 (Basic definitions, isomorphisms, paths)

Lecture 11 (Thu 28 Sep 06)

Graph Theory 2 (Eulerian and Hamiltonian circuits)
(See also Hamiltonian circuit)
Graph Theory 3 (Trees, planarity, colorings)
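Euler's criterion from Graph Theory 2 — a connected (multi)graph has an Eulerian circuit iff every vertex has even degree — is easy to sketch in Python (connectivity is assumed here, not checked):

```python
from collections import defaultdict

def has_eulerian_circuit(edges):
    """Euler's theorem: a connected (multi)graph has an Eulerian
    circuit iff every vertex has even degree.
    edges is a list of (u, v) pairs; connectivity is assumed."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return all(d % 2 == 0 for d in deg.values())
```

For example, the seven-bridge Konigsberg multigraph fails the test, while a triangle passes.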

Lecture 12 (Tue 3 Oct 06)

Finish Graph Theory 3 (Euler's formula), the 6-color theorem (and 5-color theorem).

Lecture 13 (Thu 5 Oct 06)

Graph Theory 4 (Graph algorithms), and application to clustering.

Problem Set 3


Fall Break (Tue 10 Oct 06)

Lecture 14 (Thu 12 Oct 06)

Finish Graph Theory 4 (Graph algorithms). Some comments on Erdős–Rényi random graphs, Milgram's "small world" experiment, and Collective dynamics of 'small-world' networks (Watts/Strogatz, 1998).

Lecture 15 (Tue 17 Oct 06)

More comments on small-world networks, and a brief discussion of quantitative methods applied to social networks: node and link centrality (browse sections 1.1, 1.2, 1.3.6, and 1.4 [pp 1-3, 14-25] of Who is the best connected scientist? (Newman, 2003)).

Recapped a brief history of the Web, started to discuss the PageRank algorithm.

See also historical background material:
1) Vannevar Bush, As We May Think, Atlantic Monthly, July 1945
2) The World Wide Web: A very short personal history, Tim Berners-Lee
3) The Anatomy of a Large-Scale Hypertextual Web Search Engine (Brin/Page, 1998), equivalent html (sections 1-3, pp 1-6)

Lecture 16 (Thu 19 Oct 06)

Continued discussion of the PageRank algorithm and the Perron-Frobenius theorem.
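A minimal power-iteration sketch of PageRank (using the usual damping factor d = 0.85 and a uniform teleport term; the 3-page link graph below is hypothetical, not from lecture):

```python
def pagerank(links, d=0.85, iters=100):
    """Power iteration for PageRank.
    links[i] lists the pages that page i links to; d is the
    damping factor from the Brin/Page paper."""
    n = len(links)
    r = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n   # uniform "teleport" contribution
        for i, outs in enumerate(links):
            if outs:
                for j in outs:    # split page i's rank over its out-links
                    new[j] += d * r[i] / len(outs)
            else:                 # dangling page: spread rank uniformly
                for j in range(n):
                    new[j] += d * r[i] / n
        r = new
    return r

# hypothetical 3-page web: 0 -> 1, 1 -> 0 and 2, 2 -> 0
ranks = pagerank([[1], [0, 2], [0]])
```

Repeated application of the (damped) transition matrix converges to its dominant eigenvector, which is where the Perron-Frobenius theorem comes in.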

Lecture 17 (Tue 24 Oct 06)

Completed discussion of the PageRank algorithm, discussed other features used in ordering heuristics, mentioned "Google bombing" (NYTimes, 22 Jan 04) and Authoritative sources in a hyperlinked environment (Kleinberg, 1998).
Comments on random graphs and Poisson distributions (problem 4 of midterm).

Midterm (24--31 Oct)

Lecture 18 (Thu 26 Oct 06)

Some additional comments on Poisson distributions.
Finite State Automata 1 (alphabet, DFA, regular sets)

Lecture 19 (Tue 31 Oct 06)

De Bruijn sequences, many examples of DFAs
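For reference, a B(k, n) de Bruijn sequence can be generated with the standard recursive Lyndon-word (FKM) construction; this is a generic sketch, not the construction shown in class:

```python
def de_bruijn(k, n):
    """B(k, n) de Bruijn sequence over the alphabet {0..k-1}: a cyclic
    sequence of length k^n containing every length-n string exactly
    once. Standard recursive Lyndon-word (FKM) construction."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq
```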

Lecture 20 (Thu 2 Nov 06)

Finite State Automata 2 (NFA, subset construction) and example of subset construction.
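The subset construction itself fits in a few lines of Python; this sketch (epsilon moves omitted for brevity, and state/alphabet names are made up) builds only the reachable DFA states:

```python
def subset_construction(nfa, start, alphabet):
    """Determinize an NFA by the subset construction.
    nfa[(q, a)] is the set of successor states of q on symbol a;
    missing keys mean no transition. Returns the DFA transition
    table (keyed by frozensets of NFA states) and its start state."""
    start_set = frozenset([start])
    dfa = {}
    todo = [start_set]
    seen = {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q2 for q in S for q2 in nfa.get((q, a), ()))
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa, start_set
```

Only subsets reachable from the start set are generated, which is why the resulting DFA is usually far smaller than the worst-case 2^n states.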

Lecture 21 (Tue 7 Nov 06)

Comments on midterm election problem.
Finite State Automata 3 (Regular expressions)

Problem Set 4

Lecture 22 (Thu 9 Nov 06)

Connection between NFA's and regular expressions (examples). Construction of automaton for A* from automaton for language A. More examples of the regexp/NFA connection, examples of language parsing and number recognition. Colorless green ideas sleep furiously.

Lecture 23 (Tue 14 Nov 06)

Ultimate periodicity and constraints on the lengths of elements of regular sets. More on Chomsky transformational grammars and NFA's for gapped sequence alignment. Examples of regular grammars, and an example of a context-free grammar (SCIgen - An Automatic CS Paper Generator).

Lecture 24 (Thu 16 Nov 06)

Started discussion of Markov Chains 1 (Basic definitions, transition matrix). Markov Chains 2 (Ergodic chains, stationary distributions)
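A sketch of finding the stationary distribution of an ergodic chain by repeatedly applying the transition matrix (the 2-state matrix below is made up for illustration):

```python
def stationary(P, iters=500):
    """Power iteration for the stationary distribution pi = pi P
    of an ergodic Markov chain; P[i][j] = Pr(next = j | current = i)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# made-up 2-state chain: state 0 is "sticky", state 1 is not
pi = stationary([[0.9, 0.1], [0.5, 0.5]])
```

For this matrix the balance equation 0.1 pi_0 = 0.5 pi_1 gives pi = (5/6, 1/6).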

Lecture 25 (Tue 21 Nov 06)

More on PageRank as a Markov process, "Mark V Shaney" and other examples generated from trigram probability distributions, worked example, genomics example from Durbin et al pp 46-57 (handout in class).
Examples of Markov chains to determine the expected number of coin flips to reach a given number of heads in a row.
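The coin-flip example can be solved by back-substitution on the chain's expected-hitting-time equations; this sketch assumes a fair coin, with states 0..n tracking the current run of heads:

```python
def expected_flips(n):
    """Expected number of fair-coin flips to first see n heads in a
    row. From state i (current run of i heads):
        E[i] = 1 + 0.5*E[i+1] + 0.5*E[0],  with E[n] = 0.
    Writing E[i] = a + b*E[0] and back-substituting from state n
    down to state 0 lets us solve E[0] = a / (1 - b)."""
    a, b = 0.0, 0.0              # E[n] = 0
    for _ in range(n):
        a, b = 1 + 0.5 * a, 0.5 * (1 + b)
    return a / (1 - b)
```

The closed form is 2^(n+1) - 2, e.g. 6 flips on average for two heads in a row.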

Problem Set 5

Lecture 26 (Tue 28 Nov 06)

Introductory parts on Hidden Markov Models from A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989), pp 257-261
Set up example of Viterbi by hand.
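For reference, a compact Viterbi sketch in Python; the weather/activity numbers below are the common textbook illustration, not the example worked in class:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path by dynamic programming.
    Returns (probability, path). All tables are plain dicts."""
    # V[s] = (best probability of any path ending in s, that path)
    V = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        V = {
            s: max(
                (V[r][0] * trans_p[r][s] * emit_p[s][o], V[r][1] + [s])
                for r in states
            )
            for s in states
        }
    return max(V.values())

# standard illustrative HMM (hidden weather, observed activity)
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
prob, path = viterbi(("walk", "shop", "clean"),
                     states, start_p, trans_p, emit_p)
```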

Lecture 27 (Thu 30 Nov 06)

Finish example of Viterbi by hand.
Example of max probability state not on maximum probability path.
Example of CpG islands from Durbin et al pp 46-57 (handout).
Example from Bursty and Hierarchical Structure in Streams.


Final Exam

Take Home: available 2 p.m. Tues 5 Dec, due 2 p.m. Fri 8 Dec (should take no more than a few hours...). Contact me by e-mail with any questions, and I'll have office hours on Wed afternoon, 2-4.


Background text for Lectures 1-5: Chpt 2, Sections 5.3, 5.4, and Chpt 6 of Rosen, Discrete Mathematics and its Applications (6th edition), on reserve at the engineering library, also used by CS280. The textbook has an associated website with supplementary material. (These are the sections on Sets, Permutations and Combinations, Binomial Coefficients, Probability Theory, Bayes' Theorem, Expected Value and Variance. Note that in earlier editions of the book these sections were in Chpts 1 and 4.)
For lectures 6--8, see Rosen sections 6.3, 6.4 (Bayes' Thm, Expected Value and Variance)
For lectures 10--13, see Rosen chapters 9, 10 (Graphs, Trees)
For lectures 18--22, see Rosen sections 12.2-12.4


Advising Notes

This course is a required course for the new Information Science (IS) major in the College of Arts and Sciences and the College of Agriculture and Life Sciences. It can also be used to satisfy the math/stats course requirement for the Information Science minor/concentration.
INFO295 and CS280 may not both be taken for credit towards the IS major.