INFO 295: Mathematical Methods for Information Science

Tue/Thu 2:55-4:10 PM
Information Science Building, 301 College Ave, Room 130
4 credits, S/U Optional
Co-requisite: Math 231 or equivalent

This course teaches basic mathematical concepts in information modeling. Topics to be covered include graph theory, discrete probability, finite automata, Markov models and hidden Markov models. We will be guided by examples and applications from various areas of information science such as: the structure of the Web, genome sequences, natural languages, and signal processing.


Professor: Paul Ginsparg (301 College Ave, 5-7371, ginsparg@cs.cornell.edu)
Office hours: Wednesdays 2-3 PM (or by appointment)
Course website: http://courses.cit.cornell.edu/cis295_2004fa/ (this page)

Note: This is the course site for Fall 2004
The course site for Fall 2007 is here

Lecture 1 (Thu 26 Aug 04)

Intro, overview of topics to be discussed. Background discussion of Set Theory.
(See also CIS 295 Spring 2004)

Lecture 2 (Tue 31 Aug 04)

Probability and Counting 1 (Probability space, events, binomial coefficients, conditional probability)

Lecture 3 (Thu 2 Sep 04)

Probability and Counting 2 (Bayes law, random variables, expectation)

Lecture 4 (Tue 7 Sep 04)

Continued Probability and Counting 2, discussed www-admin spam, introduced A plan for spam (Graham, 2002).

Problem Sets 1,2

Lecture 5 (Thu 9 Sep 04)

Finish Probability and Counting 2, continue A plan for spam (specifically "naive Bayes" and combining probabilities).

Lecture 6 (Tue 14 Sep 04)

Finish A plan for spam, plus more details on "naive Bayes".

Lecture 7 (Thu 16 Sep 04)

Some final comments on combining probabilities ( p(L or R | O) vs p(O | L and R) ).
Started Graph Theory 1 (Basic definitions, isomorphisms, paths)

Lecture 8 (Tue 21 Sep 04)

More Graph Theory 2 (Eulerian and Hamiltonian circuits)
(See also Hamiltonian circuit)

Lecture 9 (Thu 23 Sep 04)

More Graph Theory 3 (Trees, planarity, colorings)

Problem Sets 3,4

Lecture 10 (Tue 28 Sep 04)

More Graph Theory 4 (Graph algorithms)
Mentioned Vannevar Bush, As We May Think, Atlantic Monthly, July 1945

Lecture 11 (Thu 30 Sep 04)

The World Wide Web: A very short personal history, Tim Berners-Lee
The Anatomy of a Large-Scale Hypertextual Web Search Engine (Brin/Page, 1998), equivalent html

Lecture 12 (Tue 5 Oct 04)

internet != www. Illustrated Perron-Frobenius theorem, continued Brin/Page (1998), illustrated calculation of page rank and other features used in ordering heuristics, mentioned "Google bombing" (NYTimes, 22 Jan 04)

Lecture 13 (Thu 7 Oct 04)

Authoritative sources in a hyperlinked environment (Kleinberg, 1998)
digression on small world networks: Collective dynamics of 'small-world' networks (Watts/Strogatz, 1998)
Quantitative methods applied to social networks. See Who is the best connected Scientist? (Newman, 2003)

(12 Oct 04, fall break)

Lecture 14 (Thu 14 Oct 04)

Finite State Automata 1 (alphabet, DFA, regular sets)

Lecture 15 (Tue 19 Oct 04)

Some comments on recessive X-linked carrier probabilities (e.g., Duchenne Muscular Dystrophy) and relation to problem of casino with some loaded dice. Discussion of counting equal heads/tails coin flips using Stirling's formula (anticipating the voting problem).

Lecture 16 (Thu 21 Oct 04)

Finite State Automata 2 (NFA, subset construction)

Lecture 17 (Tue 26 Oct 04)

midterm

Lecture 18 (Thu 28 Oct 04)

more on Finite State Automata 2 (NFA, subset construction)
(comment on analog vs digital computers), example of subset construction

Lecture 19 (Tue 2 Nov 04)

Finite State Automata 3 (Regular expressions)

Lecture 20 (Thu 4 Nov 04)

review of some midterm answers. connection between DFA's and regular expressions, DFA's for language parsing

Problem Sets 5,6

Lecture 21 (Tue 9 Nov 04)

review rest of midterm answers. construction of automaton for A* from automaton for language A.

Lecture 22 (Thu 11 Nov 04)

ultimately periodiocity and constraints on lengths of elements of regular sets. more examples of language parsing

Lecture 23 (Tue 16 Nov 04)

continue examples of language parsing. completion of finite state automata and regular expressions (examples). start discussion of Markov Chains 1 (Basic definitions, transition matrix)

Lecture 24 (Thu 18 Nov 04)

introductory parts from A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989), 257-259
See also "Biological Sequence Analysis", by Durbin, Eddy, Krogh, Mitchison (ISBN: 0521629713) pp 46-48 (Class handout)

Lecture 25 (Tue 23 Nov 04)

6 color problem
Markov Chains 2 (Ergodic chains, stationary distributions)
Start HMMs from Rabiner review, pp 259-261 (linked from previous lecture)

Problem Set 7

Lecture 26 (Tue 30 Nov 04)

Continue HMMs from Rabiner review, pp 261-263 (linked from lecture 24)

Lecture 27 (Thu 2 Dec 04)

Worked out example of Viterbi by hand.
More comments on HMMs from Rabiner review, pp 263-264 (and sketched pp 275-284) and Durbin et al pp 48-57 (earlier class handout).

Final Exam

Take Home: available noon Tues 7 Dec, due 1 p.m. Fri 10 Dec (should take no more than a few hours...)


Background text for Lectures 2,3: Chpt 4 of Rosen, Discrete Mathematics and its Applications (on reserve at engineering library, also used by cs 280)

Background text for Lectures 14,16,18,19: D. Kozen, Automata and Computability, pp 1-54.

Background text for Lectures 23-25 Sheldon Ross, Stochastic Processes


Advising Notes

This course is a required course for the new Information Science (IS) major in the College of Arts and Sciences and the College of Agriculture and Life Sciences. It can also be used to satisfy the math/stats course requirement for the Information Science minor/concentration.
INFO295 and CS280 may not both be taken for credit towards the IS major.