INFO 295: Mathematical Methods for Information Science
Tue/Thu 2:55-4:10 PM Hollister 110
4 credits, S/U Optional
Co-requisite: Math 231 or equivalent
This course teaches basic mathematical concepts in information
modeling. Topics to be covered include graph theory, discrete
probability, finite automata, Markov models and hidden Markov models.
We will be guided by examples and applications from various areas of
information science such as: the structure of the Web, genome
sequences, natural languages, and signal processing.
Professor: Paul Ginsparg (301 College Ave, 5-7371)
Office hours: Wednesdays 2-3 PM (or by appointment)
Course website: http://www.infosci.cornell.edu/Courses/info295/2007fa/ (this page)
Lecture 1 (Thu 23 Aug 07)
Intro, overview of topics to be discussed. Background discussion
(see also INFO 295 Fall 2006).
Lecture 2 (Tue 28 Aug 07)
Probability and Counting 1
(Probability space, events, binomial coefficients, conditional probability).
See also counting examples.
Problem Set 1 (due in class Thurs, 6 Sep 07)
Lecture 3 (Thu 30 Aug 07)
More on conditional probability: Bayes' Law and examples. (see also Bayesian inference)
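Bayes' Law can be sketched with the standard false-positive example (the numbers below are illustrative, not from lecture):

```python
# Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative rare-disease example: 1% base rate, a test with
# 99% sensitivity and a 5% false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.05

# Total probability of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.1667: still only ~1 in 6
```

Despite the accurate test, most positives come from the much larger healthy population, which is the point of the example.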
Lecture 4 (Tue 4 Sep 07)
More on Bayes' Law and examples;
independent events, binomial distributions, random variables,
expectation and variance
(Probability and Counting 2)
Lecture 5 (Thu 6 Sep 07)
Probability and Counting 2: notes on independent events,
expectation and variance of Bernoulli distributions.
Problem Set 2 (due in class Thurs, 20 Sep 07)
Lecture 6 (Tue 11 Sep 07)
Some comments on
distributions and the central limit theorem, and a review of exponentials and logarithms.
Lecture 7 (Thu 13 Sep 07)
Discuss www-admin spam,
introduce "A Plan for Spam"
plus "naive Bayes".
(See also section 6.3 of Rosen, and "naive Bayes" document classification.)
Lecture 8 (Tue 18 Sep 07)
Completed "A Plan for Spam"
and its follow-up (Graham, 2003),
with more details
on "naive Bayes".
(Additional updates on this approach, including follow-up essays
and "white papers", are available online.)
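The "naive Bayes" classification underlying Graham's filter can be sketched as follows. The word counts below are hypothetical, and this is the plain naive Bayes scheme; Graham's actual filter combines only the most extreme word probabilities:

```python
from math import prod  # Python 3.8+

# Toy training counts (hypothetical): occurrences of each word in
# spam vs. ham messages, with 100 training messages of each class.
spam_counts = {"viagra": 40, "free": 30, "meeting": 2}
ham_counts = {"viagra": 1, "free": 10, "meeting": 50}
n_spam, n_ham = 100, 100

def word_prob(word, counts, n):
    # Laplace smoothing so unseen words don't zero out the product.
    return (counts.get(word, 0) + 1) / (n + 2)

def p_spam(words):
    # Naive Bayes: treat words as independent given the class,
    # with equal priors P(spam) = P(ham) = 0.5.
    ps = prod(word_prob(w, spam_counts, n_spam) for w in words) * 0.5
    ph = prod(word_prob(w, ham_counts, n_ham) for w in words) * 0.5
    return ps / (ps + ph)

print(p_spam(["free", "viagra"]) > 0.5)  # True: flagged as spam
print(p_spam(["meeting"]) > 0.5)         # False: not flagged
```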
Lecture 9 (Thu 20 Sep 07)
Some final comments on combining probabilities
( p(L or R | O) vs p(O | L and R) ).
Start Graph Theory 1 (Basic definitions, isomorphisms, paths)
Suggested Tue 16 Oct as likely date for in-class midterm exam.
Problem Set 3 (due in class Thurs, 4 Oct 07)
Added note: here is a hint for [optional] 5b
Lecture 10 (Tue 25 Sep 07)
Graph Theory 2 (Eulerian and Hamiltonian circuits,
see also Hamiltonian circuit)
Comments on De Bruijn sequences and magic tricks.
Example of Gray Coding from course text (pp 642-643).
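The reflected Gray code from the text (successive codewords differ in exactly one bit) can be generated with the standard bit trick n XOR (n >> 1); a quick sketch:

```python
def gray_code(bits):
    # i ^ (i >> 1) yields the reflected binary Gray code.
    return [format(i ^ (i >> 1), f"0{bits}b") for i in range(2 ** bits)]

codes = gray_code(3)
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']

# Adjacent codewords (cyclically) differ in exactly one bit position.
for a, b in zip(codes, codes[1:] + codes[:1]):
    assert sum(x != y for x, y in zip(a, b)) == 1
```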
Lecture 11 (Thu 27 Sep 07)
Graph Theory 3 (Trees, planarity, colorings, Euler's formula)
Lecture 12 (Tue 2 Oct 07)
(Comments on hint for [optional] 5b)
6 color theorem
(and 5 color theorem).
Started Graph Theory 4 (Graph algorithms).
Plus some comments on Milgram's "small world" experiment.
Lecture 13 (Thu 4 Oct 07)
Continue Graph Theory 4 (Graph algorithms), proof of minimum spanning tree algorithms,
and application to clustering.
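One minimum-spanning-tree algorithm, Kruskal's, can be sketched as below (the small weighted graph is illustrative, not from lecture); cutting the heaviest MST edges is what gives the single-link clustering application:

```python
def kruskal(nodes, edges):
    # Scan edges in order of weight; keep each edge unless it would
    # close a cycle, detected with a union-find structure.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:            # u and v are in different components
            parent[ru] = rv     # merge the components
            tree.append((w, u, v))
    return tree

edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"),
         (4, "C", "D"), (5, "B", "D")]
mst = kruskal("ABCD", edges)
print(sum(w for w, _, _ in mst))  # 7  (edges of weight 1, 2, 4)
```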
Fall Break (Tue 9 Oct 07)
Lecture 14 (Thu 11 Oct 07)
Finish Graph Theory 4 (Graph algorithms).
Some comments on Erdős–Rényi random graphs.
Brief discussion of quantitative methods applied to social networks, node
and link centrality (browse sections 1.1, 1.2, 1.3.6, and 1.4
[pp 1-3, 14-25]
of Who is
the best connected Scientist? (Newman, 2003)), more on Milgram's "small world" experiment, and
Collective dynamics of 'small-world' networks (Watts/Strogatz, 1998).
Problem Set 4 (due in class Tues, 30 Oct 07)
Lecture 15 (Tue 16 Oct 07)
In-class midterm exam.
Lecture 16 (Thu 18 Oct 07)
Final comments on small world networks, clustering coefficient.
Comments on random graphs and Poisson distributions (problem set 4).
Some additional comments on Poisson distributions.
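The connection between the two distributions, Binomial(n, λ/n) approaching Poisson(λ) for large n, can be checked numerically; a quick sketch:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # P(exactly k successes in n trials with success probability p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Poisson limit: lam^k e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

lam, n = 2.0, 1000
for k in range(5):
    b = binom_pmf(k, n, lam / n)
    p = poisson_pmf(k, lam)
    print(k, round(b, 4), round(p, 4))  # the two columns agree closely
```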
Lecture 17 (Tue 23 Oct 07)
(Graph of C(17,i)/2^17: the probability that only
four of seventeen last names in class start in the latter half of the alphabet is < ~10%.)
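The tail probability behind the in-class graph is a quick binomial computation (assuming each last name is equally likely to fall in either half of the alphabet):

```python
from math import comb

# P(at most 4 of 17 names start in the latter half of the alphabet),
# with each name landing in either half with probability 1/2.
p = sum(comb(17, i) for i in range(5)) / 2**17
print(round(p, 4))  # 0.0245, well under 10%
```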
Recap brief history of the web, start to discuss the PageRank
algorithm; see also AMS feature column.
Example iterations from n=4 example at end of class.
See also historical background material:
0) W3C "Little History"
1) Vannevar Bush, As We May Think, Atlantic Monthly, July 1945
2) The World Wide Web: A very short personal history, Tim Berners-Lee
3) The Anatomy of a
Large-Scale Hypertextual Web Search Engine (Brin/Page, 1998)
(sections 1-3, pp 1-6)
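The ranking iteration from the Brin/Page paper above (PageRank as power iteration) can be sketched in a few lines; the 4-node link structure below is hypothetical, not necessarily the n=4 example from class:

```python
# Power-iteration PageRank with damping factor 0.85.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical 4-page web
n, d = 4, 0.85

rank = [1 / n] * n
for _ in range(100):
    # Each page keeps (1-d)/n baseline rank, then receives a
    # d-damped share of rank from each page linking to it.
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        for out in outs:
            new[out] += d * rank[page] / len(outs)
    rank = new

print([round(r, 3) for r in rank])  # ranks sum to 1; page 2 scores highest
```

Since every page here has outlinks, the total rank stays 1 at each iteration; real implementations must also handle dangling pages.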
Lecture 18 (Thu 25 Oct 07)
Continue discussion of PageRank;
see also AMS feature column.
(Note: office hours shifted to Monday 2 pm, at 301 College Ave office for weeks
of 15,22,29 Oct)
Lecture 19 (Tue 30 Oct 07)
Complete discussion of PageRank
and the Perron-Frobenius theorem
(for the mathematically inclined, see also The $25B Eigenvector).
Problem Set 5 (due in class Thu 15 Nov)
Lecture 20 (Thu 1 Nov 07)
Discuss other features used
in ordering heuristics, mention
"Google bombing" (NYTimes, 22 Jan 04),
and Authoritative sources in a hyperlinked environment (Kleinberg, 1998).
Begin Finite State Automata 1 (alphabet, DFA, regular sets)
Some additional resources (suggested by a student last year as useful, feel
free to forward others you've found):
Lecture 21 (Tue 6 Nov 07)
Continue Finite State Automata 1 (alphabet, DFA, regular sets),
examples of DFAs.
Lecture 22 (Thu 8 Nov 07)
Finite State Automata 2 (NFA, subset construction) and
example of subset construction.
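The idea of the subset construction can be sketched by simulating an NFA directly on sets of states; the NFA below (accepting strings ending in "ab") is a hypothetical example, not necessarily the one worked in class:

```python
# NFA over {a, b} accepting strings that end in "ab":
# state 0 loops on both letters, 0 -a-> 1, 1 -b-> 2 (accepting).
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start, accept = {0}, {2}

def accepts(s):
    # Track the *set* of reachable NFA states -- exactly the
    # DFA state that the subset construction would build.
    current = start
    for ch in s:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accept)

print(accepts("aab"))   # True
print(accepts("aba"))   # False
```

Tabulating every reachable state-set and its transitions, instead of computing them on the fly, gives the full subset-construction DFA.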
Finite State Automata 3 (Regular expressions)
Lecture 23 (Tue 13 Nov 07)
Ultimate periodicity and constraints on lengths of elements of regular sets.
Connection between NFAs and regular expressions.
Construction of an automaton for A* from an automaton for language A.
Lecture 24 (Thu 15 Nov 07)
More examples of the regexp/NFA connection, examples of language parsing and
number recognition. Colorless green ideas sleep furiously.
Started discussion of
Markov Chains 1
(Basic definitions, transition matrix), more on PageRank as a Markov process.
Problem Set 6 (due in class Thurs 29 Nov)
Lecture 25 (Tue 20 Nov 07)
Relation between Markov chains and random walks on weighted graphs,
genomics example from Durbin et al pp 46-57 (handout in class).
Examples of Markov chains to determine the expected number of coin flips to
reach a given number of heads in a row.
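The expected-flips computation can be set up as an absorbing Markov chain on states 0..k (the current run of consecutive heads); a sketch solving the resulting linear equations:

```python
def expected_flips(k):
    # State i = run of i consecutive heads so far; state k absorbs.
    # E[i] = 1 + 0.5*E[i+1] + 0.5*E[0]  (heads extends the run,
    # tails resets to state 0), with E[k] = 0.
    # Write E[i] = a + b*E[0] and back-substitute from i = k down.
    a, b = 0.0, 0.0            # coefficients for E[k]
    for _ in range(k):
        a, b = 1 + 0.5 * a, 0.5 * b + 0.5
    return a / (1 - b)         # solve E[0] = a + b*E[0]

for k in (1, 2, 3):
    print(k, expected_flips(k))  # 2, 6, 14: matches 2**(k+1) - 2
```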
Lecture 26 (Tue 27 Nov 07)
"Mark V Shaney" and
examples generated from trigram probability distributions,
Markov Chains 2 (Ergodic chains, stationary distributions).
Introductory parts on Hidden Markov Models from
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989), 257-261
Set up example of Viterbi by hand.
Lecture 27 (Thu 29 Nov 07)
Finish example of Viterbi by hand.
Example of max probability state not on maximum probability path.
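The Viterbi recursion worked by hand can be checked in code; a sketch on a small two-state HMM (the hot/cold model below is a standard textbook toy, not necessarily the example from class):

```python
# Two hidden states (Hot/Cold) emitting observation symbols 1-3.
states = ["H", "C"]
init = {"H": 0.6, "C": 0.4}
trans = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    # v[s] = probability of the best path ending in state s;
    # back-pointers record which predecessor achieved it.
    v = {s: init[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev = {s: max(states, key=lambda r: v[r] * trans[r][s])
                for s in states}
        v = {s: v[prev[s]] * trans[prev[s]][s] * emit[s][o] for s in states}
        back.append(prev)
    # Reconstruct the best path by following back-pointers.
    last = max(states, key=v.get)
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path)), v[last]

print(viterbi([3, 1, 3]))  # best path ['H', 'H', 'H']
```

Note the max is taken over *paths*; as mentioned above, the per-step most probable state need not lie on the maximum-probability path.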
Example of CpG islands from Durbin et al pp 46-57 (handout).
Example from Bursty and Hierarchical Structure in Streams.
Final exam: available 2 p.m. Tues 4 Dec, due 2 p.m. Fri 7 Dec
(should take no more than a few hours...). Contact me by e-mail with any
questions, and I'll have office hours on Wed afternoon, 2-4.
Background text for Lectures 1-5: Chpt 2, Sections 5.3, 5.4, and Chpt 6 of Rosen, Discrete Mathematics
and its Applications (6th edition), on reserve at the engineering library (also used by CS 280).
(These are the sections on Sets, Permutations and Combinations, Binomial
Coefficients, Probability Theory, Bayes' Theorem, Expected Value and
Variance. Note that in earlier editions of the book these sections were in
Chpts 1 and 4.)
For lectures 6--8, see Rosen sections 6.3,6.4 (Bayes' Thm, Expected Value and Variance)
For lectures 9--14, see Rosen chapters 9,10 (Graphs, Trees)
For lectures 21--24, see Rosen sections 12.2-12.4
This course is a required course for the new Information Science (IS)
major in the College of Arts and Sciences and the College of
Agriculture and Life Sciences. It can also be used to satisfy the
math/stats course requirement for the Information Science minor.
INFO 295 and CS 280 may not both be taken for credit towards the IS major.