INFO 295: Mathematical Methods for Information Science
Tue/Thu 2:55-4:10 PM, Hollister 110
4 credits, S/U Optional
Co-requisite: Math 231 or equivalent
This course teaches basic mathematical concepts in information
modeling. Topics to be covered include graph theory, discrete
probability, finite automata, Markov models and hidden Markov models.
We will be guided by examples and applications from various areas of
information science such as: the structure of the Web, genome
sequences, natural languages, and signal processing.
Professor: Paul Ginsparg (301 College Ave, 5-7371,
ginsparg@cornell.edu)
Office hours: Wednesdays 2-3 PM (or by appointment)
Course website: http://www.infosci.cornell.edu/Courses/info295/2006fa/ (this page)
Note: This is the course site for Fall 2006
Lecture 1 (Thu 24 Aug 06)
Intro, overview of topics to be discussed. Background discussion of Set Theory.
(See also
INFO 295 Fall 2005)
Lecture 2 (Tue 29 Aug 06)
Probability and Counting 1
(Probability space, events, binomial coefficients, conditional probability).
See also counting examples.
Lecture 3 (Thu 31 Aug 06)
More on conditional probability: Bayes' Law and examples. (see also Bayesian inference)
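A quick numerical sketch of Bayes' law; the disease-test numbers below are invented for illustration, not taken from lecture:

```python
# Bayes' law: P(A | B) = P(B | A) P(A) / P(B), with P(B) expanded by the
# law of total probability over A and not-A.
def posterior(prior, sensitivity, false_positive_rate):
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# A 99%-sensitive test with a 5% false-positive rate on a 1% base rate:
print(posterior(0.01, 0.99, 0.05))  # = 1/6 -- most positives are false positives
```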
Lecture 4 (Tue 5 Sep 06)
More on independent events, binomial distributions, random variables,
expectation and variance
(Probability and Counting 2)
Problem Set 1
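As a sanity check on the expectation and variance formulas, one can sum directly over the binomial pmf; the values of n and p below are arbitrary:

```python
from math import comb

# For X ~ Binomial(n, p): E[X] = n*p and Var[X] = n*p*(1-p),
# verified here by summing k and (k - mean)^2 against the pmf.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean)**2 * pmf[k] for k in range(n + 1))
print(mean, var)  # 3.0 and 2.1, matching n*p and n*p*(1-p)
```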
Lecture 5 (Thu 7 Sep 06)
Cancelled due to conflict with
presidential inauguration.
Lecture 6 (Tue 12 Sep 06)
Continued
Probability and Counting 2 notes on independent events,
expectation and variance of Bernoulli distributions
Lecture 7 (Thu 14 Sep 06)
Some comments on normal distributions and the central limit theorem, and review of exponentials and logarithms.
Lecture 8 (Tue 19 Sep 06)
Discuss www-admin spam,
introduce A plan for spam
(Graham, 2002)
(plus "naive Bayes" and combining probabilities).
(See also section 6.3 of Rosen, and "naive Bayes" document classification
and Bayesian
spam filtering.)
Problem Set 2
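Graham's rule for combining per-token spam probabilities can be sketched in a few lines. The token probabilities here are made up, and the sketch omits details of Graham's actual filter (which selects the 15 most extreme tokens and clamps probabilities away from 0 and 1):

```python
from functools import reduce

# "Naive Bayes" combined probability that a message is spam, given
# independent per-token spam probabilities p1..pn:
#   P = prod(p_i) / (prod(p_i) + prod(1 - p_i))
def combine(probs):
    spam = reduce(lambda a, b: a * b, probs)
    ham = reduce(lambda a, b: a * b, (1 - p for p in probs))
    return spam / (spam + ham)

# Made-up token probabilities: two spammy tokens and one mildly hammy one.
print(round(combine([0.9, 0.8, 0.4]), 2))  # 0.96
```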
Lecture 9 (Thu 21 Sep 06)
Completed A plan for spam
(Graham, 2002)
and follow-up (Graham, 2003),
using more details
on "naive Bayes".
Lecture 10 (Tue 26 Sep 06)
Some final comments on combining probabilities
(p(L or R | O) vs. p(O | L and R)).
Start Graph Theory 1 (Basic definitions, isomorphisms, paths)
Lecture 11 (Thu 28 Sep 06)
Graph Theory 2 (Eulerian and Hamiltonian circuits)
(See also Hamiltonian circuit)
Graph Theory 3 (Trees, planarity, colorings)
Lecture 12 (Tue 3 Oct 06)
Finish Graph Theory 3 (Euler's formula),
6 color theorem
(and 5 color theorem).
Lecture 13 (Thu 5 Oct 06)
Graph Theory 4 (Graph algorithms), and application to clustering.
Problem Set 3
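The clustering application can be illustrated with the most basic graph algorithm, breadth-first search for connected components; the adjacency list below is a toy example:

```python
from collections import deque

# BFS from each unvisited node; every BFS tree is one connected component.
def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(components(adj))  # [[1, 2, 3], [4, 5], [6]]
```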
Fall Break (Tue 10 Oct 06)
Lecture 14 (Thu 12 Oct 06)
Finish Graph Theory 4 (Graph algorithms)
Some comments on Erdős–Rényi random graphs,
Milgram's
"small world" experiment, and
Collective dynamics of 'small-world' networks (Watts/Strogatz, 1998).
Lecture 15 (Tue 17 Oct 06)
More comments on small world networks:
and brief discussion of quantitative methods applied to social networks, node
and link centrality (browse sections 1.1, 1.2, 1.3.6, and 1.4
[pp 1-3, 14-25]
of Who is the best connected Scientist? (Newman, 2003)).
Recapped brief history of the web, started to discuss the PageRank algorithm.
See also historical background material:
1) Vannevar Bush, As We May Think, Atlantic Monthly, July 1945
2) The World Wide Web: A very short personal history, Tim Berners-Lee
3) The Anatomy of a
Large-Scale Hypertextual Web Search Engine (Brin/Page, 1998), equivalent
html
(sections 1-3, pp 1-6)
Lecture 16 (Thu 19 Oct 06)
Continued discussion of the PageRank algorithm and the Perron-Frobenius theorem.
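A power-iteration sketch of the basic PageRank idea; the three-node link graph and damping factor 0.85 are illustrative, and dangling nodes are ignored for simplicity:

```python
# Iterate rank <- (1-d)/n + d * sum(rank/outdegree over in-links),
# i.e. the stationary distribution of the "random surfer" with teleportation.
def pagerank(links, d=0.85, iters=100):
    n = len(links)
    rank = {u: 1.0 / n for u in links}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in links}
        for u, outs in links.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        rank = new
    return rank

links = {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
ranks = pagerank(links)
print({u: round(r, 3) for u, r in ranks.items()})  # 'c' ranks highest
```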
Lecture 17 (Tue 24 Oct 06)
Completed discussion of the PageRank algorithm, discussed other features used in ordering heuristics, mentioned
"Google bombing" (NYTimes, 22 Jan 04)
and Authoritative
sources in a hyperlinked environment (Kleinberg, 1998).
Comments on random graphs and Poisson distributions (problem 4 of midterm).
Midterm (24--31 Oct)
Lecture 18 (Thu 26 Oct 06)
Some additional comments on Poisson distributions.
Finite State Automata 1 (alphabet, DFA, regular sets)
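A DFA is just a transition table plus a start state and an accepting set. A toy example (chosen for illustration): the two-state DFA accepting binary strings with an even number of 1s:

```python
# Transition function delta: (state, symbol) -> state.
delta = {('even', '0'): 'even', ('even', '1'): 'odd',
         ('odd', '0'): 'odd', ('odd', '1'): 'even'}
START, ACCEPT = 'even', {'even'}

def accepts(s):
    state = START
    for ch in s:
        state = delta[(state, ch)]
    return state in ACCEPT

print(accepts('1010'), accepts('111'))  # True False
```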
Lecture 19 (Tue 31 Oct 06)
De Bruijn sequences, many examples of DFAs
Lecture 20 (Thu 2 Nov 06)
Finite State Automata 2 (NFA, subset construction) and example of subset construction.
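The subset construction can be sketched directly (epsilon-free NFA for simplicity). The example NFA, which accepts strings over {0,1} ending in "01", is made up for illustration; missing (state, symbol) entries mean "no move":

```python
from collections import deque

nfa = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'}, ('q1', '1'): {'q2'}}
start, accept, alphabet = 'q0', {'q2'}, '01'

def subset_construction():
    start_set = frozenset([start])
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # DFA successor of state-set S on symbol a: union of NFA moves.
            T = frozenset(r for q in S for r in nfa.get((q, a), ()))
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return dfa, seen

dfa, dfa_states = subset_construction()

def run(s):
    S = frozenset([start])
    for ch in s:
        S = dfa[(S, ch)]
    return bool(S & accept)  # accept iff some NFA accept state is reachable

print(len(dfa_states), run('001'), run('010'))  # 3 True False
```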
Lecture 21 (Tue 7 Nov 06)
Comments on midterm election problem.
Finite State Automata 3 (Regular expressions)
Problem Set 4
Lecture 22 (Thu 9 Nov 06)
Connection between NFA's and regular expressions
(examples).
Construction of automaton for A* from automaton for language A.
More examples of the regexp/NFA connection, examples of language parsing and
number recognition. Colorless green ideas sleep furiously.
Lecture 23 (Tue 14 Nov 06)
Ultimate periodicity and constraints on lengths of elements of regular sets.
More on Chomsky transformational grammars and
NFA's for gapped sequence alignment.
Examples of regular grammars, example of context-free grammar
(SCIgen - An Automatic CS Paper
Generator).
Lecture 24 (Thu 16 Nov 06)
Started discussion of
Markov Chains 1
(Basic definitions, transition matrix).
Markov Chains 2 (Ergodic chains, stationary distributions)
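The stationary distribution of an ergodic chain can be found by iterating pi <- pi P; the 2x2 transition matrix below is illustrative, and for this P the exact answer is (5/6, 1/6):

```python
# Row-stochastic transition matrix P[i][j] = P(next = j | current = i).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [0.5, 0.5]  # any starting distribution works for an ergodic chain
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print([round(x, 4) for x in pi])  # [0.8333, 0.1667]
```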
Lecture 25 (Tue 21 Nov 06)
More on PageRank as a Markov process,
"Mark V Shaney" and other examples generated from trigram probability distributions,
worked example,
genomics example from Durbin et al pp 46-57 (handout in class).
Examples of Markov chains to determine the expected number of coin flips to
reach a given number of heads in a row.
Problem Set 5
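The coin-flip example has a tidy closed form: conditioning on the flip after a run of n-1 heads gives E(n) = E(n-1) + 1 + (1/2)E(n), i.e. E(n) = 2(E(n-1) + 1), which works out to 2^(n+1) - 2 for a fair coin:

```python
def expected_flips(n):
    """Expected fair-coin flips until the first run of n heads in a row."""
    e = 0  # E(0) = 0
    for _ in range(n):
        e = 2 * (e + 1)  # E(k) = 2 * (E(k-1) + 1)
    return e

print([expected_flips(n) for n in range(1, 5)])  # [2, 6, 14, 30]
```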
Lecture 26 (Tue 28 Nov 06)
Introductory parts on Hidden Markov Models from
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989), 257-261
Set up example of Viterbi by hand.
Lecture 27 (Thu 30 Nov 06)
Finish example of Viterbi by hand.
Example of max probability state not on maximum probability path.
Example of CpG islands from Durbin et al pp 46-57 (handout).
Example from Bursty and Hierarchical Structure in Streams.
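The hand-worked Viterbi computation can be mirrored in a few lines of dynamic programming. The two-state HMM below is loosely in the spirit of the CpG-island example, but all probabilities are invented for illustration:

```python
# Two hidden states: H (e.g. high-GC) and L (low-GC); numbers are made up.
states = ('H', 'L')
start = {'H': 0.5, 'L': 0.5}
trans = {'H': {'H': 0.6, 'L': 0.4}, 'L': {'H': 0.3, 'L': 0.7}}
emit = {'H': {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2},
        'L': {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4}}

def viterbi(obs):
    # col[s] = (probability of the best path ending in state s, that path)
    col = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        col = {s: max(((col[p][0] * trans[p][s] * emit[s][o], col[p][1] + [s])
                       for p in states), key=lambda t: t[0])
               for s in states}
    return max(col.values(), key=lambda t: t[0])

prob, path = viterbi('GGAA')
print(path)  # ['H', 'H', 'L', 'L'] -- switches state when the A's start
```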
Final Exam
Take Home:
available 2p.m. Tues 5 Dec, due 2 p.m. Fri 8 Dec
(should take no more than a few hours...). Contact me by e-mail with any
questions, and I'll have office hours on Wed afternoon, 2-4.
Background text for Lectures 1-5: Chpt 2, Sections 5.3,5.4, and Chpt 6 of Rosen, Discrete Mathematics
and its Applications (6th edition), on reserve at engineering library, also used by
CS 280.
The textbook has an associated website
with supplementary material.
(These are the sections on Sets, Permutations and Combinations, Binomial
Coefficients, Probability Theory, Bayes' Theorem, Expected Value and
Variance. Note that in earlier editions of the book these sections were in
Chpts 1 and 4.)
For lectures 6--8, see Rosen sections 6.3,6.4 (Bayes' Thm, Expected Value and Variance)
For lectures 10--13, see Rosen chapters 9,10 (Graphs, Trees)
For lectures 18--22, see Rosen sections 12.2-12.4
Advising Notes
This course is a required course for the new Information Science (IS)
major in the College of Arts and Sciences and the College of
Agriculture and Life Sciences. It can also be used to satisfy the
math/stats course requirement for the Information Science
minor/concentration.
INFO295 and CS280 may not both be taken for credit towards the IS major.