http://www.cis.jhu.edu/~parky/Enron

Scan Statistics on Enron Graphs


An Enron email dataset has been made public by the U.S. Department of Justice.
The dataset is available here.
Other processed versions of the dataset are available here.


Our version of the data set is available here:
(Please right-click to download these files rather than opening them in the browser.)
  • 184 employee names and their email addresses.
  • (time, from, to) tuples, where "time" is the number of seconds elapsed since Jan. 1, 1970, and "from" and "to" are employee indices. Note that employee indices start from 0.
  • (time, from, receiver, tag) tuples, where "tag" records how "receiver" was addressed: 0 (To), 1 (CC), or 2 (BCC).
  • (time, from, receiver, topic) tuples, where "topic" is assigned by 3-means clustering of 3,120 messages randomly selected from all 125,409, followed by nearest-neighbor (NN) classification of the whole corpus. (NB: Topic "0" marks an outlier, e.g., a message body with too few words or only meaningless numbers.)
  • (time, from, receiver, LDC_topic) tuples, where "LDC_topic" is assigned based on Michael W. Berry's 2001 Annotated (by Topic) Enron Email Data Set. There are 32 topics. (NB: Topic "0" marks an outlier, e.g., a message body with too few words or only meaningless numbers. Topic "-1" means there is no matching topic.)
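As a rough illustration of the record layouts described above, the following Python sketch parses one record and decodes its fields. It assumes each record is a single line of whitespace-separated integers and that "time" is counted in UTC; neither detail is stated on this page, so check the downloaded files before relying on it. The names `parse_record`, `Message`, `TAG_NAMES`, and `is_labeled_topic` are hypothetical helpers, and the sample lines are made up rather than taken from the data.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Meaning of the "tag" column, as listed in the file description.
TAG_NAMES = {0: "to", 1: "cc", 2: "bcc"}


class Message(NamedTuple):
    time: datetime        # converted from seconds since Jan. 1, 1970
    sender: int           # 0-based employee index ("from")
    receiver: int         # 0-based employee index ("to" / "receiver")
    label: Optional[int]  # tag, topic, or LDC_topic; None for 3-column records


def parse_record(line: str) -> Message:
    """Parse one record, ASSUMING whitespace-separated integer fields."""
    fields = [int(field) for field in line.split()]
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    seconds, sender, receiver = fields[:3]
    label = fields[3] if len(fields) == 4 else None
    # ASSUMPTION: the epoch offset is interpreted as UTC.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return Message(when, sender, receiver, label)


def is_labeled_topic(topic: int) -> bool:
    """True for a real topic; 0 marks an outlier and -1 means no match."""
    return topic > 0


# Made-up sample lines, not taken from the dataset.
msg = parse_record("1000000000 24 153 1")
edge = parse_record("1000000000 3 7")
```

With the tag file, `TAG_NAMES[msg.label]` gives the addressing mode. With either topic file, `is_labeled_topic(msg.label)` can be used to drop outliers and unmatched messages before any further analysis.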


[1] C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park, "Scan Statistics on Enron Graphs," Computational and Mathematical Organization Theory, Volume 11, Number 3, pp. 229-247, October 2005, Springer Science+Business Media B.V. (figures)

[2] C.E. Priebe, J.M. Conroy, D.J. Marchette, and Y. Park, "Scan Statistics on Enron Graphs," SIAM International Conference on Data Mining, Workshop on Link Analysis, Counterterrorism and Security, Newport Beach, California, April 23, 2005. (siamenron.pdf, the reprint)

[3] Gina Kolata, "Enron Offers an Unlikely Boost to E-Mail Surveillance," New York Times, Week in Review, May 22, 2005. (Full article) ("Finding Patterns in Corporate Chatter")

[4] C.E. Priebe, "Scan Statistics on Enron Graphs," IPAM Summer Graduate School: Intelligent Extraction of Information from Graphs and High Dimensional Data, UCLA, July 11-29, 2005. (cepssg072105.pdf) (video presentation; you may need the latest version of RealPlayer.)

[5] C.E. Priebe, "Scan Statistics on Enron Graphs," 2005 Fall Department of Applied Mathematics and Statistics Seminars, September 15, 2005, The Johns Hopkins University. (cepssgams2005.pdf)

[6] Y. Park, C.E. Priebe, D.J. Marchette, "Scan Statistics on Enron Hypergraphs," Interface 2008, Durham, North Carolina, May 21, 2008. (hgraph-interface08-handout.pdf)

[7] Y. Park, C.E. Priebe, D.J. Marchette, "Anomaly Detection using Scan Statistics on Enron Graphs and Hypergraphs," The Satellite Workshop of the IASC 2008 Conference, Seoul, Korea, December 1-3, 2008. (iasc-handout.pdf)

[8] Y. Park, C.E. Priebe, D.J. Marchette, A. Youssef, "Anomaly Detection using Scan Statistics on Time Series of Hypergraphs," Workshop on Link Analysis, Counterterrorism and Security at the SIAM International Conference on Data Mining, Sparks, Nevada, May 1-3, 2009. (hyperenron.pdf, the reprint)

[9] Y. Park, C.E. Priebe, A. Youssef, "Anomaly Detection in Time Series of Graphs using Fusion of Invariants," Computational and Mathematical Organization Theory, submitted, 2010.



  • Carey E. Priebe <cep AT jhu.edu>
  • John M. Conroy <conroy AT super.org>
  • David J. Marchette <dmarchette AT gmail.com>
  • Youngser Park <youngser AT jhu.edu>

Last edit: April 21, 2010, by Youngser Park