Mark D. Smucker
APCorr
This is a set of R functions that implement Yilmaz et al.'s AP Correlation Coefficient (APCorr). We have extended the original APCorr measure to handle ties in the ranking by sampling and averaging values of APCorr for permutations of tied ranks.

This code was written for and used in the evaluation of the TREC 2013 Crowdsourcing Track. If you use this code, please cite:

Mark D. Smucker, Gabriella Kazai, and Matthew Lease, "Overview of the TREC 2013 Crowdsourcing Track," TREC 2013. [PDF].

Download code: apcorr.r
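
For readers who want to see the idea before downloading, here is a minimal Python sketch of AP correlation with ties handled by sampling. It is not a translation of apcorr.r. The function names (`ap_corr`, `ap_corr_with_ties`), the input shapes, and the assumption that ties occur only in the estimated ranking (given as scores) are all illustrative choices made for this sketch.

```python
import random


def ap_corr(truth, estimate):
    """AP correlation of `estimate` against `truth`.

    Both arguments are lists of the same item ids, ordered best to worst.
    For each item below the top of `estimate`, we take the fraction of items
    ranked above it that `truth` also ranks above it, average those
    fractions, and rescale the result to the range [-1, 1].
    """
    n = len(estimate)
    if n < 2:
        raise ValueError("need at least two items")
    true_rank = {item: rank for rank, item in enumerate(truth)}
    total = 0.0
    for i in range(1, n):
        item = estimate[i]
        correct = sum(
            1 for above in estimate[:i] if true_rank[above] < true_rank[item]
        )
        total += correct / i
    return 2.0 * total / (n - 1) - 1.0


def ap_corr_with_ties(truth, est_scores, samples=1000, seed=0):
    """Average AP correlation over random orderings of tied items.

    `est_scores` maps item id -> score (higher is better); equal scores are
    ties. This assumes ties appear only in the estimated ranking, which is
    an assumption of this sketch.
    """
    rng = random.Random(seed)
    items = list(est_scores)
    total = 0.0
    for _ in range(samples):
        # Shuffle, then stable-sort by descending score, so tied items
        # end up in a uniformly random relative order.
        rng.shuffle(items)
        ranking = sorted(items, key=lambda item: -est_scores[item])
        total += ap_corr(truth, ranking)
    return total / samples
```

For example, with a true ranking of `["a", "b", "c"]` and estimated scores `{"a": 3, "b": 1, "c": 1}`, the two possible tie-breaks give 1.0 and 0.5, so the sampled average comes out close to 0.75.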

Paired Randomization Test
This is a Perl script that implements Fisher's two-sided, paired randomization test. More details about the test can be found in:

Mark D. Smucker, James Allan, and Ben Carterette. "A Comparison of Statistical Significance Tests for Information Retrieval Evaluation," CIKM '07, Lisboa, Portugal, 2007. [PDF]

I also have a C++ version of this code, but it is not quite ready for a public release. If you run into any issues with this code, please let me know. If you use this script and publish results, please cite the CIKM paper rather than this web page.

The script takes as input the relational output of trec_eval run with the -q option, which reports metrics for every query. More usage information is available at the top of the script.

Download: paired-randomization-test-v2.pl
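
To illustrate what the test computes, here is a minimal Python sketch of a sampled two-sided, paired randomization test on the difference in mean scores. It is not the Perl script's code. The function name, its parameters, and the choice of 100,000 random sign assignments are assumptions made for this sketch, and it takes two aligned lists of per-query scores rather than parsing trec_eval output.

```python
import random


def paired_randomization_test(scores_a, scores_b, samples=100000, seed=0):
    """Estimate a two-sided p-value for the difference in mean scores.

    `scores_a` and `scores_b` hold per-query scores for two systems, in the
    same query order. Under the null hypothesis the system labels are
    exchangeable within each query, so each per-query difference is equally
    likely to have either sign.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    for _ in range(samples):
        total = 0.0
        for d in diffs:
            # Randomly swap the two systems' scores for this query.
            total += d if rng.random() < 0.5 else -d
        # Two-sided: count differences at least as large in either direction.
        if abs(total / n) >= observed:
            extreme += 1
    return extreme / samples
```

With only three queries and per-query differences of 1, 2, and 3, just two of the eight sign assignments are as extreme as the observed one, so the estimate settles near 0.25. With realistic numbers of queries, exhaustively enumerating all sign assignments is impractical, which is why this sketch samples them.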

ContactCollector
In the summer of 2003, I wrote a piece of software called ContactCollector that scans through my email messages in Microsoft Outlook and collects the names and email addresses of the people with whom I've had correspondence over the years. ContactCollector then gives you the opportunity to add the email addresses to your Outlook contacts.

ContactCollector has many rough edges and comes with no guarantee that it won't destroy all your email and contacts etc., but it works for me! It requires .NET 1.1 and was written to work with Outlook 2002, but I hope it works with Outlook 2000 and Outlook 11 as well.

Please let me know what you think. I personally discovered that what I really wanted was a means to search for email addresses rather than collect them. I have not included source code.

Download: ContactCollectorSetup.zip (size 710 KB)

SharpAIM / SharpTOC
SharpAIM is an example IM program that uses the C# library SharpTOC to communicate on the AOL IM system using the TOC Protocol. It is an improved port of Jeff Heaton's JavaTOC. SharpTOC should make writing an AIM bot quite a bit easier in C#. The download includes all source code.

Download: SharpAIM.zip (size 1.6 MB)

DocViewer
DocViewer is a small program that reads in the .docs files being used in the 646 IR class and provides a simple interface to display those documents in a text pane. (It also works for the .dat TREC format files used by CIIR.) DocViewer also allows one to read in a results file produced by Lemur and easily view the results.

DocViewer is written in C# and runs on the .NET 1.1 platform (for Windows unless Mono works). You'll have to install .NET 1.1, but that is easier than installing Java. The setup will prompt you to install .NET 1.1 if you don't have it installed.

You can get DocViewer at: http://www.cs.umass.edu/~smucker/646/DocViewer.zip

It includes the source in case you want to make it better (or make it work if it doesn't for you). At the moment, it wants a results file in the simple format, not TREC format.