Dan Simonson is Principal Computational Linguist at BlackBoiler, working to improve the automation of contract negotiation. He completed his PhD in computational linguistics at Georgetown University in the Department of Linguistics. He's most interested in using linguistics and natural language processing to take objective and interesting slices of the real world to yield insight and understanding and, if possible, to use natural language processing to solve problems of direct material value. In particular, this has led him to pursue problems related to narrative schemas, information extraction and retrieval, semantic modality, critical discourse analysis, and applying these topics to one another.
Dan defended his dissertation, Investigations of the Properties of Narrative Schemas, on November 17th, 2017, and the dissertation was accepted by the grad school on April 25th, 2018. His dissertation committee consisted of Tony Davis, Amir Zeldes, and Nate Chambers.
He also feels very weird writing about himself in the third person and will cease immediately.
Use my name at gmail.com. (I will respond to you from a different address that the first one forwards to. Let me know if you have some kind of filter that requires me to respond from the first one.)
Bash Your Way into Bash is a tutorial for using bash. It's for people who have no experience using command line interfaces.
ft is a library for dealing with lists of dictionaries in Python. It makes counting and finding things eas(y|ier). You can get ft off of pip (pip install ft). I've made an official page here with some code examples; the github repo is here.
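For a sense of the kind of task involved, here is a minimal sketch of counting and finding over a list of dictionaries using only the Python standard library. It does not use ft and makes no claim about ft's actual API; the helper names `count_by` and `find_where` and the sample records are hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical sample data: a list of dictionaries, one per token.
records = [
    {"word": "might", "pos": "MD"},
    {"word": "possible", "pos": "JJ"},
    {"word": "must", "pos": "MD"},
]


def count_by(rows, key):
    """Count how often each value of `key` occurs, skipping rows that lack it."""
    return Counter(row[key] for row in rows if key in row)


def find_where(rows, **conditions):
    """Return the rows whose fields equal every given key=value condition."""
    return [
        row
        for row in rows
        if all(row.get(k) == v for k, v in conditions.items())
    ]


pos_counts = count_by(records, "pos")
adjectives = find_where(records, pos="JJ")
```

Writing helpers like these by hand for every project is the sort of repetition a small library can remove.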
SPyTS is a tool for scraping tweets. For us plebs who don't have firehose access to Twitter, it spreads queries out as evenly as possible over a period and prevents exceeding Twitter's API rate limits. SPyTS is available on github.
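The general idea of spreading queries evenly over a period while staying under a rate limit can be sketched as follows. This is not SPyTS's code, only an illustration under stated assumptions: the function names are hypothetical, and the limit of 180 queries per 15-minute window is a placeholder value, not a statement about Twitter's actual limits.

```python
from datetime import datetime, timedelta


def max_queries(period, limit_per_window, window):
    """Largest number of evenly spaced queries in `period` that keeps
    the rate at or below `limit_per_window` queries per `window`."""
    return int(period / window * limit_per_window)


def spread_queries(start, end, n_queries):
    """Return `n_queries` timestamps spaced evenly across [start, end)."""
    if n_queries <= 0:
        return []
    step = (end - start) / n_queries
    return [start + i * step for i in range(n_queries)]


# Assumed placeholder limit: 180 queries per 15-minute window.
start = datetime(2020, 1, 1, 0, 0, 0)
end = start + timedelta(hours=1)
n = max_queries(end - start, 180, timedelta(minutes=15))
schedule = spread_queries(start, end, n)
```

Spacing the queries at a fixed interval, rather than sending them in bursts, means no window ever holds more than its share of the budget.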
Contract negotiation is a lengthy and expensive process. Much of this process is actually repetitive. The text exhibits less variation than newswire, and lawyers often make the same edits to contracts over and over. This sort of repetition is ripe for automation using natural language processing. However, this has not come without challenges. Much existing NLP technology has been developed to be used with millions or billions of documents, not tens or hundreds. Additionally, NLP has often been applied to synthesize insights from data---to create new outputs---not to perform precision work on documents with minimal changes.
These problems have required the team at BlackBoiler to develop new NLP technologies for the advanced nature of the work we are doing. I've been with BlackBoiler since May 2015---full-time since May 2018---to discover properties of contract documents that enable automation and then to carry out that automation. To begin our efforts, we were awarded an NSF SBIR Phase I grant (1721878). The tools I contributed have reshaped our contract review and automation process, placing us deterministically above competing products in terms of both capabilities and precision, and BlackBoiler was recently awarded our first patent for these innovations.
Simonson, D. (2021). Supervised Identification of Participant Slots in Contracts. In the Natural Legal Language Processing (NLLP) Workshop 2021, EMNLP 2021, Punta Cana, Dominican Republic. [paper]
Simonson, D.E., Herr, J., Avant, J.T., Riedel, G.P., Broderick, D.P. (2020). Systems, Methods, and Computer Program Products for Slot Normalization of Text Data. US10614157 [patent]
Simonson, D., Broderick, D.P., and Herr J. (2019). The Extent of Repetition in Contract Language. In the Natural Legal Language Processing (NLLP) Workshop 2019, NAACL 2019, Minneapolis, MN. [paper] [slides]
Simonson, D.E., Avant, J.T., Riedel, G.P., Herr, J., Broderick, D.P. (2019). Systems, Methods, and Computer Program Products for a Clause Library. US10311140 [patent]
A lot of what we know is grounded in stories. Some stories tend to repeat themselves, and once they do enough, it's been hypothesized that we genericize the stories into something called a script or narrative schema. I work to extract this type of world knowledge from language data and apply it to problems that would otherwise be inaccessible to quantitative analysis.
My dissertation focused on the extraction of these sorts of schemas---particularly leveraging and improving techniques pioneered by Nate Chambers and Dan Jurafsky---to understand their distribution and properties, and how to apply this knowledge to practical, real-world tasks.
You can find more information on this thread of research here.
Simonson, D. and Davis, A. (2021). Narrative Homogeneity and Heterogeneity in Document Categories. In Computational Analysis of Storylines: Making Sense of Events. Eds. Caselli, T., Hovy, E., Palmer, M., and Vossen, P. Cambridge University Press. [chapter]
Simonson, D. and Davis, A. (2018). Narrative Schema Stability in News Text. In COLING, Santa Fe, NM. [paper]
Simonson, D. (2017, November). Investigations of the Properties of Narrative Schemas. Doctoral Dissertation. Advisor: Davis, A. R. Committee: Zeldes, A. and Chambers, N. Georgetown University, Washington, D.C. [dissertation] [Dissertation Defense Slides]
Simonson, D. and Davis, A. (2016, November). NASTEA: Investigating Narrative Schemas through Annotated Entities. In the Second CnewS Workshop, EMNLP 2016, Austin, TX. [paper] [Workshop Slides] [DCNLP Slides]
Throughout my time at Georgetown, I have been involved in a project building a theory and corpus of gradable modal expressions [NSF-funded, BCS-1053038]. Modal expressions are those that express possibilities. They span all parts of speech.
I played a number of roles on this project. During the experimental aspects of the project, where the annotation guidelines were developed and tested, I was responsible for reporting interannotator agreement scores. During the corpus construction component, I built and maintained a cross-platform tool for adjudicating annotator output. Throughout both stages of the project, I managed data as it flowed between phases of the project.
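As an illustration of what an interannotator agreement score measures, here is a minimal sketch of Cohen's kappa for two annotators. The choice of kappa is an assumption made for this example (the text above does not say which agreement measures the project reported), and the label names in the usage lines are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for
    the agreement expected by chance from each annotator's label rates."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of four modal expressions.
annotator_1 = ["epistemic", "deontic", "epistemic", "deontic"]
annotator_2 = ["epistemic", "deontic", "deontic", "deontic"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and so kappa is 0.5.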
Rubinstein, A., Harner, H., Krawczyk, E., Simonson, D., Katz, G., and Portner, P. (2013). Toward Fine-grained Annotation of Modality in Text. In Proceedings of the Tenth International Conference for Computational Semantics (IWCS 2013). [Paper]
Simonson, D., Rubinstein, A., Chung, J., Harner, H., Katz, E.G., Portner, P. (2012, February). Categorizing Modals with Amazon Mechanical Turk. In the Proceedings of the Mid-Atlantic Colloquium of Studies in Meaning (MACSIM 2012). [Poster]
Zeldes, A. and Simonson, D. (2016, August). Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality. In the Proceedings of LAW X: 10th Linguistic Annotation Workshop, Berlin. [paper]
Sierra, S., Simonson, D. (2014, October). Gender and cool solidarity in Mexican Spanish slang phrases. In the Proceedings of New Ways of Analyzing Variation 43. Chicago, IL. [Slides from NWAV Presentation]
During my undergrad, I was a physics major and participated in astronomy research under the guidance of Harold Butner. I presented posters at two annual meetings of the American Astronomical Society as a result of this research. The posters derived from two separate projects involving the DEBRIS target set, a search for binary stars using the Herschel Space Observatory. The first was a search for estimates of stellar age in the literature on our project's target stars; the second reported preliminary results of an infrared observing run that identified candidate binaries.
Simonson, D. E., Butner, H. M., Trelawny, D. T., Evans, C. M., Duchene, G., Rodriguez, D. R., ... and DEBRIS, T. (2010, January). Searching for Previously Unresolved Binaries in DEBRIS Survey Target Stars. In Bulletin of the American Astronomical Society (Vol. 42, p. 400). [Poster]
Butner, H. M., McCauley, P., Simonson, D., Matthews, B., Greaves, J. S., Duchene, G., ... and Zuckerman, B. (2009, January). Stellar Ages Of The Debris Sample Stars. In Bulletin of the American Astronomical Society (Vol. 41, p. 209). [Poster]
Pluto never should have been a planet.
My first website was D's C&c Page. I lost the whole thing when, for some reason, Geocities decided I violated their TOS. No explanation was given.
“We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.”