Scott Farrar, Ph.D.
SUMMARY: Versatile technical skills in machine learning, data mining, pattern recognition, scientific computation, experimental design and analysis. Extensive experience building simulations and analytical tools for understanding intelligent systems, with over 5 years of experience in Web Search. Broad background in both Computer Science and Cognitive Science. Work eligibility: US Citizen.
RELEVANT SKILLS:
Python, Java, C, R, SQL, XML, HTML, PostScript.
Linux, Hadoop, Map-Reduce, TreeNet, GBDT.
WORK EXPERIENCE:
Senior Research Engineer. Yahoo!, Inc. (formerly Inktomi). Sunnyvale, CA. 2003-present.
Senior member of the Machine Learned Ranking (MLR) Features Team at Yahoo Search. Toolbar Feature Project Lead: Yahoo Toolbar data log exploration, feature conception, extraction of relevant statistics, MLR validation and analysis, productionization including disk/memory budgeting and feature compression (TreeNet, GBDT). Performed experimental design for numerous relevance studies; created automated tools for measuring Discounted Cumulative Gain (DCG), statistical significance testing, comparative feature distribution visualization, result set data drill-down, etc. (Python, Java, R, HTML, XML). Exploratory data mining and statistics gathering using map-reduce on Yahoo's massively parallel Grid infrastructure. Systematic study of region/language features for US-English MLR function, resulting in significant relevance improvement and simplified search engine configuration. Project management/release troubleshooting for multiple MLR algorithm deployments.
Team lead for Paid Inclusion (PI) Relevance Team. Proposed, developed, implemented, and deployed the first MLR algorithms for PI ranking (PIMLR) at Yahoo. Immediate relevance and revenue (> $3M/year) improvements; enabled PI to benefit from future long-term relevance gains from MLR methodology. End-to-end ownership of project: initial problem formulation, training set gathering, experimentation and modeling, active learning, extensive offline and online relevance testing, measurement of user click behavior, infrastructural modifications to the search engine, deployment, troubleshooting, documentation and training.
Pioneered rigorous web-PI content blending methodology based on Dilution Testing and parameter optimization. Enabled the first systematic understanding of the interaction between PI and other web content during blending, greatly aiding PI business decisions. Evangelized Dilution Testing as a general means for measuring search engine behavior and performance. Proposed and evangelized a new relevance metric (Rank Weighted Average) for Sponsored Search.
Software Engineer. QED Labs. San Jose, CA. 2002-2003.
Team member for a component-based software package to perform image analysis and 3D reconstruction of viruses from electron microscope photographs. Designed and implemented a system for dynamic application package activation. Proposed an object persistence layer using CORBA-based C++ introspection and a relational database backend (PostgreSQL). Refactored software to improve correctness and efficiency.
Data Mining Scientist. Digital Impact, Inc. San Mateo, CA. 1999-2001.
Lead developer of data mining and knowledge discovery technology. Designed and implemented high throughput database applications for dynamic targeting in electronic commerce (Java, Oracle 8i, SQL, PL/SQL, Unix). Technical lead for all project stages: initial client consultation, requirements gathering, technical design, project plan, engineering, QA testing, performance tuning, rollout to production environment, documentation of technology and process flow, and user training. Point person for architectural, design, feasibility, and data warehouse scalability questions.
Designed and implemented a content annotation and storage system for holding targeting metadata and hypertext content (XML, Oracle 8i). Analyzed performance of real-time production system database using SQL Trace. Aggressively optimized application queries using index hints, statistics, and application caching (PL/SQL, Java). Generated reports from ad hoc queries detailing important trends in client data sets. Presented algorithms and technology to both technical and non-technical audiences. Evaluated third-party technologies to answer buy-versus-build questions (Delano). Mentored junior engineers in various projects, instructing them in effective software engineering techniques.
OTHER WORK EXPERIENCE:
Research Assistant, Department of Cognitive Science. University of California, San Diego. La Jolla, CA. 1992-1999.
Investigated biologically plausible models of information processing in the sensorimotor cortex using machine learning techniques. Simulated multiple neural network architectures to analyze and solve bilateral coordination problems. Built robot motion planners using dynamic programming and gradient descent techniques to identify optimal movement trajectories. Optimized feedforward and recurrent neural networks to emulate the behavior of motion planners. Extensively analyzed neural network internal representations with custom visualization tools, and compared these to published experimental data. Published results in the journal Biological Cybernetics. Constructed all algorithms, simulations, and analytical and visualization tools from scratch (C++, Linux, Windows NT).
Teaching Assistant, Department of Cognitive Science. University of California, San Diego. La Jolla, CA. 1992-1999.
Course Topics: Artificial Intelligence, Neurological Development, Cognitive Neuroscience: Functional Neurobiology and System Neurobiology, Modeling Cognitive Phenomena, Introduction to Computing.
Research Intern. Teleos Research. Palo Alto, CA. 1994.
Proposed and constructed an artificial sound localization system inspired by that of the barn owl. Assisted in the construction of an in-house visual tracking system.
Research Intern. Fuji-Xerox Palo Alto Laboratory. Palo Alto, CA. 1993.
Applied a Hidden Markov Model part-of-speech text-tagger to extract syntactic content from document images without optical character recognition.
Engineering Intern. Apple Computer. Cupertino, CA. 1992.
Contributed to the database design and testing of an experimental educational application.
Research Intern. Applied Materials Japan. Narita, Japan. 1991.
Designed a research project to study the properties of thin titanium films on silicon and silicon dioxide wafers using collimated Physical Vapor Deposition. Performed clean-room experiments using electron and optical microscopy and presented results detailing recommended process conditions for customer use.
Programmer. Stanford Linear Accelerator Center. Stanford, CA. 1990.
Implemented a user interface to display particle detector readings as part of a programming team.
EDUCATION, HONORS, MEMBERSHIPS:
Ph.D. University of California, San Diego. Cognitive Science. 1999.
Dissertation: Neural Network models of the brain mechanisms of bilateral coordination.
Ford Foundation Predoctoral Fellowship
M.S. University of California, San Diego. Cognitive Science. 1994.
B.S. Stanford University. Computer Science. 1992.
Association for Computing Machinery (ACM)
SELECTED PAPERS:
Farrar, DS and Zipser, D. (1999) Neural Network models of bilateral coordination. Biological Cybernetics. 80(3):215-225.
Sibun, P and Farrar, DS. (1994) Content characterization using word shape tokens. Proceedings of the 15th International Conference on Computational Linguistics. 686-690.