Personal Information



Neeraj Agrawal

Date of birth

April 6, 1980



Marital Status





[email protected]


Microsoft R&D Software Pvt Ltd

Campus-I, Gachibowli,

Hyderabad- 500046







Bachelor of Technology 
Department of Computer Science and Engineering 
Indian Institute of Technology Bombay, India

CPI (Cumulative Performance Index) : 7.9/10 

Full Record







1998 to 2002: Indian Institute of Technology Bombay, Bachelor of Technology, Computer Science and Engineering. Date of Award: April 2002

1996 to 1998: Delhi Public School, Higher Secondary Certificate, Engineering Drawing, 83% PCM. Date of Award: April 1998

Awards and Achievements

  • Along with three batchmates, won the 3rd prize in Eureka 2001, the international student business plan competition held every year at IIT Bombay.
  • Secured a rank among the top 0.1% (102nd among more than 120,000 participants) in the Joint Entrance Examination (JEE-1998) for entrance into the Indian Institutes of Technology.
  • Secured 10th rank among more than 44,000 candidates in the MP State Pre Engineering Exam (PET) 1998.
  • Received the Bravo award for both of the projects I worked on at IBM India Research Lab.

My DBLP Entry  

  • "TAP: A Platform for Enabling Enterprises to Develop Business Specific Text Analytic Applications". Neeraj Agrawal, Scott Holmes, Sachindra Joshi and Sumit Negi, accepted at COMAD 2005.
  • "A Bag of Paths Model for Measuring Structural Similarity in Web Documents". Sachindra Joshi, Neeraj Agrawal, Raghu Krishnapuram and Sumit Negi, SIGKDD - The Ninth ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, Washington D.C., August 2003.
  • "EshopMonitor: A Web Content Monitoring Tool". Neeraj Agrawal, Rema Ananthanarayanan, Rahul Gupta, Sachindra Joshi, Raghu Krishnapuram and Sumit Negi, ICDE 2004 (Industrial track).
  • "EShopMonitor: A Comprehensive data extraction tool to monitor web sites". Neeraj Agrawal, Rema Ananthanarayanan, Rahul Gupta, Sachindra Joshi, Raghu Krishnapuram and Sumit Negi, published in IBM Systems Journal.
  • Dynamically Pluggable Filter Objects on AspectJ
    R. K. Joshi, Neeraj Agrawal
    AspectJ Implementation of Dynamically Pluggable Filter Objects in Distributed Environment,
    Proceedings of 2nd German Workshop on Aspect-Oriented Software Development at University of Bonn, Feb. 21-22, 2002; in Technical Report no. IAI-TR-2002-1, University of Bonn, 2002.


  • A Method for Finding Structural Similarity in Semi Structured Documents. Neeraj Agrawal, Sachindra Joshi, Raghuram Krishnapuram and Sumit Negi. Filed at the US Patent Office in June 2003.
  • Automated Process for Identifying and Delivering Domain Specific Unstructured Content for Advanced Business Analysis, Dec 2004.
  • A System and a Method for Focused Re-Crawling of Web Sites, with Sachindra Joshi and Sreeram Balakrishnan, USPTO, July 2005.


  • Algorithms and Data Structures
  • Hypertext Information Retrieval, Databases
  • System Architecture, Programming

Professional Experience/Training




Arcot R&D Software Pvt Ltd

Software Engineer II

Dec 2005 – present

IBM India Research Lab

Member of Technical Staff

June 2002 – Dec 2005

WebTek Soft Pvt Ltd.

Summer Intern

May 2001-June 2001 (2 months)

Research and Projects

Security and Signing

    At Arcot I have been working on a credential management (certificates, ArcotId, QnA, username/password, etc.) and document signing system. This work is being done in collaboration with Adobe Acrobat Reader and will be part of the Acrobat 8.0 release; it allows PDF documents to be signed. I work on many aspects of this project. Technologies used: Java, Web Services, and relational databases.

Application Server Capacity Planning (July 2005- Dec 2005)

    The goal of this project is to build an automated capacity planning tool that can dynamically suggest the allocation of hardware resources to the different tiers and applications in an application server. IBM Tivoli Monitoring products can provide the performance statistics, and a separate tool can model the application and the hardware through queuing models. Our aim is to build a prototype that takes the data from the monitoring products and transforms it into the format required by the modeling tool. We are in the design phase. We are a team of two, and I am leading the project.

Bharti Call Record Data Warehouse Optimization (Apr – July 2005)

    Bharti Telecom, the biggest mobile service provider in India, has a rapidly growing data warehouse. IBM Global Services (IGS) has an outsourcing deal with Bharti for managing all backend operations. IGS was struggling to keep the cost of the data warehouse system manageable, which was threatening the feasibility of the outsourcing deal. Along with two teammates, I studied the whole design of the Call Record Data Warehouse schema and showed that our recommendations would reduce the storage requirements by at least 25% and improve performance to a similar extent. The expected saving from our recommendations is around a million dollars this year, and the projected saving over the next 10 years is around

Web Crawler (Jan 2005 – Dec 2005)

     Scaled the eShopMonitor crawler from 200 thousand documents to up to 20 million web pages. The eShopMonitor crawler was an in-memory crawler, i.e. it kept all the URLs in memory, so memory was the bottleneck that prevented it from going beyond 200 thousand documents. I analyzed the whole code base and serialized some data structures to disk, building an in-memory cache of them so that performance did not decrease.

          Work is currently under way to make it a focused crawler. We are interested in crawling only the business news from 1000 news websites (USA) on a daily basis. This is required for a project whose goal is to find marketing leads for small and medium enterprises in the USA. We want the crawler to be focused on business needs so that our hardware resource requirements do not bloat.


WebFountain Appliance (WFSDK) (2003-2004)

Worked in a team of four.

IBM WebFountain is an e-business on demand Innovation Services solution that collects, stores, and analyzes massive amounts of unstructured and semi-structured text. It is built on an open, extensible platform that enables the discovery of trends, patterns, and relationships from data. The WebFountain SDK is the standalone version that will be used as the gateway to access the core WebFountain cluster. I am the owner of all the WF Appliance Web Services, the On Topic Store Builder, and a few other tools on this platform. Currently most of the customers of IBM WebFountain are served through this platform.

eShopMonitor (2002- Dec 2005)

The aim of the eShopMonitor project at IBM India Research Lab is to generate wrappers for, and monitor, dynamically generated web pages. eShopMonitor consists of a crawler, which simulates form submissions to crawl the hidden web, and a miner component, which mines, tags, and stores the important fields. It has a powerful query engine through which users can execute queries such as "what is the cheapest or fastest route from New York to Washington among all airlines?" (provided eShopMonitor is configured to monitor all the airlines' web sites and they contain the data). It maintains daily versions of pages to answer queries such as "on which airline routes has the price changed in the last 10 days?" (it would use the flight number to identify the same flight). The miner component was my responsibility; I also helped design the crawler and the query engine.
Currently eShopMonitor is helping two of IBM's biggest commercial sites (EMEA region).

B.Tech Project: Dynamically Configurable Filter Object Model (2001-2002)
Manager: Prof. R. K. Joshi (IIT Bombay)

We extended the Filter Object Model so that filter objects can be attached and detached at run time and filter classes are not required when the application is compiled. This capability can be used as an elegant way of achieving Dynamic Software Evolution. I implemented a prototype for Java using AspectJ.

B. Tech Seminar: Information Retrieval for Web (2000-2001)
Advisor : Prof. R. K. Joshi (IIT Bombay)

My B.Tech Seminar involved surveying various algorithms for ranking documents on the Web and identifying the one best suited to a search engine being developed for Indian languages. The survey covered Google's PageRank, HITS, etc.

Summer Project: Workflow Engine (2001)
Manager: Mr. Abhijits Sharma (Director of Itellix)

This project involved developing a fast, scalable and robust Workflow Engine. It was built on WebLogic Server, Enterprise Java Beans (EJB), JMS, Java, XML (Xalan and Xerces), and TopLink. It supports transactions and concurrent execution of business workflows. The engine simulates a workflow specification written in XML, and it is powerful enough to simulate a specification for an online graduate application program. Users can access it on the web. It is now being used by Dresdner Bank for leasing workflows. Because the XML specification of a workflow can be written in a day or two, Dresdner Bank can create new leasing workflows as customers and economic conditions change. In two months I wrote approximately 70% of the engine component, which is 13,000 lines long.

Designing a Database with Low Memory Requirements (2001)
Course project at IIT Bombay

As part of a course project we developed a modified version of MySQL in which the query processing components were changed to respect restrictions on memory and persistent storage. We identified table joins as the principal memory consumer and changed the recursive join (join(t1, ..., tn-1, tn) = join(join(t1, ..., tn-1), tn)) used by the MySQL query engine to a pipelined join. We tested it on Linux and found a large decrease in peak memory requirement and, as expected, somewhat slower execution.

Image Search Engine (2001)  

Course project at IIT Bombay

Developed a prototype search engine that finds images similar to a given one. The basic algorithm breaks each image into parts (for example, 4 adjacent pixels), clusters them using the EM clustering algorithm, and ranks the images by a similarity score computed by matching similar clusters in the two images.

Central Authentication Server (2000)

Course project at IIT Bombay

This involved setting up an authentication server along the lines of Microsoft Passport. It involved designing the backend information systems and databases and setting up a fluid web interface for them. All communication between servers and clients, and between the authentication server and the other servers, was secured using DSA. It was implemented using Apache, MySQL, Java, and SSL.

FPGA-based 16-bit divider interfaced with an 8085 kit (2001)

Course project at IIT Bombay

We implemented the 16-bit divider logic on an FPGA kit and controlled it from an 8085 kit interfaced with the FPGA. The design maximized performance by evaluating components concurrently and synchronizing them.

Software Exposure

Operating Systems

Linux, HP-UX, Solaris, Windows, WinNT.



Have also coded in C++, C, shell script, awk, etc.

Web Technologies

HTML, XML, J2EE, Web Services, Enterprise Java Beans.


Oracle, IBM DB2, MySQL, TopLink.

Postgraduate Courses undertaken at IIT Bombay/Delhi

  • At IIT Delhi: Randomized Algorithms, Quantum Computing and Information Theory, Parallel Computing, Game Theory, Proof and Types, Agent Technologies (AI).
  • At IIT Bombay: Advanced Database Management Systems (audit), Implementation Techniques for Relational Database Management Systems, Object Oriented Systems, Information Retrieval and Mining for Hypertext and the Web.


I hereby certify that the particulars given herein are true to the best of my knowledge and belief.

Date: 11 April, 2005
Place: Bangalore
Neeraj Agrawal
