Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 222222

Centre for Innovative and Collaborative Construction Engineering

2010

Dr Daniela Xhemali

Thesis

Automated Retrieval and Extraction of Training Course Information from Unstructured Web Pages

Project Title

Automated Retrieval and Extraction of Training Course Information from Unstructured Web Pages

Company

Limehurst House Limited, Apricot Training Management

Supervisors

Academic:
Professor Chris Hinde
Dr Roger Stone

Industrial:
Ms Sue Concannon
Mrs Hilary Hale

Director of Research:
Professor Dino Bouchlaghem

Research Period

2006 - 2010

Context/Background

Apricot is an independent training and skills brokerage that provides Human Resource Development services to organisations across the East Midlands. The Company's aim is to improve business performance and competitiveness through the effective identification of needs and the sourcing, administration and management of training and skills development programmes that meet the existing and future needs of employers.

Daniela is a graduate in Computer Science (CV attached).

Aims and Objectives

The specific aims and objectives of the project are to make locating sources of information critical to the Company's business more efficient by automating as much of the process as possible, using modified data mining and document summarisation technologies.

Method and Current Status

Between them, the two supervisors have a substantial body of knowledge about web systems, search engines, artificial intelligence and data mining. The project is staged so that usable outcomes are available fairly early on. A more detailed project description is attached.

Benefits/Expected Outcomes

The benefit to the company will be a reduction in the effort required to obtain information from training organisations. A successful outcome to the first stages of the project is likely to save the company substantial sums of money. The research outcomes will be publishable.

Xhemali, D., Hinde, C.J. and Stone, R.G. (2009) Naive Bayes vs. Decision Trees vs. Neural Networks in the Classification of Training Web Pages. International Journal of Computer Science Issues, 4(1), pp. 16-23.

Xhemali, D., Hinde, C.J. and Stone, R.G. (2010) Domain-Independent Genotype to Phenotype Mapping Through XML Rules. International Journal of Computer Science Issues, 7(3), pp. 1-9.

Xhemali, D., Hinde, C.J. and Stone, R.G. (2007) Embarking on a Web Information Extraction Project. The UK 2007 Workshop on Computational Intelligence, London.

Xhemali, D., Hinde, C.J. and Stone, R.G. (2010) Genetic Evolution of Regular Expressions for the Automated Extraction of Course Names from the Web. Proceedings of the 2010 International Conference on Genetic and Evolutionary Methods. Las Vegas, USA, pp. 118-124.

Xhemali, D., Hinde, C.J. and Stone, R.G. (2010) Genetic Evolution of ‘Sorting’ Programs Through a Novel Genotype-Phenotype Mapping. Proceedings of the 2010 International Conference on Evolutionary Computation. Valencia, Spain.

