Project sponsored by inflexa wins Fondecyt grant together with the DCC of the Universidad de Chile
Carlos Hurtado will be the lead investigator.
The following is the project's abstract, originally written in English:
Title: Data Mining for Content Syndication in the Web
Investigator: Carlos Hurtado
Content syndication arises when one party makes its content available on the Web so that other parties can pick it up and process it automatically and periodically. The syndicated content is published in channels (e.g., weblogs, digital communities, or online media) as RSS metadata, which is then collected by a class of software called aggregators or RSS readers. This technology allows Web users to receive a continuous stream of fresh data matching their interests, including news, videos, and pictures.
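As a rough illustration of what an aggregator does with one channel, the following minimal sketch parses an RSS 2.0 document and extracts its items. The `SAMPLE_FEED` string and the `parse_items` helper are hypothetical, made up for this example; they are not taken from the project or from Orbitando.com.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for one channel (not real data).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://example.com/</link>
    <description>A made-up channel for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello syndication</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>More fresh content</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Return (channel_title, list of item dicts) from an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    items = [
        {
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title", default=""), items


if __name__ == "__main__":
    channel_title, items = parse_items(SAMPLE_FEED)
    print(channel_title, [it["title"] for it in items])
```

A real aggregator would fetch each channel periodically over HTTP and keep only the items it has not seen before; that polling loop is omitted here.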
Data mining is broadly defined as the science of extracting useful knowledge from large datasets; it comprises clustering, classification, the discovery of associations and correlations, information extraction, and link analysis, among others. The goal of this project is to develop data mining techniques to improve tasks such as ranking, topic discovery, personalization, visualization, and filtering of RSS data. Open problems include the development of vector-space and semantic models of RSS data and channels; support for ranking, filtering, and information extraction over streams of RSS data; the use of natural language processing techniques; and the management of time and dynamic feature spaces. The project will be based on data collected by Orbitando.com, an RSS aggregator that currently collects data from more than 6,000 channels on topics related to Chile.
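To make the idea of a vector model with ranking concrete, here is a minimal sketch that represents item texts as TF-IDF vectors and ranks them against a query by cosine similarity. The abstract does not say which model the project uses; TF-IDF is swapped in here only as a common baseline, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) and sample texts are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a crude tokenizer)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Sort by descending score, breaking ties by original position.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    # Hypothetical item texts, not data from any real channel.
    texts = [
        "Earthquake reported near Santiago",
        "New video from the football final",
        "Santiago metro opens new line",
    ]
    print(rank("earthquake santiago", texts))
```

This static version recomputes the vocabulary and IDF weights on every call. A stream of RSS items would require updating them incrementally as new terms arrive, which is one face of the dynamic feature space problem mentioned above.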
