WHICH METHODS TO DISCOVER PATTERNS IN TEMPORAL DATABASES?

 

Data mining processes allow us to find methods for discovering discriminatory patterns in temporal databases. There are several cases and situations where this is particularly relevant and applicable.

The Contrast Set Mining is based on discovering interesting patterns contrasting two or more groups, where each pattern is a Contrast Set: a set of attribute-value pairs that differ widely in their distribution between groups.

It is, therefore, a more specialized situation of models of Association Rules.

A proposed technique is the Rules for Contrast Sets (RCS) which seeks to express each contrast found in terms of rules. The main purpose of the work carried out was to extend this approach to a task of Temporal Data Mining.

To this end, a set of specific standards was developed to capture the statistically relevant changes over the established timeline.

To ascertain the accuracy of the proposal and its ability to find relevant information, it was applied to two distinct data sets:

 

  • One with the statistical performance of NBA players from the 40’s until 2009 to understand the evolution of positional differences over time in terms of contribution to the game;

 

  • Another fell on the labor market with data collected by the Ministry of Labor from 1986 to 2009 to find discriminatory patterns between genders and how these have changed over the years.

 

In the first case, it was possible to clearly ascertain the current trends of the game: greater outdoor playing capacity from the athletes, more differentiated individual contribution in each position, and clear increase in effectiveness, especially in indoor positions in recent years.

In the second case, it was concluded that factors such as higher education assumed higher prevalence in males in the 1980s, changing, afterward, in the new millennium to females with an increasingly dominant position.

It was also possible to verify differences between the sexes, especially in the highest salary percentiles, which are still a current reality.