# Selection of representative polling stations

## Description

In every electoral process, surveys are important for making accurate predictions of the overall election results. These polls are usually conducted on representative samples of the electorate, and such samples are a large subset of the population. If it were possible to know which polling stations are the most representative, that is, the ones whose electorate most accurately reflects the position of the overall electorate, then surveys could be carried out only in those representative districts. The survey's predictions would then be accurate enough, at a much lower cost.
In summary, the problem consists of determining a subset of the whole set of polling stations whose deviation from the mean results is small enough. Polls carried out in these specific electoral areas will then predict the global results with high accuracy.
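The notion of "deviation from the mean results" can be made concrete in several ways; the text does not fix one. The following is a minimal sketch that assumes the deviation is the largest absolute difference between the vote shares of a candidate subset and the global vote shares. The function name `subset_deviation` and the toy numbers are hypothetical and used only for illustration.

```python
import numpy as np

def subset_deviation(votes, roll, subset):
    """Largest absolute gap between subset and global vote shares.

    votes  : (stations, parties) array of absolute vote counts
    roll   : (stations,) array with the number of registered voters
    subset : indices of the candidate representative stations
    The choice of the maximum absolute gap is an assumption.
    """
    votes = np.asarray(votes, dtype=float)
    roll = np.asarray(roll, dtype=float)
    # Global share of each party relative to the whole electoral roll
    global_share = votes.sum(axis=0) / roll.sum()
    # Share of each party inside the candidate subset only
    subset_share = votes[subset].sum(axis=0) / roll[subset].sum()
    return float(np.max(np.abs(subset_share - global_share)))

# Invented toy data: 4 stations, 2 parties, 100 registered voters each
votes = np.array([[50, 30], [20, 60], [40, 40], [45, 35]])
roll = np.array([100, 100, 100, 100])

good = subset_deviation(votes, roll, [2])  # station close to the global result
bad = subset_deviation(votes, roll, [1])   # station far from the global result
```

Under this measure, a subset is a better candidate the smaller its deviation is, so the problem becomes a search for small subsets with a low value.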

## Instances

An instance of this problem is presented here, using the available data from the last autonomous-community elections in Madrid, held in October 2003. Each example corresponds to a specific polling table and has 22 attributes, which give the number of votes obtained by each of the 22 political parties that took part in the elections. The exact description of the attributes and the 5865 examples covering all the polling tables are available here:

Attributes description
Original Data

The original data have been processed to organize them by polling section instead of polling table, resulting in 3928 examples. The section-level data have then been processed again to transform the number of votes of each political party into a value relative to the electoral roll of the section. This relative value is a real number between 0 and 1, calculated by dividing the number of votes by the number of people registered in that section. Both transformed data sets are available through these links:
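The two processing steps described above can be sketched as follows. This is only an illustration: the layout of the original files is not described here, so the sketch assumes that each polling table comes with a section identifier and its own electoral roll. The names `tables_to_sections` and `relative_values` are hypothetical.

```python
import numpy as np

def tables_to_sections(section_ids, table_votes, table_roll):
    """Sum the votes and the electoral roll of all tables in each section."""
    section_ids = np.asarray(section_ids)
    table_votes = np.asarray(table_votes, dtype=float)
    table_roll = np.asarray(table_roll, dtype=float)
    # inverse[i] is the row of the section that table i belongs to
    sections, inverse = np.unique(section_ids, return_inverse=True)
    votes = np.zeros((len(sections), table_votes.shape[1]))
    roll = np.zeros(len(sections))
    # Unbuffered accumulation, so repeated section indices are all added
    np.add.at(votes, inverse, table_votes)
    np.add.at(roll, inverse, table_roll)
    return sections, votes, roll

def relative_values(votes, roll):
    """Divide each party's votes by the number of registered voters."""
    return votes / roll[:, None]

# Invented toy data: three tables, two of them in section "A"
ids = ["A", "A", "B"]
tv = [[10, 5], [20, 15], [30, 30]]
tr = [50, 50, 100]

sections, votes, roll = tables_to_sections(ids, tv, tr)
rel = relative_values(votes, roll)
```

Because nobody can cast more than one vote, every relative value lies between 0 and 1, and the values of one section sum to at most 1 (the remainder corresponds to registered voters whose votes are not counted for any party, for example abstentions).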

Data of sections (absolute)
Data of sections (relative)

## Solutions to the problem

This is not a well-known problem and, thus, there are no well-known solutions for it. We propose two approaches to solve it and present the results obtained for the instance of the problem given above.

### Approach 1

We have used the Self-Organizing Map (SOM) algorithm created by Professor Teuvo Kohonen [Kohonen, 1982, Kohonen, 1990, Kohonen, 1995c, Kohonen et al., 1996b] to select the most representative sections. We tried eight different two-dimensional SOM topologies: 5*5, 6*6, ... and 12*12 neurons. Each polling section is characterized by 22 attributes, which represent the relative values of the electoral results for each political party. Thus, 3928 22-dimensional vectors are mapped onto each two-dimensional map, with similar vectors (electoral sections) grouped together. The vector corresponding to the global polling results is also mapped. The region of the map onto which this global vector is mapped is the selected region, and all the vectors mapped onto that region are the selected electoral sections.
The following links show the results obtained for each chosen topology. In addition, a summary with a global measure of the error for each topology is given. The SOM topology is denoted N * N, where N is the number of rows and columns of the two-dimensional map.
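The selection procedure described above can be sketched with a small SOM written in NumPy. This is a minimal sketch and not the implementation used to produce the results: the initialisation, the Gaussian neighbourhood, the linearly decaying learning rate and radius, and the names `train_som`, `best_unit` and `select_sections` are all assumptions made for illustration.

```python
import numpy as np

def train_som(data, n, epochs=20, lr0=0.5, seed=0):
    """Train an n*n SOM with online updates; returns (n*n, dim) weights."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise every unit with a randomly chosen data vector
    weights = data[rng.integers(0, len(data), size=n * n)].copy()
    # (row, column) position of every unit on the two-dimensional map
    grid = np.array([(r, c) for r in range(n) for c in range(n)], dtype=float)
    sigma0 = n / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

def select_sections(weights, data, global_vector):
    """Indices of the sections mapped to the same unit as the global vector."""
    data = np.asarray(data, dtype=float)
    target = best_unit(weights, global_vector)
    dist = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.flatnonzero(dist.argmin(axis=1) == target)
```

With the section data in relative form as a `(3928, 22)` array `rel` and the global results as a 22-dimensional vector `g`, one topology would be evaluated as `select_sections(train_som(rel, n=8), rel, g)`. The returned set can be empty if no section shares a unit with the global vector. In this sketch, larger maps tend to assign fewer sections to each unit, so the choice of N trades the size of the selected subset against how closely it matches the global vector.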

Results:

SOM topologies: 5*5, 6*6, 7*7, 8*8, 9*9, 10*10, 11*11, 12*12

Summary for all topologies

### Approach 2
Genetic Algorithms ......

Last Updated: 6/07/05