After a series of automated steps and manual correction of erroneous names, we moved on to the next stage: assigning geographical coordinates to all addresses and locality names so that the content could be visualised on maps. For places other than Katowice, we were not interested in details such as district or street address, so we adopted two complementary solutions.
For post codes outside Katowice, we first matched them to the names of the relevant municipalities using Poland’s postal code database. For small localities that do not have a separate post code, we substituted the name of the municipality to which they belong (e.g. Sarnów = Psary). Then, using a database of geographic coordinates for Poland’s municipalities, we were able to identify and assign their geolocation data. One problem at this stage was that municipality names are duplicated within the country. For instance, each record containing Chrzanów ended up with two sets of coordinates assigned, one for a town in Lesser Poland Voivodship and one for a village in Lublin Voivodship.
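The lookup chain described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names, the function `candidate_coordinates`, and all sample values (post codes and approximate coordinates) are hypothetical stand-ins, not extracts from the databases used in the project.

```python
# Illustrative sample values only; not verified extracts from the
# postal code or coordinates databases used in the project.
POSTCODE_TO_MUNICIPALITY = {
    "32-500": "Chrzanów",
    "42-512": "Psary",
}

# Small localities without their own post code map to their municipality.
LOCALITY_TO_MUNICIPALITY = {"Sarnów": "Psary"}

# A join on the name alone: one name may carry several (lat, lon) pairs.
MUNICIPALITY_COORDS = {
    "Chrzanów": [(50.14, 19.40), (50.78, 22.60)],
    "Psary": [(50.37, 19.13)],
}


def candidate_coordinates(value):
    """Return every coordinate pair that a name-only join yields.

    `value` may be a post code, a small locality, or a municipality name.
    """
    name = POSTCODE_TO_MUNICIPALITY.get(value)
    if name is None:
        # Not a known post code: treat it as a locality or municipality name.
        name = LOCALITY_TO_MUNICIPALITY.get(value, value)
    return MUNICIPALITY_COORDS.get(name, [])


# The duplicate-name problem: a single record gets two coordinate pairs.
print(candidate_coordinates("32-500"))
print(candidate_coordinates("Sarnów"))
```

Returning a list rather than a single pair makes the ambiguity visible: any record with more than one candidate still needs a further disambiguation step.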
First, we simplified the database by no longer distinguishing between neighbouring municipalities of the same name (e.g. the rural commune of Siedlce and the separate city of Siedlce). Next, having merged these, we verified duplicate records against post codes, which told us that we were dealing with, for example, Chrzanów in Lesser Poland Voivodship, or Olsztyn in Warmian-Masurian Voivodship as opposed to its much smaller namesake in Silesian Voivodship. We used a similar method when processing data from Facebook page statistics, where the name of the voivodship appears alongside the town or city name.
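A minimal sketch of this disambiguation step, assuming a gazetteer keyed by the pair (name, voivodship). The names `GAZETTEER`, `POSTCODE_TO_PLACE`, `resolve_by_postcode` and `resolve_labelled` are hypothetical, the sample values are illustrative, and the "Name, Voivodship" string format is an assumption about how the Facebook statistics might be laid out.

```python
# Illustrative gazetteer keyed by (name, voivodship); approximate coordinates.
GAZETTEER = {
    ("Chrzanów", "Lesser Poland"): (50.14, 19.40),
    ("Chrzanów", "Lublin"): (50.78, 22.60),
}

# The post code identifies the voivodship as well as the name.
POSTCODE_TO_PLACE = {
    "32-500": ("Chrzanów", "Lesser Poland"),
}


def resolve_by_postcode(postcode):
    """Return the single coordinate pair for a post code, or None if unknown."""
    place = POSTCODE_TO_PLACE.get(postcode)
    return GAZETTEER.get(place) if place else None


def resolve_labelled(label):
    """Resolve a 'Name, Voivodship' string; None if no voivodship is given."""
    name, sep, voivodship = label.partition(",")
    if not sep:
        return None  # name only: cannot be disambiguated this way
    return GAZETTEER.get((name.strip(), voivodship.strip()))


print(resolve_by_postcode("32-500"))
print(resolve_labelled("Chrzanów, Lublin"))
```

Keying the gazetteer by the pair rather than by name alone means each lookup returns at most one location, so duplicates cannot silently reappear.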
Cases in which only the name of the municipality or commune was available, e.g. Bobrowniki, would have been more problematic. In such cases, it helps to know the context in which the data was obtained. If it came from a survey at a local event, such as a small concert in a pub, it would most likely refer to Bobrowniki in Silesian Voivodship rather than the one in Kuyavian-Pomeranian Voivodship. However, such an inference would be risky for Katowice’s Off Festival, which draws audiences from all over Europe.
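The context-based reasoning above could be expressed as a simple rule, sketched here under assumptions: the function `guess_voivodship`, the `local_event` flag and the `CANDIDATES` table are hypothetical, and the rule itself is only one possible reading of the heuristic, not a record of how the data was actually processed.

```python
# Illustrative table: voivodships in which a given name occurs.
CANDIDATES = {
    "Bobrowniki": ["Silesian", "Kuyavian-Pomeranian"],
}


def guess_voivodship(name, event_voivodship, local_event):
    """Pick a voivodship for a bare place name, or None if it is unsafe to guess.

    `local_event` is True for small events with a local audience and
    False for events that draw visitors from far away.
    """
    options = CANDIDATES.get(name, [])
    if len(options) == 1:
        return options[0]  # the name is unambiguous
    if local_event and event_voivodship in options:
        return event_voivodship  # a local audience most likely lives nearby
    return None  # leave the record for manual review


print(guess_voivodship("Bobrowniki", "Silesian", local_event=True))
print(guess_voivodship("Bobrowniki", "Silesian", local_event=False))
```

Returning `None` for non-local events keeps the risky cases out of the automatic path, so they can be checked by hand instead.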