The geographical distribution of Katowice cultural events is one of the themes taken up at the workshop What Does Facebook Know about Culture? A Social Media Study. Based on data collected from the site, we tried to pinpoint cultural event centres in Katowice and check whether they match the cultural zones and centres endorsed in mainstream media narratives. Additionally, we wanted to examine any seasonal variation in the distribution of cultural events and see whether the popularity of particular venues changed over the years. Last but not least, we tried to identify the event organisers in order to find out whether particular centres are led by public, private or non-governmental entities. Of course, the early project stages raised many more questions and topics to investigate. However, a preliminary data analysis and a discussion of priorities led us to select the most relevant and worthwhile issues.
To answer these questions and test our research hunches, we had to delve into the data. Our investigation was based on posts about Katowice cultural events previously acquired through a public Facebook API.
Our initial goal was to set up and test a simple tool. In order to examine its development and probe its effectiveness, we decided to reduce the number of events analysed to those held in 2016 alone. This allowed us to speed up the process while making sure that the sample selected was sufficient to identify any possible problems or deficiencies and establish the workflow pattern.
We started by cleaning the data needed for exploration. From the existing database, containing a number of details about events published on Facebook by their organisers, we chose the data that was most relevant to our investigation, namely the organiser as well as the event location and date.
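The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`organiser`, `place`, `start_time`) and the sample records are hypothetical and do not reproduce the actual structure of the project database.

```python
from datetime import datetime

# Hypothetical raw records; the field names are assumptions,
# not the actual schema of the data acquired from Facebook.
raw_events = [
    {"name": "Concert A", "organiser": "Venue X", "place": "Example Street 1",
     "start_time": "2016-03-12T19:00:00", "attending": 120},
    {"name": "Exhibition B", "organiser": "Gallery Y", "place": "Example Street 2",
     "start_time": "2015-11-02T18:00:00", "attending": 40},
]

# The three attributes named in the text: organiser, location and date.
KEEP = ("organiser", "place", "start_time")


def select_2016(events):
    """Keep only events starting in 2016 and only the fields used in the analysis."""
    selected = []
    for event in events:
        start = datetime.fromisoformat(event["start_time"])
        if start.year == 2016:
            selected.append({key: event[key] for key in KEEP})
    return selected


events_2016 = select_2016(raw_events)
```

Restricting the sample first keeps every later step (classification, georeferencing, mapping) quick to rerun while the workflow is still being worked out.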
To find out whether particular cultural centres are led by institutions or non-institutional entities, we introduced an additional typology to our existing database by splitting the event organisers into two groups: (1) public and (2) private and non-governmental. At this point it was necessary to enter the attributes manually, as relying on the available Facebook page categories was simply not accurate enough. Firstly, there is no such category as cultural institution. Secondly, to show the specific nature of their activities, page administrators often choose categories that do not quite correspond to the actual organisational status. For example, choosing the “location” category allows Facebook users to easily “check in” at a given venue, while the category of “community” can be a way to show a given entity’s reach.
Even a preliminary examination of the Facebook data revealed that not all of the selected events contained location information. Some of the organisers deliberately or inadvertently ignored the event’s address field. Regardless of the organisers’ motivations, however, the incomplete data had to be either rejected or edited (subject to availability of location details). It was necessary not only to fill in the missing data, but also to check whether the places specified in the database were linked to specific geographical coordinates.
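A manually maintained typology of this kind can be kept as a simple lookup table. The sketch below is an assumption about how such a table might look, with made-up organiser names; it also lists the organisers that still await a manual entry, so that nothing is classified silently.

```python
# Hypothetical organiser names; in the project the attributes were entered by hand.
ORGANISER_TYPE = {
    "City Gallery": "public",
    "Independent Club": "private_or_ngo",
}


def organiser_type(name):
    """Return the manually assigned type, or None if the organiser is not classified yet."""
    return ORGANISER_TYPE.get(name)


def unclassified(events):
    """List organisers that still need a manual entry, without duplicates."""
    return sorted({e["organiser"] for e in events
                   if organiser_type(e["organiser"]) is None})


sample_events = [
    {"organiser": "City Gallery"},
    {"organiser": "Independent Club"},
    {"organiser": "New Collective"},
    {"organiser": "New Collective"},
]
missing = unclassified(sample_events)
```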
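The completeness check can be illustrated as follows. This is a minimal sketch, assuming that coordinates are stored under the hypothetical keys `lat` and `lon`; events without a usable pair are set aside for manual editing or rejection.

```python
def has_coordinates(event):
    """True if the event carries a plausible latitude/longitude pair."""
    lat, lon = event.get("lat"), event.get("lon")
    if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def split_by_location(events):
    """Separate events with usable coordinates from those needing manual review."""
    complete = [e for e in events if has_coordinates(e)]
    incomplete = [e for e in events if not has_coordinates(e)]
    return complete, incomplete


# Hypothetical sample records.
sample = [
    {"name": "Event with coordinates", "lat": 50.26, "lon": 19.02},
    {"name": "Event with empty address", "lat": None, "lon": None},
    {"name": "Event with no location fields"},
]
complete, incomplete = split_by_location(sample)
```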
The geographic data analysis was the determining factor in choosing the exploration tools and technologies. The data we were interested in were about events taking place in Katowice. As the first step, we wanted to georeference the events we analysed. As georeferencing information for many events was hidden in the event description, we had to retrieve it using a purpose-written script. This way a simple database was created, including each event’s description, organiser information and geographic coordinates.
As a next step, we wanted to select those cultural events from the database that took place in Katowice. For this purpose, we used a publicly available data set from the National Register of Boundaries and QGIS, a geographic information system application designed for, among other things, collecting, processing and analysing spatial data, and creating maps. Based on the CSV file, we created a vector layer with points representing event venues. We then extracted from it the points located within the polygon representing the boundaries of the city, i.e. the events held in Katowice. Finally, we saved the resulting data to a GeoJSON file.
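In the project this step was carried out in QGIS. Purely as an illustration of the underlying operation, the same selection can be sketched in plain Python with a ray-casting point-in-polygon test followed by a GeoJSON export. The rectangle used as the city boundary is a placeholder, not the actual boundary from the National Register of Boundaries, and the event records and property names are hypothetical.

```python
import json


def point_in_polygon(lon, lat, ring):
    """Ray casting: a point is inside if a horizontal ray from it crosses
    the polygon's edges an odd number of times."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def to_geojson(events, ring):
    """Build a FeatureCollection from the events located inside the ring."""
    features = [
        {
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [e["lon"], e["lat"]]},
            "properties": {"name": e["name"], "organiser_type": e["organiser_type"]},
        }
        for e in events
        if point_in_polygon(e["lon"], e["lat"], ring)
    ]
    return {"type": "FeatureCollection", "features": features}


# Placeholder rectangle as (lon, lat) pairs, NOT the real city boundary.
CITY_RING = [(18.9, 50.1), (19.2, 50.1), (19.2, 50.3), (18.9, 50.3)]

events = [
    {"name": "Inside event", "lon": 19.02, "lat": 50.26, "organiser_type": "public"},
    {"name": "Outside event", "lon": 21.01, "lat": 52.23, "organiser_type": "private_or_ngo"},
]
collection = to_geojson(events, CITY_RING)
geojson_text = json.dumps(collection, ensure_ascii=False, indent=2)
```

The resulting text can be written to a file and loaded by a mapping library as a GeoJSON layer. For a real boundary with holes or multiple parts, a GIS tool such as QGIS remains the safer choice.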
Having developed a set of complete and cleaned data on events in Katowice – all duly categorised and featuring exact locations expressed as geographic coordinates – we were ready to show the distribution of our events on a map. For this purpose we used Leaflet, a JavaScript library which offers the functionalities of aggregating points into groups (with predefined compaction), adding labels to points (e.g. event name), and assigning different styles depending on the selected parameter (in our case, the map pin colour corresponding to the type of organiser).
Thanks to Leaflet, we were then able to put the database on the map and give it a desired style with data on the number of events and their organisers visualised in a clear and unfussy manner. Based on the map, we were able to study the distribution of “cultural spots” and search for the area in which the largest number of cultural events are aggregated. Thanks to appropriate styling, it was also easy to see the division of events into those organised by public entities and those delivered by NGOs and private promoters.
By adding further attributes to the output data, it is possible to study various characteristics of the events in different ways, such as location variations depending on the season or month of the year. The developed model of operation is a prototype that will be subject to further development in the coming months. The final product will illustrate, for example, the variability of “cultural centres” over the years.
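One such additional attribute could be the season in which an event takes place. The sketch below is only a possible direction, not part of the existing prototype: it assumes meteorological seasons (winter covering December to February) and uses hypothetical venue names and dates to count events per venue and season.

```python
from collections import Counter
from datetime import date

SEASON_NAMES = ("winter", "spring", "summer", "autumn")


def season_of(day):
    """Map a date to a meteorological season (an assumed convention)."""
    return SEASON_NAMES[(day.month % 12) // 3]


def events_per_season(events):
    """Count events for each (venue, season) pair."""
    return Counter((e["venue"], season_of(e["date"])) for e in events)


# Hypothetical sample records.
sample = [
    {"venue": "Venue A", "date": date(2016, 1, 15)},
    {"venue": "Venue A", "date": date(2016, 2, 20)},
    {"venue": "Venue A", "date": date(2016, 7, 1)},
    {"venue": "Venue B", "date": date(2016, 7, 10)},
]
counts = events_per_season(sample)
```

Adding the year to the key would extend the same count to comparisons across several years.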