The program
The art-based program under study, known informally as the Acciona Program (AP), is the Promotion of Art in Education Program. It was implemented in 2007 by the Chilean National Council for Culture and the Arts, in collaboration with the Ministry of Education and the Balmaceda Youth Art Foundation. Its purpose is to enhance the effectiveness of formal education by improving the quality of artistic and cultural education during the hours of free programming (i.e., afternoon free time). The program brings professional artists into public schools for one semester, with each artist leading a weekly 90-min afternoon workshop. The workshops span various artistic disciplines and are developed by skilled artists in each area, who are paid competitively and have previously been selected through an open competition run by a public-private institution, the Balmaceda Youth Art Foundation. The selected artists, with the support of a schoolteacher, work with a single group of students for the entire semester. The novelty of the AP lies in its use of working artists rather than art teachers. The program also aims to improve students' creative skills through art.
Although numerous organizations and initiatives could have been assessed in the context of this study, the AP was selected chiefly because of its reputation for producing workshops of consistent quality. For this analysis, it was pertinent to include only initiatives that met certain minimum quality standards and showed little variation in those standards; otherwise, finding significant effects would have been more difficult36,68. For example, studying the standard art or music classes offered in schools would introduce unworkable complexity, given the high heterogeneity in the quality and design of these courses12. In this regard, the AP offered a unique setting in which to study the effects of art-based programs on behavior and educational outcomes. Table 8 summarizes the sample of participating schools.
In some schools, the principal allocated students to the AP workshops without any particular allocation rule, so the allocation was effectively random. Moreover, some students in the sample participated in one workshop and others in two over the course of two years. Both facts are exploited in the analysis. For example, consider a student from class “A” in the 11th grade in year 0. In year 0, the principal allocates the AP to class “A” during the first semester and to class “B” during the second semester in order to be fair to both classes. During the following year, however, because of a particular scheduling restriction, the principal allocates the AP to class “A” in the 12th grade (i.e., the same class that participated in the AP the year before), and in the second semester assigns it to a class that had not previously participated (e.g., class “B” in the 11th grade of that year). In this case, the data collected at this school would contain three distinct groups based on participation: 2-year workshop participation, 1-year workshop participation, and no workshop participation.
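How the three participation groups arise from a principal's allocation decisions can be illustrated with a short sketch. It encodes the hypothetical schedule from the example above; the class labels (including the never-assigned class "C") and the helper functions are illustrative assumptions, not part of the study's data processing.

```python
# Hypothetical allocation schedule mirroring the example in the text:
# (year, semester) -> class receiving the AP workshop.
schedule = {
    (0, 1): "A",          # class A, 11th grade in year 0
    (0, 2): "B",          # class B, same year, second semester
    (1, 1): "A",          # class A again, now in the 12th grade
    (1, 2): "B-11th-y1",  # a new cohort's class B that had not participated before
}

# "C" is a hypothetical class in the same school that never received the AP.
all_classes = ["A", "B", "B-11th-y1", "C"]


def years_of_participation(schedule, classes):
    """Count the distinct school years in which each class received the AP."""
    years = {c: set() for c in classes}
    for (year, _semester), cls in schedule.items():
        years[cls].add(year)
    return {c: len(ys) for c, ys in years.items()}


def participation_group(n_years):
    """Map years of participation to the three groups described in the text."""
    if n_years >= 2:
        return "2-year workshop participation"
    if n_years == 1:
        return "1-year workshop participation"
    return "no workshop participation"


groups = {c: participation_group(n)
          for c, n in years_of_participation(schedule, all_classes).items()}
print(groups)
```

Counting distinct years rather than semesters keeps the labels aligned with the text's "2-year" and "1-year" groups.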
Program procedure
The AP is offered as part of regular afternoon school hours, when public schools typically offer students a range of alternative courses, referred to in this paper as workshops. (In Spanish these courses are called talleres, which translates to "workshops"; although the talleres take place during the school day, in a US context they would be regarded as something akin to after-school activities.) These workshops range in subject from advanced math to soccer. Because we have an active control group, that is, the comparison students were doing something meaningful during “AP time”, our estimates are conservative relative to a scenario comparing the AP with no alternative course.
Operationally, the AP workshops cover various artistic disciplines, such as painting, dance, and music. The workshop curricula are developed by professional artists with relevant expertise who are paid competitively and are selected through an open competition under the guidance of a public-private institution, the Balmaceda Youth Art Foundation (BYAF), the main institution promoting out-of-school art education in Chile. BYAF's strong reputation and the attractive salary explain the AP's capacity to attract excellent artists. The selected artists, with the support of a schoolteacher, teach the 90-min weekly workshops over an entire semester with a single group of students.
The quality and consistency of the program are supported by its design, which comprises three components: Acciona Formación, Acciona Asistencia Técnica, and Acciona Mediación69. Acciona Formación consists of training modules that the artists in charge of the workshops must share with the teachers in order to link content and implement pedagogical strategies. These modules are overseen by the Ministry of Cultures and delivered by faculty of the University of Chile.
Acciona Asistencia Técnica provides technical assistance so that, after a year of support and advice from the Ministry of Cultures, schools can continue the program's orientation in a sustainable and consistent manner. This assistance consists of an external advisor who supports school management in allocating resources and in designing and implementing the program.
Finally, Acciona Mediación consists of artistic–cultural mediation activities that create educational spaces linking students' local cultural experience with global artistic–cultural experiences through experimentation with, and appreciation of, various artistic manifestations. These activities aim to develop meaningful educational experiences that support learning, encourage appreciation of diverse artistic–cultural manifestations, contribute to training processes, and value and recognize local culture through artistic or cultural manifestations that disseminate regional and/or national heritage.
Data collection
The consistency of the AP made it possible to survey four different high schools in which the program had been implemented. To decide which schools to consider, we first screened educational institutions against the following criteria: (i) the AP had been implemented in the school in or before 2009; (ii) the school was geographically accessible to the researchers; and (iii) the school had not been directly affected by the 2010 earthquake and tsunami. Every school that passed this initial screening was then contacted to identify (1) the years in which each class had taken part in the AP (to gauge the intensity of treatment); (2) the criteria the administration had used to decide where to implement the program (to avoid selection bias arising from the allocation of more or less skilled students in the dimensions studied); and (3) whether the school had another class in the same grade that had not yet taken part in the AP (to assess whether suitable control groups could be formed).
Moreover, we selected only schools where students participated in the AP purely because they belonged to a certain class or group of students chosen by the principal, rather than because they possessed particular skills or characteristic behaviors (e.g., giftedness, or particular desires or aptitudes). The intention was to choose schools where the selection of students for the AP was random, that is, where selection was unrelated to any particular characteristic of the selected students and participation in the program was not voluntary. Four educational institutions met these conditions: three in the metropolitan area of Santiago and one in the Los Ríos Region.
The sample, comprising both the treatment and control groups, consisted of 297 students (172 and 115, respectively) in the 9th and 10th grades across the four schools, aged 14–16 years. When comparing treatment groups, we distinguished students treated with one semester of the program from those treated with two; the sample size varies across estimates because of the common-support restriction. We visited the schools and included all treated and control students in our analysis. In other words, all students from the cohorts that had gone through the program were taken into account, whether or not they had participated.
The assessments using the evaluation booklet were conducted during regular school hours, in prior coordination with the Technical Pedagogical Unit (TPU) coordinator at each educational institution. To ensure that the scoring criteria were applied correctly throughout, the scores assigned by a team of three artists were monitored daily by a psychologist, who assessed the consistency of scoring across iterations. In addition, a second scoring process was conducted for each test. If the difference between the two sets of scores exceeded the standard deviation of the test scores, all of that day's scoring was rejected. More details on the TTCT can be found in the Supplementary Information.
During the period of this study, from 2010 to 2011, informed consent forms were not typically used for non-invasive studies such as the present one, particularly when they formed part of standard public policies integrated into the regular curriculum. Consequently, the Microdata Center at the University of Chile received an exemption from securing Institutional Review Board (IRB) approval and from the obligation to distribute such consent forms to the students or their guardians. As previously noted, the methodology of this study involved administering surveys during regular school hours, overseen directly by the school's Head of the Technical Pedagogical Unit (TPU) (Jefe de la Unidad Técnica Pedagógica in Spanish) and with the principal's approval. These surveys were thus integrated into the routine assessment framework, serving as a conventional measure of progress toward the educational objectives set out in the school's curriculum.
Approach to creativity
To assess creativity, two forms of Torrance’s Test of Creative Thinking (TTCT) were used. A measure of fluency, flexibility, and originality was obtained using a written version of the TTCT, hereafter TTCTW, and a measure of 18 dimensions of creativity was obtained using a graphical form of the TTCT, hereafter TTCTG. Both forms are described below. All school academic scores were standardized to facilitate comparison between classes and schools.
Given that modern approaches, including the one underlying this study, conceive of creativity as a synthesis of both cognitive and noncognitive skills, the TTCT was deemed the most appropriate instrument for assessing the creativity of AP students and how participation in the workshops can affect it. Furthermore, the TTCT is the most widely recognized and employed test of creativity and, more specifically, of divergent thinking27,28,29. Unlike the instruments used in other tests, such as the Piers-Harris Self-Concept Scale (PHSCS), the TTCT requires a very careful scoring method to objectify information that is partly subjective.
Torrance’s test of creative thinking: graphical form
The graphical form of Torrance’s Test of Creative Thinking (TTCTG) is an instrument for measuring creative skills through graphical exercises. This psychometric instrument was developed by Ellis Paul Torrance in 1966 and subsequently revised in 1988 and 1992 (Torrance, 1988, 1992) and in ref. 70. It was translated and validated in Chile by Parra71 with seventh- and eighth-grade students and has been validated for Chilean students only in these two age groups. The results of the test are nevertheless valid for other ages, provided that conclusions are drawn from comparisons between similar groups, as is the case in this study.
Overall, the test evaluates eighteen skills. Five of these were considered to be fundamental drivers of creativity by Torrance, the creator of the instrument, and the remaining thirteen offer additional insights into the creative potential of a subject. The test consists of three sections, each of which evaluates some or all of these eighteen creative thinking skills. For this study, we applied activities two and three, which measure all of the eighteen skills that the test attempts to assess.
The five fundamental skills assessed in this exam, also referred to as norm-referenced measures (NRM), are fluency, originality, elaboration, the abstractness of title, and resistance to premature closing. Examples of fluency and elaboration are exhibited in Figs. 1 and 2, respectively. An individual’s overall score in this test is effectively the sum of these five areas. The results of the test are, however, also influenced by the thirteen remaining creative forces it evaluates (expressiveness, storytelling articulateness, movement or action, expressiveness of titles, synthesis of incomplete figures, synthesis of lines, unusual visualization, internal visualization, extending or breaking boundaries, humor, richness of imagery, colorfulness of imagery, and fantasy), also known as criterion-referenced measures (CRM). An example of a test with a high-scoring CRM component is shown in Figs. 3 and 4.
The authors of this paper consider the thirteen CRM-evaluated creative forces to be as important as the NRM; however, a given response may contain more or fewer of these creative forces, or none at all. The score obtained from the CRM creative forces is added as a bonus to the score obtained from the five NRM measures, yielding the final TTCTG score.
The TTCTG was double-scored at a rate of 20%: at the end of each day, two evaluators scored the same 20% of the surveys. If the evaluators' scores on any particular dimension differed by more than one standard deviation, all surveys scored that day were declared invalid and scored again.
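The daily consistency rule for the double-scored subsample can be expressed as a small check. This is a sketch under stated assumptions: it takes the threshold to be one standard deviation of all scores assigned that day on the dimension in question, which the text does not pin down, and the function names and data layout are hypothetical.

```python
import random
import statistics


def sample_for_double_scoring(test_ids, rate=0.20, seed=None):
    """Draw the share of the day's tests that the second evaluator rescores."""
    rng = random.Random(seed)
    k = max(1, round(rate * len(test_ids)))
    return rng.sample(list(test_ids), k)


def day_is_valid(first, second, all_today):
    """Apply the one-standard-deviation rule to one day's double-scored tests.

    first, second: dict dimension -> scores from evaluators 1 and 2, aligned by test.
    all_today: dict dimension -> every score assigned that day (assumed SD reference).
    Returns False (the whole day must be rescored) if any disagreement on any
    dimension exceeds one standard deviation.
    """
    for dim, scores_1 in first.items():
        sd = statistics.stdev(all_today[dim])
        if any(abs(a - b) > sd for a, b in zip(scores_1, second[dim])):
            return False
    return True
```

Checking every dimension separately mirrors the text's "a particular dimension" wording: a single out-of-tolerance dimension invalidates the day.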
Torrance’s test of creative thinking: written form
The written form of Torrance’s Test of Creative Thinking (TTCTW) was first implemented in Chile by the firm Consultancy for Development in 2008 (ref. 72). Following refs. 73 and 74, we created an instrument to measure three dimensions of creativity: fluency, flexibility, and originality. The TTCTW consists of two open questions that assess these three dimensions.
For this study, Test Form 2 was randomly selected. It includes the following questions:
1. What do you think might be the disadvantages of having a cell phone?
2. What do you think could be the similarities between a carpet and a washing machine?
The answers were classified into categories and thematic fields. The number of categories appearing in a participant's answer set determines their fluency score, and the number of thematic fields it covers defines their flexibility score. Finally, an originality score is assigned to answers given by no more than 1% of respondents.
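The scoring rules just described can be sketched as follows, assuming that raters have already coded each answer into a (category, thematic field) pair. Counting distinct categories for fluency, awarding one originality point per category produced by at most 1% of respondents, and the data layout are illustrative assumptions rather than the official TTCTW scoring manual.

```python
from collections import Counter


def score_ttctw(coded_answers, originality_cutoff=0.01):
    """Score fluency, flexibility, and originality for one TTCTW item.

    coded_answers: dict respondent_id -> list of (category, thematic_field) pairs.
    A category counts as original if at most `originality_cutoff` of
    respondents produced it.
    """
    n = len(coded_answers)
    producers = Counter()  # number of respondents producing each category
    for answers in coded_answers.values():
        producers.update({category for category, _ in answers})

    scores = {}
    for rid, answers in coded_answers.items():
        categories = {category for category, _ in answers}
        fields = {field for _, field in answers}
        original = {c for c in categories if producers[c] / n <= originality_cutoff}
        scores[rid] = {
            "fluency": len(categories),
            "flexibility": len(fields),
            "originality": len(original),
        }
    return scores
```

Note that the 1% cutoff only becomes meaningful with at least about 100 respondents; in smaller samples every category is produced by more than 1% of respondents.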
Each of the three dimensions yields a raw score. All scores are standardized so that they are expressed on equivalent scales and can therefore be added together to obtain a total score. Note that this procedure weights the three dimensions equally. The TTCTW was double-scored at a rate of 100%.
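A minimal sketch of the equal-weight total: each raw dimension is converted to a z-score across respondents, and the three z-scores are summed. Using the population standard deviation (`statistics.pstdev`) is an assumption; the text does not specify which standard deviation was used.

```python
import statistics


def zscores(values):
    """Standardize raw scores to mean 0 and (population) SD 1."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def ttctw_total(fluency, flexibility, originality):
    """Equal-weight total score: sum of the three standardized dimensions."""
    return [f + x + o for f, x, o in
            zip(zscores(fluency), zscores(flexibility), zscores(originality))]


# Three hypothetical respondents.
totals = ttctw_total([1, 2, 3], [2, 4, 6], [0, 0, 3])
```

Standardizing first means that a dimension with a larger raw range (here flexibility) does not dominate the total.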
Approach to self-concept: Piers-Harris scale
The Piers-Harris Self-Concept Scale (PHSCS) was created in 1967 and revised in 1976 and 1984 (ref. 75). The instrument was adapted and validated in Chile by Gorostegui75 for students between the third and sixth years of primary school. Gorostegui created a version consisting of seventy statements, each answered with yes or no. The test provides an overall score as well as six sub-scores corresponding to specific dimensions of self-concept: behavioral adjustment, intellectual and school status, physical appearance and attributes, freedom from anxiety, popularity, and happiness and satisfaction.
Final dataset
The questionnaire administered to the schoolchildren collected sociodemographic information (sex and age), socioeconomic information such as parents' (or guardians') education and household income brackets, and other characteristics of interest regarding household equipment and the respondent's activities. Information on TTCT creativity and self-concept (PHSCS) was collected using separate questionnaires. Finally, academic performance in subjects such as mathematics, language, and arts, as well as the overall grade average, was obtained from the schools. All academic scores and TTCT and PHSCS scores were standardized within class and institution. Participants did not receive a written consent form, but the researchers collecting the data told them that participation in the study was voluntary.
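Standardizing within class and institution can be sketched as below. The record layout, the `group_key` (which could be, for example, a hypothetical `(school, class)` tuple), and the choice of the population standard deviation are assumptions; a group with zero spread is assigned a z-score of 0 to avoid dividing by zero.

```python
import statistics
from collections import defaultdict


def standardize_within(records, score_key, group_key):
    """Add a within-group z-score of `score_key` to each record.

    records: list of dicts; group_key: field identifying the class/institution.
    Groups with zero spread get a z-score of 0.
    """
    by_group = defaultdict(list)
    for r in records:
        by_group[r[group_key]].append(r[score_key])
    moments = {g: (statistics.mean(v), statistics.pstdev(v))
               for g, v in by_group.items()}

    out = []
    for r in records:
        mu, sd = moments[r[group_key]]
        z = (r[score_key] - mu) / sd if sd > 0 else 0.0
        out.append({**r, score_key + "_z": z})
    return out
```

Because each group is centered on its own mean, the resulting scores compare students relative to their classmates rather than across schools with different grading practices.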
Ethics
The Microdata Center at the University of Chile granted us an exemption from ethical review, for two main reasons. First, the object of the study is a public policy program that was already in effect at the time of the research, and the research was conducted with the approval of the principal and all relevant school authorities. As such, the research involves no intervention or interaction with the participants beyond what already occurs within the bounds of the public policy program; it is essentially a passive evaluation of an ongoing policy. Second, our research methodology mainly involves non-invasive surveys distributed to the program participants. These surveys were designed to respect the participants' time and privacy, avoiding any questions that could be deemed intrusive or irrelevant. The anonymity of the participants is fully preserved, and the data obtained are used solely for the purposes of this research. The surveys were created and administered with full awareness of, and in compliance with, the cultural norms and values prevalent in Chile. The waiver from ethical review does not mean that the study was conducted without regard for ethical considerations. On the contrary, our research is built on ethical guidelines, respect for the rights and welfare of the participants, and a commitment to maintaining their dignity and privacy. We have ensured that all aspects of the research adhere to the ethical standards set forth by the University of Chile and the broader scientific community.
Empirical strategy
Propensity Score Matching (PSM) was implemented to estimate the quantifiable impacts of the AP. PSM is a semiparametric estimator of the difference in the average of the relevant outcome (e.g., TTCTG, TTCTW, PHSCS, GPA) between AP participants and their control group. Specifically, the propensity score, that is, the estimated probability that a student participated in the program, was obtained from a binary probit model. In other words, a statistical clone of each participant was created based on the probability that they participated in the program given their socioeconomic status and demographic characteristics. Because matching directly on the full vector of characteristics is inaccurate in small samples, we did not match on that vector; instead, the propensity score reduces each individual's characteristic vector to a scalar between 0 and 1.
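The first PSM step, estimating each student's propensity score with a binary probit, can be sketched with NumPy alone. This is an illustrative Fisher-scoring implementation, not the authors' code; in practice a statistical package's probit routine would normally be used, and the single simulated covariate below stands in for the characteristics listed in this section.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(z):
    """Standard normal CDF, computed elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, computed elementwise."""
    z = np.asarray(z)
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, d, max_iter=100, tol=1e-10):
    """Maximum-likelihood probit of d (0/1) on X via Fisher scoring.

    X must include a constant column. Returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ beta
        cdf = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        pdf = norm_pdf(xb)
        denom = cdf * (1.0 - cdf)
        score = X.T @ ((d - cdf) * pdf / denom)        # gradient of log-likelihood
        info = X.T @ (X * (pdf**2 / denom)[:, None])  # expected information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Usage on simulated data (one hypothetical covariate plus a constant).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x])
d = (-0.3 + 0.8 * x + rng.normal(size=x.size) > 0).astype(float)
beta_hat = fit_probit(X, d)
p_hat = norm_cdf(X @ beta_hat)  # propensity scores in (0, 1)
```

The fitted `p_hat` is the scalar summary of each student's covariate vector on which the subsequent matching operates.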
To explain PSM formally, let us define the following relevant variables in the PSM method:
Y1i is the outcome variable if student i participated in the AP, and Y0i is the outcome variable if the student did not participate. Di ∈ {0, 1} is the treatment dummy, which takes the value 1 if student i participated and 0 otherwise. X is the student’s characteristics vector. The propensity score is then defined as
$$\mathrm{Prob}(X)=\mathrm{Prob}(D=1\mid X).$$
In addition, according to Rosenbaum and Rubin (1983), the following must hold:
$$0\,<\,\mathrm{Prob}(X)\,<\,1,$$
(1)
$$(Y_{0i},Y_{1i})\perp D_i \mid \mathrm{Prob}(X),$$
(2)
where constraint (1), the overlap or common-support condition, ensures that students with given characteristics have a positive probability of being both participants and nonparticipants, so that comparable controls exist for participants, and (2) is known as “unconfoundedness” or selection on observables, which is conditional upon student characteristics. That is, Eq. (2) states that, conditional on the propensity score of the observable covariates, assignment to treatment is independent (indicated by ⊥) of the potential outcomes under both the AP and the control condition. In other words, conditional on the propensity score, allocation to the AP is as good as random. Consequently, if both (1) and (2) hold, the average impact of the AP on participants can be obtained as
$$\Delta (X)=E\left(Y_1-Y_0\mid \mathrm{Prob}(X),D=1\right)=E\left(Y_1\mid \mathrm{Prob}(X),D=1\right)-E\left(Y_0\mid \mathrm{Prob}(X),D=1\right),$$
(3)
where the term E(Y0 | Prob(X), D = 1) represents the expected outcome that AP participants would have obtained had they not participated, which is impossible to observe. Specifically, we use kernel propensity score matching with an Epanechnikov kernel (bandwidth 0.06). For robustness, we also estimated the impacts with other methods, namely propensity score matching with one nearest neighbor and propensity score radius matching with five nearest neighbors; the results were similar. Essentially, these three methods differ in the metrics and procedures used to create the aforementioned statistical clone in the control group, given the trade-off between bias and variance of the estimator of the average treatment impact76,77,78,79. Finally, the bias reduction achieved by matching was assessed after the matching analysis.
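The kernel estimator named above, and the one-nearest-neighbor variant used as a robustness check, can be sketched as follows. This is a minimal illustration assuming matching with replacement and dropping treated students with no control inside the kernel window; radius matching and the Abadie and Imbens standard errors are not shown, and the function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on |u| < 1, zero elsewhere."""
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)


def att_kernel(p, d, y, bandwidth=0.06):
    """ATT by Epanechnikov kernel matching on the propensity score.

    Each treated unit's counterfactual is a kernel-weighted mean of control
    outcomes; treated units with no control inside the bandwidth are dropped.
    Returns (ATT, number of treated units used).
    """
    p, d, y = (np.asarray(a, dtype=float) for a in (p, d, y))
    treated, control = d == 1, d == 0
    effects = []
    for p_i, y_i in zip(p[treated], y[treated]):
        w = epanechnikov((p[control] - p_i) / bandwidth)
        if w.sum() > 0:
            effects.append(y_i - np.dot(w, y[control]) / w.sum())
    return float(np.mean(effects)), len(effects)


def att_nearest_neighbor(p, d, y):
    """ATT by one-nearest-neighbor matching (with replacement) on the score."""
    p, d, y = (np.asarray(a, dtype=float) for a in (p, d, y))
    treated, control = d == 1, d == 0
    gaps = np.abs(p[control][None, :] - p[treated][:, None])
    matched_y = y[control][gaps.argmin(axis=1)]
    return float(np.mean(y[treated] - matched_y))
```

The kernel version averages over many nearby controls (lower variance, some bias from distant matches), whereas one nearest neighbor uses the single closest control (lower bias, higher variance); this is the bias and variance trade-off mentioned in the text.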
Notably, there is no consensus in the existing literature on the optimal number of variables for estimating a propensity score, that is, on whether the model should be parsimonious or overparameterized80. According to ref. 81, an overparameterized model does not generate inconsistent estimates; however, it increases the variance, making the estimators inefficient. Conversely, Heckman et al.82 show that omitting relevant variables biases these estimates. Zhao83 indicates that variables that can be affected by participation must be omitted, because including them prevents the researcher from properly identifying the impact of a given treatment. Finally, Caliendo and Kopeinig80 state that economic theory, previous research in similar areas, and knowledge of the relevant institutions are effective guides for model selection. We therefore constructed a model including the variables that seemed most relevant given the parameters of this study.
The selection of variables used to define the students' propensity score was based on (1) findings in the literature regarding the variables most closely linked to variations in creativity, school performance, and socioemotional skills and (2) the existence of sorting in the Chilean public education system84. Whether students had previously participated in similar workshops was also taken into account because it can affect the results of both the psychometric tests and academic outcomes.
Following this methodology, we considered student sex, the education level of both parents, family per capita income, school, and class, number of books at home, and whether or not the student had participated in extracurricular artistic activities inside or outside school before the implementation of the program. This information was derived from an initial survey and was verified with administrative data drawn from the participating schools.
Following ref. 51, the regressions include the variable of interest, the treatment dummy for AP workshop participation, together with the following controls: a sex dummy that takes the value 1 for women; a group of covariates characterizing the student's socioeconomic status, measured by the education of both parents in years; and another group of variables capturing parental investment in Internet access, books, and artistic activities outside school. The matching follows the same guidelines, using sex, socioeconomic variables (including income levels in addition to parental education), and the additional investment variables.
The present study has certain limitations. First, as Bryson et al.81 argue, imposing common support can distort measurement in small samples. Although common support was imposed in this study, we do not believe that this has adversely affected the results: when the results obtained with common support were compared with those obtained without conditioning on common support, the impacts did not change substantially. Second, the standard errors of the impact estimates may be overestimated. However, what matters most with regard to the propensity score is that it is well-defined and meets the quality conditions outlined previously. Regarding the correction of standard errors, given the unfavorable results for bootstrapping presented in ref. 85, the standard errors of the estimators were not derived via bootstrapping. Instead, as a robustness check, the impacts of participating in at least one and at least two semesters of AP workshops were estimated using the standard-error correction and asymptotic distribution proposed by Abadie and Imbens76, which the same authors have generalized in more recent studies86. This procedure ensured that the statistical significance of our findings was robust to the correction of standard errors.
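One common way to impose common support, the min-max rule, is sketched below so that estimates can be compared with and without the restriction, as described above. The text does not state which rule was used, so this choice is an assumption, and the function name is hypothetical.

```python
import numpy as np


def common_support_mask(p, d):
    """Min-max common support on the propensity score.

    Keeps units whose score lies within the overlap of the treated and control
    score ranges; returns a boolean mask over all units.
    """
    p, d = np.asarray(p, dtype=float), np.asarray(d)
    low = max(p[d == 1].min(), p[d == 0].min())
    high = min(p[d == 1].max(), p[d == 0].max())
    return (p >= low) & (p <= high)
```

Applying the mask before matching and then rerunning the estimator without it gives the with/without comparison; in small samples the mask can drop a noticeable share of units, which is the distortion Bryson et al. warn about.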
Matching quality
For this study, the methods suggested by Rosenbaum and Rubin87 were followed, which involved assessing the quality of the aforementioned matching methods by estimating the percentage reduction in the bias between the average standardized variables for both participants and controls. For each of the variables considered when estimating the propensity score, a standardized bias was calculated both before and after the matching process. This was defined as
$$SB_{\text{before}}=100\cdot\frac{X_1-X_0}{\sqrt{0.5\left[V_1(X)+V_0(X)\right]}}$$
(4)
$$SB_{\text{after}}=100\cdot\frac{X_{1,\text{match}}-X_{0,\text{match}}}{\sqrt{0.5\left[V_{1,\text{match}}(X)+V_{0,\text{match}}(X)\right]}}$$
(5)
where X (V) represents the mean (variance) of the sample, and the subscripts 1 and 0 identify the treated and the control participants, respectively.
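The standardized bias of Rosenbaum and Rubin (the difference in means divided by the square root of the average of the two group variances, times 100) and the resulting percentage bias reduction can be computed as below. This sketch assumes sample variances and defines the reduction as one minus the ratio of absolute biases after and before matching; for the matched version, the inputs would be the matched treated and control observations.

```python
import statistics


def standardized_bias(x_treated, x_control):
    """Standardized bias in percent: 100 * mean difference / sqrt(mean of variances)."""
    diff = statistics.mean(x_treated) - statistics.mean(x_control)
    scale = (0.5 * (statistics.variance(x_treated)
                    + statistics.variance(x_control))) ** 0.5
    return 100.0 * diff / scale


def bias_reduction(sb_before, sb_after):
    """Percentage reduction in absolute standardized bias achieved by matching."""
    return 100.0 * (1.0 - abs(sb_after) / abs(sb_before))
```

For example, a covariate with a standardized bias of 100 before matching and 10 after matching corresponds to a 90% reduction.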
Doubly robust reweighted regression model
We also implemented a doubly robust reweighted regression model (DRRW)88. The main difference between this type of analysis and a standard linear regression model is that, under a DRRW, each observation is weighted by the inverse of its estimated probability of receiving the treatment status it actually received, as given by the propensity score. The weights for each individual (Wi) were estimated using the following equation:
$${W}_{i}=\frac{1-{D}_{i}}{1-\hat{Y}}+\frac{{D}_{i}}{\hat{Y}}$$
(6)
where Di is the treatment dummy variable and Ŷ is the estimated propensity score, which follows the specification set out in the Empirical strategy section.
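A minimal sketch of the reweighting in Eq. (6), followed by a weighted least-squares regression of the outcome on a constant, the treatment dummy, and covariates. The function names are hypothetical and the sketch is not the authors' code; in the study the propensity score would come from the probit specification.

```python
import numpy as np


def ipw_weights(d, p_hat):
    """Eq. (6): W_i = (1 - D_i) / (1 - p_hat_i) + D_i / p_hat_i."""
    d, p_hat = np.asarray(d, dtype=float), np.asarray(p_hat, dtype=float)
    return (1.0 - d) / (1.0 - p_hat) + d / p_hat


def weighted_ols(y, X, w):
    """Weighted least squares: minimizes sum_i w_i * (y_i - X_i b)^2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def drrw_effect(y, d, covariates, p_hat):
    """Coefficient on the treatment dummy from the reweighted regression."""
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones(len(d)), d, covariates])
    return weighted_ols(np.asarray(y, dtype=float), X, ipw_weights(d, p_hat))[1]
```

The estimator is called doubly robust because the treatment coefficient remains consistent if either the propensity score model or the outcome regression is correctly specified.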
A DRRW model was estimated for each outcome and intensity of treatment, controlling for the same observable characteristics as in the PSM analysis. The main conclusion remains the same in terms of the significance of the main outcomes of this study: overall GPA, math GPA, language GPA, and the 13 creative forces (CRM) are similarly affected under both approaches. Moreover, the DRRW made it possible to identify the significance and magnitude of other factors relevant to explaining academic achievement and creativity. These include socioeconomic status (i.e., the variable “family has a car”), additional educational resources (i.e., “family has computer at home”), and cultural capital (i.e., “number of books at home” and “participation in extracurricular art activities”). As in the PSM estimations, these variables are widely used in the human capital literature on education and labor markets. This study was not preregistered, and the data are not publicly available due to privacy restrictions.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.