Getting stakeholders acquainted with the rationale behind the construct of the English language proficiency test of the University of Costa Rica for the Ministry of Education of Costa Rica

Un acercamiento al constructo de la prueba de dominio lingüístico del idioma inglés desarrollada por la Universidad de Costa Rica para el Ministerio de Educación de Costa Rica

Received: May 27, 2021. Accepted: September 21, 2021.
Estudios de Lingüística Aplicada, año 40, número 75, julio de 2022, pp. 119–143
doi: 10.22201/enallt.01852647p.2022.75.1013

Walter Araya Garita, Universidad de Costa Rica, Facultad de Letras, Escuela de Lenguas Modernas, walter.arayagarita@ucr.ac.cr
José Fabián Elizondo González, Universidad de Costa Rica, Facultad de Letras, Escuela de Lenguas Modernas, josefabian.elizondo@ucr.ac.cr
Ana Carolina González Ramírez, Universidad de Costa Rica, Facultad de Letras, Escuela de Lenguas Modernas, ana.gonzalezramirez@ucr.ac.cr

Abstract

This paper gathers evidence to build a content-validity argument for the language proficiency test developed by the University of Costa Rica for high school students. It analyzes the theoretical foundations supporting the construction and administration of a custom-made language test in its localized context, including a description of the test and the input of external stakeholders. Additionally, the paper offers suggestions and recommendations for future testing experiences of this type, paving the way for researchers who intend to pursue this area of interest.

Keywords: language proficiency; standardized testing; content validity; localization; foreign language assessment

1. Introduction

In 2016, the Costa Rican Ministry of Education (Ministerio de Educación Pública; MEP, for its abbreviation in Spanish) modified the English curriculum nationwide. In a national effort coined the Alliance for Bilingualism, the MEP implemented multiple changes in the English programs to meet cultural, societal, and financial demands as a strategy to transform Costa Rica into a bilingual country, which in turn would attract investors, generate jobs, revitalize the economy, and foster study opportunities abroad (Azofeifa, 2019).
Additionally, in 2019 the MEP decided to eliminate its traditional national English tests, which were pass-or-fail, multiple-choice reading comprehension tests. Before 2019, senior high school students were required to score at least 70 out of 100 to be eligible for graduation. Students failing to meet this score had to retake the test as many times as needed to reach the minimum passing score, which prevented them from starting college or getting a job. Instead of administering the traditional test, however, the MEP opted for a language proficiency test, diagnostic in nature as defined by Brown and Abeywickrama (2019: 10). This new test does not evaluate content but rather the English performance of students, based on the descriptors of the Common European Framework of Reference for Languages (CEFR) (Cordero, 2019).

Despite the multiple language certifications currently available, none seems to meet the MEP's specific requirements and needs. First, test administration and the publication of results must fit the MEP's annual schedule, which requires test administrators to carry out all related activities within a very tight time window. Second, the uneven distribution and availability of resources pose challenges for public education institutions. Finally, the MEP's budget restrictions must also be considered when choosing among the different options for English certification.

In an attempt to address these needs, the School of Modern Languages (Escuela de Lenguas Modernas; ELM, for its abbreviation in Spanish) of the University of Costa Rica (UCR) decided to create a more locally sensitive option: the Language Proficiency Test (prueba de dominio lingüístico; PDL). Over the past 30 years, the ELM has accumulated vast experience in designing and administering a variety of reliable language tests and in providing valid evidence to support the interpretation of their scores according to their intended uses. The language proficiency test designed for the English for Specific Purposes program of the National Council of University Presidents (a test for faculty members) and the official certification of translators and interpreters, among others, attest to the ELM's expertise in the field, broadened by the guidance of international language testing professionals. The School not only administers its own language tests but is also an internationally recognized center for some renowned language certifications. This recognition has required many faculty members to receive extensive training and gain valuable experience as authorized certified examiners. In addition, the ELM possesses the know-how for administering large-scale, high-stakes tests nationwide, such as the Entrance Examination designed by the Psychological Research Institute and the School of Statistics at the UCR. More recently, the ELM has started digitizing some of its tests to facilitate their administration and the recording of results.
This test automation has widened the options for test administration to three possible formats: online, offline, and hybrid (a combination of offline and online delivery that requires minimal bandwidth), making the test suitable for the digital infrastructure of Costa Rica's public high schools, which may or may not have a strong Internet connection.
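As an illustration of how the choice among these three formats might be operationalized per school, consider the following sketch. It is hypothetical: the bandwidth threshold, function name, and parameters are invented for the example and are not part of the UCR's or the MEP's specifications.

```python
# Hypothetical sketch: choosing a delivery format per school based on measured
# connectivity. Thresholds and names are invented for illustration only.
def choose_format(bandwidth_kbps: float, connection_is_stable: bool) -> str:
    """Map a school's connectivity profile to one of the three PDL formats."""
    if bandwidth_kbps <= 0:
        return "offline"   # no Internet: test runs fully on local machines
    if bandwidth_kbps < 1_000 or not connection_is_stable:
        # minimal bandwidth: items cached locally, results synced online later
        return "hybrid"
    return "online"        # strong, stable connection: fully online delivery

for profile in [(0, False), (400, True), (5_000, True)]:
    print(profile, "->", choose_format(*profile))
# (0, False) -> offline; (400, True) -> hybrid; (5000, True) -> online
```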
2. Literature review

Standardized language testing may seem overwhelming and intimidating to many stakeholders,¹ especially educators, whose work effectiveness is at stake, and their students. As Shohamy (2001, as cited in Fulcher, 2010: 8) argued, "one reason why test takers and teachers dislike tests so much is that they are a means of control". Students conceive this type of testing as a punishment focused on identifying their weaknesses and areas for improvement; teachers, on the other hand, see it as a sign that their supervisors do not trust their expertise. Brown and Abeywickrama (2019: 1) summarized this view when they stated that students and teachers, both stakeholders of the assessment process, "are not likely to view a test as positive, pleasant, or affirming". Despite these negative perspectives, different scholars have viewed standardized language assessments as a means through which distributive justice, as Messick (1989, as cited in Fulcher, 2010: 4) proposed, could be achieved. Fulcher (2010: 4) further acknowledged the importance of testing as a necessary and widely accepted basis on which high-stakes decisions can be made, not only for those in charge of designing the instruments but also for test users, who need to see that the test is designed, administered, scored, and reported fairly and equitably.

¹ The term stakeholder is understood as "[any] person whose interests should be taken into consideration" (Coombe, 2018: 38). Furthermore, the four types of stakeholders described by Hooge and Helderman (2008, as mentioned in Hooge, Burns & Wilkoszewski, 2012: 13) are also used in this paper, to wit: primary, internal, vertical, and horizontal stakeholders.

The concept of language competence assessment has been continuously redefined over time, according to the needs of users and the evolution of language teaching and learning theories. Authors such as Oller (1979, as cited in Brown & Abeywickrama, 2019: 13) stated that during the 1970s and 1980s, language competence was viewed as "a unified set of interacting activities that could not be tested separately". Cloze and dictation exercises, in which several skills were assessed simultaneously, embodied this concept of language competence. However, during the mid-1980s, Canale and Swain (1980, as cited in Brown & Abeywickrama, 2019: 14) recommended a shift from this structure-centered approach to assessment towards a more communicative one, involving the kinds of real-life tasks that language learners may eventually face. Accordingly, Savignon (1985: 131) agreed that "communicative competence certainly requires more than knowledge of surface features of sentence-level grammar". What is more, when it comes to authenticity in assessment, Bachman (1990) and Weir (1990: 86, as cited in Brown & Abeywickrama, 2019: 16) highlighted the importance of asking questions such as "where, when, how, with whom, and why language is to be used, and on what topics, and with what effect" in order to measure language competence. Supporting this view, Jamieson, Eignor, Grabe and Kunnan (2008: 57) asserted that communicative competence "accounts for language performance across a wide range of contexts, includes complex abilities responsible for a particular range of goals and takes into account relevant contexts". More recently, Bachman and Palmer (2010, as cited in Brown & Abeywickrama, 2019: 15) included "the need for a correspondence between language test performance and language use" among the fundamental principles of language testing. This more realistic communicative view of language assessment permeates some of the most renowned language tests currently on the market.

Today, communicative language competence is assessed more holistically. Since proficiency in a given language goes beyond knowing its grammar, other equally (if not more) important features should also be accounted for when testing language proficiency. In fact, as Badger and Yan (2012: 7) stated, "the main feature of the pedagogic orientation of a CLT [communicative language teaching] course is students' ability to use the second language (L2), rather than knowledge about language, with a balance between the four skills". Along these same lines, the Common European Framework of Reference for Languages (CEFR) (Council of Europe, 2002) provides a framework that lists the necessary communicative language activities and strategies, as well as the communicative language competences (linguistic, sociolinguistic, and pragmatic), that should be considered when designing language assessment instruments. Likewise, the American Council on the Teaching of Foreign Languages (ACTFL) shares the CEFR's emphasis on communication, expanding it into an intercultural communication approach. More recently, ACTFL coined the term intercultural communicative competence, defined as "using language skills, and cultural knowledge and understanding, in authentic contexts to effectively interact with people. It is not simply knowing about the language and about the products and practices of a culture" (Van Houten & Shelton, 2018: 35). Hence, it is evident that the concept of mastering a second or foreign language keeps changing as new theories continue to evolve.

One may think that the analysis and construction of standardized tests has reached a stagnation point; however, the validation of language assessments is an ongoing process (Chapelle, 2012; Brown & Abeywickrama, 2019).
A key step in the test validation process entails defining validity: "an overall evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment" (Messick, 1989, as cited in Messick, 1995: 741). Therefore, this concept "is not a property of the test or assessment as such, but rather of the meaning of the test scores" (Messick, 1996: 245). Recently, this view has been supported and expanded in the Standards for Educational and Psychological Testing (American Educational Research Association [AERA], American Psychological Association [APA], National Council on Measurement in Education [NCME], & Joint Committee on Standards for Educational and Psychological Testing [US], 2014: 11), which further highlights the importance of "accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations". The second step entails collecting evidence to build a validity argument, which analyzes that evidence "to make a case justifying the score-based inferences and the intended uses of the test" (Carr, 2015: 331). To operationalize this validity argument, the Standards for Educational and Psychological Testing outline five sources of validity evidence: evidence based on response processes, evidence based on internal structure, evidence of relations to other variables, evidence for validity and consequences of testing, and content-based evidence (AERA et al., 2014).

In light of the vast scope of validation processes, the numerous types of evidence available, and the multiple research approaches that can be adopted, this paper focuses primarily on gathering content-based evidence, aiming to meet some of the validity standards stated above. First, a test can claim content validity "if [it] actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behavior measured" (Brown & Abeywickrama, 2019: 32). To add further detail to the elements to be analyzed to demonstrate content validity, the Standards for Educational and Psychological Testing (AERA et al., 2014: 14) state that "test content refers to the themes, wording, and format of the items, tasks, or questions on a test". Content validity is also linked to being ecologically sensitive: "serving the local needs of teachers and learners. What this means in practice is that the outcomes of testing — whether these are traditional 'scores' or more complex profiles of performance — are interpreted in relation to a specific learning environment" (Fulcher, 2010: 2). More recently, other authors have also studied this concept; some, such as Coombe (2018: 28), have even renamed it localization, defining it as follows:
"A test that is designed to cater to the local needs of the test population. This may mean choosing appropriate cultural topics and making sure the processes of test design, piloting, administration and scoring reflect local needs and expectations. In more recent localization movements, this has also involved localization of language use in context to include the spread and changing shape of English in countries that use English as an official language."

Based on the concepts and context provided above, it is possible to affirm that the UCR's language proficiency test supports this first claim: it serves the local needs of teachers and learners in Costa Rican high schools, since test users can interpret students' scores and performance profiles in relation to this specific learning environment.

Last, the final building block of a solid validity argument for a standardized test consists of aligning the test with language proficiency descriptors, such as those provided by the CEFR, and with the characteristics of test takers and administrators (O'Sullivan, 2016: 148). Similarly, the Association of Language Testers in Europe (2020: 26) further emphasizes that linking the test to a theoretical construct is a minimum standard in test construction. Hence, a second claim to be made about the UCR's language proficiency test is that it is properly aligned with the CEFR language proficiency descriptors.

3. Description of the test

3.1. Validity argument

For the purpose of the validity argument in this paper, we provide the following descriptions and analyses, as suggested in the Standards for Educational and Psychological Testing (section 4.1).

3.1.1. Purpose of the test

The purpose of the PDL-MEP test is to assess Costa Rican high school students regarding their understanding and production of non-technical English related to both regional and global contexts pertaining to formal and informal socio-interpersonal, transactional, and academic domains, using the CEFR descriptors as reference. The test is merely diagnostic: all senior high school students in Costa Rica who take it will learn their proficiency level in reading and listening comprehension skills according to the CEFR. It is not a language certification test; hence, it should not be used for college admissions, visa applications, or job applications.

3.1.2. Score interpretation

As indicated by the MEP's authorities, the purpose of this new test is to determine students' language proficiency as a means to diagnose the efficacy of the language programs recently adopted in Costa Rica, as well as students' stage of language development. Hence, testees should interpret the results as a reflection of progress in their foreign language education, one that evidences the areas where they perform strongly and those needing improvement. Although students are not required to obtain a particular score to graduate from high school, they must take the test as a graduation requirement.

Based on the scores obtained nationwide, the MEP might make the necessary adjustments to better achieve its goal: getting students to perform at the B1 level by the end of their high school years.
To illustrate such adjustments, if the test results show a clear lack of command of B2-level tasks, MEP teachers will be able to use this information to address the identified deficiencies by reinforcing in class the tasks where performance was unsatisfactory. Internal stakeholders, as outlined by Hooge, Burns and Wilkoszewski (2012: 13), might use the test results to determine, for example, where to recruit new bilingual personnel or whether to invest in additional language programs for underprivileged populations.

3.2. The constructs of the test

3.2.1. Reading comprehension

Reading comprehension proficiency is defined as demonstrating an understanding of non-technical texts in English related to both regional and global contexts that pertain to formal and informal socio-interpersonal, transactional, and academic domains, taking the CEFR descriptors as reference. The contents to be included are determined following the MEP guidelines. Furthermore, the skills assessed range from recognizing "familiar words accompanied by pictures, such as a fast-food restaurant menu illustrated with photos or a picture book using familiar vocabulary" to understanding "in detail lengthy, complex texts, whether or not they relate to [examinees'] own area of speciality" (North, Piccardo & Goodier, 2018: 60). Finally, some of the strategies to be demonstrated by testees are included in the CEFR descriptors, such as skimming, scanning, understanding a writer's tone and humor, and identifying attitudes and implied opinions (CEFR, 2018).

3.2.2. Listening comprehension

Listening comprehension proficiency is defined as demonstrating an understanding of non-technical English aural texts related to both regional and global contexts that pertain to formal and informal socio-interpersonal, transactional, and academic domains, using the CEFR descriptors as reference. The contents to be included are determined following the MEP guidelines. Some of the skills to be assessed range from recognizing "numbers, prices, dates, and days of the week, provided they are delivered slowly and clearly in a defined, familiar, everyday context" to following "extended speech even when it is not clearly structured and when relationships are only implied and not signaled explicitly" (CEFR, 2018: 55). Last, some of the strategies to be demonstrated by testees are encompassed in the CEFR descriptors, such as understanding the main ideas and specific details, making inferences, and discerning speakers' attitudes (CEFR, 2018).

3.3. Claims

3.3.1. Claim 1 (MEP)

The UCR language test gives the MEP valid and reliable information about the English performance of students regarding nationwide language standards and CEFR proficiency bands, including communicative activities, strategies, and language competences. Based on this information, the MEP can report students' language performance by classroom, school, district, and region. With this in mind, the Ministry will be able to design strategies to focus on the areas most in need of support regarding language proficiency.
3.3.2. Claim 2 (teachers)

The UCR language test gives teachers valid and reliable information about the English performance of students regarding nationwide standards and CEFR proficiency bands, including communicative activities, strategies, and language competences. Based on this information, teachers can adjust classroom activities (both formative and summative) to meet the standards established by the MEP.

3.3.3. Claim 3 (parents and students)

The UCR language test gives parents and students valid and reliable information about the English performance of students regarding nationwide language standards and CEFR proficiency bands, including communicative activities, strategies, and language competences. Based on this information, these stakeholders can determine students' progress across the entire education system.

3.4. Rationale

Given the scenario described above, the Ministry of Education and the University of Costa Rica agreed to build, administer, and deliver the results of an English language proficiency test that would represent a customized and convenient option for both institutions. Accordingly, the MEP would have an instrument that meets its specific needs, and the UCR would have another opportunity to give back to society the knowledge it has acquired over the years through research and its socially oriented education programs. Moreover, as part of a well-documented and reliable process, "language testers shall endeavor to communicate the information they produce to all relevant stakeholders in as meaningful a way as possible" (International Language Testing Association, 2018: 2). This transparency is particularly important in documenting such a pioneering initiative, whereby a Ministry of Education in a Latin American country enforces a national policy of bilingualism in conjunction with a public higher education institution through a large-scale language test.

This article aims to gather evidence to build a PDL-MEP content validity argument through an analysis of the theoretical foundations supporting the construction and administration of a customized standardized language test and its localized context, including a description of the test and the input of the internal stakeholders. Additionally, this article provides suggestions and recommendations for future testing experiences of this type while setting the grounds for future researchers who intend to follow this academic approach.

The literature review presented above supports both the claims established in the validity argument proposed here and the claims that define the construct of the UCR Language Proficiency Test (content domain).

3.5. Localized test context
3.5.1. MEP diagnostic test for high school students

To meet the localization principles mentioned above, the national English language proficiency test will assess the reading and listening comprehension skills of high school students (as per the MEP's request) based on the themes, domains, and scenarios set by the Ministry of Education in its Programas de Estudio de Inglés (English Language Programs), which have been aligned with the CEFR guidelines (Ministerio de Educación Pública, 2016). The topics addressed in this document include, but are not limited to, conflict resolution, democracy and democratic principles, economic development, environmental sustainability, blurring of national borders, and human rights defense and protection (Ministerio de Educación Pública, 2016: 13). Three axes encompass all these topics: global citizenship with local belonging, education for sustainable development, and new digital citizenship (Ministerio de Educación Pública, 2016: 13, 55). The contexts or domains where the target language is to be used, as selected for this test, include the socio-interpersonal, transactional, and academic domains (Ministerio de Educación Pública, 2016: 38). Among the multiple scenarios provided by the MEP for all secondary education levels, the test may include, for example, "Enjoying Life" (7th grade), "Going Shopping" (8th grade), "Lights, Camera, Action" (9th grade), "Stories Come in All Shapes and Sizes" (10th grade), and "The Earth – Our Gift and Our Responsibility" (11th grade). Therefore, a third claim is that the UCR language proficiency test items address the topics and themes identified as important by the Ministry of Education, including, but not limited to, conflict resolution, democracy and democratic principles, economic development and environmental sustainability, blurring of national borders, and defense and protection of human rights. A fourth claim is that the test items address the three axes identified by the Ministry of Education: global citizenship with local belonging, education for sustainable development, and new digital citizenship. Finally, a fifth claim is that the test items reflect the three domains in which students must demonstrate English language proficiency: the socio-interpersonal, transactional, and academic domains.

The contextual needs addressed above will be further operationalized through items and instructions with simple and clear wording that not only meet the expected language proficiency levels of testees but also comply with the design requirements of trained language specialists. Brown and Abeywickrama (2019: 74) warned against wordiness, redundancy, and unnecessarily complex lexical items that might confuse the testee. Consequently, to ensure that test takers can understand what is expected of them, the language complexity of the instrument should correspond to that described in the respective CEFR band. The Text Inspector tool (Cambridge University Press, 2015) will be used to guarantee this match.
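A lexical check of this kind can be sketched as follows. This is a simplified, hypothetical illustration, not Text Inspector's actual interface: the wordlist and function name are invented, and a real check would draw on a full CEFR-graded vocabulary resource such as the English Vocabulary Profile.

```python
# Hypothetical sketch of a CEFR lexical-level check (not Text Inspector's API).
import re
from typing import List

CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Toy wordlist for illustration only; a real resource covers thousands of entries.
CEFR_WORDLIST = {
    "menu": "A1", "price": "A1", "shop": "A1",
    "environment": "B1", "sustainability": "C1",
}

def words_above_band(item_text: str, target_band: str) -> List[str]:
    """Return the words in an item whose CEFR level exceeds the target band."""
    limit = CEFR_ORDER.index(target_band)
    tokens = re.findall(r"[a-z]+", item_text.lower())
    # Off-list words default to A1 here; a real check would flag them for review.
    return sorted({t for t in tokens
                   if CEFR_ORDER.index(CEFR_WORDLIST.get(t, "A1")) > limit})

# An item aimed at B1 readers should not rely on C1 vocabulary:
print(words_above_band("Choose the best menu price for sustainability.", "B1"))
# -> ['sustainability']
```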
Finally, as mandated by the Standards for Educational and Psychological Testing, items will be designed and proofread by trained specialists, some of whom are native English speakers and whose teaching expertise is also valuable (AERA et al., 2014: 75).

The format of the items, tasks, and questions will follow the guidelines in the Programas de Estudio de Inglés (Ministerio de Educación Pública, 2016). For example, the MEP (2016: 44) suggests the following items to assess reading comprehension proficiency in the classroom: "reading aloud, multiple choice, and picture-cued items. Selective reading performances are gap filling, matching tasks, and editing". The test will prioritize those tasks that lend themselves to large-scale, standardized, computer-based scenarios. The MEP also provides a list of possible tasks to assess the listening skill, such as summarizing, note taking, and identifying specific information (Ministerio de Educación Pública, 2016: 42). These tasks should all mimic real-life scenarios similar to those that students encounter in the classroom over their learning process.

3.5.2. Stakeholders' expectations

To further contextualize the test and accurately place it within the given ecosystem, stakeholders' views should also be carefully considered and analyzed. This analysis includes the information provided by two of the most relevant decision-makers: the national coordinator of the Alliance for Bilingualism at the MEP and the director of the School of Modern Languages at the UCR. The former acknowledged that the transition from a traditional reading comprehension test to a skills-based one took the MEP approximately 11 years, led by the Dirección de Gestión y Evaluación de la Calidad [Quality Management and Evaluation Department]. The first administration of the test aims to diagnose the English proficiency level of high school students; comparing these results against those to be obtained in the future will demonstrate the impact of the recently introduced Programas de Estudio de Inglés. Said impact could then be further analyzed through predictive validity evidence studies. In the words of the national coordinator of the Alliance for Bilingualism at the MEP (personal communication, November 5, 2020), this first test administration will also determine whether or not the MEP has the adequate physical and digital infrastructure to administer the test to more than 60 000 students. The coordinator also stated that this test will help other interested organizations evaluate their capacity to adapt to the potential scenarios that may arise when monitoring their tests at the MEP. For example, some students might need to take the test at facilities other than their school due to poor or no Internet connectivity. Considering the multiple circumstances that regions or institutions may face (e.g., an insufficient number of working computers, lack of personnel to supervise the test administration, or the variety of school types), several administration schedules should be arranged, for example, three or four different agreed-upon times according to the particular requirements of each institution.
Additionally, this stakeholder emphasized some of the requirements to be met by any candidate testing organization when working with the MEP: availability of immediate technical support, verifiable language testing experience, standardized administration protocols, and provision and modification of physical resources according to the particular needs of testees.

The director of the School of Modern Languages at the UCR, the second stakeholder interviewed in this study, highlighted several points (personal communication, December 15, 2020). The School of Modern Languages presents itself as a valuable participant in the process because of its significant technical and human capacity and its previous experience in standardized language assessment. The director underlined that the School has the necessary capacity, in terms of technology and human resources, to administer such a high-stakes test successfully. However, he also recognized that further support and investment from UCR authorities would be advisable to improve computer laboratories, security protocols, and data collection and analysis instruments. He also acknowledged the added value that collaborating with other University departments could provide in the future. In terms of staff, the institution has properly trained personnel for the successful administration of the test in any of the formats requested by the MEP, although additional support for the continuous training of these professionals would be advisable.

Regarding the School's capacity, the director affirmed that the University can administer this test at large scale three times a year, specifically examining 5 000 MEP students per day, with results delivered within three weeks.
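If these figures hold, and given the cohort of more than 60 000 students mentioned earlier, each administration window would require roughly 60 000 / 5 000 = 12 testing days, in addition to the three weeks needed to deliver results.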
Moreover, the School's technological know-how and experience in testing facilitate any specific adaptations and modifications that the MEP may require. The director also highlighted the importance of familiarizing the target population with the test by providing a customized digital mock test to bridge the gap between students' experience with classroom testing and computerized testing.

The director expressed his confidence in the reliability of the online listening and reading tests based on the evidence gathered, but he also considers that using printed instruments could be feasible, and equally reliable, depending on the needs of the target population. Both formats can be equally reliable in terms of data collection, provided they fulfill the safety and procedural protocols designed by the UCR.

Production skills may be evaluated in the future, although further training and studies are necessary to ensure the reliability and validity of the inferences based on student performance. As a novel feature, he elaborated on the possibility of using artificial intelligence in the future as a tool for developing and scoring UCR language tests.

This stakeholder believes the test should accurately measure testees' language level by evaluating their understanding of everyday and academic English, which is the core of the construct approved by the MEP for this test. This goal will be achieved using an instrument that includes 40 to 60 items per skill and takes approximately 60 to 70 minutes to complete per macro skill. The items may include multiple-choice, sequencing, matching, drag-and-drop, and short-answer formats; the specific items selected will depend on how they perform in pilot testing. Finally, the director added that the reading and listening source texts will be authentic, ecologically sensitive material matching the CEFR levels the test intends to assess. The operationalization of this test and its construct have been approved by the MEP.
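For illustration, the design parameters reported above can be gathered into a draft blueprint structure. The sketch below is hypothetical (the class and field names are invented, not the official PDL specification) and simply encodes the figures the director provided.

```python
# Illustrative draft blueprint encoding the parameters reported by the director;
# names and structure are hypothetical, not the official PDL specification.
from dataclasses import dataclass, field
from typing import List, Tuple

ITEM_TYPES = ["multiple-choice", "sequencing", "matching",
              "drag-and-drop", "short-answer"]

@dataclass
class SkillSection:
    skill: str                           # "reading" or "listening"
    items: Tuple[int, int] = (40, 60)    # 40-60 items per skill
    minutes: Tuple[int, int] = (60, 70)  # ~60-70 minutes per macro skill
    item_types: List[str] = field(default_factory=lambda: list(ITEM_TYPES))

# The current construct covers the two receptive skills only.
blueprint = [SkillSection("reading"), SkillSection("listening")]

for section in blueprint:
    lo, hi = section.items
    print(f"{section.skill}: {lo}-{hi} items, "
          f"{section.minutes[0]}-{section.minutes[1]} min")
```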
4. Next steps

Given the large-scale nature of this assessment and its implications, and as a pioneering enterprise, the institutions involved should work together in organizing the test logistics. The expertise of the parties (in this case, the MEP and the UCR) is key to successfully attaining the goals set by localizing the test. This would require the University of Costa Rica to conduct the following:

1) Arrange meetings with MEP representatives and decision-makers to agree on the operationalization of the test features.
2) Carry out needs analyses:
   a. Survey teachers of English in Costa Rican high schools regarding, among other topics, their technological literacy, type of instruction, expectations of the test, and attitude towards standardized testing.
   b. Gather information on students' views on and experience with standardized and computer-based testing, their familiarity with online testing and item types, and their preferred topics for English language proficiency assessment, among others.
   c. Interview regional and national English advisors to collect information regarding the availability of the human resources and infrastructure required for reliable test monitoring.
   d. Ensure that all students are treated fairly throughout the assessment process, having an unobstructed opportunity to demonstrate their level of English language proficiency.
   e. Conduct additional analyses of the MEP's English curricula to determine additional topics and domains that could be tested.
3) Organize and hold massive training programs for all stakeholders in this ecosystem.
4) Design the test around the language proficiency concept and its two main pillars according to the CEFR: communicative language activities and strategies, and communicative language competences.
5) Create a draft test blueprint to share with stakeholders to obtain their feedback before starting the item development phase.
6) Share the draft blueprint with key stakeholders and have them complete a survey to gather their opinion about its adequacy.
7) Pilot-test items with the real population and conduct statistical analyses to assess their usefulness and reliability before the official test administration. This step would ensure these items are fair to various subgroups (e.g., male/female, urban/suburban/rural, different racial/ethnic groups, low/high socioeconomic status) by conducting differential item functioning (DIF) analyses, as sketched below.
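One common technique for such DIF analyses is the Mantel-Haenszel procedure, which compares the odds of answering an item correctly across two groups after matching examinees on total score. The following sketch is a minimal illustration with fabricated toy data; it is not the project's actual analysis code, and the function name and data layout are invented for the example.

```python
# Minimal sketch of a Mantel-Haenszel DIF check; illustrative only, with
# made-up data, not the PDL's actual analysis pipeline.
from collections import defaultdict

def mh_odds_ratio(responses):
    """responses: list of (total_score, group, correct) tuples, where group is
    'ref' or 'focal' and correct is 0/1. Examinees are stratified by total
    score, the matching criterion, before the odds are compared."""
    strata = defaultdict(lambda: {("ref", 1): 0, ("ref", 0): 0,
                                  ("focal", 1): 0, ("focal", 0): 0})
    for score, group, correct in responses:
        strata[score][(group, correct)] += 1

    num = den = 0.0
    for cells in strata.values():
        t = sum(cells.values())
        if t == 0:
            continue
        num += cells[("ref", 1)] * cells[("focal", 0)] / t
        den += cells[("ref", 0)] * cells[("focal", 1)] / t
    # Ratios far from 1 flag potential DIF (ETS delta = -2.35 * ln(ratio)).
    return num / den if den else float("nan")

# Toy data: at equal ability (same total score), both groups behave alike,
# so the MH odds ratio should be near 1 and no DIF is flagged.
data = [(s, g, c) for s in (10, 20, 30)
        for g in ("ref", "focal")
        for c in (0, 1)
        for _ in range(25)]
print(round(mh_odds_ratio(data), 2))  # -> 1.0
```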
As Brown and Abeywickrama (2019), Fulcher (2010), and Coombe (2018) argued, building a localized validity argument for a national standardized test from scratch requires multiple steps and studies involving massive amounts of fieldwork to meet the particular needs and characteristics of the context and population assessed. By localizing the English proficiency test to meet the specific needs of Costa Rican high school students, the latter should come to consider the test fair after realizing it was not developed lightly but resulted from careful consideration and design, as Fulcher (2010: 4) recommended. In turn, this may help neutralize the generalized negative perceptions of standardized testing. Since this is a customized test, it will need to address the country's needs, lacks, and wants in foreign language standardized testing by basing its tests on the MEP's unit contents, theoretical constructs, and item familiarity.

The locality-sensitive assessment will be produced in parallel with the new national policy of bilingualism (Ministerio de Educación Pública, 2016), under which, in agreement with Badger and Yan (2012) as well as Brown and Abeywickrama (2019), students should learn to use the language. This rationale underlies the UCR's choice of the skills-based assessment provided by the CEFR, which emphasizes and evaluates testees' competences. The customized nature of the test would not only reduce the anxiety and fear of those involved (managers and students) but would also help obtain more precise evidence of testees' performance in the receptive language skills. This is indeed the current communicative concept of language assessment, as advocated by Canale and Swain (1980), Brown and Abeywickrama (2019), and Jamieson et al. (2008).

The results from this test will be diagnostic (see Brown & Abeywickrama, 2019: 10), which would, in turn, provide authorities with a clearer perspective of the system's strengths and weaknesses insofar as such interpretation is aligned with the theoretical construct of the test (Carr, 2015; Fulcher, 2010).

Since validation is a never-ending process (Chapelle, 2008; Brown & Abeywickrama, 2019), this pioneering nationwide standardized testing exercise is an ongoing project that has only just taken its first step in standardized language testing in Costa Rica and Latin America.

5. Recommendations

The following recommendations are addressed to researchers developing localized, standardized language tests.

1) Researchers should review international guidelines on developing standardized tests, such as those issued by institutions like ILTA, ALTE, and the APA. Guides such as the Standards for Educational and Psychological Testing are user-friendly starting points for researchers in the field.
2) Localizing a standardized language test requires more than designing an assessment instrument for a specific population. As outlined above, this continuous process should be carried out hand in hand with stakeholders from the beginning, especially students. Given the short- and long-term impact of these tests, and since multiple actors will be involved in the process, researchers are advised to consider the opinion of all stakeholders before making any decisions.
3) Researchers should treat the input of some stakeholders with caution. For example, some may over- or underrepresent the needs, lacks, or preferences of their particular context. Consequently, it is imperative to corroborate the information with real-time observations and multiple sources to confirm the test requirements.
4) Investigators are advised to seek the assistance of language-testing specialists during the development of their own standardized tests. These specialists can help investigators solve issues they may have already dealt with previously. There is nothing wrong with asking for help when it comes to such high-stakes tests.
5) Institutions aiming to develop standardized language tests may consider certifying their language professionals in the different areas they intend to test. For instance, ACTFL offers international certification for professionals who want to become official certified testers of English (for oral and written production). Having certified testers on the team constructing the test would provide valuable support to the process of developing, pilot-testing, and assessing the performance of items designed to measure those skills.
6) If an institution is planning to develop a standardized language test, it should consider the available human resources. Since this is a continuous process, it is convenient to have separate team members in charge of the different tasks related to the test, so as not to burden them with excessive workloads. For example, one group of language specialists could be dedicated to item writing; another, to item analysis; and yet another, to collecting evidence for the multiple claims. Assigning all of these tasks to the same team may lead to burnout among its members.

6. References

American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME), & Joint Committee on Standards for Educational and Psychological Testing (US) (2014). Standards for educational and psychological testing. Washington: American Psychological Association.

Association of Language Testers in Europe (2020). ALTE principles of good practice. https://pt.alte.org/resources/Documents/ALTE%20Principles%20of%20Good%20Practice%20Online%20version%20Proof%204.pdf

Azofeifa, Mauricio (2019, February 11). Más de 5.300 estudian inglés gracias a Alianza para el Bilingüismo (ABi). Ministerio de Educación Pública, Gobierno de Costa Rica. https://www.mep.go.cr/noticias/mas-5300-estudian-ingles-gracias-alianza-bilingueismo-abi

Bachman, Lyle (1990). Fundamental considerations in language testing. New York: Oxford University Press.
Badger, Richard, & Yan, Xiaobiao (2012). To what extent is communicative language teaching a feature of IELTS classes in China? In Jenny Osborne & IDP: IELTS Australia (Eds.), IELTS research reports 2012 (Vol. 13, pp. 1–44). Australia, United Kingdom: IDP: IELTS Australia Pty Limited, British Council.

Brown, H. Douglas, & Abeywickrama, Priyanvada (2019). Language assessment: Principles and classroom practices (3rd ed.). Hoboken: Pearson Education.

Cambridge University Press (2015). Text Inspector. https://languageresearch.cambridge.org/wordlists/text-inspector

Carr, Nathan T. (2015). Designing and analyzing language tests. Oxford: Oxford University Press.

Chapelle, Carol A. (2008). The TOEFL validity argument. In Carol A. Chapelle, Mary K. Enright & Joan M. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 319–352). New York: Routledge.

Chapelle, Carol A. (2012). Validity argument for language assessment: The framework is simple… Language Testing, 29(1), 19–27.

Coombe, Christine (2018). An A to Z of second language assessment: How language teachers understand assessment concepts. London: British Council.

Cordero, Monserrat (2019, November 29). MEP: colegios públicos tienen nivel básico en dominio del inglés. Semanario Universidad. https://semanariouniversidad.com/ultima-hora/mep-colegios-publicos-tienen-nivel-basico-en-dominio-del-ingles/

Council of Europe (2002). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Press Syndicate of the University of Cambridge.

Fulcher, Glenn (2010). Practical language testing. London: Hodder Education.

Hooge, Edith; Burns, Tracy, & Wilkoszewski, Harald (2012). Looking beyond the numbers: Stakeholders and multiple school accountability (OECD Education Working Papers No. 85). Paris: OECD. http://dx.doi.org/10.1787/5k91dl7ct6q6-en

International Language Testing Association (2018). ILTA code of ethics. https://www.iltaonline.com/page/CodeofEthics

Jamieson, Joan M.; Eignor, Daniel; Grabe, William, & Kunnan, Antony John (2008). Frameworks for a new TOEFL. In Carol A. Chapelle, Mary K. Enright & Joan M. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 55–95). New York: Routledge.

Messick, Samuel (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741

Messick, Samuel (1996). Validity and washback in language testing. Language Testing, 13(3), 241–256. doi:10.1177/026553229601300302

Ministerio de Educación Pública (2016). Programas de estudio de inglés: tercer ciclo y educación diversificada. Costa Rica: Imprenta Nacional.

North, Brian; Piccardo, Enrica, & Goodier, Tim (2018). Common European Framework of Reference for Languages: Learning, teaching, assessment. Companion volume with new descriptors. Strasbourg: Council of Europe.

O'Sullivan, Barry (2016). Adapting tests to the local context. In British Council new directions in language assessment: JASELE journal special edition (pp. 145–158). Tokyo: Japan Society of English Language Education, British Council.
Savignon, Sandra J. (1985). Evaluation of communicative competence: The ACTFL provisional proficiency guidelines. The Modern Language Journal, 69(2), 129–134. doi:10.1111/j.1540-4781.1985.tb01928.x

Van Houten, Jacque, & Shelton, Kathleen (2018, January). Leading with culture. The Language Educator. https://www.actfl.org/sites/default/files/tle/TLE_JanFeb18_Article.pdf