Competency-Based Tests as a Tool for Teacher Evaluation in Higher Education Institutions

Purpose: This study aims to verify the quality of the teacher competency inventory as a measure of the pedagogical competencies of teachers in higher education institutions.
Design/methodology: The development and testing of the teacher competency inventory covered three phases: (1) item generation, (2) dimension identification and data reduction (factor analysis), and (3) scale evaluation. The current study tests its construct validity using an independent samples t-test and analysis of variance (Anova) in two different environments.
Findings: Results indicate that the inventory measures unitary constructs and provide evidence of its predictive capacity. The developed instrument measures levels of future-oriented pedagogical competencies with comparable reliability and validity.
Practical implications: The teacher competency inventory may be applied to teachers' recruitment, selection, development, and performance evaluation with the goal of identifying current and required levels of competencies. The inventory allows one to compare teacher competencies with the expected competency profile. Possible differences may indicate deficits or overruns in relation to the expected profile.
Originality: The teacher competency inventory is a meaningful tool to improve human resource practices (including recruitment, selection, and teacher performance monitoring) and to indicate the current and required level of competencies. The identified competency gap can help plan teacher development to increase job performance.


Introduction
Higher education institutions' (HEIs) policymakers need to continuously analyze the challenges universities face and seek solutions that increase organizational performance. They need to recognize problems in the quality of teaching and learning linked to less effective teachers, which contribute to low-performing educational institutions. Properly implemented and executed human resource (HR) practices are vital factors in the success of HEIs (Cohen, 2015).
One of the most recent challenges is to recruit and retain quality teachers (Adnot et al., 2017). Thus, HR practices need to focus on resourcing and continuous development to stimulate teacher abilities (Keller-Schneider et al., 2020), motivations, and opportunities to grow, while assisting them in reaching higher performance (Pak, 2019). Therefore, we require appropriate measurement and evaluation methods to accurately identify low-performing teachers and replace them with more effective ones (Adnot et al., 2017). Measurements should focus on individual ability in terms of competencies (Ulrich & Dulebohn, 2015). It becomes a key issue to base decisions on sufficient and measurable data (Firestone & Donaldson, 2019).
The role of competency orientation for teaching in educational institutions has recently increased. A high level of teacher competencies informs the quality of teaching, which is regarded as the determinant of any educational institution's quality (Yeşilçınar & Çakır, 2020). The concept of competency is a major focus of each HR practice, not just recruitment and selection but also performance evaluation, development, and others. These practices measure critical characteristics of the employee or candidate, which then allows forecasting their workplace performance. Such an approach requires the application of reliable assessment methods to identify competencies meaningful for the institution (Baartman, 2013).
To predict teacher job performance, most HEIs still use commercial solutions that lack evidence-based support. They recruit more teachers instead of finding the person who best fits the needs of students and institutions. This may result not only in higher recruitment process costs or increased teacher turnover but also in low-performing teachers (Adnot et al., 2017). Consequently, poor HR practices in teacher assessment and evaluation can result in a decrease in the quality of education. The evidence-based approach promotes accountability and should be required to measure teacher quality (Darling-Hammond, 2014) and performance. Therefore, the critical examination of assessment tools and their use should be the focus of HEI policymakers (Firestone & Donaldson, 2019) and HR administrators.
The study addresses two research questions.
Q1. Do the four dimensions of the teacher competency inventory (TCI) establish a valid assessment tool to measure a teacher's pedagogical competencies?
Q2. Does the TCI fit different groups (e.g., by gender or education), thus explaining its internal quality?
There is a shift in evaluating the quality of assessments from external toward internal evaluation, along with responsibility placed on organizations (McNamara & O'Hara, 2008; Vanhoof et al., 2010). Therefore, this article aims to investigate the competency inventory of abilities and its application to HR practice as internal organizational practice.
In the first part, I will critically review the literature to define the concept of competence according to various approaches and discuss the importance of competence tests, including self-assessment. Next, I will describe the research methodology, i.e., the research group, the construction of items, and the dimensions. In the next section, I will present in detail the method of validating and testing the quality of the TCI, including the research hypotheses, and then I will describe the

Literature Review
Currently, competencies are the most important unit of educational effectiveness (Ridei et al., 2021). The competency-based approach in HEIs is becoming a priority; it aims to develop all aspects of teachers' competencies (Ridei et al., 2021). According to Armstrong (2009), competency-based human resources management (HRM) involves the use of the concept of competency and its analysis to improve HR practices, including employee recruitment, selection, development, performance evaluation, and remuneration.
Initially, competency was defined as a formal right to deal with specific matters and make decisions in a set range on behalf of an organization. However, the ability to act effectively and efficiently was generally associated with qualifications. In the 1980s, competencies began to be understood more broadly as the employee's scope of rights, duties, and responsibilities in the assigned job. Boyatzis (1982) describes competency as the potential existing in a person, which leads to behavior satisfying work requirements in the parameters of the organization's environment, which in turn provides desired results. Boyatzis establishes that there is a range of factors that differentiates a successful performance from a less successful one. These factors include personal qualities, motivations, experience, and behavioral characteristics (Armstrong, 2009). Most authors struggle with the need to determine the competency components (Yeşilçınar & Çakır, 2020). Knowledge and skills are the essential components of competency. Relatively often, competency components include attitudes and behaviors (Muñiz-Rodríguez et al., 2017). Therefore, competency is the ability to use and apply knowledge in professional situations, as a set of capabilities that enable effective job performance, the achievement of goals, and compliance with standards of action. Moreover, the study by Keller-Schneider et al. (2020) revealed that teachers take competency as a resource and can differentiate their sense of competency across different domains. Hager et al. (1994) recommend an integrated approach, in which competency is not observed directly but is inferred from performance. This approach places the assessment of competency in the same situation as other evaluation methods in the frame of available procedures with the goal of maximizing its validity and reliability (Hager et al., 1994). This requirement is crucial in any attempt to put employee competencies into practice. The competency test is a tool that allows measuring the level of all components of competencies, namely knowledge, skills, and behavior.
Competency tests belong to the group of standardized methods of measurement and competency assessment. They provide objective and reliable information about the level of competency. Competency tests are psychometric tools created to test specific competencies. As there are no universal competencies, there are also no comprehensive tests that measure competencies. The key indicators are test reliability, which describes the accuracy of the competency test, and validity, the parameter that determines what exactly is the subject of measurement. The validity of an assessment method refers to how well it measures what it is supposed to measure.
A specific form of evaluation is self-evaluation, in which the employee gathers information and evidence to prove the level of competencies. Existing research shows that self-evaluation stimulates reflection and leads to concrete points for improvement (McNamara & O'Hara, 2008; Baartman et al., 2013). Nevertheless, the quality of self-evaluation differs significantly (Schildkamp, 2007) depending on factors related to evaluators' attitudes toward self-evaluation, the provision of support, and the view of self-evaluation as a policy action that requires systematic data collection. Moreover, McNamara and O'Hara (2008) revealed that teachers collect evidence of their competencies that is far from reality. Teachers also find it difficult to conduct a systematic self-evaluation. To overcome these problems, the TCI can be used for self-evaluation, but it can also be completed by supervisors who collect and provide appropriate evidence to demonstrate assessment quality (Baartman et al., 2013).

Research Methodology
This study was conducted to verify the proposed hypotheses (section 3.2). The research was conducted among HEI teachers, students, and graduates in Poland and India. Both countries are developing quickly; their educational systems are continuously changed by national regulations; however, the educational context differs. These elements motivated me to study the two different contexts. The sample consisted of 219 respondents from India and 47 from Poland. The difference in sample size was due to the number of higher education institutions in both countries. The education market in India is much larger than in Poland, and so the sample size was representative of the number of institutions in both countries. In both cases, the scales were shown to be reliable, so the sample size was sufficient for validation.
This study followed the widely authenticated framework outlined by Hinkin (1998) and Churchill (1979) to develop a comprehensive psychometric survey instrument.
Accordingly, the empirical development and validation of the TCI involved the following steps: (1) item generation, (2) dimension identification and data reduction (factor analysis), and finally (3) scale evaluation. The measurement of validity and reliability was done following the guidelines provided by Hair et al. (2010).

Item Generation and Dimensions Identification
First, the study developed the initial pool of items for the questionnaire. Then, the data were analyzed to generate a list of items for each dimension (Turker, 2009). I specified the content of each dimension, identified items specific to each dimension, and then empirically determined how relevant these items were to each studied domain. Based on the domain conceptualization, I generated 71 items that described teacher competencies and related teacher behavior based on the extensive literature review (Ramsden et al., 2007; Xu, 2011; Dani & Mhunpiew, 2019) and in-depth interviews with five experts from higher education institutions. The in-depth interviews were grounded on questions regarding the basic research question of this study: "What are the future-oriented teacher competencies?" Following the guidelines that "the new items should be scaled using 5-point Likert scales" (Hinkin, 1998, p. 110), all items were anchored at five points (1 = strongly disagree to 5 = strongly agree), and participants expressed their level of agreement with each item.
The survey with 71 items was conducted among teachers, students, and graduates of higher education institutions in India and Poland.
Factor analysis was performed to investigate the underlying constructs of the 71 items in the TCI. As a result, five questions were removed from the survey because they did not make a significant contribution to any of the identified factors. Cronbach's alpha reliability was high for the identified dimensions, which implied that the inventory had adequate internal consistency. The resulting inventory was reliable and valid with 66 items in the four dimensions regarding future-oriented teaching competencies in HEIs (Appendix 1; Ludwikowska, 2019). A test of the inventory indicated a high Cronbach's alpha reliability for both groups, from India (from 0.715 to 0.970) and Poland (from 0.749 to 0.979; Ludwikowska, 2019).
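To make the reliability coefficient concrete, the following is a minimal sketch of how Cronbach's alpha can be computed for one dimension from raw Likert answers. It is not the procedure used in the original study; the function name `cronbach_alpha` and the sample answers are hypothetical and serve only as an illustration.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert answers: 5 respondents x 4 items
sample = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(sample)
print(f"alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items of a dimension vary together, which is the sense in which the alphas reported above point to internal consistency.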
The following dimensions of the teacher competency inventory (TCI) emerged in the study: stimulating students for their achievement (SA), using different teaching approaches to improve student learning (TA), preparing students to be globally competent (GC), creating a supportive learning environment (LE).

CEMJ 91
The results show that the inventory had adequate internal consistency for four dimensions as general teaching competencies. For selected dimensions, I performed normalization aimed at adjusting the values measured on different dimensions to a standard scale. For each dimension, a sten score was calculated (Table 1). Normalization ensured that all data look and read the same way across all records. The results of each scale are in the range of 1-5 (item averages were calculated). The TCI with the final 66 items and four dimensions explained 47.279% of the total variance and indicated a strong factor model (Ludwikowska, 2019).
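The scoring and normalization steps can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual procedure: it assumes the common sten convention (mean 5.5, standard deviation 2, values clipped to 1-10), treats the sample itself as the norm group, and uses hypothetical names (`scale_score`, `sten_scores`) and hypothetical data.

```python
from math import floor
from statistics import mean, pstdev

def scale_score(item_answers):
    """Scale score as the average of its 1-5 Likert item answers."""
    return mean(item_answers)

def sten_scores(raw_scores):
    """Map raw scale scores to stens (mean 5.5, SD 2, clipped to 1-10)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # spread of the norm group (population SD)
    if sd == 0:
        raise ValueError("identical scores: sten is undefined")
    stens = []
    for x in raw_scores:
        z = (x - m) / sd
        sten = floor(2 * z + 5.5 + 0.5)  # round half up to the nearest integer
        stens.append(min(10, max(1, sten)))
    return stens

# Hypothetical SA scale scores for five respondents
sa_scores = [3.0, 3.5, 4.0, 4.5, 5.0]
sa_stens = sten_scores(sa_scores)
print(sa_stens)
```

Because stens depend on the norm group, scores normalized within one sample are comparable across dimensions but not automatically across samples.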
The development of a valid and reliable instrument is a complex and systematic process. These results gave the foundation for investigating the internal quality of the test in terms of its validity. Therefore, the second phase of the study sought to evaluate the degree to which the TCI dimensions constitute an operational definition of the construct used to determine teachers' pedagogical competencies. This study tested the construct validity of the test by using the independent samples t-test and analysis of variance (Anova) in two different environments (India, Poland), and its application to HEI HR practices. The analysis was conducted to verify the proposed hypotheses.

Validity and Test Quality Evaluation
Further empirical investigation of TCI's psychometric properties is essential in developing the inventory and its constructs, bearing in mind that establishing the psychometric properties of behavioral measures is a complex and lengthy process (Cronbach & Meehl, 1955; Anastasi & Urbina, 1997). Hence, I performed validation to establish the test quality. To verify the research questions and the validity of TCI, the following hypothesis was formulated:
H1: There are differences among groups of respondents from India in perceiving the importance of TCI dimensions.
The following sub-hypotheses were specified:
H1a: There are differences among male and female respondents from India in perceiving the importance of TCI dimensions.
H1b: There are differences between respondents from India who studied/taught abroad and those who did not in perceiving the importance of TCI dimensions.
H1c: There are differences among the Indian respondents who had lectures at their home university with a teacher from abroad and those who did not in perceiving the importance of TCI dimensions.
H1d: There are differences between respondents from India with different current occupations (student, graduate, academic teacher, other) in perceiving the importance of TCI dimensions.
H1e: There are differences among the Indian respondents with different educational backgrounds (engineering, humanities, social sciences, art, management, other) in perceiving the importance of the dimensions of TCI.
H1f: There are differences between respondents from India with different educational levels (undergraduate, postgraduate, Ph.D.) in perceiving the importance of TCI dimensions.
Then I formulated the second hypothesis:
H2: There are differences among respondents of different nationalities (India, Poland) in perceiving the importance of TCI dimensions.
Validity refers to "the methodological and/or conceptual soundness of research" (Graziano & Raulin, 2000, p. 436).
The determination of a measure's validity focuses on the extent to which its content captures the construct it intends to measure, as well as the extent to which empirical evidence supports its theoretical structure: "validity is the most important consideration in test evaluation" (American Psychological Association, American Educational Research Association, & National Council on Measurement in Education, 1985, p. 9).
The current conceptualizations of validity emphasize the need to understand what is being assessed, how it can be identified, and what else may be contributing to the scores to verify the content and validity of a measure. Thus, validity must be a primary concern throughout the instrument development process.
There are four types of validation: predictive validity, concurrent validity, content validity, and construct validity (Cronbach & Meehl, 1955).The current study aims to investigate the construct validity of TCI to measure its quality.
The idea of construct validity was introduced by the American Psychological Association with the goal of validating theoretical attributes and qualities (Colliver et al., 2012). Construct validity is central to establishing test quality (Embretson & Gorin, 2001). Construct validation is defined as a process by which researchers provide evidence to establish appropriate inferences from observed scores to conceptualization for a particular construct (Forer & Zumbo, 2011). Construct validation is applied whenever a test seeks to measure some attribute or quality that is not "operationally defined" (Lee, 1955). In terms of TCI, the attribute measured refers to the four identified dimensions of teacher competencies. A "construct" is defined as a person's postulated attribute.
We expect that a person will at any time possess (or not) a qualitative attribute, namely competency. Construct validity attempts to validate the theory behind the instrument. The TCI constructs were theoretically defined and described based on the literature review (Ludwikowska, 2019).
The essence of the theoretical validity assessment process is the reference to the theory of the measured characteristic. Cronbach and Meehl (1955) distinguish five ways of estimating theoretical validity. This study focused on the analysis of intergroup differences (t-test, analysis of variance). If the test is an accurate measure of a particular construct, the results obtained by the two groups should vary significantly (Westen & Rosenthal, 2005).
The t-test is applied as a statistical method to assess whether the mean value of the data from an independent sample, which follows a normal distribution, is consistent with or significantly departs from the mean value of a null hypothesis. Furthermore, the t-test evaluates whether the difference between the means of two independent samples following a normal distribution is statistically significant (Rivas-Ruiz et al., 2013). However, there is a limited number of studies analyzing the validity of a test using the t-test. Analysis of variance (Anova) extends this comparison to more than two groups by testing whether the group means differ significantly.
Therefore, to evaluate the validity of the TCI, this study scrutinized the quality of the test in terms of its construct validity by using the independent samples t-test and analysis of variance (Anova) in two different environments with context variables (Cronbach & Meehl, 1955).
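As an illustration of the independent samples t-test described above, the sketch below computes Student's t statistic with a pooled variance and its degrees of freedom. It assumes equal variances in both groups; the function name `independent_t` and the two score lists are hypothetical and are not data from the study. The p-value is left out because the Python standard library has no t distribution; a statistics package would be needed for that step.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / std_error
    return t, df

# Hypothetical dimension scores (1-5 scale) for two groups of respondents
group_yes = [4.0, 4.5, 3.5, 4.0]
group_no = [4.5, 5.0, 4.5, 5.0]
t_stat, dof = independent_t(group_yes, group_no)
print(f"t({dof}) = {t_stat:.3f}")
```

The degrees of freedom equal the total number of respondents minus two, so the 219 respondents from India split into two groups give 219 - 2 = 217, the value that appears in the t(217) statistics reported for that sample.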

Results and Discussion: Differences Between Groups
To examine the proposed hypotheses and answer the research questions, differences between groups were compared with the t-test and the Anova one- and two-factor difference tests. For significant interactions, Duncan's multiple range test was applied. Descriptive statistics were calculated, including mean and standard deviation, degrees of freedom (df), and the level of statistical significance (p). Descriptive statistics in the t-tests between the following groups were included: men and women for the group from India; respondents answering "Yes" and "No" to the question "Have you ever studied/taught abroad?" for the group from India; respondents answering "Yes" and "No" to the question "Have you ever attended classes by a foreign teacher at your university?" for the group from India; respondents of different nationalities.
Descriptive statistics between the following groups were calculated in the single-factor Anova difference tests: respondents of different current positions for the group from India; respondents studying different faculties for the group from India; respondents with different levels of education for the study group from India; respondents of varying gender and nationalities.
Duncan's multiple range tests were calculated for significant interactions between the following groups: respondents of different genders and nationalities in the TA dimension; respondents of different genders and nationalities in the GC dimension.
To verify hypotheses H1 and H2, I analyzed whether there are statistically significant differences between the two groups of respondents from India.For this purpose, the t-test statistics for independent samples were calculated.
First, descriptive statistics between men and women for the group from India were analyzed (Table 2). The results obtained in the SA dimension were statistically significant (t(217) = -2.062; p = 0.040) and show that women have higher scores (4.23 ± 0.43) than men (4.19 ± 0.57; Table 2). Next, the t-test was applied to verify whether respondents answering "Yes" or "No" to the question "Have you ever studied/taught abroad?" differ statistically significantly in terms of average values of measured variables (Table 3). Statistically significant results (t(217) = -2.450; p = 0.015) were observed in the SA dimension, in which respondents from India who answered "No" to the question "Have you ever studied/taught abroad?" had higher results (4.32 ± 0.48) than respondents who answered "Yes" (4.14 ± 0.58; Table 3). The analysis with the t-test revealed that respondents who answered "Yes" or "No" to the question "Have you ever attended classes by a foreign teacher at your university?" differed in a statistically significant way in terms of average values of measured dimensions (Table 4).
Vol. 30, No. 3/2022 Kamila Ludwikowska
The results obtained in the SA dimension showed a statistically significant difference (t(217) = -2.894; p = 0.004): respondents from India who answered "No" to the question "Have you ever attended classes by a foreign teacher at your university?" have higher scores (4.38 ± 0.50) than respondents who answered "Yes" (4.17 ± 0.53; Table 4).
In the LE dimension, a statistically significant difference (t(217) = -2.347; p = 0.020) was observed for respondents from India who answered "No" to the question "Have you ever attended classes by a foreign teacher at your university?" [...]
The further study aimed to verify differences between the four dimensions for respondents of different nationalities (India, Poland; Table 5) [...]
One-factor Anova was calculated to analyze the differences between more than two studied groups. Duncan's multiple range tests were calculated for significant interactions. First, the differences between respondents of different current positions for the group from India were calculated (Table 6).

The current position of the respondents revealed that the four groups did not differ statistically significantly (p > 0.05) in terms of the average values measured in the four dimensions.
Then, the differences between respondents of different faculty/branch for the group from India were calculated (Table 7).
The "Faculty/Branch you study/graduate/teach" section revealed that the four studied groups did not differ significantly (p > 0.05) in terms of average values measured in the four dimensions (Table 7).
Third, the differences between respondents of different educational levels for the group from India were calculated (Table 8). The educational level of the respondents revealed that the four studied groups did not differ significantly (p > 0.05) in terms of the average values measured in the four dimensions (Table 8).
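The one-factor Anova comparisons above rest on the ratio of between-group to within-group variability. The following minimal sketch shows that calculation; the function name `one_way_anova` and the three groups of scores are hypothetical and do not reproduce the study's data. As with the t-test, turning F into a p-value requires the F distribution, which is not part of the Python standard library.

```python
from statistics import mean

def one_way_anova(groups):
    """F statistic with (df_between, df_within) for a list of score groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical dimension scores for three groups of respondents
groups = [
    [3.0, 3.5, 4.0],
    [3.5, 4.0, 4.5],
    [4.0, 4.5, 5.0],
]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}; {df_w}) = {f_stat:.3f}")
```

An F value near 1 means the group means differ about as much as chance variation within groups would suggest, which corresponds to the non-significant results (p > 0.05) reported for position, faculty, and educational level.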
The two-factor Anova test was performed to analyze the differences between respondents of different sex and nationalities in terms of measured variables (Table 9).
Table 9. Descriptive statistics with the two-factor Anova difference test between respondents of different gender and nationality
The interaction between respondents' nationality and sex was significant for the TA dimension (F(1; 262) = 5.337; p = 0.021) and for the GC dimension (F(1; 262) = 7.028; p < 0.01; Table 9). For these variables, post hoc test scores were calculated for specific differences (Table 10). An important conclusion is that respondents of both sexes from India in the TA dimension have higher scores than respondents from Poland. An interesting result is also that there are no differences (p = 0.348) between women (M = 4.30) and men (M = 4.18) in the group from India in relation to the dimension mentioned above. In the Polish group, men had significantly (p < 0.01) lower scores (M = 3.24) than women (M = 3.77). Figure 1 presents the mean values with a 95% confidence interval for the mean value broken down by group and sex.
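As a consistency check on the degrees of freedom of the two-factor test, the standard formulas can be applied to the sample sizes given in the methodology. This assumes a full 2 × 2 factorial model (nationality × sex) in which all 266 respondents (219 from India and 47 from Poland) are included; the symbols a, b, and N are introduced here for illustration, with a and b denoting the number of levels of each factor and N the total sample size.

```latex
\begin{aligned}
df_{\text{interaction}} &= (a-1)(b-1) = (2-1)(2-1) = 1,\\
df_{\text{error}} &= N - ab = 266 - 2 \cdot 2 = 262 .
\end{aligned}
```

Under these assumptions, the result agrees with the F(1; 262) statistics reported for the interaction in the TA and GC dimensions.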

The descriptive statistics of Duncan's multiple range test were calculated for respondents of different sex and nationalities in the GC dimension (Table 11). An important conclusion was also that respondents of both sexes from India in the GC dimension have higher scores than respondents from Poland. An interesting result was the lack of differences (p = 0.606) between women (M = 4.30) and men (M = 4.23) in the dimension mentioned above in the Indian group. In the Polish group, men had significantly (p < 0.01) lower scores (M = 3.25) than women (M = 3.81; Table 11).
Figure 2 presents the mean values with a 95% confidence interval for the mean value divided by respondents' nationality and sex.
The results obtained in the study confirmed the high internal quality of TCI. The research analysis that applied the t-test and the Anova one- and two-factor difference tests confirmed H1: There are differences between groups of respondents from India in perceiving the importance of the dimensions of TCI. The sub-hypotheses H1a, H1b, and H1c were also evaluated:

H1a: the women in the SA dimension had higher scores (4.23 ± 0.43) than the men (4.19 ± 0.57; Table 2);
H1b: in the SA dimension, respondents from India who answered "No" to the question "Have you ever studied/taught abroad?" had statistically significantly higher results (4.32 ± 0.48) than respondents who answered "Yes" (4.14 ± 0.58; Table 3);
H1c: only the SA and LE dimensions were significant (Table 4).
The respondents from Poland had lower scores (3.67 ± 0.86) than the respondents from India (4.25 ± 0.53) in all dimensions (Table 5), which confirmed H2. Furthermore, the results of the two-factor Anova difference test between respondents of different sex and nationalities strengthened H1a and H2. The interaction between nationality and gender of the respondents was statistically significant for the TA dimension (Table 9). These differences were supported by the results of Duncan's multiple range tests for specific differences (Table 10). In two dimensions, TA and GC, respondents of both sexes from India have higher scores than respondents from Poland. Moreover, there were no differences between women and men in the group from India in both cases. However, in the Polish group, men had significantly lower scores than women (Table 10, Table 11).
However, the sub-hypotheses H1d, H1e, and H1f were not confirmed. The current position, the faculty/branch you study/graduate/teach, and the educational level of respondents revealed that the four groups do not differ statistically significantly (p > 0.05) in terms of average values measured in the four dimensions (Table 6, Table 7, Table 8). This result gives the basis to accept the null hypothesis (H0) about the lack of differences between the analyzed groups of respondents.

Discussion
There is a growing number of assessment, testing, and evaluation methods. The ultimate goal of any measurement instrument is to produce enough information to allow the user to make appropriate judgments. The results of the study allowed me to answer the research questions:
Q1: Research analyses confirmed that there was a high Cronbach's alpha reliability for both groups from India and Poland for the developed inventory. Furthermore, the comparison test in most cases confirmed the formulated hypotheses. The internal quality of the test has been established through the validity of the construct. The analysis confirmed the quality of the TCI in terms of its validity as a measure of the pedagogical competencies of academic teachers and its application to HR practices.
Q2: In most analyzed cases, the model fits different groups related to sex, nationality, respondents' experience in studying/teaching abroad, and respondents' experience in interacting with a foreign teacher at home university. Thus, the results confirmed the quality of the TCI in terms of its validity as a measure of pedagogical competencies of academic teachers and its application to HR practices.
Competency tests are standardized methods of competency assessment (Armstrong & Taylor, 2014). These are the tools that provide objective and reliable information about the level of competency a person has and enable planning further development.
The objectivity of the results rests on high psychometric parameters, including accuracy and reliability. Moreover, respondents' results can be compared with one another and with the competency profile associated with their positions.
Therefore, the teacher competency inventory (TCI) was revealed to be a meaningful tool to enhance HR practices in HEIs and to indicate the current and desired level of competencies. The validated dimensions of the competency model can be used to control the quality of teaching considered as a determinant of the quality of any educational institution (Yeşilçınar & Çakır, 2020).

Conclusions
The TCI is a four-dimensional scale that explains future-oriented teacher competencies. The findings of the study contribute to the field of assessing teacher competencies.
The TCI allows one to compare teacher competencies with the expected competency profile. Possible differences may indicate deficits or overruns in relation to the expected profile. The 66-item TCI is an accessible and easily administered measure of teacher competencies.
The identified four competencies and behavioral descriptions provide a framework of competencies that can be used to build the desired competency profile of the teacher in HEIs in India and Poland.
Comparison of the desired and existing teacher competency profile allows identifying the competence gap. The identified competency gap may be the basis for increasing job performance through different HR practices. Competencies are indicators of the quality of education, which results at the microlevel in a higher quality of teaching and learning in the classroom and, at the macrolevel, of education as a whole.
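The comparison of an existing profile with a desired one can be sketched as a per-dimension difference. The example below is only an illustration: the function name `competency_gap`, the teacher's scores, and the expected profile are hypothetical values on the 1-5 scale, keyed by the four TCI dimension codes (SA, TA, GC, LE).

```python
def competency_gap(actual, expected):
    """Per-dimension gap: negative means deficit, positive means overrun."""
    return {dim: round(actual[dim] - expected[dim], 2) for dim in expected}

# Hypothetical dimension means for one teacher
actual = {"SA": 4.1, "TA": 3.6, "GC": 4.4, "LE": 4.0}
# Hypothetical expected profile set by the institution
expected = {"SA": 4.0, "TA": 4.2, "GC": 4.0, "LE": 4.0}

gaps = competency_gap(actual, expected)
deficits = [dim for dim, gap in gaps.items() if gap < 0]
overruns = [dim for dim, gap in gaps.items() if gap > 0]
print(gaps, deficits, overruns)
```

In this hypothetical case, a deficit in one dimension would point to a development need, while overruns indicate competencies above the expected profile.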
Competence profiles are widely used in HRM, including improving quality and efficiency of work, recruitment, selection, and adaptation of employees, increasing the flexibility of employment, transparency, and clarity of employee roles, training, development of professional skills, planning a career development path, rewarding employees, and creating equal opportunities and possibilities.
However, this empirical study was burdened with certain limitations. First, the hypotheses were tested based on one research sample, limited to educational organizations operating in Poland and India. Nevertheless, it underlines the need for further analysis of the validity of tests used for HR practice in HEIs. Second,

Figure 1. The mean values with a 95% confidence interval for the average value of the TA dimension by gender and group

Figure 2. The mean values with a 95% confidence interval for the average value of the GC dimension by gender and group

Table 1. Initial normalisation of the scales measured. Source: own work.
The Standards for Educational and Psychological Testing refers to validity as "the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences" (American Psychological Association, American Educational Research Association, & National Council on Measurement in Education, 1985, p. 9).

Table 2. Descriptive statistics and the t-test between male and female for the groups from India. Source: own work.

Table 3. Descriptive statistics and the t-test between respondents answering "Yes" or "No" to the question "Did you ever study/teach abroad?" for the group from India. Source: own work.

Table 4. Descriptive statistics and the t-test between respondents answering "Yes" or "No" to the question "Did you ever attend classes by a foreign teacher at your university?" for the group from India. Source: own work.

Table 5. Descriptive statistics and the t-test between respondents of different nationalities. Source: own work.

Table 6. Descriptive statistics and the one-factor Anova difference test between respondents of different Current Position for the group from India. Source: own work.

Table 7. Descriptive statistics and the one-factor Anova difference test between respondents of different Faculty/Branch for the group from India

Table 8. Descriptive statistics with the one-factor Anova difference test between respondents of different Educational level for the group from India. Source: own work.

Table 10. Descriptive statistics of post-hoc Duncan difference test between respondents of different gender and nationalities in the TA dimension. Source: own work.

Table 11. Descriptive statistics of post-hoc Duncan difference test between respondents of different gender and nationalities in the GC dimension