T.D. 2/96  
Decision rendered on February 15, 1996  
THE CANADIAN HUMAN RIGHTS ACT  
(R.S.C., 1985, c. H-6 (as amended))  
HUMAN RIGHTS TRIBUNAL  
BETWEEN:  
PUBLIC SERVICE ALLIANCE OF CANADA  
Complainant  
- and -  
CANADIAN HUMAN RIGHTS COMMISSION  
Commission  
- and -  
TREASURY BOARD  
Respondent  
DECISION OF THE TRIBUNAL  
Tribunal:  
Donna Gillis, Chairperson  
Norman Fetterly, Member  
Joanne Cowan-McGuigan, Member  
Appearances:  
Andrew Raven  
Counsel for the Public Service Alliance of Canada  
Rosemary Morgan and René Duval  
Counsel for the Canadian Human Rights Commission  
Duff Friesen, Lubomyr Chabursky and Deborah Smith  
Counsel for Treasury Board  
Location of Hearing: Ottawa, Ontario  
TABLE OF CONTENTS  
I. INTRODUCTION
II. ISSUE
III. LEGISLATION
IV. BURDEN OF PROOF
V. STANDARD OF PROOF
VI. FACTS
    A. THE WILLIS PLAN
    B. THE WILLIS PROCESS
        (i). Data-Gathering
        (ii). Willis Questionnaire
        (iii). Coordinators
        (iv). Screeners and/or Reviewers
    C. THE EVALUATION PROCESS
        (i). Master Evaluation Committee
        (ii). Multiple Evaluation Committees
        (iii). Process for Evaluation of Questionnaires
        (iv). Training of the Multiple Evaluation Committees
        (v). Master Evaluation Committee's Evaluations
        (vi). Multiple Evaluation Committees' Evaluations
        (vii). Re-Training of Multiple Evaluation Committees
        (viii). Sore-Thumbing
    D. RELIABILITY TESTING
        (i). Inter-Rater Reliability Testing
        (ii). IRR Testing in the Multiple Evaluation Committees
        (iii). Inter-Committee Reliability Testing
        (iv). ICR Testing in the Multiple Evaluation Committees
        (v). Wisner 222 Re-Evaluations
    E. THE COMMISSION
        (i). Commission Investigation
        (ii). Sunter's Analysis
    F. ROLE OF CONSULTANTS IN RE-EVALUATIONS
    G. WHETHER THE RESULTS SHOULD BE ADJUSTED - THE EXPERTS
VII. DECISION AND ANALYSIS
VIII. CONCLUSION
APPENDIX A - COMMITTEE MANDATES  
I. INTRODUCTION  
1. The Canadian Human Rights Commission (the "Commission") is
established under the Canadian Human Rights Act, R.S.C., 1985, c. H-6, as
amended (the "Act"), and is a party in this complaint, representing the
public interest.
2. The Commission presented six witnesses qualified to testify as
experts. The first witness to appear was Dr. Nan Weiner, an expert in pay
equity and compensation. The second expert to testify was Norman D.
Willis, an expert in pay equity and job evaluation. They were followed by
two expert statisticians, Dr. Richard Shillington, an expert in data
analysis, and Alan Sunter, an expert in statistics. Also called were two
employees of the Commission, Paul Durber and James Sadler. Durber is an
expert in pay equity, job evaluation and related areas; Sadler is an
expert in pay equity and job evaluation.
3. The Respondent, Treasury Board (the "Employer"), is the employer
of employees who work in the Federal Public Service of Canada, as listed
in Schedule 1, Part 1 of the Public Service Staff Relations Act, 1966-67,
c. 72 (the "PSSRA"). In addition to Willis, the Employer called only one
expert to testify, Fred Owen. Owen was a former Willis consultant and an
expert in pay equity and job evaluation.
4. The Complainant, the Public Service Alliance of Canada (the
"Alliance"), is an "employee organization" within the meaning of the PSSRA.
The Alliance has been certified under the PSSRA to act as bargaining agent
for a number of bargaining units in the Federal Public Service. The
Alliance is the third largest union in Canada, representing approximately
170,000 employees, 70 per cent of whom work outside of the National Capital
Region. The Alliance is composed of 18 components which are, with the
exception of one or two, male-dominated. The largest bargaining unit
represented by the Alliance is the Clerical and Regulatory Group (the "CR
Group"), which consists of approximately 50,000 employees. This bargaining
unit is 80 per cent female and includes employees performing an extremely
wide range of functions.
5. The Alliance called four experts to testify during the course of  
this hearing. The first was Dr. Pat Armstrong, accepted by the Tribunal as  
an expert in job evaluation and pay equity. The Alliance also called Dr.  
Eugene Swimmer, an expert in labour economics and statistics. The Tribunal  
accepted one Alliance employee, Margaret Jaekl, as an expert in pay equity  
and job evaluation. Another individual, Margaret I. Krachun, who at the  
time of the hearing was employed by the Alliance, was accepted as a  
layperson with some experience in evaluation gained while a member of one  
of the evaluation committees.  
6. The case originally before the Tribunal arose from complaints  
filed by both the Alliance and the Professional Institute of the Public  
Service of Canada (the "Institute") alleging violation of s. 11 of the Act.  
The Institute called one expert witness, Dan Butler, a negotiator with the  
Institute. He was accepted by the Tribunal as an expert expressing the  
opinion of the Institute on several issues before the Tribunal, primarily  
on wage adjustment methodology.  
7. The human rights complaints before the Tribunal now pertain only  
to the complaints of the Alliance. The Institute's complaints are no  
longer before us. Those complaints were resolved by a negotiated  
settlement between the Employer and the Institute. A Consent Order was  
issued by the Tribunal dated May 31, 1995, giving effect to their  
settlement.  
8. In the case of the Alliance, two complaints remain for our  
determination. The first complaint, dated December 19, 1984, alleges  
discriminatory practice contrary to ss. 7, 10 and 11 of the Act with  
respect to employees in the female-dominated CR Group. It is only the s.  
11 portion of the 1984 CR Group complaint which has been referred to the  
Tribunal for ruling. The complaint presented on behalf of the employees in  
the CR Group affects the rights of approximately 50,000 workers who belong  
to this group.  
9. The second complaint, dated February 16, 1990, alleges the
results obtained through the process of the Joint Union-Management
Initiative on Equal Pay for Work of Equal Value have demonstrated the
existence of wage rates which are in contravention of s. 11 of the Act with
respect to employees in the female-dominated occupational groups: Clerical
and Regulatory; Secretarial, Stenographic and Typing; Data-Processing;
Educational Support; Hospital Services; and Library Science. This
complaint of the Alliance was filed with the Commission shortly after the
breakdown of the Joint Union-Management Initiative (which will be detailed
later). That complaint relies upon the job evaluation data generated by a
study resulting from this initiative, claiming, in support of its position,
that employees in the identified complainant groups continue to suffer wage
rate discrimination contrary to s. 11 of the Act, notwithstanding
unilateral payments announced by the Employer in January of 1990.
10. From the outset, the Alliance's preferred position was to attempt  
to resolve equal pay issues through negotiations with the Employer at the  
bargaining table. It was only when these measures failed to lead to  
corrective action that the complaint mechanism of the Act was invoked.  
11. The human rights complaints of the Alliance are not the first s.  
11 complaints the Alliance has presented under the Act. The earlier  
complaints include the complaint of the Library Science Group (the "LS  
Group") and the Hospital Services Group (the "HS Group") on behalf of  
employees in the female-dominated sub-groups in the General Services Group  
(the "GS Group").  
12. In each of these cases, monetary compensation in the form of wage
adjustments was paid to affected employees. The LS Group complaint was
resolved with the understanding that final corrective action would await
the outcome of the study. In the matter of the HS Group complaint, which
was the subject of a Tribunal Order of July 15, 1987 issued by another,
earlier tribunal, it was expressly understood by the parties that the s. 11
complaint would likewise await final wage gap computations after the
conclusion of the study.
13. Each Federal Public Service employee occupies a position which is
classified in accordance with the Employer's classification system. That
system comprises 69 occupational groups, each with its own classification
standard ("job evaluation system").
14. In the classification system, positions are classified as  
belonging to occupational groups, sub-groups (where applicable) and levels.  
Occupational groups are designated by two-letter abbreviations; sub-groups  
by three-letter abbreviations. A position is the smallest organizational  
unit and represents a unique set of tasks and duties performed by an  
individual. The Employer has the same number of positions as it has  
employees. On the other hand, a job in the Federal Public Service is a  
grouping of positions which have the same key duties and responsibilities.  
15. The occupational groups are assembled into six occupational  
categories as follows: (i) the Scientific and Professional Category; (ii)  
the Administrative and Foreign Service Category; (iii) the Technical  
Category; (iv) the Administrative Support Category; (v) the Operational  
Category; and (vi) the Executive Category.  
16. In March of 1985, the government initiated pro-active measures to  
implement the principles of equal pay for work of equal value in the  
Federal Public Service. It invited unions and management to participate as  
partners in a senior level Joint Union-Management Initiative (the "JUMI").  
The JUMI was directed by a committee (the "JUMI Committee"). The JUMI  
Committee was asked to prepare a detailed implementation plan in the area  
of equal pay for work of equal value. The unions, not only the Alliance,  
but other unions as well, accepted the government's invitation. The  
Alliance, at the time of accepting this invitation, had established a  
consistent policy of supporting the principle of equal pay for work of  
equal value. At the time of the voluntary initiative, there were three  
outstanding complaints before the Commission under s. 11 of the Act.
17. The action plan agreed to by the JUMI Committee was to conduct a  
study (the "JUMI Study") pursuant to s. 11 of the Act to determine the  
degree of sex discrimination in pay and to devise methods for system wide  
correction in order to eliminate sexually based wage disparities (Exhibit  
HR-11A, Tab 9, Annex B). The Commission was invited to be a participant in
the JUMI Study, to fulfil the role of an observer at committee meetings and
to provide interpretation and guidance when required by the JUMI Committee.
(Exhibit HR-11A, Tab 7). The Commission held all s. 11 complaints, which  
had been filed before the JUMI Study commenced, in abeyance. The  
Commission agreed that any new complaints received during the JUMI Study,  
which might be affected by the study, were to be held in abeyance as well.  
18. The JUMI Committee had equal representation from the Employer and  
eight different unions. The JUMI Committee's first task was to define the  
parameters of the JUMI Study. Pivotal to its operation was the requirement  
for joint agreement between management and union representatives on the  
process to be used during the JUMI Study (the "JUMI Process"). Neither the  
unions nor management was to act independently or make decisions in the  
course of the JUMI Study without joint approval. The JUMI Committee hired  
Willis & Associates, a consulting firm based in Seattle, Washington, to  
assist in the Study. Willis & Associates was founded and directed by  
Norman Willis.  
19. Early on in the JUMI Study, the JUMI Committee made it abundantly
clear to Willis that he had no decision-making authority in the conduct of
the JUMI Study. Willis' role was to attend the meetings and to give advice at
the request of the JUMI Committee.  
20. The JUMI Committee established sub-committees at various stages  
which were called upon by the JUMI Committee to provide advice, to perform
certain tasks, and to make recommendations to the JUMI Committee with respect
to particular issues. Agreement by members of the JUMI Committee was  
required in order to form a sub-committee. Each sub-committee thus formed  
had equal representation from union and management sides.  
21. In the fall of 1987, the JUMI Committee established the Equal Pay  
Study Secretariat (the "EPSS") to conduct the administrative work  
associated with the JUMI Study. The EPSS was managed by a Treasury Board  
representative, Pierre Collard. The objective of the EPSS was to provide  
administrative support to the multiple evaluation committees in the JUMI  
Study and it was responsible for the coordination of all support  
activities.  
22. In addition to hiring Willis & Associates, the JUMI Committee  
eventually agreed on other important matters. The JUMI Committee agreed to  
evaluate positions from male- and female-dominated occupational groups  
using a common evaluation plan. A comparison of wages paid to male- and  
female-dominated occupational groups performing work of equal value could  
then be made. The JUMI Committee agreed the study would be "position  
specific" using a representative sample of positions. A position-specific  
study means every different job selected for evaluation is evaluated  
separately as opposed to "predominant use" studies in which positions are  
selected for evaluation that best represent a classification or grouping
of jobs. The JUMI Committee agreed only positions from male- and female-  
dominated occupational groups, as defined in s. 13 of the Equal Wages  
Guidelines (the "Guidelines"), were to be included in the representative  
sample.  
23. As of March, 1985, based on s. 13 of the Guidelines (which  
prescribes the criteria defining sex predominance), the parties agreed  
there were 9 female-dominated occupational groups, 53 male-dominated  
occupational groups and 8 gender-neutral occupational groups. For clarity,  
s. 13 of the Guidelines is reproduced as follows:  
13. For the purpose of section 12, an occupational group is  
composed predominantly of one sex where the number of members of  
that sex constituted, for the year immediately preceding the day  
on which the complaint is filed, at least  
(a) 70 per cent of the occupational group, if the group has less  
than 100 members;  
(b) 60 per cent of the occupational group, if the group has from  
100 to 500 members; and  
(c) 55 per cent of the occupational group, if the group has more  
than 500 members.  
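For illustration only, the sliding thresholds in s. 13 can be expressed as
a short calculation. The following Python sketch is ours, not part of the
Guidelines; the function names and inputs are hypothetical.

    def predominance_threshold(group_size):
        """Return the s. 13 proportion of one sex required for an
        occupational group to be composed predominantly of that sex."""
        if group_size < 100:
            return 0.70   # s. 13(a): fewer than 100 members
        if group_size <= 500:
            return 0.60   # s. 13(b): 100 to 500 members
        return 0.55       # s. 13(c): more than 500 members

    def is_predominant(members_of_one_sex, group_size):
        """True if that sex met the threshold for the year immediately
        preceding the day on which the complaint was filed."""
        return members_of_one_sex / group_size >= predominance_threshold(group_size)

    # e.g., the CR Group: roughly 50,000 members, about 80 per cent
    # female, well above the 55 per cent threshold for groups of more
    # than 500 members.
    assert is_predominant(40000, 50000)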
24. The nine female-dominated occupational groups represented by the  
Alliance and the Institute with their abbreviations are listed below:  
- Clerical and Regulatory (CR);
- Data Processing (DA);
- Education Support (EU);
- Home Economics (HE);
- Hospital Services (HS);
- Library Science (LS);
- Nursing (NU);
- Occupational and Physical Therapy (OP); and
- Secretarial, Stenographic, Typing (ST).
25. Positions from gender-neutral occupational groups or the  
Executive Category were excluded from the study. The proposed JUMI Study,  
although service-wide in nature, was not intended to cover all employees  
providing services for the Government of Canada. The JUMI Study did not  
include employees of Crown Corporations nor did it include employees of  
separate employers. For purposes of the legislation, separate employers  
are identified in Part II of the PSSRA as follows:  
- Atomic Energy Control Board
- Canadian Advisory Council on the Status of Women
- Canadian Security Intelligence Service
- Communications Security Establishment, Department of National Defence
- Economic Council of Canada
- Medical Research Council
- National Film Board
- National Research Council of Canada
- Natural Sciences and Engineering Research Council
- Northern Canada Power Commission
- Northern Pipeline Agency
- Office of the Auditor General of Canada
- Public Service Staff Relations Board
- Science Council of Canada
- Social Sciences and Humanities Research Council
- Staff of the Non-Public Funds, Canadian Forces
26. The sample eventually drawn was representative of positions by  
groups and levels for female-dominated occupational groups and by group for  
male-dominated occupational groups. Approximately 2,800 positions from  
female-dominated occupational groups and 1,500 positions from male-  
dominated occupational groups were ultimately included in the sample. The  
sample size and composition met with the approval of Statistics Canada.  
27. The JUMI Committee agreed to use the Willis Job Evaluation Plan,  
with some amendments, as the appropriate job evaluation instrument for  
evaluating the representative sample of positions. The JUMI Committee also  
agreed to use the Willis Questionnaire, with amendments, to gather  
information on the positions to be evaluated. A communications strategy  
was recommended and agreed upon by a JUMI sub-committee to encourage  
selected incumbents to participate in the JUMI Study and to provide  
information on their positions. Position information was then collected  
from September, 1987 until January, 1989.
28. The JUMI Committee, acting on Willis' advice, established, as a
first step in the process of evaluation, a Master Evaluation Committee (the
"MEC"). The MEC was asked to evaluate 503 position questionnaires which  
were to serve as benchmarks and as a frame of reference for all subsequent  
evaluations by other evaluation committees. The MEC began its important  
task in September, 1987 and finished it in July, 1988. In the final  
analysis, the MEC completed 501 benchmark evaluations.  
29. After the MEC completed its evaluations, the remaining
evaluations were done by 14 evaluation committees (the "multiple
evaluation committees"). The first five multiple evaluation committees
began evaluating in September, 1988. By April, 1989, they had evaluated  
approximately 1,283 positions. In April, 1989, the multiple evaluation  
committees were expanded from five to nine. The nine committees included  
some members of the first five multiple evaluation committees as well as  
new members. The expanded committees evaluated approximately 1,400  
positions between April, 1989 and September, 1989.  
30. In May of 1989, the JUMI Committee decided, in view of the slow
pace at which the questionnaires were being evaluated, that the sample
size should be reduced by approximately 880 positions. The JUMI Committee  
then agreed to reduce the original sample from 4,300 positions to  
approximately 3,280 positions. The Office of the Chief Statistician for  
Statistics Canada was advised of the nature and reasons for the reduction  
in the sample size and approved the reduction. In the end, the MEC and the  
14 multiple evaluation committees evaluated 3,185 positions from the  
reduced sample of positions.  
31. The Commission's representatives functioned as observers  
throughout the evaluations of the MEC and the multiple evaluation  
committees. They were present during the meetings of the JUMI Committee  
and meetings of the multiple evaluation committee chairpersons.  
32. Overall, the JUMI process had a number of shortcomings, largely  
due to the manner in which it operated. According to Willis, the JUMI  
Committee was "ill-formed". Rather than working as a team, the JUMI  
Committee functioned in a negotiating mode with the unions on one side and  
the Employer on the other. As described by Willis, each side spoke with  
one voice. Because the Employer represented a singular position, this  
required the unions to caucus in order to respond in one voice. Rather  
than a joint union-management committee working together as a team, the  
proceedings were akin to union-management bargaining.  
33. As a result, many decisions took a great deal of effort and time  
and were not easily or amicably achieved. For example, after the first  
JUMI Committee meeting which was held on September 16, 1985, it took until  
September 22, 1986, one year later, for the parties to reach an agreement  
on the Terms of Reference and Action Plan for the JUMI Study.  
34. The length of time needed to carry out the JUMI Study prompted  
the Chief Commissioner of the Commission, on different occasions, to urge  
the President of the Treasury Board to resolve the outstanding issues  
occupying the JUMI Committee.  
35. Another problem in the JUMI process was the inability of the  
management and union sides to reach closure on some major aspects of the  
JUMI Study. For example, when the MEC had completed its benchmark  
evaluations, Treasury Board withheld whole-hearted support of those  
evaluations. Although Treasury Board agreed to proceed with the rest of  
the evaluations, it continued to harbour doubts and indicated its intention  
to study the reliability of the MEC benchmarks independently.  
36. Problems also arose during the course of the multiple evaluation  
committees' evaluations. Willis recommended disbanding one of the original  
five multiple evaluation committees. The JUMI Committee rejected this  
recommendation and could not agree on a resolution. In addition, there  
were some multiple evaluation committee challenges to the MEC benchmark  
evaluations. The JUMI Committee established a smaller version of the MEC  
(the "Mini-MEC") to review and discuss these challenges. The Mini-MEC  
could not reach a consensus, so in the end the matter was never fully
resolved.
37. The JUMI Study was intended to encompass four phases. These  
phases were to be as follows:  
Phase I  
Agreement on the common evaluation plan to be used to determine  
the relative value of jobs and on the evaluation of benchmark  
positions.  
Phase II  
Agreement on the statistical methodology for sampling actual  
positions.  
Phase III  
Sampling and evaluation of actual positions, using the agreed to  
evaluation plan with benchmarks.  
Phase IV  
Determination of the degree of wage disparity and recommendations  
on corrective measures. These may include recommendations to  
resolve discriminatory aspects of the classification system which  
contribute to wage inequity as defined in Section 11 of the  
Canadian Human Rights Act.  
(Exhibit HR-11A, Tab 9)  
38. During the life of the JUMI Study, tension between the management
and union sides persisted and intensified. There was disagreement between
the union and management sides relating to the release of evaluation  
scores. The JUMI Committee agreed the data would be released after two-  
thirds of the evaluations were completed. According to Willis, following  
the release of the MEC evaluation scores on July 13, 1988, relationships in  
the JUMI Committee began to deteriorate. It then became apparent to Willis  
that the climate of the JUMI Committee had changed. When the MEC results  
were made available to the parties, the Employer's classification system  
became an issue for the Employer. Willis was troubled and mystified by  
correspondence he received from the management co-chair on August 18, 1988,  
which indicated the parties were not ad idem on the purpose of the JUMI  
Study.  
39. During the last few months of the JUMI Study, an issue arose  
between the union and management sides relating to a report released by  
Willis & Associates concerning re-evaluations by a Willis consultant of 222  
multiple committee evaluations. This issue was never resolved by the JUMI  
Committee and eventually led to the final breakdown of the JUMI Study.
40. The parties had contemplated eventual agreement upon a joint  
recommendation to the President of Treasury Board for implementation of pay  
equity. Phase IV of the JUMI Study was never achieved. After approximately
four years, in December, 1989, the union side withdrew from the JUMI Study  
on a temporary basis. In January, 1990, the largest participant union in  
the JUMI Study, the Alliance, permanently withdrew from the JUMI Study.  
41. Early in 1990, the Government of Canada made a decision to  
unilaterally implement immediate measures to achieve equal pay for work of  
equal value for female-dominated occupational groups in the Federal Public  
Service. The measures adopted by the government were based on the  
evaluation results of the JUMI Study with corrective adjustments for gender  
bias arising from the controversial report by Willis & Associates on the  
222 re-evaluations. Those measures were referred to as the public service  
"equal pay adjustments" or the "equalization payments". The equalization  
payments were applied to three female-dominated occupational groups, the  
CR, NU, and ST Groups.  
42. Neither the Commission nor the Alliance, nor any of the other
participant unions, was consulted by the Employer prior to the making of
these voluntary adjustments. The parties were first informed of the
Employer's decision when the President of the Treasury Board made an
announcement on January 26, 1990. The adjustments involved payments of
approximately $317 million for wages retroactive to April 1, 1985 and
payments of $76 million annually in continuing adjustments. The lump sum
payments by the government were made retroactive to March 31, 1985, the
month in which the Treasury Board President first announced the
establishment of the Joint Union-Management Committee to study how gender
based wage discrimination would be eliminated in the Federal Public
Service.
43. After the breakdown of the JUMI Study, the Commission and the
Alliance made it clear to the Employer that the data generated by the JUMI
Study would be presented as evidence to a Human Rights Tribunal.
44. The formal investigation of the s. 11 complaints lodged with the
Commission commenced following the announcement of the equalization
payments. Included in its investigation was an examination by the
Commission of the equalization payments. This exercise was done to ensure
full adherence to the Act and the Equal Wages Guidelines. Following a
formal six-month investigation, the Commission decided to refer the s. 11
complaints to a Tribunal. That decision was made on October 16, 1990.
45. During the course of this hearing, when the Commission attempted
to introduce the JUMI data into evidence, it was met with an objection by
the Employer that the data was inadmissible on the grounds it had been
created in an effort to resolve or avoid litigation and should therefore be
treated as privileged. A voir dire was conducted by the Tribunal on this
issue and, following its completion, the Tribunal dismissed the Employer's
objection in a ruling rendered August 21, 1992 (see Voir Dire Ruling for
further details).
46. The Employer alleges the job evaluation data generated in the  
course of the study is not sufficiently reliable for the adjudication of  
the complaints referred to the Tribunal. The Employer is not satisfied  
with the reliability of the evaluation results. The Employer's  
equalization payments indicate the extent to which the Employer is willing  
to rely upon the evaluation results. The Commission and the Alliance are  
seeking to use the evaluation data for a determination of wage disparity  
and pay adjustments under s. 11 of the Act.  
II. ISSUE  
47. As a result of a pro-active initiative by the Employer, the  
Complainant, together with 13 other public sector unions, and the  
Respondent entered into a pay equity study called the Joint
Union-Management Initiative.
48. The JUMI Study began in 1985 and lasted until January, 1990, when
the JUMI Study was aborted, first by the Complainant and then by the
Respondent. The Complainant and the Respondent produced, over that period
of time, job evaluation results.
49. Prior to the commencement of the JUMI Study, the Complainant had  
filed with the Commission a s. 11 wage discrimination complaint against the  
Respondent. After the breakdown of the JUMI Study, the Complainant filed a  
second and new complaint against the Respondent.  
50. The Commission and the Complainant intend to use the job  
evaluation results from the JUMI Study as evidence of the value of work  
performed by male and female employees whose jobs are the subject of these  
complaints. The Commission and the Complainant further intend to use the  
job evaluation results as proof of a wage gap alleged by these complaints  
as contrary to s. 11 of the Act.  
51. The Respondent submits the job evaluation results are unreliable
for purposes of adjudication. More specifically, the Respondent alleges
the job evaluation results are biased, inasmuch as the male-dominated
questionnaires and the female-dominated questionnaires used to produce the
results were treated differently by the individuals who performed the
evaluations.
52. Therefore, the issue is whether or not the job evaluation results  
of the JUMI Study are reliable for purposes of the s. 11 complaints referred
to this Tribunal for deliberation.  
III. LEGISLATION  
53. The complaints before us allege wage discrimination on the basis  
of sex contrary to s. 11 of the Act. Section 11 states:
11(1) It is a discriminatory practice for an employer to establish  
or maintain differences in wages between male and female employees  
employed in the same establishment who are performing work of  
equal value.  
(2) In assessing the value of work performed by employees  
employed in the same establishment, the criterion to be applied is  
the composite of the skill, effort and responsibility required in  
the performance of the work and the conditions under which the  
work is performed.  
...  
(5) For greater certainty, sex does not constitute a reasonable  
factor justifying a difference in wages.  
54. The equal pay for work of equal value provisions of s. 11 of the
Act were the subject of a Supreme Court of Canada decision in the case of
Syndicat des employés de production du Québec et de l'Acadie v. Canada
(Canadian Human Rights Commission), [1989] 2 S.C.R. 879 (S.C.C.). That
decision dealt with the issue of whether the Canadian Human Rights
Commission's decision to dismiss a complaint pursuant to s. 36(3)(b) of the
Act is "required by law" to be made on a quasi-judicial basis and,
accordingly, reviewable by the Federal Court of Appeal under s. 28 of the
Federal Court Act. The majority of the Court held that the Commission's
decision was not one required to be made on a judicial or quasi-judicial
basis and thus was not reviewable by the Federal Court of Appeal under
s. 28 of the Federal Court Act.
55. Although the interpretation of s. 11 of the Act was not integral
to the majority decision, Sopinka J., in delivering the judgment of the
majority, said at p. 903:
The intention of s.11 is to prohibit discrimination by an employer  
between "male and female employees" who perform work of equal  
value and not to guarantee to individual employees equal pay for  
work of equal value irrespective of sex.  
56. In our view, as expressed by Sopinka J., the wording of s. 11
prohibits any practice by an employer of differentiating on the basis of
"sex" when determining the wages or compensation to be paid between its
male and female employees who perform work of equal value. For greater
certainty, s. 11(5) makes it clear that "sex" does not constitute a
reasonable factor justifying a difference in wages. Other sections of the
Act also refer to prohibitions on the basis of sex. Section 3(1) of the
Act includes "sex" as one of the prohibited grounds of discrimination.
Section 7 of the Act declares that it is a discriminatory practice to
refuse employment or differentiate adversely during the course of
employment on a prohibited ground, i.e., sex. Section 10 of the Act
declares that it is a discriminatory practice to establish or pursue a
policy or practice or to enter into an agreement affecting recruitment,
referral, hiring, promotion, training, apprenticeship, transfer or any
other matter relating to employment or prospective employment that deprives
or tends to deprive an individual or class of individuals of any employment
opportunity on a prohibited ground of discrimination, i.e., sex.
57. The discriminatory practice alleged in the complaints before the  
Tribunal is that the Employer maintains a difference in wages between male  
and female employees employed in the same establishment who are performing  
work of equal value, contrary to s. 11. There are certain exceptions to  
the statutory prohibition against wage discrimination as stated by s. 11(4)  
of the Act. That section reads:  
11(4) Notwithstanding subsection (1), it is not a discriminatory  
practice to pay to male and female employees different wages if  
the difference is based on a factor prescribed by guidelines,  
issued by the Canadian Human Rights Commission pursuant to  
subsection 27(2), to be a reasonable factor that justifies the  
difference.  
58. The legislative history of s. 11 is brief. In 1976, the
Government of Canada declared that it would introduce a human rights bill.
The major effect of the bill would be to prohibit discrimination on the
grounds of race, colour, national or ethnic origin, religion, age, sex,
marital status or physical handicap. In particular, the bill would
establish the principle of equal compensation for work of equal value
performed by persons of either sex. (Exhibit PIPSC-82).
59. The "Background Notes" to the Canadian Human Rights Bill, issued  
by the then Minister of Justice, indicate that the bill would consider, in  
relation to a prohibited ground, discriminatory practices such as the  
differentiation in wages based on sex between workers performing work of  
equal value. The notes state, at p. 4:  
This provision is designed primarily to cope with female `work  
ghettoes'; it would enable workers performing one sort of job,  
such as secretarial work, to have their compensation related not  
only to that of other secretaries, but also to other jobs of equal  
value in the firm.  
(Exhibit PIPSC-82, p. 3)  
60. In 1977, the Government of Canada enacted the Act. The intent of  
s. 11 of the Act is to ensure that men and women who perform work of equal  
value receive equal compensation. Section 11 came into force on March 1,  
1978. Section 27(2) of the Act authorizes the Canadian Human Rights  
Commission to pass guidelines "interpreting the provisions of the Act."  
Since the proclamation of the Act in 1978, Guidelines have twice been
promulgated by the Commission. The first set of Guidelines passed pursuant
to the Act was prepared to assist in the interpretation of s. 11 of the
Act and was issued on September 18, 1978. It was revoked by the
Guidelines dated November 18, 1986, and gazetted in December, 1986.
61. The 1986 Guidelines describe the manner in which s. 11 of the Act  
is to be applied and the factors that are considered reasonable to justify  
a difference in wages between males and females performing work of equal  
value in the same establishment. The 1986 Guidelines prescribed ten  
factors justifying a pay differential between male and female employees  
performing work of equal value. None of these exceptions plays a role
in these complaints.
62. The dissenting opinion in Syndicat, supra, is helpful because it  
does address some of the prerequisite elements necessary to build a case  
under s. 11 of the Act. The dissent was delivered by L'Heureux-Dubé J. in  
which her Ladyship refers to earlier decisions of the Supreme Court of  
Canada, namely, Robichaud v. Canada (Treasury Board), [1987] 2 S.C.R. 84  
and Canadian National Railway Co. v. Canada (Canadian Human Rights  
Commission), [1987] 1 S.C.R. 1114 (sub nom: Action Travail des Femmes)  
which reviewed complaints based on ss. 7 and 10 of the Act respectively.  
Both decisions make clear statements that intent is not a precondition to a  
finding of adverse discrimination under the Act. L'Heureux-Dubé J. notes  
the scope of protection under s. 11 differs from ss. 7 and 10 and says at  
p. 925:  
As intent is not a prerequisite element of adverse discrimination,  
a complainant may build his or her case under ss. 7 and 10 by  
presenting evidence of the type adduced by the complainant in the  
present case. Statistical evidence of professional segregation is  
a most precious tool in uncovering adverse discrimination.  
Section 11, however, differs from ss. 7 and 10. Its scope of  
protection is delineated by the concept of "equal value". That  
provision does not prevent the employer from remunerating  
differently jobs which are not "equal" in value. Wage  
discrimination, in the context of that specific provision, is  
premised on the equal worth of the work performed by men and women  
in the same establishment. Accordingly, to be successful, a claim  
brought under s. 11 must establish the equality of the work for  
which a discriminatory wage differential is alleged.  
63. L'Heureux-Dubé J. is of the opinion that a complainant may build
a case under ss. 7 and 10 without presenting or including as part of its
case the element of intent. In her Ladyship's words, statistical evidence
is "a most precious tool in uncovering adverse discrimination."
64. L'Heureux-Dubé J. asserts that although the principle of "equal
pay for work of equal value" is expressed in a straightforward manner, its
application under s. 11 of the Act raises considerable difficulties. She  
maintains the concept is simple only in appearance. One element of  
difficulty is the concept of equality which, in her view, should not  
receive a technical or restrictive interpretation. In referring to the  
concept of equality, L'Heureux-Dubé J. says at pp. 926-27:  
The prohibition against wage discrimination is part of a broader  
legislative scheme designed to eradicate all discriminatory  
practices and to promote equality in employment. In this larger  
context s.11 addresses the problem of the undervaluing of work  
performed by women. As this objective transcends the obvious  
prohibition against paying lower wages for strictly identical  
work, the notion of equality in s. 11 should not receive a  
technical or restrictive interpretation.  
65. Another such difficulty, according to L'Heureux-Dubé J., lies
in the concept of value. At p. 928, she states:  
The notions of `skill', `effort', `responsibility' and  
`conditions' which one finds in the Act and the companion Equal  
Wage Guidelines are terms of art. They refer to the areas  
traditionally measured by industrial job evaluation plans.  
66. Section 11(2) defines, in general terms, the manner in which the  
value of the work is to be assessed and establishes four criteria, namely,  
skill, effort, responsibility and working conditions. The criteria are  
defined in greater detail in s. 3 of the Guidelines, the companion to s.  
11.  
67. Madame Justice L'Heureux-Dubé observes that it is more than  
coincidence that the same four words used in this legislation were also  
used in the American counterpart and that these words are an indication  
that job evaluation plans can be used to determine whether jobs are of  
equal value under s. 11. However, she is of the opinion that the use of a  
job evaluation plan is not necessarily the only approach to the  
implementation of the provisions of s. 11. It is the Commission's view, as  
expressed in evidence by Durber, that in a s. 11 complaint, equality of
work can be established through the use of a job evaluation plan but may
also be established through other, less formal methodologies.
68. The Tribunal heard expert evidence that the purpose of a job  
evaluation plan is, in the context of a s. 11 complaint, to determine the  
relative worth of jobs within an organization. It involves a systematic  
process which first defines and establishes factors which relate to the  
four criteria identified in s. 11(2) of the Act. The factors are weighted  
against each other for their relative importance. Each job is assessed  
against each factor to develop a hierarchy of jobs. Various steps or  
stages are involved before a hierarchy is developed which include gathering  
job information, defining the jobs considered for evaluation, evaluating  
each job and assigning scores for each compensable factor.  
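To make the mechanics concrete, the following Python sketch illustrates,
with invented factors and weights (not those of the Willis Plan or of any
plan in evidence), how weighted factor scores produce a hierarchy of jobs.

    # Hypothetical weights reflecting each factor's relative importance;
    # the four factors track the criteria in s. 11(2) of the Act.
    WEIGHTS = {"skill": 0.35, "effort": 0.25,
               "responsibility": 0.30, "working_conditions": 0.10}

    def job_score(ratings):
        """Combine per-factor ratings (here on a 0-100 scale) into one
        weighted total score for a job."""
        return sum(WEIGHTS[f] * ratings[f] for f in WEIGHTS)

    jobs = {
        "job A": {"skill": 55, "effort": 50,
                  "responsibility": 45, "working_conditions": 40},
        "job B": {"skill": 70, "effort": 60,
                  "responsibility": 55, "working_conditions": 50},
    }

    # Sorting evaluated jobs by total score yields the hierarchy of jobs.
    hierarchy = sorted(jobs, key=lambda j: job_score(jobs[j]), reverse=True)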
69. L'Heureux-Dubé J. commented on the use of job evaluation plans  
and the number of steps involved at p. 931:  
All steps of such a job evaluation plan involve a measure of  
subjectivity. Social beliefs which have traditionally led to the  
undervaluing of women's work may bring a certain measure of bias  
in the design and application of these methods. To illustrate,  
job content information which is supplied by the employees can  
contain certain characteristics which, as a result of underlying  
values, may be overlooked in the assessment. There may be  
confusion between truly compensable characteristics and  
stereotyped notions of what are perceived to be inherent  
attributes of being a woman.  
70. These comments were echoed by pay equity experts who testified at
this hearing. While the job evaluation procedure can be controlled to a
certain extent, it remains an inherently subjective process. The value
assigned to each job is an expression of opinion given by individuals and
is a judgment call by the evaluators. According to Willis, the pay equity
expert and consultant to the JUMI Committee, such a procedure may
incorporate both random and systematic errors of judgment.
71. Willis testified that random errors are to be expected in an  
undertaking as large as the JUMI and can result from a lack of sufficient  
job information, assumptions about particular job aspects, inconsistent  
application of the Willis Plan (the job evaluation plan), or simply from a  
differing interpretation of the job information. Willis indicated that  
while random differences are expected and tend to cancel each other out,  
patterned differences are not expected and do not cancel each other out.  
These patterned differences, or systematic errors of judgment, according to  
Willis, are evidence of bias on the part of evaluators and should be  
avoided.  
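Willis' distinction can be illustrated with a toy simulation (our sketch,
with invented numbers, not evidence from the hearing): zero-mean random
errors average out over many evaluations, while a systematic error shifts
every result in the same direction and survives averaging.

    import random

    random.seed(1)
    TRUE_SCORE = 500     # hypothetical true value of one job
    N = 10000            # number of independent evaluations

    # Random errors: zero-mean noise around the true score.
    random_only = [TRUE_SCORE + random.gauss(0, 25) for _ in range(N)]

    # Systematic error: the same noise plus a constant 30-point bias.
    biased = [TRUE_SCORE - 30 + random.gauss(0, 25) for _ in range(N)]

    print(sum(random_only) / N)   # close to 500: random errors cancel out
    print(sum(biased) / N)        # close to 470: the bias does not cancel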
72. Weiner, one of several pay equity experts who testified before  
the Tribunal, referred to the wage discrimination identified in s. 11 of  
the Act as one type of systemic discrimination. She describes the  
unintentional aspect of systemic discrimination in Volume 6, at p. 875, as  
follows:  
Systemic discrimination is unintentional, impersonal, built into  
ongoing systems, often referred to as "neutral" systems, because  
they were never designed to discriminate.  
Also, in Volume 6, at p. 877:  
Systemic discrimination operates in systems. It goes on and on  
and on in the policy books and no one designed them to  
discriminate so it become [sic] much more difficult to identify  
that discrimination.  
73. According to Weiner, this discrimination emanates from the  
practices and processes of an employer relating to compensation rather than  
from individual actions.  
74. Of significance to the interpretation of systemic discrimination  
is the Supreme Court of Canada decision in CN, supra. In that decision,  
the Court upheld an order of a Canadian Human Rights Tribunal which imposed  
upon the Canadian National Railway a special employment program for  
employment equity. In upholding the remedial order, Dickson C.J., as he  
then was, in referring to the proper interpretative attitude toward human  
rights codes and acts said at p. 1134:  
Human rights legislation is intended to give rise, amongst other  
things, to individual rights of vital importance, rights capable  
of enforcement, in the final analysis, in a court of law. I  
recognize that in the construction of such legislation the words  
of the Act must be given their plain meaning, but it is equally  
important that the rights enunciated be given their full  
recognition and effect. We should not search for ways and means  
to minimize those rights and to enfeeble their proper impact.  
75. Dickson, C.J. elaborated on the purpose and objective of human  
rights legislation and on the Court's general attitude towards the  
interpretation of such legislation which is to give an interpretation that  
will advance the legislation's broad purposes. He referred to the Supreme  
Court's decision in Ontario Human Rights Commission v. Simpsons-Sears Ltd.,  
[1985] 2 S.C.R. 536, which recognized that human rights legislation is  
directed not only at intentional discrimination but unintentional  
discrimination as well, and prohibits discrimination in situations of
"adverse effect discrimination".
76. The Supreme Court of Canada in CN, supra, recognized systemic  
discrimination in the context of employment equity as distinct from equal  
pay for work of equal value referred to by Weiner in her discussion  
relating to s. 11 of the Act. The Supreme Court recognized that s. 15(1),
and by extension s. 41(2)(a), of the 1976-77 Canadian Human Rights Act, as
amended in 1985, were designed to resolve the problem of systemic
discrimination. Dickson C.J. described systemic discrimination, at p.
1139, as follows:  
In other words, systemic discrimination in an employment context  
is discrimination that results from the simple operation of  
established procedures of recruitment, hiring and promotion, none  
of which is necessarily designed to promote discrimination. The  
discrimination is then reinforced by the very exclusion of the  
disadvantaged group because the exclusion fosters the belief, both  
within and outside the group, that the exclusion is the result of  
"natural" forces, for example, that women "just can't do the  
job"...To combat systemic discrimination, it is essential to  
create a climate in which both negative practices and negative  
attitudes can be challenged and discouraged. The Tribunal sought  
to accomplish this objective through its "Special Temporary  
Measures" Order.  
77. In his decision, Dickson C.J. emphasized that the Order of the  
Tribunal, under review there, was made to implement an employment equity  
program which was not simply compensatory but also prospective in its  
provisions so as to confer benefits designed to improve "employment  
opportunities for the affected group in the future." Further, Dickson C.J.  
reasoned that such a program was designed to break the continuing cycle of  
systemic discrimination in the employment of women. Dickson C.J. was of
the opinion that the goal of the legislation, specifically with reference
to s. 41(2)(a), was to eliminate the insidious barriers so that future job
applicants, that is to say women, would not face the unfair employment
practices that their forebears had experienced as a group. It was not, on
the other hand, concerned so much with compensating past victims of
discrimination or providing employment opportunities previously denied to
specific individuals.
78. Dickson C.J. found the goal was not to compensate past victims or  
even to provide new opportunities for specific individuals who had been  
unfairly refused jobs or promotions in the past, rather it was an attempt  
to ensure that in the future applicant workers from the affected groups  
would not face the same insidious barriers that blocked their forebears.  
79. In that case, the Chief Justice agreed with McGuigan J., the  
dissenting member of the Federal Court of Appeal who found that s. 41(2)(a)  
of the Act (now s. 53(2)(a)) is designed to enable human rights tribunals  
to prevent future discriminatory employment practices against identifiable  
protected groups. The Chief Justice also reasoned that in an employment
equity program there simply cannot be a radical dissociation of "remedy" and
"prevention". Further, he held "prevention" is a broad term and it is  
often necessary to refer to historical patterns of discrimination in order  
to design appropriate strategies for the future.  
80. We find that s. 11 does not specifically recognize the phenomenon
we have referred to as "systemic discrimination" and is not a well-designed
vehicle for breaking the cycle of discrimination. The comments of Dickson,
C.J. in the CN case, supra, need to be taken in context. In that case, an
Order made by a Tribunal pursuant to s. 41(2)(a), now s. 53(2)(a),
requiring the Canadian National Railway to adopt a special employment
equity program in relation to the affected female group who were seeking
blue collar jobs, was under appeal. It arose from a complaint based on
discriminatory employment practices and was decided in 1986.
81. The description of systemic discrimination by Dickson, C.J. in  
the CN case, supra, is, in our view, the kind of unintentional  
discrimination which s. 11 was designed to eliminate.  
82. According to expert opinion, systemic discrimination has no
single focus or origin; it develops over time. It is an attitudinal
phenomenon which undervalues female work and thus differentiates against an
individual or group based on gender or sex. Research has documented that
the group of people most commonly affected by this type of discrimination
is females, whose wages and salaries, relative to male wages and salaries,
are lower. This kind of discrimination is rooted in attitudes, beliefs and
mind sets about work traditionally performed by males and work
traditionally performed by females.
83. Counsel for the Commission submitted s. 11 is part of a statutory  
regime which prohibits systemic discrimination on the basis of sex and the  
payment of different wages between groups of predominantly male and  
predominantly female employees performing work of equal value. Commission  
Counsel further submitted s. 11 is designed to remedy the historical  
undervaluing of female work and to address gender discrimination in pay.  
Counsel submits that proof of gender discrimination in pay is found if  
there is a wage gap between male- and female-dominated occupational groups  
performing work of equal value. (Volume 218, p. 28424).  
84. It is important at this point to understand the meaning of "wage
gap" within the context of s. 11 of the Act. The Tribunal had the benefit
of expert evidence from Armstrong, with expertise in job evaluation and pay
equity, who described the overall wage gap between prevalent rates of pay
earned by females as compared to males. Armstrong testified that, in order
to comprehend the wage gap, one must understand the underlying factors
which may have contributed to it. She stated there may well be some
legitimate and unchangeable factors responsible to some extent for the
existence of the wage gap.
85. A wage gap is not something clearly delineated. The Tribunal  
recognizes that salary differentials between male and female jobs can be a  
function of job requirements making some jobs intrinsically more valuable  
to the employer than other jobs. Such differentials are in contrast to  
differentials which are based entirely on gender differences and it is the  
latter resulting wage gap which the Tribunal believes s. 11 is intended to  
eliminate.  
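As a purely hypothetical illustration of the kind of wage gap s. 11
targets, consider two occupational groups evaluated as equal in value but
paid differently; all figures below are invented.

    # Two groups with equal job evaluation scores (equal value).
    male_dominated = {"score": 450, "hourly_rate": 21.50}
    female_dominated = {"score": 450, "hourly_rate": 18.25}

    assert male_dominated["score"] == female_dominated["score"]

    # The wage gap s. 11 addresses is the pay difference for equal value.
    gap = male_dominated["hourly_rate"] - female_dominated["hourly_rate"]
    pct = 100 * gap / male_dominated["hourly_rate"]
    print(f"wage gap: ${gap:.2f} per hour ({pct:.1f}% of the male rate)")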
86. Section 11 incorporates the concept of equal pay for work of
equal value in its wording. Weiner testified there are two questions which
arise when one invokes this concept in the context of evaluating jobs
employing the same criteria: firstly, what is meant by "equal value" and,
secondly, what is meant by "equal pay". Weiner equates the concept of
"equal pay for work of equal value" with the concept of "pay equity".
87. The evidence before the Tribunal is that pay equity legislation
addresses a trend that assumes systemic discrimination against female-
dominated jobs. Some provinces have enacted pay equity legislation to
remedy pay discrimination by identifying and redressing the wage gap
through the implementation of pay equity plans. This latter legislation is
"pro-active" because, Weiner says, its motive and intent is to provide a
framework for redressing wage discrimination, rather than laying blame upon
employers or unions for historical wage discrimination. The difference
between pro-active legislation and s. 11 is that s. 11 is "complaint based
legislation", whereby a complainant alleges discrimination against some
identified comparator group. Since s. 11(1) speaks of discrimination
between male- and female-dominated jobs either way, Weiner says that,
presumably, under s. 11 one could have a male job alleging discrimination.
88. While the principle of equal pay for work of equal value
underpins the provisions of s. 11 and is frequently expressed as "pay
equity", there is in current usage of that phrase a pro-active connotation.
There is, in fact, a significant difference between the principle
enshrined in s. 11, which is complaint based, and the pro-active approach
to the problem of wage disparity which the experts in the field today accept
and refer to as "pay equity". The comments of Weiner in her testimony
before the Tribunal are instructive and illustrative of the problem which
is encountered when applying the principles of the Act, and in particular
s. 11, to remedying systemic discrimination in the work force. She stated
in Volume 16, at p. 2124:
I agree with you that the Human Rights Commission law, including  
Section 11, is written with a complaint-based mind set. I think  
that was a mistake, but we didn't know that in 1977 when it was  
written. And really, while that makes a great deal of sense for  
many of the kinds of issues the Human Rights Act has to deal with,  
it doesn't fit as well with the systemic discrimination of  
something as complicated as the wage setting process.  
So I think you are right, there is, to my mind, an anomaly of law  
makers, including a methodology that fit our 1970s thinking of how  
discrimination operated with some forward thinking about another  
problem, but not recognizing that the equal value pay equity  
problem was a systemic problem and didn't fit as well with a  
complaint-based mentality. [emphasis added]  
89. Weiner, who is co-author of Pay Equity: Issues, Options and  
Experiences with Morley Gunderson, summarizes at the end of Chapter 8, at  
pp. 127-28, their conclusion regarding the federal legislation as follows:  
That pay equity is an idea whose time has come is demonstrated by  
the initiation of pay equity in eight Canadian jurisdictions since  
1985. In the previous ten years, only two jurisdictions had  
passed pay equity legislation. Unlike most of the subsequent  
legislation, these two early pieces of legislation, in the federal  
government and in Quebec, were complaint-based. The inability of  
such legislation to address a systemic problem like pay equity is  
evidenced by the employer-initiated enforcement mechanism in most  
of the recent legislation.  
90. Some change has been instituted through the political movement in  
the United States to enact comparable worth plans which, in turn, has  
created a framework within which previously invisible or unacknowledged  
skills associated historically with female and minority work were made  
visible and worthy of compensation. The parallel pay equity movement in  
Canada saw the enactment of provincial legislation designed to redress  
systemic wage discrimination and compensation for work performed by  
employees of female-dominated jobs. Of relevance is the preamble to the  
Pay Equity Act (Ontario), 1987, which states that affirmative action is  
required to redress systemic wage discrimination. However, the legislative
history of s. 11 does not document the same political motivation contained
in that legislation or in the other provincial legislation found in
Manitoba, Ontario, Prince Edward Island, Nova Scotia and New Brunswick.
91. Provincial legislation is aimed at correcting systemic  
discrimination and provides a time frame and a procedure for achieving pay  
equity. The approach in the provincial legislation is future oriented and  
while recognizing past injustices, the remedies are focused on achieving  
equity in employment as well as in pay. On the other hand, s. 11 of the  
Act is complaint based and is silent on the means for achieving equal pay  
for work of equal value. While the Guidelines passed pursuant to the Act
expand on the four essential elements of s. 11(2), i.e., skill, effort,
responsibility and working conditions, and define how value is to be
assessed, who is an employee, what a group is and so on, they do not
establish a programme or describe an appropriate methodology for achieving
the goal of eliminating "systemic discrimination". That phenomenon is not
expressly referred to either in the Act or in the Guidelines.
92. Weiner, commenting again on s. 11, states in Volume 16 at
p. 2125, "the legislation does not recognize that equal pay
for work of equal value was a systemic problem that didn't fit as well with  
a complaint based mentality." Therein lies the difficulty with s. 11 which  
is not entirely compatible with the evolution and application of the  
principles of pay equity (or comparable worth) during the past two decades.  
Nevertheless, it is necessary, in view of the general nature and intent of
the legislation, which is to combat systemic discrimination, to adopt the
reasoning of Chief Justice Dickson, at p. 1139 of CN, supra, where he
states:  
...it is essential to create a climate in which both negative  
practices and negative attitudes can be challenged and  
discouraged.  
93. The wage setting process in the Federal Public Service is a
highly complex one, spanning many decades, each of which has contributed
new trends and developments, most notably the introduction of collective
bargaining in the 1960s. The advent of collective bargaining brought about
contract negotiations which in turn affect the determination of wage rates.  
For the most part, job classification in the Federal Public Service has  
been determined by a job evaluation process; however, no single process has  
ever been stipulated and the result is a classification structure of  
multiple occupational groups with no common job evaluation plan. Rates of  
pay have been arrived at through this process with the aid of labour market  
surveys, largely provided by the Pay Research Bureau until 1992. It is  
apparent the classification system has been undergoing reform since 1990.  
94. Evidence was led that the Government of Canada is committed to  
simplifying the job classification system in the Public Service through an  
initiative entitled PS2000. Part of this Initiative is a commitment to  
compensate employees equitably, in a manner that is free of gender bias and  
maintains equal pay for work of equal value. A new classification system  
is being introduced to meet these commitments.  
95. Documentary evidence reveals a Task Force has been examining and
developing this initiative and has thus far produced a draft pamphlet, in
November, 1992, as a reference guide for public sector employees to prepare
what are referred to as "gender-neutral" work descriptions.
96. The expert evidence reveals that compensation systems that rely  
on market surveys can result in wage disparities for jobs deemed to be of  
equal value. Research has shown that the market reflects an historical  
pattern of lower wages to employees in positions staffed predominantly by  
females. For the most part, market rates are established through the use  
of traditional job evaluation systems which self-perpetuate the problem of  
undervaluation of female work as these traditional job evaluation systems  
were not designed to capture skills associated with female work.  
97. The pay equity experts explain that gender bias is reflected in  
existing compensation systems and pay practices. Historically these  
systems and practices undervalue female work. Since the purpose of s. 11  
is to remove gender discrimination from pay, based on the intrinsic value  
of a job, any job evaluation system used to assess job value must be  
designed to eliminate factors that contribute to gender bias and include  
factors that will capture skills associated with female work which have, in  
the past, been overlooked.  
98. We do find that s. 11 is a remedial section dealing with salary  
inequities which arise between jobs that are deemed by some process of  
evaluation to be of equal value. The salary inequity, or resulting wage  
gap, is the salary differential between rates of pay for male and female  
employees who are performing work of equal value and not the overall wage  
gap referred to by Armstrong. She was referring generally to differences  
in pay between males and females which can result from factors in addition  
to gender inequities. Armstrong's response to the following question in  
Volume 179, at p. 22879, lines 2 - 7, is informative:  
If pay equity were to be achieved in all occupations, in all jobs,  
would the wage gap disappear?  
THE WITNESS. The overall wage gap probably wouldn't disappear  
completely, no. There still might be a difference.  
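The distinction drawn here can be made concrete with a small hypothetical
illustration (a sketch in Python; every job, point score and salary below
is invented for illustration only and forms no part of the evidence). It
shows how an equal-value wage gap of the kind contemplated by s. 11 differs
from the overall wage gap Armstrong describes:
     from statistics import mean

     # (evaluation points, annual salary) -- invented figures only
     male_jobs = {"male job A": (500, 52000), "male job B": (700, 68000)}
     female_jobs = {"female job A": (500, 46000), "female job B": (300, 34000)}

     # The s. 11 wage gap compares pay only where rated value is equal:
     # male job A and female job A are both rated at 500 points.
     s11_gap = male_jobs["male job A"][1] - female_jobs["female job A"][1]
     print("Wage gap at equal value (s. 11):", s11_gap)          # 6000

     # The overall wage gap averages pay regardless of rated value.
     overall_gap = mean(s for _, s in male_jobs.values()) - \
                   mean(s for _, s in female_jobs.values())
     print("Overall wage gap:", overall_gap)                     # 20000

     # Eliminating the 6,000 equal-value gap would narrow, but not erase,
     # the overall gap, since part of it reflects the differing mix of
     # jobs -- which is Armstrong's point above.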
99. We must be assured the complaints seek to redress a wage gap  
based on wage differentials that are gender based and not resulting from  
other factors. It seems apparent that the existence of a wage gap per se  
is not proof of discrimination. To hold otherwise would negate the entire  
evaluation process which has, as its purpose, the comparison of jobs  
according to a plan or system for rating work according to the criteria  
prescribed in s. 11(1) of the Act.  
100. We also find s. 11 is designed to eliminate economic inequality  
created by gender based wage discrimination. The discrimination is  
unintentional as the decision of Dickson C.J. in the CN case, supra, makes  
clear. It is nevertheless a subtle form of discrimination built into  
employment practices as they have existed over the years since females have  
become contributors to the work force. We recognize from the expert  
testimony of Weiner, Armstrong and Willis that systemic discrimination  
operates in systems and becomes incorporated into the wage setting
practices of organizations, and that the classification of jobs may be the
by-product of systemic discrimination. Since systemic discrimination is part
of a system never designed to discriminate, Weiner says that it cannot be  
corrected instantly nor can pay equity be achieved quickly.  
101. As remedial legislation, s. 11 addresses pay disparities in  
employers' compensation practices. Willis testified to the effect that
s. 11 is not true pay equity legislation, but instead concerns itself with
examining pay disparities and is probably a first step in the direction of
true pay equity insofar as it requires that the wages of females be moved
up to the same
level as males. Willis testified in Volume 29, at p. 3760, line 22 to p.  
3761, line 9:  
The concept of pay equity has to do with compensation without  
gender bias; that is, compensation based on the intrinsic value of  
a job rather than the market value of a job.  
I recognize there is a school of thought that says: Don't bother  
with job evaluation, just give us the money.  
But in order to logically arrive at an intrinsic value of a job,  
you exercise a job evaluation plan; that is, job evaluation  
provides a way of taking any job apart and examining the amount of  
skill, effort, responsibility, and conditions of work that are  
required.  
102. If employers use job evaluation systems that are gender biased in  
favour of male work, the result will be seen in differential wages paid to  
male and female jobs that ought to be considered of equal value. Job  
evaluation systems that traditionally favour male jobs do not value the  
skills and job content of jobs that are designated female work.  
Traditional job evaluation is most often designed to value characteristics  
of male work. On the other hand, pay equity job evaluation has as its goal  
the use of systems that remove gender bias in the valuing of work.  
103. At this point, it is useful to recall some of the circumstances  
leading up to the issue before us. Under the JUMI, the parties engaged in
a proactive study with the intention of developing parameters within which
to implement the principle of equal pay for work of equal value as
incorporated in s. 11 of the Act. The parties employed a pay equity
expert, Willis, to assist with this study and used the Willis Job  
Evaluation Plan to assess selected jobs from male- and female-dominated  
occupational groups in the Federal Public Service. The JUMI never  
completed its task. The data generated in that study is now in evidence  
before the Tribunal. It was used by the Commission in its investigation of  
the complaints and is presented as proof of a breach of s. 11 of the Act.  
The Commission and the Alliance have called upon the Tribunal to accept the  
evaluation scores as evidence of the value of work. These results, they  
submit, can be used to establish the equality of work and are proof of a  
wage gap. It is alleged by the Employer that the results are not reliable.  
We are called upon to determine whether these results are reliable.  
104. We have previously referred to the steps involved in the use of  
the job evaluation plan as discussed by L'Heureux-Dubé J. in the Syndicat  
decision, supra. The Tribunal heard lengthy evidence on the Willis Process  
which incorporates the typical steps involved in job evaluation. The  
reliability of the results, which is the issue before us, focuses on one
particularly important step in job evaluation, namely, its application by
the evaluators who had the responsibility of analyzing the job information and
assigning points or scores to each of the job factors in the Willis  
evaluation plan.  
105. Willis and Weiner agree that, for job evaluation to be effective
in eliminating bias, it must be approached in a systematic fashion. At the
same time, one needs to understand that job evaluation is an inherently  
subjective process. The question as to what constitutes "bias" is a  
complex one, and is central to the arguments submitted to us. Counsel for  
the Commission refers to the Equal Wages Guidelines passed pursuant to the  
provisions of the Act and in particular to s. 9(a) which reads:  
9. Where an employer relies on a system in assessing the value  
of work performed by employees employed in the same establishment,  
that system shall be used in the investigation of any complaint  
alleging a difference in wages, if that system  
(a) operates without any sexual bias;  
106. Commission Counsel, supported by the Alliance, posits the  
question of the reliability of the results to be addressed by the Tribunal  
as follows:  
Is there a pattern, a systematic variance of different treatment  
of male and female questionnaires (in the evaluation process)  
that was caused by or is attributed to gender bias or gender  
related bias. [emphasis added]  
107. Respondent Counsel advocates a broad reading of the term "sexual  
bias" as used in s. 9(a) of the Guidelines. Respondent Counsel proposes  
that any bias that is a different treatment of male and female  
questionnaires is a sexual bias and bases this submission on an  
interpretation of Willis' testimony, who described bias in Volume 208, at  
p. 26937, lines 11 - 16, as follows:  
A. "Bias" simply means that if there is a pattern of different  
treatment for male-dominated jobs versus female-dominated jobs,  
whether it's conscious or unconscious, that difference in  
treatment would represent an amount of bias.  
108. According to Willis, it is possible to have a bias that is  
related to gender but is not a direct gender bias, and he refers to this  
bias as a gender preference. His example of a gender preference would be  
where an evaluator may have a preference for individuals who wear blue  
shirts or have blue collars. Willis testified, for example, that trades
jobs are known as "blue collar" jobs. If a preference for blue collars
causes an evaluator to evaluate trades jobs more favourably, Willis says
that this may not be a gender bias but may instead be a blue collar bias.
According to Willis, this would bring about the same result as a direct
gender bias.
109. The Employer submits that the meaning of gender bias can include  
attitudes toward one sex or the other that are conscious or unconscious.  
In the Employer's view, a bias can also relate to some characteristic which  
is not gender per se, but is itself related to gender, which they describe  
as a "gender-related bias". Respondent Counsel submits that s. 11 is  
designed to redress both kinds of biases. Respondent Counsel also submits  
that if a "blue collar" preference results in a different treatment of male  
and female questionnaires this is a "sexual bias" as contemplated by s.  
9(a) of the Guidelines. Respondent Counsel urges that the question to be  
addressed by the Tribunal is not the one posed by the Commission but rather  
is as follows:  
Is there a pattern of different treatment of male and female  
questionnaires?  
110. It is to be noted that the Commission and the Alliance do not  
express any difficulty in assigning a wide meaning to the term "sexual  
bias" and we refer to the remarks of Commission Counsel in Volume 230, at  
p. 30583, lines 14 - 25:  
These are very different things. All those other things such as -  
- I mean, they referred to something called "dirty work". I
don't know whether you would have a preference for people who do  
hard work outdoors. If that were gender related and had an effect  
on the way people perceive the work and rated the jobs, and in the  
end had a consequential gender effect on the jobs, then no one can  
dispute that that would be a gender bias contrary to section 9 of  
the Equal Wage Guidelines and therefore contrary to section 11 of  
the statute.  
111. In formulating the question for the Tribunal to address,  
Respondent Counsel argues that their formulation does not require a  
causative factor for the different treatment of male and female  
questionnaires. The disagreement between the parties lies not in assigning  
a broad meaning to the words "sexual bias" but instead arises as to whether  
s. 11 requires the existence of a cause when different treatment of male  
and female questionnaires is found or whether, on the other hand, it is  
simply a matter of differential treatment of male and female jobs without  
the necessity of assigning cause. In support of the Employer's submission,  
they rely on a meaning of bias which in their view does not require a  
causal link or relationship under s. 11 of the Act.  
112. There is a disagreement between the parties about the analysis  
and investigative findings of the Commission on the job evaluation process  
and the statistical evidence. The dispute centres on the submissions of  
the Commission and the Alliance that some differences in treatment of male  
and female questionnaires between committees and consultants are not based  
on gender or gender-related bias but are due to a "value bias". The  
Commission and the Alliance rely on the statistical expert, Sunter, whose
analyses, they submit, demonstrate the effect of a value bias which
accounts for some, if not all, of the differences in treatment between the
committee and the consultant. Sunter testified that the effect of the  
value bias has an appearance of gender bias and the difference in treatment  
between the committees and the consultants is as likely to be a consequence  
of value bias as it is gender bias.  
113. In dismissing the need to know the cause of differential  
treatment of male and female questionnaires, Respondent Counsel relies on  
the testimony of Willis. Willis repeatedly stated during this hearing that  
after the evaluation process is finished there is no need to explore the  
reasons for the differences between the evaluations of the committees and  
the consultants. That testimony is summarized in Willis' letter to
Respondent Counsel dated May 19, 1994, which expands on job evaluation
disparities and reads as follows:
Evaluation disparities represent a lack of consistency in the  
application of the evaluation system. Therefore, disparities are  
a cause for concern, and require attention to determine if they  
result in a pattern of different treatment for different kinds of  
jobs.  
The question as to why disparities have occurred is important  
during the course of the committees' work. An understanding of  
the reasons can be helpful in the continued training of the  
members. However, after the evaluation phase of the study has  
been completed, the reasons for any disparities are no longer of  
any real importance. What is important is the existence of any  
pattern of bias that is developed among the evaluations.  
[emphasis added]  
(Exhibit R-164)  
114. In the course of our hearing, in addition to his definition of  
bias as a different treatment between male and female jobs, Willis offered  
an opinion on the meaning of gender bias in a pay equity study in Volume  
80. He states in Volume 80, at p. 9737, lines 13 - 18:  
In the context of the pay equity study, gender bias has to do with  
the extent to which jobs that are traditionally held by one sex or  
the other are paid more favourably than jobs that are  
traditionally held by the opposite sex.  
115. Willis refers to gender bias as both different treatment and  
different pay. To better understand Willis' definitions of bias, it is  
helpful to refer to the theory of disparate treatment considered in the  
decision American Federation of State, County and Municipal Employees, AFL-
CIO et al. v. State of Washington et al., Nos. 84-3569, 84-3590, 770 F.2d
1401 (1985), United States Court of Appeals, 9th Circuit.
116. The plaintiffs in the American case alleged sex discrimination in  
compensation against the State of Washington pursuant to s. 703(a) of  
Title VII of the Civil Rights Act of 1964, 42 U.S.C. The United States  
District Court for the Western District of Washington had found in favour  
of the class of state employees of which at least 70 per cent were female,  
and the state had appealed to the Court of Appeal, 9th Circuit. A relevant  
fact in the District Court decision, was that Willis had conducted a study  
in 1974 to examine and identify salary differences pertaining to job  
classes predominantly filled by males compared to job classes predominantly  
filled by females, based on job worth. The 1974 Willis Report submitted
into evidence concluded, based on the job content of the 121 classifications
evaluated, that the tendency was for female classes to be paid less than
male classes for comparable job worth, and that overall the disparity was
approximately 20 per cent. Willis' study had deemed the male and female
positions to be of comparable worth. Comparable worth as defined by the  
State, for the District Court, means the provision of similar salaries for  
positions that require or impose similar responsibilities, judgments, and  
knowledge.  
117. In the first instance, the District Court had found a violation
of Title VII premised upon the American disparate impact and the disparate
treatment theories of discrimination. As explained in the District Court's  
judgment, Title VII prohibits two types of employment discrimination: (i)  
intentional unfavourable treatment of employees based upon impermissible  
criteria; and (ii) practices with a discriminatory impact: facially neutral  
practices that have a discriminatory impact and are not justified by  
business necessity.  
118. The District Court decision was appealed to Kennedy, Circuit  
Judge for the United States Court of Appeals, who considered the  
allegations of disparate treatment, and held that the unions had failed to  
prove a prima facie case of sex discrimination by the preponderance of the  
evidence. In citing reasons, Kennedy J. offers the following with regard  
to the Willis study at p. 1408:  
We also reject AFSCME's contention that, having commissioned the
Willis study, the State of Washington was committed to implement a  
new system of compensation based on comparable worth as defined by  
the study. Whether comparable worth is a feasible approach to  
employee compensation is a matter of debate...Assuming, however,  
that like other job evaluation studies it may be useful as a  
diagnostic tool, we reject a rule that would penalize rather than  
commend employers for their effort and innovation in undertaking  
such a study.  
119. As noted in the decision of Kennedy J., under the disparate  
treatment theory, an employer's intent or motive in adopting a challenged  
policy is an essential element of liability for violation of Title VII. To
establish liability, a plaintiff must show the employer chose a particular
policy because of its effect on members of a protected class, and it is
insufficient for a plaintiff to allege under this theory that the employer  
was merely aware of the adverse consequences the policy would have on a  
protected class.  
120. In a disparate treatment case, the United States Court of Appeals
required proof of intent, unlike s. 11, which addresses systemic
discrimination, a form of unintentional discrimination. Willis'
definitions of bias should be viewed within the context of that American
jurisprudence, which we note deals with a different statute than the Act
and a different requirement of intention.
121. Referring to s. 9(a) of the Guidelines, supra, we note that it  
provides, inter alia, that an employer may use a system for assessing work  
if that system operates without any sexual bias. By way of contrast, and
focusing on the issue which we must resolve, it is with the application of
the system that we are concerned. In this regard, it is helpful to refer
to the comments of Weiner, and to the following statement she made:  
Even though I mentioned gender bias in job evaluation and gender  
bias in the evaluation systems, I think gender bias in the  
application of the system is key. If you give people bias free  
job information and a bias free evaluation system, people can  
still introduce gender bias when they apply it.  
122. There is agreement between the parties that the evaluation  
system, i.e., the Willis Job Evaluation Plan, is bias free by any  
reasonable standard. According to Willis, if there is a pattern of  
differential treatment between male and female questionnaires, this is  
evidence of systematic biases occurring in the application of job  
evaluation. For purposes of determining whether bias is present in the  
results, Willis was not prepared to give an opinion based solely on his  
observations of the committee process, but instead said he would rely on  
statistical analysis of the data. According to Willis, there are two ways  
to determine if bias is present in the application of the plan: (i)  
observation by the consultants who participated in the process; and (ii)  
statistical analysis.  
123. It was Willis' opinion that, as a factual matter, he and his
consultants, who were present during the job evaluations, were able to
ferret out and identify direct gender bias. They observed how evaluators
responded to questions as to why they evaluated jobs a particular way. The  
consultants would not permit a rater to defend an evaluation based on  
opinions or conclusions.  
124. Willis said that to the extent that indirect biases occur, they  
are more difficult to detect. Usually the only way to detect whether an
indirect bias is operating is to do a statistical analysis of the results
to determine if a pattern in the ratings exists. Since the evaluators are
usually unconscious of these biases, they are not aware of making gender  
based judgments. Evaluators will apply points unevenly across male and  
female jobs, and male and female jobs will consistently receive low or high  
points. Generally speaking, a statistical analysis will reveal this type  
of pattern if indirect biases have entered the process.  
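The kind of statistical pattern check Willis describes can be sketched
briefly. The following is a minimal, hypothetical illustration (Python),
assuming committee scores are compared with consultant re-evaluation scores
for each questionnaire; all numbers are invented, and a real analysis, such
as those later performed by Wisner or Sunter, would also test any such
pattern for statistical significance on the full data:
     from statistics import mean

     # Committee score minus consultant re-evaluation score for each
     # re-evaluated questionnaire -- invented figures for illustration.
     male_diffs = [4, -2, 7, 3, 5, -1, 6]     # male-dominated jobs
     female_diffs = [-5, -3, -8, -2, -6, -4]  # female-dominated jobs

     print("Male jobs, mean committee-consultant difference:  ",
           round(mean(male_diffs), 2))
     print("Female jobs, mean committee-consultant difference:",
           round(mean(female_diffs), 2))

     # A consistent signed separation -- male jobs rated above the
     # consultant, female jobs below -- is the sort of pattern that, if it
     # held on the real data, would suggest an indirect bias in the
     # application of the plan, even though no single evaluator was
     # conscious of making a gender-based judgment.
     print("Separation between groups:",
           round(mean(male_diffs) - mean(female_diffs), 2))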
125. Both the question posed by the Employer and the question of the  
Commission, in our view, restrict the Tribunal from fully assessing the  
question of reliability. The issue of reliability is not purely  
statistical and the questions as suggested restrict our assessment of the  
evidence to statistical measures.  
126. It is important to bear in mind that the results were generated  
through a process of job evaluation overseen by a JUMI Committee with the  
advice and consultation from a pay equity expert. Willis testified that he  
recommended to the JUMI Committee certain safeguards in the process to  
ensure consistent and reliable results. These safeguards included reliable  
job information, balanced evaluation committees, selection and rating of  
benchmarks, sore-thumbing exercises, training of participants, quality  
checks in the form of testing of evaluators and committees and consultant  
participation.  
127. Willis took a consistent position throughout the hearing that in  
order to analyze the results, he required a statistician. He viewed the
role of the statistician as being to examine the data and to ascertain the
extent of the problem if one was found.
128. Willis said he would not support the reliability of the results  
based on the process alone. When asked to consider the process without the  
results, Willis said in Volume 78, at p. 9570, lines 12 -22:  
A. ...if all of my recommendations had been taken, if I had  
felt that the processes that were followed were all sound, then it  
is quite likely that I would have been able to support the results  
of the study without doing any testing.  
This didn't happen. I have not yet supported the results of the  
study. But in the final analysis that testing is going to tell  
somebody, me or somebody else, whether or not the study was sound.  
129. Expert Weiner has stated that the idea behind a job evaluation  
process is to be systematic, that the process should involve a series of  
steps. The goal in pay equity job evaluation should be, in her opinion, to  
apply the process fairly to all the jobs. Notwithstanding Willis' opinion,
which focuses primarily on results rather than process, the Tribunal must
be able to assess the checks and balances in the process, and must be able
to do this not only from a statistical perspective but also by analyzing
reliability through an assessment of how the Willis job evaluation plan was
applied by the job evaluation committees.
130. We are entitled to look at the Act as a whole, including the  
regulations and Guidelines passed pursuant thereto, in order to assist us  
in interpreting the meaning of s. 11(1), see Driedger, The Construction of  
Statutes, Chapter 11, 3rd Edition by Ruth Sullivan. It is our opinion
that, given the state of development or evolution of the concept of equal
pay for work of equal value at the time, the legislation's complaint based
orientation and the gender driven language of the relevant sections,
causation is implicit in its provisions.
131. The wage gap to be redressed by s. 11 must be caused by gender  
based discrimination. Section 9(a) of the Guidelines is subordinate to the  
enabling legislation, the Act, and is authorized by s. 27(2) of that Act.  
There is a presumption in favour of the validity of regulations in the  
light of their enabling statute. In The Interpretation of Legislation in
Canada, 2nd Edition, Pierre-André Côté, at p. 310, the learned author
comments as follows:
Finally it must be pointed out that the regulations are not only  
deemed to remain intra vires, but also to be formally coherent  
with the enabling statute.  
132. Moreover, s. 16 of the Interpretation Act (Canada) provides:  
Where an enactment confers powers to make regulations, expressions  
used in the regulations have the same respective meanings as in  
the enactment conferring the power.  
133. For purposes of s. 11 of the Act we do not find it necessary to  
make a distinction between gender bias or gender preference. We are in  
agreement with the parties that the phrase "sexual bias", as contained in
s. 9(a) of the Guidelines, should be read to include any bias in the context
of job evaluation which has the end result of favouritism toward one gender.
Moreover, we agree with Willis and the Employer that it is not necessary to  
determine why a particular evaluator is motivated to exhibit bias.  
However, we find it necessary to examine the differences between committees  
and consultants from both a statistical perspective and a process  
perspective to determine if a bias exists.  
134. In our opinion, causation is implicit in the provision of the  
legislation and the Guidelines. Different treatment of male and female  
jobs must be proven to be gender-based. This is consistent with the  
opinions expressed by Willis as he does not merely talk about a different  
treatment but a different treatment that is "an influence towards one  
gender or another" (Volume 38, p. 4794) and a bias "favouring" a gender  
(Volume 38, p. 4792). It is the gender aspect of the treatment that  
concerns Willis and which concerns the Tribunal.  
135. Accordingly, the Tribunal is interested in the gender aspect and  
based on our interpretation of the Act, the question to be addressed is:  
Is there a different treatment of male and female questionnaires  
in the evaluation process that was caused by or attributed to  
gender bias or gender-related bias?  
136. We will now address the question of whether there is gender-based  
bias present in the treatment of male and female questionnaires. Our  
enquiry will encompass the evidence of the process that generated these  
results and the statistical evidence presented at this hearing.  
IV. BURDEN OF PROOF  
137. In the case before the Tribunal the issues, because of the length  
and complexity of the evidence, have been argued and addressed by the  
parties in stages. The first stage relates to the reliability of the  
results generated by the JUMI study. The affirmative alleges that the  
results are reliable and free of gender bias by any reasonable standard.  
The negative alleges that the results are unreliable and coloured by gender  
bias to such a degree that they do not allow for an adjudicated resolution  
by the Tribunal.  
138. The phrase "burden of proof" describes the duty which lies on one  
or the other of the parties either to establish a case or to establish the  
facts on a particular issue. See M.N. Howard, ed. Phipson on Evidence,  
14th ed., (London: Sweet & Maxwell, 1990) para. 4-01.  
139. In Miller v. Minister of Pensions, [1947] 2 All E.R. 372, (K.B.),  
Lord Denning, at p. 374, defines the degree of probability required to  
discharge the burden of proof in a civil case in these terms:  
That degree is well settled. It must carry a reasonable degree of  
probability but not so high as is required in a criminal case. If  
the evidence is such that the tribunal can say: "We think it more  
probable than not" the burden is discharged, but if the  
probabilities are equal it is not.  
140. Counsel for the Commission in her opening remarks conceded that  
the burden of establishing a prima facie case rests with the Alliance and  
the Commission. (Volume 218, p. 28337).  
141. This concession by Counsel for the Commission simply recognizes  
the evidentiary rule frequently enunciated by the Courts and contained in  
the text books on this subject.  
142. In the view of Sopinka J. et al., The Law of Evidence in Canada,
(Toronto: Butterworths, 1992), a prima facie case does not compel a  
specific determination unless there is a specific rule of law which demands  
such a conclusion. After examining and analyzing several decisions of the  
Supreme Court of Canada in which the Justices differ, the authors state  
that a prima facie case simply permits an adverse finding against the  
Employer in the absence of evidence to the contrary. The authors quote  
with approval a passage found in R. v. Girvin (1911), 45 S.C.R., 167,  
(S.C.C.) at p. 169 as follows:  
I have always understood the rule to be that the Crown, in a  
criminal case, is not required to do more than produce evidence  
which, if unanswered, and believed, is sufficient to raise a prima
facie case upon which a jury might be justified in finding a
verdict. [emphasis added]
143. This passage was recently adopted in R. v. Mezzo, [1986] 1 S.C.R.  
802 (S.C.C.) and the learned authors conclude at p. 73:  
The terms "prima facie evidence", "prima facie proof", and "prima  
facie case" are meaningless unless the writer explains the sense  
in which the terms are used. For clarity and conciseness it is  
preferable...to explain the evidentiary effect consequent upon the  
proof of certain facts rather than to indiscriminately use these  
mixed Latin English idioms.  
144. Because there appears to be some question as to the meaning of
the phrase "burden of proof" as it applies in these circumstances, we refer
again to Phipson on Evidence, supra. According to the learned author, it  
has three meanings as follows:  
(i) The persuasive burden, the burden of proof as a matter of  
law, i.e., the burden of establishing a case by a  
preponderance of evidence;  
(ii) The evidential burden, the burden of adducing evidence; and  
(iii) The burden of establishing the admissibility of
evidence.
145. The persuasive burden, sometimes referred to as the "legal
burden", in a civil case rests on the party who substantially asserts the
affirmative of the issue. It is fixed at the beginning of the trial or
hearing by the state of the pleadings, i.e., the complaints made pursuant
to the legislation, and it is settled as a question of law that the burden
remains throughout the hearing exactly where the complaints place it,
shifting only in special circumstances.
146. The legal burden of proof normally arises after the evidence has  
been completed and the question is whether the trier of fact has been  
persuaded with respect to the issue or case to the civil or criminal  
standard of proof. The legal burden, however, ordinarily arises after a  
party has first satisfied an evidential burden in relation to that fact or  
issue. See The Law of Evidence in Canada, supra, at p. 58.  
147. Stated another way, the legal burden does not play a part in the  
decision making process if the trier can come to a determinate conclusion  
on the evidence. If, however, the evidence leaves the trier in a state of  
uncertainty, the legal burden is applied to determine the outcome on a  
balance of probabilities. See also The Law of Evidence in Canada, supra,  
at p. 60 quoting a passage in a decision of the Privy Council in Robins v.  
National Trust Company, [1927] 2 D.L.R. 97, which reads in part as follows:  
But onus as a determining factor of the whole case can only arise  
if the tribunal finds the evidence pro and con so evenly balanced  
that it can come to no sure conclusion. Then the onus will  
determine the matter.  
148. This passage can be compared to the comments of McIntyre J. in  
Ontario Human Rights Commission v. Simpsons-Sears [1985] 2 S.C.R. 536 at  
558:  
But as a practical expedient it has been found necessary, in order  
to insure a clear result in any judicial proceeding, to have  
available as a "tie breaker" the concept of the onus of proof.
149. The evidential burden, on the other hand, may shift constantly  
throughout the hearing, accordingly as one scale of evidence or other  
preponderates. The burden of proof in this sense rests upon the party who  
would fail if no evidence were produced at all, or no more evidence, as the  
case may be, were given on either side. In civil cases the evidential  
burden may be satisfied by any species of evidence sufficient to raise a  
prima facie case. It is for the Tribunal to decide as a matter of law  
whether there is sufficient evidence to satisfy the evidential burden, that  
is to say, to establish a prima facie case. See Phipson on Evidence,  
supra, at para. 4-10(b).  
150. The burden of proof in any particular case depends on the  
circumstances in which the claim arises. In general, according to Phipson,  
the rule which applies is "he who invokes the aid of the law should be the  
first to prove his case." This rule is founded on considerations of good
sense; as well, in the nature of things, a negative is more difficult to
establish than an affirmative. See Robins v. National Trust Co., supra;
Constantine Line v. Imperial Smelting Corp., [1942] A.C. 154, 174 per Lord  
Maugham.  
151. Commission Counsel, in her oral presentation, in Volume 218, at  
p. 28349, line 25 to p. 28350, line 11, asserts as follows:  
If a process is created which is considered by the experts to be  
the best process for identifying gender bias...then there is no  
reason to look further beyond that process. If that's the case,  
then there is prima facie evidence of a reliable process, which is  
[sic] the absence of evidence to the contrary would permit a  
finding of reliability. [emphasis added]  
152. This rather broad statement of Counsel is supported by reference  
to Farnquist v. Blackett-Galway Insurance Ltd. (1969), 72 W.W.R. 161 (Alta.  
C.A.) (Allen J.A.) at pp. 172-73 and by OPSEU v. Ontario (Ministry of  
Community and Social Services) (1986), 15 O.A.C. 78 (Div. Ct.) at p. 79  
which deal with proof on a balance of probabilities.  
153. It is not clear what Counsel intends by use of the phrase "If a  
process is created". For purposes of clarification, if Counsel means the  
procedures and structures put in place by Willis and the JUMI Committee for  
the evaluation of jobs, the evidence establishes that in general the  
structures are compatible with the requirements of the Act, the Guidelines  
and with the principles of pay equity as understood by the experts.  
154. Assuming the process encompasses not only the procedures and
structures, such as the evaluation plan, the training of the evaluators,
the questionnaires and the collection of information, but also, and more
importantly according to Weiner, the application of the evaluation system
in a gender free manner, then one might accept Commission Counsel's
statement as correct. But Counsel follows her statement with this comment
in Volume 218, at p. 28357, line 22 to p. 28358, line 7:
The purpose of all this is to relate to the shifting burden that  
rests with the Defendant, once the Complainants have demonstrated  
that there is a prima facie case and the burden has shifted to the  
Defendant, it is incumbent upon them to prove on the balance of  
probabilities...that gender bias or whatever their allegation is  
going to be is in fact the cause of the event and, therefore, is  
in fact the cause of the unreliability of the results. [emphasis  
added]  
155. The shifting burden of proof referred to by Counsel in the  
passage quoted above does not relieve the party who asserts the  
affirmative, in this case the Commission and the Alliance, from satisfying  
the evidential burden that the results of the study can and ought to be  
relied upon for purposes of adjudication. If, on the whole of the  
evidence, including both anecdotal and statistical testimony, the Tribunal  
can come to a determinate conclusion it will not be necessary, in our  
opinion, to invoke the legal burden, in order to reach a decision.  
156. Counsel for the Commission in his opening remarks admitted that  
"the process" in the JUMI exercise was flawed but not so flawed as to  
vitiate the results. The anecdotal testimony of the participants in the
study, the Willis consultants and some of the evaluators raises questions
about the impartiality of some evaluators as well as the functioning of
certain committees. Incidents occurred which were disturbing and caused
the consultants to experience discomfort about the process and its  
application. Additionally, there were differences between the committees  
and the consultants in re-evaluation exercises conducted by the consultants  
at different stages of the study. Analyses of the results, in turn, led to  
a critique of the data by qualified statistical experts. This short  
description of some of the problems which arose during the course of the  
study and which might have had some effect on the scores of the evaluators  
is not exhaustive.  
157. The problems relating to the reliability of the results, whether
arising from the evaluation process or from the statistical analyses on re-
evaluations, will be addressed in a subsequent portion of this decision.
158. The Employer submits if the Willis Process worked well then the  
Complainant and the Commission have made out a prima facie case on  
reliability and there is no need therefore to look at the results for  
further evidence of reliability. If on the other hand, the process did not  
work well, then the onus, according to the Employer, remains with the  
Alliance and the Commission to demonstrate through "other evidence" that  
the results are reliable and sound.  
159. What is meant by "other evidence" is described by Respondent  
Counsel as consisting of statistical analyses performed by the statistical  
experts to demonstrate there is a systematic pattern in the disparities.  
But Counsel argues that by "attacking the credibility and the usefulness of  
those disparities" the Commission and more particularly the Alliance are  
left with no basis for comparison between the committee evaluations and the  
re-evaluations performed by the consultants, whose credibility and  
impartiality were under attack by the Alliance.  
160. According to Respondent Counsel, in the eventuality that the  
process did not work well and if the Commission and the Alliance are to be  
precluded from relying on the statistical analyses, they are left with  
nothing and have therefore failed to establish a prima facie case.  
161. With respect, the Tribunal is unable to accept the proposition
advanced by Respondent Counsel. In our view, it is not simply a matter of  
choosing to accept or reject one or both of the alternatives presented to  
us.  
162. Within the JUMI Study itself and under the direction of Willis,  
the approach adopted by the JUMI Committee was to use statistical tests as  
a means of validating the process. The whole Willis Process is a complex  
scheme that includes not only an exercise of job evaluation but many steps
and stages, one of which is the validation of the results by testing for
inter-rater and inter-committee reliability. Some of the testing uses
statistical analysis. These tests are integral to the process Willis
utilizes in large pay equity studies, and are most evident in the JUMI
Study. Another significant test undertaken was a re-evaluation of 222
positions by the Willis consultant, Jay Wisner, who performed
testing which did not occur during the process itself consisted of an  
additional re-evaluation of 300 positions conducted by the Commission in  
its investigation after the completion of the study.  
163. The statistical analysis by the Commission combined the re-  
evaluations which occurred during the JUMI Study with the re-evaluations  
that were done subsequently. The act of combining these re-evaluations  
does not, in our view, create any artificial framework in respect of the  
evidence as it relates to the process or the evidence as it relates to  
statistical analyses. Neither process nor statistical measures operated in
complete isolation from each other, but were interlocked in the sense that
an understanding of one required an understanding of the other.
164. Accordingly, we are entitled to look at the whole of the evidence  
and to weigh it in the light of all the circumstances. We will be  
examining in great detail the testimony of the participants in the study,  
the expert evidence of the consultants, the expert testimony of the  
statisticians and others who had some involvement in the study. Our
decision will therefore encompass all the evidence presented to us during
the course of this hearing.
165. Those elements required to satisfy the evidential burden in the
present proceedings consist, in our opinion, of the following, which are
based on the provisions of s. 11 of the Act, the companion Guidelines and
the state of the pleadings:
(i) The complainant groups are female-dominated within the  
meaning of the Equal Wages Guidelines;  
(ii) The comparator groups are male-dominated within the meaning  
of the Equal Wages Guidelines;  
(iii) The value of work assessed is reliable; and
(iv) A comparison of the wages paid for work of equal value  
produces a wage gap.  
166. As mentioned previously, at this stage the Tribunal will address
the third element, namely, whether the sampled positions in the JUMI Study
have been properly evaluated so as to produce reliable results. It should be
noted moreover, the parties, including the Employer, have agreed the Willis  
Plan is, in fact, an appropriate gender free evaluation plan for the JUMI  
Study which captures the criteria required to be measured by s. 11(2) of  
the Act.  
167. In addressing the third element relating to the reliability of  
the evaluations, Counsel for the Commission enumerated several  
considerations which needed to be taken into account, namely: the plan  
allows for comparison between occupations; the process was designed to  
obtain reasonably reliable job information; there were additional  
procedures in place so as to ensure comprehensive job information; the Plan  
was, in fact, applied with reasonable consistency by the multiple  
committees; there was consistency in the job information; there was  
consistency in the results; and the salary data was reasonably reliable.  
168. These considerations are, it seems to us, appropriate and helpful
in evaluating the evidence and will be applied when the Tribunal assesses
the evidence, both anecdotal and statistical, in the following sections of
this decision.
V. STANDARD OF PROOF  
169. The "standard of proof" determines the degree of probability that  
must be established by the evidence to entitle the party having the burden  
of proof to succeed in proving either his/her case or an issue in the case.  
170. There are two levels of probability depending on whether the  
matter to be tried is of a criminal nature, in which case, proof beyond a  
reasonable doubt is required, or is a civil matter in which case the  
claimant is required to establish his/her case, or an issue therein, on a  
balance of probabilities, which is to say a greater likelihood that the  
conclusion advanced by the claimant "is substantially the most probable of  
the possible views of the facts." See Duff J. in Clark v. The King, [1921]
61 Can. S.C.R. 608, at p. 616.
171. The standard applied in Haldimand-Norfolk (29 May 1991), 0001-8  
P.E.H.T. by the Tribunal when interpreting Section 5(1) of the Pay Equity  
Act (Ontario) is contained in paragraph 24 of that decision which reads as  
follows:  
24. Having carefully considered the evidence and submissions in  
this case, we find that the parties have an obligation to ensure  
the collection of job content information meets the requirements  
of the Act to accurately identify skill, effort, responsibility  
and working conditions normally required in the work of both the  
female job classes in the establishment and the male job classes  
to be compared. Not only is this a necessary condition of a  
gender neutral comparison system but we also find that section 5  
of the Act requires a standard of correctness, that is, the  
skills, effort, responsibility and working conditions must be  
accurately and completely recorded and valued. [emphasis added]  
172. Section 5(1) of the Ontario Act reads as follows:  
5(1). For the purposes of this Act the criterion to be applied in  
determining the value of work shall be a composite of the skill,  
effort and responsibility normally required in the performance of  
the work and the conditions under which it is normally performed.  
173. Section 5(1) itself does not impose any particular standard which  
must be met by the parties in order to fulfil the criteria.  
174. Accordingly, the decision of the Tribunal in the Haldimand-
Norfolk case, insofar as it deals with the standard to be met in the
collection of job information, is that Tribunal's interpretation of Section
5(1) of the Pay Equity Act (Ontario). It should be pointed out that the
issues in that case were whether the employer had adopted a gender biased  
comparison system and whether it had failed to negotiate in good faith with  
its employees. The question of the reliability of the results which  
concerns us here was not directly addressed. That issue relates to the  
process. The process requires a standard by which to assess the collection  
of job information and a standard by which to assess the procedures for  
evaluating that information.  
175. The issue before us relates to such matters as the format of the  
questionnaire, the procedures for gathering information about jobs, the  
follow-up procedures and safeguards, the composition and functioning of the  
committees, the application of the job evaluation plan and the vetting of  
committee results by statistical analysis.  
176. The Commission and the Alliance have advocated a standard of  
reasonableness to be applied in assessing job information and job  
evaluation. Also, in assessing "damages", by which it is assumed is meant
the measure of consequential relief afforded to the complainants by the
provisions of the Canadian Human Rights Act, a standard of reasonableness
is to be applied.
177. Respondent Counsel in his oral submissions when dealing with the  
onus of proof makes the following statement in Volume 226, at p. 29761,  
lines 16 - 24:  
Another point on onus of proof is this. The employer's position  
is that the standard for assessing reliability, the standard for  
assessing the process to decide whether we have reliability, is  
one of reasonableness. Did the process work well? It is not a  
question of whether the process worked perfectly or whether the  
job information was perfect. The employer has never contended for  
perfection.  
178. In commenting on the Haldimand-Norfolk case, Counsel also makes  
the following observations in Volume 226, at p. 29761, line 25 to p. 29762,  
line 13:  
I might just contrast that with the Haldimand-Norfolk case, in  
which...the Tribunal said, "And we want a standard of  
correctness." Correctness sounds very like that they were looking  
for perfection. I don't know...In any event, the employer's  
position on this is that we are looking for did the process work  
well. That doesn't mean perfectly.  
179. So in the result the parties have themselves advocated a standard  
of reasonableness. Respondent Counsel's position is that when applying a  
standard of reasonableness to the results in this study the Tribunal must  
find that the results fall short of providing a reliable basis on which to  
render a favourable decision.  
180. What standard ought the Tribunal to follow in assessing the  
reliability of the results? The concept of reasonableness should be viewed  
in the context of what pay equity or comparable worth hopes to achieve and  
how it expects to achieve its goal. There are, as well, practical  
considerations as to its effects in the work place on the parties involved.  
181. Throughout his testimony, Willis, an acknowledged expert in his  
field, stressed that achievement of pay equity or equal pay for work of  
equal value as between male dominated jobs and female dominated jobs is not  
a scientific, mathematical or statistical endeavour. Rather it is an "art"  
based on a combination of analytical skills, comprehension, intuition and  
ultimately a subjective evaluation of the job within the framework of the  
plan while at the same time adhering to the discipline which the plan  
imposes.  
182. In an article by Judy Fudge of Osgoode Hall Law School entitled
"Legal Standard for Gender Neutrality under the Pay Equity Act (Ontario):
Achieving the Impossible?", the learned author, in referring to a legal
standard against which to judge the gender-neutrality of job comparison  
systems states:  
...to date there does not exist a conclusive method to demonstrate  
either gender bias or gender-neutrality in any particular job  
comparison system. For this reason, the Pay Equity Hearings  
Tribunal should adopt a reasonableness standard with respect to  
the issue of gender-neutrality.  
183. She then outlines minimum criteria for developing a gender-
neutral job evaluation system. It is not necessary to examine those
criteria for our purposes since all of the parties to this enquiry have
agreed that the Willis Plan initially adopted at the outset of the study
satisfies the minimum criteria and is therefore gender-neutral. What is
useful for our purposes is the author's acknowledgment (at p. 20), quoted
above, that there does not exist a conclusive method to demonstrate either
gender bias or gender-neutrality in any particular job comparison system
and that a reasonableness standard should therefore be adopted.
184. In commenting on the actual job evaluation process, the author  
states:  
No matter how scrupulous the design of the job comparison system  
in avoiding gender bias, bias can creep into the actual process of  
assigning job value points to jobs. In other words, the job  
evaluation system may be fair, but the application can be biased.  
185. Fudge then goes on to describe the use of job evaluation
committees which, if properly constituted and following clearly defined
procedures, would minimize the possibility of bias.
186. We also refer to the earlier comments of Armstrong that the
overall wage gap probably wouldn't disappear completely if pay equity were
achieved in all jobs.
187. What is apparent from these comments and from the nature of the
subject is that equal pay for work of equal value is a goal to be striven
for, one which cannot be measured precisely and which ought not to be
subjected to any absolute standard of correctness. Moreover, gender-
neutrality in an absolute sense is probably unattainable in an imperfect
world, and one should therefore be satisfied with reasonably accurate
results based on what is, according to one's good sense, a fair and
equitable resolution of any discriminatory differentiation between wages
paid to males and wages paid to females for doing work of equal value.
VI. FACTS  
A. THE WILLIS PLAN  
188. The framework within which work was evaluated during the JUMI
Study was a job evaluation plan.
189. One of the early tasks of the JUMI Committee was to select a job  
evaluation plan. The JUMI Committee created a sub-committee to examine  
various job evaluation plans and make recommendations to the JUMI Committee  
at large. Several plans were examined by the Sub-Committee on a Common  
Evaluation Plan. In the end, this sub-committee recommended the Willis  
Plan, designed by Willis, with some minor modifications to better meet the  
criteria of the Act. Following consultations with representatives of the  
JUMI Committee, Willis agreed to make changes to the plan including changes  
to the working conditions chart.  
190. The Commission also examined the Willis Plan and expressed its  
concern about the treatment of "effort" in respect to working conditions.  
There was also concern about the manner in which the plan dealt with  
"accountability". Willis agreed to change aspects of the Willis Plan such  
that both physical and mental effort would be assessed in working  
conditions. He also agreed to changes in the treatment of accountability.  
191. All participants, including the Commission, appeared satisfied  
with the changes. Paul Durber, Director of Pay Equity, Canadian Human  
Rights Commission and a pay equity expert, provided expert evidence as to  
how the requirements of s. 11 of the Act and the Guidelines were captured  
by the Willis Plan. Durber stated an essential element of the job  
evaluation plan, used for purposes of pay equity, is that the tool be  
gender bias free. In Durber's opinion, there is nothing on the face of the  
Willis Plan which appears gender biased and there is nothing in the plan  
that would make it difficult to measure work traditionally performed by men  
as compared to that of women.  
192. The Willis Plan is complex in design. Willis developed this plan
in 1974 after working with the consulting firm Hay & Associates for three
years. It uses a matrix format which permits the four factors of skill,
effort, responsibility and working conditions to be broken down into
subfactors. The matrix design allows for one, two or three sub-factors to
be assessed on a single guide chart, with a total of four guide charts. A
guide chart presents the criteria used in the Willis Plan. In some cases,
one factor is embedded in another. For example, interpersonal skills are
measured within the levels of managerial skills; thus, how one scores
managerial skills affects the number of points given for each of the
levels of interpersonal skills.
193. The Willis Plan is a point factor system, which simply means
points are assigned to each factor. The point values are added together to
arrive at a total point score for each job. The Willis Plan is designed
geometrically: Willis chose a 15 per cent difference between any two
adjacent levels of the plan. He finds this percentage produces a
discernible difference in the semantic definitions of the different levels
in the charts. He stated that if the differences were too small,
evaluators would be unable to make a choice.
194. The relative worth of each job is established by the number of
points available for each factor in the Willis Plan, and an almost
infinite number of point values is available. The relative number of
points available for each factor contributes to the conclusion of the
relative worth of different positions. (Volume 77, p. 9377).
195. Dr. Nan Weiner, President of NJ Weiner Consulting, Inc., a  
consultant specializing in pay and employment equity, was deemed an expert  
by the Tribunal in pay equity and compensation. She was asked to express  
an opinion on the Willis Plan, which she referred to as a system.  
Although she had not worked with the Willis Plan, she stated there was
nothing to indicate that it would in any way undervalue female jobs. She
indicated a weakness of the Willis Plan could be that, given the breadth
and diversity of the Federal Public Service, four levels of interpersonal
communication are simply not adequate to differentiate across all the jobs
which were evaluated. In her opinion, the Willis Plan attributes more
points to knowledge and skill, and accountability or responsibility, than
to effort and working conditions, and in some respects favours white
collar work over blue collar work. According to Weiner, it is important
for the evaluators who use the system to ensure, through their
discussions, that blue collar jobs are measured fairly.
196. In her view, it is not the nature of the work being evaluated but
how the system is applied by the user that is important. In this respect,
she says in Volume 11, at p. 1564, lines 20 - 23:
THE WITNESS: What is important is that you ask the people who  
actually use the system what they did to make sure the system was  
being used fairly for blue collar jobs.  
197. Willis testified the current modified Willis Plan does not show
the point values on the charts; thus, the evaluators never know what the
points are. They evaluate using pluses and minuses, and a computer program
determines the points. According to Willis, this frees the evaluators from
knowing the point relationships between different jobs.
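The mechanics Willis described, evaluators recording only pluses and
minuses while a program computes the points, can be illustrated with a
hypothetical sketch. Treating a "+" or "-" as a fixed proportional shade
toward the adjacent level is an assumption made only for illustration;
the actual program was not described in that detail:

    # Hypothetical sketch: an evaluator records a level and an optional
    # "+" or "-"; a program converts this to points, so the evaluator
    # never sees the point relationships between jobs. The 5 per cent
    # shading factor is an assumption, not taken from Willis' testimony.
    def rating_to_points(base: float, level: int, modifier: str = "") -> int:
        points = base * 1.15 ** level        # geometric 15 per cent spacing
        if modifier == "+":                  # shade toward the next level up
            points *= 1.05
        elif modifier == "-":                # shade toward the level below
            points /= 1.05
        return round(points)

    print(rating_to_points(100, 2, "+"))     # a "2+" rating -> 139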
198. Two elements emerge from the design of a pay equity job
evaluation plan. The first is that the plan must be capable of capturing
and appropriately valuing both female and male work. The second is the
allocation of "weight" assigned to the various factors of the plan. For
example, in the weighting of factors, the Willis Plan attributes many more
points to knowledge and skills than it does to working conditions and
physical effort.
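The effect of weighting can be shown with invented numbers: because the
maximum points available differ by factor, the factor maxima act as
implicit weights when the factor scores are summed into a total. A brief
sketch, with all values hypothetical:

    # Illustration of a point-factor total: the points assigned under each
    # factor are summed, so a factor with more available points carries
    # more weight in the total. All numbers below are invented.
    factor_points = {
        "knowledge_and_skills": 350,   # many more points available
        "responsibility": 250,
        "effort": 80,
        "working_conditions": 40,      # far fewer points available
    }
    print(sum(factor_points.values()))  # 720, dominated by the first two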
199. Willis described his weighting scheme for the Plan. He testified  
the weights were validated using market rates of pay. Criticism about the  
use of the market as a measure of validation was expressed by Weiner who  
stated market influences on the wages for female-dominated jobs are  
inconsistent with pay equity. Simply expressed, when job evaluation  
systems undervalue work traditionally performed by women, this becomes  
compounded in the market place. Accordingly, in her opinion, the market  
reflects the undervaluation of women's work.  
200. The last formal validation of the weights in the Willis Plan was  
done in 1985. Early in the study, Willis agreed, at the request of the  
JUMI Committee, to do a validation study of the weights, because of a  
concern expressed by the Commission. Willis did not believe this re-  
validation necessary as he had been using the system continuously and had  
no empirical evidence to demonstrate the factor weights were inappropriate.  
Management representatives ultimately decided not to perform the validation  
study because of cost considerations.  
201. During the lengthy course of these proceedings, the Employer  
challenged the weighting of the Willis Plan and its validity as a tool for  
evaluating jobs in a gender bias free manner. On that basis, the Tribunal  
heard a considerable volume of evidence. It was not until written  
submissions from Respondent Counsel were available that the Tribunal and  
the other parties were advised the Employer agreed the Willis Plan was an  
appropriate and acceptable evaluation plan for the purposes of the study.  
Moreover, it was then agreed by the parties that the Willis Plan met the  
requirements of s. 11 of the Act and is an appropriate instrument within  
the meaning of s. 11 for these complaints. Reference is made to the  
Employer's written submissions at para. 41, p. 11, which read:  
41. Nevertheless, for purposes of this litigation, the Employer  
accepts that the Willis Plan was an appropriate plan to use in  
evaluating jobs in the Federal Public Service. Therefore, the  
Tribunal need not decide whether the weighting of the Willis Plan is
valid.
202. Also, in oral argument, Respondent Counsel stated in Volume 218,  
at p. 28453, lines 4 - 11 as follows:  
Then, having covered all of those points, we say this: For  
purposes of this litigation the employer accepts that the Willis  
Plan was an appropriate plan to use in evaluating jobs in the  
federal public service. So that, in my submission, was intended  
to be a complete indication that no issue is raised in respect of  
the Willis Plan.  
203. The Commission has an obligation to assure the Tribunal that the
Willis Plan meets the requirements of s. 11 of the Act and s. 9 of the
Guidelines. In this regard, Durber conveyed to the Tribunal the
Commission's view, which essentially confirms that the Willis Plan meets
the requirements of the Act.
204. During oral argument, all parties agreed on the suitability of  
the Willis Plan as an appropriate tool for dealing with the complaints  
before us. Therefore, the Tribunal is persuaded and does find as a matter  
of fact that the Willis Plan is an appropriate tool within the requirements  
of the Act and Guidelines for the job evaluations which form the basis for  
this adjudication.  
205. The Willis Plan provides a tool to be used in assessing the
relative value of work. In and of itself, however, it does not provide a
methodology for determining the wage gap between female positions and
male positions. The determination of any wage gap is a function of
comparing evaluations between male and female jobs. The system itself
does not do that without a further step.
B. THE WILLIS PROCESS  
206. The Willis Process was developed by Willis over the approximately  
24 years he spent as an independent consultant in the area of pay equity  
job evaluation. The Tribunal heard considerable testimony relating to the  
implementation of the Willis Process in the JUMI Study. In particular, the  
evidence covered the period of job evaluation which commenced in the fall  
of 1987 and concluded in the fall of 1989. In assessing the issue of  
reliability, we find it appropriate to review each aspect of the process to  
determine whether it achieved or fell short of achieving the aim of  
avoiding gender bias.  
207. The Willis Process is a process for examining, assessing and  
evaluating jobs. Participants in this exercise were given the task of  
measuring the content of each of the positions examined, and asked to  
assign a value reflective of the total work of each position or job.  
208. Although Willis testified the job evaluation plan must be a sound  
instrument, he also insisted the process within which the evaluation plan  
is used is more important than the plan itself. According to Willis,  
everything done in terms of the process was aimed primarily at avoiding  
evaluations which would suggest traditional relationships or stereotyping.  
It was designed to avoid anything that might be identified as gender bias.  
Willis maintained throughout the study that vigilance during the
evaluation stage was of paramount importance, and he continually
reinforced the need for objective, fair and equitable evaluations of all
positions.
209. By way of historical background, the process eventually agreed
upon by the JUMI Committee was not Willis' preferred choice. Willis
initially submitted for the JUMI Committee's consideration a proposal
which outlined the processes and procedures to be employed in the study.
Due to financial considerations under the control of the management side,
his proposal was rejected. Willis then prepared a modified proposal. The
JUMI Committee accepted the modified version after a number of "make or
buy" decisions were made relating to certain aspects of the Willis
Process.
210. The modifications in the data-gathering phase included:  
(i) instead of having his consultants conduct the briefing of  
employees selected to complete questionnaires, he agreed federal  
employees could be trained to do this task;  
(ii) instead of having his consultants review or screen the  
completed questionnaires, he agreed to train a team of  
federal employees to perform this task under the direction  
of a consultant who would be available to oversee this  
process;  
(iii) instead of using consultants to conduct face-to-face
interviews with select incumbents, he agreed to train a team
of federal employees to conduct any face-to-face interviews
deemed necessary; and
(iv) in the later stage of the study, Willis reduced the amount of  
time and involvement that he and his consultants would have in  
the data-gathering phase of the study.  
211. Willis believed these modifications would result in a study which  
would be of sufficient quality to meet the requirements of the Act based on  
a number of safeguards he instituted to ensure that complete and accurate  
job information was obtained.  
212. We find it helpful to separately identify each step in the Willis  
Process, accompanied by the evidence relevant to each step, thereby  
assisting us in determining the issue of reliability and the effectiveness  
of the safeguards.  
(i). Data-Gathering  
213. Willis testified data-gathering is the critical and most
important step in a study of this type. He identified four possible
sources of job information.
214. One source of information is the job description. In reviewing a  
sample of job descriptions from the Federal Public Service, Willis  
determined they were out of date and not sufficient for this purpose.  
215. A second source is the closed-ended questionnaire, which is  
described as similar to a multiple choice question. Closed-ended  
questionnaires require very extensive, in-depth and detailed knowledge of
the job in order to structure them properly. A great deal of familiarity  
with the work is required so as to construct the kinds of alternatives  
provided in a closed-ended questionnaire. This is easier to do in a  
smaller establishment where there is less variety in the kinds of jobs.  
216. The advantage of a closed-ended questionnaire is that it relies
less on the incumbent's comprehension and awareness of the range and
content of the job, and more on the knowledge and comprehension of the
person who structures the questionnaire. The disadvantage is that if the
person who structured the questionnaire is not aware of the whole range
of the work involved, or is not fully aware of the kinds of
considerations that must go into pay equity questionnaires, then the
questionnaires may have fundamental bias built into them. (Volume 180,
p. 22971). A closed-ended
questionnaire is relatively easy for employees to complete but it is,  
according to Willis, fundamentally unsound because it permits employees to  
make value judgments about their work rather than providing factual  
information.  
217. A third source is an open-ended questionnaire which is more  
difficult to complete than a closed-ended questionnaire. Willis prefers an  
open-ended questionnaire to a closed-ended questionnaire. Armstrong  
explained that open-ended questionnaires are most useful when information
must be collected for the evaluation of a whole range of quite different
jobs. Open-ended questionnaires are also more useful
with a literate workforce, which is the case with most of the employees in  
the public service. (Volume 180, p. 22971).  
218. Willis testified he constructed his questionnaire to obtain  
complete, definitive, accurate and up to date job information. (Volume 68,  
p. 8542). The data gathered in this study was obtained through an open-  
ended questionnaire (the "Willis Questionnaire").  
219. The fourth and last source of data-gathering involves a task  
force of professional job analysts who would interview each employee and  
then prepare the document. Willis has used this approach in a few  
instances but stated that it would be "impractical" in the context of the  
Federal Public Service. (Volume 29, p. 3696).  
(ii). Willis Questionnaire
220. Willis discussed the advantages and disadvantages of open-ended  
and closed-ended questionnaires in the context of a large study such as the  
JUMI Study. He preferred an open-ended questionnaire as opposed to a  
closed-ended questionnaire because his aim was to prevent incumbents from  
making value judgments of their own work. This will occur when a closed-  
ended questionnaire is employed. Willis states in Volume 65, at p. 8084,  
lines 12 - 14:  
I think the important thing is that the evaluator must make that  
value judgment, not letting the employee make it.  
221. This questionnaire was used in many previous Willis & Associates  
pay equity studies in both Canada and the U.S. The JUMI Committee  
established a sub-committee to finalize the questionnaire's format and  
content. The amended Willis Questionnaire was agreed to by the JUMI  
Committee. A guidebook was appended to each questionnaire as a source of  
assistance to an employee completing a questionnaire. The guidebook was  
also amended by the JUMI Committee to reflect the Federal Public Service  
environment.  
222. In summarizing his participation in the design of the  
questionnaire, Willis says in Volume 60, at p. 7429, line 18 to p. 7430,  
line 6:  
A. The questionnaire has been, I would say, developed over a  
number of years. It is probably the most worked up questionnaire  
that is in existence going back all the way to 1974. We have  
tried to modify it and change it over the years to make it easier  
for people to complete, but at the same time it is a totally open-  
ended questionnaire, which I think is necessary.  
The final design, of course, was a modification of the  
questionnaire by a sub-committee from the Joint Union/Management  
Committee. I think perhaps we have as good a questionnaire as you  
could expect to have for any study of this type.  
223. Willis participated in the suggested changes in the  
questionnaires and the guidebook and approved all of the changes which were  
made. He testified he was satisfied with the questionnaire and the  
guidebook in the form in which they were used in the JUMI Study. (Volume  
62, p. 7654).  
224. A portion of the questionnaire provides space for the incumbent's  
supervisor to make comments. The Questionnaire Sub-Committee had discussed  
and made changes to this portion of the questionnaire. It was Willis' view  
these changes were minor and he was satisfied with the questions in their  
final form. The questions for the supervisor read as follows:
Carefully review the completed questionnaire, but do not alter or  
eliminate any portion of the original response. Please answer the  
questions listed below. We also invite you to consult with your  
manager on this subject.  
1. What do you consider the most important duties of this  
position and why? (Refer to Question III.)  
2. Comment on the accuracy and completeness of the responses by  
the employee.  
3. Please sign on page 34.  
IMPORTANT: Significant differences of opinion noted by the  
immediate supervisor should be reviewed with the employee.  
(Exhibit HR-34)  
225. Willis stated this kind of check on position information is  
intended to address two concerns: firstly, the tendency of some employees  
to overstate their jobs to some degree when there is no supervisor to  
review the information; and secondly, and more importantly, the
supervisor will often have additional information, which the employee may
have forgotten, that might be helpful in evaluating the position.
226. One of the problems identified by Willis was obtaining good  
information from "sophisticated professional level positions." (Volume 68,  
p. 8544). He stated the higher the level of knowledge and sophistication  
of the job, the more it requires the understanding and interpretation of  
principles and theories, hence greater difficulty is encountered by the  
individual incumbents in describing their work. In a higher level job it  
is more difficult for the employee to document and describe their work in a  
way an evaluator can understand. On the other hand, a very simple cleaning  
job which follows specific procedures can be documented with relative ease.  
227. Another problem encountered in gathering information is ensuring
that adequate time is given to employees to complete the questionnaire.
Each incumbent must be given sufficient time to complete the
questionnaire, the amount of which is contingent upon the ability of the
incumbent to describe their job in writing. Not only is time an important
element in this exercise, but so also are the effort and care expended by
each incumbent.
228. In Willis' expert opinion, the questionnaire was a good tool for  
obtaining factual up to date job information. In assessing the ability of  
the Willis Questionnaire to collect sufficient information for the  
evaluation committees, we note the remarks of Willis in Volume 62, at  
p. 7686, lines 16 - 22:  
Q. In terms of any of the times that the consultants were  
sitting in, are you satisfied that by the time those  
questionnaires came to be evaluated that there was sufficient  
information for those jobs to have been properly evaluated in  
accordance with the Willis Plan?  
A. Yes.  
229. The JUMI Committee understood, from the outset of the study, the
need to communicate to selected employees the importance of the study as  
well as the importance of providing thorough job information. As a result,  
the JUMI Committee established a Communications Sub-Committee to develop a  
communication strategy emphasizing the necessity of complete and accurate  
information and a prompt return of the questionnaires distributed for this  
purpose.  
230. The communication strategy included such items as: (i) a pay  
cheque stuffer explaining the purpose of the JUMI Study and containing an  
assurance to employees that classification levels would not be affected;  
(ii) letters to employees who were asked to complete a questionnaire; (iii)  
preparation of a video for employees designated as screeners/reviewers to  
be used in training of incumbents who would be filling out questionnaires;  
and (iv) training materials for coordinators.  
231. Employees were given assurances from the JUMI Committee their  
participation would not have a negative impact on their careers. They were  
also assured any information provided would not be used for any other  
purpose than the JUMI Study. The incumbents from male-dominated
occupational groups were advised that if a wage gap were found, their
wages would remain unaffected.
232. To counter possible problems in using an open-ended
questionnaire, Willis implemented checks and balances, or safeguards, to
ensure that the evaluators had complete, definitive, accurate and current
information. These safeguards will now be described.
(iii). Coordinators  
233. Willis had originally proposed his consultants train incumbents  
in the completion of the Willis Questionnaire but accepted the JUMI  
Committee's decision to use coordinators as trainers, which he considered
a valid "make or buy" decision.
234. The function of the coordinators included training incumbents on  
how to complete the questionnaires, conducting briefing sessions to explain  
the nature and intent of the study, responding to employee questions,  
distributing and explaining the Willis Questionnaire, assisting employees  
in completing the questionnaires when required, and coordinating the data-  
gathering process.  
235. Coordinators were designated as either national or regional,
depending upon their purpose and locale. The selection process, and the
criteria applied in selection, used by the Alliance in appointing
coordinators were described in detail by an Alliance witness, Elizabeth
Millar, Head, Classification & Equal Pay, Collective Bargaining Branch.
Similarly, the selection process of the Institute was described by Kathryn  
Brookfield, Section Head of Research. On the other hand, no evidence was  
presented by the Employer as to the manner and criteria for selecting its  
employees for this role.  
236. Coordinator training sessions were conducted in the months of  
September and October, 1987. Materials, in the form of printed  
information, slides and videos were given to coordinators to assist in the  
training of incumbents. The coordinator training program lasted about a  
day and a half. Some additional exercises in coordinator training included  
practice sessions on eliciting the support of individuals who might be  
reluctant to complete the questionnaire, dealing with language difficulties  
and making arrangements for interpreters where necessary. All training on  
the Willis Plan was conducted by a Willis Consultant.  
237. With regard to the adequacy of coordinator training, Willis  
provided the following opinion in Volume 62, at p. 7657, lines 15 - 21:  
Q. In terms of the training, you participated in the training  
of the coordinators, or your consultants did?  
A. Yes.  
Q. And you were satisfied with the training that was  
given to the coordinators?  
A. Yes.  
238. Following training, each coordinator was then assigned a number  
of incumbents to train. The date for performing this task was to be  
decided by the individual coordinator although Willis wanted the training  
of incumbents undertaken as soon as possible. He also emphasized to the  
coordinators the importance of having incumbents complete the  
questionnaires as soon as possible after their incumbent training was  
given. Willis preferred the questionnaires be completed within a two week  
period subsequent to the employee training. Willis estimated it would take  
incumbents four to eight hours to properly complete the questionnaire.  
239. Following the training of the coordinators, which was completed
in October of 1987, approximately two-thirds of the questionnaires were
received by February of 1988 and up to three-quarters were received by  
March of 1988. The Administrative Sub-Committee, established by the JUMI  
Committee, spent considerable time assessing the number of questionnaires  
which had been received and ways and means of obtaining all the remaining  
questionnaires. The final rate of return for the questionnaires was 95 per  
cent. A few questionnaires continued to come in over the summer and fall  
of 1988.  
240. The Tribunal heard from Brookfield who testified that many of the  
coordinators from the Institute commenced their training task very soon  
after receiving coordinator training. She explained that the employee  
training went on for a considerable period of time because the coordinators  
had a large number of employees to train and the employees were not all at  
a single work site. These factors required staggered training sessions for  
coordinators to meet with different employee groups. Brookfield also  
indicated some of the incumbents could not be released from their work at  
the same time, and this factor also lengthened the period of time required  
for the training.  
241. Brookfield also expressed the Institute's view as to the calibre  
of training provided at the end of coordinator training. She said in  
Volume 168, at p. 21007, lines 5 - 16:  
A. They said to me quite frankly the more they did it, they  
felt the better they got and that they had received input from  
previous training sessions about questions that employees would  
have and they would respond to them at that point. But then,  
after that, they might think of more information or another way  
they might have addressed that concern and they would incorporate  
it in their next training session, perhaps up front, or be able to  
raise, if there weren't questions, possibilities and things they  
had gleaned from other training sessions.  
242. The Alliance had many more coordinators than the Institute,  
numbering approximately 100. Margaret Jaekl, Classification and Equal Pay  
Officer, Collective Bargaining Branch, of the Alliance, testified as to the  
effectiveness of coordinator training from feedback she received from  
Alliance coordinators. Jaekl states in Volume 200, at p. 25831, line 25 to  
p. 25833, line 8:  
Q. Did you receive feedback from the co-ordinators as to how  
they felt their role was being received, first of all, by  
management and, second of all, by those that they were training in  
the filling in of the plan?  
A. Yes. We had meetings from time to time with all of what we  
called our national co-ordinators. Each component had a national  
co-ordinator and then they had many regional co-ordinators, too.  
...  
A. The feedback we got generally was that they felt they were  
working well with their management counterpart. People were  
understanding their presentations. People were generally  
completing their questionnaires and returning them. Some people  
had questions and, in general, they felt comfortable that they  
were able to answer those questions.  
243. The JUMI Committee sought cooperation from management in granting  
time to employees for training. The uncontradicted evidence is there was  
good cooperation from the Employer in providing the selected employees with  
sufficient time during normal working hours to attend the training session  
and to complete the questionnaire. Incumbents were given time off with pay  
to complete the questionnaire which could involve up to eight hours, where  
necessary.  
244. In Willis' opinion, the shorter the time lapse between the
training of coordinators and their training of the incumbents, the more
effective the training would be. Willis' experience was that he could
track the quality of the questionnaires by how soon the incumbents
completed them after receiving their training from the coordinators.
According to Willis, quality goes downhill over time. In this particular
case, Willis was not able to pinpoint when the quality began to decline.
He found a variety of quality levels in the completed questionnaires. He
remarked that the earlier questionnaires possessed a higher quality. The
Department of National Defence questionnaires were completed right on
schedule; Willis testified these employees completed the questionnaires
as they were supposed to be done, and these were "excellent
questionnaires". Willis and his consultants noticed a "dropping off" in
quality the longer it took for the questionnaires to be returned.
245. There is little evidence concerning specific dates of  
coordinator-incumbent training sessions. Some portion of the delay can be  
attributed to the time supervisors took to read, comment on and sign  
employee questionnaires. Some supervisors waited until all of their  
employees had completed their questionnaires and signed them off en bloc.  
Willis admitted there was no way of knowing whether an employee had, in  
fact, filled the questionnaire out within the goal of 10 to 14 days after  
receiving their training or at a later time. It is noted there were 1,258  
incumbent substitutions in total involving 837 questionnaires.  
246. The evidence revealed the information from female employees came  
in sooner and was of better quality than the information received from male  
employees. Also, questionnaires from incumbents of high level technical  
and professional positions were returned later and contained weaker  
information than questionnaires from the incumbents of clerical and  
vocational positions.  
(iv). Screeners and/or Reviewers
247. As the completed questionnaires were returned, one of the Willis
Consultants, Jan Drury, was asked by Willis to select the best
questionnaires for evaluation by the Master Evaluation Committee (the
"MEC"), described below. Drury expressed concerns to
Willis about the overall quality of the questionnaires. As a result,  
Willis then instituted a back up procedure to obtain additional  
information. This involved a task force of employees, appointed by the  
JUMI Committee, referred to as screeners and/or reviewers. Their primary  
function was to screen incoming questionnaires for any gaps in information  
and/or inconsistencies.  
248. According to Willis, the screening of questionnaires is an  
absolute necessity in the Willis Process. It was Willis' original  
recommendation for the study that the consultants perform the screening and  
reviewing function. Normally, Willis would use his consultants to screen  
the completed questionnaires. The JUMI Committee decided to train federal  
government employees to perform this task. This triggered another "make or  
buy" decision by the JUMI Committee. The screeners/reviewers functioned  
throughout the duration of the study.  
249. The screeners and reviewers were trained by Drury. They received  
more extensive training than the coordinators because the screeners and  
reviewers had to be familiar with the Willis Plan in order to assess  
whether the questionnaires were properly completed.  
250. Accordingly, the management side and the union side each  
appointed individuals to act as screeners/reviewers. Approximately 55  
individuals functioned in this capacity. Their responsibilities included  
undertaking certain technical tasks for each questionnaire, such as  
removing all gender and classification references. After identifying  
questionnaires requiring additional information or clarification, the  
screener/reviewer was then required to draft questions to ask incumbents in  
order to complete the necessary information. They also obtained further  
factual information respecting technical terminology found in the  
questionnaires and presented this information in terms better understood by  
an evaluation committee.  
251. Drury oversaw the work of the reviewers until March of 1988.  
Drury examined the review questions and notes drafted by the  
screeners/reviewers for each review completed on the questionnaires  
evaluated by the MEC. Subsequently, Diane Saxberg, on the union side, and  
Doug Edwards, on the management side, were appointed Chief Reviewers as of
March 7, 1988. The Chief Reviewers were responsible for reviewing the
draft questions of the screeners/reviewers.  
252. The screeners/reviewers interviewed the incumbents to obtain the  
required information. A high percentage of these follow up interviews were  
done by telephone and only a limited number, less than a dozen, were done  
in person. In some instances, obtaining this information required several  
telephone calls, some of which were extremely lengthy. The responses were  
then written up and appended to the questionnaires before being conveyed to  
an evaluation committee. The written responses were referred to as  
"reviewer notes".  
253. Willis wanted the screeners/reviewers to identify areas in the  
questionnaires where something may have been overlooked or left out, or  
where there might have been contradictions between what the incumbent wrote  
and the comments of their supervisor. They were also instructed to be  
alert to expressions of opinion or conclusion not supported by fact.  
254. The screeners/reviewers found only a "handful" of cases in which  
there was disagreement between the supervisor and incumbent. Saxberg  
testified in these situations she would talk to both individuals and, in  
most cases, reported the disagreement was more of a semantic nature than a  
substantive disagreement about job duties.  
255. Willis explained, based on his past experience, about 50 per cent  
of the necessary interviews can be conducted by telephone but the other 50  
per cent require a personal meeting, in order to obtain more substantive  
information, especially when dealing with higher level technical and  
professional jobs.  
256. Willis stated the number of times a questionnaire has to be
supplemented, whether it is 80 per cent or 30 per cent of the cases, does
not really bear on the quality of the questionnaire. According to Willis,
it is the extra information which is obtained and put before the
committee that counts.
257. The evidence indicates there were some reviewers who previously  
had functioned as evaluators on an evaluation committee and who had been  
identified as "outliers" in terms of their evaluations. Willis defined an  
outlier as an individual, on an evaluation committee, who exhibits a  
divergence from the rest of the committee as a whole and gives higher  
scores to certain kinds of jobs or lower scores to certain kinds of jobs  
compared to the other members of the evaluation committee. (Volume 29, p.  
3793).  
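Willis' definition suggests a simple check, sketched below with invented
data: compare each evaluator's scores against the committee consensus and
flag anyone whose average signed deviation is consistently high or low.
The scores and the flagging threshold are hypothetical:

    # Sketch of the outlier notion Willis described: an evaluator whose
    # scores diverge systematically from the committee consensus.
    def mean_deviation(evaluator_scores, consensus_scores):
        """Average signed gap between one evaluator and the consensus."""
        gaps = [e - c for e, c in zip(evaluator_scores, consensus_scores)]
        return sum(gaps) / len(gaps)

    consensus = [410, 355, 620, 280]
    evaluator = [455, 390, 680, 310]   # consistently above the consensus
    dev = mean_deviation(evaluator, consensus)
    if abs(dev) > 25:                  # threshold chosen for illustration
        print(f"possible outlier: mean deviation {dev:+.1f} points")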
258. Willis indicated one way of checking for validity, in the  
situation where a screener/reviewer is also an outlier, is to examine the  
questions they draft and the answers they give and determine whether the  
answer responds to the question. Willis testified he saw no indication  
these individuals were not recording the answers to the questions raised.  
259. The Tribunal heard evidence from three individuals who performed  
as screeners/reviewers. With regard to the effectiveness of telephone  
interviews for obtaining information, Christine Netherton states in Volume  
173, at p. 21919, lines 5 - 20:  
A. ...sometimes it only took half as long to get the  
information, but very often you would have to explain what the  
study was doing and they would say "Oh, I filled that out" and so  
and so. So there would be a lot of chat to get easy with them.  
And you tried not to rush people.  
I think the information did come back on the whole. And you would  
get this response from other reviewers as well.  
But there would be the person that did not like talking. I am  
talking of the impression I am left with. I am not saying that it  
was 100 per cent perfect. But the main impression is that in the  
majority of cases you did get good information via the telephone.  
260. Another reviewer/screener who testified, Mary Crich, said she did  
not often see examples of conflict in the summary of duties and  
responsibilities between the incumbents and the supervisors. Crich gleaned  
from the telephone interviews that employees enjoyed the opportunity to  
speak with someone about their job.  
261. Both Willis and Durber were asked about the competency and  
ability of the screeners/reviewers. A number of individuals who functioned  
as screeners/reviewers were familiar to Durber because of his lengthy  
experience in the Federal Public Service. Durber described them as  
"professional job evaluators as well as analysts". In his opinion, they  
would tend to be more competent to perform the tasks assigned to them as  
reviewers than others without similar backgrounds. (Volume 164, at  
pp. 20505-07).  
262. With respect to the adequacy of their work, Willis said the  
following about screeners/reviewers in Volume 65, at p. 8136, lines 2 - 7:  
Q. But were you aware that after training reviewers had any  
difficulty understanding their job?  
A. I don't believe any of them had any difficulties. At least  
none were expressed to me.  
263. Following the screening/reviewing process, the questionnaires  
with the reviewers' notes would be turned over to an evaluation committee.  
If an evaluator on the committee required further information, questions  
would be drafted by the evaluation committee and would be passed back to  
the screener/reviewer to solicit the necessary information from the  
incumbent. The information obtained by the screener/reviewer would then be  
provided in writing and returned to the appropriate evaluation committee.  
264. Under the direction of Durber, the Commission examined  
questionnaires with a view to assessing their quality. During the hearing,  
the Commission introduced a report, An Examination of the Quality of  
Questionnaire Information used by the Federal Equal Pay Study (Exhibit HR-  
245). The report which examined the quality of questionnaire information  
was prepared by the Pay Equity Directorate of the Commission at the request  
of Durber, the investigator into these complaints. An experienced  
researcher, who possesses a Master's Degree in Canadian Studies from  
Carleton University, was commissioned to review a cross section of the  
evaluations. This included 63 benchmark evaluations and 588 non-benchmark  
questionnaires, a total of 651 questionnaires. Her task was to ascertain  
the apparent completeness and accuracy of all material in the  
questionnaire files collected as part of the JUMI Study. The researcher
was closely supervised by Durber. As part of this work, Durber personally  
reviewed 36 files which were flagged by the researcher and found each to be  
in satisfactory condition.  
265. The researcher reported the legibility of the descriptions in the  
questionnaires was good in all cases and that the open nature of the  
questionnaire appeared to provide scope for answers for both male- and  
female-dominated occupational groups. Many incumbents enlarged on their  
duties by adding pages to this portion of the questionnaire.  
266. The Commission's report also recorded that supervisor signatures
were affixed to over 99 per cent of the questionnaires and that in over
96 per cent of them the supervisors provided comments. Contradictory
information from supervisors appeared in approximately 9 per cent of the
questionnaires. Of the questionnaires where supervisors provided
conflicting information, 95 per cent were resolved by subsequent
interviews conducted by the screeners/reviewers.
267. Durber expressed his own expectations about the quality of the  
questionnaire information when he said in Volume 158, at p. 19761, line 23  
to p. 19762, line 3:  
I can only say that from my experience in the public service, what  
I did see was much superior to what I have seen in job  
descriptions and in job files, presentations even in grievance  
situations, just to try to put my own expectations into some sort  
of context.  
268. During cross-examination by Respondent Counsel, Willis was asked  
whether the safeguards implemented by the JUMI Committee to address  
problems in the data-gathering stage achieved what he wanted. Willis said  
in Volume 78, at p. 9543, line 3 to p. 9546, line 1:  
Q. Those safeguards -- and they were all described in your  
original proposals -- related to both information-gathering and  
evaluation. Right?  
A. Yes.  
Q. I am going to suggest to you that almost or wholly without  
exception the safeguards that were implemented -- and there were  
lots -- were not effective to achieve what you wanted them to do.  
A. I think it is fair to say that there were degrees of  
effectiveness that I experienced.  
Q. And the degree of effectiveness, I am going to suggest to  
you, is disappointing at best.  
A. Yes.  
Q. Part of the result of that is that when we come to the  
information that was made available to the five and nine  
committees, after all the shoring-up it was weaker than it should  
have been. Do you agree?  
A. I am not sure what you mean by "weaker than it should have  
been".  
Q. Weaker than is desirable for a good evaluation.  
A. I did feel -- and I expressed this to the Joint  
Union/Management Committee -- that the quality of the information  
was not as high as I would have liked. However, I felt that  
overall it was satisfactory for our purposes.  
Q. I understand, but you have also told us that it was weaker  
than what you normally get in other studies.  
A. Yes.  
Q. Even because of some of the weaknesses in the safeguards we  
have to raise something of a question mark or a flag, if you will,  
over some of the information that was actually obtained, some that  
is actually there, because of some of our discussion that it  
wasn't written by a skilled job evaluator and some of the entries  
were made by outliers; all that discussion that we had. Do you  
agree with me?  
A. Are you suggesting that some of the information may have  
been inaccurate?  
Q. I am not saying that it is inaccurate. We don't know  
whether it is accurate or not. Our level of confidence in the  
information is below what we would like because the information  
is, to some extent, written by people who aren't skilled in doing  
this kind of writing, it was screened by people who aren't skilled  
in screening, interviews were conducted by people who aren't  
professional job analysts. That is what I am saying.  
A. I think I did express to the Joint Union/Management  
Committee, or at least to the Mini-JUMI, that we would expect a  
wider amount of disparity because of the information being  
somewhat weak.  
Q. But what I am suggesting to you, in addition, is that -- you  
say "weak". I am asking you whether you agree with me that even  
with what we have, we have a somewhat reduced level of confidence  
in its accuracy.  
A. I don't know if I can say that. Certainly what we would  
have liked would have been questionnaires that were more complete  
and that focused more on factual information. These are the  
things that always lead the evaluators to making certain  
assumptions, resulting in a wider range of possible disparities.  
269. Although the quality of the information was weaker than what was
available to him in other studies in which he had been involved, Willis
consistently maintained throughout the course of this hearing that the
quality of the information was good enough for the purposes of this
study.
270. We note, in the course of further cross-examination by Respondent  
Counsel, Willis again gave his opinion on the quality of the information.  
This response is found in Volume 69, at p. 8612, line 22 to p. 8615, line  
15, where he says:  
Q. So what you are saying is that after all the shoring up, the  
information was still wanting to some significant degree.  
A. I would say that the information at best was satisfactory,  
but not superior.  
Q. Would there have been a range -- you make me think of our  
performance appraisals. We can get "satisfactory", "fully  
satisfactory", and "superior". Is that the table you are using?  
A. Let me put it this way: I had some concerns about the  
quality. Telephone interviews and interviews by interviewers who  
were not professionally trained can never completely substitute  
for a well-completed questionnaire in the first place. While  
overall I would say the quality was sufficient for our purposes,  
particularly with the large numbers of evaluations -- again, if  
this had been Addiction Services with only 19 or 20 positions, I  
would have been very concerned because I knew that we had to  
tolerate a greater disparity than I would have hoped for as a  
consultant. Again, as long as the disparity is random and it will  
cancel itself out in the end, I felt that I could live with the  
result.  
What happens when you do have a questionnaire -- two things happen  
when you have a questionnaire that is somewhat weak: (1) it slows  
down the process, as we found out; and (2) we have to anticipate  
that there will be a wider tolerance for disparity.  
Q. All right. I want to stop you there because you said  
something again that I want to challenge you on.  
You are saying that the disparity cancels itself out. That is if  
you are looking for gender bias.  
A. If it is random, by definition it will cancel itself out.  
If there is a pattern that results, then it isn't random.  
Q. I am going to suggest to you that what it does -- if it is  
random, it cancels out gender bias.  
A. It will cancel out any bias.  
Q. Any bias, all right. But what it doesn't cancel out is  
unreliability. If you have extensive disparity, what you have is  
a lower level of reliability. I thought we agreed on that  
yesterday.  
A. I think a statistician would say that if you were dealing  
with a relatively small number, that would be very true. It is  
less true as the number of evaluations grows, and the disparity  
continues to be random -- that is, the pluses and the minuses tend  
to cancel each other out -- you can still achieve satisfactory  
reliability with a large number of evaluations.  
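Willis' statistical point, that random disparity tends to cancel out as
the number of evaluations grows while a patterned bias would not, can be
illustrated with a small simulation. The error range is invented:

    # If evaluation errors are random (no systematic pattern), their mean
    # shrinks toward zero as evaluations accumulate. The +/-50 point
    # error range is invented for illustration.
    import random

    random.seed(1)
    for n in (10, 100, 1000, 10000):
        errors = [random.uniform(-50, 50) for _ in range(n)]
        print(n, round(sum(errors) / n, 2))
    # The mean error drifts toward zero as n grows; a biased error, for
    # example one that is always negative for female-dominated jobs,
    # would not cancel.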
271. We also note, during cross-examination by Respondent Counsel,  
Willis reiterates his previous testimony in Volume 78, at p. 9566, lines 19  
- 22:  
I think I have already said that I feel that with all of the work  
we did on the data gathering that the data is good enough for the  
purposes of this study.  
272. And again at p. 9567, line 23 to p. 9568, line 14:  
Now I am saying to you, Mr. Willis, what do we have to take away  
before you would say, "I will not defend this study"?  
A. Number one, I did make the statement several times that the  
quality of information was good enough. I would have blown a  
whistle if I felt that the quality was so low that we couldn't  
depend on it.  
Second, I did indicate that I felt strongly that I could not  
validate the results of the study if we couldn't do an assessment,  
an internal review of existing evaluations. That has been done.  
But what it would take? It is possible that I might look at that  
final analysis and say that I agree that we cannot use the  
results, but I don't know that.  
C. THE EVALUATION PROCESS  
273. In a large study such as the JUMI Study, involving a significant
number of positions, Willis utilizes multiple evaluation committees. One
committee is his preferred approach but, with so many jobs, it is
necessary to rely on more than one committee in order to evaluate
efficiently and properly. Overall, 16 evaluation committees were
established to evaluate questionnaires.
(i). Master Evaluation Committee  
274. The challenge for Willis is to design a process which enables the  
various committees to be consistent with one another over a relatively long  
period of time. As a guide and procedural safeguard, Willis creates a  
steering committee or a master evaluation committee. Willis stated it is  
necessary and essential in a pay equity exercise to make comparisons among  
dissimilar jobs. The master evaluation committee has the primary  
responsibility for establishing the relationships among different jobs and  
setting the frame of reference for the multiple evaluation committees.  
This exercise is what may be described as the master evaluation committee  
"discipline".  
275. The MEC evaluations are referred to as benchmark evaluations.
The MEC completed a total of 501 evaluations. Benchmark evaluations are
critical in a process where multiple committees are used.
276. The MEC was composed of 10 members, one half management
representatives and the other half union representatives. One management
representative and one union representative were designated as co-chairs.
Willis did not select the MEC members; this was left to the parties'
discretion. Willis recommended its members have a government-wide
perspective of work performed, analytical/conceptual skills, dedication
to completing a tough assignment and an ability to submerge feelings of
union or management affiliation in order to achieve a balanced approach
to evaluations. The parties attempted to structure the MEC to reflect
that balance.
277. Willis testified the MEC had a good balance of males and females  
with a good variety of backgrounds. The MEC also had an even number of  
union and management representatives.  
278. According to Willis, the key to successful job evaluation is  
consistency in the interpretation of the evaluation factors as between the  
multiple evaluation committees. He uses three methods to test for  
consistency, all of which were employed in the JUMI Study. The first  
method is used in situations where a consultant is facilitating an  
evaluation committee or acting as an advisor. Here the consultant  
independently evaluates the same job using the same information the  
committee members are absorbing, while at the same time looking for  
committee patterns which may differ from the independent consultant  
evaluation. The second method consists of comparing individual evaluators
to the committee as a whole. Finally, the third method consists of
comparing committees to one another. Testing for reliability between  
evaluators, inter-rater reliability, and testing for reliability between  
committees, inter-committee reliability, will be described and examined in  
greater detail in a following section.  
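The second and third methods lend themselves to simple correlation
measures. A sketch with hypothetical scores follows: the same statistic
can compare one evaluator against the committee (inter-rater) or one
committee against another on the same jobs (inter-committee):

    # Pearson correlation between two sets of scores for the same jobs;
    # values near 1.0 indicate consistent evaluation. Scores are invented.
    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    committee_a = [410, 355, 620, 280, 500]
    committee_b = [420, 340, 650, 300, 480]
    print(round(pearson(committee_a, committee_b), 3))  # close to 1.0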
279. Benchmark evaluations provide a broad frame of reference for  
evaluation committees and are utilized to achieve consistency and function  
as a kind of quality control in the evaluation process. More specifically,  
the term "discipline" refers to the liberalness or conservativeness with  
which the MEC interprets the evaluation semantics.  
280. The consultants must ensure the discipline is consistent among
the different evaluation committees. The discipline adopted by the MEC
places a heavy responsibility on the multiple evaluation committees to
evaluate the jobs and ensure they track well and are consistent with the
jobs the MEC evaluated. That is, if the MEC evaluates a certain factor in
a certain way, that approach must be adhered to by the other evaluation
committees.
281. Willis testified if the multiple evaluation committees were  
permitted to create their own discipline, the end result would be that the  
evaluations would be inconsistent. The evaluations might be consistent  
within themselves, that is, the multiple evaluation committees might treat  
all jobs fairly and equitably, but the degree of liberalness with which  
they interpret the semantics might differ. If the master evaluation  
committee evaluates a factor in a certain way, that same approach must be  
adhered to by the multiple evaluation committees; otherwise over- or
under-evaluation of questionnaires arises. In Volume 60, at p. 7396, line 18 to
p. 7397, line 4, Willis stated:  
Every evaluation committee adopts what I have referred to as a  
discipline, which is a conservativeness or liberalness in  
treatment of the evaluation factors. Once that discipline is  
established, if an evaluation comes in higher or the job is  
evaluated more liberally than the discipline would suggest by  
other evaluations, I would call that an over-evaluation. If the  
evaluation was more conservative than I would have expected  
compared with the overall consistency of the committee, then I  
would call that an under-evaluation.  
282. Willis felt it critically important the MEC provide a sound  
evaluation basis for the other committees to use as a frame of reference.  
Willis stated that the quality of the questionnaires used by the MEC was
higher than that of the questionnaires used by the other committees.
283. Willis requested Drury select benchmarks for the MEC, based on a  
broad representation of the depth and breadth of the organization. The  
JUMI Committee formally approved the criteria for selection at its July 10,  
1987 meeting. These criteria specify that benchmark positions would be  
representative of all occupational groups, different organizational levels,  
high population jobs, standard jobs, and a mix of male- and female-dominated
occupational groups in the total study population sample. As well, care  
would be taken to ensure that there was a sampling of specialized positions  
and that consecutive levels within a job series would be minimized.  
284. He also gave Drury another criterion for selecting benchmarks  
which was to pick questionnaires of the highest quality. Quality in this  
context, according to Willis, was completeness, definitiveness and factual  
content. Willis felt it was very important the MEC have the highest  
quality questionnaires.  
285. The MEC did not enjoy the luxury of receiving all the  
questionnaires beforehand and then selecting those to be used as  
benchmarks. Willis was instructed by the JUMI Committee to begin the MEC's  
work as soon as the first 50 questionnaires were returned. In fact, some  
questionnaires were still being returned when the MEC had finished its  
work. While Willis was satisfied overall that the MEC provided a good  
frame of reference, he could not say each of the criteria approved by the  
JUMI Committee for selecting the MEC's questionnaires was satisfied in  
selecting the benchmarks.  
286. At the beginning of the MEC's work, Willis functioned as the  
chair of the committee. After a period of time, Willis relinquished the  
role of chair to the MEC co-chairs who rotated on a weekly basis. The role  
of the chair was to facilitate the meeting, maintain a neutral posture so  
as not to influence the group, write the evaluations on the blackboard and  
lead the group through the consensus process. Willis spent some time with  
the co-chairs, coaching them as to what he was doing, and why he was doing  
certain things. It was about three weeks before they assumed this task.  
From that point on Willis sat in the back of the room as an observer and  
was called upon from time to time for interpretation. He also functioned  
as a facilitator during the "sore-thumb" or "interim review sessions"  
(another part of the process which will be explained later). He proceeded  
on that basis all the way through. Whenever it was time for a review  
session, he would take over from the group.  
287. After the MEC had completed its work, Willis suggested, for  
efficiency purposes, that a portion of the MEC benchmarks be designated  
"primary benchmarks". As the additional job evaluation committees began  
their work, they required access to the MEC benchmarks. Rather than having  
a complete set of all benchmark evaluations available to each evaluator,  
primary benchmarks were identified and provided to each individual  
evaluator. However, each evaluation committee was provided with one  
complete set of benchmarks.  
288. The selection of primary benchmarks was based mainly on expected  
frequency of use and on other factors such as different organizational  
levels, different occupational groups, and the inclusion of different  
factors which were most representative of the jobs evaluated. At Willis'  
request, each of the MEC members produced a list of benchmarks, which was
refined by Willis; in the end, approximately 100 primary benchmarks were
identified.
(ii). Multiple Evaluation Committees
289. Each of the remaining multiple evaluation committees had seven
members, with the evaluators equally divided between union and management
and one member functioning as either a management or union chair. Again,
Willis left the selection of these members to the parties. The Tribunal  
heard evidence from the Alliance and the Institute that care was taken to  
select individuals who were articulate, analytical, able to defend the  
evaluations and willing to work as a team. In terms of balance between  
the sexes, the Alliance attempted, without success, to recruit equal  
numbers of males and females. Their female evaluators were, however, often  
members of male-dominated occupational groups.  
290. Willis believes a mix of genders on a committee is important  
primarily because of perception. As he said, if a committee is all female,  
it could be viewed as a female-oriented study or might be perceived the  
other way if the committee was all male. Willis' experience is that if a
committee has "good" people on it, their gender is not important. Willis
considers the background of the members more important than the sex of the  
individual doing the evaluation.  
291. Willis had recommended that no Federal Public Service  
classification specialists be on evaluation committees. This recommendation,
however, was not followed. Seven evaluators nominated by the Employer had extensive knowledge
of the classification system in the federal government. They served on  
four evaluation committees and on the MEC. Willis' concern about  
individuals with classification background is they tend to bring what he  
refers to as "baggage" to the evaluations. Willis believes someone who is  
totally inexperienced will likely be more objective than someone with years  
of experience in classification.  
292. In this context, Willis described "baggage" as pre-existing  
knowledge and understanding of the relativities within an organization.  
For example, baggage includes assumptions about work which are probably
unconscious. He views baggage as biases based on incomplete information
from which hidden agendas could arise.
293. Everyone carries "baggage" of one sort or another, according to  
Willis. It can, nevertheless, be minimized with an open mind and an  
objective, fair attitude when applied equally to all jobs so as not to  
improperly influence an evaluation.  
294. Each of the initial five and subsequent nine evaluation committees
consisted of seven members at all times; however, in many instances, substitutions did
occur. The Tribunal heard direct evidence from 17 evaluators.  
295. There was testimony from one of the 17 evaluators, Christine  
Netherton, a member of the first version of Committee #1 (it functioned  
after the MEC was finished) concerning the element of baggage. One member  
of her committee had a classification background. Netherton testified this  
particular individual had difficulty appreciating other points of view  
because of her background in classification. When this kind of problem  
emerged, the committee would attempt to discuss it with the member.  
Failing a resolution of the problem, they would obtain the assistance of a
consultant.  
296. This problem was also identified by Willis with evaluators on  
Committee #3. The first formation of Committee #3 had numerous problems.  
Some of these can be attributed to the fact that certain of the management  
evaluators had former classification backgrounds. On the staff side there  
were evaluators committed to raising the scores of female-dominated  
occupational groups higher than was warranted. Willis described the way  
this committee functioned as "almost a standoff". Further details of the
problems in Committee #3 are canvassed in Willis' evidence in Volume 57, at  
p. 7090, line 12 to p. 7093, line 20:  
A. Number 3 had some individuals on it who, on the staff side,  
were people who seemed to be committed to having the jobs of  
people in female occupational groups up as high as they could and  
two of the three on the management side were former classification  
people and they seemed to be devoted to keeping them as much in  
line as they could. It was almost a standoff.  
The Chair of Committee #3 was a union representative and, while we  
counsel the chairs very carefully to take a neutral position --  
that is, the chair, for their own credibility and not to have an  
undue influence, should be very careful how they led or how they  
facilitated the groups -- this particular chair almost became a  
fourth union evaluator. Not that she actually evaluated, but she  
entered into discussions in a way that leaned toward the union  
side rather than taking a neutral posture.  
Of course, the chair has the opportunity of consensing and moving  
on. She would never move on until her side seemed to be well  
represented. This was one case where I felt that it was  
imperative that the chairperson be removed and I so recommended to  
the Joint Union/Management Committee.  
Q. And what happened with regard to your recommendation?  
A. Nothing. The management side supported my recommendation,  
but the staff side refused to go along with it.  
Q. So, what was the result of this standoff? How do you feel  
it affected the evaluation process within Committee #3?  
A. Interestingly enough, they tend to be a good match. One of  
the management side was a former classification manager and he was  
very forceful. It turned into a standoff in most cases.  
The problem was that they were evaluating slow and slower. While  
I would expect eight or nine evaluations a day, I was not able to  
get that sort of productivity from any of the committees. But  
this particular committee was evaluating two or three jobs a day  
and they were themselves becoming extremely frustrated. So, I  
felt that the exercise was detrimental not only to productivity  
but to the health and well-being of the members themselves.  
Q. You have mentioned two consequences of this standoff, being  
the health of the committee members and also the slow productivity  
rate. What is your opinion with regard to the actual evaluations  
performed by that committee?  
A. I can't say that we found any pattern of bias that grew from  
that committee. I am sure that we would have gotten a pattern if  
it hadn't been three on one side and three on the other. I think  
the evaluations, at least as far as we could determine, were okay.  
Q. What eventually happened to Committee #3?  
A. At the time that we expanded from five to nine committee  
[sic], we were able to remove the chair from the leadership role  
and place her as a voting member of one of the other committees,
one of the nine committees. Surprisingly enough, her attitude  
seemed to improve dramatically at that point.  
Q. What do you mean her attitude improved?  
A. In the opinion of the consultant sitting with this committee  
and in the opinion of the chair of the committee, she was handling  
herself more conscientiously than she had.  
Q. So, with regard to the evaluations performed within the  
subsequent committee she worked on, do you have an opinion to give  
on that?  
A. I didn't see a pattern of problem developing either with the  
individual or with the committee itself.  
297. The best pay equity evaluation results are obtained from having  
truly heterogeneous committees. The ideal profile of a job evaluation
committee in this kind of exercise is to have individuals with different
backgrounds, different experiences, approximately equal numbers of males and
females, individuals representing different unions and employees with
different functions representing different departments and organizational  
levels. Willis' goal is to obtain individuals who can be called upon to  
evaluate conscientiously and fairly, not an easy task.  
298. According to Willis, bias can occur in the use of job evaluation  
systems, not necessarily from the evaluation plan but on the part of an  
evaluator. This is the reason he thinks that the process itself is more  
important than the job evaluation instrument. A heterogeneous committee
cannot guarantee bias will not creep into a job evaluation process;  
however, with this kind of committee, there is a better chance of getting  
an objective result. As Willis states in Volume 29, at p. 3788, lines 18 -  
22:  
...we need people who are conscientious, who can be analytical,  
and who could be depended on to do their best to do what is right,  
rather than to protect their own particular field or area.  
299. The Tribunal heard evidence on the backgrounds, ages, positions  
held, skills, strengths and weaknesses of the members of the multiple  
evaluation committees. With some exceptions, the evidence generally  
supports Willis' criteria of a balanced committee. On the other hand, the  
evidence clearly indicates there were some evaluators who carried baggage,  
who had agendas, and could not be depended upon to evaluate jobs  
objectively. To the extent these evaluators may have affected the  
reliability of the results, we will review other procedural safeguards used  
by Willis in the evaluation process to determine how well the process  
worked.  
(iii). Process for Evaluation of Questionnaires  
300. Willis described the difference between what is commonly known as  
traditional job evaluation and pay equity job evaluation. Willis stated  
traditional job evaluation has been used since the early 1940s to evaluate  
primarily management jobs. Its purpose is to achieve some basis for  
applying pay differences among different levels of managers. On the other  
hand, Willis states pay equity requires comparisons of dissimilar jobs at  
all levels within an organization and in the market.  
301. The methodology tends to vary considerably between traditional  
job evaluation and pay equity job evaluation, although they both utilize  
evaluation committees. Routinely, in traditional job evaluation,  
committees are made up of managers, information is collected using job  
descriptions and interviews are conducted by consultants. The consultant's  
role becomes less intrusive once a committee is trained to evaluate. As  
Willis says, "the learning curve goes up rather dramatically."  
302. In pay equity job evaluation, Willis prefers committees that are:
(i) "balanced", comprising equal representation of males and females; (ii)
drawn from a cross-section of different organizational levels; and (iii)
representative of diverse backgrounds.
303. Willis further testified pay equity job evaluation committees  
have to be trained in how to look at a questionnaire and to analyze the  
importance of a job. During this process, they must submerge their
personal feelings about how jobs tend to fit together. Willis stated  
different problems are encountered in pay equity job evaluation than in  
traditional job evaluation. Primarily, problems arise in pay equity  
because of "people's feelings about job relationships."  
304. Willis finds evaluators are comfortable in the context of  
traditional job evaluation because of their general understanding of jobs,  
as, for example, a group of managers evaluating jobs in their own
organization. Thus, "people's feelings about job relationships" become less
important in that context than in a pay equity job evaluation process where  
the consultants have to actually get the evaluators to look at things  
differently than they ordinarily would.  
305. The Willis Process requires evaluators to evaluate independently
and prescribes a particular procedure which must be followed
during evaluations. The procedure may be described as follows: each
member of the committee reads the questionnaire on their own; then the  
evaluators discuss the information and raise questions about job content  
which Willis equates to the final step in the data-gathering process;  
during the discussion stage, Willis permits committee members to share any  
special factual knowledge about the nature of the work performed and the  
context in which it is performed; should any evaluator or committee require  
additional information, questions would be drafted at this time and sent to  
a reviewer; when the committee members have a common understanding of the  
facts, each evaluator is required to independently and confidentially rate
each factor pertaining to that position; subsequently, the consultant or
chair collects all of the evaluation slips which contain each individual
evaluator's rating and transfers them to a blackboard, thus giving the
committee a visual basis for making comparisons.  
306. There follows a discussion period in which the evaluators talk  
about their evaluation differences; if an individual has a slightly  
different rating for any given factor they are called upon to justify, in  
factual terms, their rating; Willis expects the other members of the  
evaluation committee to listen to the reasons of minority evaluators and he  
refers to this part of the process as the "consensus process"; he then  
permits individual evaluators to adjust the factors at this point, but only  
if they can demonstrate factual reasons for this adjustment; the consensus  
score is recorded and a rationale is prepared which explains essentially  
the reasons for the particular evaluation of each position, using criteria  
defined in the evaluation plan, and exemplified by the benchmarks.  
307. Although Willis advised the JUMI Committee it was only necessary  
for the MEC to prepare rationales, the JUMI Committee decided that the  
multiple evaluation committees should prepare written rationales as well.  
Problems arose in relation to the rationales as some were poorly written  
and difficult to decipher. Consequently, there were delays in their  
transcription. In the past, Willis has not used rationales because he does  
not consider them critical to the evaluation process, although he did
testify they can be helpful. Willis counselled multiple evaluation  
committee members to use the rationales as a guide but, in every case, he  
wanted the evaluators to return to the MEC questionnaire and read it,  
rather than relying on a rationale. It was Willis' opinion it was  
impossible to capture all the things that an evaluator would need to know  
in order to evaluate a position in a one or two page rationale.  
308. Either before or after reaching consensus, depending upon the
preference of the individual committee, the committee
looked at the MEC benchmarks and selected either similar or dissimilar jobs  
to ensure their ratings were consistent with the MEC benchmarks. If their  
evaluation scores were inconsistent with the MEC benchmark then the  
committee had to adjust its evaluation to accord with the benchmark  
evaluations.  
309. Willis stated it becomes fairly obvious, particularly to the  
consultants, when an individual demonstrates a gender preference during  
this process because it is difficult for an individual to provide factual  
information to support a preference based on feelings. Willis does not  
require unanimity for consensus, but requires a two-thirds agreement by  
members of the evaluation committee. Any evaluator in the one-third  
minority has an opportunity to persuade the group their rating is the  
correct one. As Willis stated, "...in the final consensus, we have to have  
at least two thirds of the people who feel the evaluation is right."  
(Volume 38, p. 4737).  
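By way of illustration only, the two-thirds rule Willis describes can be expressed as a short calculation. The sketch below is in Python; the seven ratings are hypothetical examples, not figures from the study record.

    # Minimal sketch of the two-thirds consensus rule described above.
    # The seven ratings are hypothetical examples, not study data.
    def has_consensus(ratings, proposed):
        # True if at least two-thirds of the evaluators support the rating.
        supporters = sum(1 for r in ratings if r == proposed)
        return supporters * 3 >= len(ratings) * 2

    slips = [152, 152, 152, 140, 152, 152, 161]   # one slip per evaluator
    print(has_consensus(slips, 152))              # True: 5 of 7 exceeds 2/3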
310. The evaluation committees tended to follow the evaluation process  
designed by Willis, that is, independent reading of the questionnaires,  
discussion among committee members to better understand the facts,  
individual rating of each subfactor, posting of individual ratings on a  
blackboard, arriving at a consensus and selecting appropriate MEC  
benchmarks.  
311. Willis testified a good committee possessing good job information  
can usually evaluate 8 to 10 questionnaires per day. On that basis, the  
JUMI Committee initially established 5 multiple evaluation committees, with  
the expectation that each committee would be able to evaluate approximately  
750 positions; however, productivity was much lower than originally  
anticipated for the MEC and the multiple evaluation committees.  
Consequently, in order to deal with the time delay and to solve other  
problems, Willis recommended and the JUMI Committee agreed on March 3, 1989  
to reform and expand the five multiple evaluation committees to nine.  
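The planning arithmetic in the preceding paragraph can be reconstructed roughly as follows. In this Python sketch, the 8 to 10 per day rates and the 750-position target come from the evidence; the derived totals and durations are illustrative only.

    # Rough reconstruction of the planning arithmetic described above.
    low_rate, high_rate = 8, 10            # questionnaires per committee-day
    positions_per_committee = 750
    committees = 5

    total_positions = committees * positions_per_committee   # 3,750 overall
    # At 8-10 evaluations a day, ~750 positions implies 75-94 working days.
    days_range = (positions_per_committee / high_rate,
                  positions_per_committee / low_rate)
    print(total_positions, days_range)     # 3750 (75.0, 93.75)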
312. Many of the problems observed by Willis and his consultants  
occurred with the initial five evaluation committees. The circumstances  
surrounding these concerns are now detailed.  
(iv). Training of the Multiple Evaluation Committees
313. The evaluators needed training in the use of the Willis Plan.  
Willis personally trained the MEC in October, 1987. Willis testified he  
was satisfied with the training of the MEC. (Volume 62, p. 7698).  
314. Training of the first five evaluation committees was undertaken  
by Willis and his consultants. Willis met with all five evaluation  
committees for the first day, and thereafter he divided the members into  
evaluation committees and assigned a consultant to each committee. When  
the five evaluation committees expanded into nine, all new committee  
members received individual training or, if it was a new fully constituted  
committee, then training was undertaken with the whole committee.  
315. One of Willis' goals in training a committee is to ensure comfort  
with the Willis Plan. His training usually consists of explanations of the  
Willis Process, and on-the-job training with his consultants until
evaluators become comfortable using the Willis Plan. Willis' approach is  
mostly "learned by doing". Training usually spans a two week period, and  
towards the end of the first day or maybe into the second day, Willis  
distributes a questionnaire and has the group go through an evaluation  
exercise. Willis instructs evaluators not to make assumptions about the  
work, and to look for facts when completing the questionnaire.  
316. Willis trains his own consultants in the Willis Process. Part of  
their training is directed at attitudinal problems relating to  
stereotypical work. This part of the training, with both consultants and  
committees, is informal. The perspective he conveys is to ignore whether a  
job is male-dominated or female-dominated. He discusses attitudes with his  
consultants and trains them to deal with attitudes in terms of examining  
pieces and components of a job, breaking a job down into a number of parts  
and examining the pieces without regard to the sex of the incumbent. The  
same method is then imparted by his consultants in training evaluation  
committees.  
317. Willis was asked to comment on a publication of the Ontario Pay  
Equity Commission (Exhibit PSAC-71), a commission established to assist in  
the implementation and administration of the Pay Equity Act (Ontario). The  
publication contained information on training job evaluation committees.  
Willis agreed in principle with the Ontario Pay Equity Commission's list of  
elements of appropriate training for an evaluation committee which include:  
information on the history of job evaluation; how salaries and wages were  
set in the past; pay equity and wage determination processes; how gender  
bias may enter into evaluation systems; trends in women's participation in  
the labour force; the rationale for pay equity; and specific mechanics of  
the system used by the organization in question.  
318. However, Willis prefers his approach which, over the past 20  
years, has been more pragmatic than the detailed criteria listed by the  
Ontario Pay Equity Commission. He says the following in Volume 209, at p.  
27088, line 23 to p. 27089, line 16:  
A. We have found through experience that the best way of  
dealing with differences in different kinds of jobs -- and,  
incidentally, there is no such thing as an all women's job or an  
all men's job any more. They are all some mix of men and women  
and there are all kinds of jobs.  
There are some features in men's jobs and women's jobs that are  
somewhat hidden and there is such a variety of kinds of jobs,  
particularly in the public sector, that our experience has been  
that we can best deal with it if we don't try to focus on men's  
work versus women's work at all, but, rather, focus on breaking  
the job down into factors and examining those factors without  
regard to whether it is a woman's job or a man's job, making sure  
that all of the hidden elements, whatever they are, are brought  
out.  
319. Willis' own training in gender sensitization was not achieved  
through any formal program but rather "came from the school of hard  
knocks." (Volume 209, p. 27168).  
320. On further cross-examination by Complainant Counsel, Willis  
agreed consciousness raising or gender sensitization "is not a bad thing".  
In response to expert evidence from Weiner and Armstrong on this topic,  
advocating the kind of training recommended by the Ontario Pay Equity  
Commission, Willis, relying on his experience, said it may be helpful to  
have sensitivity training of this type, but that it is not absolutely  
necessary. More particularly, he confirms this in his response in Volume  
209, at p. 27096, lines 9 - 17:  
Q. So you are satisfied, then, that you can do a pay equity  
study and do fair evaluations of jobs without the kind of training  
that is suggested by Dr. Armstrong and Dr. Weiner?  
A. I would say that we have ample experience in evaluating male  
and female jobs in cases where there has been no sensitivity  
training per se, but that the consultant's guidance is sufficient.  
321. The mechanism or safeguard Willis uses to ensure sound, reliable  
results, in the absence of more formal gender sensitivity training, is one  
of consultant participation. With the exception of three weeks, Willis  
personally observed the work of the MEC. During his three week absence, he  
was replaced by his consultant, Drury.  
322. There were five consultants working on this study, including  
Willis. When the first five evaluation committees began their evaluations,  
each committee had a designated consultant for training and consultation.  
(Volume 60, p. 7433).  
323. Willis testified the role of the consultant is to evaluate  
privately while the evaluation committee is doing its evaluations. Willis  
said initially the MEC would have a short period of time to discuss the  
particular job selected for evaluation. As was his usual practice, Willis  
would have his own list of questions which needed to be answered with  
regard to a particular questionnaire. If this information was not brought  
out by the MEC members then Willis would raise these questions himself.  
This function was performed by his consultants with the later multiple  
evaluation committees.  
324. Willis summarizes the role of the consultant as generally to  
serve as a group facilitator, to be a trainer, to answer the committee's  
questions about evaluation techniques, and at the same time, to observe the  
functioning of the committee and to maintain a finger on the pulse. When  
the consultants do their own evaluation of the job, they do not communicate  
to the committee the result of their evaluation. The purpose of these  
evaluations is to enable the consultant to track the committee evaluations.  
In Willis' opinion, the consultants have two advantages over the  
committees: (i) they are professional evaluators; and (ii) they do not  
carry any baggage.  
325. In Willis' opinion, a disadvantage of the consultant's role is  
that as "outsiders" they do not know the environment of the organization as  
well as the evaluation committee members and thus, do not know how  
differences are perceived within an organization. Willis also points out  
there is always the danger a consultant may be influenced by their  
knowledge of a job in another organization which may be similar but not  
exactly the same as a job within the study.  
326. The consultant is not only examining the factual basis each  
evaluator is using to justify their own evaluation, but is also examining  
what the committee is doing and, more importantly, ascertaining the
committee's rationale for what they are doing for each of the subfactors in  
the Willis Plan. The consultants exercise what Willis describes as an  
"empirical judgment" during this process.  
327. Willis testified the MEC was a good and effective committee.  
Based on his own observation and the information received from Drury, he  
was satisfied with the degree of consistency the MEC had in terms of its  
own discipline. His overall assessment of the quality of their efforts was  
they were evaluating based on facts.  
328. From his personal observation, he identified two individuals who  
seemed to be outliers. The MEC did not tend to be influenced by these  
individuals. Willis' conclusions on how the other members of the MEC  
received and reacted to the outliers' comments were based on the remaining
evaluators' reasons for evaluations and the overall consensus of the group,  
which were not affected by the outliers.  
329. With respect to the training of the multiple evaluation  
committees, Willis found at the conclusion of the first five days training,  
the majority of the members were barely comfortable with the system, but  
became more comfortable with the plan after two weeks of training.  
Individual evaluators who testified at this hearing also experienced an  
increase in comfort with the Plan as their work progressed.  
330. Willis recognized the need for constant vigilance in maintaining
an understanding of the plan so that evaluators would not revert to previous
evaluation judgments. As a result, Willis met regularly with the initial  
five evaluation committees to review problems and suggest solutions. In  
addition, the Willis firm prepared technical advisories, written  
explanations by the consultants, which answered questions posed by the  
committees concerning the interpretation of the technical aspects of the
Willis Plan.  
(v). Master Evaluation Committee's Evaluations  
331. Willis provided the Tribunal with his conclusions regarding the  
work of the MEC during the JUMI Study. He repeatedly maintained the MEC's  
evaluations were unbiased, he was comfortable with the MEC's work, the  
information the MEC was using was based on facts, and, in his opinion the  
MEC had done an excellent job.  
332. On several occasions, throughout the JUMI Study, Willis was asked  
to assess the quality of the MEC's ratings. In response to Respondent  
Counsel, Willis commented in Volume 75, at p. 9202, lines 14 - 23, as  
follows:  
I probably examined those 503 evaluations and the differences to  
death. It was over a six month period that I was continually  
challenged about them. Every time I reviewed them by myself and  
with other consultants, we came up to the consistent opinion that,  
while there was some differences, the Master Evaluation  
Committee's benchmarks were satisfactory, recognizing that this is  
not an exact science; it is an art.  
333. The approach adopted by Willis to validate evaluation results was
to re-evaluate selected questionnaires personally, or to have one of his
consultants do so. The first testing of the MEC evaluations was conducted by
the Willis consultant, Drury, in the spring of 1988. Willis testified the
purpose of Drury's review was not to validate the results of the work of  
the MEC, but to ensure the MEC evaluators knew how to use the Willis Plan  
and that they understood and were interpreting it properly.  
334. Willis was interested in whether the MEC evaluators were  
consistent among themselves. The MEC evaluators themselves wished to have  
a review as a double check while they were still learning the evaluation  
process. It was not made known to the Tribunal exactly how many MEC  
questionnaires Drury reviewed. Her review was done in the spring of 1988,  
and the MEC had been evaluating since the fall of 1987.  
335. For purposes of her review, Drury used only those questionnaires  
done when she was absent from the MEC discussions. In the end, she  
identified a total of 12 positions which she evaluated slightly differently
from the MEC. Drury's differences arose only with female-dominated
positions and the female evaluators on the MEC took exception to this fact  
and wrote a letter in protest. This letter was addressed to Willis and the  
Commission. Willis believes this controversy arose not so much because  
Drury was critical of the MEC's evaluations but rather because there was an  
appearance of singling out female jobs.  
336. Willis reviewed the 12 evaluations Drury identified. He  
considered Drury's assessments "sound" and communicated his findings to the  
Commission. In three of the twelve questionnaires there was
less than a 2.5 per cent difference in points between Drury and the MEC.
Drury then met with the committee once more. The MEC made some changes in  
view of Drury's re-evaluations, but seven out of the twelve were left  
unchanged. Of these seven, Drury deemed the MEC had undervalued two and  
overvalued five. Willis was not concerned with the small number of  
differences per se; he was more concerned with the fact that of the seven,
five were in one direction and two in another. This fact led him to
consider the possibility of a pattern of bias.  
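Willis' distinction between the size and the direction of differences lends itself to a simple illustration. In the Python sketch below, the seven signed percentage differences are hypothetical stand-ins; only the split of five over-evaluations and two under-evaluations reflects the evidence.

    # Sketch of the direction-of-difference concern described above.
    # Negative means Drury rated lower than the MEC (an over-evaluation);
    # the seven values are hypothetical placeholders, not study figures.
    diffs = [-12.0, -11.5, -10.4, -13.8, -10.9, 11.2, 14.6]
    over = sum(1 for d in diffs if d < 0)    # MEC over-evaluations: 5
    under = sum(1 for d in diffs if d > 0)   # MEC under-evaluations: 2
    # A lopsided split (5 versus 2) in one direction, rather than the size
    # of any single difference, is what suggested a possible pattern of bias.
    print(over, under)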
337. Willis again met with Drury and with the MEC. Of the five jobs  
Drury deemed over-evaluated, four were nursing positions. After discussing  
the content of these jobs with the MEC, Willis concluded the MEC's  
evaluations were satisfactory and he supported their original evaluations.  
When Willis discussed this situation later with Drury, she admitted her
past experience with nursing positions had been in the State of Connecticut  
and this had coloured her evaluations. Willis reasoned that nurses in  
Connecticut do not possess the same breadth of knowledge required from  
nurses in the Canadian system. Thus, in Willis' opinion, Drury was  
probably "a little off".  
338. In his final analysis of the re-evaluations, Willis did not  
believe the MEC had over-reacted to the Drury re-evaluations in any  
systematic way. He expressed this opinion in a letter to the management  
side co-chair of the JUMI Committee, Lise Ouimet. The letter was written  
on December 5, 1988 and reads in part:  
In the spring of 1988, we responded to a request from MEC to  
review and comment on the evaluations they had completed as of  
that time. Jan Drury reviewed the Committee's efforts and made  
recommendations regarding twelve evaluations. Of these, four were  
for total point adjustments of between 10.0 percent and 10.9  
percent, four were between 11.0 percent and 15 percent and one was  
slightly greater than 15 percent.  
The group reviewed her evaluations and explanations, both written  
and verbal, and changed their evaluations of two of the nine  
positions (including the one that showed a difference of slightly
[over] 15 percent), leaving seven that are different from Jan Drury's
evaluations by between 10 percent and 15 percent. Of these seven,  
Ms. Drury's evaluations were higher on two positions and lower on  
five positions. Five of these seven are nursing related  
positions.  
Comments by MEC members indicated that they believed there is a  
slight difference in the roles of Government of Canada nursing  
positions having specialty assignments than Ms. Drury's experience  
with nurses in the U.S. would suggest. For example, MEC gave more  
weight to #83 Staff Nurse-Sexual Offenders Unit's role in  
counselling of offenders than Ms. Drury did.  
I am not inclined to totally discount the MEC's judgment on this  
issue without more information and do not feel that these slight  
differences warrant concern. On the other hand, if you disagree,  
I suggest that these nursing positions be submitted to MEC for  
review including obtaining additional information regarding the  
significance of specialty assignments, and re-evaluated.  
(Exhibit R-35)  
339. Another procedural safeguard designed by Willis to address the  
issue of disagreements arising between the multiple evaluation committees
and the MEC was to permit and even encourage the multiple evaluation  
committees to submit their differences with explanations to the JUMI  
Committee. Willis had proposed in his plan that the MEC be reconvened to  
address these differences, and to either explain their evaluations in a  
more comprehensive manner or to adjust their evaluations to conform with  
the results of the multiple evaluation committees. He believes this is a  
vitally important exercise.  
340. Willis explained once a committee has evaluated a number of jobs,  
they develop a sense of confidence in their own ability to evaluate and  
inevitably there will be minor differences in how a job is perceived.  
Willis said one approach would be to tell evaluation committees they would  
have to adopt the discipline of the MEC regardless of any disagreement.  
But Willis desired more open communication. If the multiple evaluation  
committees were not comfortable with an evaluation by the MEC, Willis felt  
they had an obligation and a right to note these differences, and that the  
MEC should review any challenges brought forward by the evaluation  
committees.  
341. Pursuant to the above, a total of 48 challenges to the MEC  
evaluations were brought forward by the multiple evaluation committees.  
There was disagreement within the JUMI Committee as to whether or not the  
MEC should be reconvened to review these evaluations. At one stage, the  
consultants were asked to independently review approximately 33 adjustments  
suggested by the evaluation committees. Willis testified that in two-thirds
of the cases, the differences were so nominal that they were hardly worth
considering, and those cases were discussed individually with the
evaluation committees. Willis believed there were about 14 remaining  
questionnaire evaluations requiring review by the MEC.  
342. Willis did not wish to say what the change ought to be because he  
did not want to "second guess" the MEC. There were 14 he identified as  
"problematic questionnaires", suggesting a possible problem or at least  
enough doubt that they ought to be revisited.  
343. Willis felt strongly the MEC should be reconvened to put to rest  
the differences in interpretation between the MEC and the multiple  
evaluation committees. It was ultimately decided by the JUMI Committee  
that the MEC would not be reconvened, instead a smaller version of the MEC  
(the "Mini-MEC") was created. The Mini-MEC was composed of a small number  
of former MEC members. They were three in total: Willis; Joanne Labine, a
union representative; and Michel Cloutier, a management representative.
These latter members were previously identified as "outliers" on the MEC.  
Both outliers, Labine and Cloutier, had been identified by Willis in his  
direct observation of the MEC and that conclusion had been confirmed by  
statistical analysis conducted by an independent statistician. It is one  
of the incomprehensible decisions of the JUMI Committee.  
344. Not surprisingly, Willis questioned the JUMI Committee's decision  
to select those two outliers and suggested choosing two other individuals.  
According to Willis, "they stone-walled me" and he lacked the authority to  
overrule the JUMI Committee's decision. He felt the two outliers were ill  
prepared to represent the JUMI Committee because of gender bias in their  
original evaluations.  
345. The Mini-MEC considered the consultant's recommended changes to  
the challenged benchmarks. The union representative agreed with the  
consultant and the management representative rejected all of Willis'  
recommendations.  
346. The Mini-MEC then suggested three options to the JUMI Committee  
which were made without Willis' consultation. These options included:  
OPTION 1  
It is proposed that MINI-MEC review the consultants'  
recommended changes to MEC bench marks (33).  
-Should the two MINI-MEC members agree, the challenged bench  
marks and rationales will be amended.  
-Should MINI-MEC after consultation with N.D. Willis or  
Jane [sic] Drury can not arrive at a decision, than  
MINI-MEC and the consultant will determine whether it  
is in the best interest of the study to remove the  
bench mark(s) in question.  
OPTION 2  
It is proposed that challenged MEC bench marks not be  
amended. In cases where MINI-MEC agrees that the  
rating is an inconsistent one, then the bench mark(s)  
in question(s) [sic] would be removed.  
This proposal is based in part on our opinion that it is  
to[o] late to attempt to change bench marks and have the  
committees adjust their rating patterns.  
It is to be noted that the two above options would necessitate re-  
sore thumbing in situations where a change to or the removal of a  
MEC bench mark is effected.  
OPTION 3  
It is proposed that should a situation arise where a  
committee is unable to reach a consensus on a rating, that  
the questionnaire be referred to MINI-MEC for resolution.  
It is also recommended that further challenges to MEC bench  
marks not be accepted.  
347. The JUMI Committee selected Option 2. As might be expected, the
Mini-MEC could not agree which benchmark rating was inconsistent and, as a  
result, none of the benchmarks was removed. Although disappointed, Willis  
did not feel the integrity of the process was invalidated. At this stage  
he believed the evaluations were "intact" and reasonable.  
348. Wisner performed the re-evaluations of the challenged benchmarks  
and suggested an average 4.2 per cent overall increase for the 14  
benchmarks under review. The purpose of the consultant's review was to  
determine whether or not the differences were representative of a pattern  
of bias. The analysis by Willis did not demonstrate a pattern of bias, and  
Willis felt he could live with the JUMI Committee's decision not to recall  
the MEC. Willis did not believe the overall differences identified by this  
analysis between the consultants and the committee had a material adverse
effect on the study.
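The 4.2 per cent figure is an average over the 14 benchmarks under review. The following Python sketch shows the computation; the individual adjustments are hypothetical, and only their count (14) and mean (4.2 per cent) reflect the evidence.

    # Minimal sketch of the averaging behind the suggested 4.2 per cent
    # overall increase. The 14 per-benchmark adjustments are hypothetical;
    # only their count (14) and their mean (4.2) reflect the evidence.
    adjustments = [6.1, 3.2, 5.0, 2.8, 7.4, 1.9, 4.6,
                   3.8, 5.5, 2.2, 6.7, 3.1, 4.0, 2.5]   # per cent increases
    average = sum(adjustments) / len(adjustments)
    print(round(average, 1))   # 4.2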
349. Willis testified Wisner's analysis illustrates the percentage  
difference between Wisner's and the MEC's approach. He stated the
consultants tried throughout the study to refrain from imposing their  
evaluations on the MEC. Willis was asked when he would impose the  
consultant's evaluations on the committees. He responded in Volume 57, at  
p. 7053, line 14 to p. 7055, line 10:  
THE CHAIRPERSON: Maybe you can tell us: Where do you draw the  
line between when you strongly make a recommendation or you  
strongly suggest or you advise? Where do you draw the line in  
terms of saying to the Committee, "This should be done", or do you  
ever do that?  
THE WITNESS: Yes. First of all, the consultant who is  
meeting or sitting with the committee will be privy to the  
questionnaire information and the discussion about the job. If we  
find at that time that people are not talking about the facts and  
are apparently not using the facts fairly and equitably, we would  
raise the question with the Evaluation Committee itself as it is  
proceeding.  
On the other hand, after the fact, looking at a series of  
evaluations, we might disagree slightly with the committee, but  
our concern would be whether or not that difference might be a  
difference of honest interpretation by the committee, it might be  
a difference between what the consultant knows about that kind of  
a job -- and we can be a little bit misled ourselves as  
consultants -- as opposed to the committee, which may have a  
better handle or feel for the content of jobs in their  
organization.  
If we identified a pattern that seemed to be resulting, then we  
would take very strong steps. Recognizing that these are value  
judgments, we have to have some tolerance. Just because we come  
out with an average of five or six per cent more overall than the  
committee, that doesn't necessarily mean that they are five or six  
per cent wrong. But our concern would be more: Is there a  
pattern here or is there a random difference? If it is a random  
difference, then we are not at all concerned, unless there is a  
possibility that they are misunderstanding how to use the  
instrument itself.  
THE CHAIRPERSON: Why were you strongly advising that MEC  
reconvene?  
THE WITNESS: Just because of the psychology of the  
committees, they felt very strongly about this. Even though there  
may be a very slight difference in a job, they feel uncomfortable  
if they haven't at least had a hearing.  
350. According to Willis, stemming from the JUMI Committee's decision  
not to reconvene the MEC, a considerable amount of frustration was  
experienced by the multiple evaluation committee evaluators. The  
consultants were obliged to tell the evaluators the JUMI Committee had made  
a policy decision and that there would be no changes in the benchmarks  
resulting from the committee challenges. Willis suggested that the  
committees try to work around them. He believes many evaluation committees  
tended to take alternate MEC evaluations for comparison with their own  
evaluations and tended to ignore the MEC evaluations which had been  
challenged.  
351. Willis indicated the level of frustration was highest when the  
evaluation committees expected and were waiting for the MEC to reconvene.  
When they were informed this was not going to happen they became more  
resigned.  
(vi). Multiple Evaluation Committees' Evaluations
352. Some of the first five evaluation committees tended to negotiate  
rather than cooperate in trying to achieve a consensus. Willis found the  
evaluation committees tended to balance each other fairly well, but the  
obvious result was lower productivity. In the initial formation of the  
five evaluation committees, Committee #3 was more contentious and less  
productive than the others. Willis trained Committee #3 and led them for  
the first three weeks of their evaluations. He met with the chair weekly  
to try to work with her, as he considered her part of the problem. He sat  
with this committee a great deal of the time, working with them and  
monitoring their evaluations. He was better acquainted with this committee  
and their problems than with any other committee. Willis observed  
Committee #3 had individuals on the staff side who seemed to be committed  
to rating jobs from female-dominated occupational groups as high as they  
could and two or three on the management side, some of whom had former  
classification backgrounds, who were devoted to keeping the jobs as much in  
line as they could. He described this as a "stand-off".  
353. Willis testified the chair of Committee #3, a union  
representative, sometimes assumed the role of evaluator and entered into  
the discussions in a way he believed was inappropriate for the chair. The  
proper role of a chair is to assume a neutral posture and to facilitate  
committee discussions. Willis subsequently recommended to the JUMI  
Committee that the chair of Committee #3 be removed and he eventually  
recommended the entire committee be disbanded. However, the JUMI Committee  
rejected his recommendations and nothing happened to improve Committee #3's  
situation until the reformation and expansion into the nine committees.  
354. Willis described a good functioning evaluation committee as a  
team working together with each member trying to evaluate fairly and  
equitably. His discomfort with Committee #3 was not with the actual
evaluation ratings but rather with the manner in which they evaluated. This
committee would debate until finally they would agree due to exhaustion.  
355. Willis did not believe the "stand-off" he described between  
management and union evaluators on Committee #3 negatively affected the  
evaluation process in that committee. Neither he nor his consultants could  
detect any pattern of bias in Committee #3 evaluations. However, as a  
consequence of the "standoff", some committee members experienced health  
problems and the productivity rate suffered noticeably.  
356. Willis also had a problem with the initial formation of Committee  
#4. He testified Committee #4 was an excellent committee from its  
inception to about March of 1989. However, in the latter stages, due to  
substitutions and the reforming of this committee, problems developed. In  
April of 1989, Willis requested Committee #4 undergo a final sore-thumbing  
exercise. During this exercise the chair of this committee came to him,  
almost in tears. Willis testified she said, "I can't handle this any more.  
It has all broken down, they are all getting emotional, they are yelling at  
each other. We have a job to do and I quit." In the JUMI Committee  
minutes of October 31, 1989 (Exhibit R-44), Willis remarked, with regard to  
the consultant report on Committee #4, the "major problem with Committee #4  
was its lack of objectivity, creating the disastrous consequence of two  
camps, separate agendas, and arbitrary and opposing viewpoints."  
357. At this point the committee had evaluated 52 jobs. Willis then  
requested that the remaining committee members state in writing their  
individual concerns about the evaluations and suggest any changes which  
they thought were necessary. He then disbanded the committee.  
Subsequently, a Willis consultant, Robert Barbeau, reviewed the specified  
concerns, made recommendations and was asked to take appropriate action.  
The committee members made suggestions on a total of 25 jobs and there was  
only one in which the consultant differed significantly from the committee  
members. Willis described this as one instance where there was consultant  
influence on the evaluations, albeit a small amount.  
358. Willis did not observe any problems occurring with Committee #1  
or #2 during the initial formation of the five evaluation committees.  
359. Willis' observations of Committee #5 were that the evaluators  
tended to be extreme, on one side or the other, but not as extreme as  
Committee #3 and their productivity "tended to move along". Willis  
identified one female union representative demonstrating a female  
preference and two male management representatives demonstrating a male  
preference. Willis further testified a female union representative also  
demonstrated a male preference. Willis found two of these evaluators, a  
female union representative and one of the male management representatives,  
tended to cancel each other out. Willis observed that the other members of  
the committee were not influenced by these two individuals and tended to  
discount their positions.  
360. Willis further found the evaluations generally produced by  
Committee #5 to be "pretty good". He identified two members of the  
committee as outliers but, notwithstanding, later recommended they become
chairs of the expanded nine committees because he considered them to be
good evaluators.  
361. The Tribunal heard evidence from three evaluators who were  
members of Committee #5. Each confirmed this committee's thoroughness in  
discussing jobs and diligence in completing their task. Their evidence  
further corroborates Willis' view that the outliers did not influence the  
consensus of the committee.  
362. There was evidence provided by two evaluators who were members of  
the first version of Committee #5, to the effect that the questionnaires  
discussed in this committee were difficult. One of these evaluators, Mary  
Crich, explained the committee's long discussions resulted from very
difficult male jobs.  
363. Pauline Latour, another evaluator on Committee #5, states in  
Volume 171, at p. 21604, lines 20 - 25:  
A. We had a difficult -- the questionnaires in Committee 5, I  
have a sense that they were more difficult to evaluate. There  
were many that we seemed to have unanswered questions. So, we  
definitely returned more questionnaires in Committee 5.  
364. Latour further elaborates on this point in Volume 171, p. 21605,  
line 12 to p. 21606, line 9:  
Q. You mentioned just a short time ago that there were some  
jobs that you recall as being more difficult to evaluate than  
others. Could you describe which of these -- give us perhaps some  
examples of the types of jobs that as a committee you found more  
challenging than others.  
A. This perhaps is going to be a bit of a convoluted answer,  
but, for example, the jobs that we were comfortable with were jobs  
that we had rated many similar positions. For example, we  
evaluated many secretarial jobs which were evaluated at quite a  
range, from typists to senior executives. We had a good  
understanding of the nature of the work.  
There were some positions where we evaluated basically one or two  
jobs that were related and we never had a sense of how that job  
actually fit in the section that that person worked in. So,  
because they were so unrelated, there were quite a few positions  
that were unrelated, we really had a difficult time just grasping  
the level of complexity of that position.  
365. The Tribunal heard direct evidence from 15 witnesses, who were  
evaluators on Committees #1, #2, #3, #4, #5, #8 and #9, about their  
experiences and perceptions while serving on their respective committees.  
Evidence about Committees #6 and #7 was provided by Willis and another  
Willis consultant, Owen. Neither one expressed any serious concern about  
what they observed on either of these committees.  
366. In terms of the direct evidence from these 15 witnesses, the  
Tribunal was impressed with their individual level of commitment to the  
study. Although job evaluation is a systematic process that is mentally  
challenging, the fact remains these individuals endeavoured to achieve a  
consensus evaluation for each position, eight hours a day, five days a  
week, over long periods of time. Willis observed variations in the  
productivity of the committees. The productivity record, based on a total
of 3,185 questionnaires, is as follows: (i) Committee #1 - 466 evaluations;
(ii) Committee #2 - 431 evaluations; (iii) Committee #3, before the
expansion to 9 committees - 165 evaluations; (iv) Committee #4 - 200
evaluations in its first version, 52 in its second version and 160 after
the expansion to 9 committees; (v) Committee #5 - 430 evaluations. After
the expansion to 9 committees: (vi) Committee #6 - 197 evaluations; (vii)
Committee #7 - 149 evaluations (a francophone committee); (viii) Committee
#8 - 150 evaluations (also a francophone committee); and (ix) Committee #9
- 145 evaluations.
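For verification, the listed productivity figures can be tallied in a short Python sketch. As listed, the counts sum to 2,545; the record quoted here does not break out the source of the remaining 640 of the 3,185 questionnaires.

    # Tally of the productivity record as listed in the paragraph above.
    counts = {
        "#1": 466, "#2": 431, "#3 (pre-expansion)": 165,
        "#4 (first version)": 200, "#4 (second version)": 52,
        "#4 (post-expansion)": 160, "#5": 430,
        "#6": 197, "#7": 149, "#8": 150, "#9": 145,
    }
    # The listed figures total 2,545; the balance of the stated 3,185 is
    # not broken out in the record quoted above.
    print(sum(counts.values()))   # 2545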
367. Given his experience in previous studies, Willis expects a  
certain amount of conflict within an evaluation committee because of the  
different backgrounds and perspectives of the various evaluators. However,  
Willis testified the degree and nature of the conflict he observed in this  
study within the evaluation committees made him feel uncomfortable.  
368. Some of the problems which arose in the multiple evaluation  
committees had been anticipated by the JUMI Committee. The Testing Sub-  
Committee of the Willis Evaluation Plan, in its report of July 20, 1987  
(Exhibit HR-11A, Tab 19), made recommendations in response to problems  
experienced by this sub-committee during a two week trial period. Some of  
the problems included personality conflicts, weariness owing to constant  
concentration and stress in being seconded from their regular jobs for long  
periods of time. As a consequence of this experience, the sub-committee  
recommended the rotation of evaluation committee members between evaluation  
committees, working a shorter day or week and the utilization of alternate  
members to replace designated committee members for periods of time. These  
recommendations were never acted upon by the JUMI Committee and the  
Tribunal was not provided with reasons for the rejection of these  
recommendations.  
369. The evaluators testified they experienced tension as committee  
members, stress in reaching a consensus, personality conflicts,  
inflexibility on the part of some individual evaluators, difficulties with  
some chairpersons and screaming by some evaluators. In some instances  
evaluators walked out of evaluation meetings because of frustration.  
Compounding these problems was the frequent rate of substitutions of  
members for some committees. This resulted in a change of dynamics  
requiring adjustments by both new and older members.  
370. Coupled with these problems was a rigid working environment
orchestrated and controlled by the Chief of the EPSS, who was apparently  
more flexible with management evaluators than union evaluators. The Chief,  
Pierre Collard, closely monitored the arrival and departure time of the  
evaluators, the lunch breaks and the coffee breaks. He insisted doors  
remain closed at all times during deliberations (causing ventilation  
problems), limited access to telephones, and kept all supplies in locked  
compartments (thus creating time delays for obtaining supplies). These  
very stringent constraints intensified the frustrations already experienced  
by committee members. Moreover, some evaluators were "from out of  
province" and found it difficult to wait for long periods to be reimbursed  
for their travel expenses. This issue, in particular, was not resolved in  
a timely fashion.  
371. Many evaluators who testified at this hearing expressed both a
willingness to adhere, and an appreciation of the necessity of adhering, to
the MEC benchmarks, as well as to the requirement that evaluations be based
upon facts contained in the questionnaires and not on any other extraneous
considerations.
372. The only criticism the Tribunal heard concerning the committees'  
willingness to follow the MEC discipline was that for a short period of  
time, early in the evaluation process, Committee #1 tended to follow its  
own discipline rather than that of the MEC. This problem was corrected as  
soon as it had been identified by the Willis consultants.  
373. In the early part of 1989, Willis began to express to
the JUMI Committee his concerns, not directly related to the actual
evaluations themselves, but concerns regarding "circumstantial things"
which had transpired. He referred to these incidents as "smoke" because  
they were largely rumours and included incidents which occurred both inside  
and outside the committee rooms. He became increasingly uncomfortable with  
how the evaluation committees were working and with what he described as  
confrontations between union and management sides. Although he could not  
identify anything specific which would suggest gender bias was developing,  
based on his own observations and those of his consultants, he knew "some  
things were happening" and some improper attitudes were developing causing  
him a great deal of concern.  
374. On several occasions, while Willis sat with a committee, it  
became clear to him a position taken by a particular evaluator was very  
biased. Usually, the individual evaluator refused to change the score,  
even though lacking the facts upon which to base a rating. The frequency  
of these occasions began to disturb Willis. It was occurring, he observed,  
on both union and management sides, and arose more frequently among the members of the initial five evaluation committees.
375. Willis said that he was made aware of union members attempting to  
recruit other evaluators to their "bloc". He had not seen this phenomenon  
in any other evaluation study in which he had been involved. Willis did
not observe directly any incidents regarding this recruitment. He was  
informed, however, by Owen, of an incident in which an evaluator approached  
another evaluator about the evaluations. Owen testified about the  
circumstances surrounding this incident which occurred in February, 1989.  
Owen testified he overheard a conversation between two female evaluators  
who entered a room in which he was working. He overheard one female  
evaluator say to the other "we don't think you're doing enough for women's  
jobs." According to Owen, the other evaluator became agitated, her voice  
increased in loudness and he heard her reply "I didn't come here to build  
up some kinds of jobs. I came here to do an honest job of evaluating the  
work."  
376. Owen further testified he observed a sort of "faction-based"  
behaviour in the committees. There were some union evaluators who seemed  
to be treating certain jobs in a similar way to union evaluators on other committees. He identified them as Alliance members. What troubled Owen was that in his prior experience, which involved training and facilitating more than 50 evaluation committees, he had not observed any similar behaviour. He also noticed unusual scoring, long discussions advocating a
particular choice, and the selection of benchmarks inappropriate to the  
particular evaluation at hand. In another incident, during the initial  
formation of the five evaluation committees, Owen was asked to chair  
Committee #3, because the regular chair was participating as an evaluator  
elsewhere. When the chair returned to the room, a very contentious  
argument concerning an evaluation was taking place. The chair asked Owen  
to rule on how to proceed and asked for points of order similar to Robert's Rules of Order. Owen was completely unfamiliar with Robert's Rules of Order
and was thus unable to give an appropriate response. The chair's reaction  
was to order and instruct the Alliance evaluators on this committee to walk  
out, which they did, slamming the door as they left. Owen viewed this  
unhappy incident as an attempt by one side to control that particular  
committee.  
377. Like Willis, Owen felt frustration at having no opportunity to intervene or take action.
378. Another incident noted by Owen occurred during the fall of 1988.  
Most of the Alliance members did not attend their committees on a  
particular day as they had designated it the day for a "sick out" to  
demonstrate their support for pay equity issues at the collective  
bargaining table. Apparently, collective bargaining was under way and  
there was some discussion among union members as to whether the proposals  
on pay equity would be withdrawn from the bargaining process. Two Alliance  
members who did not attend the sick out, told Owen that they were concerned  
about reprisals from their union for not having participated in the sick  
out.  
379. Among the committees, Willis felt the conflict was too much "us  
versus them". Willis confirmed he had never seen so many participants with  
a classification background in a pay equity study and this was "an  
important aspect in the conflict in this case."  
380. Willis testified that if the Federal Public Service were his
organization and he had control over the evaluation process and decision  
making authority, he would have made some changes and continued with the  
study. His preference would have been to remove the personnel creating  
problems and engage more consultants to work closely with each committee.  
381. Willis' expert opinion is that gender bias can operate very  
subtly in a pay equity study, and he felt in order to defend the results,  
he had to reassure himself there were no problems with the evaluations.  
Willis was not sure the actual problems which existed resulted in biased  
results. He stated in Volume 69, at p. 8654, lines 8 - 14:  
I have mentioned that there was an interesting contradiction. I  
had some very strong concerns about attitudes, things we observed.  
However, when we attempted to look at what committees' results  
were and when we tried to look at comparison of similar jobs, we  
were not able to detect a clear pattern of a problem.  
382. During the work of the first five evaluation committees, Willis testified that, based on his observations, he identified ten
evaluators who he believed were exhibiting gender preferences. According  
to Willis, the majority were exhibiting a female preference. His approach  
in dealing with this problem was to counsel these individuals. At this  
stage, Willis could not determine whether the identified evaluators were  
influencing the group evaluations. He was concerned he had no evidence  
other than his personal observations for support. He alerted his  
consultants who were already aware of the particular individuals. He and  
his consultants continued to track these individuals and to look at the  
results of the evaluations overall.  
383. Another approach of the consultants was to break the evaluations down by occupational group, determine whether these individuals were influencing the group, and then attempt to identify, on an overall basis, whether there appeared to be any problems with bias. This tracking did not seem to indicate any significant bias.
384. When counselling individual evaluators, Willis would sit with  
them in a private room and discuss their evaluations and what changes he  
expected from them. Willis testified he did not see any difference or  
change in the evaluations of the individuals after they received  
counselling. Willis was informed during counselling of some management  
evaluators that they were evaluating to offset evaluations on the part of  
the union evaluators. For the most part, Willis did not receive denials  
from any of the evaluators whom he counselled as to their behaviour.  
385. Throughout the study, Willis also conducted committee  
counselling. He observed an evaluation committee as they were evaluating.  
In his interventions, he attempted to direct the evaluators to the facts,  
to look at the questionnaire and discuss the actual position rather than to  
make assumptions or stereotype. As to the effect of committee counselling  
Willis said the following in Volume 57 at p. 7087, line 9 to p. 7088, line  
5:  
Q. With regard to the last type of counselling you just gave  
for the evaluation committees as a group. I had already asked you  
as to your opinion of the efficacy for the individuals. Now I  
want to know what your opinion is with regard to how well the  
counselling of the evaluation committee groups worked?  
A. That is a little hard to say. These committees were  
somewhat unusual compared to most committees I work with, in that  
I was not observing actual evaluation bias or any pattern that I  
could identify. On the other hand, I did not have committees that  
were all working together to accomplish a fair, equitable,  
conscientious result.  
What I had in many committees were the staff on one side and the  
management on the other side and they were at loggerheads. This  
was a pattern that was not universal, but we found it on several  
committees. The extent to which our counselling affected them, in  
some cases, was negligible. [emphasis added]  
386. During later testimony, Willis was asked to explain what he meant  
in the above excerpt by the words "I did not have committees that were all  
working together to accomplish a fair, equitable, conscientious result."  
Willis explained his reference is primarily to the word "conscientious".  
To him this word suggests an employee is working hard and meeting their own  
personal standards. In this context, Willis testified every individual he  
observed on every committee was evaluating conscientiously. On the other  
hand, the consultants attempted to instill a standard by which every job  
would be treated fairly, objectively and impartially. In that context,  
Willis said he observed evidence, which was not pervasive among all  
evaluators and committees, that this standard was not being consistently  
applied.  
387. The testimony from the participating evaluators who were asked  
about how they personally approached evaluations was that they were honest,  
dedicated and conscientious. They observed the same commitment from most  
of their committee members.  
388. Specific questions were posed to the evaluators who testified about Willis' concerns, referred to as "smoke". The questions concerned rumours that some committees were "block voting", meaning all of the union evaluators would vote together to obtain the same score for subfactors and all of the management evaluators would likewise vote together to obtain their own common score. The questions also concerned other methods of communication, including the use of sign language and hand signals to indicate how specific evaluators were scoring so as to influence decisions.
389. None of the evaluators who testified observed this kind of  
behaviour or any other kind of organized communication designed to over-  
evaluate female jobs and under-evaluate male jobs. Apparently, hand signals had been discussed in a social setting, in what one witness believed was a reaction to frustrations expressed about the difficult process of job evaluation. The suggestion was made and received in a joking manner.
390. The Tribunal heard direct evidence of three separate incidents of  
inappropriate behaviour. The first incident was the conversation overheard by Owen, referred to earlier; both evaluators involved were female representatives of the Alliance. An evaluator on Committee #4
testified she was approached by another evaluator on her committee  
concerning the subject of whether or not she was evaluating female-  
dominated jobs fairly. This witness had the impression that this  
individual wanted her to increase her ratings. The witness testified she  
responded by saying she was there to evaluate fairly and to the best of her  
ability in comparison to all of the jobs. As far as the witness was  
concerned, this was the end of the incident.  
391. The second incident also occurred between two female Alliance  
evaluators. The witness testified she was approached by another evaluator  
who wanted to meet her outside of the committee room to discuss how to  
evaluate jobs. The gist of the meeting was that the second evaluator wanted the first evaluator to favour female-dominated jobs with higher ratings, in the same way as she did. The first evaluator felt this was not an objective
approach and told the second evaluator that her ratings would continue to  
be objective.  
392. With regard to the first and second incidents respectively, both witnesses testified the incident did not have any impact on their manner of evaluating. The evidence is clear that the individual who made the request in the first incident was noted by her committee for her biased ratings, which the committee had endeavoured, unsuccessfully, to change. Since she refused to change, she was basically ignored by the rest of her committee.
393. The third incident involved a female, Institute evaluator. This  
evaluator testified there was a social gathering in her hotel room  
involving about 10 or 15 evaluators. A conversation occurred later in the  
evening between this evaluator and four other evaluators from the Alliance.  
The Institute evaluator testified she had been advocating an objective  
point of view for doing evaluations and two of the Alliance members became  
very aggressive toward her. Their response was the study was an  
opportunity for women to have something done for them, and nothing was  
going to get done unless women's jobs were evaluated higher and the study  
was their last chance. The Institute evaluator testified "things then got  
a little too personal." Another Alliance witness who testified described  
this incident as a verbal attack on the Institute evaluator.  
394. With regard to this third incident, the Institute evaluator  
assumed the individuals who confronted her in her hotel room were in a  
position of authority vis-à-vis the Alliance and could call meetings and
influence other Alliance evaluators. At the time of giving her testimony,  
she admitted she no longer had any basis for this belief and no longer felt  
there existed a common understanding among Alliance evaluators to act  
dishonestly.  
395. Willis recalled he had discussions about problems in the
evaluation committees with the Mini-JUMI, a sub-committee of the JUMI  
Committee. This sub-committee was formed to handle procedural problems of  
the evaluation committees. Willis testified he discussed with two of its  
members, Gaston Poiré and Elizabeth Millar, some of the evaluators he felt
were creating problems. Willis suggested certain individuals be eliminated  
from the evaluation committees. He testified he did not get the active  
support he expected. As a result, the JUMI Committee reassigned problem  
individuals when the committees expanded from five to nine. According to  
Willis, after the committees expanded, some committees worked well and some  
still had problems but not to the same extent as the initial five  
evaluation committees. He stated "Nothing was worse than the original  
Committee #3." In his estimation, it was at the bottom of the barrel and  
after that it "got better." (Volume 69, p. 8653).  
396. Willis regarded what was happening in the evaluation committees as unacceptable. He concluded he needed to conduct a further, more in-depth analysis of the results if he was going to be able to support the outcome of the study. Although he had not identified gender bias in the evaluations by January and February of 1989, he said in Volume 58, at p. 7229, line 13 to p. 7230, line 3:
A. I think the one thing that characterized the whole study,  
the equal pay for work of equal value charge, was to evaluate a  
broad range of positions on a gender neutral basis. I think  
everything we did in terms of the process that we set up and the  
evaluation system that was used, the way we tried to work with the  
groups, was all aimed primarily at avoiding any evaluations that  
would suggest traditional relationships, or in any way any bias  
that could be identified as gender bias.  
I feel that at all stages in this study it was paramount that we  
continue vigilance and continually reinforce the need for  
objective, fair, equitable evaluations of any and all kinds of  
positions.  
397. A letter dated May 4, 1989 to the JUMI Committee co-chairs from  
Scott Gruber, a Willis consultant, contained a recommendation that a  
special analysis of evaluation committee results be undertaken. The letter  
reads in part:  
This letter describes our proposal for a special analysis of  
evaluation committee results, which we believe is timely and  
appropriate. The question to be addressed is:  
Have the evaluations of the five evaluation committees  
(#1 through #5) been consistent with the evaluations  
generated by the MEC?  
...  
The methodology for this analysis will be as follows:  
1. A sample would be selected randomly from the evaluation  
result of each committee. The sample size will be 10% of  
the positions evaluated, with a minimum of 25 per committee.  
This latter provision allows for a reasonable examination of  
the efforts of low productivity committees. Using these  
guidelines, the total sample will be approximately 140  
positions.  
2. A Willis consultant, familiar with the MEC evaluations, will  
examine each of the 140 questionnaires and make comparisons  
with appropriate or corresponding MEC questionnaires.  
3. Based on this examination the consultant will then assess  
the soundness of the final, post-sorethumb consensus  
evaluations from the five committees together with their  
selected MEC benchmark questionnaires. Problems and trends  
will be identified, by committee and for the entire group.  
4. Gender domination information will be obtained for positions  
in the sample at this stage. Additional analysis will  
identify whether any committee, or committees, exhibited  
tendencies regarding male or female dominated groups in  
their final results. Other variables besides gender could  
also be included in the analysis at this stage.  
5. A report will be prepared and presented to you, describing  
the process of the research, the analysis, and the findings.  
...  
We view this as a quality assurance study, to examine the  
evaluation results of five committees, comprised of people with a  
diversity of education, experience, and occupation, that could not  
mirror the characteristics of the composition of the MEC...A major  
question to be explored is whether the committees have used the  
MEC benchmark evaluations consistently and properly in the  
comparison process.  
...If the results show that the five committees have performed  
their respective tasks consistently with the MEC, many concerns  
regarding the study will be resolved. On the other hand, if  
problems are identified corrective actions can be taken and the  
continuing efforts of the nine committees will benefit from the  
knowledge gained.  
(Exhibit HR-11B, Tab 32)  
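By way of illustration only, the sampling rule described in this letter, 10 per cent of each committee's evaluated positions subject to a minimum of 25 positions per committee, can be sketched in Python as follows. The committee names and workload figures are hypothetical and are not drawn from the record.

import random

def draw_sample(positions_by_committee, rate=0.10, minimum=25, seed=1):
    # Randomly sample 10 per cent of each committee's evaluated positions,
    # subject to a minimum of 25 positions per committee.
    rng = random.Random(seed)
    sample = {}
    for committee, positions in positions_by_committee.items():
        size = min(len(positions), max(minimum, round(rate * len(positions))))
        sample[committee] = rng.sample(positions, size)
    return sample

# Hypothetical workloads for the five committees, totalling 1,400 positions;
# the rule then yields a total sample of roughly 140 positions.
workloads = {"Committee #%d" % i: ["position-%d-%d" % (i, j) for j in range(count)]
             for i, count in zip(range(1, 6), [350, 300, 280, 250, 220])}
sample = draw_sample(workloads)
print({committee: len(selected) for committee, selected in sample.items()})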
398. A snapshot assessment of the validity of the evaluations was to be conducted on the 2,000 positions evaluated to date. Willis
suggested one of his consultants examine 10 per cent of the completed  
evaluations and compare the committees' evaluations to the consultant's  
evaluations. In this way, he would at least satisfy himself there was no  
evidence of a problem or would expose the possibility that a problem might  
exist. His intention at the time was to start with a small study, which  
might expose evidence of discrimination. If a problem was revealed, he  
anticipated conducting a second study, a more in depth analysis, which  
would expose the extent of any problem indicated by the first study. He  
did not indicate to the JUMI Committee directly that he anticipated a two-  
tiered approach.  
399. The proposal of a small study was accepted by the JUMI Committee  
and this analysis commenced in the spring of 1989. The analysis is  
entitled the Special Analysis of Evaluation Committees' Results (the  
"Wisner 222") and was prepared by the Willis consultant, Jay Wisner  
(Exhibit PSAC-4). Wisner examined and re-evaluated 222 of the committee  
evaluations from both the five and nine committees. When the sample of the 222 positions was drawn, the multiple evaluation committees were still evaluating questionnaires and the nine committees had been operating for about three months.
(vii). Re-Training of Multiple Evaluation Committees  
400. This step in the Willis Process involves retraining an evaluation  
committee or an individual evaluator. If the consultant noticed a problem,  
the objective of the retraining session was to bring the committee or  
individual back to the MEC discipline. Retraining could be as informal as  
that which took place during the life of the MEC, when Willis assisted the  
committee in interpretation of the plan, or it could have involved more  
formal sessions which did occur during the work of the five and nine  
committees. After the initial training for the five evaluation committees  
during the week of September 19 - 23, 1988 (Exhibit HR-11B, Tab 27), the  
next formal retraining occurred in March-April, 1989, following the  
expansion of the multiple evaluation committees. Between these sessions,  
less formal training was provided by the consultants as required.
(viii). Sore-Thumbing  
401. Another procedural safeguard in the Willis Process is a review  
process referred to as sore-thumbing which is synonymous with the term  
interim review. According to Willis the first interim review usually  
occurs after 25 to 30 jobs have been evaluated. These jobs are then listed  
in descending order of points and comparisons are made between the jobs,  
factor by factor. The idea is to look for sore-thumbs, that is to say,  
those evaluations which may not have the same consistency as the other  
evaluations. A final evaluation sore-thumbing session occurs after all the  
jobs have been evaluated. This technique is designed to ensure consistency  
within a committee and reveals whether a committee has varied from its  
discipline. No evaluator was permitted to be involved in a sore-thumb  
exercise if they had not been present during the original evaluation.  
402. The MEC had five sore-thumb sessions which resulted in nominal  
changes. Overall, Willis was satisfied with the results of the MEC sore-  
thumbing. Each of the other evaluation committees also had four or five  
sore-thumb sessions. The evaluation committees' sore-thumb exercises had a different emphasis than the MEC's, simply because the concern was more with
whether the committees were adhering to the MEC discipline. This sore-  
thumbing took the form of reviewing their own evaluations and comparing  
them with the MEC discipline so as to ensure consistency.  
403. If the evaluation committees were not consistent with the MEC  
discipline on a factor by factor basis, the result would be a lack of  
consistency in overall evaluations across the board. The degree of  
liberalness or conservativeness is not always the same from one factor to  
another. The important rule is all jobs must be treated the same way; that  
is, if the committees are going to be liberal in interpersonal skills, then  
they must be liberal with all jobs and if they are conservative in  
knowledge and skills, then they should be consistently conservative for  
this factor. Willis did not express a direct opinion on the effectiveness  
of the multiple evaluation committee sore-thumbing exercises.  
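By way of illustration only, a sore-thumb listing of the kind described above can be sketched in Python as follows. The factor names, the comparison window and the 25 per cent flagging threshold are assumptions made for this sketch and are not taken from the Willis Plan.

FACTORS = ["knowledge_skills", "mental_demands", "accountability", "working_conditions"]

def sore_thumb_listing(evaluations, window=2, tolerance=0.25):
    # Rank jobs in descending order of total points, then compare each factor
    # score with those of the jobs ranked immediately around it.
    ranked = sorted(evaluations, key=lambda e: sum(e[f] for f in FACTORS), reverse=True)
    flags = []
    for i, job in enumerate(ranked):
        neighbours = ranked[max(0, i - window):i] + ranked[i + 1:i + 1 + window]
        if not neighbours:
            continue
        for factor in FACTORS:
            average = sum(n[factor] for n in neighbours) / len(neighbours)
            # A "sore thumb" is a factor score out of line with its neighbours.
            if average and abs(job[factor] - average) / average > tolerance:
                flags.append((job["position"], factor, job[factor], round(average, 1)))
    return flags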
D. RELIABILITY TESTING  
404. As part of the Willis Process, Willis generally recommends  
reliability testing of the evaluations.  
(i). Inter-Rater Reliability Testing  
405. The first type of reliability testing is inter-rater reliability  
(IRR) testing which specifically identifies evaluators who may be  
developing patterns in their ratings inconsistent with the other members of  
their committee. Willis introduced the concept of IRR testing during the  
planning phase of the JUMI Study.  
406. Willis explained IRR testing is advisable for two reasons. First, when counselling evaluators who demonstrate bias in their evaluation scores, Willis finds it helpful to have documentation of a statistical nature to support his observations and opinions. If it were otherwise, it would be the consultant's word against the evaluator's. Willis finds it helpful to use the IRR testing with the individual and to ask the evaluator to look at the pattern in their evaluations. This makes it easier to discuss the problem with the evaluator and convince the evaluator to change. He testified that in certain instances, evaluators would refuse to heed the suggestion their evaluations were biased unless confronted with statistical documentation.
407. The second reason Willis introduced the IRR testing is that, in a very large and important study like the JUMI, he felt the results would be subjected to public scrutiny and might be criticized if this procedure were not used.
408. Willis made it clear IRR testing is not necessary in order for  
himself or his consultants to observe and identify outliers. Willis  
testified an experienced consultant will always recognize an outlier but  
this testing provides some written statistical evidence.  
409. Willis' recommendation for IRR testing was not accepted by the  
JUMI Committee in the initial planning stages. He later reintroduced this  
concept when the MEC started its work. There was some debate within the  
JUMI Committee about whether or not the testing should actually be  
undertaken. At the January 13, 1988 meeting of the JUMI Committee (Exhibit  
R-9), the management side agreed in principle there was a need to conduct  
IRR testing in addition to inter-committee reliability testing but  
questioned the current Willis proposal.  
410. The JUMI Committee formed a sub-committee called the Inter-Rater  
Reliability and Methodology Sub-Committee (the "IRR Sub-Committee") which  
was delegated to explore this issue. Its mandate was:  
(a) to determine and make recommendations about the methodology  
and research necessary to test evaluation committee rater  
reliability  
(b) to assess and make recommendations about research  
methodology as it applies to the JUMI Study as a whole  
(Exhibit HR-11A, Tab 26)  
411. Willis testified he was not certain exactly why there was  
resistance from the JUMI Committee to IRR testing but, ultimately, it was  
decided by that committee to engage the consulting firm of Tristat  
Resources to perform the testing. For his part, Willis accepted and agreed  
to this arrangement. The testing was conducted by Dr. Richard Shillington,  
a statistician who testified as an expert at this hearing.  
412. Willis was disappointed the actual IRR testing did not commence  
before the MEC had completed its work. As a result, he was unable to use  
the results in his counselling of the MEC evaluators whom he identified as  
outliers. Willis stated, "but other than that, I was satisfied with the  
testing itself."  
413. Originally, Willis had proposed to undertake the IRR testing at  
least three or four times during the course of the MEC's work, thus  
providing statistical information which he could use as a basis for  
discussion with evaluators who exhibited gender bias. During the MEC's work Willis identified two evaluators as outliers, and the IRR testing later confirmed his observation. Willis met with them but, since no testing had taken place at that point, he had no documentation to support his counselling.
414. Willis did not have authority to remove those he identified as  
outliers. At the time, Willis felt their biases were subtle, ineffective  
and not harmful to the MEC's work. In both cases, Willis' counselling had  
little or no effect. Willis testified the two outliers tended to cancel  
each other out. One was systematically favouring male jobs and the other  
female jobs. The IRR testing confirmed the identity of the two outliers.  
415. Although the IRR testing did not assist Willis in his effort to counsel the outliers, in his opinion the testing could still be used as "after the fact evidence of the consistency of the evaluation process." Shillington's report on the IRR testing of the MEC evaluations was released on July 31, 1988. The report is referred to as the "Tristat Report".
416. Shillington first became involved in the JUMI Study in the spring  
of 1988. He was approached by a Treasury Board member of the IRR Sub-  
Committee and was asked if he would be interested in the work of the sub-  
committee. Although Shillington was retained by the Employer, he viewed  
the IRR Sub-Committee as his client. In the context of the IRR testing,  
Shillington conducted statistical tests to analyze and interpret inter-rater reliability. The purpose of this testing was to determine whether evaluators functioned consistently and whether evaluators treated questionnaires for
male- and female-dominated occupational groups in a consistent fashion.  
417. Shillington understood his role as assisting the IRR Sub-  
Committee to develop a methodology which could be used with the data to  
address their questions and to assist them in making some decisions. The  
IRR Sub-Committee was primarily interested in identifying evaluators who  
seemed to have a gender preference or a gender bias in their questionnaires  
but there were other aspects as well. One of these was a determination of  
whether these evaluators were influential evaluators within their  
committee. (Influential evaluators, in this context, means evaluators who
seemed to be able to shift the consensus score of the committee towards  
their own initial rating.)  
418. Shillington used a combination of statistical tests, namely t-tests, chi-square tests and z-scores (which are similar to t-tests), to compare the differences between individual evaluator scores and committee averages and to determine whether there was a pattern between male and female questionnaires.
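The kind of comparison described above can be illustrated with a minimal sketch in Python, using the scipy library and invented data: for a single evaluator, the differences between her initial scores and the committee averages are split according to the gender domination of the questionnaire, and a two-sample t-test asks whether the two groups of differences share a common mean. This is offered only as an illustration of the type of test involved, not as a reconstruction of Shillington's actual computations.

from scipy import stats

# Differences (evaluator initial score minus committee average) on male- and
# female-dominated questionnaires; all values are invented for illustration.
male_diffs = [4, 6, 2, 5, 3, 7, 4]
female_diffs = [-3, -1, -4, 0, -2, -5]

t_stat, p_value = stats.ttest_ind(male_diffs, female_diffs)
if p_value < 0.05:
    print("possible gender preference: t=%.2f, p=%.3f" % (t_stat, p_value))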
419. Shillington identified two MEC evaluators demonstrating a  
systematic gender preference in their ratings. One was a male management  
representative who allocated male-dominated positions a higher rating than  
the committee and the other was a female union representative who allocated  
female-dominated positions a higher rating than the committee. These  
evaluators were the same individuals identified by Willis and who  
ultimately became members of the Mini-MEC. The IRR test results did not  
indicate there was a dramatic difference between their scores and the  
committee scores on every single questionnaire but a rather subtle, smaller  
pattern appeared fairly frequently.  
420. The IRR Sub-Committee had also requested Shillington identify  
"influential" evaluators. The sub-committee put the question: "Were there  
particular raters who seemed to be able to do this more often than other  
raters?" To answer this, Shillington looked at questionnaires where the  
consensus score was not near the middle of the ratings in order to  
determine how often particular evaluators were in the situation where they  
had apparently moved the consensus score towards their score. Using this  
methodology, some evaluators were identified as influential.  
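A minimal sketch of this methodology, under stated assumptions, might look as follows in Python: for each questionnaire whose consensus score fell well away from the middle of the initial ratings, credit is given to the evaluator whose initial score lies closest to the consensus, and the credits are then counted per evaluator. The data format, the 20 per cent margin and the evaluator labels are hypothetical.

from collections import Counter

def influence_counts(records, margin=0.2):
    # records: list of (initial_scores, consensus) pairs, where initial_scores
    # maps each evaluator to his or her initial rating of the questionnaire.
    counts = Counter()
    for initial, consensus in records:
        ratings = sorted(initial.values())
        middle = (ratings[0] + ratings[-1]) / 2
        spread = ratings[-1] - ratings[0]
        # Consider only questionnaires whose consensus was well off-centre.
        if spread and abs(consensus - middle) > margin * spread:
            # Credit the evaluator whose initial score is closest to consensus.
            closest = min(initial, key=lambda e: abs(initial[e] - consensus))
            counts[closest] += 1
    return counts

records = [({"A": 200, "B": 230, "C": 240}, 235),
           ({"A": 180, "B": 210, "C": 215}, 212)]
print(influence_counts(records))  # Counter({'B': 2})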
421. Shillington was then asked to identify the extent to which  
evaluators who had shown a gender bias were influential. These test  
results indicated that the evaluators who demonstrated a significant level  
of influence over the committee were not the same two evaluators who had  
been identified as having a gender bias and that the most influential  
evaluators displayed no gender preference.  
422. The third and last aspect of the IRR Testing was the  
identification of questionnaires for re-review. This exercise arose from  
the identification of influential evaluators. The IRR Sub-Committee used  
the test results to identify questionnaires where the consensus score  
seemed to be either large or small compared to the initial ratings.  
In all, 103 questionnaires were referred by the IRR Sub-Committee for re-review and characterized as "unusual". Of the questionnaires referred, a single factor, working conditions, was responsible for 43 being identified as "unusual".
423. Shillington testified regarding the limitations of the IRR testing methodology contained in his report of July 31, 1988. According to Shillington, the methodology of comparing evaluator initial scores to committee average scores required the assumption that committees are less biased than individual evaluators. In this context, the overall average of a committee is then considered the more reliable measure.
424. Another limitation expressed by Shillington is found in the  
Tristat Report:  
Further, the fact that a rater systematically favoured occupations  
dominated by one gender over another does not imply a gender  
preference. Since the sexes were not equally distributed in the  
population, it may simply have been a result of a bias for or  
against some other factor which was common in occupations  
dominated by one gender. For example, a bias in favour of  
advanced education would have caused a rater to be identified as  
having a preference in favour of males having been more common in  
senior positions. Similarly, individual rater preferences  
associated with technical skills, or physical labour would have  
lead to the appearance of a gender bias.  
(Exhibit HR-39, p. 5)  
425. As to the limitations expressed in the above excerpt found on  
page 5 of the Tristat Report, Shillington testified in Volume 86, at p.  
10653, line 10 to p. 10656, line 11:  
THE WITNESS: The mathematical statistics are not a lot of  
help in that. That is basically an interpretation question.  
Thank you for drawing my attention to that limitation. When I was  
trying to summarize this report, I didn't mention it and it was an  
important limitation.  
The mathematical statistics can be helpful in identifying that an  
individual was treating questionnaires from male-dominated groups  
differently than questionnaires from female-dominated groups. But  
it can't do a lot to help you understand why.  
The limitation that is expressed in the section you pointed out,  
that it might be an indirect relationship to education, or blue-  
collar/white-collar preference, or things like that, is certainly  
a valid consideration. Someone who had a strong preference who  
thought that advanced education was undervalued or though that  
work outside was undervalued or overvalued could possibly appear  
to have a gender preference or a gender bias -- I will use those  
words interchangeably for a moment -- and you would have no way of  
knowing whether or not it was directly related to gender or an  
indirect relationship to something that is correlated with gender.  
If that idea that I discussed of having hypothetical  
questionnaires inserted into the process, questionnaires that were  
basically rigged to appear to have a gender difference even though  
they were identical in all other respects, if you had done that,  
then you could have actually addressed some of this concern.  
As part of that you could have said: If we had someone who had a  
high education preference and that high education preference might  
get reflected in gender bias, can we design three or four  
questionnaires which are all similar in terms of requiring advance  
qualifications, but are different in terms of a male/female  
composition? Then you could look at those questionnaires for  
these individuals and try to identify whether or not when you  
compared two jobs that had an advance qualification requirement  
but were slightly different in gender, whether or not those  
persons treated those jobs differently or not. Then you could try  
to distinguish between whether or not it was truly a gender  
preference that was operating or whether or not it was a high  
education preference.  
The gender preference that is identified by the mathematics could  
be an indirect relationship to some other preference. Basically  
it is an interpretation question. The mathematics can't really  
help you, except, I guess, in judgment. The stronger the  
relationship is, the more striking the difference between the  
treatment of the male and female questionnaires, the more most  
people would, in judgment, conclude that it really was a gender  
preference operating and not something correlated with that.  
You talk about a gender preference as opposed to a gender bias.  
We use the terms "gender preference" and "gender bias" fairly  
interchangeably in the work because of this concern that someone  
might get labelled as having a bias when in fact it is potentially  
related to a preference for education or blue collar, a secondary  
relationship. We caution someone that we should call it a  
preference.  
For the two raters that were identified there, the relationship in  
the data was so strong that I would have a hard time believing  
that it wasn't gender that was driving the distinction between the  
way they treated the questionnaires.  
426. The IRR Sub-Committee produced its own report concerning the IRR  
testing performed by Shillington. Their report was released on July 15,  
1988, about two weeks before the Tristat Report was officially released.  
The Sub-Committee's report of July 15 differs from the Tristat Report of  
July 31, 1988 by referring to 103 "problematic" questionnaires "requiring"  
re-evaluation. On the other hand, the Tristat Report identified these  
questionnaires as "unusual" and suggested that they "should be reviewed"  
not re-evaluated. Shillington testified the IRR Sub-Committee's report  
used stronger language than he used and, in his opinion, the identified  
questionnaires should be "looked at, nothing more."  
427. Shillington attended the JUMI Committee meeting of July 15, 1988,  
when the IRR Sub-Committee report was tabled. The tabled report written by  
a management representative indicated 103 of approximately 500 benchmarks  
had been "influenced" requiring further examination and possibly re-  
evaluation and sore-thumbing.  
428. Shillington testified the use of the word "influenced" in that  
context did not reflect what was agreed upon in the IRR Sub-Committee. He  
indicated his July 31, 1988 report was his best recollection of the  
opinions formed by the IRR Sub-Committee. Shillington testified he was of  
the view the identification of the 103 questionnaires as having been  
influenced was not supported by the research he had done.  
429. Willis also testified about this aspect of the Tristat Report  
(Exhibit HR-39) and the IRR Sub-Committee Report (Exhibit HR-11B, Tab 26B).  
Willis did not agree with the section of the IRR Sub-Committee report which  
dealt with "influential raters" and he stated the sub-committee appeared to  
overlook the fact that Willis considers it necessary and desirable that  
evaluation committee members be permitted to make adjustments in their evaluations at
consensus time based on factual information. The fact there is a shift  
from the majority evaluators towards a minority evaluator in a number of  
cases is not by itself evidence of a problem. Willis testified in Volume  
38, at p. 4803, lines 7 to 20:  
There were, if I recall, a couple of raters on the evaluation  
committee who did have an influence not because they were biased  
but because they were bright analytical people that others  
respected. Usually when they had a statement about an evaluation  
or were asked to provide information relative to the facts of that  
rating, they generally had very sound reasons, and these reasons  
were respected. So there were occasions where other members of  
the Master Evaluation Committee did respond to them.  
I don't consider that a limitation. I think that was one of the  
steps that was built into the process.  
430. Eventually, Willis did follow through with a re-evaluation of the  
103 questionnaires identified by the sub-committee. Willis and his  
consultants had been asked by the IRR Sub-Committee to question the assumption that the 103 questionnaires presented a problem. One of Willis'
consultants, Jay Wisner, did the re-evaluations and prepared a report for  
the JUMI Committee. His analysis is contained in a report to the JUMI  
Committee entitled Analysis and Conclusions Concerning the Master  
Evaluation Committee's Work and dated July 1988 (Exhibit R-22). Willis  
testified he reviewed each of Wisner's evaluations and made some minor  
changes in the report. Willis testified it was "our" conclusion that a  
systematic review of further evaluations was not warranted, nor was a  
reconvening of the MEC necessary. Willis felt the evaluations were  
appropriate, and he was very comfortable with the overall results given the  
reasonable random disparity among the evaluations. Moreover, he felt at  
this point the JUMI Study should proceed as planned.  
431. Willis' conclusions are contained in a report to the JUMI Committee dated July, 1988. The conclusions concerning the re-examination of the IRR analysis are reproduced as follows:
After careful and intensive consideration of the questions raised  
by the IRR committee's analysis of the MEC's evaluations, we find  
that the principal recommendation of that report, that the MEC  
should be convened to re-examine a large number of evaluations, is  
not supported.  
We have re-examined the evaluations which the IRR analysis  
indicated were "unusual." We did not find evidence that any  
raters exercised "undue influence" over the group consensus  
evaluation. In our opinion, the great majority of the evaluations  
listed by the IRR committee are the product of accurate and  
consistent application of the evaluation plan by the MEC, and  
should not be changed.  
For those few positions where we recommend re-evaluation, we found  
no pattern of influence by a minority resulting in evaluations  
with which we disagree; in some cases, we recommend movement  
further from the middle of the initial individual evaluations. We  
believe that the eventual re-examination by the MEC of the ten  
evaluations where we suggest some revision need not delay the  
convening of the five sub-committees. We recommend that these  
reviews be combined with reviews of benchmarks sought by one of  
the five sub-committees.  
We have no significant concerns regarding the MEC's understanding  
and application of the evaluation plan. The MEC's pattern of  
application of the evaluation plan to positions (their  
"discipline") differs in some respects from the pattern which the  
consultants would use. However, given the manner in which the MEC  
membership was determined, their discipline constitutes a more  
accurate reflection of the values of positions as commonly  
understood within the Government of Canada than the consultant  
could determine from an outside point of view. This kind of  
adaptation of the plan to the climate and conditions of an  
organization by an evaluation committee is expected and proper.  
We would be concerned if there were evidence of inconsistent  
application of the evaluation factors within or across position  
families. We did not encounter any evidence of such  
inconsistency. We believe that the framework of benchmark  
evaluations and the selection of principal benchmarks by the MEC  
provides a sound basis for the evaluation of the remaining  
positions by the five sub-committees. We have found no  
significant cause for concern and support the progression of the  
study as scheduled.  
(Exhibit R-22, p. 8)  
432. Of the 103 MEC questionnaires re-evaluated by Wisner and reviewed  
by Willis, ten were evaluated differently by the consultants and of these  
ten, only three were significantly different. Of the three, one was more  
significant than the others. It was Willis' judgment if the MEC was going  
to be reconvened, it would only be to review that one questionnaire which  
was more significant than the others.  
433. By September of 1988, the management side of the JUMI Committee  
were still not satisfied with the manner in which the Willis Plan had been  
applied by the MEC and continued to express concerns about the MEC's work.  
At the September 15, 1988, JUMI meeting, the management side indicated a  
further analysis should be carried out on problematic benchmarks referred  
to in Willis' report of July, 1988 on MEC evaluations. The management side  
identified 100 benchmarks with problems and forwarded 46 of these 100 to  
Willis with a list of questions, observations and anomalies. In response  
to management's request, Willis & Associates conducted an independent  
review of these questionnaires and attempted to do a "fresh" evaluation,  
without regard to the MEC's prior evaluations but consistent with the  
general evaluation discipline established by the MEC.  
434. A report of this work was submitted to the JUMI Committee in  
September, 1988 (Exhibit R-28). This analysis was done by the Willis  
consultant, Wisner. Willis & Associates agreed with management on a number  
of their challenges but, in the end, did not identify the existence of a  
gender pattern. As to the discipline adopted by Wisner in his independent  
evaluation of the 46 questionnaires, Willis said in Volume 56, at p. 6936,  
lines 3 to 11:  
I think he was familiar enough with the Master Evaluation  
Committee's evaluations at this point. We had had discussions as  
to where they were conservative, where they were a little bit  
liberal, so that he was able to track, but fairly independently.  
I would say, though, that while it is not critical, it would  
appear that he was just a hair more liberal on the average than  
the Master Evaluation Committee.  
435. In the final analysis, Willis and Wisner identified one  
evaluation out of the 46 which they considered had been misunderstood by the
MEC. Willis' report provided explicit answers to each of the questions  
raised by the management side. In their judgment, the additional analysis  
supported their conclusion the MEC had done a fully satisfactory job in  
applying the evaluation system to a broad range of positions. The report  
states:  
We believe that a sound basis has been provided for evaluation of  
the remaining 3900 positions and that, at this stage, there is no  
logical reason to expect less than a high quality, defensible  
result from the study.  
(Exhibit R-28, p. 4 of the addendum)  
436. In this report, Willis also provides general observations as to  
how differences can occur between the MEC and the consultants. The report  
states they can occur in three different ways and any one of these three ways could be caused by systematic bias on the part of evaluators. He identifies the three ways as follows:
3. Differences in evaluations of the same positions between the  
MEC and the consultants could occur in three different ways:  
Misreading of the questionnaire. This could result if  
parts of the questionnaire were overlooked or not  
given appropriate consideration.  
Different interpretations of the facts given. The  
consultants may draw interpretations from a more  
extensive experience in evaluating other jobs having  
similar functional responsibilities. On the other  
hand, evaluation committee members may have a better  
understanding than the consultants of the culture  
within the governmental organization resulting in  
slightly different job perspectives.  
Misuse or misunderstanding of the evaluation system.  
This is expected only during the learning stages of  
the evaluation effort.  
Any one of these three ways could be caused by systematic  
bias on the part of evaluators.  
(Exhibit R-28, pp. 1-2 of the addendum)  
437. By late November, 1988, the management side of the JUMI Committee  
were still dissatisfied with the Wisner/Willis analysis of the MEC  
evaluations. Ouimet forwarded a four-page letter to Willis detailing her concerns (Exhibit HR-19). Willis responded to her letter on December 5, 1988, in a six-page letter in which he attempted to deal with those concerns. Part of that letter, in which he attempts to persuade Ouimet that some variance between evaluators will occur and explains the reasons for this variance, is reproduced below. He says:
Evaluation Tolerance  
As I indicated in the Addendum to the Responses to the Management  
Side of the Joint Union/Management Initiative on Equal Pay for  
Work of Equal Value, it is expected that some variance in  
interpretation of position information provided to evaluators will  
occur. A tolerance of plus or minus 10 percent in random  
evaluation variance is acceptable between two teams evaluating the  
same positions, given complete and accurate factual information.  
As a practical matter, analysis and assessment of evaluation  
reliability requires making judgments considering a number of  
variable factors, such as:  
-Completeness, factual content and definitiveness of the  
information used. Lower quality of information normally  
results in wider random bias.  
-The nature of the job. Is it unusual or complex, or one the  
evaluators should be reasonably capable of understanding  
(e.g. research scientist or cleaner)? To evaluate a  
position properly, the evaluators must be able to understand  
its content.  
-How far removed is it in organizational level from the  
experience or knowledge of the evaluators? This is similar  
to the previous factor in that evaluators may have trouble  
conceptualizing a job that is several organizational levels  
above their own experience.  
-Do evaluation variances depict a pattern? Does there appear  
to be a systematic bias, or is it a random variance?  
Systematic bias is much more significant than variance that  
is simply difference in interpretation or understanding of  
the job's requirements.  
-If the comparison evaluations are by a consultant, could the  
deviation result from difference in understanding of the  
culture or value systems within the organization, resulting  
in a slightly different job perspective?  
In essence a value judgment must be made as to the extent of  
allowable variance in scores and whether or not a problem exists.  
An assessment of this nature does not lend itself to "precise and  
quantitative terms".  
Of the fourteen MEC evaluations assessed as differing by more than  
10 percent compared to consultant evaluations it was my considered  
judgment that one, MEC #428 Head Display Preparation Section, was not properly understood
by MEC and should be submitted to that committee for questions to  
be asked, and re-evaluated.  
(Exhibit R-35, pp. 3-4)  
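The 10 per cent tolerance described in this letter lends itself to a simple illustration in Python; the position names and scores below are invented for the purpose.

def outside_tolerance(pairs, tolerance=0.10):
    # pairs: (position, first team's total score, second team's total score),
    # both teams having evaluated the same position.
    flagged = []
    for position, first, second in pairs:
        if abs(first - second) / first > tolerance:
            flagged.append((position, first, second))
    return flagged

pairs = [("Position A", 420, 445), ("Position B", 310, 355), ("Position C", 268, 270)]
print(outside_tolerance(pairs))  # flags only Position B, about 14.5 per cent apart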
438. In the final analysis, the management side of the MEC did not whole-heartedly support the MEC benchmark evaluations. Although they were prepared to continue the study, their intention was to conduct further reviews of the benchmark evaluations; this further review was not addressed by the Employer in the presentation of their case.
439. The Tribunal heard limited testimony from evaluators on the subject of MEC challenges. Pauline Latour, one of the evaluators who testified before us on the issue of committee challenges to benchmarks, viewed the MEC challenges as a small issue. Only one particular benchmark caused her committee (Committee #5) difficulty. It was this committee's view that the position was rated higher by the MEC than it ought to have been. (Volume 171, pp. 21641-43).
(ii). IRR Testing in the Multiple Evaluation Committees
440. Shillington also conducted IRR testing on the remaining five and  
nine evaluation committees using the same methodology he used to identify  
outliers on the MEC. Willis was provided with two written reports on the  
IRR testing of the five and nine evaluation committees. The first  
disclosure made to Willis occurred in May of 1988 and was primarily based  
on the original five evaluation committees. The second disclosure occurred  
in July of 1989 and was based on the expanded nine evaluation committees.  
441. The IRR Sub-Committee reported to the JUMI Committee, at its  
meeting of August 25, 1989, that an analysis of individual ratings to the  
end of July, 1989, revealed 11 outliers, six female evaluators from the  
staff side, three female evaluators from the management side and two male  
evaluators from the management side. Seven of these outliers expressed an  
apparent preference for male positions and four expressed an apparent  
preference for female positions.  
442. The sub-committee further reported seven of the outliers had been previously identified in the earlier disclosure. However, Willis, in his testimony, was able to recall eight previously identified outliers. The repeated identification of these outliers was reported in the second disclosure by the IRR Sub-Committee in order to confirm Willis' opinion as to the ineffectiveness of his intervention and counselling following the first disclosure.
443. The JUMI Committee decided the names of the outliers would only  
be revealed to Willis and the Chief of the EPSS. It was Willis'  
understanding that the JUMI Committee's decision to deal with the question  
of outliers in this confidential manner, was done to protect the  
individuals concerned. The JUMI Committee had made an earlier decision  
they were not going to remove any evaluators from the committees and it  
would not be productive to release their names at this point.  
444. Shillington prepared exhibits identifying what he referred to as  
"an underlying attitudinal dimension of these outliers". He was unable to  
explain why these differences occurred or what they were. Exhibits HR-117  
and HR-133 indicate the male and female preferences crossed  
union/management lines and female/male lines. With respect to the crossing of male/female lines, some female evaluators displayed a male preference; however, no male evaluators displayed a female preference.
(iii). Inter-Committee Reliability Testing  
445. Willis testified inter-committee reliability (ICR) testing is  
designed to determine whether evaluations from a series of committees are  
related. As explained by Willis, this testing looks at consistency between  
committees and identifies where committees need to be retrained. Willis  
testified ICR testing is not designed to identify any form of bias. In the  
JUMI Study, it was intended instead, as a means for assessing whether or  
not the evaluation committees were adapting successfully to the discipline  
of the MEC.  
(iv). ICR Testing in the Multiple Evaluation Committees  
446. The process generally involved taking a series of questionnaires and submitting each questionnaire to every committee. Each of the evaluation
committees performed an evaluation on the same questionnaire and the  
consultant then attempted to identify the extent to which different  
committees rated the same job similarly or rated the job differently.  
According to Willis the first ICR testing began early in 1989 and included  
26 tests altogether. The ICR testing continued until July of 1989.  
447. The JUMI Committee established an ICR testing sub-committee (the  
"ICR Sub-Committee") to establish policy and oversee procedures for the  
testing. The ICR Sub-Committee consisted of three management  
representatives, two staff representatives, Willis, one of his consultants  
and two Commission representatives. The purpose of the ICR Sub-Committee, as stated in the ICR Sub-Committee report of March 3, 1989, is as follows:
-examine the results of the tests administered to the  
evaluation committees in relation to the baseline provided  
by the consultants,  
-examine the baseline score provided by the consultants,  
-determine the significant differences in the consensus ratings of the  
committees in relation to the benchmarks and the baseline,  
-formulate if needed, recommendations for training, re-  
training by the consultant and/or other courses of action  
for JUMI considerations, and  
-identify procedural/process problems and potential for  
improvement including the revisions to the formulation of  
rationales.  
448. The ICR Sub-Committee requested the Commission conduct the actual
testing. The Commission determined the timing of the tests, distributed  
the questionnaires and explained the process to the committees. The JUMI  
Committee asked Willis to evaluate the test questionnaires and provide a  
"baseline score" for each of the test jobs.  
449. The baseline score was the independent evaluation of the test  
questionnaires by the consultants. In each case, Willis had two  
consultants review the questionnaire and arrive at their own independent  
evaluation, which was then compared with the test evaluations done by the  
five or nine committees. The purpose of the comparison between the  
baseline score and the committee score was to identify any deviation  
between, first, the individual committees and, second, the consensus of the  
multiple evaluation committees compared to the consultants' evaluations,  
thus identifying areas where the multiple evaluation committees needed to  
be retrained because of difficulty in interpreting the evaluation factors.  
450. Willis used rationales in the ICR testing to analyze differences
between the baseline scores and the committee scores. His use of the
committee rationales in the ICR testing was for a different purpose than
the use of rationales generally in the evaluation committees. Willis
explained why rationales were to be used in this exercise, as distinct
from his reasons for not wanting them used in the committees' regular
evaluations, where he wanted the members to focus on the questionnaire
itself.
451. The consultant baseline score was compared with each committee's  
consensus score and also to the overall consensus of the five, and later  
nine, evaluation committees.  
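By way of illustration only, the comparison described above can be
expressed as a short calculation. The following sketch is hypothetical:
the baseline score, the committee scores and the number of committees
shown are invented for the example and are not drawn from the evidence.

    # Illustrative sketch only; all scores are hypothetical, not from the record.
    baseline = 250                                # consultants' independent evaluation
    committee_scores = [240, 255, 250, 235, 260]  # consensus scores of five committees

    def pct_deviation(score, base):
        """Per cent deviation of a committee consensus score from the baseline."""
        return 100.0 * (score - base) / base

    for i, score in enumerate(committee_scores, start=1):
        print(f"Committee {i}: {pct_deviation(score, baseline):+.1f}%")

    # overall consensus of the committees compared with the same baseline
    overall = sum(committee_scores) / len(committee_scores)
    print(f"Overall consensus: {pct_deviation(overall, baseline):+.1f}%")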
452. Willis had only minor input into the procedure that was adopted by
the ICR Sub-Committee, and he was opposed to its approach. In other
studies, Willis always provided a list of questionnaires to his clients
and then introduced the questionnaire into the committee's portfolio of
questionnaires in such a way that the committees did not know which
questionnaires were part of the test. In the case of the JUMI Study, time
was set aside and the questionnaires were distributed to the evaluators,
who then became aware of the testing. The Commission randomly selected the
questionnaires and approached the Willis consultants about an hour before
the test to give them an option as to which questionnaires should be used
for testing. The consultants were not given the opportunity of selecting
questionnaires that were more complete. As a result, Willis testified,
there was frustration on the part of the evaluators, as well as varying
levels of conscientiousness in completing the tests.
453. The procedure for ICR testing as conducted by the Commission was  
very strict at first. It was announced there was going to be a test. The  
Commission was on hand to oversee the test and had an observer in each  
room. The questionnaires were distributed and the evaluators were informed  
they could not leave the room during the actual period of the test.  
454. In other studies, if the committees needed more information on  
the test questionnaires, Willis arranged to have individuals waiting at  
telephones to answer any questions. No time was permitted for this in the  
actual ICR testing. Consequently, each committee was allowed to make its  
own assumptions and fill in any gaps in the information. Committees were  
required to write down their assumptions but the problem was that each  
committee made different assumptions. Willis testified that because the
committees were making different assumptions, variance occurred in the
evaluations.
For these reasons Willis was not comfortable with the results of the ICR  
testing.  
455. Willis found the committees did not take the ICR tests as  
seriously as they did their actual evaluations. He observed a considerable  
amount of resentment on the part of the evaluators and this increased over  
time. Moreover, the committees had to stop their regular evaluations to go  
through the testing exercise. The committees were being pressured by  
Willis to keep moving but, at times, were subjected to two tests a week.  
As Willis stated in Volume 58, at p. 7166, lines 13 - 16:  
They resented it every step of the way and some of them quite  
frequently took the testing with somewhat less than a serious  
approach.  
456. On February 6, 1989, Willis produced a report on the first nine  
tests conducted between November 7, 1988 and January 5, 1989. This report  
examined the variation among the original five evaluation committees and  
essentially concluded they had learned the Willis system and were  
evaluating positions in line with the MEC discipline when "they feel  
comfortable with that discipline."  
457. The ICR Sub-Committee report of March 3, 1989 was based on an  
analysis of the first 11 tests conducted. The report noted: (i) that the  
consultants needed to go through a revision of the initial training program  
with the committees and to address problems that were identified; (ii)  
there was some concern with respect to evidence of cross-family job  
comparisons and the job evaluation process ought to be amended to provide  
for these comparisons; and (iii) the rationales needed more attention and a  
revised job evaluation process ought to be developed.  
458. Willis was asked to describe the amount of variance between the  
consultant scores and the consensus of the five committees on the first 11  
tests. He responded in Volume 58, at p. 7227, lines 1 to 9, as follows:  
A. Considering the various handicaps and expressions of  
frustration and concern that we heard, I think that they did very  
well. I felt very positive, particularly after discussing with  
each committee what their differences were, why they had selected  
the assumptions they had. While I did agree that the additional  
re-training was desirable, I felt very positive about how well  
they were doing.  
459. The ICR Sub-Committee attached to their report a description of  
an Improved Evaluation Process which it recommended for adoption by the  
committees. The revised process provided for comparisons of benchmarks  
outside of the job family. It asked evaluators to first reference  
benchmarks in relation to their factor ratings before independent ratings  
were passed to the committee chair for posting.  
460. Willis testified the committees tried the improved job evaluation  
process and found it was not really practical and actually required more  
time than the original process. The evaluation committees resisted the  
change so it was finally dropped.  
461. Willis also produced a report on the ICR testing. He described  
the sub-committee's report as similar to his own report for the most part,  
but did not completely concur with all of the sub-committee's findings.  
462. The essential purpose of the ICR testing was to identify whether  
or not committees understood and were applying the Willis Plan in  
accordance with the MEC discipline. This testing also gave the consultants
an opportunity to examine whether or not the committees' reasons for their
ratings were suggestive of gender bias.
463. Willis did a careful review, factor by factor, of what each  
committee did, why they did it, and how the consensus was reached for each  
factor and for the total. When one committee's score differed from the  
other committees' scores, Willis explored the reasons for the difference.  
If those reasons suggested, in any way, they were influenced by a  
particular gender or by a particular kind of job, it was information that  
would be available to the consultant for follow up action.  
464. Within this testing framework, Willis was asked whether he found  
any evidence of gender preference in the work of the committees during the  
first series of ICR tests. His response is given in Volume 58, at p. 7227,  
line 10 to p. 7228, line 19:  
Q. At this point -- and it looks like we are in the late winter  
or the early spring of 1989, when only half of the tests had been  
performed -- did you have any evidence from these tests, or  
otherwise, that there might be a problem of gender preference  
exhibited by the evaluators?  
A. Unrelated to the ICR testing, in the early part of the year  
I began to express some concerns, not related directly to the  
actual evaluations themselves, but I had some concerns regarding  
some circumstantial things that had been happening. I became  
increasingly uncomfortable with how the committees were working  
with the confrontations between the staff and the management side,  
and some of the circumstantial things that I had observed  
happening.  
In stressing with my consultants working with the groups, and  
doing our own analysis of how committees were actually evaluating,  
and how occupational groups were coming out among and between the  
committees, I could identify nothing specific that would suggest  
there was a gender bias that was developing.  
Nevertheless, I had strong mixed emotions because I knew some  
things that were happening, some attitudes that were apparent that  
were giving me a great deal of concern.  
So at this point in the study I had some problems with my own  
level of comfort. I discussed these problems individually and  
with the members of the mini JUMI and collectively with them as a  
group. I felt that I was going to have to take some sort of an  
analysis, a more in depth analysis of results if I was going to be  
able to support the outcome of the study.  
465. Willis did not think it possible to identify gender bias simply
by looking at the results of the ICR testing. Where there is gender bias,
Willis finds evaluators usually tend to talk about their conclusions or
opinions rather than about the facts of the questionnaire. He instructed
his consultants to watch very carefully for this sort of behaviour, but he
does not think a consultant can decide whether or not there is bias just
by looking at a score or on a job by job basis. He testified that on an
individual job evaluation, a consultant has to look at the reasons why the
committees selected what they did and what was stated in the rationale,
and then quiz the evaluators personally as to the reasons for the
differences. In Willis' opinion these tests did not provide any conclusive
evidence of gender bias, and the information obtained from them should be
discounted because the committees did not take the testing seriously.
466. In late May or early June of 1989, Willis recommended to the ICR  
Sub-Committee the testing be discontinued because it was becoming very  
clear to him the evaluation committees were becoming more and more  
frustrated with this procedure. He also concluded the tests were beyond  
any point of usefulness. Willis understood it was at the insistence of the
Treasury Board representative on the sub-committee that the testing
continue. The sub-committee did not accept his recommendation and
continued with the testing into July of 1989. It was Willis' opinion that
the reaction of the evaluation committees to these tests might have an
effect on the reliability of the results. (Volume 59, p. 7291).
467. Although the testing continued, Willis did not perform any  
additional formal analysis on the results. He reviewed the remaining tests  
submitted to him by the sub-committee and continued to meet individually  
with committees.  
468. A draft final report of the 26 ICR tests was prepared by Michel
Papineau, a management representative on the ICR Sub-Committee. This
report is dated October 26, 1989. Willis had no input into this draft.
The conclusion reads as follows:  
The ICR test results tend to support the gender preferences found  
in the IRR report and in the Consultant's study of a sample of 220  
questionnaires already evaluated by the committees. The  
differences are such that there is little doubt as to whether or  
not these are due to systematic or random biases. The proportion  
of these discrepancies are significant enough to exceed the degree  
of tolerance expressed by the consultant. Thus, it is strongly  
recommended that further investigations be conducted prior to  
reaching any conclusion based on the evaluation results.  
(Exhibit HR-90, p. 4)  
469. Papineau concluded there was evidence of gender bias in the
evaluations, but it was Willis' judgment that the analysis of the ICR
testing should be discounted for two reasons. He stated, first, that the
26 evaluations were too small a number from which to draw any firm
conclusions and, secondly, that the committees were not taking these tests
as seriously as the actual evaluations and were rushing through them as
quickly as they could without much discussion. It is Willis' opinion the
tests were "not valid for any particular use after about the first 10 or
12 tests." (Volume 59, p. 7297).
470. As to the assertion in the report that further studies should be
undertaken, Willis testified he had already decided, on the basis of the
Wisner 222, that a further study needed to be undertaken, and this draft
ICR report did not add to his conviction.
471. According to Willis, the Wisner 222 was not related to the ICR  
testing at all. He testified he would have asked for the Wisner 222  
whether or not the JUMI Committee had agreed to conduct the ICR testing.  
Willis saw it as a totally separate issue.  
472. Elizabeth Millar, a union member of the ICR Sub-Committee,  
employed by the Alliance as Head of Classification and Equal Pay Section,  
testified she was under the impression the ICR testing was being taken very  
seriously by the committees. She testified one of the problems of the ICR  
Sub-Committee was in getting timely feedback to the committees. Millar  
said she did not think the ICR Sub-Committee functioned in an effective  
manner after May, 1989. She stated the management representatives on the  
ICR Sub-Committee appeared to adopt a different agenda from the rest of the  
committee. These representatives wanted an increase rather than a decrease  
in the schedule of testing to the end of the evaluation process.  
473. By memorandum dated November 10, 1989, Millar responded to the  
draft report prepared by Papineau. Essentially, she found the draft  
unacceptable to the Alliance as it did not reflect the discussions and  
deliberations which took place within the ICR Sub-Committee. The analysis  
contained in the report did not reflect the committee's findings and the  
conclusion contained in the report had never been discussed by the ICR Sub-  
Committee. In his testimony, Willis agreed with Millar that the concluding  
statement contained in the draft report was perhaps overstated. It implied  
the comparison left little doubt as to the existence of gender bias.  
(Volume 59, p. 7304).  
474. In Papineau's memorandum, which is attached to the minutes, he
indicates his intention to table the report at the next JUMI Committee
meeting, which was held on October 31, 1989 (Exhibit R-44). However, the
final report was not tabled at that meeting, since it had been distributed
only one week prior.
475. Millar testified in the spring of 1989, she observed a change in  
attitude by the management representative on the ICR Sub-Committee toward  
the consultants. She said the Employer's attitude before May of 1989 was  
more accepting of and in tune with the consultant's view so that the sub-  
committee was able to reach agreement in problem areas. It was agreed the  
evaluation committees had trouble understanding the Willis Plan and needed  
further help in training. She testified after May, 1989, the Employer  
representatives became very critical of the consultants and the ICR Sub-  
Committee meetings became extremely difficult. She recalled one particular  
meeting in which Scott Gruber, a Willis consultant, reported on one of the  
tests that had been done. Gruber had met with all committees to discuss  
the results and found overall the work was going well. The Treasury Board  
representatives took issue with Gruber's report. According to Millar, one  
Employer representative commented to the effect the committee ought to have  
expected something better from the consultants.  
476. Millar referred to another incident in Volume 185, at p. 23775,  
line 17 to p. 23776, line 11 as follows:  
At one meeting in which Mr. Owen was the consultant, two Treasury  
Board representatives turned up with reports that we hadn't known  
were in the preparation which had calculated the difference  
between each committee score and the base line score and had used  
these calculations to indicate whether or not a problem had  
existed.  
Mr. Owen, who I have described as unfailingly polite and a kind  
individual, as well as very competent, became extremely agitated.  
He threw his pencil across the desk and accused both the Treasury  
Board representatives of neither understanding job evaluation or  
the Willis Plan. Mr. Willis reported to me later on that he had  
worked with Fred Owen a long time and he had never seen him so  
angry. Needless to say, these reports, the uncommissioned  
reports, were never accepted by the subcommittee and were never  
tendered further.  
Mr. Owen was not questioned about this incident.  
477. Willis testified the ICR testing fell short of his expectations.  
He said for future ICR testing, he would arrange to do it covertly so the  
evaluators would not know they were being tested. He did comment  
concerning the ICR testing results as follows in Volume 59 at p. 7352,  
lines 2 - 7:  
But the bottom line is, apparently, in spite of lack of management  
support, in spite of some variances in the quality of information  
and in spite of some attitudinal problems, the result was within  
satisfactory limits.  
478. Willis was also asked whether there was an indication in the
first 11 ICR tests that the committees rated jobs higher along gender
lines. He testified it was his assessment there did not appear to be a
gender preference. Any differences in interpretation between the
consultants and the committees on the "problem-solving" factor in the
Willis Plan were due more to a lack of clear understanding of how to use
the evaluation system than anything else.
(v). Wisner 222 Re-Evaluations  
479. Willis testified it was clear to him there were agendas on both
the staff side and the management side affecting the way evaluators worked
together. He observed attitude problems on the part of some of the
evaluators. As the study proceeded, Willis became concerned that he could
not defend the results without doing further analysis. Willis' discomfort
did not result from any actual gender bias he was able to observe in the
evaluations; it centred primarily on what he viewed as an attitude problem
on the part of the evaluators. Because this was a large and important
study, he wanted to be sure there was no subtle bias creeping into the
process.
480. Willis made a recommendation to the JUMI Committee to conduct a  
"snapshot" assessment on the validity of the evaluations, with the  
intention that if his preliminary analysis revealed the possibility of
problems, he would subsequently do a more in-depth analysis. When Willis
made his proposal to the JUMI Committee in the spring of 1989, he did not  
advise that he anticipated adopting a two-tiered approach if a problem was  
encountered in the first small study undertaken. His recommendation to the  
JUMI Committee was made about the time the first 11 ICR tests had been  
completed. At this point, the committees had evaluated approximately 2000  
questionnaires, and Willis wanted to examine 10 per cent of these completed  
evaluations using one of his consultants to independently evaluate a  
sample.  
481. Willis testified the only way he knows of determining whether  
gender bias is present in an evaluator's evaluation is to look for patterns  
of preference for one gender or the other. In his opinion, the only  
possible way of identifying gender bias would have been to have an  
impartial third party, such as one of his consultants, re-evaluate selected  
questionnaires, then to compare the results between the committees and the  
consultants. Willis usually solicits the assistance of a statistician to  
perform this comparison. Willis refers to differences between the  
committees and the consultants as disparities.  
482. During the course of the hearing, there were questions about  
whether consultants should or should not be considered the baseline for  
comparison. Willis pointed out the JUMI Committee had agreed to use the  
consultants as the baseline in the ICR studies, and that agreement was  
expressed in writing by the JUMI Committee.  
483. Willis believes his consultants who were involved in the JUMI  
Study were unbiased and testified to this in Volume 208, at p. 26934, lines  
10 to 16, as follows:  
We understand the system. I think it would not be appropriate to  
say that all consultants are necessarily unbiased. However, our  
experience, our background, our intent, our own philosophy, has  
always been not to favour one side or the other, but to walk in  
the middle road, if you will.  
484. Willis testified the disparities form the basis for identifying  
whether or not there is a gender based pattern. In this context, he said  
"bias" simply means if there is a pattern of different treatment for male-  
dominated jobs versus female-dominated jobs, then the different treatment  
would result in some degree of gender bias.  
485. The positions included in the Wisner 222 were selected randomly  
from a list of all the evaluations provided by the EPSS. The sample taken  
included at least 10 per cent of the total number of positions evaluated by  
the nine evaluation committees at the time of the Wisner 222. The sample  
included the full range of evaluation levels and the variety of types of  
work seen by the nine committees.  
486. Wisner did not testify at this hearing. His study was explained  
by Willis who described the method used by Wisner in his analysis. First,  
Wisner read the position questionnaire and any reviewer notes. He then  
determined whether a similar position was included among benchmark  
positions evaluated by the MEC. When there was a similar position, Wisner  
reviewed the benchmark questionnaire to confirm his impression and adopted  
the MEC benchmark evaluation as the consultant evaluation. When there was  
no similar set of duties among the MEC benchmarks, Wisner proceeded to do  
an independent evaluation of the position, supported by reference to  
appropriate benchmarks. Many of the positions included in the sample were  
found to require this step. After determining an evaluation, Wisner  
reviewed the committee evaluation for that position. He paid particular  
attention to the committee's use of benchmarks and the facts they used to  
support their evaluation. Wisner then adjusted his evaluation as  
appropriate in view of the committee's rationale and benchmark references.  
487. When Wisner found differences between his final evaluation and  
the committee's evaluation, he wrote a brief rationale in support of his  
position.  
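Read as a sequence of steps, the method described in the two preceding
paragraphs might be sketched as follows. The sketch is purely
illustrative: the data structures, the similarity test and the adjustment
rule are hypothetical stand-ins for Wisner's professional judgment, not
his actual method.

    # Purely illustrative restatement of the steps described above.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Benchmark:
        duties: str
        mec_score: int          # evaluation established by the MEC

    @dataclass
    class Position:
        duties: str
        committee_score: int    # evaluation by one of the nine committees
        rationale: str

    def find_similar(pos: Position, benchmarks: List[Benchmark]) -> Optional[Benchmark]:
        # stand-in for the judgment that a benchmark covers a similar set of duties
        return next((b for b in benchmarks if b.duties == pos.duties), None)

    def independent_evaluation(pos: Position, benchmarks: List[Benchmark]) -> int:
        # placeholder: Wisner evaluated such positions himself, supported by
        # reference to appropriate benchmarks; no formula stands in for that step
        raise NotImplementedError

    def consultant_evaluation(pos: Position, benchmarks: List[Benchmark]) -> int:
        similar = find_similar(pos, benchmarks)
        if similar is not None:
            score = similar.mec_score    # adopt the MEC benchmark evaluation
        else:
            score = independent_evaluation(pos, benchmarks)
        # review the committee's rationale and benchmark references, adjusting
        # as appropriate (a hypothetical tolerance is used here for illustration)
        if pos.rationale and abs(score - pos.committee_score) <= 5:
            score = pos.committee_score
        return score

    bms = [Benchmark("records filing", 180)]
    print(consultant_evaluation(Position("records filing", 182, "cites filing benchmarks"), bms))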
488. Wisner then proceeded to do a special analysis on the results.  
This analysis was initiated in order to assess the quality of the position  
evaluations by the nine evaluation committees. As stated in his report of  
July, 1989, the considerations he included in determining the "quality" of  
the evaluations were:  
1. Proper use of the Willis evaluation system in accordance with the  
Guide to Position Measurement and the training and technical  
advisories issued by the consultant.  
2. Consistency of the evaluations by the nine committees with  
the benchmark evaluations and evaluation discipline  
established by the Master Evaluation Committee.  
3. Absence of any systematic bias in the evaluations by the nine  
committees.  
(Exhibit PSAC-4, p. 1)  
489. Wisner's analysis also included statistical testing. According
to Willis, Wisner is a statistician. His finding on the first
consideration, regarding the proper use of the Willis evaluation system,
was that there was no evidence, with two possible exceptions, of any
consistent misinterpretation or misapplication of the evaluation factors
and dimensions. As to the two exceptions, he noted the number of positions
sampled was so small that it was impossible to draw any firm conclusions
about them.
490. Regarding the overall consistency of evaluations by the nine  
committees with the MEC evaluation discipline, he found that the committee  
and the consultant had an exact match in 70 of the 222 positions, and that  
an additional 34 positions showed differences of +/- 2.5 per cent, so that  
almost 47 per cent of the positions in the sample had approximately the  
same overall evaluation. He concluded these differences indicated fair  
consistency of evaluation between the nine committees and the MEC  
benchmarks. Since Wisner found more than half of the positions differed by  
more than 2.5 per cent, he recommended that further analysis of the  
differences was warranted.  
491. As to the third consideration, in analyzing the differences  
between the consultant evaluations and the committee evaluations, Wisner  
found for the female-dominated positions, 35 were under-evaluated compared  
to the consultant, 40 were over-evaluated and 43 had no  
difference; and for the male-dominated positions, there were 55 under-  
evaluated, 22 over-evaluated and 27 with no difference. His report states  
at p. 5:  
This indicates that female dominated positions were over evaluated  
somewhat more often than the total sample, and male dominated  
positions were under evaluated somewhat more often than the total  
sample.  
492. And his conclusions at p. 8 read:
The findings of the analysis described above suggest that the  
consistency of the evaluations by the nine committees with the MEC  
benchmarks is less than would be desirable, and that there may be  
some gender-related bias in the evaluation results. It is the  
consultant's opinion that these findings indicate that a wider  
review of the evaluations by the nine committees would be proper.  
Such a review would serve to confirm or refute the apparent  
problems found in the sample of positions examined in this study.  
[emphasis added]  
493. Wisner, however, advises caution in dealing with his report. The
statistical analysis between gender dominance and the evaluation
differences between the committee and himself is based on a comparatively
small number of positions, and the relationship he found "between the two
variables does not mean that there have been deliberate or unconscious sex
bias in the evaluations." He
goes on to say there are a number of other possible explanations for the  
differences. He refers, for example, to the tendency in the positions in  
the male-dominated classifications to have more complex duties and  
responsibilities than the majority of positions in the female-dominated  
classifications. He suggests the observed pattern of evaluation  
differences could occur if the committees tended to under evaluate more  
complex positions in relation to the MEC discipline as viewed by the  
consultant.  
494. Willis' covering letter of July 17, 1989, addressed to the co-  
chairs of the JUMI Committee, which accompanied Wisner's report, states in  
the third paragraph:  
Our findings indicate the existence of some systematic divergence  
from MEC evaluations. Statistically, however, the size of the  
sample reviewed, 222 evaluations, was insufficient to permit  
specific conclusions as to the degree of the problem.  
(Exhibit PSAC-4)  
495. Willis was asked to clarify exactly what it was he was trying to
state in this letter. He responded in Volume 58, at p. 7249, lines 1 - 5,
as follows:
A. The results of our analysis appeared to suggest that there  
is some pattern of deviation from the Master Evaluation  
Committee's evaluations. It could be interpreted as a gender  
bias.  
496. At the completion of the Wisner 222 there were about 1,000  
evaluations remaining. Since the nine evaluation committees had just  
started their work, Willis felt it was critical that a more extensive  
analysis be done as soon as possible to correct a potential problem. He  
recommended to the JUMI Committee an additional analysis be undertaken  
without delay.  
497. In his testimony, Willis referred to the following table  
contained on p. 4 of Wisner's Report to explain why he wanted a further  
study and his concern about possible gender bias:  
Table 1
Per Cent Differences

Group    <-15   -14.99   -9.99   -4.99   -2.49     0    0.01   2.50   5.00   10.00   >15
                to       to      to      to             to     to     to     to
                -10.00   -5.00   -2.50   -0.01          2.49   4.99   9.99   14.99

Female     6       8       7       5       9      43      9     10      4      9      8
Male       8      15      13       9      10      27      6      4      4      4      4

Total     14      23      20      14      19      70     15     14      8     13     12
498. Willis testified the above table breaks down the total group of
the 222 evaluations. In the line which reads "Female", the 43 highlighted
under 0 indicates Wisner and the committees agreed on 43 evaluations. To
the right of the 43 are the evaluations the committees rated above Wisner,
totalling 40, and to the left of the 43 are the evaluations the committees
rated below Wisner, totalling 35. On the "Male" line, the 27 highlighted
under 0 indicates Wisner and the committees agreed on 27 evaluations. The
right hand columns indicate the committees rated 22 evaluations higher
than Wisner, and the left hand columns indicate the committees rated 55
evaluations lower than Wisner. This suggested to Willis the beginning of a
pattern: among the male-dominated evaluations, the number rated lower by
the committees than by the consultant (55) was roughly twice the number on
which the committees and the consultant agreed (27), and more than twice
the number rated higher than the consultant (22).
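The counts described in this paragraph, and those in paragraphs 490 and
491, can be reproduced by tallying the cells of Table 1. In the sketch
below, the two rows of figures are taken directly from the table as
reproduced above; nothing else is assumed.

    # Cells of Table 1; columns run from "<-15" through "0" to ">15".
    female = [6, 8, 7, 5, 9, 43, 9, 10, 4, 9, 8]
    male   = [8, 15, 13, 9, 10, 27, 6, 4, 4, 4, 4]

    for name, row in (("Female", female), ("Male", male)):
        under, agree, over = sum(row[:5]), row[5], sum(row[6:])
        print(name, "under:", under, "agree:", agree, "over:", over)
    # Female under: 35  agree: 43  over: 40
    # Male under: 55  agree: 27  over: 22

    # exact matches plus positions within +/- 2.5 per cent (paragraph 490)
    exact = female[5] + male[5]                        # 70 exact matches
    near = female[4] + female[6] + male[4] + male[6]   # 34 within +/- 2.5 per cent
    print(f"{(exact + near) / 222:.1%}")               # 46.8%, "almost 47 per cent"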
499. This aspect of the Wisner 222 concerned Willis. Another concern  
with the report was that it showed one female-dominated occupational group  
(ST) in which the numbers indicated a comparatively large degree of over-  
evaluation. This to Willis was some evidence, however slight, of gender  
bias.  
500. Willis stated the Wisner 222 was very limited. It was not
intended as a basis on which to make a determinative judgment as to
whether or not true gender bias existed and to what extent. He testified
it contained enough evidence to justify a further look before he could feel  
comfortable in defending the results.  
501. Following the release of the Wisner 222, the unions sent a letter  
to Durber, expressing their concerns. This letter is dated September 27,  
1989. The letter, which was written by Christine Manseau, the union
co-chair, indicates the unions did not agree the Wisner 222 Report
supported the contention there was gender bias in the evaluations.
Paragraph 2 of the letter reads as follows:
Our analysis shows that, on average, there is remarkably little  
difference between the evaluation scores of the consultant and the  
committees. Of the 118 female positions in the sample, the  
average consultant score is 182 and the average committee score is  
181. Of the 104 male positions, the average consultant score is  
273 and the average committee score is 263, a difference of 3.7%.  
We do not believe these differences are significant and we note  
that they are well within the + or - 5% accuracy level for average  
scores that the parties agreed to in dealing with the issue of  
sample reduction and overall sample size for the JUMI study.  
(Exhibit PSAC-5)  
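The percentages in the letter can be checked directly from the average
scores it quotes; the figures below are those stated in the letter itself,
and the arithmetic simply restates them.

    # Average scores as quoted in the union letter (Exhibit PSAC-5)
    female_consultant, female_committee = 182, 181
    male_consultant, male_committee = 273, 263

    female_diff = 100 * (female_consultant - female_committee) / female_consultant
    male_diff = 100 * (male_consultant - male_committee) / male_consultant
    print(f"Female: {female_diff:.1f}%  Male: {male_diff:.1f}%")   # 0.5% and 3.7%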
502. According to Kathryn Brookfield of the Institute, after having  
received the Wisner 222, the unions expressed concerns as to how the data
in the report matched the conclusions. Brookfield testified the union
looked at the distribution of the evaluations from female-dominated  
occupations and did not see evidence of an imbalance in the evaluations and  
yet the report came to that conclusion. Brookfield testified the unions  
wanted to sort out in the Wisner 222 why the data and the conclusions did  
not agree. Until that question was resolved, the unions did not have  
sufficient confidence to ask Willis to go ahead and repeat the exercise.  
Brookfield further stated the unions wanted to meet with the Treasury Board  
representatives, go through the report, discuss the differences and see if  
they could come to some understanding about them.  
503. There was considerable debate within the JUMI Committee as to  
whether Willis should undertake further re-evaluations. Willis met  
privately with members of the Mini-JUMI as well as with the full JUMI  
Committee to request a more in depth analysis. He never wavered from his  
position that a further analysis was needed, although, the extreme  
positions taken by some evaluators seemed to settle down during the course  
of the summer of 1989 as the committees began to work with new, fresh, and  
in some cases, reorganized committees. He said in Volume 58, at p. 7285,  
lines 4 to 8:  
A. Call it a gut feel, I just felt that the importance, the  
size of the study was such that I wanted a better feeling of  
confidence that I could, in fact, defend the results.  
504. At that time, Walt Saveland, an employee of the Commission, did a
"technical examination" of the Wisner 222 analysis. Saveland was a staff
person with the Commission in Policy and Research. Durber had asked him to
assist in interpreting the Wisner 222 and to pinpoint the problem of bias.
The Saveland Report, Exhibit PSAC-6, entitled "Technical Observations and
Suggestions on Willis & Associates 'Special Analysis of Working Committee
Results'", provided a list of male jobs to which the Commission ought to
give priority attention because the committees differed from the
consultant by 10 per cent or more. In the end, the list contained 25 jobs,
notwithstanding that 27 had been identified; Wisner and the committees
were in agreement on an additional two jobs which had somehow been missed.
505. The Saveland Report appears to pinpoint the source of apparent  
gender bias to the male-dominated questionnaires. The balance of the  
Saveland report, from page 6 onward, uses a number of statistical  
measurements which, according to the statistical expert, Sunter, are  
"absolute nonsense". (Volume 105, pp. 12696-97).  
506. Paragraph 2 of this report states as follows:  
The most important evidence of apparent gender bias is found among  
male-dominated jobs. A pivotal role seems to be played by 27 jobs  
in which Committee evaluations where [sic] between 5 and 15% lower  
than Consultant evaluations. (Evidence of apparent gender bias  
was also found among the clerical portion of female-dominated  
jobs.)  
(Exhibit PSAC-6)  
507. Saveland's report states "it is this kind of asymmetry in the
male-dominated line which indicates apparent gender bias." Saveland
explored the effects of asymmetry by expanding the standard for relative
agreement from +/-2.5 per cent to +/-5 per cent. If the expanded standard
is imposed for the category of relative agreement with respect to the
female-dominated line, it results in a perfectly symmetrical distribution
with a sizable majority of jobs, showing 76 relative agreements. For the
male-dominated jobs, 56 are now counted in relative agreement, but
apparent under-evaluations outnumber over-evaluations by exactly 3:1, or
36 to 12.
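Saveland's expanded standard can likewise be checked against Table 1. In
the sketch below, the rows are again the Table 1 cells reproduced above;
the column indices reflect the table's bin boundaries.

    # Expanding "relative agreement" from +/- 2.5 per cent to +/- 5 per cent
    female = [6, 8, 7, 5, 9, 43, 9, 10, 4, 9, 8]
    male   = [8, 15, 13, 9, 10, 27, 6, 4, 4, 4, 4]

    # columns 3 through 7 span -4.99 to +4.99 per cent, including zero
    print(sum(female[3:8]))              # 76 relative agreements (female line)
    print(sum(male[3:8]))                # 56 relative agreements (male line)
    print(sum(male[:3]), sum(male[8:]))  # 36 under vs. 12 over: exactly 3 to 1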
508. The report contains, in the end, technical suggestions. One  
suggestion was a re-examination of the specific jobs in dispute, to be done  
by some existing or newly formed review committee, whose members are  
experienced in job evaluation. The report states this review committee  
should consider all jobs in a "suspect" category and "this means all  
existing and additional male-dominated jobs (and possibly all clerical  
jobs)." The report notes an examination of only selected jobs playing a  
pivotal role in gender bias runs the risk of losing objectivity. The  
report makes suggestions about what approach ought to be used when a review  
committee accepts or rejects a specific committee evaluation. The report  
also suggests, while the review committee is doing its work, the consultant  
could be re-evaluating the same jobs. Wisner would be the preferred
consultant for job re-evaluation because, according to the report, he
offers the best assurance of continuity. The report states at p. 24:
If others do the work for Willis and Associates, then quality-  
control procedures should be put in place to make sure that new  
Consultants would have done the previous work in exactly the same  
way.  
(Exhibit PSAC-6, p. 24)  
509. At the October 31, 1989 meeting of the JUMI Committee, Saveland  
was in attendance. He presented his analysis of the Wisner 222. (His
report was released subsequent to this meeting and bears the date November
10, 1989.) Brookfield testified that Saveland, in his presentation, had
concurred with the unions' position that there was no evidence in the
report of systematic over-evaluation of female positions. Saveland also
told the committee that most of the differences between Wisner and the
evaluation committees were found with 27 male positions.
510. Durber also attended the October 31, 1989 JUMI Committee meeting.  
The minutes (Exhibit R-44) state at p. 9 that Durber requested the JUMI  
Committee to indicate how it would deal with the apparent gender bias  
referred to in the Wisner 222. Durber offered the Commission's assistance  
to the JUMI Committee. At that time, the management side of the JUMI  
Committee was willing to do further reviews of the Willis results. The  
staff side position, communicated by Manseau, the union co-chair, was
that, prior to this meeting, the staff side had not been in a position to
proceed further with the Willis study. Manseau promised to reply to the management
side by November 10, 1989, about whether the staff side would proceed and  
who would represent the staff side in the joint process.  
511. Following the JUMI Committee meeting of October 31, 1989, Durber  
sent a letter dated November 10, 1989, to Manseau. In his letter, Durber  
notes the Commission's concern is with apparent gender bias and the  
Commission had drawn no further conclusion at that time, but expected the  
parties to resolve the question of bias in a way that would satisfy the  
requirements of the Act. He referred to the fact that Saveland, in his
written report, makes reference to reviewing the 27 male jobs, and offers a
caution that the separate exercise should be done with care to ensure  
objectivity.  
512. In an attempt to understand the Wisner 222 Report, the unions  
approached their members who had been on the MEC to obtain information  
which might assist in explaining the differences between evaluations done  
by Wisner and those done by the committees. Brookfield testified she  
received information from the CATCA union. A member of CATCA, Rick Smith,  
was provided with the 27 male questionnaires and assigned by the union side  
to analyze these questionnaires. The information he provided was reported  
and filed as Exhibit PIPSC-129. The author of the report did not testify  
at this hearing. His conclusions are contained on pages 2 and 3 of the
report, which read as follows:
In summary, after careful review of the committee results and  
consultant results I find that the consultant has been  
consistently higher in ratings for several reasons. Some are  
outlined above and others are individually pointed out in his  
rationales. The % differences which I have indicated between  
Committee and Consultant range from insignificant (in my opinion),  
5.4%, to 17.4% which is just at the edge of an acceptable error  
tolerance. I can find no evidence of bias nor can I say that I  
could discount the possibility. The committees and the consultant  
have provided complete, sound ratings with logical rationales to  
support them. They are slightly different in all cases but this  
is to be expected. My own analysis of the positions was often  
slightly different than both or leaning toward the committee or  
the consultant rating.  
The process is not an exacting science and the Willis plan does  
not provide for a wrong or right evaluation of a job. A consensus  
is the best one can expect and I have no reason not to accept the  
ratings of the committees as they stand.  
513. According to Brookfield, the unions were anxious to meet with the  
Treasury Board representatives with all the information the unions had  
gathered, including Smith's report, supra, to determine if the differences  
between the consultants and the committees could be explained.  
514. Ouimet wrote to Manseau, by letter dated November 27, 1989,  
indicating the management side required a response to its request that  
Willis & Associates be instructed to do further work. The letter stated  
management required a response by December 1, 1989 or they would "proceed  
unilaterally" (Exhibit HR-17, Document 22).  
515. The next meeting of the JUMI Committee was scheduled for December  
13, 1989. Brookfield testified there was no opportunity for the union side
to discuss with the management side the report received from Rick Smith of
CATCA. It appears from the letter of November 27, 1989, from Ouimet that  
the management side had embarked upon a review of the 27 questionnaires  
identified by Saveland of the Commission. The second paragraph of the  
letter reads:  
As requested, we are prepared to exchange comments on the 27  
questionnaires identified in Mr. Willis' analysis on December 8;  
the modalities of a sub-committee will be discussed at the  
December 13 meeting. Its work however, is independent of the  
research required by Willis and Associates; this work must proceed  
immediately and would be concurrent with that of the committee if  
it is established. Even if the committee finds an explanation for  
the 27 questionnaires in question, we still require more  
evaluations to make bias estimates for the various employment  
groups in the study. At this late date, delays are a luxury we  
can ill afford. We require a response from you concerning Willis  
and Associates further work by December 1, or we will proceed  
unilaterally.  
(Exhibit HR-17, #22)  
516. The union side concluded from its reading of this letter that,
even if a joint process to find explanations for the 27 male
questionnaires was undertaken, Treasury Board intended to proceed with
Willis' recommendation
for a further study with or without the consent of the unions. This became  
a reality when the union co-chair received Ouimet's letter of December 11,
1989, which reads in part:
We remain firm in the belief that the uncertainty surrounding  
these evaluations mandates further study. We accept the  
recommendation by Willis and Associates to undertake further  
analysis (supported, it would seem, by the CHRC). We have agreed  
with your proposal to examine the 27 evaluations cited by the CHRC  
as relevant to `apparent bias', but you have not responded to our  
proposal to proceed with further evaluations at the same time. To  
quote Mr. Durber `...we are anxious that the matter of gender bias  
be dealt with quickly'. Your responses to our letters leave us no  
choice but to conclude that you do not want to resolve this issue  
in the near future. We have decided therefore, to comply with the  
recommendations expressed by both the Consultant and the CHRC and  
to proceed as of December 11 at which time the process by which  
Willis and Associates may undertake further analysis will  
commence. We will keep you informed of the progress of the study.  
You may have our assurance as well, that the same methodology  
unanimously agreed to by JUMI in the first phase will be carefully  
followed. [emphasis added]  
(Exhibit HR-17, #7)  
517. Willis testified the decision of the Employer to proceed  
unilaterally and authorize him to do additional re-evaluations was  
announced to the staff side without consulting him in advance. When the  
December 13, 1989 JUMI Committee meeting convened, a statement was read by  
Manseau. At the request of Manseau, the statement was appended to the  
minutes after which the unions withdrew and no further business was  
conducted. The statement made by Manseau is reproduced in full.  
STATEMENT BY CHRISTINE MANSEAU  
CO-CHAIR OF JUMI  
ON BEHALF OF THE PUBLIC SERVICE UNIONS  
For some time the unions represented at JUMI have not felt equal  
partners in this joint undertaking. We had wanted to discuss  
jointly the conclusions of the CHRC on the findings of Willis and  
Associates in an informal setting so that perhaps JUMI could  
arrive at a joint agreement on how to deal with their  
recommendations. We had suggested the establishment of a sub-  
committee to review jointly our conclusions on the consultant's  
evaluations reported in the Willis Special Analysis prior to  
proceeding with further analysis - we were denied that. We had  
asked further analysis not proceed unilaterally for we felt it  
would endanger the joint character of the Study and undermine  
JUMI's credibility - we were denied that.  
In view of Ms. Ouimet's letter of December 11 announcing that  
Treasury Board has decided to proceed unilaterally with further  
analysis by Willis & Associates, we feel this Study is no longer  
joint. We therefore are not willing to participate in any  
discussions on any outstanding issue at this time.  
We request this statement be recorded verbatim in the minutes and  
that the correspondence exchanged since the last meeting of JUMI  
be attached to the minutes.  
(Exhibit HR-11B, Tab 34)  
518. From the August 25, 1989 JUMI Committee meeting, when Willis
first recommended a further study, to the December 13, 1989 JUMI Committee
meeting, when the union side temporarily withdrew from the study, there
was considerable tension between the parties. This tension had manifested
itself even earlier, during the work of the IRR and ICR Sub-Committees,
but it was after the release of the Wisner 222 that the relationship
between the management and union sides began to deteriorate rapidly.
519. From August 25, 1989 onward, the union side wanted to move
forward with the JUMI Study to conclude the evaluation phase, to determine
the methodology for compensation and wage comparisons and, if a wage
disparity was identified, to continue with bilateral and multilateral
meetings as required. On the other hand, from the August meeting, the
management side felt strongly that an additional study was required and the  
matter of apparent gender bias could not be dismissed without this study.  
520. As the parties became more entrenched in their positions  
throughout the fall of 1989 the tension escalated. Between November 7,  
1989 and December 11, 1989, there were no less than 21 letters introduced  
into evidence written between the JUMI co-chairs with as many as three  
letters written by one side on the same day. As Brookfield said in Volume  
169, at p. 21296, line 24 to p. 21297, line 9:  
Q. Had you ever had that kind of flurry of paper before in the  
years that you had been involved in dealing with each other?  
A. No. I think HR-17, over, I think we are talking, a six-week  
period, every issue imaginable about several -- four or five,  
issues are going on with correspondence, some it [sic]  
simultaneous, and I think that speaks rather directly to the fact  
that people were having a lot of difficulty communicating with  
each other, that there was this flurry of correspondence.  
521. Since the unions refused to go along with a further analysis,
Ouimet advised Willis the Employer intended to commission Willis &
Associates to do the work on behalf of the Treasury Board. On December 19,
1989, Willis wrote to Ouimet declining to conduct a further analysis
"unilaterally" on behalf of the Treasury Board. Willis testified he
understood from the very beginning he was answerable only to the JUMI
Committee, and he felt that proceeding unilaterally was inappropriate.
Willis had hoped the JUMI Committee would reconvene. He was asked by a
Treasury Board representative, Gaston Poiré, under what circumstances he
would conduct the analysis. Willis responded that he would conduct a study
of a larger sample if the Commission requested it, since "the Human Rights
Commission was an objective third party and it was their bill." (Volume
59, p. 7311).
522. In Willis' letter to Ouimet of December 18, 1989, he mentions for  
the first time what the information from a second study should provide.  
The relevant portion of the letter reads:  
It is my belief that an expansion of this analysis is necessary to  
determine the extent of any actual bias that may exist in the  
evaluations. This information should afford a basis for any  
adjustment in evaluation results that may be required to assure a  
fair and objective study. [emphasis added]  
(Exhibit HR-92)  
523. On January 23, 1990, the Alliance announced its permanent  
withdrawal from the initiative and three days later, on January 26, 1990,  
the President of the Treasury Board announced the implementation of equal  
pay for work of equal value adjustments, with the assurance the  
government's action did not prejudice any conclusions and findings of the  
Commission relating to the resolution of the issues still to be  
investigated by the Commission.  
524. Brookfield testified she noticed a change in the attitude of the  
Employer toward the end of the study. She made reference to the fact the
discussions between the unions and management were initially about
apparent gender bias. Following the Wisner 222 report, however, the Treasury Board
no longer discussed apparent gender bias and had changed their approach by  
suggesting they would adjust for actual gender bias.  
525. The unions were very concerned about this change in the Treasury  
Board's approach after the Wisner 222. Brookfield testified there was  
correspondence about adjusting scores and referred to a letter written
January 26, 1990, after the breakdown of the study (Exhibit HR-41), from
the President of Treasury Board to Max Yalden, Chief Commissioner of the  
Commission, explaining the equalization payments were calculated on the  
basis of adjustments for gender bias made by Treasury Board.  
526. In the letter, the President of the Treasury Board, Robert de  
Cotret, wrote to Yalden with details of the government's decision to  
implement service wide measures based on the evaluation results of the  
Joint Initiative. The letter does not refer to the extent of apparent  
gender bias identified in the Wisner 222, but instead alludes to "the  
extent of gender bias." An excerpt from de Cotret's letter reads as  
follows:  
It is my strong belief that an unprecedented study of this  
magnitude must be fair, statistically sound, and credible, given  
its significant ramifications. This further analysis was needed  
to determine the extent of gender bias and adjust the Initiative's  
evaluation results accordingly. I appreciate, therefore, the  
Commission's agreement to conduct this analysis to determine the  
extent of gender bias. [emphasis added]  
(Exhibit HR-41)  
527. The above excerpt seems to confirm the unions' belief that the
management side's emphasis had changed from a concern for the apparent
gender bias raised in the Wisner 222 to an issue of adjusting results to
account for actual gender bias. Brookfield testified it appeared to her
the Treasury Board had decided there was definitive evidence of gender
bias in the Wisner 222 and that all that needed to be done was to adjust
the scores for the bias.
528. In early 1990, Willis was contacted by the Commission. This  
contact was made after the Alliance had announced their withdrawal from the  
JUMI Study. Willis was informed by Durber that the Commission had  
determined an additional analysis was necessary based on re-evaluations to  
be undertaken by Willis & Associates. The Commission itself would,  
however, analyze the results of the Willis re-evaluations.  
529. In Willis' opinion, the only alternative to a further study
would be to use some other evaluation system, which would have, in effect,
reconstructed much of the study. This exercise would have been extremely
costly. Willis also expressed his opinion as to what ought to be done with  
the study results. He suggested the Tribunal has three alternatives: (i)  
to implement the study as it is; (ii) to adjust the results; or (iii) to  
trash the study. Willis maintained he would rule out trashing the study,  
and would adjust the results for any possible gender bias.  
E. THE COMMISSION  
530. When the Commission responded, in April of 1985, to the  
invitation of the President of the Treasury Board to support the JUMI, the  
Commission agreed to put on hold the investigation of s. 11 complaints  
filed prior to the announcement of the JUMI, as well as complaints filed
subsequent to the announcement of the JUMI. The Commission indicated it
would await the results of the study before taking action. This also  
depended upon the circumstances at the time of the filing of the  
complaints.  
531. The Commission's response to the invitation was contained in a  
letter dated April 17, 1985 (Exhibit HR-18, Tab 18), from Gordon  
Fairweather to the Honourable Mr. de Cotret. That letter indicates that if  
the Commission satisfied itself the methodology employed in carrying out  
the study was consistent with s. 11 of the Act, then it would issue a  
special guideline advising that the study was consistent with the Act. It  
would also issue guidelines for the implementation of corrective action in  
accord with s. 11.  
532. The Commission participated in the JUMI Process only as an  
observer. Representatives of the Commission attended the JUMI Committee  
meetings and when asked by members of the JUMI Committee provided  
clarification and advice relative to the JUMI Study. Participation by the  
Commission was mainly of a technical nature, and involved such tasks as  
selecting samples in the ICR testing and dealing with problems of  
interpretation relevant to the Act and Guidelines. Commission employees  
also attended as observers during the operation of the five and nine  
evaluation committees.  
533. The Commission did not intend to be a party to settlements  
reached by the parties to the JUMI. It did, however, intend to examine any  
agreement reached to determine whether it met the requirements of s. 11 of  
the Act.  
534. In early May, 1989, Durber joined the Commission as Chief of  
Equal Pay. This title was later changed to Director of Pay Equity. On  
June 12, 1989, Durber met the JUMI Committee co-chairs and expressed his  
concern that if the parties were unable to determine what should be done  
with the Wisner 222, the initiative could easily founder. Durber testified  
the co-chairs agreed at this meeting that all the parties, including the  
Commission, ought to have free access to the job evaluation results from  
the JUMI Study.  
535. Durber advised the co-chairs at that time the question for the  
Commission was how to interpret the job evaluations that had been done. He  
emphasized if there was gender bias the Commission would have to be  
involved because it needed to know whether the evaluations were acceptable  
as evidence, should the Commission pursue the complaints filed by the  
Alliance.  
536. No formal investigation of the complaints was done by the  
Commission until March 6, 1990. On that date, at the request of the  
Commission, the JUMI participants met with the Commission to review  
outstanding issues. By that time the JUMI had permanently broken down.  
537. At the March 6, 1990 meeting with the JUMI participants, the
Commission wanted to reduce the number of issues arising from the JUMI
should the complaints be referred to a
Tribunal. The Commission's press release, following the meeting, specified  
the Commission must be satisfied that all the requirements of the Act had  
been met. It also specified Treasury Board had given the Commission the
calculations used to predict its adjustments, which the Commission would
examine in its investigation.
(i) Commission Investigation  
538. When the JUMI Study ended in the beginning of 1990, it became  
evident to the Commission its role as observer in the JUMI Study was also  
at an end and it was time to begin pursuing the normal complaint process.  
The question of apparent gender bias raised by the Wisner 222 was a part of  
the investigation into the complaints. The approach by the Commission was  
to treat the question of apparent gender bias as the first focus of its  
investigation into whether wage discrimination persisted in the Federal  
Public Service. The government had made equalization payments in January,  
1990, and the Alliance maintained those payments had not closed the wage  
gap, leaving wage discrimination still in place.  
539. Gender bias was a consideration when the President of the  
Treasury Board announced the wage equalization payments in January of 1990.  
The Treasury Board President had not indicated the extent to which the  
equalization payments accounted for the bias, but did state in his  
announcement the Commission would be examining the matter.  
540. The Commission's approach to the investigation, as described in Exhibit HR-55, "Notes for Presentation on Alleged Gender Bias in Job Evaluation of the Joint Initiative", was conservative in terms of the amount of evidence it sought in addressing the question of apparent gender bias.
541. Durber testified the Commission investigated all five complaints  
from both the Alliance and the Institute simultaneously. It was probably  
the speediest Commission investigation performed prior to that time because  
the Commission had before it all the job evaluation data gathered from the  
JUMI Study. The Commission had no need, therefore, to conduct its own job  
evaluations.  
542. There were four areas for investigation by the Commission. The  
first involved the investigation of gender bias. The Commission had to decide whether it could rely on the job assessment information from the
JUMI Study. The second involved looking at any wage gaps that might  
appear. The Commission had to develop a methodology to calculate wage  
gaps. The third area for investigation involved considering and valuing  
benefits. Finally, the fourth area (not yet complete) involved parts of
two complaints which bore on limitations on employment opportunities as a  
result of compensation practices.  
543. An overview of the chronology begins with the Commission's investigation, which started in March, 1990, and arrived at tentative conclusions on gender bias in July of that year. In the same month, the Commission
briefed the parties on its findings regarding "apparent gender bias" in the  
committee evaluations. In August, 1990, the Commission produced a draft  
report on the wage gap and the parties were briefed on the Commission's  
interim findings regarding its conclusions.  
544. There was also a meeting in August with the parties on the status  
of the Commission's investigation pertaining to the valuation of benefits.  
In September, 1990, the Treasury Board submitted a written response to the  
Commission's August draft report. The final investigation report went to  
the Commissioners in late September, 1990. The following October, the  
Commission made its decision with respect to the wage gap on the five  
complaints and requested the President of the Human Rights Tribunal to  
appoint a tribunal.  
545. The Commission's investigation into the s. 11 complaints is  
contained in Exhibit HR-250, entitled, Investigator's Report: Wage  
Adjustment in the Federal Public Service - Possible Gender Bias in Job  
Evaluation Data. Durber released the Investigator's Report on this subject  
to the parties in September, 1990. The report contains the Commission's  
findings and conclusions relating to the question of apparent gender bias  
in the committee evaluations. The Commission's conclusions are found in  
para. 51 of that report which states as follows:  
51. Conclusions  
Commission staff have found that the Willis checks reveal some  
differences between consultants' evaluations and those performed  
by the Joint Initiative. Investigators do not find that these  
differences reveal patterns that can be correlated consistently  
with gender or occupation in the Joint Initiative evaluations.  
The extent of possible "undervaluation" of male jobs is less than  
3%, but can likely be accounted for by differing understandings of  
work described, as well as the meaning of bench marks and the  
application of the Willis plan. It is not apparently the result  
of bias linked to sex. Moreover, the 3% is not evenly distributed  
across occupations. Certainly, the fact that two sets of  
independent groups (Willis consultants and the Quality Analysis  
Committee) could produce results varying by a margin of 2% to 3%  
indicates that such differences may be expected and be due to  
reasons other than bias.  
(Exhibit HR-250, Part I)  
546. Durber's oral evidence corroborates and confirms the contents of  
this report and focuses on the steps pursued by the Commission in  
investigating the possibility of gender bias in the committee evaluations.  
It is noted from Durber's evidence that further testing procedures were undertaken by the Commission subsequent to the commencement of this hearing. Both the investigative procedures conducted as part of the Commission's initial investigation and the subsequent tests conducted at
the request of the Commission will be reviewed by the Tribunal.  
547. The Investigator's Report indicates there was no clear evidence  
of gender bias in the evaluation results. The report contains a recommendation of formulae for equalizing pay between males and females, and recommends that pay ought not to be adjusted for possible gender bias. It proposed that the Commission accept its findings vis-à-vis the related complaints under s. 11.
548. A draft of the Investigator's Report (Exhibit HR-250), was  
provided to the parties for comment in the summer of 1990. The Treasury  
Board responded by letter and written report dated August 17, 1990, from  
Ouimet in her capacity as Assistant Secretary, Classification, Human  
Resources Information and Pay Division, addressed to Durber. Ouimet  
testified during the voir dire hearing of the Tribunal but was not called  
when the hearing reconvened. The last paragraph of her letter concludes  
that the Commission's investigation was deficient and did not demonstrate a  
clear case there was no gender bias. On the other hand, she expresses the view that it is unlikely any party could demonstrate the existence of gender bias in the results. The paragraph is reproduced as
follows:  
On the other hand, it is unlikely that anyone could demonstrate  
gender bias does exist given that the Willis firm has not provided  
a baseline by which evaluation results may be compared from study  
to study. It is not possible to measure adequately the application  
of the plan so as to conclude definitively that bias does or does  
not exist. Do not conclude however, that we should not examine  
very closely all the rating inconsistencies raised by the various  
committees of the Joint Initiative, your own research, and ours.  
It is now vital that we leave aside the `why' behind rating  
anomalies and focus instead on how they may be corrected. We  
would be prepared to contribute to the design of an appropriate  
study to resolve rating inconsistencies. [emphasis added]  
(Exhibit HR-46, p. 2)  
549. In the detailed comments attached to her letter, Ouimet asks the  
rhetorical question, "Is it possible to distinguish between evaluation  
biases along sex lines and the overall application of the Willis Plan in a  
manner that would assign an appropriate weight to each?" The report states  
the answer must indicate the degree to which the question of gender bias is  
purely a statistical or substantive question. In the latter case,  
according to Ouimet, statistics may contribute little.  
550. In the Treasury Board's written response to the Commission's  
final report on Possible Gender Bias in the Evaluation Data, which is  
contained in a letter from Ouimet to Durber dated September 7, 1990, the  
Treasury Board is clearly of the opinion a statistical study is not the  
best approach when determining possible gender bias. The following  
excerpts from her comments at p. 1 of the report are helpful in  
understanding the Employer's response:  
In essence our disagreement can be summarized as follows: the  
Investigator embarked on a highly restricted look at gender bias  
through statistical research that was inappropriately conducted.  
Even if it were appropriate, the restricted nature of the overall  
study is such that nothing can be said about the issue of gender  
bias since the important issues implied by it were never examined.  
The Commission quotes at length the position of the Public Service  
Alliance of Canada (PSAC) that many of the issues are non-  
statistical. We are in agreement with this position and have  
argued that statistical analysis in this area is useful only  
insofar as it may raise the possibility of a problem that would  
require a non-statistical approach to answer. Notwithstanding  
this objection, we are of the opinion that any statistical study,  
no matter how adequate, is not the best approach in this matter.  
There is so much judgement involved in the scoring of any job  
questionnaire that to determine gender bias statistically is  
difficult at best because it requires that a weight be assigned to  
every factor of judgement/bias/inconsistency, what have you, to  
the score itself. Since you have decided to restrict your study  
to a statistical analysis of Willis evaluation data, we feel  
compelled nevertheless to critique your study on statistical  
grounds.  
The long critique we sent to you was an attempt to demonstrate,  
through statistical arguments, that the approach taken and the  
empirical findings do not, under any circumstance, permit you to  
conclude with certainty there is no gender bias in the Joint  
Initiative evaluations. The most you can conclude is that there  
is not enough evidence to decide one way or the other. You have  
not addressed any of our concerns systematically other than  
through an editorial comment that our 'statistical criticisms make  
rather too fine a point'.  
(Exhibit HR-250, Tab J, pp. 1-2)  
551. The Treasury Board apparently had used an alternative line of  
enquiry into the question of possible gender bias, described in Ouimet's
detailed comments of September 7, 1990. Using a different approach, she  
writes the Treasury Board came to the same conclusion as Sunter, but in  
their view, the conclusion is misleading since it only represents half the  
story and says nothing about how often questionnaires are under- or over-  
evaluated. The Treasury Board's overall conclusion is found at page 10, which states:
Using criteria provided by the Willis firm, it is not possible to  
conclude that while there may be statistically significant  
differences in patterns of evaluations, they are not substantively  
important. As shown above, the issue of level of difference has  
ignored the frequency dimension and the differences in patterns  
are indeed significant. We attempted to take into account mis-  
evaluations in order to see whether there was a gender pattern to  
them and it would appear there is.  
We have analyzed the same data and using the same measure as the  
Sunter analysis, and yet reached different conclusions. We are  
convinced that the data show serious problems with the evaluations  
and that these problems look very much like gender bias; in any  
event, further analysis is required. We remain firm in our belief  
that the scores need to be adjusted, but we are prepared to  
discuss a different adjustment strategy from the one originally  
used. Any adjustment is going to be difficult to estimate given  
the significant differences between the two Willis studies.  
[emphasis added].  
(Exhibit HR-250, Tab J)  
552. We will now describe and examine specific factual information  
found in the Commission's investigation provided by Durber. On March 8, 1990, the Commission received from the Treasury Board a document (Exhibit
HR-185) which explained the methodology used by the Employer in making its  
equalization payments. According to Durber, the Treasury Board paper,  
issued in March, 1990, estimated an average bias of +3 per cent for  
evaluations of positions from female-dominated occupations and of -4 per  
cent for evaluations of positions from male-dominated occupations.  
Accordingly, the wage equalization payments had incorporated a
corresponding across-the-board adjustment when calculating equal pay for  
work of equal value. The adjustments resulted in payments to public  
service employees in female-dominated occupational groups which were lower  
than they would have been without those adjustments for possible gender  
bias.  
553. The revision of scores is explained in the methodology paper as  
follows:  
A score revision factor based on simple statistical techniques was  
estimated by the Treasury Board. All questionnaires except those  
rated by the Master Evaluation Committee and the Willis consultant  
were revised: ratings for female questionnaires were reduced by  
approximately 3% overall and male questionnaires were raised by  
roughly 4% overall. All policy analyses presented in the  
remainder of this report use the revised evaluation scores as  
described.  
(Exhibit HR-185, pp. 6-7)  
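By way of illustration only: if the revision operated as a simple multiplicative factor on total point scores, it could be sketched as follows. That reading is an assumption; the Treasury Board's actual formula is not reproduced in the record, and the function and figures below are hypothetical.

    # A minimal sketch, assuming the revision was a simple multiplicative
    # factor applied to total point scores. The Treasury Board's actual
    # formula is not in the record; names and figures here are hypothetical.

    def revise_score(score: float, group: str) -> float:
        if group == "female":
            return score * 0.97  # female questionnaires reduced by roughly 3%
        if group == "male":
            return score * 1.04  # male questionnaires raised by roughly 4%
        return score  # MEC- and Willis-rated questionnaires were not revised

    print(revise_score(400, "female"))  # -> 388.0
    print(revise_score(400, "male"))    # -> 416.0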
554. In attempting to understand Exhibit HR-185, which contains a good  
deal of detailed statistical jargon and information, Durber sent the report  
to seven independent individuals for their comments. These individuals  
included pay equity experts, Weiner, Dr. Morley Gunderson, Lois Haignere,  
Willis & Associates, Roberta Rob, Judith Davidson-Palmer, and a  
statistician, Sunter. Durber viewed these individuals as potential  
participants in a workshop the Commission had scheduled for April, 1990, to  
review the Treasury Board's methodology (Exhibit HR-185) and to advise him  
how he ought to deal with it.  
555. The Commission had difficulty in obtaining data from the Treasury  
Board during its investigation of the complaints. Durber testified the  
actual data the Treasury Board used to arrive at its conclusions in HR-185  
were never produced. The Commission had to project salaries and create its own salary databases because of the length of time it took the
Treasury Board to provide salary information. A complete set of the salary  
data was finally provided to the Commission during these hearings.  
556. On April 9, 1990, the Commission held its workshop and some of  
the individuals who are listed above attended, namely, Sunter, Roberta Rob,  
Judith Davidson-Palmer, and a representative of Willis & Associates. The  
others, who did not attend the meeting, provided written comments. Durber  
wanted to be "as well informed as possible by some of the better minds in  
Canada on the issue of pay equity." (Volume 147, p. 18197). After fairly  
extensive consultation with these individuals, Durber consolidated the  
advice he received and formulated an investigation plan and hypothesis.  
557. Following the meeting of April 9, 1990, Durber consolidated the  
advice resulting from his discussions with these individuals in order to  
clarify the issues that needed to be addressed by the Commission. A decision
was made to challenge the Treasury Board methodology by detailed  
questioning.  
558. The Commission was also interested in knowing whether the factors in the Willis Plan behaved differently in the results for male-dominated occupational groups as opposed to the results for female-dominated occupational groups. Durber contracted the Wyatt Company, an international
company of management consultants which enjoys a considerable job  
evaluation practice. The Wyatt firm was asked to use the database for all  
the JUMI Study job evaluations. The Wyatt firm looked at the data to  
determine whether the relationship between the factors was the same  
regardless of the gender of the group and regardless of the occupation from  
which the questionnaire was taken. Their report was provided to the  
Commission in early June, 1990. The Wyatt analysis demonstrates there were  
correlations between various factors, for example, the extent to which a  
score on mental demands correlates with knowledge. The conclusion from this report was that there appeared to be no significant differences in the
correlations between the factors for the male and female jobs or between  
the overall patterns. The report further indicated there was some  
difference in scores on working conditions between male and female jobs.  
It was Durber's belief this was explainable by the nature of the work.  
(Volume 147, p. 18208).  
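The kind of check described can be illustrated with a short sketch that computes the correlation between two factor scores separately for male- and female-dominated questionnaires and compares the two patterns. The factor names and scores below are invented for illustration; the record does not disclose the Wyatt firm's actual computations.

    # Hypothetical sketch of a factor-correlation comparison; the data and
    # factor names are invented, not taken from the Wyatt analysis itself.
    import statistics

    def pearson(xs, ys):
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    male = {"knowledge": [200, 250, 300, 350], "mental": [60, 75, 88, 102]}
    female = {"knowledge": [180, 240, 310, 340], "mental": [55, 70, 92, 100]}

    r_m = pearson(male["knowledge"], male["mental"])
    r_f = pearson(female["knowledge"], female["mental"])
    # Similar correlations for both groups would suggest the factors were
    # behaving alike regardless of the gender of the group.
    print(f"male r={r_m:.3f}, female r={r_f:.3f}, gap={abs(r_m - r_f):.3f}")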
559. The approach of the Commission in assessing gender bias was not  
"to prove no bias" but simply to find whether or not a reasonable person  
would see bias operating. (Volume 149, p. 18521). According to Durber, the mere fact there is a different pattern for males as opposed to females, as for example in the Wisner 222, does not tell the investigator anything except, perhaps, "whether one ought to look further".
560. During the initial investigation, a letter dated June 20, 1990, accompanied by a binder containing information relevant to the Commission's assessment of gender bias, was delivered by a representative of the Treasury Board to the Commission. The documents in the binder
included IRR Sub-Committee documents, the ICR studies, the recommendations  
for changes to the MEC evaluations prepared by the Willis consultant, Drury, the Tristat Report, Willis' report on MEC's work dated July,
1988, questions referred to Willis in August, 1988 from the management side  
regarding the MEC evaluations, minutes of JUMI Committee meetings, copies  
of letters written in July, 1988 by the Alliance and the Institute to Drury  
regarding evaluation rationales and interpretation of the factors under the  
Willis Plan, Willis' response to committee challenges of the MEC  
evaluations, a copy of a letter from Willis regarding Committee #4 written  
on August 17, 1989, and copies of letters between the parties regarding the  
Wisner 222.  
561. Durber stated the documentation in the binder provided by the  
Employer did not particularly pertain to gender bias. In the set of  
documents relating to the ICR Sub-Committee, Durber searched for specific  
evidence of gender bias. With regard to the Tristat Report he testified he  
was looking for bottom line conclusions because he wanted to know whether  
in fact there had been indications, or hard evidence of gender bias. As to  
the ICR studies, considering the small number, 25 tests, it was not  
possible, he said, to detect a trend.  
562. Durber spoke with one of the Commission's observers, Brian  
Hargadon, concerning his observations of the ICR tests. Hargadon  
participated in all of the tests. He also administered some of the tests.  
Durber testified that in Hargadon's view the evaluation committees did not,  
over time, take the tests as seriously as when they had begun. Durber,  
therefore, found the ICR tests inconclusive on the question of gender bias.  
563. On the question of the changes to the MEC evaluations prepared by  
the consultant, Drury, Durber primarily relied on Willis' opinion that the  
matters brought forward by her were resolved. As to the report prepared by  
Willis & Associates in July, 1988 and their analysis and conclusions  
regarding the MEC's work, Durber considered the bottom line in the report  
to be that there was no problem with gender bias. After reviewing the materials submitted to him by the Treasury Board, Durber came away with no better understanding of how gender bias might operate in the
job evaluation results. Durber testified the Treasury Board material was  
not helpful and he needed to better understand whatever was going on with  
respect to so-called gender bias. Accordingly, he decided to look
elsewhere for answers.  
564. Durber stated the only discussions he had with Treasury Board  
staff about material contained in the binder was during a presentation he  
made to the Employer on July 5, 1990, regarding issues surrounding gender  
bias. A more detailed analysis of gender bias as viewed by the Treasury  
Board was not made available to the Commission until August, 1990. It was  
then the Treasury Board submitted its more detailed written submission  
concerning this subject to the Commission.  
565. Part of the Commission's investigation into the question of  
apparent gender bias was a follow-through on the recommendation, contained
in the Saveland Report, for further analysis of the 27 "under-evaluated"  
male jobs (subsequently reduced to 25). These jobs had been identified by  
Saveland as showing a difference of 10 per cent or more between Wisner and  
the evaluation committees. Durber convened a joint committee in the spring  
of 1990, composed of management and union employees under the chairmanship  
of Ron Renaud, Senior Consultant, Equal Pay Section of the Commission.  
They met for two weeks beginning on April 30, 1990. In the Commission's  
letter to the committee members, the committee was informed as follows:  
The committee's mandate is to carry out a quality check of twenty  
seven positions that were evaluated by JUMI committees coming  
after MEC. In an analysis of 222 position evaluations by Willis  
and Associates, June, 1989, it was found that the evaluations were  
significantly different from the MEC discipline and contributed  
most to the finding of apparent gender bias.  
(Exhibit PIPSC-135)  
566. Former MEC evaluators were selected to participate in this  
committee, including two management and three union representatives whose  
names were suggested by the Employer and the unions. Durber wanted  
participants who had a breadth of views. In his opinion, this goal was  
achieved. This committee was referred to as the Quality Analysis Committee  
(the "QA Committee") and produced a report, The Quality Analysis Report.  
567. Durber testified that within the context of the QA Committee, he  
was less interested in the fact there were differences between the  
consultant and the multiple evaluation committees, than he was on what  
accounted for these differences. He was interested in knowing whether the  
QA Committee members perceived the multiple evaluation committees and  
consultant differences in a way that related to the fact these were male  
jobs, or whether they perceived any bias on the part of the multiple  
evaluation committees. He considered the five former MEC members to have a  
special insight into both the Willis Plan and the MEC discipline. He  
expected they would understand "the mechanisms behind their own differences  
with the committees."  
568. Durber testified the Commission was trying to determine if there  
was a reason, a motive, or some conscious or unconscious effort by the  
multiple evaluation committees to disfavour these male jobs. If gender  
bias was to be evident anywhere, he reasoned, it would be evident with  
these 25 jobs.  
569. The procedure followed by the QA Committee in completing its  
assignment was for each committee member to read the questionnaire,  
independently evaluate the questionnaire, review the MEC benchmarks used by  
the JUMI Committee and those used by the Willis consultant, and then select  
additional appropriate MEC benchmarks.  
570. The evidence before the Tribunal is contradictory as to whether  
or not the QA Committee was required to arrive at a consensus in its evaluations. According to Durber's evidence, the QA Committee was not
asked to form a consensus. Durber testified the Commission asked each QA  
Committee member to report to the chair on their evaluations, then discuss  
them, but not arrive at a consensus. Durber further testified the  
Commission was not attempting to validate the ratings of the 25 jobs but  
simply wished to understand whether the members of the QA Committee might  
become aware, during this process, of any gender issues either in their own  
ratings or in the multiple evaluation committees' ratings.  
571. On the other hand, two union members of the QA Committee  
testified the QA Committee was asked to reach a consensus and failed to do  
so. Their evidence is the QA Committee followed the same Willis procedure  
used by the evaluation committees. The only exception, according to these  
witnesses, was that the consensus had to be unanimous for each sub factor  
in the Willis Plan, rather than the two-thirds majority required for  
consensus in the evaluation committees. An attachment to the letter dated  
April 23, 1990, from the chair to the QA Committee members corroborates and  
confirms the unanimous agreement requirement for consensus. The relevant  
part of that document states:  
Evaluation findings will be arrived at by committee consensus.  
This means that the evaluations by factor, sub-factor and points  
must be agreed to by each member of the committee.  
(Exhibit PIPSC-135, p. 3)  
572. Durber testified that at the conclusion of the QA Committee's  
work, the chair of the Committee, Ron Renaud, reported to him the  
differences in the ratings between the QA Committee and the evaluation  
committees were due to "perceptions of the work", but that the QA Committee  
found the gender of the jobs played no role whatsoever in the ultimate  
evaluations. A review of the written report does not include any reference  
to this verbal report from Renaud to Durber.  
573. Durber concluded from this exercise it would not be unusual to
find a range of views between evaluators which would be reflected in a  
range of ratings. Durber interpreted the difference between Wisner and the  
evaluation committees as "normal, honest disagreement about work as opposed  
to any problems with gender bias."  
574. Durber stated in his evidence, "that the entire edifice of the  
question of gender bias which is before the Tribunal rests on a foundation  
of one person's view [i.e., Wisner's] of 25 questionnaires." (Volume 149,  
p. 18581).  
575. Durber used the QA Committee Report to compare the average of the  
QA Committee evaluators' ratings to the total point score given by the JUMI
evaluation committees and Wisner. According to Durber, this comparison  
indicated to him the QA Committee disagreed as often with Wisner as with  
the evaluation committees and he states in Volume 149, at p. 18573, line 14  
to p. 18574, line 22:  
The patterns were that the low raters agreed, essentially, as  
often as they disagreed with the committee ratings.  
The high end rater agreed only about one-third of the time with  
the committees, although a third of the time was still a  
reasonable number.  
I concluded from this exercise that in fact one should expect a  
range of view, a range of ratings on jobs, that it wasn't unusual  
to find a range of ratings, that it certainly would not be unusual  
to find differences between any raters.  
That permitted me to believe, interpret Mr. Wisner's differences  
from the committees as a normal, honest disagreement about work as  
opposed to any problems with gender bias.  
The fact that they were male jobs may or may not have been  
coincidental, but I could not see any necessary reason to believe  
that there was bias operating as a result of the differences  
between Mr. Wisner and the committees.  
I didn't, for example, conclude that Mr. Wisner was biased in  
favour of male jobs, which could have been one of the  
interpretations from his report. He being a male, one might have  
concluded that. But whether he was a professional consultant and  
objective or whatever was another issue.  
But we did find these five individuals from MEC also disagreed  
less than Mr. Wisner, but probably about as often or a little more  
than Mr. Willis when he and his other three consultants had looked  
at male jobs.  
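The tallying Durber describes can be illustrated by a sketch that counts, for each rater, how often that rater's total score falls within a tolerance of the committee score. The tolerance and all scores below are invented for illustration; they are not the actual QA Committee figures.

    # Hypothetical agreement tally of the kind Durber describes; the
    # tolerance and the scores are invented, not the QA Committee data.

    def agreement_rate(rater, committee, tolerance=0.15):
        agree = sum(abs(r - c) <= tolerance * c for r, c in zip(rater, committee))
        return agree / len(committee)

    committee_scores = [300, 420, 280, 510, 350]
    low_rater = [290, 400, 300, 470, 330]    # stays close to the committees
    high_rater = [330, 500, 340, 530, 420]   # rates the jobs higher

    print(f"low rater agrees:  {agreement_rate(low_rater, committee_scores):.0%}")
    print(f"high rater agrees: {agreement_rate(high_rater, committee_scores):.0%}")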
576. One of the union representatives on the QA Committee, Tim Yates,  
was asked in chief about his understanding of the purpose of the QA  
Committee. His response was that its purpose was to look at the committee evaluations and ascertain whether the committees had chosen appropriate benchmarks and correctly applied them. Yates testified he could not recall any instances,
during this review, where inappropriate benchmarks were used. As to  
differences between the consultant's evaluations and the committees'  
evaluations, Yates says the following in Volume 175, at p. 22226, lines 4 -  
22:  
A. Well, if one is to make a huge assumption, that we were the  
experts in the thing, sometimes we were higher than the  
consultant, sometimes we were lower than the committee. I think  
it was Mr. Willis who said many times, "this is not a science".  
I would say personally that what was the problem? It all appears  
to be within tolerance.  
Q. What do you mean by it would all appear to be within  
tolerance? Where did that phrase come from?  
A. The lowest possible difference is one step. One step is 15  
per cent. That's the very slightest possible bit of shading in  
any factor is 15 per cent.  
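Yates' notion of differences falling "within tolerance" can be made concrete: if the smallest possible step on any factor is 15 per cent, only differences larger than one step would be flagged. The following sketch assumes that reading of his testimony, and the score pairs in it are invented.

    # A sketch of the one-step tolerance Yates describes: the smallest
    # possible shading on any factor is one 15% step, so only larger
    # differences are flagged. The score pairs are hypothetical.
    ONE_STEP = 0.15

    def outside_tolerance(committee: float, consultant: float) -> bool:
        return abs(committee - consultant) > ONE_STEP * min(committee, consultant)

    for committee, consultant in [(100, 110), (100, 120), (200, 225), (150, 130)]:
        status = "outside" if outside_tolerance(committee, consultant) else "within"
        print(f"committee={committee}, consultant={consultant}: {status} tolerance")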
577. The other union representative who testified regarding the QA  
Committee was Mary Crich, who had been an alternate on the MEC and  
participated in the committee evaluations as a member of Committee #5. She  
was asked about her observations. On reflection, she found her  
participation on the QA Committee was a good experience because it led her  
to understand that what she had done as a committee member was precisely  
what the evaluation committees were supposed to have been doing. She found  
the QA Committee evaluations were reached by exactly the same discussions  
relating to the same points and with more or less the same kinds of  
agreements and disagreements she experienced in her evaluation committee.  
578. As to Crich's understanding of the work of the QA Committee, she  
testified the individuals selected for the QA Committee knew the MEC  
discipline, and thus could decide whether or not the ratings of the  
evaluation committees respected the MEC discipline or differed  
significantly. Crich further testified when the QA Committee finished its  
work, there was general agreement among the members there was no bias. If  
there was a significant difference, it was, according to Crich, because it  
was a genuinely difficult job to evaluate which had no comparable  
benchmark. She described the 25 jobs as "very difficult jobs". Crich was  
asked in cross examination what she understood was meant by "bias". She  
responded in Volume 192, at p. 24830, lines 5 - 15:  
A. What I remember the other participants saying is that there  
had been allegations in the media that there had been -- the  
results of the study were biased and by "biased", that meant that  
the evaluations had not been fair to all jobs equally and that  
female jobs had been rated too high. I don't know if the -- also  
that male jobs had been rated too low. Maybe it was both. Maybe  
it was just one or maybe it was -- but that was -- the bias is  
that female jobs were rated too high.  
579. A further clarification of this response was given by her in  
Volume 192, at p. 24841, lines 2 - 12:  
Q. Mr. [sic] Crich, I just have one question, really, and I  
will try to phrase it as clearly as I can.  
When your Quality Assurance Committee agreed that there was no  
bias in these questionnaires, these 27 questionnaires that you  
evaluated, were you looking at the reasons for the difference and,  
therefore, concluding that the gender of the questionnaire was not  
the reason for the differences?  
A. That's correct.  
580. Willis testified he had a number of problems with the QA  
Committee. He was disappointed with the composition of the committee and  
would have preferred if the total MEC membership had been reconvened rather  
than only the five individuals selected. Another factor which troubled him  
was that although three of the members were from the MEC, two of them had acted
only as alternates. Moreover, one of the members had been identified as an  
outlier in the Tristat Report. Willis also believed, since two of the  
members had participated in the evaluation committees, their opinion about  
the committee results might be suspect.  
581. Another area of concern for Willis was that these five individuals had not done any evaluations for at least two years. Willis testified this committee should, at the very least, have been given a day or two of
refresher training by the consultants. In his opinion, it would be  
difficult after a two year lapse in time to return and do evaluations,  
particularly evaluations which were to be critiqued. His biggest concern  
is noted in Volume 208, at p. 26950, lines 3 - 8:  
However, I guess my biggest concern about the QA committee was it  
was my understanding that there was no consensus process. To me I  
look at the consensus phase of the evaluation process as being  
part of the data-gathering collection.  
582. Willis stressed the consensus phase of the Willis Process is a  
very important exercise because it gives the committee members opportunity  
to discuss the facts of the job and time for all members to consider the  
information thus elicited. It is the "fine honing of the information" which is important, according to Willis, in this stage of the process. When committee members change their evaluation at this point, Willis believes the change is appropriate as long as it is based on facts which are brought out as a result of the discussions. On this basis, Willis discounted the
results of the QA Committee because an essential and critically important  
step was left out. Willis testified that to some extent he might change  
his opinion regarding consensus if indeed the QA Committee had included the consensus process in its deliberations.
583. Durber testified, in the normal course of an investigation, the  
Commission expects an employer to provide evidence in support of "their  
defence". He testified the Commission receives a defence from the employer  
which says, in effect, it ought to be excused from accepting the results of  
its own study. Notwithstanding, the Employer, though duly represented, presented no evidence to support such a conclusion.
584. According to Durber, differences between a committee and a  
consultant are bound to occur, but the Commission needs to be vigilant  
about understanding those differences and their relationship to gender.  
585. The Commission opted to conduct further analysis of the  
consistency of the evaluations by the nine evaluation committees compared  
to those of the MEC. Durber felt he had no alternative but to order  
another study so as to complete the picture. He was not happy with the  
alternative because, in his opinion, it was impossible to replicate job  
evaluations done by the committees. Durber felt uneasy about the validity  
of the process, which he described as people in a sense second-guessing
what a rather large number of people had done over a period of time.  
Durber would have preferred to have the parties to the JUMI Study deal with  
the issue of apparent gender bias in their own way. He elaborates in  
Volume 149, at page 18599, lines 1 - 11:  
But conceivably they might well have had committees explain their  
results, look at the differences between themselves and Mr.  
Wisner. There might well have been some judgments raised or  
brought to bear on the patterns themselves and on the differences  
between the committees and Mr. Wisner.  
There could have been some good rationalization, if you like. But  
in the event, that proved not possible. Once the committees were  
gone, they were gone.  
586. Durber said in the course of his investigation, he did not  
contact Wisner because he preferred to relate to what he considered  
"reasonable criteria for judging the quality of job evaluation." The  
issue, in Durber's view, was one of differences between committees and
consultants and the process followed by the committees. Durber questioned  
why he should prefer to believe a consultant over the evaluation  
committees. Given a choice between the judgment of a group of people who  
are well informed as opposed to following the discipline of one individual,  
Durber would prefer to believe the group of people. This was one of the  
"indirect measures" which Durber used in drawing his conclusions about  
gender bias. Durber believed if he contacted Wisner, he then would have  
been bound to call each of the working committee members.  
587. Durber contacted Willis to do a further evaluation in the early  
part of 1990. Willis confirmed his acceptance by letter of February 12,  
1990, to Durber, which states in part:
The purpose will be to determine the extent of any systematic bias  
that may exist in the results of evaluation committee efforts.  
...the sample size should be 300 positions, with at least 131  
being from male dominated occupational groups and the balance from  
female dominated occupational groups.  
As to the sample selection, the random selection methodology we  
used in the earlier special analysis would, I believe, be  
acceptable to the unions and management. The Human Rights  
Commission should have input into this methodology...  
The method employed for the analysis will be the same as used in  
our previous analysis. Each selected questionnaire will be  
reviewed and a determination made as to whether a similar position  
is included among the Master Evaluation Committee's evaluations.  
In cases where a similar MEC benchmark exits [sic], the MEC  
evaluation will be adopted as the consultants evaluation. When no  
similar benchmark exists, the consultants will do an independent  
evaluation of the position, supported by reference to appropriate  
MEC benchmarks. Comparison will then be made with the sample  
committee evaluation and rationale for that position. When  
differences are found between the consultants evaluation and that  
of the committee, a written rationale explaining the consultants  
evaluation will be provided.  
(Exhibit HR-93)  
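The selection Willis describes amounts to a stratified random draw: a fixed minimum from the male-dominated groups and the balance from the female-dominated groups. A minimal sketch follows; the position identifiers, pool sizes and use of a seeded generator are assumptions for illustration, not the methodology actually used.

    # Minimal sketch of the stratified draw described in the letter: 300
    # positions, at least 131 from male-dominated groups and the balance
    # from female-dominated groups. Pools and seed are hypothetical.
    import random

    def draw_sample(male_pool, female_pool, n_male=131, n_total=300, seed=0):
        rng = random.Random(seed)  # seeded so the draw can be reproduced
        return rng.sample(male_pool, n_male) + \
               rng.sample(female_pool, n_total - n_male)

    male_pool = [f"M{i:04d}" for i in range(1500)]
    female_pool = [f"F{i:04d}" for i in range(2500)]
    print(len(draw_sample(male_pool, female_pool)))  # -> 300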
588. Durber stated his objective in commissioning Willis to re-  
evaluate the additional 300 positions was essentially to pursue the issue  
which had been raised as a result of the Wisner 222 relating to possible  
gender bias. In view of s. 9 of the Guidelines, supra, Durber wanted to be  
assured there was no question of gender bias. He further testified he  
could see no alternative but to pursue the same approach as Wisner had  
because it was through that approach the issue had arisen in the first  
place.  
589. Durber would have preferred to engage Wisner to perform the  
second set of re-evaluations but, in the meantime, Wisner had left the  
Willis firm. Accordingly, Willis was authorized to form a committee  
consisting of four consultants, (collectively referred to as the "Gang of  
Four"), who were to perform the 300 re-evaluations (the "Willis 300").  
590. Willis testified he understood there was a concern the four  
consultants working together would arrive at a slightly different result  
than Wisner. Accordingly, their additional task involved selecting jobs  
from among the Wisner 222 and independently evaluating them without making  
any judgment as to differences between the Gang of Four's and Wisner's re-  
evaluations. Using the Gang of Four, Willis was to review approximately 20  
per cent of the evaluations of the Wisner 222, i.e., 44 questionnaires, as  
a double check on Wisner's interpretation of the jobs.  
591. The Gang of Four tried to match, as closely as possible, the  
methodology that had been used in the Wisner 222. The sample of positions  
was selected by the Commission and was taken from the total sample of  
evaluations excluding the MEC evaluations and any re-evaluations included  
in the Wisner 222. Willis was not asked to do any analysis of those re-  
evaluations. Once the Gang of Four completed the 300 re-evaluations, the  
results were turned over to the Commission for analysis.  
592. The Gang of Four consisted of Willis, two of his associates, Owen  
and Davis, and one outside bilingual consultant, Esther Brunet.  
Questionnaires were assigned to each consultant and a second consultant  
reviewed each of those evaluations, so that there were always two  
consultants involved. The work took approximately two months. A report  
entitled Report to the CHRC Equal Pay, Quality Analysis of Sampled  
Committee Evaluations, Joint Initiative Equal Pay Study, was presented by  
Willis to the Commission in March of 1990.  
593. Although this review was to assess the quality of the Wisner 222,  
Willis & Associates were instructed by the Commission not to draw  
conclusions as to the quality of either their work or Wisner's re-  
evaluations. In the course of these hearings, and in the context of this  
review, Willis was asked his opinion on the quality of the Wisner re-  
evaluations. He replied in Volume 59, at p. 7337, lines 11 - 24:  
THE WITNESS: I was satisfied with the quality of the Wisner  
evaluations six or seven months earlier when I looked at his  
rationales and I looked at his actual evaluations. I had a great  
deal of confidence in Mr. Wisner's ability as a professional job  
evaluator.  
I did not, at this point, sum up the 44 evaluations by our team of  
consultants and compare them in total with the Wisner evaluations.  
They were not identical, there were some differences. But I felt  
that it was up to Mr. Durber to analyze those differences and, in  
effect, decide whether the quality was consistent between both  
consultant teams.  
594. In terms of analyzing the results of the 300 evaluations, Willis  
stated it would have been appropriate, in his opinion, to perform a  
statistical analysis to identify the existence or non-existence of a  
systematic pattern of gender bias. Had the Commission asked Willis to  
perform this analysis, he would have retained a statistician, Dr.  
Milczarek, who in the normal course of events, performs this kind of  
analysis for him.  
595. The last communication between Willis & Associates and the  
Commission concerned the 44 re-evaluations. This took the form of a
letter dated May 1, 1990, written by the Willis consultant, Keith Davis to  
the Commission. During the re-evaluation of the 300 positions and the  
review of the 44 Wisner re-evaluations, the Gang of Four inadvertently  
referred to a list relating to the working conditions factor in the Willis  
Plan. Changes had been made by the JUMI Committee to this factor which the  
Gang of Four had failed to take into account. Davis informed the  
Commission that, when using the re-evaluations, the working conditions factor
needed to be changed. In the end, one re-evaluation by the consultant  
required a change.  
596. The Tribunal had the benefit of hearing evidence from Esther  
Brunet concerning her participation in the re-evaluations as a member of  
the Gang of Four. Brunet was the only member of the Gang of Four who was  
an employee of the Federal Public Service. She had been involved in the  
JUMI Study as a chair in the first version of Committee #4. Her employment  
background at the relevant time was Director of Personnel, Finance and  
Administration with the Status of Women Canada. Willis testified he needed  
a French speaking consultant to participate in the Willis 300 and, because  
he and his staff had a great deal of confidence in Brunet's ability to  
evaluate, they contracted with her to evaluate the French questionnaires.  
597. Brunet evaluated approximately 100 questionnaires out of the  
total of 300. About 70 per cent of those were French questionnaires. She  
first evaluated the questionnaires independently. If the evaluation  
committee had used only one benchmark, she would try to find more. Once  
her evaluation was done, she would look at the evaluation committee scores  
and rationales, and if she felt the reason for the difference made sense,  
she would give the benefit of the doubt to the evaluation committee scores; if not, she would then prepare her justification and present it to the
other three consultants. During this presentation, Brunet would try to  
convince the other three team members of the need for the change she was  
proposing. If she was unable to persuade the other members, the evaluation  
committee scores remained as they were. Brunet explained the Gang of Four  
did not write rationales in the same manner as the committees because the  
reason they wrote them was simply to justify the difference between the  
consultant and the committee.  
598. Brunet's evaluations of the French questionnaires can be compared  
to the committee scores because she was the only consultant in the Gang of  
Four evaluating French questionnaires. The French questionnaires are summarized in Exhibit PIPSC-162, which confirms that for female-dominated questionnaires, Brunet's average score was 157.1 compared to the committees' average score of 157.9. With respect to the male-dominated questionnaires, her average score was 250.7 compared to the committees' average score of 249.7. Brunet rated the same as the committees except in eight cases, five from the female-dominated and three from the male-dominated questionnaires.
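The comparison in Exhibit PIPSC-162 reduces to a difference of group averages, which can be sketched as follows using the figures reported above; the per-questionnaire scores themselves are not reproduced in the record.

    # Sketch of the group-average comparison reported from Exhibit
    # PIPSC-162; only the published averages are used here.
    averages = {
        "female-dominated": {"brunet": 157.1, "committees": 157.9},
        "male-dominated": {"brunet": 250.7, "committees": 249.7},
    }

    for group, s in averages.items():
        diff = s["brunet"] - s["committees"]
        print(f"{group}: Brunet {s['brunet']} vs committees {s['committees']}"
              f" (difference {diff:+.1f})")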
599. The investigations here under s. 11 differed somewhat from other Commission investigations, in that the factual foundation for the complaints was known to the Commission because it had participated in the process as observer from an early stage. The Commission observers attended
the JUMI meetings and observed the committees during their evaluations on  
an ongoing basis from the commencement of the study. The Commission did  
not have enough observers to attend all of the committee sessions, and over  
the years the number of observers was reduced.  
600. Daily notes were made by these observers when they attended an  
evaluation committee at work (Exhibit R-142), and these notes, which were  
quite extensive, were entered in evidence during the cross-examination of  
Durber. Durber had not read the notes himself. He asked Brian Hargadon,  
one of the Commission's observers, whether there was anything in the  
observer notes relating to the committee process in particular, which  
needed to be explored as part of the investigation. Durber testified he  
received an overview from Hargadon about difficulties in the process of job  
evaluation, in arriving at consensus, and dealing with the issues.  
However, at the end of the day, there was nothing in them to be concerned  
about in terms of the bottom line, that is to say the reliability of the  
results. Consequently, Durber expressed the opinion he did not consider it
necessary for the observer notes to be provided to the Tribunal as evidence  
in this hearing.  
601. Excerpts from the observer notes were read to Durber during his  
cross-examination and he was asked whether he had been given the information, either by Hargadon or in any other context, in forming his conclusions about
the JUMI Study. Some of these excerpts include the following:  
Committee #5  
...A gender bias problem appears to be developing in this  
committee.  
There is one woman (Sherry) who gives higher scores than the rest  
of the group for female-dominated jobs and lower scores for male-  
dominated jobs. She also claims to have first hand knowledge of  
most jobs and when describing them makes extremely subjective  
comments reflecting this bias. She will rarely change her rating  
even if she has taken an extreme position.  
There is also a man in the group (Paul) whose ratings reflect the  
opposite bias. However, his ratings tend to be closer to the  
consensus rating.  
There is another woman in the group (Mary) who, in discussion,  
appears to have a strong alliance with Sherry. However, Mary's  
evaluations do not appear to indicate a bias.  
Discussion tends to be extremely drawn out in this group as there  
are consistently opposing views...  
(Exhibit R-142, Volume I, page 6)  
Functioning of Committees:  
In general, committees have settled into routines which are  
efficient and also reflect the uniqueness of each group. Given  
that working conditions are not ideal (i.e. working time is  
tightly structured and individuals with very different  
personalities and views must spend extended working hours  
together) committees are working well.  
However, there are a few problems which need to be monitored. I  
do not know enough about Committee #3 to comment. Committee #5  
also has its problems which affect productivity, although not to  
the same extent as Committee #3.  
Members of Committee #5 have problems listening to the views of  
others. They constantly interrupt each other and often the  
emotional tenor of the Committee is extremely high.  
Committee #5 needs a chair who can be very firm with such  
disparate and strong personalities. The present chair does not  
seem to have this capacity...  
(Exhibit R-142, Volume I, page 90)  
Committee #3  
Splits in committee union/management. Job was well written and  
complete. Louise moved to conform with Jake & Al on K&S. No  
improvement on committee operations. Atmosphere tense.  
Committee #2:  
Committee works well.  
(Exhibit R-142, Volume II, page 125)  
Committee #4:  
Took 5 hours to deal with this job (simple). New raters prolonged  
process, obstinate, even after clarification by consultant.  
(Exhibit R-142, Volume II, page 130)  
Committee #4:  
...Language gender used, Chairperson trying to influence raters.  
(Exhibit R-142, Volume II, page 176)  
Committee #5:  
...Pierre Collard noted a blow-up in Committee 5. He felt it may  
have been indirectly influenced by the fact that some members of  
#5 would have no jobs when this process is complete...  
Wednesday - the Pay Equity Section, CHRC, rec'd a call from TB to  
intervene in a blow-out by 2 members of Committee 5.  
Thursday - Brian H. and I wandered around committee and things  
were quiet.  
(Exhibit R-142, Volume II, page 203)  
Weekly meeting, Monday, October 31, 1988:  
Ron [Renaud] brought out the point that the ground rules with  
regard to meeting of the consensus guidelines is not being  
followed. Result is that after it is all over one party could say  
that it was not a valid agreement because the rule was not  
followed, as covered in the procedures guidelines.  
(Exhibit R-142, Volume I, page 36)  
Additional Observer Notes dated November 24, 1988:  
3. Majority vote. Committee 3 & 5 have a problem with this.  
Apparently they are not following the rules for consensus as  
spelled out on page 2 of the Working committee Procedures.  
Committee 2 follow the instructions with no exceptions...  
There is also some question on reaching consensus by using the  
median. Fred had suggested this. For example, you have under  
working conditions, the following scores, 13, 13, 15, 17, 17. You  
should settle on 15 as the score. Should this be the solution?"  
(Exhibit R-142, Volume I, page 79)  
JUMI Committees - Observations:  
Today, during my visit to committee number three, I noted that the  
committee was not observing the two third rule in order to reach  
consensus. Committee decided to take an average value as a  
consensus, however, I was consulted in the matter and they went  
along with my advice. Moreover, a comment was made: "We do not  
follow this rule unless somebody is here observing us".  
(Exhibit R-142, Volume II, p. 102)  
Committee #6:  
...Also assumption made in working conditions as the committee  
felt the incumbent was not thorough in filling out the  
questionnaire.  
(Exhibit R-142, Volume II, p. 187)  
Notes from Brian Hargadon to Ted Ulch:  
I see a couple of problems, at least, with Committee #2.  
...  
Lack of utilization of the original bench marks. We are told that  
we as a committee do not have any obligation to follow them. Is  
this so?  
(Exhibit R-142, Volume II, p. 111)  
JUMI Committee:  
...Keith and Sharon made comments on the analysis they did on  
their respective committees there was a concern shown by all of  
the people sitting in for the CHRC that it is obvious there are  
certain people rating consistently high or low, it may not be  
resolved soon enough if the information from the tests is not  
quickly analyzed.  
...it has been suggested that we keep out of personal dynamics,  
that is fingering any person in a committee that may not be up to  
snuff because it could come back to haunt us. There is a feeling  
that some committee members, particularly union, are being advised  
on how to approach the evaluations which would best fit the  
interests of a specific union membership.  
(Exhibit R-142, Volume I, p. 56)  
Committee #2:  
In position 2317, committee did not follow MEC benchmark and it  
seems that the position has been overrated...A comment was made:  
"It does make a difference to have your presence here during  
evaluations." People here are not discussing the jobs at all.  
(Exhibit R-142, Volume II, p. 180)  
Weekly Activities:  
...The problem is that committee 5 has well over 100 evaluations  
to sore thumb and there is a question as to how they were allowed  
to accumulate so many.  
(Exhibit R-142, Volume II, p. 208)  
Weekly Meeting, November 8, 1988:  
...The members of the committee brought up a number of  
inconsistencies that have been noted in the various committees.  
There is a concern as to them being limited to questioning if  
there is an obvious standard set that may not be followed with  
other committees.  
For example, one committee decided that level D under job  
knowledge can only be re-asked if the job requires a university  
degree. Ron asked the consultant if that was the case, and the  
answer was that was not correct.  
Ron will be writing more specifics to be submitted to Ted under  
separate cover. There is a real concern that these  
inconsistencies will be allowed to go on and grow in number with  
the end result that the credibility of the committees, and indeed  
of us, will be challenged...  
Committee #3 is continuing to have some problems. The committee  
will do their ratings, then search for a benchmark to fit the  
rating rather than check their rating against an appropriate bench  
mark.  
(Exhibit R-142, Volume I, p. 45)  
Consistency JUMI Study:  
I would like to bring to your attention what I consider an  
important issue at this stage of the study and one that should be  
brought to the attention of JUMI.  
Essentially, we should confirm our position that consistency is  
important; consistency with the MEC discipline and consistency of  
the five evaluation committees in applying the Willis Plan. I  
believe we have some legislative authority in the Equal Pay  
Guidelines in respect to consistency.  
...  
There have been a number of instances not only mentioned above  
where committees have for some jobs followed an evaluation process  
which is inconsistent with MEC and among the various evaluation  
committees...  
-There are other situations like this which makes us  
concerned about inconsistencies and how we can help ensure  
that they are corrected as early as possible without  
compromising our role.  
In summary, I recommend that JUMI be advised of our opinion as to  
how "Acting" situations are to be handled. In addition it would  
be timely to confirm our position on the importance of  
consistency; consistency with the MEC discipline and consistency  
of the five evaluation committees in applying the plan.  
(Exhibit R-142, Volume I, p. 47)  
Update on Observers Remarks, December 7, 1988:  
The observers decided they wanted to go over a number of points  
that concerned them so a meeting was held this morning.  
Before getting into the individual items I want to confirm that we  
are having some concern shown by various committees during testing  
time...  
The reason for our numbers being diminished at the committees has  
been discussed with the observers so that we would give the same  
reason, a) other commitments and b) committees are now requiring  
less observation because of the time they have been in operation.  
Committees 1,2, and 4 are operating quite well. Committee 5 does  
still have some problems however, they will probably sort  
themselves out.  
Committee 3 is still not functioning up to par. The question  
arises whether the remaining observers, Sharon and Keith, should  
spend a disproportionate amount of time in committee 3 because of  
the problem. So the question remains, do we give preference to  
committee #3?  
When we go back and look at the reason observers from the CHRC  
were brought into the picture, there is concern that our efforts  
will be for nothing should a) JUMI fold up, or b) we are to attest  
to the credibility of both the Master Evaluation Committee and the  
current five committees in operation.  
As it stands now, no observer would attest to the evaluations  
being fair, balanced and objective. There are too many  
irregularities within committees and between committees.  
(Exhibit R-142, Volume I, pp. 82-87)  
602. It should be borne in mind the role of the observers was to act  
as a "watch dog" in the committee evaluation process. They were to  
observe, critique and when asked to do so suggest improvement in the  
functioning of the committees. The observers' notes need to be viewed in  
this context.  
603. Durber accepted the bottom line opinion of the Commission  
observer, Hargadon, and decided not to rely on the notes as evidence of the  
reliability of the evaluation results.  
604. The Tribunal heard testimony from witnesses who were evaluators  
on committees and who provided evidence in response to specific observer  
notes about their particular committee. Having considered Durber's  
responses to the questions raised during his testimony, the vagueness and  
lack of specificity of these notes, and the responses of the evaluators who
testified at this hearing, the Tribunal finds as a fact that the notes do
not weigh significantly against the reliability of the evaluation results.
605. Another aspect of the Commission's investigation involved a
three-member committee organized by Durber to review re-evaluations conducted by
the Treasury Board relating to the Nursing, Home Economics, Occupational  
and Physical Therapists and Computer Services benchmarks. These re-  
evaluations are contained in two reports which were presented to the  
Commission in July, 1990, in response to the Commission investigation into  
the question of apparent gender bias in the evaluation results. The  
reports are entitled Evaluation of CS Benchmarks and Corrected Version of  
NU, Annex B (Exhibit HR-252), and Final Report on Evaluation of Equal Pay  
Study Questionnaire (Exhibit HR-253).  
606. The Commission had asked the Treasury Board whether the Employer  
subscribed to the observations offered in these reports which raised  
questions about the specific job evaluations of the multiple evaluation  
committees. The Commission received no response from the Treasury Board to
its enquiries. Durber concluded these reports could be viewed as
possible evidence in the investigation but, in the short term, excluded the
reports as valid evidence in the Commission's investigation, reserving,
however, the option to advise the Tribunal of the documents in greater
detail. Nonetheless, Durber decided to have a committee explore the  
substance of the reports (the "Benchmark Review Committee").  
607. The Benchmark Review Committee consisted of Esther Brunet,  
Christine Roberge, an employee with the Commission, and Brian Hargadon, an
investigator for the Commission. Hargadon and Roberge were trained by
Willis. In early September, 1990, the three participants, using the Willis  
Process, started to re-evaluate each of the evaluations found in the  
Treasury Board reports. These included 65 benchmark questionnaires. They  
also examined 203 multiple committee evaluations from the OP, HE, NU and CS  
Groups. The process, defined by Durber, was that all three committee  
members had to agree on the evaluation for each job that was re-evaluated.  
After reaching consensus, the committee then compared their score to the  
Treasury Board consultant score and the score of the multiple evaluation  
committees.  
608. If the Benchmark Review Committee score was different from the  
Treasury Board and the evaluation committees' scores, there was an attempt  
to examine the reason why the scores were different. The Benchmark
Review Committee then gave the benefit of the doubt to the Treasury Board
consultants or to the evaluation committees; failing that, the Committee
would justify its own score where it differed from both the Treasury Board
and the evaluation committees.
609. Since Durber was not informed by the Treasury Board as to the  
purpose of the reports provided in July, 1990, his conclusions were  
primarily based on the conclusions contained in the Benchmark Review  
Committee's report.  
610. Brunet did not participate in writing the Committee's final  
report (Exhibit HR-254). It was prepared by the Commission members,
Roberge and Hargadon,
and was reviewed by Durber. The conclusion contained within the report and  
attested to by Durber is that no weight should be placed on the Treasury  
Board reports. The Benchmark Review Committee's examination confirmed the  
JUMI evaluations with very few exceptions.  
611. An earlier draft of Exhibit HR-254 was prepared by the two  
Commission members of the Committee, and is dated June of 1991. That draft  
was introduced in the cross-examination of Durber as Exhibit R-140. There  
were two passages, at pp. 26-27, which were not included in the final  
report. These pages refer to sore-thumbing and difficulties experienced by  
the evaluation committees in the use of benchmarks. Durber instructed
that these pages be dropped from the final version. In his opinion, these
pages were not particularly "relevant to what they [the Benchmark Review
Committee] were doing..." (Volume 159, p. 19790). In his view they were
interesting comments
on difficulties encountered with benchmarks but did not add to what the  
Commission already knew. In his opinion, they were more instructional for  
use in future pay equity exercises.  
612. Durber testified he asked both Roberge and Hargadon about the  
considerations raised on pages 26 and 27 of the original report, Exhibit R-  
140. Durber testified he was told that the purpose of these two pages was  
to comment upon "lessons learned, and their own perceptions of the  
difficulties the Commission might encounter in fulfilling their observer  
role in future initiatives." The Commission would, as a result, be  
forewarned of the problems which occurred during the JUMI Study including  
the difficulties with the rationales. Durber did not consider their  
comments as solid evidence, but more as useful material for future work of  
the Commission.  
613. With respect to the report of the Benchmark Review Committee,  
Durber considered that the matters contained in pages 26 and 27 would come  
forward through Willis during the Tribunal hearings. Durber claimed the  
Commission had neither the resources nor the time to begin an investigation  
of the MEC process while preparing for its participation in these hearings.  
614. The Tribunal did have the benefit of Brunet's testimony relating  
to pages 26 and 27 of Exhibit R-140, which appears in Volume 214, at p.  
27852, lines 5 - 15:  
I noticed that pages 26 and 27 made me smile when I saw them  
because, when I was working with Christine and Brian Hargadon, Jim  
Sadler was heading the study from the Northwest Territories. He  
would often come and see how things were going, and all that.  
Once we found out that he was going up there, we said, "How about  
we share some information that we have, so that you can bring it  
up."
When I saw pages 26 and 27, a lot of that I had input in.  
615. Brunet was under the impression she would be called upon to  
review and sign the report. In fact, she was not asked to do so, although
she did receive a copy of the report. She testified while the
committee was doing its work, Jim Sadler, an employee of the Commission who  
was heading a pay equity study in the N.W.T., often came to see how the  
Benchmark Review Committee was functioning. The Benchmark Review Committee  
suggested that they share information with Sadler so he could take it with  
him to the N.W.T. study.  
616. Both Brunet's understanding of the comments contained on pages 26  
and 27, and Durber's opinion as to their usefulness, are corroborated in  
Exhibit R-141, a letter written by Sadler addressed to a union  
representative involved in a pay equity study in the N.W.T. This study is  
referred to as the Joint Equal Pay Study (JEPS) which was using a newer  
version of the Willis Plan. Some of Sadler's comments in that letter were  
based on discussions he had with members of the Commission's Committee.  
Those discussions corroborate both Durber's and Brunet's evidence about the  
Committee's view about sharing this information with the Commission.
617. Durber's opinions and conclusions about Exhibits R-140 (Draft  
Report) and R-142 (Observer Notes) led him to decide not to introduce these  
documents as part of the Commission's case. The Tribunal hearing is in the  
nature of a public enquiry, and the Commission's role is to represent the  
public interest. Decisions about the relevance of documentation garnered  
by the Commission during its investigation of the s. 11 complaint are
within the purview of the Commission. In circumstances such as these,
however, the
Commission's decision to exclude these documents from its case is open to  
criticism if the documents are found to be relevant and sensitive to the  
issue of reliability.  
618. Before proceeding further, the Tribunal is of the view the  
reports in question, namely Exhibits R-140 and R-142, should have been
introduced in their entirety as part of the Commission's case with  
accompanying explanations. The decision as to their usefulness ought to  
have been left with the Tribunal. The Commission's case would have been  
better served had the exhibits been entered in their entirety in the first
place.
619. During cross-examination Durber offered a further explanation as  
to his reasons for not interviewing Wisner. He conducted an ex post facto  
review of Wisner's rationales for purposes of what he described as  
clarification. Durber used both the committee's and Wisner's rationales to  
do this analysis. It involved a review of each difference and a  
determination of the extent to which these differences cancelled one
another out. After Durber categorized the differences between the
committee and consultant, he looked at the numbers to determine whether the
distribution of these differences was patterned or random.
620. Willis was asked to comment on Durber's analysis which was based  
on the examination of rationales. Willis replied in Volume 208, at p.  
26939, he had trouble with Durber's conclusions. Willis doubts very much
whether bias could be recognized by looking at rationales. In Willis' opinion,
bias is very subtle and not something that can be looked at on a job by job  
basis. Willis testified in Volume 208, at p. 26939, lines 8 to 13:  
You have to look at a total pattern and, to me, it would be  
totally inappropriate to single out certain ones of those re-  
evaluations and say, "We will discount those." I think you either  
take them all and look at them at their face value or you don't  
take any of them.  
621. According to Willis, if his consultants are doing an evaluation  
during the course of the study, the reasons for the differences are very  
important as they will provide the consultants with some basis for  
retraining of a committee. Willis recognizes there is always going to be  
some random variance and random disparity after the study is completed
and, therefore, he does not, at this stage, concern himself with the
reasons. In the context of Durber's analysis, Willis said he always  
expects some differences between consultants and committees, but he did not  
see any value in attempting to use those differences to analyze whether or  
not there is a problem. Willis elaborates further in Volume 208, at p.  
26944, lines 17 - 23:  
A. What I have said or at least what I intended was that since  
bias is a very subtle thing, I think our only opportunity for  
examining the extent to which there is a different interpretation  
of male versus female jobs is by looking at the total results  
after the study has been completed.  
622. The analysis done by Durber was presented in mathematical form, as
numbers and tables, with conclusions about symmetry between the numbers and
whether they were demonstrative of patterns. The Tribunal's view is that
this analysis has a statistical component because of the particular
methodology used by Durber. Without the assistance of a qualified
statistical expert, we are unable to properly interpret Durber's analysis,
which must, therefore, be disregarded.
623. In 1992, during the appearance of Willis before the Tribunal,  
Durber decided to further investigate the quality of job information  
contained in the questionnaires. Accordingly, he retained a researcher,  
who had no experience in job evaluation but who had "pretty good analytical  
ability" for the purpose of examining a cross-section of the  
questionnaires. The cross-section included 63 benchmarks and 587
non-benchmarks, for a total of 650 questionnaires. The researcher did not
appear before the Tribunal.  
624. The researcher's task was to look at the information to assess
completeness, consistency, legibility and whether the safeguards had been
followed, and, finally, to determine whether there was an indication each
questionnaire had been validated by the employer's supervisor.
625. Durber's evidence is he discussed with the researcher some of the  
characteristics that could lead to deciding whether or not the  
questionnaires were complete. In this regard, Durber prepared some  
procedures and questions for the researcher. As background, the researcher  
was provided with the purpose of the job information, the process used  
during the study to collect and screen the information, as well as  
information for identifying basic data such as department, questionnaire  
number, occupational group and other such information.  
626. This project took the researcher two months to complete. A
meeting between Durber and the researcher occurred every week to discuss  
problems. Durber personally reviewed any questionnaires where problems  
were encountered, which involved approximately 5 per cent of the  
questionnaires. Durber testified he closely supervised the researcher  
during the examination of benchmark questionnaires.  
627. The following is a list of criteria used by the researcher in  
this exercise:  
1. Legibility - whether the questionnaire could be read.
2. Language - whether the questionnaire was in French or English.
3. Script - whether the questionnaire was typed or handwritten.
4. Signature - whether it was signed or not.
5. Comments - whether the supervisor commented.
6. Completion - whether all of the parts of the questionnaire had
been completed.
7. Consistency - whether the supervisor was consistent with the
incumbent.
8. Notes - whether there was evidence of interviewer or reviewer
notes.
9. Facts - whether the questionnaire contained fact versus editorial
comment.
628. The report entitled An Examination of the Quality of  
Questionnaire Information used by the Federal Pay Equity Study (Exhibit HR-  
245), contained both findings and conclusions about the completeness and  
accuracy of the job information. In the Tribunal's view, Durber is  
expressing, in the report and in oral evidence, the opinions of his  
researcher which may or may not be well founded. Due to the researcher's  
lack of expertise in pay equity job evaluation, it is the Tribunal's  
conclusion it must reject any opinions contained in this report. There is,  
however, factual content in the report, not based on opinion, which in our  
view is helpful. These items are listed as follows:
Findings:  
- Required questions were answered 95% of the time.
- Supervisors provided signatures on just over 99% of questionnaires. In
just over 96%, the supervisor commented, seeming to contradict
incumbents about 9% of the time. In 95% of these contradictions,
subsequent interviews clarified the work.
- In two-thirds of the files, interviews were carried out, with
supplementary information provided. The investigator noted that the
latter was frequently extensive...
- Legibility of the description in questionnaires was in all cases good.
Conclusions:  
- There was a system for reviewing and assuring the completeness of the
information about work in the Joint Initiative.
- There was a system for ensuring the accuracy of the job
information...through supervisory review.
- Those involved in reading questionnaires made efforts...to obtain
further information to improve their understanding...where the
supervisor and incumbent appeared to disagree about the work.
(Exhibit HR-245)  
(ii). Sunter's Analysis
629. The Commission asked a former director of Statistics Canada, Alan  
Sunter, to examine the full set of data from the Wisner 222 and the Willis  
300 and to look for patterns relating to gender composition. The Commission
also requested Sunter to assess the statistical significance of the
formulae relating to possible gender bias used by the Treasury Board in its  
March, 1990 methodology paper.  
630. Sunter, a qualified statistical expert, did not have background
knowledge of pay equity prior to his involvement with the JUMI Study
results. He became involved in the analysis of the JUMI data as a result
of a request by Durber on April 6, 1990, asking him to attend the
workshop scheduled for April 9, 1990. The workshop was to focus on the
Treasury Board methodology document (Exhibit HR-185). Sunter testified he  
was unable to contribute in a constructive way to the workshop and he  
simply listened to the discussions. After the workshop, he met with Durber  
and began to realize there had been a large study addressing the question  
of pay equity between male- and female-dominated occupational groups. He  
also learned there had been subsequent re-evaluations of samples taken from  
the evaluations. This led to the question of whether there was gender bias  
in the evaluations. This was a matter of concern to the Commission.  
631. The statistical evidence concerning the question of gender bias  
in the evaluation results was provided by Sunter and Shillington, both  
experts in statistics. Shillington was not employed by the Commission to  
do any statistical analysis of the results. However, because of  
Shillington's involvement in the IRR testing and other aspects of the JUMI  
Study, he testified before the Tribunal. During his appearance, he was  
requested to provide opinions on Sunter's analysis.  
632. Sunter was asked specifically by Durber to perform three  
analyses. Firstly, he was to look at the question of gender bias in the  
re-evaluations and for this purpose, he was given two sets of data, the  
Wisner 222 re-evaluations and the Willis 300 re-evaluations. Secondly, he  
was provided with the whole data set from the JUMI Study and was asked to  
examine the question of equal pay for work of equal value between male- and  
female-dominated occupational groups. Thirdly, he was given the Treasury  
Board methodology document (Exhibit HR-185) and asked to examine  
specifically the Treasury Board methodology and offer whatever criticism  
seemed appropriate.  
633. Sunter's interpretation of the term "gender bias" used in his  
analysis of the data is provided in Volume 102, at p. 12275, lines 3 - 17:  
A. I supposed gender bias to mean that there would be some  
systematic tendency of the evaluation committees to underscore  
positions from male-dominated occupations or to overscore  
positions from female-dominated occupations or perhaps both of  
those things.  
Q. What do you mean by "systematic tendency"?  
A. At this point, of course, I didn't know, but since the term  
bias had been used, then I assumed that bias would have to mean a  
consistent tendency that would display itself in some kind of  
recognizable pattern in the data, that I would see that when I  
looked at the data and performed some kind of analysis on the  
data.  
634. Willis testified a consultant trained and experienced in the
application of the evaluation system, and possessing an objective
viewpoint, can be expected to evaluate consistently and without a
predilection towards either male- or female-dominated jobs, or towards
either the management or the union side. Willis asserts consultant
evaluations are useful in examining the consistency of committee
evaluations, and, more importantly, in assessing any pattern of bias which
may have occurred. Willis' view is that his consultants' experience,
background, intent and philosophy have always been not to favour one side
or the other but to walk the middle road. Willis'
objective in doing the re-evaluations was to identify whether or not there  
was a gender-based pattern or difference in treatment between male- and
female-dominated jobs. Willis referred to the differences between  
consultant and committee as disparities. It is within this framework  
Sunter began to examine the data.  
635. Sunter testified statisticians collect and analyze data according to
two quite distinct concepts. One he refers to as "descriptive" and the other
as "analytic". In his view, the distinction between these two broad areas  
of enquiry is important in respect of the work he did and his  
interpretation of the JUMI data.  
636. Sunter compared the two re-evaluation data sets, the Wisner 222  
and the Willis 300 against the committee evaluations of the same jobs to  
see whether statistically there was a patterned difference in the manner in  
which evaluators treated different types of positions and, if so, to measure
the size of the differences he found.  
637. Sunter performed a statistical test known as a t-test to measure  
whether there was a difference between the treatment of male and female  
questionnaires by consultants and committees using only the Wisner 222,  
then using only the Willis 300 and then pooling the two data sets together.  
638. According to Shillington, who also performed t-tests in his IRR  
analysis, the t-test is a statistical test that summarizes information  
about how far two averages are from each other. In this case, the  
statistician is looking at the male average and the female average to see  
if there is evidence they are treating male and female questionnaires  
differently. He states in Volume 86, at p. 10668, the t-test hinges on  
three things:  
1. How far apart are the two averages? The farther apart the two
averages are, the more likely the test is to say the scores come from
different populations; that is, the person is treating male and
female questionnaires differently.
2. The larger the sample size, the more likely it is to say that
there is significant evidence that they are treating the two  
populations differently.  
3. The more concentrated the values are, the easier it is to  
say that this is a true pattern.  
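The mechanics Shillington describes can be illustrated with a short
sketch in Python. This is an illustration only; the scores below are
invented for demonstration and are not drawn from the evidence:

    # Hypothetical illustration of the two-sample t-test described above.
    # These scores are invented; they are not JUMI data.
    from scipy import stats

    male_scores = [410, 395, 420, 405, 430, 415, 400, 425]      # invented
    female_scores = [380, 400, 390, 385, 395, 405, 375, 388]    # invented

    # The t statistic grows with (1) the distance between the two averages,
    # (2) the sample sizes, and (3) how tightly the values cluster.
    t_stat, p_value = stats.ttest_ind(male_scores, female_scores)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

A small p-value (conventionally below 0.05) would be read as "significant"
evidence that the two sets of scores come from different populations.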
639. If the difference in the average scores is substantial then,
according to Shillington, it is more likely you will get a "significant"
result in statistical terms. A significant difference
reflects a true difference between two groups and will demonstrate the  
result, most likely, could not have happened by chance. Statistical  
significance in this context pertains to mathematical probabilities and  
whether the numbers are unlikely to have happened by chance. (Volume 87,  
p. 10673).  
640. Sunter testified about the limitations of the t-test. One such  
limitation is that when the sample is very large, even if the difference is  
minuscule, the t-test would find it to be significant. In other words, the  
t-test rejects the null hypothesis of no difference when the sample is  
large enough. Another limitation is that the t-test is not attentive to  
differences of practical importance; it simply follows a mathematical
routine of testing the null hypothesis of no difference against the  
alternative hypothesis that a difference exists.  
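Sunter's large-sample caution can likewise be illustrated with simulated
data. In the sketch below (invented numbers, not the JUMI data), the same
minuscule average difference that is invisible in a small sample is
declared "significant" once the sample is large enough:

    # Sketch of the large-sample limitation of the t-test: a tiny average
    # difference (0.1 point on means of about 400) is non-significant with
    # a small sample but typically "significant" with a very large one.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    for n in (50, 2_000_000):
        a = rng.normal(loc=400.0, scale=25.0, size=n)   # invented scores
        b = rng.normal(loc=400.1, scale=25.0, size=n)   # invented scores
        t, p = stats.ttest_ind(a, b)
        print(f"n = {n:>9}: t = {t:7.2f}, p = {p:.4f}")

The 0.1-point gap has no practical importance in either case; only the
sample size changes the verdict.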
641. Sunter found the size of the difference in the treatment of  
positions from male- and female-dominated occupational groups by committees  
and consultants was 2.3 per cent in the pooled data. He performed further  
t-tests to determine if the consultants and the committees differed in  
their treatment of female-dominated positions. The results showed that for  
positions from female-dominated jobs, there was no statistically  
significant difference between the manner in which the consultants and the  
committees rated these positions. For positions from female-dominated  
occupational groups, the consultant and committee ratings are not  
significantly different whether one compares the committees to the Wisner  
222, the Willis 300 or the pooled consultant re-evaluations (522). The  
size of the non-significant difference in the treatment of
female-dominated positions for the pooled data was 0.05 per cent. For the
Wisner 222, this difference was 0.02 per cent and for the Willis 300, this  
difference was 0.07 per cent (Exhibit HR-191).  
642. Sunter then performed the same t-test on the male-dominated  
positions. He determined that the consultant and committee ratings were  
significantly different for positions from male-dominated occupational  
groups. The size of the difference between the committee and the  
consultant treatment of positions from male-dominated occupational groups  
depended on which of the consultant re-evaluations, the Wisner 222 or the  
Willis 300, were used as a basis for comparison with the committee results.  
It also depended on whether the committee or the consultants were placed in  
the denominator of the equation. Sunter testified since there is no "true  
value" for any given questionnaire, there has to be some standard by which  
to compare committee and consultant evaluations. When it is contended the  
committee is biased relative to the consultant, Sunter states the  
consultant is taken as the baseline or standard of comparison and the  
consultant scores are found in the denominator of the equation to determine  
any difference in treatment.  
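The effect of the choice of baseline can be shown with a small worked
example; the averages are invented and do not reproduce any figure in the
evidence:

    # Hypothetical worked example of the baseline (denominator) choice.
    committee_mean = 410.0    # invented committee average
    consultant_mean = 400.0   # invented consultant average, same jobs

    # Consultant as the standard of comparison (in the denominator):
    print(f"{(committee_mean - consultant_mean) / consultant_mean:.2%}")  # 2.50%

    # Committee as the standard of comparison instead:
    print(f"{(consultant_mean - committee_mean) / committee_mean:.2%}")   # -2.44%

The same ten-point gap yields a slightly different percentage depending on
which set of scores is placed in the denominator, which is why the choice
of baseline had to be made explicit.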
643. The size of the difference in the treatment of male-dominated  
positions for the pooled consultant re-evaluations (522) was found to be  
1.8 per cent, when the consultant evaluations are used as the denominator.  
For the Wisner 222, this difference was 2.5 per cent and for the Willis  
300, it was 1.3 per cent (Exhibit HR-191).  
644. Having found the consultant and committee ratings were  
significantly different for positions from male-dominated occupational  
groups, he testified the size of the difference in the treatment of male-
dominated positions was twice as great in the Wisner 222, a difference of
2.5 per cent, as in the Willis 300, a difference of 1.3 per cent.
645. Sunter preferred using the Wisner and Willis pooled result (522)  
as more reliable in establishing the size of the difference between the  
committee and the consultants rather than using either the Wisner 222 or  
the Willis 300 independently. This difference is stated as 2.3 per cent.  
646. As to whether there was any pattern in the differences between  
the committees and the consultants, Sunter found in over half of the
comparisons between the consultants and the committees there was no
difference at all. In separating the data, he found in about one-third of
the comparisons between the Wisner 222 and the committees there was no  
difference and in about two-thirds of the comparisons between the Willis  
300 and the committees there was no difference. He found it inconceivable  
that, given this number of agreements, there was a consistent pattern of  
discrimination.  
647. Sunter testified having found differences between the committee  
and the consultant in the treatment of male questionnaires, he would not  
conclude the committee was biased or that the consultant was biased. In  
his opinion, the only conclusion to draw was that both the committee and  
the consultant appear to have a bias relative to each other with respect to  
male evaluations. Sunter went on to say you may call this a relative bias,  
or you may attach the term gender bias to it. However, he had difficulty  
with the term "gender bias" because without further testing, one could not  
conclude whose gender bias it is and whether the bias is merely incidental  
to gender or whether it is contingent on something else, which itself is  
incidental to gender.  
648. The crucial question at this juncture in Sunter's evidence is  
whether the t-test results indicate a systematic pattern in the disparities  
or whether the differences are merely random. The Commission submits  
systematic patterns of gender differences must, by definition, be  
differences which are demonstrative of a system at work, something regular  
or methodical. (Para. 199 of written submissions).  
649. The Employer submits a different treatment of male and female  
questionnaires is indicated by a pattern in the disparities such that the  
evaluation of female jobs systematically differs from the evaluation of male
jobs. (Para. 289 of written submissions). The Employer's interpretation  
of pattern can be better understood in the following exchange with Sunter  
which appears in Volume 217, at p. 28243, line 8 to p. 28244, line 1:  
Q. Mr. Sunter, I am just talking about the chi-square and the T  
test when you split the questionnaires by male and female. There  
was a pattern there.  
A. There is a difference in the pattern. I wouldn't use the  
term "pattern". There is a difference. We have acknowledged the  
difference. We are trying to explain the difference.  
Q. But there is a difference in treatment, let's put it that  
way.  
A. There is a difference in the average -- I don't like the  
term "treatment", I must say, because it implies some physical  
process. There is a difference in the differences between  
consultant and committee scores. You may use the word "treatment"  
for that if you would like, but I prefer not to use the word  
"treatment".  
650. Sunter then attempted to explain and understand the differences  
between committees and consultants by fitting models to the data which he  
says are necessary in order to attach meaning to the "notion of gender  
bias". It is in this area of his analysis where Sunter emphasizes the  
distinction between the descriptive use of statistics as opposed to the  
analytic use. The latter use involves his adaptation of models to data.  
Sunter testified if gender bias is present in the results, a statistician  
expects to see some degree of consistency across evaluations which are  
somehow related to gender. Therefore, he tested the data for consistency  
by using models to illustrate how gender bias might operate.  
651. Sunter examined three plausible models to explain how gender bias  
might affect the committee's results. For example, one such model he  
termed "additive" which he described as a constant addition by the  
committee to the consultant scores or a constant subtraction by the  
consultant from the committee scores. Sunter eventually disposed of all of  
these models because the data did not support such configurations.  
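By way of illustration only, one way an additive model might be checked is
sketched below; the scores are hypothetical and the check is a simplified
stand-in for whatever tests Sunter actually ran:

    # Sketch of checking an "additive" bias model: if the committee score
    # equalled the consultant score plus a constant, the per-job
    # differences would cluster tightly around a single value.
    # All scores are invented for illustration.
    import numpy as np

    consultant = np.array([250.0, 310.0, 405.0, 520.0, 610.0, 720.0])
    committee = np.array([262.0, 305.0, 418.0, 519.0, 640.0, 718.0])

    diffs = committee - consultant
    print("differences:", diffs)
    print(f"mean shift = {diffs.mean():.1f}, spread (sd) = {diffs.std(ddof=1):.1f}")

A spread as large as, or larger than, the mean shift is evidence against a
constant additive configuration, which is the sense in which data can fail
to support such a model.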
652. Sunter again tested the differences between the committee and the
consultant, this time using chi-square tests. He applied this test to the
Wisner 222, the Willis 300 and the pooled data. All of these tests
indicated statistically significant results. Sunter nonetheless criticized
the usefulness of chi-square analysis in these circumstances. In his
opinion, the chi-square tests are not helpful in understanding the
difference between the treatment of male- and female-dominated jobs by the
consultants and the committees. His concern is that the chi-square test
measures the frequency of the differences rather than their size, which is
what the t-test measures. Therefore, significant results from the
chi-square test can be misleading about the real difference between the
numbers. Accordingly, he preferred the t-tests, which showed a difference
of 2.3 per cent in the pooled data, as best representing the size of the
difference between committees and consultants.
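The distinction Sunter draws between frequency and size can be sketched
with invented counts of agreement and disagreement:

    # Sketch of why a chi-square test can be "significant" while the sizes
    # of the underlying differences stay small: it counts how often the
    # committee and consultant disagree, not by how much. Invented counts.
    from scipy import stats

    # Rows: male / female questionnaires; columns: committee scored
    # higher than, the same as, or lower than the consultant.
    table = [[60, 110, 52],
             [35, 140, 47]]

    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"chi-square = {chi2:.1f}, p = {p:.4f}")

A small p-value here would say only that the frequency of disagreement
differs by gender; every individual disagreement could still amount to a
point or two, which is why Sunter preferred the t-test's measure of size.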
653. Having seen no difference between the consultants and the  
committees, on average, for female-dominated occupations, Sunter went on to  
explore the idea of gender bias being an unconscious discrimination for or  
against occupational groups by gender. He suggested that the way gender
bias might work in this context is that there are certain underlying male
characteristics or female characteristics and that occupational groups that  
have more males will tend to show this pattern of discrimination rather  
strongly. Having tested for that, he did not find any such correlation  
between the degree of maleness of an occupation and a pattern of relative  
differences. Sunter concluded from his analysis that he was unable to find  
any consistent pattern of differences, and that there was no plausible or  
conclusive explanation for the differences between the committees and the  
consultants.  
654. Sunter concluded from his analysis that without a level of  
consistency in the incidence of differences along gender-differentiated
lines between committees and consultants he was unable to conclude the  
difference was attributable to gender bias. He says in Volume 102, at p.  
12277, line 25 to p. 12279, line 1:  
A. My general conclusion on the question of gender bias -- mind  
you, I still don't know what gender bias is, you understand, but  
my general conclusion on this was as follows. There was a slight  
difference between -- there was virtually no difference between  
the consultants and the committee on positions from female-  
dominated occupations. This could be put aside.  
On positions from male-dominated occupations, there is indeed a  
difference, not large but indeed a statistically significant  
difference, between committee evaluations and consultant  
evaluations. This does not lead me to the conclusion, however,  
that there is gender bias, putting aside for the moment that I  
still don't quite know what I mean by gender bias because there  
are other possible explanations...  
...  
A. In order to conclude that this was gender bias, I would have  
to find some kind of consistency in the observations. I was  
unable to find the kind of consistency that would enable me to  
reach that conclusion.  
655. He also found the lack of consistency in the differences and the  
absence of an alternative plausible model of gender bias did not justify  
adjusting committee scores in the manner adopted by Treasury Board in their  
1990 methodology paper.  
656. Sunter returned to the question of gender bias and explored other  
factors which occurred to him and were not pursued in his initial  
investigation. He explored factors that might be associated in some way
with gender and that could, therefore, be considered possible causes, other
than gender bias, for the difference in the scores. He examined other
characteristics, such as perceived salary, nature of work, size of group,  
which he thought might be correlated with gender. Simply expressed, in  
ordinary language, Sunter explored the degree of association between the  
differences and some of the other characteristics of the data.  
657. One characteristic which Sunter noted between male and female  
questionnaires is that the data showed female questionnaires coming from a  
small number of relatively large occupational groups. Male questionnaires,  
on the other hand, were coming from a large number of relatively small  
occupational groups. Sunter postulated evaluators might be more familiar  
with the female-dominated occupations which included jobs such as clerks,  
secretaries and nurses, than with the male-dominated occupations which  
included air traffic controllers, defence research scientists, patent  
examiners, etc. He divided the databases according to the size of the
group and, using group size as a proxy for familiarity with the type of
work, compared the differences between the consultants and the committees
for the Wisner 222 and the Willis 300 data. Although this statistical
analysis did not yield statistically significant results,
Sunter believes they did demonstrate a strong association between size of  
group and the pattern of differences between committee and consultants.  
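The kind of split described can be sketched as follows, with group size
standing in for familiarity; all values are invented, and the split at the
median is an assumption of the sketch, not a detail from the evidence:

    # Sketch of splitting the comparisons by occupational-group size and
    # examining the committee-consultant differences within each half.
    import numpy as np

    rng = np.random.default_rng(3)
    group_size = rng.integers(5, 5000, size=300)              # invented
    difference = rng.normal(0, 10, size=300) - 0.001 * group_size

    median = np.median(group_size)
    small = difference[group_size <= median]
    large = difference[group_size > median]
    print(f"small groups: mean difference = {small.mean():.2f}")
    print(f"large groups: mean difference = {large.mean():.2f}")

A systematic gap between the two halves would suggest an association
between group size and the disparities, which is the association Sunter
reported without claiming formal statistical significance.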
658. Another characteristic he noted which differentiated between male  
and female questionnaires is the relative distribution of positions from
male- and female-dominated occupational groups across the range of
evaluation points. He found 75 per cent of questionnaires from female-  
dominated occupational groups fell below a certain point value while only  
25 per cent of positions from male-dominated occupational groups fell below  
the same value. He hypothesized any bias that relates to point  
distribution, such as a bias in favour of placement in the hierarchy of  
jobs, or bias in favour of or against managerial or supervisory positions,  
or a bias in favour of the skills acquired in post-secondary education,  
could look like a gender bias.  
659. Sunter then performed several comparisons to see if the  
differences between committee evaluations and consultant re-evaluations  
were associated with the relative distribution of questionnaires in the  
high or low point range. Again, in this comparison, the results did not  
demonstrate a statistically significant difference. He concluded, however,  
they did show an association between high and low points and the  
differences between committees and consultants when split along gender  
lines. Sunter referred to this bias as a "point bias or value bias", that  
is, the higher the value of the job the more likely there is to be a  
difference between the committee-assigned score and the consultant-assigned  
score.  
660. Willis responded to Sunter's evidence about value bias during his  
second appearance before the Tribunal, which followed Sunter's testimony.  
Willis said he would like to see further analysis as to whether the  
differences between the committees and the consultants might be associated  
with value bias. Willis wanted to know if 10 per cent of the high  
evaluation scores were removed from the database, whether the extent of the  
differences between the consultants and the committees would be reduced.  
On this point, Willis says in Volume 211, at p. 27491, line 19 to p. 27492,  
line 4:  
I had said I would rely on a statistician. This task was not  
given to me, but if it had been given to me and my statistician  
had said there is an appearance of bias here and it doesn't  
necessarily represent bias, I would say "Okay, let's take those  
top ones out and let's see what it looks like then." Maybe it  
will be less than 1.8 per cent and maybe it won't. Since we are  
dealing with several million dollars, my suggestion would be that  
if it doesn't change that percentage, then I would tend to adjust.  
661. As a result of Willis' comments, Sunter performed an additional  
analysis to determine whether the differences between the consultants and  
the committees could be reduced by "value effect". His analysis, which is  
termed value effect, was introduced by the Commission in response to the  
question raised by Willis. Sunter defined value effect in Volume 216, at  
p. 28049, line 23 to p. 28050, line 1:  
A. The value effect would be some systematic tendency for  
differences between consultant and committee to show up in  
association with increases in value of the job.  
662. Sunter's further statistical work explored how much of the  
difference between committee evaluations and consultant re-evaluations  
could be attributable to "value bias". By this he meant the difference  
between how the committees and the consultants treated high and low point  
questionnaires. Sunter's analysis included statistical methods for  
standardizing the data because of what he described as a distribution
problem, which meant he could not merely discard 10 or 20 per cent of the
top-end scores as suggested by Willis. On the basis of this
analysis, Sunter concluded that at least one half of the apparent gender  
differences between the committees and the consultants is immediately  
accounted for in differences in value distribution.  
663. Relying on the analysis he performed (Exhibit HR-265), Sunter  
testified, once he removed the value effect, the overall difference of 2.3  
per cent between the consultants and the committees was reduced by 1.2 per  
cent.  
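One rough way to illustrate such a decomposition, though not necessarily
the formal standardization Sunter employed, is to recompute the gender gap
within bands of the point range, so that high-value jobs are compared only
with other high-value jobs. All values below are invented:

    # Sketch of separating a "value effect" from a gender effect by
    # stratifying on the point range. An approximation of the idea only.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 400
    points = rng.uniform(100, 900, n)                 # invented job values
    # Higher-point jobs are made more likely to be "male", mimicking the
    # confounding in the data described above.
    gender = np.where(points + rng.normal(0, 150, n) > 500, "M", "F")
    # The invented disparities grow with job value, not with gender.
    disparity = 0.01 * points + rng.normal(0, 3, n)

    gap = disparity[gender == "M"].mean() - disparity[gender == "F"].mean()
    print(f"overall gap (confounded with value): {gap:.2f}")

    for lo, hi in [(100, 500), (500, 900)]:
        band = (points >= lo) & (points < hi)
        m = disparity[band & (gender == "M")].mean()
        f = disparity[band & (gender == "F")].mean()
        print(f"band {lo}-{hi}: gap = {m - f:.2f}")

Because the value effect is removed within each band, the within-band gaps
shrink relative to the overall gap, which is the sense in which part of an
apparent gender difference can be accounted for by value distribution.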
664. Shillington expressed doubt as to whether, statistically or
otherwise, one can separate two data analysis issues: the first being
whether a pattern is related to gender, and the second being whether the
pattern is related to the scores being high or low.
Shillington explains this problem in Volume 131, at p. 16045, line 23 to  
p. 16046, line 15:  
The regressions were done in a way to try to see if there was a  
relationship between the differences between the consultants and  
the committee in gender.  
It is also possible that any differences that might have existed  
between the consultant and the committee scores were not directly  
related to gender but perhaps were related to high values versus  
low values. This has been talked about here.  
The confounding is introduced because there is a strong trend in  
the data for the male questionnaires to all have high values  
relative to the female and the female questionnaires have a fair  
tendency to come from the lower end of the spectrum, which means  
you cannot separate those two data analysis questions, or it is  
difficult to separate them.  
And also in Volume 131, at p. 16048, line 16 to p. 16049, line 11:  
In this circumstance, back to the analysis of the Willis scores  
and the possible adjustment, we have a situation which -- to the  
extent that there is a pattern here, if someone came and said this  
is possibly not due to gender, maleness or femaleness, but rather  
could be due to professionalization or some questionnaires having  
much higher values than others, you would have a problem  
extracting those two separate hypotheses from the analysis because  
you have a situation in which the males predominantly had high  
values, the females predominantly had low values. So maleness is  
confounded with high and low values.  
That is reflected in the distribution. That is why it is a  
distribution question. The distribution of the Willis scores for  
the males tended to be quite a bit higher than the distribution of  
the Willis scores for the females. It is a confounding issue.  
That is why in interpreting it you are going to have to be  
cautious about that.  
And further on this point, he says in Volume 131, at p. 16051, line 12 to  
p. 16052, line 5:  
THE CHAIRPERSON: ...But just looking at these and what you can  
say about what they describe in terms of their distribution, what  
you can interpret from that is that the males tend to be high, the  
females tend to be low, but you can't, because of this confounding  
effect, you can't really interpret anything else with certainty.  
Is that ---  
THE WITNESS: That is right. You have to be very careful when  
interpreting the results because you have to keep in mind that if  
somebody came with an alternative explanation for the data and the  
explanation was that this had nothing to do with gender, that this  
was high score/low score effects, you have collected your data in  
such a way that most of the high scores are males and most of the  
low scores are females. So they are two equally valid  
explanations for the same data.  
665. While Sunter acknowledged difficulties in unconfounding data, he  
said he was able to isolate or distinguish from the disparities a portion
that could be attributed to different value distributions of the male and  
female questionnaires. Sunter maintained he did not find it difficult to  
make a differentiation between gender and value and he could unconfound the  
data to this extent. Under cross-examination by Respondent Counsel, he was  
not prepared to agree that gender is a proxy for value or that value is a  
proxy for gender. He did agree, however, there are many factors correlated  
with gender, and if the difference between committees and consultants stems  
from some other causal factor, which itself is associated with gender, then  
he could never determine how much of the difference would be attributable  
to gender bias. (Volume 217, p. 28247).  
666. Sunter believes the question of association of the differences in  
scores with other characteristics in the data becomes important if there is  
going to be some adjustment in the committee results to eliminate gender  
bias. In this context, Sunter believes it is important to demonstrate the  
magnitude of gender bias, how it operates and how it can be adjusted out of  
the actual data. Sunter believes the association of the differences in  
scores with value bias becomes vital at this stage.  
667. Sunter concludes the whole question of association with other  
characteristics is intimately connected with the process of adjustment.  
Accordingly, Sunter found it difficult to separate the question of how to  
analyze the data from the question of what you wish to do with the results.  
668. Sunter was aware of the Treasury Board's methodology paper in  
which the Treasury Board used and adjusted the Wisner 222 data when  
calculating the equalization payments of January 1990. Sunter refers to  
this adjustment as an "across-the-board" adjustment. He describes what he  
means by an across-the-board adjustment of evaluation scores in Volume 103,  
at p. 12426, lines 16 - 22:  
What I do, if I am about to make an across-the-board adjustment,  
let us say, of values assigned to questionnaires from male-  
dominated occupations, would be to say, "Let us increase all of  
these, all of them, by four per cent without exception." That is  
what I mean by an across-the-board adjustment.  
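In computational terms, the adjustment he describes is a single uniform
scaling. A minimal sketch, with invented scores and the four per cent
figure taken from his illustration:

    # Sketch of an "across-the-board" adjustment: every score in the
    # affected group is moved by the same percentage, without exception.
    male_scores = [250.0, 310.0, 405.0, 520.0]   # invented

    adjusted = [s * 1.04 for s in male_scores]   # increase all by 4 per cent
    print(adjusted)

Sunter's point, developed in the next paragraph, is that such a uniform
correction presupposes a uniform bias; absent a consistent pattern, the
uniform adjustment has no justification.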
669. In Sunter's view an across-the-board adjustment requires some  
consistency in the pattern of gender bias, and an across-the-board  
adjustment can only be made on the basis of an across-the-board bias. He  
explains this in Volume 103, at p. 12427, lines 8 - 10:  
...these are two sides of the same coin. If I cannot find the  
one, it seems to me that I cannot be justified in doing the other.  
670. According to Sunter, the Employer performed a regression  
analysis, another form of statistical measure, on the Wisner 222 data as  
described in their methodology paper (Exhibit HR-185). The regression  
analysis conducted by the Employer assessed differences in treatment  
between committees and the Wisner 222. The regression analysis was the  
basis upon which the Employer calculated the unilateral adjustments to the  
scores in January, 1990. A critique of the Treasury Board's approach,  
given by Sunter, included an analysis of "overlapping confidence regions"  
of regression lines that represented scores for male- and female-dominated  
jobs.  
671. It was Sunter's opinion the Treasury Board's regressions should  
not have been used to adjust the scores from the female-dominated  
occupational groups at all. With respect to the male data, the regression  
lines comparing the Wisner 222 re-evaluations and the committee scores were
significantly different over the second half of the point range of scores.
Sunter found that the overlap of the male and female confidence regions up
to the 250 Willis point mark meant there was no strong evidence that the
consultants and the committees differed significantly or consistently below
250 Willis points.
672. Sunter concluded from his analysis of the regression lines there  
appeared to be no difference between the consultants and the committees for  
at least three-quarters of the female questionnaires. Accordingly, he  
found no justification in the Treasury Board regression lines for making  
relative adjustments to all of the male and female questionnaires.  
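The kind of comparison involved can be sketched with invented data using
ordinary least squares and prediction confidence intervals. Nothing below
reproduces Sunter's actual computations; the 250-point figure merely
echoes his discussion:

    # Sketch of comparing regression lines with confidence bands, in the
    # spirit of the "overlapping confidence regions" critique. Invented data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)

    def fit_line(n, slope, intercept):
        x = rng.uniform(100, 800, n)                      # consultant scores
        y = intercept + slope * x + rng.normal(0, 20, n)  # committee scores
        return sm.OLS(y, sm.add_constant(x)).fit()

    male_fit = fit_line(200, slope=1.05, intercept=-5.0)
    female_fit = fit_line(300, slope=1.00, intercept=2.0)

    for label, res in (("male", male_fit), ("female", female_fit)):
        pred = res.get_prediction([[1.0, 250.0]])   # intercept + 250 points
        lo, hi = pred.conf_int()[0]
        print(f"{label} line at 250 points: 95% band ({lo:.1f}, {hi:.1f})")

Where the two bands overlap in the lower point range, the regressions give
no strong evidence of a difference there and hence, on Sunter's reasoning,
no basis for adjusting scores in that region.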
673. Shillington, under cross-examination by Respondent Counsel,  
indicated he did not have any problems with the way Sunter conducted his  
analysis of the Treasury Board's adjustment methodology. He was of the  
opinion Sunter had drawn a reasonable conclusion from his analysis.  
(Volume 136, pp. 16741-42).  
674. The Tribunal did not hear any expert evidence concerning Treasury  
Board's methodology of adjusting scores, other than what was
provided by Sunter and Shillington about their understanding of the  
methodology contained in Exhibit HR-185.  
675. Sunter testified regression analysis is an unsuitable statistical
tool for identifying differences in evaluation scores between Wisner and
the committees. The regression equations, in his estimation, do not
provide support for the Treasury Board's adjustment of female questionnaire
scores downward, averaging 3 per cent overall, and male questionnaire
scores upward, averaging 4 per cent overall. In Sunter's
opinion, which is supported by Exhibit HR-213, the regressions predict for  
the first three-quarters of the female questionnaires, either an increase  
in the female questionnaire scores or no change at all.  
676. As to the three areas Sunter was asked to review at the request of
the Commission, his conclusions on the gender bias analysis in the first
two areas are as follows: (i) there was nowhere near the level of
consistency in the incidence of differences along gender-differentiated
lines which would enable him to conclude there was gender bias (Sunter
testified this is not to say there is no gender bias, only that one cannot
conclude there is); and (ii) a review of the Treasury Board methodology,
based on that finding, leads him to conclude there is no basis on which the
Treasury Board could have justified any adjustment of the committee scores.
The third aspect, which deals with an analysis of the differences in
compensation from male- and female-dominated occupational groups, is not in
issue at this stage of our decision.
F. ROLE OF CONSULTANTS IN RE-EVALUATIONS  
677. Both statistical experts testified under cross-examination  
consultant scores can be used as a reference point to compare committee and  
consultant scores on the assumption the consultant scores are free of  
"gender-related bias". This is a term introduced by Respondent Counsel to  
describe a bias related not to gender itself but to some other
characteristic which is itself related to gender.
678. Both statistical experts expressed the opinion they preferred  
committee scores over consultant scores. Shillington, in particular, found  
it difficult to accept that any individual could be free of gender related  
bias and he says the following in Volume 139, at p. 17084, line 4 to p.  
17085, line 2:  
A. I think that is more in the line of a decision that could be  
made. You have indicated that the issue of gender-related bias is  
the area of concern and not being as concerned as to whether or  
not it was directly related to gender or not. So I think deciding  
not to be concerned with the reason that the gender-related bias,  
if there is evidence of that, is present -- that is a decision.  
If that sentence is to be interpreted to mean "if you decide that  
you don't care for the reason, then you don't need to look for  
it", you are right. But I certainly never -- several times in  
testimony you asked me to assume that Mr. Wisner was without  
gender-related bias and I more than once said "How can that be.  
How can someone be so free of thoughts about high score/low score,  
dirty work/clean work. How could this person be equally familiar  
with all jobs", but you asked me to assume that.  
So, I am not sure that the sentence the way it is presented there  
is a fair or complete summary of my opinion about this, and I  
certainly can't speak for Mr. Sunter.  
679. The position of the Employer essentially is the consultants' re-  
evaluations are only used in the statistical analysis as a point of  
reference for determining whether there is a pattern of different treatment  
of male and female questionnaires by the committees. Willis testified the  
consultant scores are not to be substituted for committee scores;
therefore, the Employer submits using the consultants' re-evaluations as a
reference point does not mean the consultant re-evaluations are to be
preferred to the committees', because there is no substitution of scores.
However, the
Employer contends, for purposes of using consultant re-evaluation to  
determine a pattern of different treatment, the Tribunal may prefer the  
consultants' relative treatment of male and female questionnaires without  
preferring their scores on any one questionnaire. (Respondent's written  
submissions - paras. 319 and 320).  
680. Shillington expressed the opinion that, in using the consultant
scores as a reference point, an assumption had to be made that the
consultant scores were to be preferred to the committees'. He gives the
following response
in Volume 136, at p. 16692, line 16 to p. 16693, line 15:  
Q. When we are using the consultants as a reference point only,  
we are not saying that we prefer the consultant's score on any one  
questionnaire over the committee score. We are only making the  
assumption that the consultant scores across the board are free  
from gender-related bias.  
A. But that you are not preferring them?  
Q. But that we are not preferring them. So, we won't take the  
score on any one questionnaire and say the consultant scores are  
better. That's not a necessary assumption.  
A. But I still think you have to end up assuming they are  
better and the example again is when I used -- suppose that the  
consultant didn't look at the questionnaires at all and the  
consultants just wrote down daytime temperatures, blood pressure,  
whatever. Right? They would certainly not be preferred and they  
certainly would not exhibit a gender preference if they just  
ignored the questionnaires totally. So, I think you do have to  
assume that the consultant scores are to be preferred.  
681. Sunter testified the committees should be preferred to the
consultants for four reasons. His first reason is based on his own
experience in the field of statistics, which led him to conclude committees
often apply a system better than the consultant who developed it. His
remaining three reasons for supporting committee evaluations over
consultant evaluations are based on his analysis of the data. One of his
analyses tested for consistency between Wisner and the Gang of Four.
682. Sunter tested for consistency between Wisner and the Gang of Four  
by performing statistical tests such as t-tests and chi-square analysis.
The results he obtained confirmed, in his mind, that Wisner and the Gang of
Four differed among themselves. Sunter's conclusion was that if the
consultants cannot agree among themselves, it cannot be the case that the  
consultant is always right. His analysis led him to conclude the  
consultants were not consistent among themselves and on this basis the  
committees should be preferred.  
683. When Sunter was called as a reply witness by the Commission in  
November, 1994, he testified he had undertaken a further analysis on the  
question of the relative reliability of the committees and the consultants.  
Sunter also used standard statistical measures in the form of regression  
analysis to support the use of the committees as a point of reference in
any analysis of gender bias. Sunter formed regression line comparisons  
using two sets of the data, the MEC scores and all the scores on which the  
committees and the consultants agreed, which led him to conclude any notion  
of committee bias for male job evaluations could not be sustained.  
684. With the exception of Sunter's further analysis given in reply  
evidence, the remainder of Sunter's analyses were commented on by  
Shillington. Shillington concurred with Sunter's statistical conclusions,  
with the exception of one analysis, namely, Sunter's variance co-variance  
analysis. Shillington had an opportunity to meet with Sunter to discuss  
this analysis. Having had that opportunity, Shillington continued to  
maintain he had problems with drawing the conclusion from the variance co-  
variance analysis that the consultant is to be preferred to the committee.  
Dr. Shillington offers the following explanation in Volume 133, at p.  
16306, lines 8 - 22:  
So, I would have a difficult time believing that the data can help  
you unravel that that the data can actually help you decide that  
one rater is preferable to the other, unless you had a third set  
of numbers which you believe to be the correct values.  
So, I look at the models and I say the models look reasonable and,  
yes, it's clear that the correlation matrix in one case is closer  
to the observed data than the correlation matrix in the other  
case, but even after discussing this, I have to step back and say:  
This may be true, but how can the data help you unravel which  
rater is better if you have no third set of numbers, which is the  
correct values?  
685. Shillington went on to say his opinion on this aspect of Sunter's
testimony did not detract from his approval of Sunter's analysis on the
issue of gender bias. He responds as follows in Volume 133, at p. 16306,
line 23 to p. 16307, line 23:
Q. Having had that opportunity to discuss this matter with Mr.  
Sunter and standing by your opinion, how does this opinion affect  
your opinion with regards to his approaches taken that we have  
seen summarized in HR-184 that deal with the gender bias issue?  
A. This was one piece of Mr. Sunter's evidence, this was one  
piece of his argument for not preferring the consultants to the  
committees and there are other pieces to that argument. I don't  
have problems with the other parts that I have seen and I have  
indicated to -- I have given evidence on the other part, so I  
don't have problems with those pieces of evidence.  
In general, despite the fact that I disagree with this part of his  
testimony, I don't have problems with the way he has handled the  
committees versus the consultants, even though I disagree with  
this particular step in his argument.  
Q. I wasn't just referring to the committees versus the  
consultants, I was referring to the whole gender bias picture, all  
the other testing in HR-156.  
A. I am restating that those analyses don't cause me a problem,  
no.  
686. Shillington was asked to comment on Sunter's inclination to
prefer committees to consultants, not for any statistical reason, but
rather from Sunter's perspective that a decision of a group of individuals
was preferable to a decision by an individual who may have had more
advanced technical training. Shillington shared Sunter's opinion, and
indicated he too preferred the consensus of seven people chosen in a
balanced way to the judgment of one well-trained technical expert, at least
on an issue like pay equity.
687. Both statisticians, Sunter and Shillington, agreed and informed  
the Tribunal that if we plan to use Sunter's t-test results to make  
adjustments to the evaluation scores of the committees, then the consultant  
scores are no longer simply a reference point but are, in effect, being  
preferred to the committee scores. In this context, the statisticians are  
of the opinion the consultant scores must be deemed to be free of gender  
bias and gender-related bias, before any adjustments are made to the  
committee scores.  
688. Shillington testified the basis for his opinion was not  
statistical, but based on scientific reasoning and logic. His response is  
found in Volume 136, at p. 16706, line 14 to p. 16707, line 6:  
A. I will leave it to you people to debate whether or not it's  
statistical. The question is if you are asking that you could use  
as a reference point for assessing gender-related preference  
someone who was consistent and unbiased, that wouldn't imply that  
you are preferring those scores. I think that's the nub of the  
question here.  
Q. That's right.  
A. I am having problems with that because I just don't see the  
logic to it. I'm saying it's scientific reasoning. To me it's  
logic.  
Q. Can I put it to you this way: Your concern is that you  
can't see how someone can apply a plan consistently without  
gender-related bias and yet not be preferred. Is that it?  
A. Yes.  
689. Willis defended the impartiality and objectivity of his
consultants, and testified the consultants' re-evaluations can be used as a
point of reference for determining a pattern of different treatment of
male and female questionnaires by committees. He based his opinion on his
belief that his consultants had always followed a philosophy of not favouring
one side or the other, that they had more experience in performing job
evaluations and could evaluate consistently and without bias, and, finally,
that they had more experience interpreting difficult questionnaires.
690. Fred Owen, a pay equity expert and a former consultant of Willis
& Associates who participated in the JUMI Study, testified he believed it
very important, in determining the reliability of evaluations in the JUMI
Study, that the consultants provide a frame of reference for determining
the accuracy of evaluations. It was his opinion the consultant
evaluations could be used as a standard for comparison for several reasons.
His first reason is the consultants had extensive knowledge of and
experience with not only the evaluation plan, but also a broad exposure to
evaluations of a wide variety of jobs. His second reason is the
consultants had access to the entire array of jobs being evaluated,
whereas individual committees only had access to a smaller group. His
third reason is the consultants had no knowledge of the Employer's
classification system or pay ranges for any of the classes of jobs and did
not have any preconceived ideas about the pay system. He testified the
consultants themselves did frequent, almost daily, quality checks, not only
to determine how consistently the MEC discipline was being applied, but
also to determine whether their own evaluations were correct.
691. In Owen's written opinion (Exhibit R-167), confirmed by his oral
evidence, he outlined criteria for adopting the committee evaluations. He
suggested that if the committees exhibited a good grasp of the evaluation
plan, as demonstrated by the reasonableness of their evaluations, and if
there was no observable attempt on the part of any committee members to
manipulate the evaluation outcomes or to give prejudicial favour to any
occupations or incumbents, there would be no need to assess committee
evaluations against the consultant re-evaluations. In Owen's opinion, the
evaluations fell short of these criteria owing to the lack of complete job
information, as well as the observable behaviour on the part of some
committee members who manipulated the evaluations so as to over-score
female-dominated jobs and downgrade or under-score traditional
male-dominated jobs.
692. There is ample evidence the JUMI Committee, during the operation
of the study, was prepared to use the consultants as a standard. The JUMI
Committee had agreed to use the consultant scores as the baseline for
comparison during the ICR testing: the consultants evaluated the test
questionnaires that were provided to the committees, and the consultant
scores functioned as the baseline against which committee ratings were
measured.
693. Throughout the study, the consultants were used by Willis as a  
standard to validate the committees' work. In a letter to Willis dated  
January 6, 1989, the JUMI Committee co-chairs requested Willis to provide  
baseline scores for the test questionnaires in the ICR and the letter reads  
in part:  
...Your failure to provide baseline scores has delayed the work of  
the Inter-Committee Reliability (ICR) Sub-Committee as this  
information is necessary to analyze the consistency of ratings of  
committees with respect to a standard.  
(Exhibit HR-82)  
694. There were other occasions, during the JUMI Study, when both the
management side and the union side jointly and separately requested the
Willis consultants to review committee evaluations. Although this did not
occur in the same framework as the ICR testing, in which the consultant
scores were used as a baseline for comparison with committee scores, the
consultants' opinion was nevertheless sought as a check on the quality of
the committee scores. Consultant reviews with respect to the MEC benchmark
evaluations have been previously described in this decision. There remains
the agreement by the JUMI Committee to have Willis engage his consultant
Wisner to do the 222 re-evaluations of the evaluation committees' work.
There are also the less formal reviews done by the consultants during the
operation of the five and nine evaluation committees to test for
consistency.
695. The following excerpts are further examples of Willis using
consultant evaluations of committee questionnaires to validate the results,
in Volume 60, at p. 7435, lines 3 to 23:
Q. While the Master Evaluation Committee was performing their  
independent evaluations, were you also reviewing the  
questionnaires that they were looking at?  
A. Yes.  
Q. For what purpose?  
A. Part of the job is to, in effect, validate the consistency  
of their evaluations. My role, for the most part, would be to  
review the questionnaires along with the committee, to listen to  
their discussions and to do my own personal evaluation of the job  
based on the information that was brought forth. Then I would  
track that.  
While I did not give the committee my evaluation, I would track  
the consensus against my evaluation as a means of controlling and  
assuring myself that they were in fact being consistent in their  
interpretation of the information in the questionnaires and in the  
evaluation system itself.  
Also in Volume 67, at p. 8429, lines 2 - 10:  
A. I responded to a number of concerns expressed and re-  
expressed by the Treasury Board from the summer of 1989 -- the  
summer of 1988 on. I had felt that we had put to rest the issue  
of whether or not the Master evaluation committee was evaluating  
fairly and equitably. I, in effect, validated the results. I  
said they were creditable and credible and yet the problems kept  
surfacing.  
696. The JUMI Committee's reaction, during the study, to Willis'  
request to conduct the Wisner 222 did not, at that time, call into question  
Wisner's impartiality. It is reasonable to conclude the parties  
themselves, at that time, assumed the consultants were bias free in  
performing their role in the process.  
697. The parties understood from Willis there was no correct score for
any one questionnaire. As the process continued, the only measure
contemplated by the parties and the consultant in the event of possible
gender bias was to implement steps for improving the process. These steps
or safeguards have been previously described, and included having Willis
counsel evaluators and provide additional training for individual
evaluators or committees.
698. Willis testified the use of consultant re-evaluations after the
process is concluded is quite different from their use while the process is
ongoing. Willis testified that after the process, the re-evaluations are
used to identify whether or not there is a gender-based pattern of
difference. At the end of the study, Willis does not think it is
particularly important to know the reasons for the disparities between
consultants and committees, because in his opinion it is only the existence
of a pattern that matters.
699. Willis' firm belief is that he and his consultants are without any
kind of pattern in their evaluations. Willis states in Volume 210, at p.
27323, lines 9 to 12:
A. It's my considered judgment that the experienced consultants  
with Willis & Associates tend to be bias-free or as nearly as it's  
humanly possible to be.  
700. He went on to explain by "bias-free" he meant there was no  
differentiation on a gender basis between males and females. He was  
questioned as to whether he believed his consultants were without gender-  
related differences, such as hierarchical treatment where a consultant  
would be more liberal at the high end of a point scale or more conservative  
at the low end of a point scale. He responded as follows in Volume 210 at  
p. 27323, line 23 to p. 27325, line 22:  
A. That's an interesting point.  
Q. That one is a little harder to say, is it?  
A. Well, there is some evidence in a number of studies that we  
have done that it's difficult to get a good handle on a job that's  
two or three levels above your own. Alan Sunter made an  
observation that what might be viewed as gender bias might be  
something else.  
Q. Yes, that's good. I'm going to talk to you lots about that  
point, so we don't have to -- bring me back to it later if I  
haven't dealt with it in detail. You say there are some studies  
to suggest it's difficult to get a handle on jobs three or four  
levels above your own, but your consultants were normally people  
who had very high-level jobs before they joined you, weren't they?  
A. And they are consultants who have had some experience in  
evaluating higher level jobs. One of the problems in addition to  
it being difficult for a committee member to evaluate a job  
several levels above their own -- that is, having to have a good  
understanding of principles and theory and how is this important  
and what does strategic planning mean and things like this, things  
that are somewhat foreign to them -- and at the same time we find  
that the more complex jobs are more difficult to describe.  
So, it's not unusual for -- I think it was Alan Sunter that  
suggested that perhaps the consultants had evaluated the higher  
level positions more liberally than the committees had.  
Q. And that would be consistent. I gather what you are saying  
is that would be consistent with experience you have had in  
watching consultants and committees evaluate jobs?  
A. I would say that would not necessarily be unusual.  
Q. That's one take on it, that the consultants may be in a  
better position to appreciate those jobs. I would suggest the  
other factors at play with higher level jobs, I think I recall you  
telling us at one point that people tend to evaluate their own  
jobs more highly than they tend to evaluate other jobs that  
perhaps they are not as familiar with. Right?  
A. I think maybe we are all a little bit biased in that  
direction.  
701. Willis' rationale for not examining the reasons for the
disparities between committees and consultants is that he believes it would
be very difficult to pick out individual evaluations in order to explain
the difference.
702. However, there were occasions during the study when Willis
examined the consultants' (Drury and Wisner) evaluations to achieve an
understanding of the differences between the consultants and the MEC.
Willis did this sort of analysis with 46 MEC benchmarks that showed
differences between Wisner and the MEC of more than 10 per cent. In this
analysis he was assessing whether or not there was any pattern, or apparent
pattern, of gender bias. He did this by reviewing the differences and the
reasons for the differences as identified by the committees' rationales.
703. Willis agreed in cross-examination the difficulty when comparing  
differences between the consultants and the committees lies in determining  
how much of the difference is attributable to a particular factor, because  
there is no guarantee it is just one factor which accounts for the  
disparity. (Volume 210, p. 27350).  
704. As to the differences in the way the committee and the consultant  
treated higher level jobs, Willis testified he was willing to accept the  
fact the consultants were probably more liberal in evaluating the higher  
level positions. Based on his own experience, the consultants probably had  
a better understanding of the higher level jobs than the committees would.  
Willis gives a further opinion on this point in Volume 210, at p. 27355,  
line 18 to p. 27356, line 23:  
Q. You have told us why they might have a better understanding  
of them, but you will also agree with me that it is possible that  
in those situations where you have fewer benchmarks -- right?  
A. Yes.  
Q. And you have to exercise more judgment. Right?  
A. Yes.  
Q. -- that the consultant's view of those jobs might be  
influenced by their experience with high level jobs outside of the  
federal public service.  
A. And other studies which they have done. Yes, that's  
possible.  
Q. So you can see that there are things that might make them in  
a better position to have a "preferable" view of those high level  
jobs. Right?  
A. I don't think there is any question about that.  
Q. You have just told us that one thing could be that they  
could be influenced by things outside, by their baggage from  
outside studies.  
A. I would say that when we are talking about high level,  
complex positions, the consultants should have a better grasp on  
the content of the job than any one of the evaluation committees  
that may not have had that kind of experience on their teams.  
G. WHETHER THE RESULTS SHOULD BE ADJUSTED - THE EXPERTS  
705. Willis testified the Tribunal has three alternatives in dealing  
with the reliability of the results: (i) to implement the study as it is;  
(ii) to adjust the results; or (iii) to trash the study.  
706. As to option (i), Willis said without statistical analysis and  
the advice of a statistician he could not accept the results. In Volume  
78, at p. 9576, line 19 to p. 9577, line 8, he said the following:  
It is true that I was not happy with the various steps that were  
undertaken and to some extent we were able to do some shoring-up.  
However, without any analysis at all, without any opportunity to  
do some statistical analysis or to have it done and have some  
advice of a statistician, I don't think I could have accepted the  
results.  
Once the study is complete, then it is possible to look at the  
results without regard to the other issues and make a separate  
determination: Do we have a consistent result or do we have a  
certain amount of bias and how much bias? In a sense, you do  
change into a different gear after the study is over.  
707. With regard to the third option, Willis stated the following in  
Volume 78, at p. 9574, line 15 to p. 9575, line 7:  
THE CHAIRPERSON: Could you tell us when the third option would be  
utilized?  
THE WITNESS: I would want to sit down and talk to Milczarek  
and review all of the details with him. But it is possible, I  
assume, that the results would be so far out of line that they  
just would not be believable. At that point, they should be  
trashed.  
If we had stopped after the 222 evaluations, nothing had happened  
after that, and I were asked by the decision-makers what to do  
with it, given no opportunity to analyze the results, at that  
point I would say there is nothing we can do with it. We can't  
use what we have so far for any valid results. The 220 was too  
small a test by itself to make any judgments. So, if we aren't  
going to be able to do anything more, then we have to forget the  
study.  
708. In his last appearance before the Tribunal in June of 1994,
Willis testified, as he had done previously, that he would rule out trashing
the study. Willis suggested the study was about fairness in the treatment
of employees, and the difference between the consultants and the
committees, as revealed by Sunter's analysis, was so small in terms of a
single employee's salary that "by the time you take out the income tax,
that is not enough to pay for coffee." (Volume 211, p. 27489). On the
other hand, Willis remarked "We are dealing with millions of dollars, so
maybe there is more to it than just fairness to the employee." (Volume
211, p. 27489).
709. After having met with Sunter, Willis was interested in knowing
how much of the difference between the committee and the consultant was
really a value bias. It was Willis' opinion that if the value bias reduced
the extent of the difference between the consultants and the committees to
the point where it was immaterial, no adjustment to the committee
evaluations was necessary. Willis suggested that if the difference between
consultant re-evaluations and committee evaluations did not decrease after
further analysis, then in view of the amount of money involved, "he would
tend to adjust." (Volume 211, p. 27492).
710. Although no witness testified on behalf of the Employer  
concerning the Treasury Board's methodology paper (Exhibit HR-185), the  
evidence demonstrates that the Treasury Board made an adjustment to the  
evaluation scores by taking the Wisner 222 re-evaluations as a baseline.  
The adjustment preceded the Employer's equalization payments of January,  
1990. The Employer adjusted all scores, other than the benchmark scores,  
for which there was a consultant re-evaluation. The questionnaires were  
adjusted according to two regression equations contained in Exhibit HR-185,  
at p. 11, footnote 7. Shillington was asked his opinion on the regression  
equations contained in Footnote 7 and responded in Volume 134, at p. 16401,  
lines 13 - 25:  
THE WITNESS: I would not adjust. I can tell you that when I  
first saw those equations and knew much less about the background  
to the data, I formed the opinion that I have expressed several  
times, that the onus is on the person -- before adjusting, I think  
there's an onus on the investigator to show the adjustment is  
warranted and the evidence here is that the adjustment does not  
warrant it and yet it was done. I formed that opinion as a  
statistician before I knew much more about the background of the  
study and nothing that I have heard in the background has changed  
that view.  
711. Sunter expressed the same opinion as Shillington about the  
Treasury Board adjustments and said the following in Volume 106, at p.  
12745, line 21 to p. 12747, line 10:  
The point about this is that the regression equations given in the  
Treasury Board document do not even approach the level of  
certainty that I would consider necessary to make any adjustments  
at all to the male and female evaluations.  
THE CHAIRPERSON: Could you explain that a little more.  
THE WITNESS: Because they are not significantly different.  
If I wanted to make an across-the-board adjustment on the basis of  
gender, I would have to be virtually certain of a number of  
things.  
One, I would have to be certain that the consultant is to be  
preferred to the committee, and I am by no means certain of that.  
As I tried to show yesterday, there are good reasons to doubt  
that.  
Second, I would have to be sure that the reason for the difference  
is gender, not something which is merely related to gender in some  
fashion.  
Finally, I would have to be sure of the numbers that I am using if  
I wanted to make an adjustment.  
We have seen that the order of magnitude of difference between the  
consultants and the committee, depending on which particular  
equation you use and which particular set of observations you use,  
is of the order of about 2 to 2.5 per cent. Nevertheless, we have  
a methodology here that arrives at an adjustment of 7 per cent.  
How can that be?  
The answer is that this regression analysis is a very poor, crude  
instrument for estimating the difference. Even if I were to  
believe all the other things, it remains a very poor instrument  
for making that adjustment because of the inherent uncertainty of  
the regression analyses themselves.  
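Sunter's point that regression is a crude instrument for an
across-the-board adjustment can be illustrated with another minimal Python
sketch (the numbers are hypothetical, not the figures in Exhibit HR-185).
Because the fitted slope need not equal one, the correction the equation
prescribes at the top of the score range can be considerably larger than
the average difference between the raters:

    import numpy as np

    # Hypothetical committee and consultant scores; NOT the record data.
    committee = np.array([300.0, 380.0, 450.0, 520.0, 600.0, 680.0, 760.0])
    consultant = np.array([298.0, 385.0, 455.0, 535.0, 622.0, 710.0, 800.0])

    # Fit consultant = a + b * committee.  The adjustment the equation
    # implies at a score x is (a + b*x) - x; with a slope above 1 it
    # grows as x grows.
    b, a = np.polyfit(committee, consultant, 1)

    avg_diff = (consultant - committee).mean() / committee.mean() * 100
    adj_high = (a + b * 760.0 - 760.0) / 760.0 * 100

    print(f"average difference between raters: {avg_diff:.1f}%")
    print(f"implied adjustment at score 760: {adj_high:.1f}%")

In this contrived example the average difference between the raters is
about 3 per cent, yet the fitted equation prescribes an adjustment of
roughly 5 per cent at the top of the scale, and the residual scatter about
the line makes even that figure uncertain.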
712. Durber, on behalf of the Commission, supported the results of the  
study without adjustment. His conclusion on the issue of reliability is  
contained in Volume 154, at p. 19167, lines 4 - 24:  
A. My conclusion is that the parties were enormously successful  
in producing a body of excellent job information. They went to  
enormous cost and effort to produce evaluation results. They  
tested those evaluation results, we have seen, exhaustively, at  
least they were exhaustive and I am not sure of the results on us.  
I am quite confident that the studies I have looked at fall short  
of the quality of work that we see in this particular study. I  
think the parties deserve a great deal of credit for what they  
have produced and certainly I had the confidence in those results  
to suggest that the Commissioners rely upon them in examining  
evidence of a wage gap.  
I do not believe that there is what I would characterize as  
evidence of bias. My bottom line is that the results should be  
taken as they are and that any calculation of wage disparities  
ought to be based with a great deal of confidence on the job  
evaluation results.  
713. Sunter has consistently maintained throughout his testimony that
no adjustment of the committee evaluations should be made. However, in
response to questions raised by Willis and at the request of the
Commission, he did suggest possible adjustment procedures for the
evaluation results. We will elaborate more fully on these procedures in
the event we conclude adjustment of the evaluations is necessary.
VII. DECISION AND ANALYSIS  
714. Throughout the JUMI Study, the Employer and the Alliance relied  
on the expert testimony of Willis to advance their positions. However,  
during the hearing and in both written and oral argument, there was  
considerable debate between the Treasury Board and the other parties  
concerning this consultant's role in the re-evaluation of questionnaires  
and whether the consultant could be relied upon to produce gender bias free  
evaluations.  
715. The Tribunal finds the position of the Commission and the
Alliance particularly puzzling. Willis' impartiality was not an issue
prior to this hearing. In its submission, however, the Alliance cited a
number of reasons why the committee evaluations should be preferred to the
consultant evaluations. It pointed, for example, to consultant "baggage"
and other factors such as age, sex, education, and lack of
gender-sensitization training which, it alleged, would contribute to
consultant gender bias.
716. In our view, the Alliance was attempting to discredit the witness
upon whose expert opinion it relied in terms of the data-gathering and
job-evaluation process which occurred during the study. By way of further
illustration, we refer to the following exchange between Counsel for the
Alliance and the Tribunal which appears in Volume 224, at p. 29495, line 17
to p. 29500, line 19:
THE CHAIRPERSON: Before you go on, Mr. Raven, I think I would  
like to respond to the word "antagonism" that you perceive from  
the Tribunal. I think it's a fairly -- it's a word that carries  
some connotation. I think that what the Tribunal has tried to do  
is understand your argument.  
These parties have put forward or engaged these consultants to  
assist them in conducting a study over a period of five years.  
When we are faced with an argument that these consultants could be  
gender biased -- I think that what the Tribunal has tried to do is  
understand and to challenge you on these types of arguments that  
you are putting forward. I don't think that our conduct in doing  
that -- I don't think it's fair to say that we're antagonizing or  
we're being antagonized, or whatever. I think that's our role and  
we will continue to do that role to try to understand and  
appreciate what it is you are trying to put forward to us.  
MR. RAVEN: I appreciate that. I was really attempting more to  
provide some added definition to my submission for that purpose.  
MEMBER FETTERLY: Before you start, I would like to make a couple  
of comments about this issue.  
To begin with, if this were a civil trial and Mr. Willis was your  
witness -- technically he's not; he's the Commission's witness --  
would you be permitted to discredit him after having introduced  
him as a witness?  
MR. RAVEN: Mr. Fetterly, let me respond to the question in this  
way. I am not attempting to discredit Mr. Willis. That may be  
where we ---  
MEMBER FETTERLY: You certainly give that impression, Mr. Raven.  
Let me just add this: Mr. Willis and his fellow consultants were  
the only experts who were actually involved in JUMI. You rely and  
he has defended the MEC results not only before this Tribunal, but  
also before JUMI. He has defended the ICR results. And he has  
done that both before this Tribunal and before JUMI. He has  
defended the total results before this Tribunal. Basically, he  
has said that they should not be trashed. And it's his plan that  
was adopted as being a gender-neutral plan.  
To hear you and, to some extent, Ms. MacLean, attack, in a sense,  
his neutrality really puts the Tribunal in a very awkward  
position. I find it a matter of real concern. It's not a  
question of antagonism.  
MR. RAVEN: What I had hoped to do this morning is try to clarify  
where we're going with this. Your comment, Mr. Fetterly, is very  
apt. It allows us and it affords me an opportunity to deal with  
that.  
There is, in no sense, an attack here on Mr. Willis. That  
suggests that Mr. Willis or anyone associated with his firm was  
guilty of some malfeasance or misconduct in the way they conducted  
themselves in the course of the study or in the way they --  
MEMBER FETTERLY: Not at all. Not at all. Mr. Willis and his  
associates hold themselves up to be experts in pay equity. They  
promote their plan as being gender-free. They train evaluators in  
order to evaluate on a gender-free basis. Now you are saying that  
their own ability to evaluate on a gender-free basis is suspect.  
That to me is a real contradiction.  
MR. RAVEN: The fact that the Willis Plan was accepted by the  
parties here as being gender neutral for purposes of this study is  
one thing. But if you will permit me to make this point, Mr.  
Fetterly, there's no personal attack on Mr. Willis or his  
associates. What we are trying to grapple with here is a very,  
very minute pattern difference between the consultants and the  
committees for the high end top quartile of male jobs, and we are  
now having to wrestle with the problem of whether we should adjust  
those scores to bring the committee scores in line with the  
consultants and whether there are compelling reasons to do that or  
not do that.  
The submissions that are advanced here that I am about to get into  
is to raise with the Tribunal pertinent considerations in  
determining whether or not it makes a lot of sense to adjust in  
these circumstances. It's not intended as a personal attack on  
Mr. Willis.  
MEMBER FETTERLY: That I understand. I think that's quite  
legitimate.  
As I said to you yesterday, is it necessary, in order to achieve  
that, to allege that the consultants' ability to evaluate on a  
gender-free basis is suspect? Is it necessary for you to do that  
in order to establish or to argue that the committee results are  
to be preferred over the consultant results? I don't think it is.  
MR. RAVEN: I tend to agree with you that there are a variety of  
reasons that support preferring the committees' scores, not just  
the questions that have been asked here and that are raised as to  
the manner in which the consultants themselves did these re-  
evaluations.  
For example, if I understand Ms. MacLean's submission the other  
day, it was that the consultants had a slightly different  
discipline, a more liberal discipline, than the committees did.  
Mr. Willis recognized that, and in his reports to the Joint  
Union/Management Committee, recognized that and found it quite  
suitable. In fact, he in his own words said "Given the context,  
our previous understanding and application of the Willis  
discipline in other contexts is not to be preferred to MEC's."  
I don't know that that necessarily raises the question of bias,  
conscious or unconscious, or pattern differences. It does,  
however, confirm that (1) there were differences in the discipline  
that Mr. Willis has adopted in other studies and the MEC  
discipline; (2) that the MEC discipline was more conservative; and  
(3) the committee scores were more conservative than the  
consultants in high-end male jobs. So I don't raise that  
necessarily as an allegation of bias.  
717. Statistical evidence was introduced by the Commission which, they
submit, shows inconsistency between the consultant Wisner, who conducted
the 222 re-evaluations, and the Gang of Four, who conducted the 300
re-evaluations. This evidence was also introduced in the context of whether
committee evaluations should be preferred to consultant re-evaluations. In
our view, it has a similar effect of discrediting the very expert the
Commission contracted to do a further study and upon whom they relied
during their investigation. Reference is made here to paras. 184 and 185
of Commission Counsel's written submissions:
(184) A reasonable inference that the consultants as a group were  
not evaluating without gender bias or with relatively more gender  
bias than the committees may be drawn from the fact that Esther  
Brunet, a rater in the Willis II re-evaluations who was familiar  
with the federal public service, and considered a competent  
evaluator free of gender-bias by Mr. Willis, was almost 100%  
consistent with the committee evaluations (for the French-language  
questionnaires).  
(185) If an allegation of gender bias is supported by inconsistent  
application of the evaluation plan to male and female evaluations,  
then it is important to assess the relative consistency of the  
consultant evaluations compared to the committee evaluations.  
Consistency can be measured statistically. The statistical  
evidence of consistency of raters - committees versus consultants  
- demonstrates that it is the committees who are more consistent  
in their ratings than the consultants. The existence of a greater  
degree of rater error on the part of the consultants is described  
by Mr. Sunter as conclusive evidence that the committee is to be  
preferred over the consultant. Thus, the allegation of gender  
bias in the committee results is not supported by the statistics,  
nor is an allegation that the consultant scores are more  
consistent or more reliable.  
718. We are of the view there are other valid characteristics that
can account for the differences between the Wisner 222 and the Willis 300
which should be considered quite apart from a purely statistical analysis.
Although the two studies followed the same procedures, they are very
different in other respects. The Wisner 222 was undertaken to validate a
process which brought Willis discomfort. It was a smaller study conducted
by a single consultant who had demonstrated a more liberal discipline than
the MEC. Wisner's analysis was a snapshot assessment only and was not
intended to portray the whole picture. Not only was the time frame between
the Wisner 222 and the Willis 300 different, but the sample of jobs
re-evaluated in the Wisner 222 was drawn from a smaller population than the
Willis 300. The Wisner 222 evaluations were taken from the evaluations of
the multiple evaluation committees and excluded the MEC evaluations. The
multiple evaluation committees had been operating for about three months at
the time of the Wisner 222.
719. The Willis 300 was a larger-scale study, undertaken after the
process was finished. The purpose of this study was to confirm or to
dispute the analysis contained in the Wisner 222. Four consultants
conducted the Willis 300, with two or more consultants working in tandem.
One of the consultants was an evaluation committee member. The sample of
jobs came from the entire population of jobs from the expanded evaluation
committees, excluding the Wisner 222. Not surprisingly, there was greater
agreement with the committee evaluations in this latter study.
720. The timing of the re-evaluations by the so-called "Gang of Four",
the range of the sample, the number of consultants involved, the process
followed and the circumstances then prevailing make the results, in our
opinion, more likely representative of any real difference between the
evaluation committees and the consultants.
721. The Tribunal had ample opportunity to observe Willis as he
testified, during his first appearance, which lasted 36 hearing days, and
his second appearance, which lasted 4 hearing days. We found Willis to be a
credible witness who demonstrated patience, cooperation, and most
importantly, impartiality in all respects. The Tribunal accepted Willis as
an expert in the field of pay equity. Willis' experience, prior to the
JUMI Study, was garnered entirely from his participation in U.S. studies in
comparable worth. He had gained general recognition as a pioneer in this
field and was accepted as a qualified pay equity expert in the American
court system.
722. We have reviewed the many occasions when the JUMI Committee asked
Willis and his consultants to review committee evaluations or provide a
baseline for comparison with committee evaluations. That role was well
established and endorsed before the breakdown of the study. We do not now
intend to view Willis' role differently from the role he performed for the
parties in the JUMI Study. All appropriate factors will be considered by
the Tribunal if the issue of adjusting scores should arise.
723. The difficulties experienced by the multiple evaluation  
committees were not unexpected and should be accommodated and understood in  
the context of the sheer size of the Federal Public Service, its  
geographical dispersion and the multifaceted occupations and skills of its  
diversified workforce. These complicating factors coupled with the  
logistical problems which were encountered imposed a daunting challenge for  
all concerned. The experts, Armstrong and Durber, emphasized the
difficulties inherent in the complex job evaluation process as it pertains
to pay equity.
724. Given the nature of the JUMI process, the numerous participants
with diverse backgrounds, and the working conditions within which the
multiple committees functioned, the Commission and the Alliance submit job
evaluation for purposes of pay equity will and must involve some conflict.
This conflict, they submit, arises from a clash of values between
evaluators who attempt, in a pay equity study, to question stereotypes and
the attitudes of those with a more traditional mind set. Within that
framework, the conflict which occurred is, it is claimed, understandable
and in fact unavoidable.
725. Respondent Counsel submits not all committees were working  
together in a "team effort" but instead operated in an adversarial mode.  
Willis said some committees "tended to feel themselves almost in a  
negotiation mode rather than a team of six or seven people trying to  
accomplish a common goal." Respondent Counsel submits it would be "wrong"  
for the Tribunal to accept the proposition pay equity job evaluation must  
inevitably involve conflict and adversity. Counsel submits pay equity job  
evaluation should be a cooperative problem-solving exercise in which  
evaluators work toward a common goal and evaluate based on the relevant  
facts. In the Employer's view, the process should instil confidence the  
relevant facts are being analyzed and that appropriate weight is being  
given to those facts. According to Counsel, when all these things happen,  
then the Tribunal can be confident the results are reliable.  
726. In Weiner's opinion, the application of the plan is more  
important than the plan itself in ensuring gender bias free evaluations.  
She described the characteristics of the process which will prevent or  
minimize potential gender bias. In addition to having diverse committees  
of both genders and different organizational levels, Weiner stated other  
factors, such as the training of committees, discussion as to how gender  
bias might operate, complete and up to date job evaluation information and  
the manner in which the committee conducts itself, must all be considered.  
On this point, she says in Volume 8, at p. 1092, line 13 to p. 1093, line  
3:  
Q. Now, what about the way that the committee conducts its  
affairs on a day-to-day basis?  
A. Traditionally, job evaluation committees strive to be very  
efficient. They try to evaluate as many jobs as possible in a  
day.  
A pay equity committee has to take a different approach and open  
their questioning to asking for more information if they are  
unclear about something in the job information, to have a  
discussion about gender bias, to listen to themselves say things  
like, "This is just a secretary," and realize what they are doing,  
how this dismiss women's work.  
So all of those things take time, questioning, probing.  
727. Weiner makes reference to "questioning, probing" in the context  
of committee evaluations. Although she did not comment directly on  
conflict in the committees, Weiner did insist that traditional values must  
be challenged in a pay equity job evaluation exercise.  
728. The Tribunal is not persuaded, given the issue it has to decide,
that it should be asked to define the nature and degree of what is
permissible, acceptable and legitimate discussion within the committee
framework. Moreover, it is most difficult to measure its effect,
especially when traditional values are being challenged and debated in a
pay equity context. Nor is the Tribunal prepared to suggest answers for
the resolution of conflict between committee members who may individually
entertain strong opinions one way or the other on this sensitive subject.
The study and implementation of equal pay for work of equal value in
Canada is a relatively new discipline which is still in the developmental
stage. Nonetheless, we do find it necessary, considering Willis' concern
about committee conduct and individual evaluator behaviour, to assess
whether the process achieved its purpose of producing gender bias free
evaluations.
729. With regard to the effectiveness of the safeguards in place
during the study, and more specifically the procedures defined by Willis to
be part of the Willis Process, we find the expert opinion of Willis to be
most persuasive and informative. Because of its importance in assessing
the results, we have described in some detail the procedures and the
safeguards which he recommended be adopted in that process.
730. The Tribunal believes it is incumbent upon it to comment on the
JUMI Process as orchestrated by the JUMI Committee. Suffice to say, the
JUMI Committee had a difficult working relationship from its inception.
For incomprehensible reasons, the JUMI Committee chose to deprive both
Willis and the Commission of real decision-making authority. This was
done notwithstanding the impartiality of both Willis and the Commission,
and their competence and broad experience in pay equity as compared with
the parties themselves. In both the information gathering stage and in the
evaluation stage of the JUMI Study, the JUMI Committee failed to follow
Willis' advice and frequently refused to implement his recommendations.
Some of the Willis recommendations were not implemented owing to "make or
buy" decisions, largely controlled by the Employer and motivated by
economic considerations. However, other Willis recommendations, not
complicated by these considerations, were ignored as well.
731. Willis identified the JUMI Committee as a major weakness in the  
study and, in our view, his opinion is well-founded. The adversarial tone  
set by the JUMI Committee reflected the long-lasting and deep-rooted  
difficulties between management and union sides which permeated the JUMI  
Study throughout its entire life.  
732. There is evidence the Chief of Pay Equity, an individual from the  
Treasury Board, viewed the JUMI Study in Willis' words, as a "bunch of  
bunk." (Volume 210, p. 27280). On the other hand, the Alliance wanted to  
follow a cohesive strategy as described in the correspondence from Millar,  
speaking for the Alliance, in announcing the Mont Ste. Marie meeting. This  
incident and others threatened the foundation of the JUMI Study from the  
beginning and contributed in no small measure to the resulting  
difficulties. The union/management split was evident in the manner in  
which they attempted to resolve the issues. It even manifested itself in  
the seating arrangements at the JUMI Committee meetings with union and  
management on opposite sides. The parties opposed an attempt by Willis to  
change those seating arrangements. Willis said "...they looked at me like  
I was crazy." (Volume 60, p. 7459).  
733. Willis disapproved of meetings the Alliance convened with their
members prior to and during the course of the study. There was the meeting
of Alliance members at Mont Ste. Marie before the commencement of the study
itself, where the subject of under-evaluation of female work was discussed
in the absence of the consultants and the other parties to the study.
During the course of the study, the Alliance also held evening meetings in  
which the participants discussed their logistical problems but during which  
there was also discussion relating to evaluations. Further, the Alliance  
representative on the MEC attended the evening meetings and was available  
to answer questions concerning the MEC benchmarks. At one week-end  
meeting, occurring in the fall of 1988, the Alliance held a training  
session on pay equity job evaluation, without the knowledge of Willis or  
the other parties. At that meeting, members examined and discussed certain  
of the MEC benchmarks. During this week-end meeting, gender-sensitization  
training, as interpreted by the Alliance, was given to the participants.  
The Alliance justified this unusual action on the grounds it was necessary  
to correct what it conceived to be historical injustices to women as  
"victims" in the work force.  
734. Within the framework of the study, Willis felt he lacked the  
necessary support and backing of those in authority, both from the  
government and from the union sides while the study was ongoing. Although  
the sub-committee on communications had devised a strategic plan for  
communicating the JUMI Study to employees, Willis felt there was not enough  
emphasis on the need for communication from top management. He had  
initially proposed at least 10 consultant days for face to face meetings  
with department heads and union executives. No briefing sessions of this  
type were held and Willis believed this most likely resulted in the long  
delays before the employees completed their questionnaires.  
735. Willis' evidence is that he designed the process to ensure a
sound result; if the result is sound, it is immaterial whether the process
is flawed. In examining the Willis Plan itself we find it to be an
appropriate tool to evaluate jobs for the JUMI Study. During final
argument, the Tribunal was informed there is no dispute between the parties  
concerning the Willis Plan. We refer to Respondent Counsel's written  
submission at para. 41:  
41. Nevertheless, for purposes of this litigation, the Employer  
accepts that the Willis Plan was an appropriate plan to use in  
evaluating jobs in the Federal Public Service. Therefore, the  
Tribunal need not decide whether weighting of the Willis plan is  
valid.  
736. We rely on Willis' expert opinion that the Willis Questionnaire,  
with slight modifications, was capable of capturing sufficient job  
information to ensure pay equity evaluation could be accomplished in the  
study. In his opinion the questionnaire contained sufficient information  
on which a well-trained and supervised job evaluation committee could  
provide reliable unbiased evaluations.  
737. The degree of effectiveness of the safeguards provided for in the
information gathering stage was disappointing to Willis. It was during
this stage that efforts were made to ensure the questionnaires were
properly completed. Details of these efforts are described in this
decision under the heading, The Willis Process.
738. In assessing the role of the coordinators we find, given the  
breadth of the study, it would have been extremely difficult for Willis &  
Associates themselves to act as coordinators without significant time  
delays and significant additional expense to the JUMI Committee.  
Coordinators were responsible for communicating directly to employees who  
were targeted to complete the questionnaires. Also, the coordinators  
trained incumbents as to the proper manner in which they were to complete  
their questionnaires. The consultants were involved with the JUMI  
Committee in the preparation of training materials supplied to and for the  
training of coordinators. If the number of completed questionnaires is a  
measure of the quality of the work of the coordinators, then their work can  
be viewed as most satisfactory. The percentage of return was impressive;  
nearly 100 per cent of the questionnaires were returned.  
739. Willis' greatest concern lay in the lengthy delays in returning
the questionnaires. According to Willis, delay in the return of
questionnaires impacts negatively on the quality of information, and the
longer the delay the poorer the quality. There is little evidence as to
what contributed to or caused these delays. The evidence does not show the
incumbents failed to fill out the questionnaires in a timely fashion and
within the required 10 to 14 days after receiving training. Furthermore,
there is little available information concerning when the
coordinator-incumbent training sessions were held. To an extent, the large
number of substitutions almost certainly contributed to the delays.
740. Although the effectiveness of the coordinators' role appears
weak, this did not deter Willis from continuing with the evaluations. He
was willing to have the study proceed notwithstanding somewhat weaker
information. Willis instituted other safeguards, such as the
screeners/reviewers and the evaluators themselves, to ensure completeness
of job information. We do not consider the limitations of the
coordinators' role to impinge significantly on the issue of reliability.
741. The screeners/reviewers provided a sophisticated double check or
safeguard. They were responsible for ensuring the questionnaires contained
factually complete information for evaluation by the committees.
742. The screening and reviewing function was not conducted by Willis.
Its sufficiency must be assessed from the training given, the evidence of
the witnesses who actually performed this function, the Commission's
research (conducted by the outside researcher, Exhibit HR-245), and Willis'
own observations and comments. The screeners/reviewers who testified
believed they had done their job well. Through follow-up telephone
interviews they believed they were able to obtain the required information.
Although Willis would have preferred more face to face interviews, overall
he saw no difficulty with their performance or the role they played in the
JUMI Study.
743. The screeners/reviewers received the same initial training on the
Willis Plan as was given to the MEC evaluators. They also received "on the
job training" from the consultant when needed. We find they functioned
well and with no apparent problems other than the involvement of some
committee "outliers" in this work. However, there is no evidence the
"outliers", who bore this identification because they tended to evaluate
differently than their committees, failed to perform their task fairly and
competently or that they unduly influenced others. The six "outliers" who
functioned as screeners/reviewers were few in number compared to the many
others who fulfilled this role.
744. It is understandable why Willis would have personally preferred  
"hands on" involvement in the screener/reviewer function. However, it  
seems unlikely, given the volume of questionnaires, one consultant could  
have accomplished this task during the time frame allocated. Having  
carefully reviewed the evidence as it relates to the collection of job  
information, we accept Willis' opinion and find as a fact the job  
information was of satisfactory quality when all the "shoring up" is taken  
into account.  
745. Consistency is an important feature in the process of pay equity  
job evaluation. The Willis Plan should be applied consistently especially  
when multiple evaluation committees are involved. This requirement, if met  
by the participants, does not necessarily imply the process is without  
gender bias and, on the other hand, lack of overall consistency between the  
committees does not necessarily imply that the evaluations are biased, nor  
is it crucial to the issue of reliability. In the final analysis, Willis'  
concern was whether the results were biased. However, within the context  
of this study and in assessing how well the process worked, we consider it  
prudent to comment on whether the multiple evaluation committees  
consistently applied the discipline established by the MEC.  
746. There were some committees amongst the original five evaluation  
committees, namely Committees #1 and #2 and the first version of Committee  
#4, that worked well. After the restructuring of the original five  
multiple committees into nine multiple committees, the newly created nine  
committees appeared on the whole to have functioned well. Most of the  
multiple evaluation committees did, in fact, attempt to follow the MEC  
benchmarks, adhere to the discipline created by the MEC and follow the  
same job evaluation procedure as had the MEC. There is evidence, at least  
from the early ICR testing, of consistency between committees in  
interpreting the Willis factors and applying the plan. To some degree, the  
MEC benchmarks had a steadying effect on the functioning of the multiple  
evaluation committees and on the study as a whole. This is most evident  
from Willis' response to a question by the Tribunal regarding the first  
incarnation of Committee #3 in Volume 69, at p. 8676, lines 8 - 18:  
But, as it worked out, one of the things maybe that helped to  
stabilized [sic] the evaluations was that we did have those Master  
Evaluation Committee benchmarks for them and maybe they just got  
so tired each fighting for their own side that they went along  
with the Master Evaluation Committee's benchmarks. I was not at  
all satisfied that I could leave it at that or let it rest at  
that. But I could not observe any particular problem in the  
actual evaluations that we were able to examine.  
747. The Tribunal will now refer to the training the committees  
received in order to properly perform their function as evaluators.  
Willis' approach in dealing with gender stereotypes and traditional values  
is to direct evaluators to break down a job into its component parts and to  
evaluate each part separately so as to ensure bias free evaluations.  
Willis' opinion differs from Armstrong's about whether his method of  
training should have included a more formal kind of gender sensitivity  
training which would focus on under-valuation of female work. In our view,  
the fact this training was not formalized by Willis does not increase the  
potential for gender biased evaluation. Willis preferred "on the job  
training" and this approach was used by him successfully in previous  
studies. Moreover, the JUMI Committee had authority to decide what was to  
be included in the training and what training it expected to be provided.  
Willis was criticized by the Alliance, during this hearing, for not  
providing gender sensitivity training in the form espoused by Armstrong and  
in the reference material from the Ontario Pay Equity Commission. It  
should be noted, however, that the Alliance approved of Willis' training
approach at the outset of the JUMI Study while it was a member of the JUMI  
Committee. The Alliance's criticism of Willis would seem to be motivated  
by Willis' disapproval of the Alliance undertaking this kind of training  
during one of their meetings held in the absence of the consultants and the  
other participants. In addition, Willis commented on another aspect having  
to do with the quality of such training in Volume 211, at p. 27483, line 24  
to p. 27484, line 20:  
Q. On another subject -- and this is one that you have  
discussed at some length with my friend Mr. Raven. It's the  
subject of training participants in a study to be sensitive to  
gender issues. Do you recall the subject?  
A. Yes, I do.  
Q. In deciding whether such training is beneficial, is it  
relevant to know something about the quality of the training?  
A. Certainly.  
Q. Could you comment on that, please?  
A. I would think it would be important for whoever is providing  
the training of this nature to be accepted as an impartial  
individual and to have been trained in this area.  
Q. If the training is not done well or impartially, could it  
have any effect other than off-loading baggage?  
A. It's possible that it could have the effect of creating more  
baggage.  
748. We hold the view, recognizing Willis' extensive hands-on
experience in conducting pay equity studies, that his practical approach
has merit and is acceptable. We say this notwithstanding Armstrong's  
opinion based, it would seem, entirely on research and on the available  
literature.  
749. With respect to the actual job evaluation process, there is  
anecdotal evidence the process did not work as well as it ought to have.  
Willis testified about his discomfort with the behaviour of some committee  
members, particularly with the first version of Committee #3, which he  
characterized as consisting of "two warring camps". The JUMI Committee
prevented him from taking the remedial measures he believed were
necessary concerning those evaluators on Committee #3 who were
evidencing gender bias.
750. The Tribunal had the benefit of observing and hearing witnesses  
who had participated in the evaluation committees. Their evidence can be  
characterized generally as an 'injection of reality' into the evaluation
process, which is best described as lengthy, arduous, complicated,
stressful and difficult. In general, these evaluators did not
express difficulty with the sufficiency of the information provided in the  
questionnaires. If and when further information was required by a  
committee to complete an evaluation, this was accomplished through the  
procedural safeguard established for that purpose, that is, having the  
screener/reviewer supplement, clarify or obtain new information.  
751. Willis testified about some of the strengths of the JUMI Study.  
He regarded its three main strengths as being, firstly, the large
number of individuals who participated on evaluation committees, secondly,  
the large number of diversified jobs evaluated and thirdly, the large  
number of jobs in the sample which enabled him to deal with "slightly  
greater disparity" in job information than a study with a smaller  
population. Willis believed the committees represented a "pretty" good  
balance of union and management employees with different backgrounds  
despite the difficulties the unions encountered in naming male  
representatives. There is evidence some of the female evaluators were  
members of male-dominated unions which contributed to more diversification  
within the committees.  
752. One of the problems Willis recognized was the participation of  
management individuals trained in classification. Seven evaluators  
nominated by the management side had extensive knowledge of the  
classification system in the federal government. They served on four of  
the evaluation committees and on the MEC. The problems associated with  
classification backgrounds surfaced during the evaluation process. The  
statistical evidence, however, did not indicate that the classification
background of these individuals had an impact on the multiple evaluation
committees' consensus scores. There is anecdotal evidence these  
individuals had little or no influence and tended to be ignored by the  
other participants.  
753. Another problem which arose was the participation of some  
Alliance supporters who evidenced an agenda for increasing the value of  
female-dominated jobs. There were misguided attempts by some of its
members to influence the evaluations through confrontation and intimidation.
The quantitative differences in the consultant re-evaluations point to the  
committees under-evaluating some male-dominated jobs but do not demonstrate  
these misguided individuals accomplished their objective of persuading  
others to over-evaluate female-dominated jobs. As Sunter's analysis  
reveals, significant differences between the committees and the consultants  
exist almost entirely in the treatment of male-dominated questionnaires.  
Furthermore, the IRR test results reveal the majority of both management  
and union outliers exhibited a male preference. Thus, any conscious  
attempt by Alliance members to over-evaluate female-dominated jobs was  
unsuccessful. There is also some comfort to be had in the testimony of all  
of the Alliance evaluators who gave evidence to the effect there was no  
Alliance meeting at which members were told to over-evaluate female-  
dominated jobs or to under-evaluate male-dominated jobs.  
754. Some of the evaluators were identified by both the consultants  
and the IRR test results as outliers. During the JUMI Study, efforts were  
made to assess whether the outliers were exercising influence on the  
committees' final consensus. The statistical analysis demonstrated their
influence was negligible. As well, while directly observing the  
participation of the outliers in the evaluation committees, Willis could  
not detect them exerting any influence on the other members.  
755. One of the most redeeming features of the JUMI Study was the work  
of the MEC which had the unqualified endorsement and support of Willis.  
When the MEC completed their work, Willis was satisfied they had done a  
good job. There were several reviews of MEC's work by the consultant,  
revealing some differences between the MEC and the consultant evaluations.  
Willis was not concerned with the extent of these differences, as there was  
no evidence of gender bias in the MEC evaluations. Willis said he  
anticipates differences between committees and consultants. In his view,  
the presence of those disparities does not necessarily mean the consultant  
is "always right".  
756. From Willis' perspective, there are four questions that need to  
be addressed in deciding whether or not a real problem exists. They are:  
(i) What is the extent of the disparities on total scores in a
specific evaluation;
(ii) How frequently do the disparities occur;
(iii) The rationale: why have the committees done what they have
done; and
(iv) Is there a pattern to the disparities and, if so, what is the
pattern?
757. When the study is over, Willis examines the total score to
answer two of the above four questions, namely, what is the extent of the  
disparities and how frequently do they occur. Willis' examination is done  
with the assistance of a statistician, upon whom he also relies for the  
answer to the fourth question, namely, whether there is a pattern in the  
disparities. There can be a number of reasons for the disparities referred  
to in question (iv) but, at this stage of the study, Willis is not  
interested in those reasons.  
758. In Willis' opinion, when the study is completed, the appropriate
consideration is how much the committees strayed from the consultant
evaluations. In his view, other considerations are, at this point,
immaterial. His reason for considering only the bottom line results is
that the evaluation committees are no longer functioning. An understanding
of whether or not the committees were applying the plan correctly is no
longer useful to the consultant because counselling and training are no
longer feasible.
759. Willis expressed the view, on a number of occasions during his  
testimony, that the results were more important than the process. By  
results, he meant the comparisons between the committee evaluations and the  
consultant re-evaluations.  
760. However, in view of our interpretation of s. 11 of the Act, which  
is that causation is implicit in the legislation, we must address the  
question of whether the differences between the consultants and committees  
arising during the process are based on gender, or on some other  
consideration. It follows, therefore, that it is not only necessary but crucial
that the evidence be examined in detail in order to determine whether or  
not the differences between the committees and the consultants are gender  
based.  
761. There was evidence led by the Alliance concerning analyses done  
by two individual Alliance witnesses who examined committee and consultant  
rationales, with a view to explaining consultant and committee disparities.  
Prior to the commencement of the evidence of the first of these witnesses,  
the Employer provided an admission to the Tribunal which reads in part:  
4. The Employer makes the following admission and clarification  
in order to narrow the issues and to avoid further unnecessary use  
of hearing time in tendering evidence.  
5. The Employer admits that disparities between consultants and  
committees in the Wisner 222 and Willis 300 re-evaluations may  
have occurred for reasons other than gender bias in the Joint  
Initiative Committees.  
6. To clarify the issues, the Employer will not rely on the  
reasons for disparities as evidence of gender bias in the process  
or bias in the results.  
7. Therefore, the Employer contends that evidence analyzing the  
reasons for disparities does not assist the Tribunal to assess:  
(a) the reliability of the process; or  
(b) the reliability of the results.  
(Exhibit R-154)  
762. Willis had an opportunity to comment on the two analyses  
presented by the Alliance witnesses. Willis does not consider either of  
them helpful for identifying gender bias in a large study or for exploring  
consultant and committee disparities. In his experience, individual  
assessments of differences based on the rationales will not reveal the  
existence of gender bias. The Tribunal accepts Willis' view. Our  
determination will not be based on what is contained in the rationales for  
individual differences between committee evaluators and consultants on a  
given question, but instead will be based on an examination of all the  
evidence relevant to committee and consultant evaluations.  
763. Willis wanted questionnaires that were complete and focused on  
factual information. Incomplete questionnaires lead evaluators to make  
assumptions which result in a wider range of possible disparities. The  
number of disparities in this study tended to be higher than what Willis  
usually experiences. On the other hand, Willis had never before  
participated in a study as large as the JUMI Study and was not in a  
position to supervise the entire 522 re-evaluations, some of which were
done during the study and some after it was over.
764. We will now address Willis' questions (i), (iii) and (iv).  
Willis testified on numerous occasions about a tolerance level of  
differences between committee and consultant evaluations. The percentage  
variances he uses are simply a function of his experience and what he views  
as acceptable. Based on the quality of information available to the MEC,  
he would expect to find a 10 to 12 per cent random variance, either  
positive or negative, in evaluations. Because the information available to  
the multiple committees was not, in his opinion, of as high quality as was  
available to the MEC, he would expect to see between 15 and 20 per cent  
random variance in their case. There is more opportunity for evaluators to  
make assumptions when they are furnished with poorer quality information.  
765. Willis testified that random variance occurs when value judgments
are made about the meaning of the facts presented in the questionnaire.
Willis considers that in a large study such as the JUMI Study, with the
sheer number of jobs being evaluated, greater disparity is acceptable as a
result of the relatively weak job information. Willis is concerned if,
over time, the variance is no longer random but becomes systematic. He
defines systematic variance as a value or values which are "consistently
higher or lower than an objective evaluation of certain types of jobs."
He treats the term "systematic variance" as equivalent to "gender bias".
766. Shillington testified on the distinction between pattern and  
randomness in a large study and the difficulty in defining something as  
random. He said in Volume 86, at p. 10540, line 9 to p. 10541, line 13:  
Q. How do you know that you have something that is random as  
opposed to something that is not, something that is patterned?  
A. Sometimes you are comfortable using a term without trying to  
define it, and randomness is one of those terms that is easier for  
people to use comfortably. I think everybody knows what you mean,  
but as soon as you try to define it, it gets difficult.  
If you show someone a pattern of numbers, quite often people will  
look at that pattern and you can say, "Is it random or not?" It is
very difficult to show that a pattern is random. It is often  
easier to show that it is not.  
Let me write down a sequence. Suppose we toss a coin four (4)  
times and we get heads, tails, heads, tails. You can look at that  
and say that that is a possible outcome from a fair coin. You  
have fifty (50) per cent heads and fifty (50) per cent tails. But  
if you continued getting heads, tails, heads, tails, heads, tails,  
heads, tails, something in our brain starts saying that this isn't
random any more. Yes, you are getting half heads and half tails,  
but that is far too systematic.  
Defining what is random is very, very difficult. It is much  
easier to say, "This is not random. It looks like there is a  
pattern here."  
767. He further states in Volume 86, at p. 10543, lines 1 - 8:  
So, it is easy to show that it is not random, that there is a  
sequence. But proving it is random is virtually impossible.  
We use the term "random" basically as a catch-all phrase for what  
we don't know. If you toss a coin over and over again, we say  
that the coin is random because we can't predict well the next  
outcome.  
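Shillington's coin-toss example can also be put in numerical terms. The
following minimal calculation, ours and purely illustrative, shows why a
strictly alternating sequence quickly ceases to look random even though it
contains half heads and half tails: the probability of observing it under
a fair coin collapses as the sequence lengthens, so non-randomness can be
demonstrated, while no comparable calculation can prove randomness.

    # Probability that n tosses of a fair coin come out strictly alternating
    # (HTHT... or THTH...): only 2 of the 2**n equally likely sequences qualify.
    def p_strict_alternation(n: int) -> float:
        return 2 / 2 ** n

    for n in (4, 8, 20):
        print(n, p_strict_alternation(n))
    # n = 4  -> 0.125      entirely plausible for a fair coin
    # n = 8  -> 0.0078     already suspicious
    # n = 20 -> 1.9e-06    "this isn't random any more"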
768. Willis confirmed that, at the conclusion of the study, he is willing
to accept a wide disparity in evaluations provided there is no pattern. He
does not like to see any pattern at all. He said in his earlier testimony
that if the variance is less than 2 per cent, he would probably not adjust
the evaluations. He said in Volume 61, at p. 7596, lines 5 to 11:
A. In the final analysis when the study is over, obviously in  
many cases we are involved in recommending and implementation. At  
that point I might decide that there needs to be some adjustment  
to correct. But obviously, if it is less than 2 per cent, the  
difference in pay is so minimal that I guess I would have to  
accept it.  
769. As a rule of thumb, even with the very best job information  
available, Willis expects to see more than plus or minus 10 per cent  
disparity between the committees and the consultants. Willis considers  
disparities over 10 per cent a "red flag", suggesting there may or may
not be a real problem in the evaluations. In a large study, such as the  
JUMI, Willis seeks the assistance of a statistician to determine whether  
the disparities are systematic.  
770. The nature of this exercise, which Willis describes as more an  
art than a science, renders it difficult to quantify job evaluation either  
statistically or mathematically. The Tribunal was occupied for a  
considerable time with the presentation of statistical evidence. In the  
end, we had opinions from the statistical experts, Shillington and Sunter,  
to the effect that statistical analysis cannot identify the existence of  
gender bias.  
771. Sunter's conclusions are a product of hypothesis testing. In his  
interpretative analyses, he relies on probability criteria and mathematical  
models to explain variations in the data. His conclusions are not based  
entirely on scientific reasoning and mathematical applications but, in  
part, on assumptions about the "nature of the world". Sunter noted at
different times in his testimony that his intuition assisted him in
reaching his conclusions. The following examples, which are not
exhaustive, are reproduced. In Volume 110, at p. 13221, lines 8 - 17, he  
remarked:  
When I said that the original stuff is most unexpected it was  
because I felt that if the consultant is always right and the  
committee is always wrong, then my statistician's intuition tells  
me this should lead to a larger variance for committee scores and  
it should lead to a negative covariance and a negative correlation  
between difference and committee scores, which is exactly the  
relationship that you see reproduced by Model 2.  
772. As well, in Volume 119, at p. 14387, lines 10 - 20, Sunter said:  
There is a stronger, positive association between DIFF and CONS  
than there is between DIFF and COMM. Now, let me say that my  
statistician's intuition tells me -- I don't have to justify this,  
it's just that one develops an intuition, and my statistician's  
intuition is surprised by this, if it really is the consultant who  
is in error -- sorry, if it is the committee who is in error. I  
would expect the associations to be somewhat different, but I am  
just speaking intuitively now.  
773. Also in Volume 123, at p. 15046, line 19 to p. 15047, line 2,  
Sunter said:  
I think he asked whether they were relevant tools in the context  
of what Dr. Shillington was doing in the IRR, and I said "yes".  
You know, he was in a different situation, concerned with  
different things, and I would assume that he used both of those  
tests as a result of some kind of intuitive assessment -- which,  
under the circumstances, he was perfectly entitled to make...  
774. And once again in Volume 217, at p. 28225, lines 9 - 23, Sunter  
remarked:  
Typically, in decisions theory, with decisions, you associate  
losses and gains with various decisions, and how you make a  
decision is a consideration -- if you wanted to do it technically,  
you would have to go into all that stuff, and I am trying to skirt  
over it and say, "I have no loss function to offer here. I don't  
know how you should make that decision." If you challenged me to  
come up with one, I suppose I could, a decision-making function  
here.  
This is why I am not taking a position on it. Make the adjustment  
or don't make the adjustment -- it depends on your kind of  
intuitive decision-making process, but I am not about to make that  
decision for you.  
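The intuition Sunter describes in the first of these passages can be made
concrete with a minimal simulation. The sketch below is ours and not part
of the evidence; it uses invented numbers and takes DIFF as the consultant
score minus the committee score, an assumption on our part. If the
consultant were always right and the committee merely added random error,
committee scores would show the larger variance and DIFF would correlate
negatively with committee scores, which is the relationship Sunter says
Model 2 reproduces.

    import random, statistics

    random.seed(1)
    true_scores = [random.uniform(150, 800) for _ in range(1000)]

    # Model: the consultant is "always right"; the committee adds random error.
    cons = true_scores[:]
    comm = [t + random.gauss(0, 40) for t in true_scores]
    diff = [c - m for c, m in zip(cons, comm)]   # DIFF = CONS - COMM (assumed)

    def corr(x, y):
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        return cov / (statistics.stdev(x) * statistics.stdev(y))

    print("committee variance exceeds consultant variance:",
          statistics.variance(comm) > statistics.variance(cons))
    print("corr(DIFF, COMM):", round(corr(diff, comm), 2))  # negative, as predicted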
775. Both statisticians agree statistical analysis can lend weight to  
the evidence even though it may not be conclusive in itself. Shillington  
discusses significant and non-significant results in terms of weak or  
strong evidence. In his opinion, a significant result is not conclusive in  
itself. It may, however, lead a statistician to conclude a hypothesis is  
suspect or the statistician may draw an inference which casts doubt on the  
hypothesis. In Sunter's opinion, statistical analysis will lend weight to  
something which already seems plausible. The analysis can very seldom by  
itself provide plausible explanations. In fulfilling this limited role, we  
believe statistical analyses are appropriate and helpful. Therefore, we
conclude that statistics are ancillary to the primary function of the
evaluators, which is to render a value judgment, and to that of the
Tribunal, which is to determine the reliability of the results.
776. In Sunter's last appearance before the Tribunal, he agreed there  
were limitations to the applicability of statistics for the determination  
of the issues before the Tribunal. This is found in Volume 217, at p.  
28301, lines 13 - 22:  
MEMBER FETTERLY: I guess the point that I am trying to get at is  
this: Statistics don't necessarily tell us the whole story. I  
think you might agree with that, would you not?  
THE WITNESS: Yes, I would agree with that as a general  
observation.  
MEMBER FETTERLY: So we may have to consider other factors that  
perhaps are not within the realm of your speciality.  
777. Sunter's tests help identify the statistically significant
differences between the committee evaluations and the Wisner 222, the
Willis 300 and the combined database (522). Sunter interprets the
differences as not having a consistent pattern. He found significant
differences between the consultants and the committees in both studies in
the male-dominated questionnaires, but more so in the Wisner 222 than in
the Willis 300. The results of his tests identified differences mainly in
the higher end male-dominated positions and in a few higher end female-
dominated positions. Overall, the female-dominated questionnaires had a
lower distribution in value than the male-dominated questionnaires. We are
mindful of the fact that the differences in the female-dominated
questionnaires were not statistically significant.
778. Shillington provided an opinion regarding Sunter's analyses of  
"other possible causes" for the differences between the consultant and the  
committee scores, that is to say, other than gender differences. One of  
these analyses included comparisons to determine if the differences were  
associated with the relative distribution of questionnaires in the higher  
and lower point ranges. Contrary to Sunter's view, Shillington was of the
opinion that it would be very difficult to separate these two data analysis
questions, that is, whether gender or some reason other than gender is the
cause of those differences. On this point, Shillington says in Volume 131,
at p. 16045, line 21 to p. 16046, line 21:  
A. Yes, and the analysis that is behind that.  
The regressions were done in a way to try to see if there was a  
relationship between the differences between the consultants and  
the committee in gender. It is also possible that any differences  
that might have existed between the consultant and the committee  
scores were not directly related to gender but perhaps were  
related to high values versus low values. This has been talked  
about here.  
The confounding is introduced because there is a strong trend in  
the data for the male questionnaires to all have high values  
relative to the female and the female questionnaires have a fair  
tendency to come from the lower end of the spectrum, which means  
you cannot separate those two data analysis questions, or it is  
difficult to separate them.  
THE CHAIRPERSON: What do you mean?  
THE WITNESS: You can't separate the question whether or not a  
pattern is related to gender or whether or not it is related to  
whether or not the scores were high or low.  
779. On the same topic, he says in the same volume at p. 16048, line  
16 to p. 16049, line 11:  
In this circumstance, back to the analysis of the Willis scores  
and the possible adjustment, we have a situation which -- to the  
extent that there is a pattern here, if someone came and said this  
is possibly not due to gender, maleness or femaleness, but rather  
could be due to professionalization or some questionnaires having  
much higher values than others, you would have a problem  
extracting those two separate hypotheses from the analysis because  
you have a situation in which the males predominantly had high  
values, the females predominantly had low values. So maleness is  
confounded with high and low values.  
That is reflected in the distribution. That is why it is a  
distribution question. The distribution of the Willis scores for  
the males tended to be quite a bit higher than the distribution of  
the Willis scores for the females. It is a confounding issue.  
That is why in interpreting it you are going to have to be  
cautious about that.  
780. In the end, Shillington suggests these analyses should be used  
with caution, and we refer to his response in Volume 131, at p. 16049, line  
20 to p. 16052, line 7:  
THE WITNESS: It is more of an interpretation issue and, I  
think can't be stronger than -- I am not Mr. Sunter, but I think  
that we have to make sure that when we use these analyses, because  
of the differences in the distribution, we have to be cautious.  
THE CHAIRPERSON: For example, when we compare regression lines,  
we usually look at the differences -- or we have been looking at  
the wage gap using regression lines, for example, in calculating a  
distance between them. So you are comparing them to see what is  
the distance.  
THE WITNESS: Yes.  
THE CHAIRPERSON: That is what I think when somebody says to me  
that you can't compare these two regression lines. So when Mr.  
Sunter is saying that you can't compare these two regression  
lines, I am saying compare them for what? That is why I am a bit  
confused.  
Are you saying you can't interpret them, meaning that because in  
the male regression line you have distributions of both, high and  
low distributions, but a tendency to be higher, whereas in the  
females you have a distribution of a low and high but a tendency  
to be lower, but when you interpret these lines you can't say it  
is definitely associated with a gender-related bias, for example?  
Is that what you mean?  
THE WITNESS: Yes. I think it is more of an interpretation of  
whether or not the patterns that you are seeing are clearly  
related to gender or whether or not those patterns are related to  
high score versus low score because they are, in the data,  
occurring together. The males are predominantly high score and  
the females are predominantly low score.  
THE CHAIRPERSON: So it is not comparing them in terms of  
calculating a wage gap. Is it?  
THE WITNESS: I think that is a different issue which we will  
get to, I think.  
THE CHAIRPERSON: Okay. But just looking at these and what you  
can say about what they describe in terms of their distribution,  
what you can interpret from that is that the males tend to be  
high, the females tend to be low, but you can't, because of this  
confounding effect, you can't really interpret anything else with  
certainty. Is that ---  
THE WITNESS: That is right. You have to be very careful when  
interpreting the results because you have to keep in mind that if  
somebody came with an alternative explanation for the data and the  
explanation was that this had nothing to do with gender, that this  
was high score/low score effects, you have collected your data in  
such a way that most of the high scores are males and most of the  
low scores are females. So they are two equally valid  
explanations for the same data.  
I think it is a caution in interpretation that I think is  
reasonable.  
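The confounding Shillington describes can be demonstrated with constructed
data. In the sketch below, which is ours and purely illustrative, the
committees under-score high-value jobs by a flat five per cent with no
reference to gender at all; because the male-dominated questionnaires sit
at the high end of the value range and the female-dominated questionnaires
at the low end, the output reproduces exactly the pattern gender bias would
produce, and nothing in the data can separate the two explanations.

    import random, statistics

    random.seed(2)

    # Constructed so that gender and score level coincide, as in the study:
    # male-dominated questionnaires mostly high-value, female-dominated low.
    male_cons   = [random.uniform(500, 800) for _ in range(200)]
    female_cons = [random.uniform(150, 450) for _ in range(200)]

    # Hypothesis with no reference to gender: committees under-score
    # HIGH-VALUE jobs by five per cent.
    def committee(score):
        return score * 0.95 if score > 450 else score

    male_shortfall   = [c - committee(c) for c in male_cons]
    female_shortfall = [c - committee(c) for c in female_cons]

    print("mean shortfall, male-dominated:  ", round(statistics.mean(male_shortfall), 1))
    print("mean shortfall, female-dominated:", round(statistics.mean(female_shortfall), 1))
    # The male-dominated jobs show a shortfall and the female-dominated jobs
    # none -- the very pattern gender bias would produce, although the
    # mechanism here is purely high score versus low score.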
781. Sunter conducted further analysis for presentation in reply. He  
refers to this analysis as his "value effect" analysis which attempts to  
explain further the difference in treatment of high point value and low  
point value questionnaires. The two statisticians hold opposing views as  
to whether such questions as value effect and gender can be separated out  
or "unconfounded". We note Shillington's warning to exercise caution when  
attempting to unconfound the data in these circumstances. However, the  
analysis is useful in demonstrating the differences between the consultants  
and the committees occur at the high end of the point range. Having found  
the applicability of statistics for the determination of the issue before  
us to be supportive rather than definitive, we are not convinced as to the  
necessity for, or the validity of, Sunter's other conclusions pertaining to  
his "value effect" analysis. Moreover, Sunter's earlier work which focused  
on identifying significant differences remains helpful and useful in  
understanding where the differences occur between the committees and the  
consultants.  
782. We will now address Willis' question (iv). The Wisner 222 was  
completed while the study was still ongoing. At that time, Willis did not  
perform any in-depth analysis to determine the reasons for the differences
between the Wisner 222 re-evaluations and the committees' evaluations as he  
had done with the previous consultant re-evaluations of the MEC benchmarks.  
Willis would have preferred to proceed immediately with the second part of  
his plan, which was to do a larger study. He believed this further study  
was desirable because the Wisner 222 was inconclusive on the question of  
gender bias. It was recognized at the time by Wisner himself that there  
could be other plausible explanations for what he defined as an observed  
pattern of evaluation differences in the Wisner 222 (Exhibit PSAC-4).  
Wisner does in fact suggest positions in male-dominated classifications,  
with more complex duties and responsibilities, might have been the cause.  
He states:  
...Because this is true, the observed pattern of evaluation  
differences could occur if the committees tended to under evaluate  
more complex positions, in relation to the MEC discipline as seen  
by the consultant.  
(Exhibit PSAC-4, p. 8)  
783. It should be noted that, during the MEC's work, Willis observed the
MEC adopt what he described as a "conservative" discipline. This is
evidenced by the reluctance of the MEC to evaluate jobs above a certain
level. The Willis evaluation plan graded functional job knowledge by
complexity, from level A through level G. According to
Willis, the high G level presupposes "...a requirement for
an expertise or command of a professional sphere of knowledge." (Volume  
35, p. 4448).  
784. During the operation of the MEC, Willis felt there were four or  
five questionnaires which should have been evaluated at the G level.  
Willis tried, at a special session with the MEC evaluators, to encourage  
them to promote jobs beyond the F level. Willis testified in Volume 35, at  
p. 4448, line 19 to p. 4450, line 24, about the phenomenon he observed:  
A. Out of the ones that the Master Evaluation Committee had  
evaluated. As we got toward the end, in fact, I even had a  
special session with them to see if we could break out of the high  
F into the G level. It was an interesting phenomenon. They all  
realized the problem, but they just could not seem to select any  
jobs to promote above that F level.  
In fact, I said "let's just pick one" -- I want the other  
committees to feel that they have a highly professional job with  
true expertise. I don't want them to feel they can't go beyond  
the F level. "So, pick the strongest job you can. Let's see if  
we can't promote it to the G level." And they just couldn't do  
it.  
This was of some concern to me. That was mitigated, however, for  
two reasons: (1) there were several jobs at the high F level.  
The point totals for the high F are the same as for the light ---  
THE CHAIRPERSON: Excuse me, you were saying there were several  
jobs at the high F level?  
THE WITNESS: Yes, the F leaning toward G.  
If you recall from the evaluation system, the G on the light side  
leaning toward F has the same point total. So I was not concerned  
from the standpoint of the points. But since they were the  
committee that was setting the frame of reference for the other  
committees, I wanted them to be able to exercise that G level.  
That didn't happen with the Master Committee.  
As it worked out, I was in counselling later with the evaluation  
committees. I explained the problem to them. I don't remember  
how many jobs they ultimately evaluated at the G level, but I  
understand that they did break through and they did evaluate some  
of the 4,000 at the G level.  
Q. So was this tendency in the end something that you felt was  
beyond a concern?  
A. The other mitigating factor was that even though they were  
very conservative here, this conservatism was consistent. Looking  
at the alignment, I felt that the internal alignment was still  
appropriate. So while they were very conservative at the top, it  
did not create, let's say, an inversion in the evaluation  
relationships.  
There were so few jobs -- and I remember discussing this with Paul  
Durber after the study. They looked at those jobs that might have  
gone to a higher level and there were so few of them that they  
wouldn't have affected the results materially.  
785. Early in the process, specifically with consultant re-evaluations  
of the MEC positions, there is evidence the consultants were evaluating  
differently than the committees. This first occurred when the Willis  
consultant, Drury, did her review of the MEC's evaluations at its own  
request. It also occurred later on, during Wisner's review of challenges  
to the MEC evaluations. Wisner's discipline was noted to be slightly more  
liberal than the MEC's. Willis testified to this effect in Volume 56, at p.
6940, lines 14 - 24:  
Q. So, this goes back to your comment that Mr. Wisner was  
probably more liberal.  
A. He was slightly more liberal, but that didn't bother me. I  
had a reason for not wanting to do the evaluations myself or to  
have Jan Drury do them, even though we had discussed the  
Committee, I was willing to accept the fact that Mr. Wisner's  
discipline might be slightly different. But it was the  
consistency in evaluation differences that I was looking for. So,  
Wisner made the best choice.  
786. Willis was willing to acknowledge Wisner's discipline might be  
slightly different from the committees'. This did not concern him as long  
as there was no pattern in the differences.  
787. Some evaluations were easier to do than others depending on the  
information in the questionnaire. Willis testified the responses from  
incumbents in female-dominated occupations were returned more quickly and  
contained better information than from incumbents in male-dominated  
occupations. He was asked if this could have an effect on the reliability  
of the evaluations in a restricted sense. His response is contained in  
Volume 68, at p. 8575, lines 3 - 13:  
Q. But what I am trying to get at here is: Could that affect  
the reliability of the evaluations based on, let's say,  
occupational groups? In other words, were you getting more  
reliable information from predominantly female groups and less  
reliable from predominantly male groups.  
A. I haven't tested that, but I believe that is a possibility,  
certainly, since the quality of the information does generally  
tend to be better from female-dominated groups.  
788. Willis further testified that the questionnaires from incumbents in
high level technical and professional jobs were slower to be returned and
contained weaker information than questionnaires from incumbents in
clerical and vocational jobs. In this regard, Willis explains that "generally
speaking" the professional and technical jobs are more difficult for the  
evaluators to understand. He says in Volume 69, at p. 8582, lines 11 - 20:  
THE WITNESS: Professional and technical level questionnaires  
would be less easy to understand than, say, trades or clerical.  
MR. FRIESEN:  
Q. And that is partly because they were not as well described  
in the information.  
A. Partly, and partly because it is more difficult to  
understand a more complex job. [emphasis added]  
789. Willis' opinion is corroborated by the testimony of at least two
evaluators. Crich, a member of the first version of Committee #5,  
testified her committee had difficulty evaluating questionnaires from male-  
dominated occupational groups. In her view, this contributed to the  
problems experienced by that committee. We also have testimony from  
Latour, also a member of Committee #5, as to the difficulty this committee  
experienced in evaluating technical jobs.  
790. For the most part, the QA Committee's work must be discounted in
view of Willis' criticisms of it. However, the evidence of two of
the participants, Crich and Yates, merits consideration because it  
illustrates the difficulty experienced by the QA Committee members when  
evaluating the 25 male questionnaires identified from the Wisner 222 and  
their inability to achieve consensus in those cases.  
791. By way of contrast, the consultants did not experience the same  
difficulty in evaluating the more complex questionnaires as did the  
committee members. The consultants had the benefit of professional job  
evaluation experience and training, which enabled them to evaluate those  
positions more easily than the committee evaluators. The fact the  
committee evaluators lacked that kind of professional expertise  
contributed, we believe, to the inefficiency of the job evaluation process  
and the lengthy discussions which took place during the evaluations.  
792. Willis expressed a high regard for the competency and experience  
of his consultants in conducting pay equity job evaluations. Willis agreed  
his consultants were more liberal in evaluating higher level positions.  
Considering the consultants' experience, background and education, he also  
believed they probably had a better understanding of the higher level jobs  
than the committee members.  
793. Illustrations provided by the Employer as to the effect of
different treatment of female and male questionnaires on the wage gap
were confirmed in Respondent Counsel's cross-examination of Sunter and
Shillington. Different treatment (arising from gender bias) will have a
direct impact on the wage gap. There are two distinct ways in which the
wage gap will increase. An increase can occur if committees are
under-evaluating male-dominated questionnaires. It can also occur when
the committees are over-evaluating female-dominated questionnaires. In
either case, the effect is the same. Expressed another way, the wage gap
will be "over-stated" when either of these events occurs.
794. If the 2.3 per cent disparity between the committees and the  
consultants is attributable to gender bias, then it arises either because  
the evaluators were consciously or unconsciously treating male-dominated  
jobs less favourably than the consultants or, on the other hand, were over-  
valuing female-dominated questionnaires and were therefore biased against  
male-dominated questionnaires. Sunter's statistical analyses do not  
identify a preference for female-dominated questionnaires by the multiple  
evaluation committees. The IRR test results illustrate the majority of the  
outliers demonstrated a male preference yet, when the final committee  
evaluations are compared to the consultants' evaluations, the disparities  
are indicative of a bias against male-dominated jobs.  
795. In determining why the differences occur, the Tribunal is  
entitled to look at some compelling facts. Most importantly, the MEC was  
conservative in its discipline relative to the consultants. Firstly,  
according to Willis, the MEC discipline was more accurate than the  
consultants', as reflected in his report to the JUMI Committee (Exhibit R-22)
on the re-evaluations of the MEC evaluations which arose out of the IRR 103  
challenges and the Treasury Board challenges. That report states in part:  
We have no significant concerns regarding the MEC's understanding  
and application of the evaluation plan. The MEC's pattern of  
application of the evaluation plan to positions (their  
"discipline") differs in some respects from the pattern which the  
consultants would use. However, given the manner in which the MEC  
membership was determined, their discipline constitutes a more  
accurate reflection of the values of positions as commonly  
understood within the Government of Canada than the consultant  
could determine from an outside point of view.  
(Exhibit R-22, p. 8)  
796. Secondly, the Willis consultants had an established discipline  
prior to the JUMI Study based on their experience in other studies. There  
is ample evidence from which to conclude the Willis discipline was more  
liberal than the MEC discipline. According to Owen, another Willis  
consultant, the Willis discipline influenced the consultants in their  
evaluations performed during the JUMI Study. The consultants were  
experienced and professional evaluators. They were more familiar with
higher level jobs, both managerial and technical, a familiarity gained
through previous pay equity exercises. The JUMI Study was the first time the
consultants had done any evaluations in the Federal Public Service.  
797. Thirdly, overall the evaluation committees followed the MEC's  
discipline. There were three or four occasions where the evaluation  
committees actually evaluated above the F level to the low G level.  
According to Willis, this by no means altered the MEC discipline.  
798. Fourthly, outliers did not exert an observable influence on the  
committee evaluations, either in the MEC or in the multiple evaluation  
committees. The statistical evidence corroborates Willis' own conclusions  
that the outliers had no discernible effect on the evaluations of the other  
committee members.  
799. Finally, both Willis and the evaluators testified that the high level
positions were difficult to evaluate. The distribution of questionnaires
between male- and female-dominated occupational groups was not the same in
terms of value. The more difficult questionnaires were in the high level  
male-dominated jobs where the greatest difference between the evaluation  
committees and the consultants occurred.  
VIII.CONCLUSION  
800. In light of these facts, as well as other matters previously
referred to by the Tribunal, it is reasonable to conclude that the
disparity between the consultant and the committee evaluations was the
result of, and is explainable by, the conservative discipline established
by the MEC, the evaluators' inexperience and difficulty with evaluating
high level jobs, and the very subjective nature of the exercise. These
factors produced a phenomenon which manifested itself in a reluctance on
the part of the MEC to attribute high scores to higher level
questionnaires. Factors such as weak job information and difficulty in
comprehending job information also contributed to this phenomenon. In
applying the reasonable standard of proof required under s. 11 of the
Act, it is reasonable to conclude the difference between the committees
and the consultants was not a gender difference. We find as a matter of
fact that the disparities resulted from an inability and/or reluctance on
the part of the evaluators to evaluate high level male-dominated jobs
according to the discipline of the consultants.
801. The conservative mind-set of the MEC evaluators was the origin of
this phenomenon, which spread and continued throughout the work of the
multiple committees. This conservatism had its most telling effect on
the male-dominated jobs at the higher end of the scale.
802. During his testimony Willis was unable to give his unqualified
support to the JUMI Study results. He was, however, of the opinion that
the results should not be "trashed" and that they could be accepted by
the Tribunal at face value or with some adjustments. There remained,
in view of Willis' discomfort, lingering questions about how well the
process worked.
803. This hearing has spanned 232 days to date. The Tribunal was  
afforded a wide range of both expert opinion evidence and non-expert  
evidence, including anecdotal evidence. In addressing the issue of  
reliability, we are mindful of the large number of agreements between the  
consultants and the evaluation committees on the re-evaluations. The  
standard of proof in this case is one of reasonableness. We find, for the  
most part, the committees and the consultants were able to agree on the  
evaluation scores, except with the more complex, professional and technical  
jobs distributed at the high range of the male-dominated jobs. The
phenomenon, beginning with the MEC, carried over into the multiple committee
evaluations and was nourished by other factors, all of which contributed to
the disparity between the consultants and the committees.
804. We find as a fact that the evidence establishes the evaluation  
results are sufficiently reliable, by any reasonable standard, as a basis  
on which to calculate the existence or otherwise of a wage gap between male  
and female employees employed in the same establishment who are performing  
work of equal value within the meaning of s. 11 of the Act and the  
Guidelines. The Employer has failed to provide any evidence which would  
cause the Tribunal to find otherwise or to change its decision.  
Dated at Vancouver, British Columbia, this 19th day of January, 1996.  
Donna Gillis, Chairperson  
Norman Fetterly, Member  
Joanne Cowan-McGuigan, Member  
APPENDIX A  
COMMITTEE MANDATES  
1. Sub-Committee on a Common Evaluation Plan  
(a) Committee Mandate  
The official mandate of this sub-committee was to determine what  
evaluation plans to examine and make recommendations to the JUMI  
Committee at large.  
2. JUMI Committee  
(a) Committee Mandate  
The task of the JUMI Committee was to develop agreed parameters under  
which equal pay for work of equal value, as incorporated in the  
provisions of s.11 of the Canadian Human Rights Act could be  
implemented and to prepare a detailed plan for its implementation  
covering that portion of the Public Service for which the Treasury  
Board represents the employer.  
3. Sub-Committee on Communications Strategy  
(a) Committee Mandate  
The mandate of this sub-committee was to analyze communication  
alternatives and recommend the most effective ones for implementation.  
4. Sub-Committee for Training  
(a) Committee Mandate  
The mandate for this sub-committee was to draft and recommend a  
training package for coordinators. This sub-committee later  
transmuted into the Administrative Sub-Committee.  
5. Testing Sub-Committee on the Willis Evaluation Plan  
(a) Committee Mandate  
The main objective of this sub-committee was to present to the JUMI  
Committee recommendations related to:  
(i) the modification or clarification of the definitions and the  
factors pertaining to the four evaluation charts of the  
Evaluation Plan.  
(ii) the choice between the Working Conditions Evaluation  
Chart No. 1 or 2.  
6. Sub-Committee on the Willis Questionnaire  
(a) Committee Mandate  
The mandate of this sub-committee was to finalize the format and  
contents of the Willis questionnaire (including developing  
examples). The sub-committee was asked to review the
questionnaire and ensure that it was sufficient to gather the
necessary data.
7. Administration Sub-Committee  
(a) Committee Mandate  
The mandate of this sub-committee was to conduct examination and  
discussion of, and present recommendations and/or make decisions  
on, all matters related to the administration of the Equal Pay  
for Work of Equal Value Study, with the exception of those  
responsibilities assigned to the Equal Pay Study Secretariat.  
Specifically, this sub-committee:  
(i) devised, implemented and monitored any administrative  
action required by JUMI;  
(ii) provided the EPSS with guidance regarding  
administrative issues;  
(iii) recommended to JUMI actions (to be) taken;
(iv) ensured the smooth administrative operation of the  
Study, within the framework established by JUMI,  
through setting priorities, delegating work, resolving  
issues and assessing the progress of the Study; and  
(v) co-ordinated required training to coordinators,  
evaluators, reviewers and secretaries.  
8. Master Evaluation Committee  
(a) Committee Mandate  
The primary purpose of the Master Evaluation Committee (MEC) was  
to evaluate a representative sampling of positions and, in so  
doing, provide the frame of reference for the five evaluation  
committees (later expanded to nine) to rely on, so that at the  
conclusion of the position evaluation stage of the study all  
4,400 position evaluations would relate to one another fairly and  
equitably. The mandate of the Master Evaluation Committee was  
to:  
(i) establish benchmark position ratings for approximately  
600 positions through initial evaluation of a  
representative number of positions sampled, and a frame  
of reference to guide subordinate evaluation committees  
in the evaluation process;  
(ii) provide advice and assistance to subordinate evaluation
committees in particularly difficult evaluation cases;  
(iii) implement a monitoring system to ensure consistent
and bias-free rating by subordinate evaluation  
committees; and  
(iv) as final authority, resolve controversial cases where  
an evaluation committee has made every effort to arrive  
at an agreed-to rating but has been unsuccessful in
doing so.  
9. Mini-JUMI Committee  
(a) Committee Mandate  
The mandate of the Mini-JUMI Committee was to deal with  
procedural problems arising from the study. Initially, the JUMI  
Committee dedicated a large amount of time discussing procedural  
problems but eventually decided to create the Mini-JUMI Committee  
to deal with them.  
10. Equal Pay Study Secretariat  
(a) Committee Mandate  
The Equal Pay Study Secretariat was a Joint Union/Management  
Secretariat. It was located in the Jackson Building and provided  
all administrative support to the evaluation process in the  
Study. The Chief was responsible for the co-ordination of all  
support activities and the effective communication of JUMI and  
Administrative Sub-Committee instructions.  
11. Inter-Rater Reliability and Methodology Sub-Committee  
(a) Committee Mandate  
The mandate of this sub-committee was:  
(i) to determine and make recommendations about the  
methodology and research necessary to test evaluation  
committee rater reliability; and  
(ii) to assess and make recommendations about research  
methodology as it applies to the JUMI Study as a whole.  
12. Five Multiple Evaluation Committees  
(a) Committee Mandate  
The mandate of the five evaluation committees was to:  
(i) evaluate approximately 750 positions each; and  
(ii) keep the Master Evaluation Committee abreast of their  
evaluation proceedings, results and issues, through  
chairpersons.  
The five evaluation committees were reorganized into nine  
evaluation committees on April 14, 1989.  
13. Inter-Committee Reliability Sub-Committee  
(a) Committee Mandate  
The mandate of this sub-committee was to:  
(i) examine the results of the tests administered to the  
evaluation committees in relation to the baseline  
provided by the consultants;  
(ii) examine the baseline score provided by the consultants;  
(iii) determine the significant differences in the
consensus ratings of the committees in relation to  
the benchmarks and the baseline;  
(iv) formulate, if needed, recommendations for training, re-
training by the consultant and/or other courses of  
action for JUMI considerations; and  
(v) identify procedural/process problems and potential for  
improvement including the revisions to the formulation  
of rationales.  
14. Mini-MEC  
(a) Committee Mandate  
The Mini-MEC was charged with the task of reviewing the committee  
challenges to the MEC's evaluations. The JUMI Committee
directed Johanne Labine of PSAC and Michel Cloutier of the  
Treasury Board, both of whom sat on the MEC, to review the  
working conditions of all 100 benchmarks for shift work, overtime  
and living conditions, assess the amount of points to be changed,  
if any, and correct rationales.  
It was ultimately decided that the MEC would not be reconvened,  
and that a Mini-MEC, a nucleus, or a small number of evaluators  
from the MEC would undertake this exercise. There were two  
members from the MEC who were selected to represent this Mini-  
MEC, Michel Cloutier and Johanne Labine. The idea was that Mr.  
Willis would meet with the two of them and resolve any  
differences.  
15. Sub-Committee on Total Compensation  
(a) Committee Mandate  
The draft terms of reference for this sub-committee as of  
September 21, 1989 were:  
(i) To identify the elements of compensation in the Federal  
Government that comprise wages as defined in Section  
11(6) of the Canadian Human Rights Act;  
(ii) To compile the data required to establish wages for the  
positions evaluated;  
(iii) To devise a method to cost total compensation for
purposes of correcting any identified wage  
disparities.  
16. Quality Analysis Committee  
(a) Committee Mandate  
Paul Durber of the Commission created the Quality Analysis  
Committee to examine the 25 male-dominated jobs noted as possibly  
undervalued in the Wisner report in May, 1990. The purpose of  
the committee was to shed light on whether the maleness of the  
jobs, might help to account for their rating and whether,  
conversely, the differences between Mr. Wisner and the committees  
were due to simple perceptions of the work.  