
Working with Multiple People Group Lists by Chris M.

Full title:  Working with Multiple PG Lists – Exploring Generic Issues

Date Completed: 2006-06-19
Revised on 2009-10-24 to anonymize references to “the group” for which this was originally prepared. The paper was written three years ago and I have not checked whether the specific examples still exist, but the issues are logical ones that will recur as long as multiple lists exist. Note that in the end the group decided to continue working with multiple lists and to work to bring those lists closer together. The following acronyms are used freely:

PG means People Group
JP refers to the Joshua Project PG List
HIS refers to the Harvest Information System, and more specifically to its ROP, or Register of Peoples
CPPI literally stands for “Church Planting Progress Indicators” and refers to a PG list maintained by IMB. This list underlies Peoplegroups.org
ROP1 refers to “Affinity Bloc” and ROP2 to “People Cluster”

This paper is for consideration by “the group”. I have carried out specific comparisons of JP, HIS and CPPI for a number of countries, and this paper draws on that work to consider working with multiple People Group lists. It explores the generic issues that would face “the group” even if all “errors” were removed. Specifically, what are the issues of using the HIS ROP code for automated or semi-automated interoperation between lists?

Background

Here are my own working assumptions behind this paper. They are based on our February meeting and my initial list comparisons:
  • Multiple people group lists exist. This appears to be healthy and beneficial for the most part.
  • Some differences between them can be eliminated by comparison and improvement. It is hard work, but it can be done.
  • Other differences must remain due to:
      • different purposes of the lists
      • different feeds from the real world
      • different cultures of list management
      • different timing of updates
  • We don’t want to manage a full list ourselves, but rather to hold a skeleton list with a few additional attributes such as engagement status.
  • We want people to be able to “drill through” our list to use the rich data in the other lists.
  • Ideally, we also want people to be able to “drill through” other lists to pick up the extra data in ours.
  • We think the HIS ROP may hold the key to doing this, literally!

Specific Issues if we do a skeleton list using HIS ROP

Unfortunately, I think several issues will hinder our preferred approach.

Issue 1: “There must be HIS PGs as broad as the broadest PGs on any list”
If we want interoperability, list managers cannot pick a wider grouping than exists in HIS. If they do, there is no HIS code they can accurately assign to it, so we cannot reference it, and part of their list would be “hidden”.
For example, CPPI have introduced an entry for “Roma (Gypsy)”, while HIS has entries for each type of Roma.
If we don’t stick to this rule, the relationship from HIS PG to List PG becomes “many-to-many”, which is of limited value in data terms. “One-to-one” would be nice, but that effectively requires “synchronizing” the lists, which we have ruled out. “One-to-many” from HIS PG to List PG (each List PG maps to exactly one HIS PG, while one HIS PG may cover several List PGs) still allows some automation.
The consequences of this rule need exploration.  Existing practice may be adequate.  One possibility is to use JP’s ROP1 and ROP2 data to create standard, broad groups that can be used when no other is appropriate.
Some generic entries, such as “Arab” or “Arab, Generic”, are already in use.
We must make it as easy as possible for a List Manager (and their real-world informants) to find a valid ROP code to assign.
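The cardinality rule in Issue 1 can be checked mechanically. Below is a minimal Python sketch, assuming each list can export its links as (List PG id, HIS ROP code) pairs; the function names and the sample identifiers (`P1`, `P2`, `ROP-X`, `ROMA`, `ROMA-A`, `ROMA-B`) are hypothetical placeholders, not real list data. A List PG that needs more than one ROP code is broader than anything in HIS, which is exactly what the rule forbids.

```python
from collections import defaultdict


def _index(pairs):
    """Build both directions of the link table from (list_pg, rop_code) pairs."""
    rops_per_list_pg = defaultdict(set)
    list_pgs_per_rop = defaultdict(set)
    for list_pg, rop in pairs:
        rops_per_list_pg[list_pg].add(rop)
        list_pgs_per_rop[rop].add(list_pg)
    return rops_per_list_pg, list_pgs_per_rop


def broader_than_his(pairs):
    """List PGs that span several ROP codes, i.e. have no single HIS code (Issue 1)."""
    rops_per_list_pg, _ = _index(pairs)
    return sorted(pg for pg, rops in rops_per_list_pg.items() if len(rops) > 1)


def classify_mapping(pairs):
    """Describe the HIS-PG-to-List-PG relationship by its cardinality."""
    rops_per_list_pg, list_pgs_per_rop = _index(pairs)
    list_pg_spans_rops = any(len(r) > 1 for r in rops_per_list_pg.values())
    rop_split_into_pgs = any(len(p) > 1 for p in list_pgs_per_rop.values())
    if list_pg_spans_rops and rop_split_into_pgs:
        return "many-to-many"
    if list_pg_spans_rops:
        return "many-to-one"  # several ROP codes fold into one List PG
    if rop_split_into_pgs:
        return "one-to-many"  # one ROP code is split into several List PGs
    return "one-to-one"


# Hypothetical example: one ROP split into two List PGs, plus a broad Roma entry.
split = [("P1", "ROP-X"), ("P2", "ROP-X")]
with_broad_entry = split + [("ROMA", "ROMA-A"), ("ROMA", "ROMA-B")]
print(classify_mapping(split))             # one-to-many
print(classify_mapping(with_broad_entry))  # many-to-many
print(broader_than_his(with_broad_entry))  # ['ROMA']
```

Only the one-to-many case keeps every List PG addressable by exactly one ROP code, which is why the rule matters for automation.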

Issue 2: “HIS must rapidly register new PGs”
I believe that HIS should be a neutral list that exists solely to serve the main lists, for the greater good of sharing data. The main lists in turn must submit to HIS as the master for ROP codes and their meaning. There should be no ROP code that is not registered in HIS, or will not very soon be. I know this is not currently the case, but “the group” would need to know where to find a “master list” of ROPs at all times.

Taking this approach may require additional resources to manage HIS. A reasonable alternative may be to ensure that Issue 1 is systematically addressed: if there is always a generic code that can be used, then the assignment of a specific HIS code can be slower (but see also Issue 3).
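The fallback described above amounts to a simple assignment policy. The sketch below is one hypothetical way to express it in Python, assuming “the group” holds the HIS master list as a set of registered codes and that a broad generic code (for example one derived from ROP1/ROP2 data, as suggested under Issue 1) is available; the function name and the codes `NEW-1` and `GEN-1` are placeholders.

```python
def assign_rop(specific, generic, master):
    """Choose the ROP code to record for one PG under the fallback policy (hypothetical).

    specific: the code the list manager would ideally use (may not be registered yet)
    generic:  a broad fallback code, assumed to be registered already
    master:   set of codes currently registered in HIS
    Returns (code, pending); pending is True while the specific code awaits registration.
    """
    if specific in master:
        return specific, False
    if generic not in master:
        # Without a registered fallback the PG cannot be referenced at all (Issue 1).
        raise ValueError(f"fallback code {generic!r} is not registered in HIS")
    return generic, True


master = {"110306", "GEN-1"}  # hypothetical snapshot of the HIS master list
print(assign_rop("110306", "GEN-1", master))  # ('110306', False)
print(assign_rop("NEW-1", "GEN-1", master))   # ('GEN-1', True)
```

Entries flagged as pending can then be revisited once HIS registers the specific code, which is where the speed of registration still matters.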

Issue 3: “There must be HIS PGs as narrow as the narrowest JP PGs”
This is a specific issue for Joshua Project. CPPI can register several PGs against one ROP code because it has its own key, “PopEntId”.
For instance, against ROP 110306 Turkmen, CPPI has registered at least four PGs: PopEntIds 24615 Esari, 24617 Goklen, 24639 Turkmen, 24650 Yomud.
JP, on the other hand, effectively requires a “one-to-one” mapping with ROP because it uses the ROP code as its primary key. This requires tighter co-ordination, almost total synchronization, between JP and HIS. (Dan S. has considered changing this. I will check the situation before our July meeting.)
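The difference between the two keying schemes can be shown with two small tables. This Python sketch is a simplified, hypothetical model, not the actual CPPI or JP schema; it uses the Turkmen PopEntIds quoted above and assumes that JP-style storage rejects a second PG under an ROP code already in use. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeopleGroup:
    name: str
    rop: str  # HIS ROP code, stored as an ordinary attribute


# CPPI-style: keyed by its own PopEntId, so several PGs may share one ROP code.
cppi = {
    24615: PeopleGroup("Esari", "110306"),
    24617: PeopleGroup("Goklen", "110306"),
    24639: PeopleGroup("Turkmen", "110306"),
    24650: PeopleGroup("Yomud", "110306"),
}


def pgs_for_rop(table, rop):
    """Names of all PGs in a table registered against one ROP code."""
    return sorted(pg.name for pg in table.values() if pg.rop == rop)


def register_keyed_by_rop(table, pg):
    """JP-style (as modelled here): the ROP code is the key, so it holds only one PG."""
    if pg.rop in table:
        raise ValueError(f"ROP {pg.rop} is already used by {table[pg.rop].name}")
    table[pg.rop] = pg


print(pgs_for_rop(cppi, "110306"))  # ['Esari', 'Goklen', 'Turkmen', 'Yomud']

jp = {}
register_keyed_by_rop(jp, cppi[24615])  # Esari takes ROP 110306
try:
    register_keyed_by_rop(jp, cppi[24617])  # Goklen has nowhere to go
except ValueError as err:
    print("rejected:", err)
```

Under the ROP-as-key model, the only ways to hold all four groups are for HIS to register a separate ROP code for each, or for the list to stop using ROP as its key; hence the tight co-ordination described in Issue 3.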

Issue 4: “HIS PG overlap”
The existence of overlapping groups in HIS can cause some difficulty, because different list managers could legitimately assign different ROP codes to the same entity. The overlap is often historical and noted in the HIS “Memo” field.
For example, HIS ROP 110753 “West Aramaic” has the Memo: “may be same entity as 105954; most sources do not report them in Lebanon or West Bank…”
Given the current maturity of PG data, I don’t see that we can avoid this sort of thing, although we should work for improvement. Note that my proposed solution to Issue 1 may make overlap inevitable anyway, at least in the form of subset/superset relationships.

Issue 5: “Multiple meanings of generic PGs”
Where generic entities exist, they are being used in multiple ways. “Arab” can be used to mean “All the Arabs in this country”, as HIS does in Bosnia.
It can also be used to mean “The rest of the Arabs in this country that are not included in a more specific Arab PG in this country”, as HIS does in Oman.
This too seems reasonable. There may be good reasons to break one specific group out, but it does not follow that we have to break every group out. Note, however, that the meaning of “Arab” changes.
In Bosnia, “Arab” may include people from PGs “Arab, Arabic Gulf Spoken”, “Ta’izzi, Southern Yemeni”, “Arab, Bahraini”, “Shihuh, Al-Shihuh” and more. In Oman it does not, because these groups are registered explicitly.
Therefore the existence of a generic PG in a given country cannot be taken as a sign that there are no related specific PGs in that country.  The boundaries of which people are in a generic PG have to be interpreted in the light of which related specific PGs have been included in that country in that list. 
It follows that if we have a generic PG for a given country on “the group” list, it does not necessarily correspond to the apparently matching generic PG in another list.
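The point that a generic entry’s meaning depends on its neighbours can be made concrete with a set difference. In this minimal Python sketch, `ARAB_FAMILY` is an assumed reference set truncated to the four specific PGs named above (the text says “and more”), and each country list is simplified to the names it registers; the function name is hypothetical.

```python
# Assumed reference set, truncated to the specific PGs named in the text.
ARAB_FAMILY = frozenset({
    "Arab, Arabic Gulf Spoken",
    "Ta’izzi, Southern Yemeni",
    "Arab, Bahraini",
    "Shihuh, Al-Shihuh",
})


def generic_scope(family, registered):
    """Specific PGs that a generic entry implicitly absorbs in one country list:
    everything in the family that the list does not register explicitly."""
    return set(family) - set(registered)


bosnia = generic_scope(ARAB_FAMILY, {"Arab"})  # simplified: generic entry only
oman = generic_scope(ARAB_FAMILY, {"Arab"} | ARAB_FAMILY)  # specifics registered too

print(len(bosnia), len(oman))  # 4 0
print(bosnia == oman)  # False: the same "Arab" label covers different people
```

With this truncated family the Oman scope comes out empty; in reality it means “whatever Arabs remain outside the explicitly registered groups”. Matching two lists’ generic entries by name alone could therefore compare different populations.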

Issue 6: “Flexibility of specific PGs to include some people from related PGs”
Specific PGs are also somewhat flexible. 
In Lebanon, JP chooses to register 109665 “Jew, Syrian”, while HIS chooses to register 104243 “Jew”. HIS data may be more accurate, while JP data may be more useful. JP very likely includes some non-Syrian Jews in its “Jew, Syrian” total.
Also in Lebanon, none of the lists chooses to include the generic “Arab” entity, but each has only 4 or 5 specific Arab entities.  This suggests that there are small numbers of other Arab types (e.g. Omani) who are included in figures for some other Arab group.
For most purposes it does not matter if a specific PG happens to include a few closely related people.  It may matter more where the lists differ.
In Lebanon, JP alone have registered 58,000 Egyptian Arabs. It is impossible to tell simply from looking at the data where the other lists currently account for Egyptian Arabs.
More generally, we would not know where to go in one list for information about a PG that is broken out in another list.

Tentative Conclusion

Given the current maturity of People Group data, I believe that working with multiple lists is very difficult and often confusing, and that this will remain true even if the major differences are reconciled. While we uphold the value of different lists and support HIS as a means of reconciliation between them, it may be necessary for “the group” to select one list to partner with for the time being. If we don’t want to manage an independent list of our own, I think this may be our only practical route forward.