             SOME ADDITIONAL NOTES on the UKC_ccc.ARJ FILES
             ==============================================

This document contains notes on:-

           1.  History
           2.  A Parish Rediscovered
           3.  Recently acquired transcripts




1.   History
     =======

As you may know, the UKC_ccc.ARJ files, which contain the 1851 2%
census extract and are now available on FidoNet BBSs, were acquired
originally by Gordon Grant, with the permission and encouragement of
Alan Stanier of Essex University.  They derive from the original data
sample lodged in the Economic and Social Research Council (ESRC)
archive, at Essex.  We all owe a great deal to both Gordon and Alan for
this:- Gordon for introducing us to the files, and Alan for being
responsible not only for retrieving the sample from the ESRC Data
Archive, but also for collecting other transcripts from various sources
and massaging them into a coherent whole.

At Essex, the data became available for anonymous ftp via the Internet,
and in fact still are, for those with an Internet connection.  However,
I understand Gordon received the files from Alan on a tape cartridge. 
Coming from a UNIX-based system, they had been archived using a
combination of tar (Tape ARchiver) - a way of concatenating a series of
files into a single entity, and often used on UNIX systems for backup -
and gzip (GNUzip), a UNIX compression utility, performing a similar
function to the more familiar (on DOS systems) PKZIP.  Incidentally, a
version of gzip *is* available for DOS; anyone needing to read the
Essex archives on a DOS system would require a copy (the latest is
version 1.2.4, GZIP124.ZIP), and likewise a version of tar.  Gordon,
however, performed the conversion to DOS initially.

In the first flush of enthusiasm to tap this marvellous new source of
raw research material, Gordon had made the files available to us on two
sets of five disks, the circulation of which round the UK would put the
proverbial chain letter writer to shame!  Everyone who received a copy
would, we hoped, pass the set on after copying it for themselves, and
perhaps make a second copy to pass on elsewhere.  My set arrived by
post from Sheila Jones, then co-SysOp of DDLG BBS in Suffolk.  She
had received it from Kerry McCandlish of Benbecula Shuttle, in the
Outer Hebrides.  I in turn copied the disks, sending them to Barry
Prazak, of Paradise Valley, Northampton, who in turn sent them on to
Dave Roocroft of Time Tunnel, and thereafter to Wally's BBS in
Livingston, Scotland, and onwards to other BBSs in the UK, about which
I know nothing - although I understand disks changed hands for the
price of a pint in the pub!

Gordon originally named the files CCC.ARJ, where CCC is the Chapman
County code, to correspond with the equivalent at Essex -
census_CCC.tar.Z.  However, Gordon's choice gave rise to a problem
which none of us noticed at first.  During this time, Mike Fisher of
ROOTS (UK!) had also received his copy of the 5-disk set, and was
preparing to hatch the files into the Genealogical Software
Distribution Service (GSDS).  He was the first to notice that a county
was missing from his disks!  It was the set of transcripts for
Cornwall, which would have been named CON.ARJ, if it had ever been
copied successfully onto the disks in the first place!

The snag, of course, is that DOS does not permit the use of CON as a
file name, as it conflicts with the name of the CONsole device.  In the
same way it is impossible to have a file named LPT1, or PRN.  CON.ARJ
just never made it!
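
The clash is easy to test for mechanically.  What follows is only a
minimal sketch in Python, and not part of XTRACT or any GSDS tool; the
function name is my own invention, and the list of names is the usual
set of DOS devices (CON, PRN, AUX, NUL, COM1-4, LPT1-3), assumed here
for illustration.

```python
# Device names reserved by DOS.  A file may not use one of these as its
# base name, whatever extension follows.  (Usual set, assumed here for
# illustration only.)
DOS_DEVICE_NAMES = {
    "CON", "PRN", "AUX", "NUL",
    "COM1", "COM2", "COM3", "COM4",
    "LPT1", "LPT2", "LPT3",
}


def clashes_with_dos_device(filename):
    """Return True if the base name of `filename` is a DOS device name."""
    base = filename.split(".", 1)[0]
    return base.strip().upper() in DOS_DEVICE_NAMES


# Compare the old CCC.ARJ names with the UKC_ccc.ARJ names.
for code in ("CON", "STS", "WOR"):
    old_name = code + ".ARJ"
    new_name = "UKC_" + code + ".ARJ"
    print(old_name, clashes_with_dos_device(old_name),
          new_name, clashes_with_dos_device(new_name))
```

Only the old-style name for Cornwall is flagged; with the UKC_ prefix
the base name is no longer a device name, so the clash disappears.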

For this reason, Mike decided to rename the complete set of files to
UKC_ccc.ARJ, and that is how they came to be released to GSDS.  Today,
however, there may be copies of the old CCC.ARJ files still around, both
in the UK and abroad - so if you come across them, please be aware the
archive for Cornwall will be missing.

The last released version of XTRACT, v2.2, was written originally to
conform to the GSDS standard alone, accepting input only from files
named UKC_ccc.ARJ.  Since then, however, following the improved
compression offered by PKZIP v2.04g, it has been modified to accept
ZIP2, or indeed almost any form of archiving, provided the archiver
operates as a single DOS command, with one or more switches, and input
and output file names as arguments.
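
To illustrate what "a single DOS command" means here, the sketch below
(in Python) assembles such a command line from its four parts.  It is
not XTRACT's own code or configuration format, which is not described
in these notes; the function name, the C:\WORK directory and the file
UKC_STS.ZIP are hypothetical, and the switches shown are examples only,
to be checked against the archiver's own documentation.

```python
def build_archiver_command(program, switches, archive, target):
    """Assemble one DOS-style command line.

    The order is: program name, switches, input file, output location.
    """
    if not program or not archive:
        raise ValueError("program and archive name are required")
    return " ".join([program] + list(switches) + [archive, target])


# Illustrative values only: an ARJ extraction and a PKUNZIP extraction.
print(build_archiver_command("ARJ", ["e", "-y"], "UKC_STS.ARJ", "C:\\WORK\\"))
print(build_archiver_command("PKUNZIP", ["-o"], "UKC_STS.ZIP", "C:\\WORK"))
```

Any archiver whose extraction can be written in this one-line shape
would fit the description above; one needing several commands, or
interactive prompts, would not.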




2.   A Parish Rediscovered
     =====================

Since the original release of the transcripts into GSDS, two new files
have been released to replace the existing UKC_STS.ARJ and UKC_WOR.ARJ.

UKC_STS.ARJ   179,552   UK census 2% transcript .... REPLACEMENT
UKC_WOR.ARJ    86,723   UK census 2% transcript .... REPLACEMENT

These files contain an additional parish, DUDLEY, Worcestershire - 1851.
The parish contains 56 unique surnames, and 234 different surnames in
all, which were not available at the time of the original GSDS release.  I
discovered this parish was missing whilst I was testing XTRACT - one of
my initial diagnostic aids being written to ensure I could pull out a
list of *all* surnames (and nothing else) from my UKC_ccc.ARJ set.  The
list I produced at first did not tally with the name index prepared at
Essex University - what is now UKC_NIDX.TXT in the GSDS.

My first thoughts were that my code was deficient, but further
investigation revealed that all of the missing surnames were indexed
under the one reference, STS.WOR, the dot indicating a parish split
between two counties -- Staffordshire (STS) and Worcestershire (WOR). 
It began to look suspiciously like - again - a file was missing from the
FidoNet set!
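
The cross-check itself amounts to a set difference, grouped by index
reference.  The following is a minimal sketch in Python, not the
diagnostic code I actually used; it assumes the name index can be read
as (surname, reference) pairs, which may not match the real layout of
UKC_NIDX.TXT, and the surnames shown are placeholders, not census data.

```python
def missing_by_reference(index_entries, extracted_surnames):
    """Group index surnames absent from the extracted list by reference."""
    found = {surname.upper() for surname in extracted_surnames}
    missing = {}
    for surname, reference in index_entries:
        if surname.upper() not in found:
            missing.setdefault(reference, set()).add(surname)
    return missing


# Placeholder data: (surname, index reference) pairs from the name index,
# and the surnames pulled out of the archive set.
index_entries = [
    ("ALPHA", "STS"),
    ("BRAVO", "WOR"),
    ("CHARLIE", "STS.WOR"),
    ("DELTA", "STS.WOR"),
]
extracted = ["Alpha", "Bravo"]

for reference, surnames in missing_by_reference(index_entries, extracted).items():
    print(reference, sorted(surnames))
```

If every missing surname falls under a single reference, as happened
with STS.WOR, a missing file is a more likely culprit than faulty
extraction code.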

Thankfully, the files are still available on the Internet, accessible
via anonymous ftp from Essex University.  I was able to use my Internet
connection to retrieve the Staffordshire archive, census_STS.tar.Z (UNIX
systems allow for a more descriptive file name than DOS!), and a peek
inside revealed the missing parish!

Happily, with the additional file, my list now tallies with
UKC_NIDX.TXT!

Both counties, STS and WOR, have been re-hatched complete.  The file
names for parishes which are unique to STS and WOR remain just as they
were; however, Murphy's Law being with us as ever, the parish which was
omitted was one of TWO split between Staffordshire and Worcestershire,
the other being the parish of WARLEY WIGORN.

The existing STS.WOR file had been named STS51WOR.TXT, being the only
one in the archive at that time - but in order to conform with the
naming convention in the remainder of the archives (and incidentally,
that in the Essex University archive), this name had to be changed - to
STS51WOR.T02.

The new one, containing the parish of Dudley, is named STS51WOR.T01, and
STS51WOR.TXT no longer exists, being renamed to STS51WOR.T02.  I
emphasise the latter intentionally, just in case anyone thinks they're
missing out again!




3.   Recently acquired transcripts
     =============================

This year (1995), some additional census material has been added to the
store in the ESRC data archive, and we will be releasing these additions
into GSDS shortly.  Counties represented are Devon, Glamorgan,
Gloucestershire, and Worcestershire.  In some cases the additions are
significant; for instance, complete transcripts for the parish of
Sevenhampton, and for the inmates of Northleach Prison, will be added
to the Gloucestershire archive.

In addition, some transcripts have been acquired from other sources,
including samples from Yarm, to be added to the Yorkshire, North Riding
archive, and portions of Hertfordshire - thanks to kind permission of
Paul Joiner, the transcriber; and from the parish of Altarnun in
Cornwall, again with kind permission of the transcriber, Jeff Burnard.


                Copyright (c) Rosemary Lockie, June 1995
