Recent Forum Posts

A couple of brief comments.

Scope should include use of open standards to enable machine-to-machine interoperability with other web services and applications. This is, in my view, one of the most important aims of the work, with some of the greatest potential benefits.

The group NEEDS to address and inform the production of census data and metadata. Production should be carried out with a clear view of dissemination in order to achieve a coherent process. We're discovering that some of the 2001 outputs, particularly counts of households with particular combinations of persons (see table CAS 118, for example), are proving very difficult to fit into a hypercube model. If output production isn't informed by dissemination, we risk trying to fit square pegs into round holes all over again.

The twelve-month Data Integration and Dissemination (DIaD) project, funded by the ESRC Census Programme, is investigating the potential of techniques based on international open standards (OGC) to perform data linkage between two of the most heavily used census outputs: the aggregate statistical data and the output geographies. The primary objective of this work is to develop a data dissemination model which demonstrates a more generic capability, that of 'geo-linking'. This provides the ability to separate statistical data (census data, for example, though other geospatially linked data can equally use this approach) from the boundary (geometry) data to which it relates. Geo-linking allows distributed, multi-source datasets to be seamlessly linked while keeping the data separate for management and administration purposes. In essence, the proposed approach will provide an extensible infrastructure applicable not only to the immediate needs of the Census Programme but also to broader requirements emerging from the e-Social Science programme, especially its e-Infrastructure strand, and from the ESRC's national data strategy. Additionally, using the same standards-based approach, the project will aim to demonstrate how further value-added processing can be invoked by transforming the geo-linked outputs through a series of ancillary web processing services.
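As a rough illustration of the geo-linking idea, the sketch below joins a table of statistics to a separately held set of boundaries on a shared area code. It is not the DIaD implementation and it does not call any OGC service; the area codes, counts, coordinates, the GeoJSON-like output shape and the function name `geo_link` are all assumptions made up for the example.

```python
# Statistical data, held separately from any geometry.
# Area codes and counts are hypothetical.
statistics = [
    {"area_code": "A001", "all_people": 1200},
    {"area_code": "A002", "all_people": 850},
]

# Boundary data, keyed on the same (hypothetical) area codes.
boundaries = {
    "A001": {"type": "Polygon",
             "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    "A002": {"type": "Polygon",
             "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]]},
}


def geo_link(statistics, boundaries, key="area_code"):
    """Join attribute rows to geometries on a shared geography code."""
    features = []
    for row in statistics:
        geometry = boundaries.get(row[key])
        if geometry is None:
            continue  # no matching boundary: skip rather than guess
        features.append({
            "type": "Feature",
            "geometry": geometry,
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


linked = geo_link(statistics, boundaries)
```

Because the two inputs only meet at the join, either side could in principle be swapped for a different source without touching the other, which is the separation the post describes.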

Geography by EDINA, 26 Jun 2009 10:42
Some early thoughts by ONS_Brendan, 25 Jun 2009 12:46

I was thinking about this. The best way to get this information would be to use the original questions asked on the census forms and to use the categories defined as the possible answers. You would then get the official text for each code as well.

The questions, as shown in the explanatory volume, also show the structure of the hierarchical codelists.

Discuss issues from the second CWSWG meeting held at ONS Titchfield 24 June 2009.

Meeting 2 - 24 June 2009 by ONS_Brendan, 05 Jun 2009 09:26

Like the jargon? I made it myself.

The first things that we (mostly Rob) did in the CAIRD Project were to look at the potential XML schemas (DDI and SDMX) in order to familiarise ourselves with them and check on the kinds of information that could be encoded and the ways in which this is achieved within the schemas. Once we'd assured ourselves that the schemas could accommodate the kinds of information that we knew were entangled within the existing 2001 outputs, we set to work trying to extract the information in structured and usable forms as described by Rob above. The initial stages of this involved a mixture of interactive tidying up and programmatic parsing of textual table frameworks that we had already produced as an intermediate stage in the creation of the HTML cell selection frameworks used in our Casweb interface. Rob and Richard Wiseman created versions of the frameworks in which all the row and column headers were straightened out and filled in with values. This involved a lot of rather fiddly work, especially for some of the more complex compound tables in our sample, which had to be split into several different tables. Once the table frameworks had been restructured in this way, the codelists and their constituent codes were compared against a set of cleaned-up standard codelists that Rob built up as he went along, in order to make sure that all the text labels for codes were consistent (the 0-4, 0 to 4, 0-4 years, 0 to 4 years, aged 0-4 problem described by Rob previously). Rob was then able to process the table frameworks to extract the cell IDs with their associated codelist/code pairs as contained in the final table in his description.
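To give a feel for the "filled in with values" step, here is a minimal sketch that fills the blank cells left behind by a spanning header. It is not the code used in CAIRD, where much of this was done by hand; the function name and the assumption that a blank cell always belongs to the nearest header on its left are illustrative only.

```python
def fill_spanning_headers(header_row):
    """Copy each header label rightwards into the blank cells it spans.

    Assumes a blank cell belongs to the nearest non-blank header on its
    left. Leading blanks (e.g. above the row labels) are left blank.
    """
    filled = []
    last = ""
    for cell in header_row:
        if cell.strip():
            last = cell.strip()
        filled.append(last)
    return filled


# A spanning header row from an Age by General Health by Sex style table.
# The first cell sits above the row labels, so it is empty.
health_row = ["", "All People", "Good Health", "",
              "Fairly Good Health", "", "Not Good Health", ""]
filled = fill_spanning_headers(health_row)
```

A simple rule like this would not cope with the compound tables mentioned above, which had to be split up before they could be treated as regular grids.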

Luckily we had the table frameworks as tab-delimited text files, so it was possible to open these in Excel.

We then assigned category names to each group of column and row headers.

For a table of Age by General Health by Sex, the first row of headers would have the category General Health, the second row of headers would have the category Sex, and the headings down the side would have the category Age.

            All People  Good Health     Fairly Good Health  Not Good Health
                        Male    Female  Male    Female      Male    Female
All People  0001        0002    0003    0004    0005        0006    0007
0 - 9       0008        0009
10 - 19
20 - 39

Then, for each cell ID, we extracted the row and column headings together with their categories and inserted them into a table.

Cell ID  Category        Code
0009     General Health  Good Health
0009     Sex             Male
0009     Age             0 - 9
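The extraction could be sketched roughly as follows. This is an illustration rather than the script we actually ran: the function name and data layout are made up, the header rows are assumed to have been filled in already, and the Sex code for the All People column is assumed to be "All People" because that header cell is blank in the table.

```python
# Header rows after spanning labels have been filled in, one label per
# data column. Treating the blank Sex header over the first column as
# "All People" is an assumption.
header_rows = [
    ("General Health", ["All People", "Good Health", "Good Health",
                        "Fairly Good Health", "Fairly Good Health",
                        "Not Good Health", "Not Good Health"]),
    ("Sex", ["All People", "Male", "Female", "Male", "Female",
             "Male", "Female"]),
]

# Only the cell IDs shown in the example table are included here.
body_rows = [
    ("All People", ["0001", "0002", "0003", "0004", "0005", "0006", "0007"]),
    ("0 - 9", ["0008", "0009"]),
]


def extract_pairs(row_category, header_rows, body_rows):
    """Return (cell ID, category, code) triples for every cell."""
    pairs = []
    for row_label, cell_ids in body_rows:
        for col, cell_id in enumerate(cell_ids):
            # One triple per header row above the cell...
            for category, labels in header_rows:
                pairs.append((cell_id, category, labels[col]))
            # ...plus one for the row heading beside it.
            pairs.append((cell_id, row_category, row_label))
    return pairs


pairs = extract_pairs("Age", header_rows, body_rows)
cell_0009 = [p for p in pairs if p[0] == "0009"]
```

Filtering on cell 0009 gives the same three rows as the table above.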

We did this for each of the tables. We then needed to harmonise the column and row headings, as the syntax used varied across the tables, e.g. Aged 0 to 4 years old was expressed as 0-4, 0 to 4, 0-4 years, 0 to 4 years, aged 0-4 and so on.
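For age bands, one way to automate part of this would be a small normalising function like the sketch below. To be clear, this is a swapped-in technique for illustration: in the project the labels were checked against standard codelists built up along the way, not rewritten by a regular expression. The canonical form "N to M" and the function name are assumptions.

```python
import re


def harmonise_age_label(label):
    """Map variants of an age band label onto one form, 'N to M'.

    Labels that do not contain a numeric range are returned unchanged
    apart from surrounding whitespace.
    """
    match = re.search(r"(\d+)\s*(?:-|to)\s*(\d+)", label.strip().lower())
    if match is None:
        return label.strip()
    low, high = match.groups()
    return f"{int(low)} to {int(high)}"


variants = ["0-4", "0 to 4", "0-4 years", "0 to 4 years", "aged 0-4",
            "Aged 0 to 4 years old"]
harmonised = {harmonise_age_label(v) for v in variants}
```

All six variants collapse to a single label, while a heading such as "All People" passes through untouched. Open-ended bands would need their own rule.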

This gave us a distinct list of codes for each category, and we then used these lists as the codelists for SDMX.
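Collecting the distinct codes per category might look something like this. It is only a sketch: the output is a plain Python dictionary, not SDMX, and the function name and sample triples are illustrative.

```python
def build_codelists(pairs):
    """Group (cell ID, category, code) triples into one list of
    distinct codes per category, keeping first-seen order."""
    codelists = {}
    for _cell_id, category, code in pairs:
        codes = codelists.setdefault(category, [])
        if code not in codes:
            codes.append(code)
    return codelists


# A few triples of the kind produced by the extraction step.
sample_pairs = [
    ("0002", "General Health", "Good Health"),
    ("0002", "Sex", "Male"),
    ("0002", "Age", "All People"),
    ("0009", "General Health", "Good Health"),
    ("0009", "Sex", "Male"),
    ("0009", "Age", "0 - 9"),
]
codelists = build_codelists(sample_pairs)
```

Each key would then correspond to one codelist and each entry to one code within it; writing these out in SDMX is a separate step not shown here.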

CAIRD - Metadata extraction by CDU_Rob, 02 Jun 2009 15:45

Discuss issues from the first CWSWG meeting held at ONS Titchfield 23 March 2009.

Meeting 1 - 23 March 2009 by ONS_Brendan, 01 Apr 2009 14:16

Case study specific placeholder

CWSWG Meeting agenda placeholder

A placeholder at the highest level

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License