Improving resolution of census data in metropolitan areas using a dasymetric approach: applications for the Baltimore Ecosystem Study

A dasymetric map is a type of thematic map where boundaries are altered by the distribution of other phenomena. In dasymetric maps, administrative or enumeration boundaries are redrawn to better represent the distribution of data classes. Dasymetric mapping methods can improve the resolution of historical and present day census data. Using data collected for the Baltimore Ecosystem Study, I use the dasymetric approach to remap the distribution of census data based on residential land use. By overlaying census boundaries with land use and land cover information in a GIS database, census data can be partitioned into places where, from the land use information, we know people live. This method is particularly helpful in parts of the city and suburbs where residential land use is uneven, such as new subdivisions or industrial neighborhoods. This approach can also be employed to improve the spatial resolution of historical data, typically collected at coarse resolutions. I demonstrate with a case study of the Greater Baltimore Region that dasymetric mapping is also an effective method for environmental equity analysis. I conclude with some promises and limitations of this method for urban ecological research.

Dasymetric mapping methods are an effective means of improving the resolution of historical and present-day census data. Dasymetric maps are a type of thematic map where boundaries are altered by the distribution of other phenomena. In dasymetric maps, administrative or enumeration boundaries are redrawn to better represent the distribution of data classes. For example, to better represent population density, census tracts may be redrawn to bound the chosen intervals rather than using the predetermined census tract boundaries. The dasymetric approach has been used to improve resolution and analysis of a variety of spatial phenomena, including health data, risk, and crime statistics (Chen et al. 2004; Hay et al. 2005; Kennedy and Kennedy 2004) as well as population distribution (Martin et al. 2000; Yohn et al. 2002). Yet it remains an underused method, primarily because most GIS and mapping software is set up to produce choropleth maps using standard enumeration or administrative boundaries (Crampton 2004; Eicher and Brewer 2001). In this paper, I use the dasymetric approach to remap the distribution of census data based on residential land use. By overlaying census boundaries with land use and land cover information in a GIS database, census data can be partitioned into places where, from the land use information, we know people live (Figure 1) (Halloway et al. 1999; Mennis 2002; Mennis 2003). This method is particularly helpful in parts of the city and suburbs where residential land use is uneven, such as new subdivisions or industrial neighborhoods. Typically, dasymetric mapping has been used for present-day studies where land use and census data are readily available in digital format (Mennis and Jordan 2005). Yet the method has great and underused potential for historical census data where the data are aggregated and geographies are large, such as wards or counties. Fortunately, air photographs are available for much of the country beginning in 1937/8. In the coming years, these images will be downloadable on USGS's EROS website (http://edc.usgs.gov/products/aerial/survey.html). At present, the website acts as a portal for ordering scanned images of historic air photos. By digitizing land use and land cover from these photographs and combining the information with digital historical census boundaries and data, we are able to improve the resolution of data.
For the integrated field of urban ecology, the census is a critical social science data set. No other data set is as complete or comprehensive through time and space, and many of the variables available in the census are available nowhere else. Since 1940, the Census Bureau has released data for most metropolitan areas at the census tract level (New York City was first tracted in 1910). Typically, census tracts were delineated for built-up areas; beyond this built-up urban core, data were published only at the much coarser county level. The advantage and disadvantage of the census tract is that it generalizes population characteristics, representing the data as homogeneous throughout the tract. This simplifies the mapping process, makes the maps easy to read, and helps readers discern larger spatial patterns. The disadvantage of census tract maps is that they mask the heterogeneity of reality, and sometimes those details are important, especially for fine-scaled spatial analysis. Although the Census Bureau aggregates data to facilitate the mapping of patterns, it does so primarily to protect the privacy of individuals and households. The census tract is the smallest geography at which the Census releases all variables collected from the short (100-percent) and long forms (1 in 6 households sample). The census block group, a subset of the census tract, provides finer spatial resolution but does not include all census variables. At the census block level, the Bureau deliberately introduces error to protect privacy and does not release some data if minimum thresholds are not met. Census geographies can vary greatly in size, even at the same level. Because the Census Bureau aims to keep the total population of census tracts around 4000, in areas of low population density, the census tracts can be quite large (http://www.census.gov/geo/www/tiger/glossry2.pdf). In the Baltimore Metropolitan Area, census tracts range from 0.03 to 86 square kilometers, with a mean value of 3 square kilometers.

Social Science Research and the Baltimore Ecosystem Study
The census is one of the core data sets used for research in the Baltimore Ecosystem Study (BES), an Urban Long Term Ecological Research site sponsored by the National Science Foundation (http://beslter.org). The mandate of the BES is to understand metropolitan Baltimore as a human-dominated ecosystem over the long term, from 1790 to the present (as well as forecasting to 2100). Along with a series of biophysical data sets, census data, historical and present-day, are indispensable for this study.
The BES uses the ecological concept of "patches" to understand the structure and function of Baltimore as a heterogeneous human ecosystem (Pickett et al. 1997; Pickett et al. 2001; Pickett and White 1985). Such patches can refer to land cover, such as forests or fields, but can be applied as well to human patterns and groupings, such as neighborhoods or blocks (Grove and Burch 2002; Grove et al. 2006). The BES's basic premise is that the ecological and human patches (structure) are a product of social and ecological dynamics (function), with the understanding that the relationship is two-way. The patches are both a product and driver of the interactions between human beings and the ecosystem, what may be called a dynamic feedback relationship. Others have characterized this reciprocal relationship as a socio-ecological system (Redman et al. 2004).
The BES, as a long-term study, depends on good historical census data at patch-level resolution. Before 1940, census data for Baltimore are available at the ward level, far larger than a "patch" and too coarse a unit for meaningful study. The manuscript census data can be sampled to overcome this limitation, but this is a long, arduous, and time-consuming process. In 1940, the Census Bureau reported data at the census tract level for the entire city of Baltimore. Since a large portion of the Gwynns Falls Watershed (GFW), a primary unit of analysis for the BES, lies outside the city, data for the upper part of the watershed are reported at the county level (Figure 2). By 1950, the Census Bureau delineated census tracts for part of Baltimore County, including portions of the GFW. By 1960, census tract data were available for the entire watershed, but only at the census tract level. Only after 1980 did the Census make data available for the block group, a smaller and more useful geography for our purposes than the census tract. The census tract boundaries for Baltimore are now available at the National Historic GIS, a Herculean undertaking at the University of Minnesota that will greatly assist social science researchers (http://www.nhgis.org).
What follows are two applications of dasymetric mapping that demonstrate the method's utility for problems of interest to the BES and to urban environmental analyses more generally. In the first application, I create a dasymetric map of population density for 1960 of the lower Gwynns Falls Watershed using census data at the census tract level and digitized air photos from 1957. Population density is a major driver of land cover change, and important for understanding social dynamics within cities. A dasymetric map provides much finer resolution results than census tracts, which is important given the fine-scale analyses employed in ecology and increasingly the social sciences. The second application is for an environmental justice analysis of the Baltimore Metropolitan Region using census data from 2000, reassigned using dasymetric methods, and distance buffers from toxin-releasing facilities. Proximity to these facilities is critical for environmental justice analyses, and the dasymetric approach provides a better approximation of where people live than homogeneous census geographies.

First application: a historical dasymetric census map of population for the Gwynns Falls Watershed in Baltimore
For Baltimore, I have the benefit of being part of the Baltimore Ecosystem Study, a large research community that shares expertise, knowledge, and funds. Much of the work presented below is a direct result of this partnership. I state this at the outset because it underscores one difficulty of this approach, especially for historical analysis, and that is labor cost. While satellites improve their resolution every year, scientists using historical sources, including census tracts, wards, and air photos, must achieve such high resolution by hand. Some imaging software, such as Definiens eCognition, offers promise, but the lower quality of older air photos means that much of the classification needs to happen by heads-up digitizing. Even for present-day analysis, satellite imagery can be classified only by its land cover, not land use. While software can be trained to recognize certain characteristics of built form as residential, other types of data gathering (census, surveys, field visits) are necessary to confirm the use of land (Rindfuss et al. 2004).
To create a 1960 dasymetric layer of population for the Gwynns Falls Watershed, I used air photos that were flown in 1957 (Figure 3) and census tract boundaries from 1960. Ideally, the air photos and census year would match, but a three-year difference is acceptable. We acquired the 1957 air photos from the National Archives Cartographic and Architectural Records branch (http://www.archives.gov/publications/general-info-leaflets/26.html#aerial1). A graduate student georeferenced and tiled the images to create a geographically accurate composite of the Gwynns Falls Watershed (Wheling 2001). Once the image was georeferenced, land cover polygons were digitized, including built-up urban and residential classes (Figures 4, 5). From this GIS layer we can delineate where people lived and then union, or overlay, the polygons with the census tract polygons (Figures 6, 7).
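The reassignment step that follows the overlay can be sketched in a few lines. This is a minimal illustration and not the workflow used for the Gwynns Falls layers: it assumes a simple binary dasymetric rule, in which each tract's population is spread evenly over the residential land inside that tract. The function name `dasymetric_allocate`, the tract and piece identifiers, and all numbers are hypothetical.

```python
from collections import defaultdict


def dasymetric_allocate(tract_population, pieces):
    """Split each tract's population among its residential pieces by area.

    tract_population: {tract_id: total population of the tract}
    pieces: list of (piece_id, tract_id, area_km2), one entry per polygon
            produced by overlaying residential land with tract boundaries.
    Returns {piece_id: (population, density_per_km2)}.
    """
    # Total residential area found inside each tract.
    residential_area = defaultdict(float)
    for _, tract_id, area in pieces:
        residential_area[tract_id] += area

    result = {}
    for piece_id, tract_id, area in pieces:
        # Assumption: population is uniform across residential land in a tract.
        share = area / residential_area[tract_id]
        population = tract_population[tract_id] * share
        result[piece_id] = (population, population / area)
    return result


# Hypothetical tracts and overlay pieces.
tracts = {"A": 4000, "B": 1000}
pieces = [("A1", "A", 0.5), ("A2", "A", 1.5), ("B1", "B", 0.25)]
allocated = dasymetric_allocate(tracts, pieces)
```

Under this rule the tract totals are preserved, and density is computed over residential area only, so it is always at least as high as the density computed over the whole tract.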
Figure 8 shows population distribution in 1960 represented in the traditional manner using standard census tract boundaries, and by a dasymetric overlay. Improvement in resolution using the dasymetric method is clear. For purposes of ecology as well as social science research, many questions are better addressed by using the higher resolution results from the dasymetric approach. One of those questions in social science research, to which I turn next, is whether some groups of people bear a disproportionate burden of environmental disamenities, a fundamental question in the field of environmental justice.

Second application: environmental equity analysis in Baltimore
Environmental equity studies typically examine the spatial correlation between environmental disamenities, most often Toxics Release Inventory (TRI) sites, and demographic characteristics of surrounding neighborhoods (Bolin et al. 2002; Bowen et al. 1995; Burke 1993; Cutter, Hodgson, and Dow 2001; Downey 2006, 21; Harner et al. 2002; Pastor 2005; Pastor, Sadd, and Hipp 2001; Pulido, Sidawi, and Vos 1996). In an earlier study, I show that TRI sites in Baltimore are concentrated in census tracts that are primarily white. Statistical analyses point to majority white census tracts, while controlling for other socioeconomic factors such as income and education, as the best explanatory variable for the presence or absence of a TRI site, a finding in contrast to many other environmental studies that show a concentration of TRI sites in minority neighborhoods (Boone 2002). This study employed buffering methods to try to reduce the limitations of using large polygons as units of analysis (demographics were weighted in proportion to area of the census tract covered by the buffer), but the results are still based on the notion that populations are spread evenly throughout census tracts.
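The area weighting described for the earlier buffer analysis can be written compactly. The notation is introduced here for illustration and does not come from the earlier study: P_t is the population of tract t, A_t is its area, and A_{t∩b} is the part of that area falling inside buffer b. The second line is the analogous estimate under a simple binary dasymetric rule, where R_t and R_{t∩b} are residential areas only; this assumes uniform density across residential land within a tract.

```latex
\hat{P}_b^{\text{areal}} = \sum_{t} P_t \, \frac{A_{t \cap b}}{A_t}
\qquad\qquad
\hat{P}_b^{\text{dasymetric}} = \sum_{t} P_t \, \frac{R_{t \cap b}}{R_t}
```

The two estimates coincide only when residential land is spread evenly through each tract. Where a buffer covers mostly industrial land, the areal estimate counts residents who are not there.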
Here census data are reassigned to residential polygons using the dasymetric method so that distance buffers more accurately capture who lives close to or further away from TRI facilities. This is one example of where dasymetric mapping has great utility, because land use around TRI sites, which are typically factories, tends not to be residential. Using traditional methods, if a TRI site falls within a census tract or census block group, the assumption is that the population is spread evenly throughout the census geography, including the exact location of the TRI facility.
This analysis uses four data sets: (i) 2000 census at the census block group level; (ii) GDT census block group boundaries; (iii) Maryland Land Use Land Cover GIS layers; and (iv) the Toxics Release Inventory (TRI). Unlike for the historical research, these data are already in digital format, which saves time and labor. However, they do suffer from limitations. Although the census is a rich data source, not all variables are released at all levels of geography. Census blocks provide finer spatial resolution than census block groups, but in order to protect privacy, the Census Bureau does not release most economic variables at this level. Census boundaries are also inaccurate (or "generalized"), which is problematic at certain scales of analysis. GDT (now owned by Tele Atlas, http://www.teleatlas.com/Pub/Home) is a private company that sells cleaned-up and more accurate TIGER census boundaries. ESRI now provides these cleaner boundaries on its data disks. The Maryland Land Use Land Cover dataset includes polygon layers attributed with generalized land use. While they are an improvement over raster sources, such as the USGS National Land Cover Dataset (Figure 9), in the end, the polygons are general assumptions about land use and land cover, a mapping conundrum that I return to later in the paper. Much has been written already on the problems with the TRI data set (Bolin et al. 2002; Bowen 2002; Buzzelli and Jerrett 2004; Cutter et al. 2001; Downey 2005; Holifield 2001; Szasz and Meuser 2000). Suffice it to say that facility locations are often reported inaccurately, not all toxic chemicals are included, and not all facilities are required to report their releases. The TRI continues to be used, however, because it reports actual releases of toxic materials into the air, land, and water (unlike, for example, Superfund sites), is national in scope, and has been collected in a reasonably consistent manner since 1986 (http://www.epa.gov/tri/).
Differing units of analysis and spatial extents can lead to significant differences in results, a fact made clear in environmental justice research as well as in ecology (Cutter et al. 2001; Cutter et al. 1996; Downey 1998; Levin 1992). For example, using zip codes as a unit of analysis in environmental equity analysis will produce different results from using census tracts. Likewise, if analysis is restricted to the City of Baltimore, the analysis will have different results than if data are used for the Metropolitan Area. Dasymetric mapping is one way to address the so-called Modifiable Areal Unit Problem (Openshaw 1983) since it produces a near raster-level of detail from relatively large and aggregated vector sources. By converting the smaller polygons to raster and then using raster methods, a finer resolution of analysis is possible than with vector methods.
For this study, I examine environmental equity patterns at two extents, the City of Baltimore and the Baltimore Metropolitan Area, which includes Baltimore City and the five surrounding counties. Because Baltimore has experienced considerable suburbanization in the last 40 years, restricting the analysis to Baltimore City alone would not gauge the distribution of environmental disamenities in relation to where Greater Baltimoreans live. Expanding the extent of analysis to the metropolitan area will capture that population and also allow us to see if flight to the suburbs has spared people, white or black, from living near TRI sites. Instead of using a standard buffer size, I use a raster buffer to examine the relationship between race and distance from TRI sites. The raster buffer extends from 0 to 2 km from the TRI facility.
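A raster buffer of this kind can be sketched as a grid in which every cell records its distance zone from the nearest facility. The sketch below is illustrative only: it assumes planar coordinates in meters, straight-line distance measured from cell centers, and 200 m zone widths (the width is an assumption, chosen because a 0-200 meter interval is reported in the results). The function name `distance_zones` and the coordinates are hypothetical.

```python
import math


def distance_zones(n_rows, n_cols, cell_size_m, sites,
                   zone_width_m=200.0, max_dist_m=2000.0):
    """Label each raster cell with its distance zone from the nearest site.

    sites: list of (x, y) facility coordinates in meters, with the grid
           origin at the corner of cell (0, 0).
    Returns a list of rows; each value is a zone index (0 = 0-200 m,
    1 = 200-400 m, ...) or None for cells at or beyond max_dist_m.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Distance is measured from the center of the cell.
            x = (c + 0.5) * cell_size_m
            y = (r + 0.5) * cell_size_m
            d = min(math.hypot(x - sx, y - sy) for sx, sy in sites)
            row.append(int(d // zone_width_m) if d < max_dist_m else None)
        grid.append(row)
    return grid


# One hypothetical facility, on a single row of thirty 100 m cells.
zones = distance_zones(1, 30, 100.0, [(50.0, 50.0)])
```

Each zone grid can then be overlaid on a population raster of the same shape, so that residents are counted per distance interval rather than inside a single fixed-radius buffer.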
The vast majority of environmental equity studies, including one I conducted on Baltimore (Boone 2002), have shown that income is weakly related to the location of TRI sites, but that race and ethnicity often are strongly related (Harner et al. 2002). For this analysis, I focus on the residential distribution of black and white populations as these categories account for 96 percent of the population for Baltimore City and 95 percent for the entire metro area (only Howard County has a sizeable ethnic population with Asians accounting for 7.7 percent of the county's population). This is not to suggest that other variables are unimportant, but for the purposes of this paper, which is primarily to show the value of dasymetric mapping, I will restrict the analysis to these two race categories only.

Results
Figure 10 shows a dasymetric map of the ratio of black to white population, and 1 km and 2 km buffers from TRI sites. Figure 11 zooms into Baltimore City and shows percent black and the same TRI buffers. A few larger patterns are apparent. First, black populations are most concentrated in the City of Baltimore (blacks account for 64.3 percent of Baltimore City's population), especially to the northwest and east of the city, but with significant clusters in the metropolitan area as well. Second, as expected, TRI sites are clustered in the industrial districts of southeast and south Baltimore, and generally coincide with the harbor and major transportation routes in the region. Third, and perhaps unexpected, is that TRI sites in Baltimore City tend to be near neighborhoods where the black to white ratio is very low. The same pattern is not as apparent in the larger metro region.
Zonal statistics, comparing distance values from TRI sites with demographic characteristics at chosen distance intervals, bear out the general patterns observed on the map. Blacks are the majority in Baltimore City, and in absolute terms more blacks than whites live near TRI sites (Figure 12). In proportion to their numbers, however, a larger portion of Baltimore City's whites live near TRI sites than blacks (Figure 13). Taking the deviation from the mean for the total population of both groups also shows that whites tend to be overrepresented near TRI sites and blacks tend to be underrepresented. Again, I argue in another paper (Boone 2002) that a long history of residential and occupational segregation in Baltimore created an industrial and residential geography of whites living close to well-paid work in factories, and the present landscape reflects those differing notions of preferred locations in the past. For the entire metro region, the results are different, in part because the demographic patterns differ markedly from Baltimore City, as do the residential and industrial patterns. Whereas Baltimore is a majority black city, the metro region is majority white, with whites making up 67 percent of the 2.5 million people in the area. The TRI density of Baltimore City is 0.57 per square kilometer, more than 8 times the TRI density for surrounding counties. Population densities of census block groups in Baltimore City are as high as 65,000 per square kilometer with a median value of 4,567, while census block groups in surrounding counties have a median population density of 966 per square kilometer.
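The two ways of reading the zonal statistics, composition within a zone versus each group's share of its own total, are easy to confuse, so a small sketch may help. It is not the GIS procedure used for the figures; the function name `zonal_summary`, the zone labels, and the cell counts are hypothetical, and every zone is assumed to contain some population.

```python
from collections import defaultdict


def zonal_summary(cells):
    """Summarize population by distance zone.

    cells: iterable of (zone, black, white) counts, one per raster cell.
    Returns {zone: {...}} with two kinds of percentages:
      pct_zone_*   : composition of the zone (what share of zone residents)
      pct_of_all_* : share of the group's total population found in the zone
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for zone, black, white in cells:
        totals[zone][0] += black
        totals[zone][1] += white

    all_black = sum(b for b, _ in totals.values())
    all_white = sum(w for _, w in totals.values())

    summary = {}
    for zone, (black, white) in sorted(totals.items()):
        summary[zone] = {
            "pct_zone_black": 100 * black / (black + white),
            "pct_zone_white": 100 * white / (black + white),
            "pct_of_all_black": 100 * black / all_black,
            "pct_of_all_white": 100 * white / all_white,
        }
    return summary


# Hypothetical cells: more black than white residents near the sites in
# absolute terms, yet a larger share of all white residents lives there.
cells = [("0-1 km", 30, 20), ("0-1 km", 20, 20), ("1-2 km", 150, 40)]
summary = zonal_summary(cells)
```

In this made-up example the near zone is about 56 percent black, but it holds 50 percent of all white residents and only 25 percent of all black residents, the same kind of contrast described for Baltimore City.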
When we zoom out and examine results for the Baltimore Metropolitan Region, they differ from those of Baltimore City. In each distance zone from TRI sites, the majority of the population is white (Figure 14). However, if we examine the proportion of each race within each distance zone, a different pattern emerges: blacks tend to be overrepresented within each zone (Figure 15). In this regard, the results for Baltimore metro are opposite from those of Baltimore City. This may be an artifact of the large number of blacks living in Baltimore City where about a third of all TRI sites are located. Or it may be that blacks living outside Baltimore City are clustering near TRI sites. After removing Baltimore City from the analysis, the answer seems to support the former hypothesis. Approximately 300,000 people live within 2 kilometers of TRI sites in the surrounding counties, and of this population, about 77.5 percent are white and 15.5 percent are black. For the region as a whole, about 79 percent are white and 15 percent are black, suggesting a proportional distribution of black and white persons in the TRI zones. A slightly higher proportion of blacks (16.7 percent of all blacks) live within the 2 km zone than whites (15.7 percent of all whites). With the exception of the 0-200 meter interval, the distributions along the distance gradient from the TRI sites are relatively proportionate to white and black populations. This analysis suggests that the City of Baltimore defines to a large degree the environmental justice patterns observed in the metropolitan region.
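One way to read the percentages reported for the surrounding counties is as a ratio of a group's share inside the 2 km zone to its share in the comparison region, where 1.0 means exactly proportional. The ratio is offered here as an illustration and is not a statistic reported in the analysis; the function name `representation_ratio` is hypothetical, and the inputs are the rounded figures quoted above.

```python
def representation_ratio(pct_in_zone, pct_in_region):
    """Ratio above 1 means the group is overrepresented in the zone."""
    return pct_in_zone / pct_in_region


# Rounded percentages quoted in the text for the surrounding counties.
white_ratio = representation_ratio(77.5, 79.0)
black_ratio = representation_ratio(15.5, 15.0)

print(f"white: {white_ratio:.2f}, black: {black_ratio:.2f}")
```

Both ratios fall within a few percent of 1.0, which is consistent with the reading that the distribution outside Baltimore City is close to proportional.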

Where do we go from here?
Environmental equity analysis is developing rapidly, and new methods are being developed to address the issue of location, exposure, and risk. One limitation of the methods described above is that they assume that distance equals risk, or that those living close to facilities are more at risk from exposure than those living further away. A standard distance buffer does not take into account wind patterns, variations in topography, or the deposition characteristics of the chemicals in the atmosphere. Nor does it consider synergistic effects from the mixing of toxics that may create more hazardous conditions than if the toxins were released on their own (Bowen 1999; Bowen 2002). Combining environmental justice metrics with modeled air pollutants is one approach to overcome simple interpolation from fixed release points (Grineski, Bolin, and Boone 2007). A second limitation of this approach is that it does not take into account the "spatial nonstationarity" of the relationships between TRI sites and populations. Spatial nonstationarity is the condition where the relationship between dependent and independent variables can vary over space (Brunsdon 1996). In the example of environmental justice, the relationship between TRI sites and percentage white, for instance, depends on where certain white populations and TRI sites are located. We cannot lump whites into a single aspatial category because whites in one neighborhood, even if they are similar in census make-up to whites in another neighborhood, may have a different spatial relationship with environmental disamenities like TRI sites. Geographically Weighted Regression (GWR) is one means of dealing with spatial nonstationarity, and deserves further scrutiny (Mennis and Jordan 2005). Combining the resolution improvements from dasymetric mapping with GWR is a promising and untried approach.
Environmental equity analysis tends to focus on the present, using a single moment in time to assess if disadvantaged communities are more likely than privileged groups to live near environmental disamenities. This type of analysis falls under the broad rubric of outcome equity. While outcome equity studies are important, from such an analysis we can only infer what processes might be responsible for those patterns. More qualitative research is necessary to understand the processes responsible for those patterns, or what is termed process equity. Because change over time is integral to process equity, longitudinal studies of outcome equity can be an important analytical tool. Historical dasymetric layers of population characteristics can provide a fine spatial resolution of census data for such an analysis.

Time matters: resolution and diminishing returns
Dasymetric mapping can greatly improve resolution of census data, but the method does require substantial data collection and work. Even using data already in digital format demands considerable upfront labor. If the land use or land cover data are in raster format, they must be converted to vector in order to join the demographic data to residential areas. In some cases, the data must then be converted back to raster, with the chosen demographic variable attributed to each pixel, for fine-scale spatial analysis. If the land use or land cover data are already in vector format, it reduces the work, but it still takes several steps and careful planning to ensure accurate results. GIS software has not been designed to facilitate dasymetric mapping (Crampton 2004; Eicher and Brewer 2001).
Inaccurate data can greatly skew results. For example, if the land use layer is inaccurately delineated or too generalized, some of the census data might be dropped from analysis. In Figure 16, for example, two census units are shown that contain population, but because the land use layers do not include residential polygons within them, they are dropped in the dasymetric overlay. Population density values can also be highly distorted if a residential polygon intersects only a small proportion of a census tract or block group. One means of correcting these errors is to "ground truth" the land use layers. For historical layers, inaccurate digitizing or poor interpretation of the image can also lead to error.
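Both failure modes can be screened for before any mapping is done. The sketch below is a hypothetical audit and not part of the analysis reported here: it flags census units that have population but no residential land (and would therefore be dropped), and units whose residential land covers only a small share of the unit (and would therefore produce inflated densities). The function name `audit_overlay`, the 5 percent threshold, and the example values are all assumptions.

```python
def audit_overlay(unit_population, unit_area, residential_area, min_share=0.05):
    """Screen a dasymetric overlay for dropped and distorted census units.

    unit_population:  {unit_id: population}
    unit_area:        {unit_id: total area of the census unit}
    residential_area: {unit_id: residential area inside the unit};
                      units with no residential polygons may be absent.
    min_share: residential share of the unit below which density is suspect
               (an arbitrary illustrative threshold).
    Returns (dropped_units, suspect_units, population_lost).
    """
    dropped, suspect = [], []
    for unit, population in unit_population.items():
        if population <= 0:
            continue
        residential = residential_area.get(unit, 0.0)
        if residential == 0.0:
            # People are counted here, but there is nowhere to put them.
            dropped.append(unit)
        elif residential / unit_area[unit] < min_share:
            # All residents squeezed into a sliver of land: inflated density.
            suspect.append(unit)
    population_lost = sum(unit_population[u] for u in dropped)
    return sorted(dropped), sorted(suspect), population_lost


# Hypothetical block groups.
population = {"bg1": 1200, "bg2": 800, "bg3": 500}
area = {"bg1": 2.0, "bg2": 10.0, "bg3": 1.0}
residential = {"bg1": 1.0, "bg2": 0.1}
dropped, suspect, lost = audit_overlay(population, area, residential)
```

Comparing `lost` against the total population gives a quick measure of how much of the census count the land use layer fails to place, which helps decide where ground truthing is most needed.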
The infusion of GIS into historical research has generated new and exciting research, but like every new path taken, the journey has its dangers (Knowles 2002). The clean lines of a computer-generated map can dupe the reader about the inaccuracy of data or methods used to create it. Purgamentum init, exit purgamentum (garbage in, garbage out) is a critical mantra for mapmaker and map-reader. The art and science of mapmaking, however well-intentioned and carefully executed, is in the end a generalization of reality, and a socially and politically constructed one at that (Monmonier 2005). How we decide what is a residential polygon and what is not can be guided by rules and principles, but ultimately depends on the interpretation of the mapmaker, the click of the mouse, or the pen touching mylar. Dividing the world up into neat polygons is fraught with difficulty, such as eliminating messy or difficult variables. Homeless populations rarely make it onto official residential land use maps, even though the homeless might be the most vulnerable populations to environmental disamenities (Ruddick 1996). Environmental equity analysis, however, has focused overwhelmingly on stationary sources that are easy to pinpoint on a map.
Historical spatial data sets have the added problem of being difficult to confirm. With present-day imagery, we can go to locations and learn more about their characteristics. With a GPS in hand, one can verify if the pixels on the image that appear to be industry are actually industry, or perhaps residential lofts. This approach is not without its problems (not everything is as it appears to be), but what urban historian would not wish to be able to walk through the streets of London in 1830? For historical photos and maps, we can use some methods or sources (e.g. city directories and Sanborn fire insurance maps) to cross-check what we see on the photos. But because we are mortal beings, at some point all researchers must ask if the commitment of time is worth the effort.
The point of diminishing returns depends on a number of factors, but mostly on the question, and particularly on scale. Dasymetric mapping is worth the upfront effort if spatial resolution is critical to answering the question. In the case study of environmental equity, I believe it is. For the scale of analysis, such as the watershed or metropolitan area, our assumption is that small errors are diluted by larger patterns. If we were to scale up in an environmental justice study, for example, we would likely employ coarser data sets, such as the National Land Cover Dataset from the USGS. At the neighborhood level, we would use a variety of maps and imagery, including the Sanborns, city directories, and manuscript census. Something can be learned and discerned from all scales, and ultimately our understanding of Baltimore as a human ecosystem will depend on careful analyses at multiple temporal and spatial scales.

Boone:
Dasymetric Applications for the Baltimore Ecosystem Study Published by Digital Commons at Loyola Marymount University and Loyola Law School, 2008


Figure 1. Dasymetric mapping increases resolution of census geographies by assigning census data, collected in aggregate units such as census block groups, to smaller residential polygons, derived from land use data sources. Tile 1 shows land uses for a downtown portion of Baltimore. The orange polygons are residential. In Tile 2 only residential land uses are extracted. Tile 3 shows census data, in this case population, overlaid with the residential land use polygons. In Tile 4, a GIS is then used to union overlapping residential polygons with census block groups and the census data are assigned to the new polygons. All Figures are by the author.

Figure 6. 1957 Urban Land Cover and 1960 Area-Adjusted Population in Census Tracts clipped to the Gwynns Falls Watershed.

Figure 10. Black to White Ratio and 1 and 2 km Distance Buffers from TRI sites in the Baltimore Metropolitan Region.

Figure 11. Percent black population, and 1 and 2 km distance buffers from TRI sites for Baltimore City and surrounding areas.

Figure 13. Percent of Race Category Populations by Distance Zones from TRI sites, Baltimore City. Note that whites are overrepresented near TRI sites and underrepresented at distances further away from TRI sites.

Figure 14. Percent of Population by Race for Distance Zones from TRI sites, Baltimore Metropolitan Region.

Figure 15. Percent of Race Category Populations by Distance Zones from TRI sites, Baltimore Metropolitan Region. Note that blacks are overrepresented near TRI sites, a pattern opposite from that of Baltimore City alone.

Figure 16. Inaccurate or generalized data sources can lead to erroneous results. The two census blocks within the red box contain people, but because the land use is not shown as residential, they are dropped from the results.