Communications of the ACM

Contributed articles

Jim Gray, Astronomer


Jim Gray and the Sloan Digital Sky Survey telescope

Photograph by Alexander Szalay

Jim Gray worked with astronomers for more than a decade, right up to the time he went missing in 2007. My collaboration with him created some of the world's largest astronomy databases and enabled us to test many unorthodox data-management ideas in practice. The astronomers collaborating with us have continued to be very receptive to them, embracing Jim as a card-carrying member of their community. Jim's contributions have left a permanent mark on astronomy worldwide, as well as on e-science in general.

Astronomy data has doubled in size every year for the past 20 years, due mostly to the emergence of electronic sensors. The largest sky survey of the past decade, the Sloan Digital Sky Survey, or SDSS (www.sdss.org), is often called the cosmic genome project. When it began in 1992, the size of the data set to be used for scientific analysis was measured in terabytes, shockingly large for the time. My group at Johns Hopkins University was selected by the SDSS Collaboration to build the science archive for the SDSS, a task we quickly realized would require a powerful search engine with spatial search capabilities. Our experimental system, based on object-oriented technologies, was good enough to develop an understanding of how the eventual system should function, though we knew we would also need to do something different, most notably in terms of query performance.

One SDSS collaboration meeting in the mid-1990s took me to Seattle where I had dinner with Charles Simonyi, then at Microsoft, who recognized the similarities between our problem and the Microsoft TerraServer (www.terraserver.com), which provides free online access to U.S. Geological Survey digital aerial photographs, and immediately called Jim to arrange a meeting. A few weeks later I flew to San Francisco and visited him at the Bay Area Research Center. Thus began a lively discussion about the TerraServer, how it could be turned inside out for a new (astronomical) purpose, and how spatial searches over the Earth were both similar to and different from spatial searches over the sky. We spent a full day dissecting the problem.

Jim asked about our "20 queries," his incisive way of learning about an application and a deceptively simple way to jump-start a dialogue between him (a database expert) and me (an astronomer, or any scientist). Jim said, "Give me your 20 most important questions you would like to ask of your data system and I will design the system for you." It was amazing to watch how well this simple heuristic approach, combined with Jim's imagination, worked to produce quick results.

Jim then came to Baltimore to look over our computer room and within 30 seconds declared, with a grin, we had the wrong database layout. My colleagues and I were stunned. Jim explained later that he listened to the sounds the machines were making as they operated; the disks rattled too much, telling him there was too much random disk access. We began mapping SDSS database hardware requirements, projecting that in order to achieve acceptable performance with a 1TB data set we would need a GB/sec sequential read speed from the disks, translating to about 20 servers at the time. Jim was a firm believer in using "bricks," or the cheapest, simplest building blocks money could buy. We started experimenting with low-level disk IO on our inexpensive Dell servers, and our disks were soon much quieter and performing more efficiently.
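The sizing argument above can be checked on the back of an envelope. The short sketch below uses only the three figures quoted in the text (1TB, 1GB/sec, about 20 servers); the function names are illustrative, and decimal units are assumed.

```python
# Back-of-envelope check of the scan-rate target described above.
DATASET_BYTES = 1e12   # 1TB data set (from the text, decimal units assumed)
TARGET_RATE = 1e9      # 1GB/sec aggregate sequential read (from the text)
SERVERS = 20           # "about 20 servers" (from the text)

def full_scan_seconds(dataset_bytes, rate_bytes_per_sec):
    """Time to read the whole data set once at the given sequential rate."""
    return dataset_bytes / rate_bytes_per_sec

def per_server_rate(aggregate_rate, servers):
    """Sequential read rate each server must sustain to reach the aggregate."""
    return aggregate_rate / servers

scan_s = full_scan_seconds(DATASET_BYTES, TARGET_RATE)
per_server = per_server_rate(TARGET_RATE, SERVERS)
print(f"full scan: {scan_s / 60:.1f} minutes, per server: {per_server / 1e6:.0f} MB/sec")
```

Under these assumptions a full scan takes roughly a quarter of an hour, and each server has to deliver about 50MB/sec of sequential reads, which is why random disk access was the thing to eliminate.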

Astronomy and the SkyServer

Toward the end of 2000 data started arriving from the SDSS telescope in Apache Point, NM (see Figure 1), and Jim said simply, "Let's get to work." So during Christmas and the New Year's holiday we converted the whole object-oriented database schema to a Microsoft SQL Server-based schema. We modified many of our loading scripts by looking at Tom Barclay's TerraServer code and soon, with Jim's guidance, had a simple SQL Server version of the SDSS database.24

The SDSS project was at first reluctant to even consider switching technologies, so for about a year the SQL Server database we had designed was a "cowboy" implementation, not part of the official SDSS data release. Coincidentally, Intel gave us a pool of servers to use to experiment with the database, giving us a show-and-tell meeting in San Francisco a few months after the first bits of data started to come in from the telescope. We decided to create a simple graphical interface on top of the database, similar to the one on the TerraServer, to enable astronomers and anyone else to visually browse the sky. My son, Tamas (13 at the time) came along for the Intel meeting and helped man the booth, telling us, "No self-respecting schoolkid would use such an interface," that it had to be much more visually stimulating and interactive.

Jim gave one of his characteristically big laughs; we then looked at one another and realized we had our target audience. Even if astronomers were not ready, we would design a database and integrated Web site for schoolchildren. This was the moment we set out to build the SkyServer to connect the database to the pixels in the sky. The name was an obvious play on TerraServer, and we pitched it to the SDSS project as a tool for education and outreach, as well as for serious scientific investigation. When the first batch of SDSS data was officially declared public in 2001, the SkyServer, then running on computers donated by Compaq, appeared side by side with the official database for astronomers. We wrote simple scripts to create false-color images from the raw astronomy data and adopted the TerraServer scheme to build an image pyramid consisting of successive sets of tiles at different magnifications.
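The image pyramid idea can be illustrated with a small calculation: each level halves the resolution of the previous one until a single tile covers the whole image. This is only a sketch; the 256-pixel tile size and the function name are assumptions, not details of the TerraServer or SkyServer implementations.

```python
import math

def pyramid_levels(width_px, height_px, tile_px=256):
    """List (level, downscale, tiles_x, tiles_y) for an image pyramid.

    Level 0 is full resolution; every further level halves the resolution
    until one tile covers the image. The tile size is an assumed default.
    """
    levels = []
    level, w, h = 0, width_px, height_px
    while True:
        tiles_x = math.ceil(w / tile_px)
        tiles_y = math.ceil(h / tile_px)
        levels.append((level, 2 ** level, tiles_x, tiles_y))
        if tiles_x == 1 and tiles_y == 1:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
        level += 1
    return levels

for level, scale, tx, ty in pyramid_levels(2048, 1024):
    print(f"level {level}: 1/{scale} resolution, {tx} x {ty} tiles")
```

Because the tile count shrinks by about a factor of four per level, the whole pyramid costs only about a third more storage than the full-resolution tiles alone.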


By the next year (2002), everyone realized that the SkyServer engine was much more robust and scalable than expected. Ani Thakar, a research scientist at Johns Hopkins, made a superhuman effort to convert the whole existing framework to SQL Server.22 Jim insisted on "two-phase loading," that is, we would load each new batch of data into its own separate little database, then run data-cleaning code and accept the data only if it passed all the tests. This foresight turned out to be enormously useful; once the data started coming through the hose, we could recover from errors (there were lots of them) much more easily. We soon had the framework and the ability to load hundreds of GB of data in a reasonable amount of time, marking the transition of the SkyServer team from cowboys to "ranchers."
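A minimal sketch of the two-phase loading pattern follows, using Python's built-in sqlite3 module in place of SQL Server. The table layout and the two validation checks are hypothetical stand-ins for the real SDSS data-cleaning tests.

```python
import sqlite3

def two_phase_load(main, batch_rows):
    """Load a batch into a scratch database, test it, and merge only if clean."""
    # Phase 1: the batch goes into its own separate little database.
    staging = sqlite3.connect(":memory:")
    staging.execute("CREATE TABLE obj (id INTEGER, ra REAL, dec REAL)")
    staging.executemany("INSERT INTO obj VALUES (?, ?, ?)", batch_rows)

    # Hypothetical cleaning tests: coordinates in range, no duplicate ids.
    bad_coords = staging.execute(
        "SELECT COUNT(*) FROM obj WHERE ra IS NULL OR dec IS NULL "
        "OR ra < 0 OR ra >= 360 OR dec < -90 OR dec > 90"
    ).fetchone()[0]
    dup_ids = staging.execute(
        "SELECT COUNT(*) FROM (SELECT id FROM obj GROUP BY id HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    rows = staging.execute("SELECT id, ra, dec FROM obj").fetchall()
    staging.close()

    if bad_coords or dup_ids:
        return False  # reject the whole batch; the main database is untouched

    # Phase 2: accept the batch in a single transaction.
    with main:
        main.executemany("INSERT INTO obj VALUES (?, ?, ?)", rows)
    return True

main = sqlite3.connect(":memory:")
main.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, ra REAL, dec REAL)")
accepted = two_phase_load(main, [(1, 10.0, 5.0), (2, 200.5, -45.0)])
refused = two_phase_load(main, [(3, 400.0, 0.0)])  # ra out of range
count = main.execute("SELECT COUNT(*) FROM obj").fetchone()[0]
print(accepted, refused, count)
```

The point of the pattern is that a bad batch never touches the published tables, so recovering from an error means discarding a scratch database rather than repairing the archive.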

Curtis Wong, manager of the Microsoft Next Media Research Group, then redesigned the SkyServer's interface. His seemingly minor modifications of our style sheets had a huge effect on the entire site's look and feel; it suddenly came alive. Many volunteers, including former Johns Hopkins student Steve Landy and physics teacher Rob Sparks, helped add content. Jordan Raddick, a science writer, created a new section of the Web site, with educational exercises and formal class materials for all students, from kindergarten to high school. Professional astronomers also appreciated the power of the visual tools, and the site quickly became popular, even in this community.

The next major step came with the emergence of Microsoft's .NET Web services. Jim invited our development team (at Johns Hopkins) to San Francisco to the VSLive Conference (January 2002) where .NET was introduced and where our students entered a worldwide .NET programming contest, eventually coming in second. They created a set of services—called SkyQuery3—that performed queries across geographically separate databases. At the same time, Jim built a prototype for the ImageCutout, a Web service building dynamic image mosaics, that became the core of the next-generation user interfaces we developed for the SkyServer to integrate images and database content.19

Later, during a six-month sabbatical, Jim picked up a few astronomy textbooks, took them along on his sailboat, Tenacious, and while sailing quickly turned into a "native astronomer," understanding the important concepts of astronomy. He thus enabled himself to participate in the reformulation of research ideas into elegant SQL, working with us side-by-side not only on database-related problems but on major-league astronomy research. We subsequently wrote many papers together where his ideas were quite relevant to astronomy.18 At the same time he taught us database design and computer science and invited several of our students to be interns at BARC.

As Jim spent more and more of his time in astronomy, he noted on one of his famous PowerPoint slides concerning relational database design: "I love working with astronomers, since their data is worthless." He meant it in the most complimentary sense, that the data could be freely distributed and shared, since there were no financial implications or legal constraints. He went on to participate in many SDSS meetings, becoming a much-beloved and highly respected member of the astronomical community. His contributions are indeed very much appreciated, and in recognition of his work an asteroid is about to be named for him by the International Astronomical Union.

Soon after the SkyServer was launched in 2001, it was obvious that astronomers would want to perform a variety of spatial searches for objects in the sky. The survey also had a rather complex geometry, and in order to describe it we would need an extensive framework for spatial operations. Over the next few years (2002–2006), with several of my students and postdocs (particularly Peter Kunszt) we wrote, again with Jim's guidance, a fast package for spatial searches called Hierarchical Triangular Mesh (see Figure 2).17 We also built an interface to SQL Server and were soon performing blazingly fast searches over the sky. This emerged as one of the most notable features of the SkyServer. The tools eventually also made it into the shrink-wrap package of SQL Server 2005 as a demo on how to interface SQL to external software.4,8

Jim was excited about these spatial computations, since they demonstrated one of his main convictions: that when you have lots of data, you take the computations to the data rather than the data to the computations. To Jim, there is nothing closer to the data than the database; thus the computations have to be done inside the database.9

As spatial searches grew in complexity, it became apparent that we would need even more extensive processing capabilities. Besides indexed searches, we needed better ways to represent complex polygons on the sphere. We ended up combining two complementary approaches. In one representation, polygons were represented as intersections of the unit sphere with a 3D polyhedron. The polyhedron was delimited by planes, so each convex polyhedron could be built from the intersection of a set of half-spaces. This turned out to be handy for testing a point against a polygon in SQL. The inside test compared the dot-product of the Cartesian vector describing the point and the normal vector of the half-space against the distance of the plane from the origin.9 The dual representation (in terms of arcs) formed the outlines of the polygons (see Figure 3). We built tools to perform the set algebra of spherical polygons, including morphological operations over the sphere, a complex computational geometry library, all in SQL. I can think of no other person who would have thought of such an idea, much less been able to implement it. The library was subsequently converted to C# by Tamas Budavari and George Fekete of Johns Hopkins, though much of the code in SkyServer remains Jim's original.
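The half-space test is simple enough to show directly. In this sketch a half-space is a pair (unit normal, offset), a convex is a list of half-spaces that must all be satisfied, and a region is a union of convexes. The function names and the example region are illustrative and not taken from the SkyServer library.

```python
import math

def radec_to_xyz(ra_deg, dec_deg):
    """Unit Cartesian vector for a sky position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cap(ra_deg, dec_deg, radius_deg):
    """Half-space for a circular cap: normal at the centre, offset cos(radius)."""
    return (radec_to_xyz(ra_deg, dec_deg), math.cos(math.radians(radius_deg)))

def inside_convex(p, halfspaces):
    """p is inside when dot(p, normal) >= offset holds for every half-space."""
    return all(sum(a * b for a, b in zip(p, n)) >= c for n, c in halfspaces)

def inside_region(p, convexes):
    """A region is a union of convexes."""
    return any(inside_convex(p, convex) for convex in convexes)

# Example: the northern half of a 1-degree cap centred on ra=180, dec=0.
half_cap = [cap(180.0, 0.0, 1.0), ((0.0, 0.0, 1.0), 0.0)]
print(inside_convex(radec_to_xyz(180.5, 0.2), half_cap))
print(inside_convex(radec_to_xyz(182.0, 0.2), half_cap))
```

Because the test is a handful of multiplications and comparisons per half-space, it translates directly into a SQL predicate over stored Cartesian coordinates.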

Jim realized there are two different types of spatial problems: one related to a localized, relatively small region, the other to a fuzzy (probabilistic) spatial join over much of the celestial sphere. He came up with the idea of using latitude zones and wrote the whole query, joining two tables with hundreds of millions of rows, as a single SQL statement, letting the optimizer do its magic in terms of parallelizing the join10—one of the finest examples of SQL wizardry I have ever seen. He worked with Maria Nieto-Santisteban of Johns Hopkins to create parallel implementations of this cross-matching operation across many servers; performance is nothing short of stunning.6,12 These ideas are the basis of the next-generation SkyQuery engine we are building today.
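The zone idea can be sketched as a single join, here run in SQLite from Python rather than in SQL Server. This is a simplified illustration and not Jim's statement: the zone height, match radius, table layout, and the fixed right-ascension window are assumptions, and the sketch ignores wraparound at 0/360 degrees and assumes no object lies within 30 degrees of a pole.

```python
import math
import sqlite3

ZONE_H = 0.5        # zone height in degrees (assumed)
R = 0.1             # match radius in degrees (assumed)
MAX_ABS_DEC = 60.0  # assumed bound on |dec|, used for a conservative RA window

def row(obj_id, ra, dec):
    """Row with Cartesian coordinates and the latitude zone number."""
    a, d = math.radians(ra), math.radians(dec)
    return (obj_id, ra, dec,
            math.cos(d) * math.cos(a), math.cos(d) * math.sin(a), math.sin(d),
            int(math.floor((dec + 90.0) / ZONE_H)))

db = sqlite3.connect(":memory:")
for t in ("a", "b"):
    db.execute(f"CREATE TABLE {t} (id INTEGER, ra REAL, dec REAL, "
               "cx REAL, cy REAL, cz REAL, zone INTEGER)")
    db.execute(f"CREATE INDEX {t}_zone ON {t} (zone, ra)")

db.executemany("INSERT INTO a VALUES (?,?,?,?,?,?,?)",
               [row(1, 10.0, 20.0), row(2, 50.0, -30.0)])
db.executemany("INSERT INTO b VALUES (?,?,?,?,?,?,?)",
               [row(101, 10.05, 20.02), row(102, 10.0, 20.3),
                row(103, 50.0, -30.08), row(104, 50.2, -30.0)])

params = {
    "dz": math.ceil(R / ZONE_H),                             # neighbouring zones
    "alpha": R / math.cos(math.radians(MAX_ABS_DEC + R)),    # RA half-width
    "r": R,
    "chord2": (2.0 * math.sin(math.radians(R) / 2.0)) ** 2,  # squared chord
}
matches = db.execute("""
    SELECT a.id, b.id
    FROM a JOIN b
      ON b.zone BETWEEN a.zone - :dz AND a.zone + :dz
     AND b.ra  BETWEEN a.ra  - :alpha AND a.ra  + :alpha
     AND b.dec BETWEEN a.dec - :r     AND a.dec + :r
    WHERE (a.cx - b.cx) * (a.cx - b.cx)
        + (a.cy - b.cy) * (a.cy - b.cy)
        + (a.cz - b.cz) * (a.cz - b.cz) <= :chord2
    ORDER BY a.id, b.id
""", params).fetchall()
print(matches)
```

The zone and right-ascension conditions are plain range predicates that an index on (zone, ra) can serve, so the exact distance test runs only on the few candidates that survive the coarse filter.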

It was around 2001 that astronomers began to explore the idea of a U.S. National Virtual Observatory (www.usvo.org/).10,21 Given the fact that most of the world's astronomy data is public ("worthless") and online, the time seemed right to develop a framework where all of it would appear as part of a single system. Jim was an enthusiastic supporter of the idea and an active participant in all the discussions about its design. His ideas are still at the heart of its service-based architecture. His advice helped us avoid many computational and design pitfalls we would undoubtedly have fallen into. He helped many different groups from around the world bring their data into databases; his astronomy collaborators are found everywhere, from Edinburgh to Beijing, Pasadena, Munich, and Budapest. He bought several sneakernet boxes, inexpensive servers that travel the world as an inexpensive way to transport data, and was highly amused by the fact that in spite of the delays due to postal services and customs checks the bandwidth still exceeds that of the scientific world's high-speed networks.11
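The sneakernet observation is easy to check with arithmetic. The numbers in this sketch are purely hypothetical, since the text does not quote a payload size or a shipping time; only the formula is the point.

```python
def effective_mbps(payload_bytes, transit_hours):
    """Average bandwidth, in megabits per second, of shipping a box of disks."""
    return payload_bytes * 8 / (transit_hours * 3600) / 1e6

# Hypothetical example: a 1TB box that spends 48 hours in the mail.
shipped = effective_mbps(1e12, 48)
print(f"{shipped:.1f} Mbit/sec sustained")
```

Under these assumed figures the box delivers about 46 Mbit/sec averaged over the whole trip, and the figure scales linearly with the number of disks in the box, whereas a network link has a fixed ceiling.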

The SkyServer also turned out to be a groundbreaking exercise in publishing and curating digital scientific data. We learned that once a data set is released, it cannot be changed and must be treated like an edition of a printed book, in the sense that one would not destroy an old copy just because a new one appears on the shelves. To date, we carry forward all the old releases of SDSS data.

We also aimed to capture all relevant information in the database. We created a framework for automatically supporting physical units and descriptions by the database, using markup tags in the comments of our SQL scripts. We recently (2008) archived all email sent during the project in a free-text searchable database.

We were indeed anxious to see how scientists would interact with the database. Analyses, we knew, must be done as close to the data as possible, but it is also difficult to allow general users to create and run their own functions inside a shared, public database. Nolan Li, a graduate student at Johns Hopkins, and Wil O'Mullane, a senior programmer in the Johns Hopkins SDSS group, proposed giving users their own serverside databases (called MyDB/CasJobs) where they could do anything yet still link to the main database as well. Jim embraced the idea and was instrumental in turning it into generic dataspace.13

Over the years, we also noticed another interesting user pattern. Even though the MyDB interface gave users who wanted to run long jobs a way around our five-minute timeouts for anonymous queries, many astronomers and non-astronomers alike were writing Python and Perl crawlers where a simple query template was repeatedly submitted with a different set of parameters, occasionally leading to problems.

In one case someone was submitting a query every 10 seconds that was less than optimally written and so took more than 10 seconds to execute. As a result, the requests kept piling up, and the server became extremely overloaded. As we noted this odd behavior and identified and isolated the "guilty" query, Jim quickly modified the stored procedure that executed the user-written free-form SQL queries. He put in a statement conditional to the IP address of the user running the particular robot script, so, for that user alone, the query would not be executed but instead give the message: "Please contact Jim Gray at the following email address:..." The queries stopped immediately. We later learned they were coming from a CS graduate student in Tokyo who had the shock of his life from Jim's email, which (for a student of CS) must have sounded like the voice of God. Jim followed up and sent the student an email that said: "It is OK to use the system and OK to send an email."

We logged all traffic from day one and were amazed to see how it grew (see Figure 4) and how a New York Times article on a new SDSS result caused a huge spike in user traffic. It was gratifying to see that afterward the traffic continued to stay higher than before, indicating that many people, astronomers and non-astronomers alike, liked what they saw. Our analysis of SkyServer traffic found that most of the one million users were non-astronomers and that there is a power law with no obvious breaks in any of the usage statistics.15 Jim liked to say: "You have nothing to fear but success or failure" and, of course, he never intended to fail at anything.

Beyond Astronomy

It became clear from our SkyServer experience that virtual observatories are sure to emerge on every scale of the physical world, from high-energy physics to nanotech, molecular biology, environmental observatories, planet Earth, even the entire universe. Many of the unknown issues related to managing huge amounts of data are common to all disciplines and revolve around our human, as well as our digital, inability to deal with increasing amounts of data.7

As a result we've been considering the broader implications of our SkyServer work. The SDSS marked a transition to a new kind of science. Science itself has evolved over the centuries, from empirical to analytic, then to computational-X, where X represents many (if not all) scientific disciplines. With the emergence of large experiments like SDSS, where even data collection is via computer, a paradigm shift is under way. We are entering an era where there is so much data that the brute-force application of computational hardware is not enough to collect and analyze it all. We need to approach even the design of our experiments differently, taking an algorithmic perspective. Data management and enormous databases are inevitable in this new world, where business is e-business and science is e-science.16

SDSS data represents a wonderful opportunity to explore and experiment with how scientists adapt to new tools and new technologies. In the same spirit, Jim experimented with how tools and technologies carry over to other disciplines. For example, he consciously started (beginning in 2005) to develop relationships with molecular biologists and genomics researchers. I went along for some of his visits to the Whitehead Institute for Biomedical Research at MIT (www.wi.mit.edu) and the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/) and was amazed to find how similar many of the bioinformatics challenges were to those in astronomy. It was great to see Jim go native in biology with the same comfort he did in astronomy and how his "20 queries" cut through the communication gap in the various communities. The same thing happened when he started to work with oceanographers from the Monterey Bay Aquarium Research Institute (www.mbari.org/) and the North-East Pacific Time-Series Undersea Networked Experiments project (www.neptuneproject.org/).

He was among the first computer scientists to realize how the data explosion changes not only science but scientific computing as well. As the amount of data grows faster than our ability to transfer it through the network, the only solution that promises to keep up is to take the computation directly to the data.20 This principle contrasts with recent trends in high-performance computing where the machines are increasingly CPU-intensive, while the ability to read and write data lags behind processing speed. Lively discussions with Jim and Gordon Bell of Microsoft Research about this problem resulted in a paper outlining what is wrong with today's computing architectures2; I am immensely proud of having been a co-author. Our group at Johns Hopkins is now implementing the vision we outlined there, building a machine—called in Jim's honor the GrayWulf (graywulf.org/)—specially tailored for data-intensive computations.

We realized that the data explosion in astronomy is due to the electronic charge-coupled device detectors that have replaced photographic plates. As semiconductor manufacturing matured, each year has brought a new generation of bigger and more sensitive detectors that could be replaced without affecting the telescopes themselves. Much as gene chips and gene sequencers have industrialized molecular biology, the revolution in Earth-observing satellite imagery has also been the result of better imaging devices. The common theme is that whenever an inexpensive sensing device is on an exponential growth path, a scientific revolution is imminent.

Such a revolution is taking place today with inexpensive wireless sensor networks, sometimes called "smart dust" after the University of California, Berkeley, project that first developed them almost a decade ago. It is expected that within the next five years there will be more sensors online than computers worldwide. Intel's Berkeley Lab was among the first to develop such devices. My wife, Kathy Szlávecz, is a soil biologist interested in the soil ecosystem and has for years painstakingly sought and collected data involving environmental parameters. Jim connected her to the Berkeley lab, and after her seminar we came away with a shoebox full of Berkeley Motes (www.eecs.berkeley.edu/department/EECSbrochure/c6-sl.html). At the same time Johns Hopkins hired Andreas Terzis, a computer scientist specializing in wireless sensors, and thus a new collaboration (lifeunderyourfeet.org/) was formed. Despite having only a shoestring budget, it still managed to build a small sensor network to study soil moisture and temperature.


Jim realized that in this field of enviro-sensor networks, almost everyone focuses on the first phase of the problem—collecting data. In astronomy we have learned the hard way that with exponential data growth one should worry about data processing and analysis even at the beginning; otherwise, it will be difficult to catch up once the data stream really opens up.1

He was also very interested in the flexibility of the SkyServer framework. Another aspect of the environmental work is how interested scientists are in long-term trends and averages, even as they want to retain all the raw data and dive in whenever they find something unusual. We again went to work, converting in a matter of weeks the SkyServer framework into an end-to-end system to handle data from environmental science.23 We wrote code to handle time-series and in-database calibrations. Soon, we had help from Stuart Ozer from Microsoft Research who built an OLAP data cube for the sensor data, the first ever (as far as we know) in a scientific application (see Figure 5).14
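A data cube precomputes an aggregate for every combination of dimensions, so that long-term averages and drill-downs into the raw readings come from the same structure. The sketch below builds a tiny in-memory cube of mean values; the readings, dimension names, and function name are hypothetical, and it is not the cube built for the soil-sensor project.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(rows, dims, measure):
    """Mean of `measure` for every subset of `dims` (the cube's group-bys)."""
    cells = defaultdict(lambda: [0, 0.0])  # key -> [count, sum]
    for r in rows:
        for k in range(len(dims) + 1):
            for group in combinations(dims, k):
                key = tuple((d, r[d]) for d in group)
                cells[key][0] += 1
                cells[key][1] += r[measure]
    return {key: total / n for key, (n, total) in cells.items()}

# Hypothetical soil-temperature readings.
readings = [
    {"sensor": "s1", "day": "d1", "temp": 10.0},
    {"sensor": "s1", "day": "d2", "temp": 14.0},
    {"sensor": "s2", "day": "d1", "temp": 20.0},
]
cube = data_cube(readings, ("sensor", "day"), "temp")
print(cube[()])                     # grand average over all readings
print(cube[(("sensor", "s1"),)])    # roll-up by sensor
print(cube[(("day", "d1"),)])       # roll-up by day
```

The empty key holds the overall average, single-dimension keys hold the roll-ups, and the full keys hold the finest cells, which mirrors how an analyst moves from a trend to the unusual reading behind it.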

Collaborator and Friend

Over the years, as our collaboration intensified, our work days would start with Jim's phone calls while he walked from home to BARC, followed by back-and-forth calls until early morning on the east coast (of the U.S.). Very often we were still talking at 3 A.M. my time or 7 A.M. his time. We spent a lot of time together, chasing bugs and arguing over code.

Jim had an uncanny ability to go for the jugular, recognizing the critical issue or bottleneck. I had the privilege of meeting some of the top physicists of the 20th century, including Richard Feynman and Yakov Zeldovich. Jim's mind worked the same way as theirs; like them, he could solve a problem on the back of an envelope.

He was also very good at getting results published. When he felt the time was right for us to write a paper, he would start with a quick draft, helping his collaborators (like me) with writer's block get up to speed. He was very generous, often doing much more than his collaborators, yet still insisted that others, particularly young researchers, take the role of lead author. He mentored many students, always patient and encouraging, trying to get them excited and leading by example.

He never gave up hands-on work. If he did not have time to write code or tinker with databases, it was not a good day for him. He had an inexhaustible source of energy; when everyone else was falling over, he kept going, pulling everyone along with him. Starting at 7 A.M. one day at BARC we kept going at a spurious SQL bug and never stood up (except for coffee) until 11 P.M. (when we finally found it). By then, most restaurants were closed, but Jim led the way through San Francisco's North Beach neighborhood until we found an inviting Italian restaurant and had a proper dinner.

He once took a piece of paper and drew a rectangle for me, with two diagonals splitting it into four pieces, then said: "This is our life" with the vertical axis representing the arrow of time (see Figure 6) and asked: "Alex, where are we on this diagram?" He did all he could to ensure the work part occupied as big a piece as possible.

Jim and I met rather late in our careers, at a point in life when people might be expected to form good working relationships, whereas deep friendships generally come much earlier. For whatever reason, we connected, became close friends, and had amazing conversations where the bandwidth regularly went way beyond the spoken word.

Jim's love of life and work inspired anyone fortunate enough to spend time with him. My friendship and collaboration with him took my career in new, entirely unexpected directions, away from astrophysics and into e-science. He affected the lives of many others in the same way. All of us privileged enough to call Jim a friend will forever be trying to turn at least some of the projects we dreamed of together into reality.

Acknowledgments

I thank Jim Gray for many years of advice, support, and friendship and Donna Carnes for being the strongest of all, holding us together when everything was falling apart.

References

1. Balazinska, M., Deshpande, A., Franklin, M.J., Gibbons, P.B., Gray, J., Hansen, M., Liebhold, M., Nath, S., Szalay, A., and Tao, V. Data management in the worldwide sensor Web. IEEE Pervasive Computing (2007), 30.

2. Bell, G., Gray, J., and Szalay, A.S. Petascale computational systems. IEEE Computer (Jan. 2006), 39.

3. Budavári, T., Malik, T., Szalay, A.S., Thakar, A., and Gray, J. SkyQuery: A prototype distributed query Web service for the Virtual Observatory. In Proceedings of the ADASS XII, ASP Conference Series, H. Payne, R.I. Jedrzejewski, and R.N. Hook, Eds. (Baltimore, MD, Oct. 2002). Astronomical Society of the Pacific, San Francisco, 2003, 31.

4. Fekete, G., Szalay, A.S., and Gray, J. HTM2: Spatial toolkit for the Virtual Observatory. In Proceedings of the ADASS, ASP Conference Series (Strasbourg, France, Oct. 2003). Astronomical Society of the Pacific, San Francisco, 2003, 289.

5. Gray, J., Nieto-Santisteban, M.A., and Szalay, A.S. The Zones Algorithm for Finding Points-Near-a-Point or Cross-Matching Spatial Datasets, MSR-TR-2006-52. Microsoft Technical Report, Redmond, WA, 2006.

6. Gray, J., Szalay, A., Budavári, T., Thakar, A.R., Nieto-Santisteban, M.A., and Lupton, R. Cross-Matching Multiple Spatial Observations and Dealing with Missing Data, MSR-TR-2006-175. Microsoft Technical Report, Redmond, WA, 2006.

7. Gray, J., Liu, D.T., Nieto-Santisteban, M.A., Szalay, A.S., Heber, G., and De Witt, D. Scientific Data Management in the Coming Decade, MSR-TR-2005-10. Microsoft Technical Report, Redmond, WA, 2005.

8. Gray, J., Szalay, A.S., and Fekete, G. Using Table Valued Functions in SQL Server 2005 to Implement a Spatial Data Library, MSR-TR-2005-122. Microsoft Technical Report, Redmond, WA, 2005.

9. Gray, J., Szalay, A.S., Fekete, G., O'Mullane, W., Thakar, A.R., Heber, G., and Rots, A.H. There Goes the Neighborhood: Relational Algebra for Spatial Data Search, MSR-TR-2004-32. Microsoft Technical Report, Redmond, WA, 2004.

10. Gray, J. and Szalay, A.S. Where the rubber meets the sky: Bridging the gap between databases and science. IEEE Data Engineering Bulletin (Dec. 2004), 4.

11. Gray, J., Chong, W., Barclay, T., Szalay, A.S., and Vandenberg, J. TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange, MSR-TR-2002-54. Microsoft Technical Report, Redmond, WA, 2002.

12. Nieto-Santisteban, M.A., Thakar, A.R., Szalay, A.S., and Gray, J. Large-scale query and xmatch, entering the parallel zone. In Proceedings of the Astronomical Data Analysis Software and Systems XV, ASP Conference Series, C. Gabriel, C. Arviset, D. Ponz, and E. Solano, Eds. (El Escorial, Spain, Oct. 2005). Astronomical Society of the Pacific, San Francisco, 2006, 493.

13. O'Mullane, W., Gray, J., Li, N., Budavári, T., Nieto-Santisteban, M., and Szalay, A.S. Batch query system with interactive local storage for SDSS and the VO. In Proceedings of the ADASS XIII, F. Ochsenbein, M. Allen, and D. Egret, Eds. (Strasbourg, France, Oct. 2003). Astronomical Society of the Pacific, San Francisco, 2004, 372.

14. Ozer, S., Szalay, A.S., Szlavecz, K., Terzis, A., Musăloiu-E., R., and Cogan, J. Using Data-Cubes in Science: An Example from Environmental Monitoring of the Soil Ecosystem, MSR-TR-2006-134. Microsoft Technical Report, Redmond, WA, 2006.

15. Singh, V., Gray, J., Thakar, A.R., Szalay, A.S., Raddick, J., Boroski, B., Lebedeva, S., and Yanny, B. SkyServer Traffic Report: The First Five Years, MSR-TR-2006-190. Microsoft Technical Report, Redmond, WA, 2006.

16. Szalay, A.S. and Gray, J. Science in an exponential world. Nature 413 (2006), 440–441.

17. Szalay, A.S., Gray, J., Fekete, G., Kunszt, P., Kukol, P., and Thakar, A. Indexing the Sphere with the Hierarchical Triangular Mesh, MSR-TR-2005-123. Microsoft Technical Report, Redmond, WA, 2005.

18. Szalay, A.S., Budavári, T., Connolly, A.J., Gray, J., Matsubara, T., Pope, A., and Szapudi, I. Spatial clustering of galaxies in large data sets. In Proceedings of the SPIE Conference on Advanced Telescope Technologies (Waikoloa, HI, July). The International Society for Optical Engineering, 2002, 1–12.

19. Szalay, A.S., Budavári, T., Malik, T., Gray, J., and Thakar, A. Web services for the Virtual Observatory. In Proceedings of the SPIE Conference on Advanced Telescope Technologies (Waikoloa, HI, July). The International Society for Optical Engineering, 2002, 124.

20. Szalay, A.S., Gray, J., and Vandenberg, J. Petabyte-scale data mining: Dream or reality? In Proceedings of the SPIE Conference on Advanced Telescope Technologies (Waikoloa, HI, July). The International Society for Optical Engineering, 2002, 333–338.

21. Szalay, A.S. and Gray, J. The World-Wide Telescope. Science 293 (2001), 2037–2040.

22. Szalay, A.S., Kunszt, P., Thakar, A., Gray, J., Slutz, D., and Brunner, R. Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey. In Proceedings of the SIGMOD 2000 Conference (Madison, WI). ACM Press, New York, 2000, 451–462.

23. Szlavecz, K., Terzis, A., Musăloiu-E., R., Cogan, J., Small, S., Ozer, S., Burns, R., Gray, J., and Szalay, A.S. Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service, MSR-TR-2006-90. Microsoft Technical Report, Redmond, WA, 2006.

24. Thakar, A., Szalay, A., Kunszt, P., and Gray, J. Migrating a multiterabyte archive from object to relational database. Computing in Science and Engineering 5 (Sept./Oct. 2003), 16.

Author

Alexander S. Szalay (szalay@jhu.edu) is Alumni Centennial Professor in the Department of Physics and Astronomy at The Johns Hopkins University, Baltimore, MD.

Footnotes

DOI: http://doi.acm.org/10.1145/1400214.1400231

Figures

Figure 1. Jim Gray and the Sloan Digital Sky Survey telescope, Apache Point, NM.

Figure 2. The hierarchical subdivision of the sphere that forms the basis of the Hierarchical Triangular Mesh, starting with an octahedron.

Figure 3. Typical spherical polygon (orange) describing an area of uniform target selection for follow-up spectroscopic observation in the Sloan Digital Sky Survey arising from the intersection of geometries in the survey area.

Figure 4. Aggregate SkyServer monthly traffic 2001–2006 when the number of Web hits doubled each year.

Figure 5. Charts of sensor measurements generated from the On Line Analytical Processing data-cube for sensor deployment at Johns Hopkins University.

Figure 6. The world of science according to Jim Gray.


©2008 ACM  0001-0782/08/1100  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2008 ACM, Inc.


 
