{"id":783111,"date":"2024-05-29T12:23:05","date_gmt":"2024-05-29T17:23:05","guid":{"rendered":"http:\/\/spaceweekly.com\/?p=783111"},"modified":"2024-05-29T12:23:05","modified_gmt":"2024-05-29T17:23:05","slug":"astronomy-generates-mountains-of-data-thats-perfect-for-ai","status":"publish","type":"post","link":"https:\/\/spaceweekly.com\/?p=783111","title":{"rendered":"Astronomy Generates Mountains of Data. That&#8217;s Perfect for AI"},"content":{"rendered":"<p> <br \/>\n<\/p>\n<div>\n<p>Consumer-grade AI is finding its way into people\u2019s daily lives with its ability to generate text and images and automate tasks. But astronomers need much more powerful, specialized AI. The vast amounts of observational data generated by modern telescopes and observatories defy astronomers\u2019 efforts to extract all of its meaning. <\/p>\n<p><span id=\"more-167153\"\/><\/p>\n<p>A team of scientists is developing a new AI for astronomical data called AstroPT. They\u2019ve presented it in a new paper titled \u201cAstroPT: Scaling Large Observation Models for Astronomy.\u201d The paper is available at arxiv.org, and the lead author is Michael J. Smith, a data scientist and astronomer from Aspia Space. <\/p>\n<p>Astronomers are facing a growing deluge of data, which will expand enormously when the Vera Rubin Observatory (VRO) comes online in 2025. The VRO has the world\u2019s largest camera, and each of its images could fill 1,500 large-screen TVs. During its ten-year mission, the VRO will generate about 0.5 exabytes of data, which is about 50,000 times more data than is contained in the USA\u2019s Library of Congress. <\/p>\n<figure class=\"wp-block-image size-large\"><figcaption class=\"wp-element-caption\">The VRO\u2019s need for multiple sites to handle all of its data is a testament to the enormous volume of data it will generate. Without effective AI, that data will be stuck in a bottleneck. Image Credit: NOIRLab. 
<\/figcaption><\/figure>\n<p>Other telescopes with enormous mirrors are also approaching first light. The Giant Magellan Telescope, the Thirty Meter Telescope, and the European Extremely Large Telescope combined will generate an overwhelming amount of data.<\/p>\n<p>Having data that can\u2019t be processed is the same as not having the data at all. It\u2019s basically inert and has no meaning until it\u2019s processed somehow. \u201cWhen you have too much data, and you don\u2019t have the technology to process it, it\u2019s like having no data,\u201d said Cecilia Garraffo, a computational astrophysicist at the Harvard-Smithsonian Center for Astrophysics. <\/p>\n<p>This is where AstroPT comes in.<\/p>\n<p>AstroPT stands for Astro Pretrained Transformer, where a transformer is a particular type of AI. Transformers can change or transform an input sequence into an output sequence. AI needs to be trained, and AstroPT has been trained on 8.6 million 512 x 512-pixel images from the DESI Legacy Survey Data Release 8. DESI is the Dark Energy Spectroscopic Instrument. DESI studies the effect of Dark Energy by capturing the optical spectra from tens of millions of galaxies and quasars. <\/p>\n<p>AstroPT and similar AI deal with \u2018tokens.\u2019 Tokens are visual elements in a larger image that contain meaning. By breaking images down into tokens, an AI can understand the larger meaning of an image. AstroPT can transform individual tokens into coherent output. <\/p>\n<p>AstroPT has been trained on visual tokens. The idea is to teach the AI to predict the next token. The more thoroughly it\u2019s been trained to do that, the better it will perform. <\/p>\n<p>\u201cWe demonstrated that simple generative autoregressive models can learn scientifically useful information when pre-trained on the surrogate task of predicting the next 16 \u00d7 16 pixel patch in a sequence of galaxy image patches,\u201d the authors write. 
In this scheme, each image patch is a token.<\/p>\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"993\" height=\"993\" src=\"https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT.png\" alt=\"This image illustrates how the authors trained AstroPT to predict the next token in a 'spiralised' sequence of galaxy image patches. It shows the token feed order. &quot;As the galaxies are in the centre of each postage stamp, this set up allows us to seamlessly pretrain and run inference on differently sized galaxy postage stamps,&quot; the authors explain. Image Credit: Smith et al. 2024. \" class=\"wp-image-167154\" srcset=\"https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT.png 993w, https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT-580x580.png 580w, https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT-250x250.png 250w, https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT-768x768.png 768w, https:\/\/www.universetoday.com\/wp-content\/uploads\/2024\/05\/AI-galaxy-tokens-AstroPT-100x100.png 100w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\"\/><figcaption class=\"wp-element-caption\">This image illustrates how the authors trained AstroPT to predict the next token in a \u2018spiralised\u2019 sequence of galaxy image patches. It shows the token feed order. \u201cAs the galaxies are in the centre of each postage stamp, this set up allows us to seamlessly pretrain and run inference on differently sized galaxy postage stamps,\u201d the authors explain. Image Credit: Smith et al. 2024. <\/figcaption><\/figure>\n<p>One of the obstacles to training AI like AstroPT concerns what AI scientists call the \u2018token crisis.\u2019 To be effective, AI needs to be trained on a large number of quality tokens. 
In a 2023 paper, a separate team of researchers explained that a lack of tokens can limit the effectiveness of some AI, such as Large Language Models (LLMs). \u201cState-of-the-art LLMs require vast amounts of internet-scale text data for pre-training,\u201d they wrote. \u201cUnfortunately, \u2026 the growth rate of high-quality text data on the internet is much slower than the growth rate of data required by LLMs.\u201d<\/p>\n<p>AstroPT faces the same problem: a dearth of quality tokens to train on. It is an example of what the authors call a Large Observation Model, or LOM. The team says their results so far suggest that AstroPT could help address the token crisis by using data from observations. \u201cThis is a promising result that suggests that data taken from the observational sciences would complement data from other domains when used to pre-train a single multimodal LOM, and so points towards the use of observational data as one solution to the \u2018token crisis\u2019.\u201d<\/p>\n<p>AI developers are eager to find solutions to the token crisis and other AI challenges. <\/p>\n<p>Without better AI, a data processing bottleneck will prevent astronomers and astrophysicists from making discoveries from the vast quantities of data that will soon arrive. Can AstroPT help?<\/p>\n<p>The authors are hoping that it can, but it needs much more development. They say they\u2019re open to collaborating with others to strengthen AstroPT. To aid that, they followed \u201ccurrent leading community models\u201d as closely as possible. They call it an \u201copen to all project.\u201d<\/p>\n<p>\u201cWe took these decisions in the belief that collaborative community development paves the fastest route towards realising an open source web-scale large observation model,\u201d they write. <\/p>\n<p>\u201cWe warmly invite potential collaborators to join us,\u201d they conclude. 
<\/p>\n<p>It\u2019ll be interesting to see how AI developers will keep up with the vast amount of astronomical data coming our way. <\/p>\n<\/div>\n<p><br \/>\n<br \/><a href=\"https:\/\/www.universetoday.com\/167153\/astronomy-generates-mountains-of-data-thats-perfect-for-ai\/?rand=772204\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Consumer-grade AI is finding its way into people\u2019s daily lives with its ability to generate text and images and automate tasks. But astronomers need much more powerful, specialized AI. 
The&hellip; <\/p>\n","protected":false},"author":1,"featured_media":781829,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13],"tags":[],"class_list":["post-783111","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-genaero"],"_links":{"self":[{"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/posts\/783111","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=783111"}],"version-history":[{"count":0,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/posts\/783111\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=\/wp\/v2\/media\/781829"}],"wp:attachment":[{"href":"https:\/\/spaceweekly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=783111"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=783111"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/spaceweekly.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=783111"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}