

Editor: siyang.zhang | Source: Internet | Date: 2019-04-08 15:28:28


Keywords: English learning, bilingual reading, big data

  Our ability to collect data far outpaces our ability to fully utilize it, yet those data may hold the key to solving some of the biggest global challenges facing us today.


  Take, for instance, the frequent outbreaks of waterborne illnesses as a consequence of war or natural disasters. The most recent example can be found in Yemen, where roughly 10,000 new suspected cases of cholera are reported each week, and history is riddled with similar stories. What if we could better understand the environmental factors that contributed to the disease, predict which communities are at higher risk, and put in place protective measures to stem the spread?


  Answers to these questions and others like them could potentially help us avert catastrophe.


  We already collect data related to virtually everything, from birth and death rates to crop yields and traffic flows. IBM estimates that each day, 2.5 quintillion bytes of data are generated. To put that in perspective: that's the equivalent of all the data in the Library of Congress being produced more than 166,000 times per 24-hour period. Yet we don't really harness the power of all this information. It's time that changed, and thanks to recent advances in data analytics and computational services, we finally have the tools to do it.
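  The comparison above can be checked with a line of arithmetic. The short Python sketch below takes only the two figures quoted in the text (2.5 quintillion bytes per day and a multiple of 166,000) and derives the Library of Congress size that the comparison implies. The variable names are illustrative, and the resulting size is an inference from those two figures, not a number stated in the article.

```python
# Figures quoted in the article.
BYTES_PER_DAY = 2.5e18   # 2.5 quintillion bytes generated per day
LOC_MULTIPLE = 166_000   # "more than 166,000 times" the Library of Congress

# Size of the Library of Congress implied by the comparison (an inference,
# not a figure given in the article).
implied_loc_bytes = BYTES_PER_DAY / LOC_MULTIPLE
implied_loc_terabytes = implied_loc_bytes / 1e12  # decimal terabytes

print(f"Implied Library of Congress size: about {implied_loc_terabytes:.1f} TB")
```

  In other words, the comparison treats the Library of Congress as holding roughly 15 terabytes of data.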


  As a data scientist for Los Alamos National Laboratory, I study data from wide-ranging, public sources to identify patterns in hopes of being able to predict trends that could be a threat to global security. Multiple data streams are critical because the ground-truth data (such as surveys) that we collect is often delayed, biased, sparse, incorrect or, sometimes, nonexistent.


  For example, knowing mosquito incidence in communities would help us predict the risk of mosquito-transmitted diseases such as dengue, the leading cause of illness and death in the tropics. However, mosquito data at a global (and even national) scale are not available.


  To address this gap, we're using other sources such as satellite imagery, climate data and demographic information to estimate dengue risk. Specifically, we had success predicting the spread of dengue in Brazil at the regional, state and municipality level using these data streams as well as clinical surveillance data and Google search queries that used terms related to the disease. While our predictions aren't perfect, they show promise. Our goal is to combine information from each data stream to further refine our models and improve their predictive power.


  Similarly, to forecast the flu season, we have found that Wikipedia and Google searches can complement clinical data. Because the rate of people searching the internet for flu symptoms often increases during their onset, we can predict a spike in cases where clinical data lags.
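  One simple way to picture a search signal that runs ahead of clinical reports is a lagged correlation: shift the search series forward week by week and see at which shift it lines up best with the case counts. The Python sketch below is only an illustration of that idea, not the method used at Los Alamos. The helper names `pearson` and `best_lead` are hypothetical, and both weekly series are synthetic numbers made up for the example, with the clinical series built as the search series delayed by two weeks.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_lead(search, clinical, max_lead):
    """Return (lead, r): the number of weeks by which `search` leads
    `clinical` with the highest correlation, and that correlation."""
    scores = {}
    for lead in range(max_lead + 1):
        # Pair search activity in week t with clinical cases in week t + lead.
        xs = search[:len(search) - lead]
        ys = clinical[lead:]
        scores[lead] = pearson(xs, ys)
    lead = max(scores, key=scores.get)
    return lead, scores[lead]

# Synthetic weekly data (assumption for illustration only): clinical
# reports repeat the search pattern two weeks later.
search = [1, 2, 4, 8, 12, 9, 5, 3, 2, 1, 1, 1]
clinical = [1, 1] + search[:-2]

lead, r = best_lead(search, clinical, max_lead=4)
print(f"Search activity leads clinical reports by {lead} weeks (r = {r:.2f})")
```

  With real data the relationship would be noisier, and a lead found this way would need to be validated on seasons that were not used to estimate it.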


  We're using these same concepts to expand our research beyond disease prediction to better understand public sentiment. In partnership with the University of California, we're conducting a three-year study using disparate data streams to understand whether opinions expressed on social media map to opinions expressed in surveys.


  For example, in Colombia, we are conducting a study to see whether social media posts about the peace process between the government and FARC, the socialist guerrilla movement, can be ground-truthed with survey data. A University of California, Berkeley researcher is conducting on-the-ground surveys throughout Colombia, including in isolated rural areas, to poll citizens about the peace process. Meanwhile, at Los Alamos, we're analyzing social media data and news sources from the same areas to determine if they align with the survey data.


  If we can demonstrate that social media accurately captures a population's sentiment, it could be a more affordable, accessible and timely alternative to what are otherwise expensive and logistically challenging surveys. In the case of disease forecasting, if social media posts did indeed serve as a predictive tool for outbreaks, those data could be used in educational campaigns to inform citizens of the risk of an outbreak (due to vaccine exemptions, for example) and ultimately reduce that risk by promoting protective behaviors (such as washing hands, wearing masks, remaining indoors, etc.).


  All of this illustrates the potential for big data to solve big problems. Los Alamos and other national laboratories that are home to some of the world's largest supercomputers have the computational power augmented by machine learning and data analysis to take this information and shape it into a story that tells us not only about one state or even nation, but the world as a whole. The information is there; now it's time to use it.

