
Open source and open data

There’s currently an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we’ve long believed that open data and open source are good not only for us and our industry but also for the world at large.

Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems. Similarly, the millions of annotated videos in the YouTube-8M collection can be used to train video recognition systems.

With respect to language processing, we’ve shared the Natural Questions database, which contains 307,373 human-generated questions and answers. We’ve also made available the Trillion Word Corpus, which is based on words used on public web pages, and the Ngram Viewer, which can be used to explore the more than 25 million books in Google Books. These collections can be used for statistical machine translation, speech recognition, spelling correction, entity detection, information extraction and other language research.

And these are only a few examples of a much broader activity: Google AI currently lists 62 datasets of this sort that we’re making available to the research community.

We also host a large number of publicly available datasets, such as the 20,000 Kaggle Open Datasets and the Cloud Public Datasets, which allow people to access frequently used public data directly from their workspace.

Google also offers Google Trends, a free service that enables anyone to see and download aggregate search activity since 2004 for Google Search, Image Search, News Search, Shopping and YouTube. You can get search information for countries, regions, metro areas and cities on a monthly, weekly, daily and even hourly basis. The Trends data is widely used by researchers in fields as varied as medicine and economics. According to Google Scholar, there are more than 21,000 research papers that cite Trends as a data source.

Google is also a major contributor to open source software. Key examples include Android, our smartphone operating system; Chromium, the code base for our Chrome browser (now also powering many competitors); and TensorFlow, our machine learning system. Google’s release of Kubernetes changed cloud hosting forever, and has enabled innovation and competition across the cloud industry. Google is also the largest contributor of open source code to GitHub, a shared repository for software development. In 2017, Googlers made more than 250,000 changes to tens of thousands of projects on GitHub alone.

Finally, we’ve released over 5,300 research reports written at Google, most of which have subsequently been published in scientific journals or conference proceedings.

Of course, it is costly to create and compile this data, software, and research. So why do we release these materials free of charge?

First and foremost, our primary mission is “to organize the world’s information and make it universally accessible and useful.” Certainly one obvious way to make information universally accessible and useful is to give it away! 

Second, making these materials available stimulates scientific research outside of Google. We know we can’t do it all, and we spend a lot of time reading, understanding and often extending work done by others, some of which has been developed using tools and data we have provided to the research community. This mix of competition and cooperation among groups of researchers is what pushes science forward.

Third, when we hire new employees, it’s great if they can hit the ground running and already know and use the tools we have developed. Familiarity with our software and data makes engineers productive from their first day at work.

There are many more reasons to share research data, but these three alone justify the practice. We aren’t the only internet company to appreciate the power of open data, open code and open research. Our colleagues in academia and at many other companies follow the same practices for much the same reasons.

Of course, we can’t release all the data we use in our business. We need to protect user privacy, maintain confidentiality for business customers, and protect Google’s own intellectual property. But, subject to such considerations, we generally try to make our data as “universally accessible and useful” as possible.


via The Keyword
