Twice a month, we revisit some of our readers’ favorite posts from throughout the history of Nettuts+. This tutorial was first published in July, 2010.
Offering your content or logic as a service on the web is a great idea. For starters it allows you to build numerous front-ends for your own information without having to access the databases all the time (and thus making scaling your system much easier).
The even more practical upshot is that you allow people on the web to play with your information and build things you never even dreamed of doing. A lot of companies understand that this "crowd-sourced innovation" is a freebie that is too good to miss, which is why there are so many great APIs around.
Providing an API to the world is a totally different story, though. You need to know how to scale your servers, you need to be available to answer implementers' questions, and you need to maintain good documentation so people can use your content. You also need a good caching strategy to keep your servers from blowing up, and a way to limit access to your system so people cannot abuse it. Or do you?
Enter YQL
Yahoo offers a system for people to access their APIs called the Yahoo Query Language, or YQL. YQL is a SQL-style language that turns information on the web into virtual databases that can be queried by end users. So if you want to, for example, search the web for the term "elephant," all you need to do is to use the following statement:
select * from search.web where query="elephant"
You send this statement to a data endpoint and get the results back as XML, JSON, or JSON-P. You can request more results, and you can filter them by defining what you want to get back. The endpoint takes the following parameters:
http://query.yahooapis.com/v1/public/yql
  ?q={yql query}
  &diagnostics={true|false}
  &format={json|xml}
  &callback={function name}
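As a rough sketch of how the pieces of that endpoint fit together, the following JavaScript assembles a request URL from a YQL statement. The helper name buildYqlUrl and its options object are made up for this illustration and are not part of YQL; only the endpoint and the four parameter names come from the text above.

```javascript
// Public YQL endpoint, as given in the article.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// buildYqlUrl is a hypothetical helper name, not something YQL provides.
function buildYqlUrl(query, options = {}) {
  const { format = "json", diagnostics = false, callback } = options;
  const params = [
    ["q", query],
    ["diagnostics", String(diagnostics)],
    ["format", format],
  ];
  // The callback parameter is only needed for JSON-P requests.
  if (callback) {
    params.push(["callback", callback]);
  }
  // URL-encode every value so quotes and spaces in the statement survive.
  const queryString = params
    .map(([name, value]) => name + "=" + encodeURIComponent(value))
    .join("&");
  return YQL_ENDPOINT + "?" + queryString;
}

const exampleUrl = buildYqlUrl('select * from search.web where query="elephant"');
console.log(exampleUrl);
```

The important detail is the encoding step: a YQL statement contains spaces, quotes and equals signs, all of which have to be escaped before they can travel in the q parameter.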
Mix and Match
All of Yahoo's APIs are available through this interface, and you can mix and match services with sub-selects. For example, you could run a keyword analysis tool over the abstracts of a web search to find relevant key terms. Using the unique() function, you can also easily filter out repeated terms.
select * from search.termextract where context in (
  select abstract from search.web(50) where query="elephant"
) | unique(field="Result")
See the results of this more complex query here.
The Console
The easiest way to play with YQL as a consumer is to use the console at http://developer.yahoo.com/yql/console/. There you can click on the different tables to see a demo query showing how to use each one, and if you click the desc link you find out which options are available to you.
YQL Limits
The use of YQL has a few limits, which are described in the documentation. In essence, you can access the open data endpoint 1,000 times an hour per IP. If you authenticate an application with OAuth, you get 10,000 hits an hour. Each application is allowed 100,000 hits a day.
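If you call the public endpoint from a long-running script, it can help to count your own requests so you notice before YQL starts refusing them. The following is a minimal sketch, assuming a simple fixed one-hour window that starts with the first call; how YQL itself measures the hour is not stated in this article, and createHourlyBudget is a made-up helper name.

```javascript
// Hourly limit for the public endpoint, taken from the article.
const PUBLIC_LIMIT_PER_HOUR = 1000;
const HOUR_MS = 60 * 60 * 1000;

// createHourlyBudget is a hypothetical helper, not part of YQL.
// `now` is injectable so the counter can be exercised without waiting an hour.
function createHourlyBudget(limit = PUBLIC_LIMIT_PER_HOUR, now = () => Date.now()) {
  let windowStart = now();
  let used = 0;

  // Start a fresh window once an hour has passed since the current one began.
  function rollWindow() {
    const current = now();
    if (current - windowStart >= HOUR_MS) {
      windowStart = current;
      used = 0;
    }
  }

  return {
    // Counts one request and returns true if there is budget left.
    tryAcquire() {
      rollWindow();
      if (used >= limit) {
        return false;
      }
      used += 1;
      return true;
    },
    // Number of requests still available in the current window.
    remaining() {
      rollWindow();
      return limit - used;
    },
  };
}

const budget = createHourlyBudget();
if (budget.tryAcquire()) {
  console.log("OK to call YQL, calls left this hour: " + budget.remaining());
}
```

For an application authenticated with OAuth you would pass 10000 as the limit instead.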
These limits, together with the caching of results that YQL does automatically, mean that data only gets requested again when it has changed. In effect, YQL acts as a sort of firewall for requests to the data people offer through it.
Be careful when using jQuery's "$.getJSON" with an anonymous function as its callback. jQuery then generates a different callback name for every request, so the request URL changes each time. This can defeat YQL's caching and hurt performance.
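To see why the callback name matters for caching, here is a small sketch that compares the request URLs you end up with. The function generatedCallbackName only imitates a library that invents a new name per request; its naming scheme is made up for this example and is not jQuery's.

```javascript
const ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Builds a JSON-P request URL for a YQL statement and a callback name.
function jsonpUrl(query, callbackName) {
  return ENDPOINT + "?q=" + encodeURIComponent(query) +
    "&format=json&callback=" + encodeURIComponent(callbackName);
}

// Imitates a library that invents a fresh callback name for every request.
// The "cb_<number>" scheme is purely illustrative.
let counter = 0;
function generatedCallbackName() {
  counter += 1;
  return "cb_" + counter;
}

const statement = 'select * from search.web where query="elephant"';

// Same statement, but the URL differs on every call, so a cache keyed on
// the URL never sees a repeat.
const varyingA = jsonpUrl(statement, generatedCallbackName());
const varyingB = jsonpUrl(statement, generatedCallbackName());
console.log(varyingA === varyingB); // false

// Same statement and a fixed, named callback: identical URLs.
const stableA = jsonpUrl(statement, "handleResults");
const stableB = jsonpUrl(statement, "handleResults");
console.log(stableA === stableB); // true
```

In jQuery itself, the jsonpCallback and cache options of $.ajax are one way to pin the callback name and avoid cache-busting parameters.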
Building Web Services with Open Tables
The really cool thing for you as a provider is that YQL is open for other data providers.
If you want to offer an API to the world (or just have one for yourself internally) you can easily do that by writing an "open table" which is an XML schema pointing to a web service.
People do this a lot, which means that, if you click the "Show community tables" link in the YQL console, you will find that there are now 812 instead of 118 tables to play with (as of today - tomorrow there will probably be more).
To get your service into YQL and offer it to the world all you need to do is to point YQL to it. Let's look at a simple example:
Real-World Application: Craigslist as an API
The free classified ad web site Craigslist has no public API - which is a shame, really. However, when you do a search on the site you will find that the search results have an RSS output - which is at least pointing towards API functionality. For example, when I search for "schwinn mountain bike" in San Francisco, the URL of the search would be:
http://sfbay.craigslist.org/search/sss?format=rss&query=schwinn+mountain+bike
This can be changed into a URL with variables, with the variables being the location, the type of product you are looking for (which is the section of the site) and the query you searched for (in this case I wrapped the parameters in curly braces):
http://{location}.craigslist.org/search/{type}?format=rss&query={query}
Once you have found a pattern like this, you can start writing your open table:
<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <author>Yahoo! Inc.</author>
    <documentationURL>http://craigslist.org/</documentationURL>
    <sampleQuery>select * from {table} where location="sfbay" and type="sss" and query="schwinn mountain bike"</sampleQuery>
    <description>Searches Craigslist.org</description>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <urls>
        <url>http://{location}.craigslist.org/search/{type}?format=rss</url>
      </urls>
      <inputs>
        <key id="location" type="xs:string" paramType="path" required="true" />
        <key id="type" type="xs:string" paramType="path" required="true" />
        <key id="query" type="xs:string" paramType="query" required="true" />
      </inputs>
    </select>
  </bindings>
</table>
For a full description of what all that means, you can check the YQL documentation on open tables but here is a quick walkthrough:
- You start with the XML prologue and a table element pointing to the schema for YQL open tables. This allows YQL to validate your table.
- You add a meta element with information about your table: the author, the URL of your documentation and a sample query. The sample query is the most important here, as this is what will show up in the query box of the YQL console when people click on your table name. It is the first step to using your API, so make it worthwhile. Show the parameters you offer and how to use them. The {table} part will be replaced with the name of the table.
- The bindings element shows what the table is connected to and what keys are expected in a query.
- You define the path and the type of the output in the select element, using its itemPath and produces attributes. Values for the type are XML or JSON, and the path lets you return only a certain section of the data returned from the URL you access.
- In the urls section, you define the URL endpoints of your service. In our case, this is the parameterised URL from earlier. YQL replaces the elements in curly braces with the information provided by the YQL user.
- In the inputs section, you define all the possible keys the end users can or should provide. Each key has an id and a paramType, which is either path, if the parameter is a part of the URL path, or query, if it is to be added to the URL as a parameter. You define which keys are mandatory by setting the required attribute to true.
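To make the difference between path and query keys concrete, here is a sketch of how the Craigslist table's URL could be resolved for a given query. This is only a model of the behaviour described in the walkthrough; resolveUrl is a hypothetical function, and YQL's real implementation may differ in details such as encoding.

```javascript
// Key definitions mirroring the <inputs> section of the Craigslist table.
const keys = [
  { id: "location", paramType: "path", required: true },
  { id: "type", paramType: "path", required: true },
  { id: "query", paramType: "query", required: true },
];
const urlTemplate = "http://{location}.craigslist.org/search/{type}?format=rss";

// resolveUrl is a hypothetical stand-in for what YQL does internally.
function resolveUrl(template, keyDefs, values) {
  let url = template;
  const extraParams = [];
  for (const key of keyDefs) {
    const value = values[key.id];
    if (value === undefined) {
      if (key.required) {
        throw new Error("Missing required key: " + key.id);
      }
      continue;
    }
    if (key.paramType === "path") {
      // Path keys replace the matching {placeholder} in the URL.
      url = url.split("{" + key.id + "}").join(encodeURIComponent(value));
    } else if (key.paramType === "query") {
      // Query keys are appended to the URL as parameters.
      extraParams.push(key.id + "=" + encodeURIComponent(value));
    }
  }
  if (extraParams.length > 0) {
    url += (url.includes("?") ? "&" : "?") + extraParams.join("&");
  }
  return url;
}

console.log(resolveUrl(urlTemplate, keys, {
  location: "sfbay",
  type: "sss",
  query: "schwinn mountain bike",
}));
```

Leaving out any of the three keys throws an error here, which mirrors the effect of marking a key as required in the table definition.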
And that is it! By putting together this XML document, you have completed the first of three steps to make your web service part of the YQL infrastructure. The next step is to tell YQL where your web service definition is. Simply upload the file to a server, for example http://isithackday.com/craigslist.search.xml. You then point YQL to the service with the use command:
use "http://isithackday.com/craigslist.search.xml" as cl;
select * from cl where location="sfbay" and type="sss" and query="playstation"
You can try this out and you'll see that you now find playstations for sale in the San Francisco Bay Area. Neat, isn't it?
Logic as a Service
Sometimes you have no web service at all, and all you want to do is offer a certain logic to the world. I found myself doing this very thing the other day. What I wanted to know is the distance between two places on Earth. For this, I needed to find the latitude and longitude of the places and then do very clever calculations. As I am a lazy person, I built on work that other people have done for me. In order to find the latitude and longitude of a certain place on Earth you can use the Yahoo Geo APIs. In YQL, you can do this with:
select * from geo.places(1) where text="paris"
In order to find a function that calculates the distance between two places on Earth reliably, I spent a few minutes on Google and found Chris Veness' implementation of the "Vincenty Inverse Solution of Geodesics on the Ellipsoid".
YQL offers an executable block inside open tables which contains server-side JavaScript. Instead of simply returning the data from the service, you can use this to convert information before returning it. You can also do REST calls to other services and to YQL itself in these JavaScript blocks. And this is what I did:
<?xml version="1.0" encoding="UTF-8"?>
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
  <meta>
    <sampleQuery>select * from {table} where place1="london" and place2="paris"</sampleQuery>
    <author>Christian Heilmann</author>
    <documentationURL>http://isithackday.com/hacks/geo/distance/</documentationURL>
    <description>Gives you the distance of two places on earth in miles or kilometers</description>
  </meta>
  <bindings>
    <select itemPath="" produces="XML">
      <inputs>
        <key id='place1' type='xs:string' paramType='variable' required="true" />
        <key id='place2' type='xs:string' paramType='variable' required="true" />
      </inputs>
      <execute><![CDATA[
        // The Geo API returns namespaced XML, so declare the namespace first.
        default xml namespace = "http://where.yahooapis.com/v1/schema.rng";
        // Look up both places through YQL itself.
        var res = y.query("select * from geo.places(1) where text='" + place1 + "'").results;
        var res2 = y.query("select * from geo.places(1) where text='" + place2 + "'").results;
        var lat1 = res.place.centroid.latitude;
        var lon1 = res.place.centroid.longitude;
        var lat2 = res2.place.centroid.latitude;
        var lon2 = res2.place.centroid.longitude;
        var d = distVincenty(lat1, lon1, lat2, lon2);
        function distVincenty(lat1, lon1, lat2, lon2) {
          /* ... vincenty function... */
        }
        // Convert the result to kilometers, then to miles.
        d = d / 1000;
        var miles = Math.round(d / 1.609344);
        var kilometers = Math.round(d);
        // E4X literal: this XML is what YQL returns to the caller.
        response.object = <distance>
          <miles>{miles}</miles>
          <kilometers>{kilometers}</kilometers>
          {res.place}
          {res2.place}
        </distance>;
      ]]></execute>
    </select>
  </bindings>
</table>
- The meta element is the same as in any other open table.
- In the bindings we don't have a URL to point to, so we can omit the urls section. However, we now add an execute element, which ensures that the keys defined will be sent to the JavaScript defined in this block.
- As the Geo API of Yahoo returns namespaced XML, we need to tell the JavaScript which namespace that is.
- I execute two YQL queries from the script using the y.query() method, with the place1 and place2 parameters, to get the locations of the two places. The .results after the method call makes sure I get the results. I store them in res and res2 respectively.
- I then get the latitude and longitude for each of the results and call the distVincenty() method.
- I divide the result by 1000 to get the kilometers, and divide that by 1.609344 to get the miles.
- I end the script part by defining a response.object, which is what YQL will return. As this is server-side JavaScript with full E4X support, all I need to write is the XML I want to return, with the JavaScript variables I want to render out in curly braces.
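The open table leaves out the body of the Vincenty function, and this sketch does not try to reconstruct it. To show the shape of the calculation and the unit conversion, here is a runnable stand-in that uses the simpler haversine formula instead. Haversine treats the Earth as a sphere, so it is less accurate than Vincenty's ellipsoid solution; the coordinates below are approximate values typed in by hand, not results from the Geo API.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, spherical approximation
const KM_PER_MILE = 1.609344; // same factor as in the open table

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance in kilometers using the haversine formula.
// This is a swapped-in technique, NOT the Vincenty solution from the article.
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
    Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Mirrors the rounding and mile conversion done in the execute block.
function distanceSummary(lat1, lon1, lat2, lon2) {
  const km = haversineKm(lat1, lon1, lat2, lon2);
  return {
    kilometers: Math.round(km),
    miles: Math.round(km / KM_PER_MILE),
  };
}

// Approximate city-centre coordinates, for illustration only.
const london = { lat: 51.5074, lon: -0.1278 };
const paris = { lat: 48.8566, lon: 2.3522 };
console.log(distanceSummary(london.lat, london.lon, paris.lat, paris.lon));
```

For short hops like this one the two methods agree closely; where the remaining difference matters, the ellipsoid-based Vincenty solution is the better choice.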
Using this service and adding a bit of interface to it, I can now easily show the distance between Batman and Robin.
Using server-side JavaScript you can not only convert data but also easily offer a service that only consists of calculations - much like Google Calculator does.
Turning an Editable Data Set into a Web Service
What you really want to do in most cases, though, is to allow people to easily edit the data that drives the web service. Normally, we'd build a CMS, train people on it, and spend a lot of time getting the data from the CMS onto the web so it can be accessed through YQL. There is an easier way, though.
A few months ago, I released a web site called winterolympicsmedals.com which shows you all the information about the Winter Olympics over the years.
The data that drives the web site was released for free by The Guardian in the UK on their Data Blog as an Excel spreadsheet. In order to turn this into an editable data set, all I had to do was save a copy to my own Google Docs repository. You can reach that data here. Google Docs allows sharing of spreadsheets on the web, and by choosing "CSV" as the output format I get a URL that YQL can access.
Using YQL, you can then use that CSV as a data source:
select * from csv where url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv"
See the result of that in your own browser.
As you can see, the CSV table automatically adds rows and columns to the XML output. In order to make that a more useful and filter-able web service, you can provide a columns list to rename the resulting XML elements:
select * from csv where url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv" and columns="year,city,sport,discipline,country,event,gender,type"
See the renamed columns in your browser.
This allows you to filter the information, which is exactly what I did to build winterolympicsmedals.com. For example to get all the gold medals from 1924 you'd do the following:
select * from csv where url="http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv" and columns="year,city,sport,discipline,country,event,gender,type" and year="1924" and type="Gold"
See the gold medals of 1924 in your browser.
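If you build an interface on top of this data, you will likely assemble such statements in code rather than by hand. Here is a minimal sketch of a query builder for the csv table; buildCsvQuery and quote are made-up helper names. Because this article does not cover how YQL escapes quotes inside string values, the sketch simply rejects values that contain a double quote rather than guessing.

```javascript
// Wraps a value in double quotes for use in a YQL statement.
// Assumption: values never need to contain a double quote themselves.
function quote(value) {
  const text = String(value);
  if (text.includes('"')) {
    throw new Error("Double quotes are not supported in this sketch: " + text);
  }
  return '"' + text + '"';
}

// buildCsvQuery is a hypothetical helper that reproduces the statements above.
function buildCsvQuery(csvUrl, columns, filters = {}) {
  const clauses = [
    "url=" + quote(csvUrl),
    "columns=" + quote(columns.join(",")),
  ];
  for (const [column, value] of Object.entries(filters)) {
    // Only columns named in the columns list can be filtered on.
    if (!columns.includes(column)) {
      throw new Error("Unknown column: " + column);
    }
    clauses.push(column + "=" + quote(value));
  }
  return "select * from csv where " + clauses.join(" and ");
}

const medalsUrl = "http://spreadsheets.google.com/pub?key=0AhphLklK1Ve4dHBXRGtJWk1abGVRYVJFZjQ5M3YxSnc&hl=en&output=csv";
const medalColumns = ["year", "city", "sport", "discipline", "country", "event", "gender", "type"];

console.log(buildCsvQuery(medalsUrl, medalColumns, { year: "1924", type: "Gold" }));
```

Checking filter names against the column list catches typos before the statement is ever sent to YQL.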
So you can use the free storage of Google and the free web service infrastructure to convert free data into a web service. All you need to do is create a nice interface for it.
Adding your Service to YQL's Community Tables
Once you've defined your open table, you can use it by hosting it on your own server, or you can go all in and add it to the YQL tables repository on GitHub, which can be found at http://github.com/yql/yql-tables/. Extensive help on how to use Git and GitHub can be found in their help section.
If you send a request to the YQL team to pull from your repository, they'll test your table, and if all is fine with it, they'll move it over to http://datatables.org/, which is the source of the community tables in the YQL console.
This not only makes the life of other developers more interesting, but is also very good promotion for you. Instead of hoping to find developers to play with your data, you bring the data to where developers already look for it.
Advanced YQL Topics
This introduction can only scratch the surface of what you can do with YQL. If you check the documentation, you'll find that, in addition to these "read" open tables, you can also set up services that can be written to, and that YQL offers cloud storage of your information. Check the extensive YQL documentation for more.
Summary
Combining open systems like YQL and Google Docs, and some knowledge of XML and JavaScript, you can offer a web service to people in a matter of minutes. In any case, moving your development from accessing local files and databases to accessing services makes it much more versatile and allows you to switch providers any time in the future. With YQL, you can dip your toes into the water of web services without drowning as most of the tough work has already been done for you. Thanks for reading!
About the Author
Christian Heilmann is an international Developer Evangelist who works for Mozilla.