News publishing is an inherently ephemeral act. A big story will consume public attention for a day, a month, or a year, only to fade from memory as quickly as it erupted. But news coverage, aggregated over time, can provide a fascinating “first draft of history” — a narrative of events as they occurred. At The New York Times, we have an incredibly rich resource in our 162-year archive of Times reporting, and one of the areas we occasionally explore in the lab is how to harness that archive to create new kinds of experiences or tools.
Two years ago, I created Chronicle, a tool for graphing the usage of words and phrases in New York Times reporting. Inspired by my own love of language and history, it’s a fascinating way to see historical events, political shifts, cultural trends or stylistic tropes. Chronicle can reveal things like the rise of feminism, the evolution of cultural bêtes noires, or the moment we shifted from talking about the “greenhouse effect” to talking about “climate change”. The Times’ corpus is particularly interesting as a reflection of culture because our style guide carefully informs how our reporters use language to describe the world, which allows us to see those changes more clearly than if we were looking at a heterogeneous archive of text. More broadly, Chronicle acts as another example of “semantic listening” approaches we have been researching in the lab — methods for extracting useful semantic signals from streams as diverse as conversations, web browsing history, or in this case, a historic corpus of news coverage.
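Chronicle itself runs against the full Times archive, but the core computation it visualizes is easy to sketch. Assuming a simple list of (year, text) article records (a hypothetical stand-in for the real archive, not Chronicle’s actual data model), a minimal version might look like this:

```python
from collections import Counter

def usage_by_year(articles, phrase):
    """Count what fraction of each year's articles mention a phrase.

    `articles` is an iterable of (year, text) pairs, standing in for
    the real archive, which Chronicle queries at much larger scale.
    """
    mentions = Counter()
    totals = Counter()
    needle = phrase.lower()
    for year, text in articles:
        totals[year] += 1
        if needle in text.lower():
            mentions[year] += 1
    return {year: mentions[year] / totals[year] for year in totals}

# Example: compare two framings of the same phenomenon.
sample = [
    (1988, "Scientists warn the greenhouse effect may raise sea levels."),
    (2006, "New reports detail the accelerating effects of climate change."),
]
print(usage_by_year(sample, "greenhouse effect"))
print(usage_by_year(sample, "climate change"))
```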
Since its creation, Chronicle has been in use internally as a research tool, and has occasionally made its way into our report, most notably in Margaret Sullivan’s Election Day look at hot-button issues in past presidential elections. While we have made a multitude of discoveries using Chronicle within The Times, we want to see what our readers can unearth about our history as well. Today, Chronicle is open and available to the public. Go explore and tell us about your best finds by tweeting them to @nytlabs!
In the course of our work, we make a lot of small experiments, often in code. Sometimes we hit upon something that may not be a signal from the future, but is quite useful in the present. Vellum is one such project.
One of my primary uses for Twitter is to find interesting reading material: breaking news, long reads, research relevant to my work, or funny things about maps. However, Twitter’s interface treats commentary as primary and content as secondary, which can make it difficult to discover things to read if I’m mostly interested in that secondary content.
To address this use case, we created Vellum. Vellum acts as a reading list for your Twitter feed, finding all the links being shared by those you follow on Twitter and displaying each with its full title and description. This flips the Twitter model, treating the links as primary and the commentary as secondary (you can still see all the tweets about each link, but they are less prominent). Vellum puts a spotlight on content, making it easy to find what you should read next.
We also wanted to include signals about what might be most important to read right now, so links are ranked by how often they have been shared by those you follow on Twitter, allowing you to stay informed about the news your friends and colleagues are discussing most.
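The real service talks to the Twitter API and resolves shortened URLs, but the ranking idea can be sketched in a few lines of Python; the tweet structure and function name here are hypothetical, not Vellum’s actual internals:

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")

def rank_links(tweets):
    """Group tweets by the URLs they contain and rank URLs by share count.

    `tweets` is a list of dicts with hypothetical "user" and "text" keys;
    a real system would fetch these from the Twitter API and resolve
    shortened URLs before grouping.
    """
    sharers = defaultdict(set)      # url -> users who shared it
    commentary = defaultdict(list)  # url -> tweets mentioning it
    for tweet in tweets:
        for url in URL_RE.findall(tweet["text"]):
            sharers[url].add(tweet["user"])
            commentary[url].append(tweet["text"])
    # Most-shared links first: the content is primary, the tweets secondary.
    ranked = sorted(sharers, key=lambda url: len(sharers[url]), reverse=True)
    return [(url, len(sharers[url]), commentary[url]) for url in ranked]

tweets = [
    {"user": "ada", "text": "Great long read https://example.com/story"},
    {"user": "ben", "text": "Must read: https://example.com/story"},
    {"user": "cam", "text": "Funny map thing https://example.com/maps"},
]
print(rank_links(tweets))
```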
Vellum was built as a quick experiment, but as we and other groups within The New York Times have been using it over the past few months, it has proven to be an invaluable tool for using Twitter as a content discovery interface. So today we are opening up Vellum to the public. We hope you find it as useful as we have. Happy reading!
Check out Vellum now »
Note: This was also published as a guest post for the Superflux blog.
Earlier this year, I saw a video from the Consumer Electronics Show in which Whirlpool gave a demonstration of their new line of connected appliances: appliances that would purportedly engage in tightly choreographed routines to respond easily and seamlessly to the consumer’s every need. As I watched, it struck me how similar the notions were to the “kitchen of the future” touted by Walter Cronkite in this 1967 video. I began to wonder: was that future vision from nearly fifty years ago particularly prescient? Or are we, perhaps, continuing to model technological innovation on a set of values that hasn’t changed in decades?
When we look closely at the implicit values embedded in the vast majority of new consumer technologies, they speak to a particular kind of relationship we are expected to have with computational systems, a relationship that harkens back to mid-20th century visions of robot servants. These relationships are defined by efficiency, optimization, and apparent magic. Products and systems are designed to relieve users of a variety of everyday “burdens” — problems that are often prioritized according to what technology can solve rather than their significance or impact. And those systems are then assumed to “just work”, in the famous words of Apple. They are black boxes in which the consumer should never feel the need to look under the hood, to see or examine a system’s process, because it should be smart enough to always anticipate your needs.
So what’s wrong with this vision? Why wouldn’t I want things doing work for me? Why would I care to understand more about a system’s process when it just makes the right decisions for me?
We see a moment coming when the collection of endless streams of data is commonplace. As this transition accelerates, it is becoming increasingly apparent that our existing toolset for dealing with streams of data is lacking. Over the last 20 years we have invested heavily in tools that deal with tabulated data, from Excel, MySQL and MATLAB to Hadoop, R and Python+Numpy. These tools, when faced with a never-ending stream of data, fall short and diminish our creative potential.
In response to this shortfall we have created streamtools – a new, open-source project by The New York Times R&D Lab that provides a general-purpose, graphical tool for dealing with streams of data. streamtools offers a vocabulary of operations that can be connected together to create live data processing systems without the need for programming or complicated infrastructure. These systems are assembled using a visual interface that affords both immediate understanding and live manipulation of the system.
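streamtools provides these operations as visual blocks wired together in the browser. As a loose analogy rather than streamtools’ actual implementation, here is how the underlying pattern (small, reusable operations composed into a live pipeline) might look sketched with Python generators:

```python
import json
import time

# Each "block" is a generator that consumes one stream and yields another,
# so blocks can be wired together in any order, a rough textual analogy
# to connecting blocks in streamtools' visual interface.

def from_lines(lines):
    """Source block: parse a stream of JSON lines into messages."""
    for line in lines:
        yield json.loads(line)

def filter_block(stream, key, value):
    """Pass through only messages where message[key] == value."""
    for msg in stream:
        if msg.get(key) == value:
            yield msg

def map_block(stream, fn):
    """Apply a transformation to every message."""
    for msg in stream:
        yield fn(msg)

raw = ['{"city": "NYC", "temp": 71}', '{"city": "SF", "temp": 58}']
pipeline = map_block(
    filter_block(from_lines(raw), "city", "NYC"),
    lambda m: {**m, "seen_at": time.time()},
)
for message in pipeline:
    print(message)
```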
Noah, Matt and I just returned from Geneva, Switzerland, where we attended Lift 2014. Lift is a conference on technology, design and innovation that we have followed from afar for a while now: it is co-founded by Nicolas Nova of the Near Future Laboratory (whose work we all greatly admire) and always has a great lineup of speakers. This year, we not only had the opportunity to attend; we all led a workshop, and I presented a talk.
My talk was entitled “In the Loop: Designing Conversations with Algorithms”. In it, I shared signals we’re seeing that indicate a shifting relationship between people and algorithmic systems, discussed how those changes are at odds with some of the implicit ideas we’ve been building into innovation for decades, and advocated for a set of design principles that can help us create better interactions for the future: ones where we are empowered to engage in negotiations with the complex and increasingly pervasive systems around us. Full video of the talk (20 minutes) is below:
Our workshop was framed around the idea of “impulse response”, which refers to a means of sounding out the properties of an unknown system by sending a known signal into it. As increasing aspects of our lives are mediated by algorithmic systems, we adapt our behavior according to our understanding of how these systems sense, track, and analyze what we do. Some of these systems show us what they know and how they work; others may behave as black boxes, recording their observations and making inferences we don’t fully understand. As we learn more about how these systems work, what behaviors are emerging, or will emerge, to optimize or obscure our participation in them? We had participants design strategies, products, or other interventions that apply this idea to the systems around them. An overview video is below, and you can also check out Lift’s Storify of the workshop.
Understanding these emerging trends around how people engage with algorithmic systems deeply informs the work we do in the R&D Lab. As we design and build new kinds of interactions with information, we strive to make those interactions embody the values of transparency, agency and virtuosity in order to create compelling, satisfying and empowered experiences for our users now and in the future.
We’ve been using a new system in the Lab for a few months now, and it has really captured our imagination. The system, called “Curriculum,” is a real-time stream of topics from Lab members’ web browsing activity. So for example, right now, the latest topics in Curriculum are:
- “DHT humidity/temperature sensors”
- “3.3V i2c interface”
- “PIR thermometer device”
These topics are generated by a semantic analyzer that reads the content of each page and infers the topics the page is about (in this case, I was just researching some sensors for another project); a rough sketch of that kind of extraction follows the examples below. Of course, there’s a healthy amount of weird noise as well; here are some less-intelligible topics that were also browsed recently:
- “deepest thoughts”
- “G2108 G2110 G2111 G2112 G2113 G2116 G2124 G2125 G212C”
- “insane flow”
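The Lab’s actual semantic analyzer is a richer service than anything shown here, but the shape of the task (take a page’s text, return candidate topics) can be sketched with simple term counting. Everything in this sketch, including the stopword list and function name, is illustrative rather than how Curriculum really works:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}

def candidate_topics(page_text, n=3):
    """Return the n most frequent non-stopword terms as rough 'topics'.

    A crude stand-in for real semantic analysis, which also produces
    multi-word topics and, as the examples above show, its own noise.
    """
    words = re.findall(r"[a-z0-9/.-]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

print(candidate_topics("DHT humidity and temperature sensors on a 3.3V i2c interface"))
```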
Even with some noise, the feed is fascinating enough that all five Curriculum users have quickly made a habit of checking it regularly. Checking the feed is rewarding because it is always at least a little bit funny (the imperfections of semantic analysis make for some great robot poetry), and it often affords a deeper or more intimate view into what my colleagues are working on.
Click through for a more in-depth look at Curriculum and how it was designed.
As I was working on a new prototype last week, I once again found myself with a large set of URLs for which I needed to dynamically get some basic information: page titles, summaries, etc. Instead of writing yet another block of custom code for the project, as I have in the past, I went looking for an API or library I could use for this purpose. I found a couple of options, but most were external services my code would depend on, and those services either seemed poorly maintained or were paid products. So after a brief discussion with others in the lab to make sure this was something broadly useful (the answer was a resounding yes), I decided to write a simple Python module to get meta information from web pages.
The pageinfo module is very straightforward: you import it, pass it a URL, and it gives you back the following (where available):
- Page title
- Page description
- Twitter card data
- Facebook open graph data
Since this seems like a task lots of people need on a regular basis, I packaged up pageinfo; it is available to install via pip (details below). For those who may want to tweak or expand upon the concept, the code is all up on nytlabs’ github. Below are details on how to install and use pageinfo.
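As a rough illustration of what such a module does under the hood, here is a minimal sketch using requests and BeautifulSoup; the function name and parsing choices are assumptions for illustration, not pageinfo’s actual API:

```python
import requests
from bs4 import BeautifulSoup

def fetch_page_meta(url):
    """Fetch a page and pull out its basic meta information.

    A minimal sketch of the idea behind pageinfo; the real module's
    interface may differ. Returns whichever fields are available.
    """
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    info = {}
    if soup.title and soup.title.string:
        info["title"] = soup.title.string.strip()
    for tag in soup.find_all("meta"):
        name = tag.get("name") or tag.get("property") or ""
        content = tag.get("content")
        if not content:
            continue
        if name == "description":
            info["description"] = content                        # page description
        elif name.startswith("twitter:"):
            info.setdefault("twitter", {})[name] = content       # Twitter card data
        elif name.startswith("og:"):
            info.setdefault("opengraph", {})[name] = content     # Open Graph data
    return info

print(fetch_page_meta("https://www.nytimes.com"))
```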
There are a lot of approaches to naming projects: some groups like very literal names (the Very Large Array radio telescopes, for example), while others choose names that are evocative yet related (the Space Shuttles Endeavour and Discovery), and still others choose names that barely apply at all to the task at hand (Mac OS X Leopard, Snow Leopard, Lion, Mountain Lion, etc.). My team and I tend to land somewhere in the middle: evocative and fun. I think a good project name should bring to mind an image of the work and its possibilities, and give a group of developers and creators a quick shorthand for referring to a broader, multifaceted idea.
The task that usually triggers this discussion is the creation of a code repository. Our process gets to a development step pretty early; by that point we’ve typically had an idea, talked a little about the best way to elucidate it, and started some very rudimentary coding to prove it out. This is usually within hours, or at most a few days, of the initial concept coming into focus. It’s a chore to rename a repository if we get it wrong, but more to the point, giving a thing a name gives it a shape that’s hard to break out of once it’s been set. Here are some examples: