Sunday, October 9, 2016

Weekly Report 10: Visualising facts and asking questions

Earlier this week, tarrow published factvis, short for fact visualisation. I decided to have a go at the design, and I made this, in the style of cardlists. Note: if my version and tarrow's version of factvis look very similar, my changes have probably been merged into the master branch already.

Screenshot of my factvis design

The facts being visualised come from the ContentMine. It publishes facts about things related to zika, extracted from papers, on Zenodo. A fact has the following structure:

{
  "_index": "facts",
  "_type": "snippet",
  "_id": "AVdDntnH_8VqgcuJwvpW",
  "_score": 1,
  "_source": {
    "prefix": "icle-title>Mosquitos (Diptera: Culicidae) del ",
    "post": "</article-title><source>Entomol Vect</source><y",
    "term": "Uruguay",
    "documentID": "AVdDnq-oJ9hGurOzZIZE",
    "cprojectID": ["PMC4735964"],
    "identifiers": {
      "contentmine": "CM.wikidatacountry8",
      "wikidata": "Q77"
    }
  }
}

As you can see, it has a fact ID (_id), and next to it the actual fact (_source). The fact consists of the found term ("Uruguay"), the text before and after the term (prefix and post), the document it was found in (documentID and cprojectID), and identifiers, saying what the term actually means. The identifiers are a ContentMine ID and a Wikidata Entity ID.
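To make the structure a bit more concrete, here is a minimal Python sketch that reads the fact shown above and pulls out the parts described. The function name summarise_fact and the keys of the dictionary it returns are made up for this example; they are not part of factvis or the ContentMine tools. The only outside assumption is the usual Wikidata page URL pattern.

```python
import json

# The example fact from above, as a JSON string.
FACT_JSON = """
{
  "_index": "facts",
  "_type": "snippet",
  "_id": "AVdDntnH_8VqgcuJwvpW",
  "_score": 1,
  "_source": {
    "prefix": "icle-title>Mosquitos (Diptera: Culicidae) del ",
    "post": "</article-title><source>Entomol Vect</source><y",
    "term": "Uruguay",
    "documentID": "AVdDnq-oJ9hGurOzZIZE",
    "cprojectID": ["PMC4735964"],
    "identifiers": {
      "contentmine": "CM.wikidatacountry8",
      "wikidata": "Q77"
    }
  }
}
"""


def summarise_fact(raw):
    """Return the interesting parts of one fact (hypothetical helper)."""
    fact = json.loads(raw)
    source = fact["_source"]
    identifiers = source.get("identifiers", {})
    wikidata_id = identifiers.get("wikidata")
    return {
        "fact_id": fact["_id"],
        "term": source["term"],
        # Glue prefix, term and post back together to see the term in context.
        "context": source["prefix"] + source["term"] + source["post"],
        "papers": source.get("cprojectID", []),
        # Assumption: Wikidata entity pages live at /wiki/<entity ID>.
        "wikidata_url": (
            "https://www.wikidata.org/wiki/" + wikidata_id
            if wikidata_id
            else None
        ),
    }


summary = summarise_fact(FACT_JSON)
print(summary["term"], summary["wikidata_url"])
```

Note that the context is a raw slice of the paper's XML, so it still contains (partial) tags; a visualisation would probably want to strip those before showing the snippet.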

That's all it is, for now. Still, it's pretty cool to distinguish special words and abbreviations from normal ones, and to link them with established identifiers like those from Wikidata.

Conifers

The second topic today is asking biological questions about conifers. Now that I know most parts of the ContentMine pipeline with all its extensions, I can start to think of what I want to learn about conifers with it. The first questions are simple ones, or at least ones with simple facts as answers. Take "What height does a grown Pinus sylvestris normally have?". I know the answer is the value of the property height of the tree, and that the value is measured in some length unit.

Now all I have to do is search for the answer. Not that easy, but doable. First, I see if there actually are papers about the height of trees under normal conditions. So let's search EUPMC with the following query:

"Pinus sylvestris"[Abstract] AND height

With this, it searches for articles with the exact text "Pinus sylvestris" in the abstract, and with the word "height" anywhere in the article. At first sight, it is a bit unclear whether the first article found has an interesting answer, so let's move on to the second one. Remember, we are only taking a peek at what's inside. The second article, however, looks more promising. The first table already contains exactly what we're looking for, and more than that. Apart from the height of Pinus sylvestris it also has the diameter, and all this for two other conifers as well.

The same goes for the third article. While the first table hasn't got height data, it does have the diameter of several species in separate age groups, not to mention properties I hadn't even thought of, like bark crevice depth and canopy cover.

(I tweeted about the fourth one, as there were some funny stylesheet issues)

And if only three papers yield so much, imagine what can be done with more papers. The search I showed had 78 results, and when combined with searches for all the other species, there should be hundreds of articles with answers to just one simple question. And with the ContentMine, I can "read" all those articles, and collect and summarise all these facts, in a matter of hours. Of course, I'll need to make some specialised programs to perform exactly what I want, so that's what I'm going to do over the next months.
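As a very small first step in that direction, here is a sketch of how the search from above could be turned into a request URL for the Europe PMC REST search service. The endpoint and the parameter names (query, format, pageSize) are what I believe that service uses, so treat them as assumptions, and build_search_url is just a name made up for this example. It is also an assumption that the REST service reads the [Abstract] field tag the same way the website search does. The code only builds the URL; it does not send anything.

```python
from urllib.parse import urlencode

# Assumption: the Europe PMC REST search endpoint.
EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL for a query string (hypothetical helper)."""
    params = {
        "query": query,         # the query exactly as typed in the search box
        "format": "json",       # ask for JSON instead of the default XML
        "pageSize": page_size,  # number of results per page
    }
    # urlencode percent-encodes quotes and brackets, and turns spaces into "+".
    return EUPMC_SEARCH + "?" + urlencode(params)


# The query used in this post.
QUERY = '"Pinus sylvestris"[Abstract] AND height'
url = build_search_url(QUERY)
print(url)
```

Swapping "Pinus sylvestris" for other species names in QUERY would give the combined set of searches mentioned above.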
