Wednesday, August 22, 2018

Citation.js: Showing Blogger Posts on a Different Site

I made a small client for Blogger that takes a tag and transforms it into its own little blog: citation.js.org/blog/?post=542…. No metadata though, as it’s all client-side.
— Lars Willighagen (@larswillighagen) August 6, 2018

I made a Material-themed page showing Citation.js blog posts from Blogger. It supports pagination, tags, search and links to individual posts. Since it’s a single, static page, I can’t support meta and link tags for metadata: that would require JavaScript, which indexers don’t run.

The great thing about the Blogger API is that you can generate feeds for single tags, like Citation.js, and search for tags and general queries within that tag. That’s what makes all this possible. The URL scheme is very simple:

# Tag feed
https://$BLOG.blogspot.com/feeds/posts/default/-/$TAG

# Tag-in-tag feed
https://$BLOG.blogspot.com/feeds/posts/default/-/$TAG/$OTHER_TAG

# Search-in-tag feed
https://$BLOG.blogspot.com/feeds/posts/default/-/$TAG?q=$QUERY

# Post
https://$BLOG.blogspot.com/feeds/posts/default/$POST
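As a sketch of how the scheme above translates into code, here is a small URL builder. The function name feedUrl and its option names are hypothetical; only the URL shapes, alt=json-in-script, callback and start-index come from this post. Values are percent-encoded so tags or queries with spaces or special characters don’t break the URL.

```javascript
// Hypothetical helper: builds a Blogger feed URL following the scheme above.
// options: { post, tags: [...], query, startIndex, callback }
function feedUrl (blog, options) {
  options = options || {}
  var url = 'https://' + blog + '.blogspot.com/feeds/posts/default'

  if (options.post) {
    // single post: /feeds/posts/default/$POST
    url += '/' + encodeURIComponent(options.post)
  } else if (options.tags && options.tags.length) {
    // tag feed, or tag-in-tag feed when more than one tag is given
    url += '/-/' + options.tags.map(encodeURIComponent).join('/')
  }

  var params = new URLSearchParams()
  // search within the tag (not applicable to single posts)
  if (options.query && !options.post) params.set('q', options.query)
  // JSON-in-script: the response is wrapped in a call to the named function
  params.set('alt', 'json-in-script')
  params.set('callback', options.callback || 'cb')
  // pagination: 1-based index of the first entry to return
  if (options.startIndex) params.set('start-index', String(options.startIndex))

  return url + '?' + params.toString()
}

// e.g. feedUrl('larsgw', { tags: ['Citation.js'], startIndex: 26 })
// → 'https://larsgw.blogspot.com/feeds/posts/default/-/Citation.js?alt=json-in-script&callback=cb&start-index=26'
```

Note that URLSearchParams encodes a space in the query as +, which is the usual form encoding for query strings.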

Pagination and response formats complicate things a little, and are dealt with in the code below.

Apart from the Material theme, it only uses vanilla JavaScript to generate the pages. The search bar doesn’t even use JavaScript at all, just good ol’ form semantics. The JavaScript it does use is fairly simple. First, the query is parsed and an API URL is generated.

window.onload = function () {
  // parse the query string, e.g. "?tag=BibJSON&start-index=26", into { tag: 'BibJSON', ... }
  // (values stay percent-encoded, since they are pasted into the feed URL as-is)
  var params = {}

  location.search.slice(1).split('&').forEach(function (pair) {
    pair = pair.split('=')
    params[pair[0]] = pair[1]
  })

  // JSON-in-script output, wrapped in a call to the global function cb
  var base = 'https://larsgw.blogspot.com/feeds/posts/default/'
  var format = 'alt=json-in-script&callback=cb'
  var url

  if (params.post) {
    // a single post
    url = base + params.post + '?' + format
  } else if (params.tag) {
    // tag-in-tag feed: posts with both the Citation.js tag and the requested tag
    url = base + '-/Citation.js/' + params.tag + '?' + format
  } else if (params.query) {
    // search within the Citation.js tag
    url = base + '-/Citation.js?q=' + params.query + '&' + format
  } else {
    // all Citation.js posts
    url = base + '-/Citation.js?' + format
  }

  // keep the pagination offset; every URL above already has a query string
  var startIndex = location.href.match(/start-index=(\d+)/)
  if (startIndex) {
    url += '&' + startIndex[0]
  }

  load(url)
}

Since the only JSON API for Blogger is JSON-in-script, the page loads the resource through a script element (loader, in load below). The loaded script then calls the callback, cb.
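JSON-in-script is the JSONP pattern: the response is itself a script that calls a global function with the data. The page uses one fixed script element and a single global cb, which is enough because every navigation reloads the page. As a sketch of the general technique, a helper that registers a uniquely named callback per request and cleans up afterwards could look like this; loadJsonp, and its doc parameter (only there so it can run outside a browser), are hypothetical.

```javascript
// Hypothetical generic JSONP helper: returns a Promise for the data passed to the callback.
var jsonpCounter = 0

function loadJsonp (url, doc) {
  doc = doc || document
  return new Promise(function (resolve, reject) {
    var name = 'jsonpCallback' + jsonpCounter++
    var script = doc.createElement('script')
    var src = new URL(url)

    // remove the global callback and the script element again
    function cleanup () {
      delete globalThis[name]
      if (script.parentNode) script.parentNode.removeChild(script)
    }

    // the response is a script calling name({...}), so the data ends up here
    globalThis[name] = function (data) {
      cleanup()
      resolve(data)
    }
    script.onerror = function () {
      cleanup()
      reject(new Error('Failed to load ' + url))
    }

    // replaces any callback parameter already in the URL
    src.searchParams.set('callback', name)
    script.src = src.toString()
    doc.head.appendChild(script)
  })
}
```

The page’s own callback, cb, which renders the result, follows.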

function cb (data) {
  // content, paginatePrev and paginateNext are page elements; find is a small util (not shown)
  // a feed (list of posts) or a single entry (one post)
  content.innerHTML = data.feed ? templates.feed(data.feed.entry) : templates.feedItem(data.entry)

  // pagination
  if (data.feed) {
    var href = location.href
    var hasIndex = href.indexOf('start-index') > -1
    var hasParams = href.indexOf('?') > -1
    var indexPattern = /start-index=(\d+)/

    // copy the start-index of a feed link onto the current page URL
    var pageUrl = function (link) {
      var index = 'start-index=' + link.href.match(indexPattern)[1]
      return hasIndex ? href.replace(indexPattern, index) : href + (hasParams ? '&' : '?') + index
    }

    var prev = find(data.feed.link, function (link) { return link.rel === 'previous' })
    if (prev) {
      paginatePrev.setAttribute('href', pageUrl(prev))
    }

    var next = find(data.feed.link, function (link) { return link.rel === 'next' })
    if (next) {
      paginateNext.setAttribute('href', pageUrl(next))
    }
  }
}

function load (url) {
  loader.setAttribute('src', url)
}
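The same pagination step could also be written with the URL API instead of string matching. This is a sketch, not the page’s code; pageLink is a hypothetical helper that copies start-index from a feed link (such as the API’s rel="next" link) onto the current page URL while keeping its other parameters.

```javascript
// Hypothetical helper: page URL for the feed page that feedLinkHref points to.
function pageLink (pageHref, feedLinkHref) {
  var startIndex = new URL(feedLinkHref).searchParams.get('start-index')
  var page = new URL(pageHref)
  if (startIndex === null) {
    // no offset in the feed link: go back to the first page
    page.searchParams.delete('start-index')
  } else {
    // add or overwrite the offset, keeping tag/query/post parameters
    page.searchParams.set('start-index', startIndex)
  }
  return page.toString()
}
```

One caveat with this design: setting a search parameter re-serializes the whole query string (a space encoded as %20 comes back as +), so the page’s own parsing would then need to decode values before pasting them into the feed URL path.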

The callback then fills the page using simple templates, which are just JS functions that take the API response and return HTML. Then, it works out the pagination links. Below is an example template. It extracts the post ID to make links and does some preprocessing: it strips the StackEdit metadata and styling, and demotes each heading two levels (h1 becomes h3; h5 and h6, which would go past h6, become bold text). Then, it puts together the HTML with some additional util functions and subtemplates.

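  feedItem: function (item) {
    // Blogger entry IDs end in ".post-<id>"; keep only the numeric post ID
    var id = item.id.$t.replace(/^.*\.post-(\d+)$/, '$1')
    var content = item.content.$t
      // keep only the HTML inside StackEdit's wrapper div, dropping its metadata and styles
      .replace(/^[\s\S]*<div class="stackedit__html">([\s\S]*)<\/div>[\s\S]*$/, '$1')
      // demote headings two levels; h5 and h6 would go past h6, so they become <b>
      .replace(/<(\/?)h([1-6])/g, function (match, slash, level) {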
        if (+level > 4) {
          return '<' + slash + 'b'
        } else {
          return '<' + slash + 'h' + (+level + 2)
        }
      })

    return '<div class="mdl-card mdl-shadow--2dp mdl-cell mdl-cell--12-col">' +
      '<div class="mdl-card__title">' +
        '<h2 class="mdl-card__title-text">' +
          '<a href="?post=' + id + '">' + item.title.$t + '</a>' +
        '</h2>' +
      '</div>' +
      '<div class="mdl-card__supporting-text mdl-card--border">' +
        '<p>' +
          '<span><i class="material-icons">edit</i> ' +
            templates.author(item.author[0]) +
          '</span>' +
          '<span><i class="material-icons">access_time</i> ' +
            formatDate(item.updated.$t) +
          '</span>' +
          '<span><i class="material-icons">link</i> <a href="' +
            canonical(item.link) +
          '">Original post</a></span>' +
        '</p>' +
        '<p>' +
          '<span><i class="material-icons">bookmark</i> ' +
            map(item.category, templates.tag).join(' ') +
          '</span>' +
        '</p>' +
      '</div>' +
      '<div class="mdl-card__supporting-text">' +
        content +
      '</div>' +
    '</div>'
  },
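The heading preprocessing in that template can be pulled out into a standalone function, which makes it easy to test on its own. This is a sketch: the name demoteHeadings and the levels parameter are additions for illustration, and the trailing \b keeps the pattern from matching anything but a complete h1 to h6 tag name.

```javascript
// Hypothetical standalone version of the heading demotion in feedItem.
// Headings pushed past h6 become <b>, like in the template above.
function demoteHeadings (html, levels) {
  return html.replace(/<(\/?)h([1-6])\b/g, function (match, slash, level) {
    var newLevel = Number(level) + levels
    return newLevel > 6
      ? '<' + slash + 'b'
      : '<' + slash + 'h' + newLevel
  })
}

// e.g. demoteHeadings('<h1>Title</h1><h5 id="x">Small</h5>', 2)
// → '<h3>Title</h3><b id="x">Small</b>'
```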

The full source is available here, and the page can be viewed here.

Blog screenshot

Saturday, August 11, 2018

Modern Altmetric badges

I recently found myself working with Altmetric badges again, and I realized how cumbersome it can be to work with scripts. The Altmetric badges can only be added by using their JavaScript library, while it would be a lot more user-friendly to have a simple URL that embeds the badge, preferably even an image. I may be a bit spoiled by the badge ecosystem of the open source community, including Shields.io. There, badges are dynamically generated on the server side.

Badges in open source JavaScript projects

Unfortunately, Shields doesn’t support Altmetric. It does, however, support dynamic badges, and Altmetric does have an API. The endpoint is api.altmetric.com/v1/doi/ for DOI-based access (which is what we want in this case). So the parameters needed for the badge are:

  • Data type: JSON (the output format of the API; this goes in the badge path, json.svg, rather than in a query parameter)
  • label: Altmetric
  • url: https://api.altmetric.com/v1/doi/<DOI>
  • query: $.score
  • style: social

The logo would be https://www.altmetric.com/wp-content/themes/altmetric/favicon.ico, but I can’t get that to work. The resulting URL is

https://img.shields.io/badge/dynamic/json.svg?url=https://api.altmetric.com/v1/doi/DOI&label=Altmetric&query=$.score&style=social

Which looks like this: Altmetric badge
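To generate that URL for any DOI, the Altmetric API URL should be percent-encoded as a whole, since it is itself the value of the url parameter. A minimal sketch, using only the endpoint and parameters listed above; altmetricBadgeUrl is a hypothetical name, and it assumes the DOI contains no characters that need escaping in a URL path (such as # or ?).

```javascript
// Hypothetical helper: Shields dynamic JSON badge showing the Altmetric score of a DOI.
function altmetricBadgeUrl (doi) {
  var params = new URLSearchParams({
    // the DOI keeps its "/" in the Altmetric path; the whole API URL is then encoded as a parameter
    url: 'https://api.altmetric.com/v1/doi/' + doi,
    label: 'Altmetric',
    query: '$.score',
    style: 'social'
  })
  return 'https://img.shields.io/badge/dynamic/json.svg?' + params.toString()
}
```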

Note, however, that the use of the Altmetric API is limited:

Free, rate-limited API

  • No key required.
  • Includes research object metadata and metrics only.
  • Available only for one-time, limited term research projects.
  • Best for small projects.
  • Rate limited to 1 call per second.

Wednesday, July 18, 2018

Citation.js: Use Case for a Wikidata GraphQL API

Citation.js has supported Wikidata input for a long time. However, I’ve always had some trouble with the API. See, when Citation.js processes Wikidata API output (which looks like this) and gets to, say, the P50 (author) property, it encounters this:

"P50": [
    {
        "mainsnak": {
            "snaktype": "value",
            "property": "P50",
            "hash": "1202966ec4cf715d3b9ff6faba202ac6c6ac3df8",
            "datavalue": {
                "value": {
                    "entity-type": "item",
                    "numeric-id": 2062803,
                    "id": "Q2062803"
                },
                "type": "wikibase-entityid"
            },
            "datatype": "wikibase-item"
        },
        "type": "statement",
        "id": "Q46601020$692cc18d-4f54-eb65-8f0a-2fbb696be564",
        "rank": "normal"
    }
]

The problem with this is that there’s no name string readily available: to get the name of this author, or of any author, journal, publisher, etcetera, Citation.js has to make extra requests to the API.
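One way to keep the number of extra requests down (a sketch, not what Citation.js does) is to first collect every item ID the claims refer to, and then fetch all their labels at once with the wbgetentities module of the Wikidata API, which accepts up to 50 IDs per request, separated by |. The helper names referencedItems and labelRequestUrls are hypothetical.

```javascript
// Hypothetical helper: every item ID referenced by a set of claims, without duplicates.
function referencedItems (claims) {
  var ids = new Set()
  Object.keys(claims).forEach(function (property) {
    claims[property].forEach(function (statement) {
      var datavalue = statement.mainsnak.datavalue
      // only "value" snaks have a datavalue; "somevalue"/"novalue" snaks don't
      if (datavalue && datavalue.type === 'wikibase-entityid') {
        ids.add(datavalue.value.id)
      }
    })
  })
  return Array.from(ids)
}

// Hypothetical helper: wbgetentities URLs fetching the labels of those items, 50 IDs at a time.
function labelRequestUrls (ids, language) {
  var urls = []
  for (var i = 0; i < ids.length; i += 50) {
    var params = new URLSearchParams({
      action: 'wbgetentities',
      ids: ids.slice(i, i + 50).join('|'),
      props: 'labels',
      languages: language,
      format: 'json',
      origin: '*' // allows anonymous cross-origin requests from a browser
    })
    urls.push('https://www.wikidata.org/w/api.php?' + params.toString())
  }
  return urls
}
```

This still costs one extra round trip per level of nesting, though: given and family names, for example, are items of their own.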

In the case of people, you could then just grab the label, but there’s also P735 (given name) and P734 (family name) in Wikidata. That saves some error-prone name parsing, you might think. However, this is what the API output looks like:

{
    "P735":[
        {
            "mainsnak":{
                "snaktype":"value",
                "property":"P735",
                "hash":"26c75e68a9844db73d0ff2e0da5652c5d571e46d",
                "datavalue":{
                    "value":{
                        "entity-type":"item",
                        "numeric-id":15635262,
                        "id":"Q15635262"
                    },
                    "type":"wikibase-entityid"
                },
                "datatype":"wikibase-item"
            },
            "type":"statement",
            "id":"Q22581$3554EADD-B8D8-4506-905B-014823ECC3EA",
            "rank":"normal"
        }
    ],
    "P734":[
        {
            "mainsnak":{
                "snaktype":"value",
                "property":"P734",
                "hash":"030e6786766f927e67ed52380f984be79d0f6111",
                "datavalue":{
                    "value":{
                        "entity-type":"item",
                        "numeric-id":41587275,
                        "id":"Q41587275"
                    },
                    "type":"wikibase-entityid"
                },
                "datatype":"wikibase-item"
            },
            "type":"statement",
            "id":"Q22581$598DF0D7-CEC7-470B-8D0F-DD320796BF01",
            "rank":"normal"
        }
    ]
}

That’s two more dead ends, and two more API calls (or one, with some effort). It would be great if it were possible to get all this data with a single API call. I think GraphQL would be a good option here: with GraphQL, you can specify exactly what data you want. I’m not the first one to think of this; in fact, a simple example is already implemented. This is what a query would look like, with the variables {"item": "Q30000000"} (try it online):

query ($item: ID!) {
  entry: item(id: $item) {
    # get every property
    # to get specific properties, use "statements(properties: [...])"
    claims: statements {
      mainsnak {
        ... on PropertyValueSnak {
          # get property id and English label
          property {
            id
            name: label(language: "en") {
              text
            }
          }
          # get value
          value {
            ... on MonolingualTextValue {
              value: text
            }
            ... on StringValue {
              value
            }
            # if value is an item, get the label too
            ... on Item {
              id
              label(language: "en") {
                text
              }
            }
            ... on QuantityValue {
              amount
              unit {
                value: label(language: "en") {
                  text
                }
              }
            }
            ... on TimeValue {
              value: time
            }
          }
        }
      }
    }
  }
}

Another handy thing is that the API output is basically the equivalent of the query in JSON, but with the data filled in. I think a GraphQL API would be really useful for this and similar use cases, and it definitely seems possible given the fact that there is an experimental API available.
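Because the response mirrors the query, turning it into something Citation.js-like is mostly a matter of walking the same shape. The sketch below is based only on the query above (data.entry.claims, each with a mainsnak holding property and value); claimsByProperty and simpleValue are hypothetical helpers, and the sample data in the test is made up.

```javascript
// Reduce one selected value to a string, following the fragments in the query above.
function simpleValue (value) {
  if (!value) return null
  if ('value' in value) return value.value // string, monolingual text, time
  if ('label' in value) return value.label ? value.label.text : value.id // item
  if ('amount' in value) {
    // quantity, with the unit label if there is one
    return value.unit && value.unit.value ? value.amount + ' ' + value.unit.value.text : value.amount
  }
  return null // value types the query doesn't select
}

// Hypothetical helper: { P50: ['author label', ...], ... } from the GraphQL response.
function claimsByProperty (response) {
  var result = {}
  response.data.entry.claims.forEach(function (claim) {
    var snak = claim.mainsnak
    if (!snak || !snak.property) return // not a PropertyValueSnak
    var value = simpleValue(snak.value)
    if (value === null) return
    var key = snak.property.id
    if (!result[key]) result[key] = []
    result[key].push(value)
  })
  return result
}
```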

Tuesday, July 17, 2018

Journal Metadata: Authors & Institutions

I finished the General Plugin system for Citation.js a few days ago (more on that later), so I could finally publish a new beta release. Now, after that half-finished piece of code had been blocking other work for a long while, I can at last start… fixing bugs, and closing other items in the backlog.

One of the items that has been on the backlog for a long time, and was on the backlog of the previous major version too, was sorting out BibJSON. BibJSON has been “supported” since before CSL-JSON was introduced as the internal standard, but under the name of ContentMine JSON, as I only knew it as the output of ContentMine’s quickscrape tool.

quickscrape output (source, license: MIT)

Since then, I have learned that it is actually a more standardised format, but I never got around to reading the standard and updating the parser. Today, however, I did. Turns out, it is something in between JSON-LD and BibTeX. While searching around for more comprehensive documentation, I came across the journal-scrapers (used by quickscrape) again, which I used to compile some test cases.

Unfortunately, one of the very first examples already went wrong. The meta tags containing the bibliographic data that quickscrape scrapes, specifically the data about the authors, are not structured in a machine-friendly way, in my opinion. They certainly give quickscrape trouble.

...
<meta name="citation_author" content="P. Pandikumar"/>
<meta name="citation_author_institution" content="Division of Ethnopharmacology, Entomology Research Institute, Loyola College, Chennai, India"/>
<meta name="citation_author" content="S. Ignacimuthu"/>
<meta name="citation_author_institution" content="Division of Ethnopharmacology, Entomology Research Institute, Loyola College, Chennai, India"/>
<meta name="citation_author_institution" content="International Scientific Partnership Programme, King Saud University, Riyadh, Saudi Arabia"/>
<meta name="citation_author" content="N. A. Al-Dhabi"/>
<meta name="citation_author_institution" content="Addiriyah Chair for Environmental Studies, College of Science, King Saud University, Riyadh, Saudi Arabia"/>
...
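Reading these tags correctly means relying on their order. A minimal sketch, assuming (as in the example, but documented nowhere that I know of) that each citation_author_institution belongs to the nearest citation_author before it. The input is a list of [name, content] pairs in document order; authorsWithInstitutions is a hypothetical name.

```javascript
// Hypothetical helper: group institutions under the author tag that precedes them.
// In a browser, the input could come from e.g.
//   Array.from(document.querySelectorAll('meta[name^="citation_author"]'), m => [m.name, m.content])
function authorsWithInstitutions (metaTags) {
  var authors = []
  metaTags.forEach(function (tag) {
    var name = tag[0]
    var content = tag[1]
    if (name === 'citation_author') {
      authors.push({ name: content, institutions: [] })
    } else if (name === 'citation_author_institution' && authors.length) {
      // an institution before any author can't be attributed, so it is skipped
      authors[authors.length - 1].institutions.push(content)
    }
  })
  return authors
}
```

For the example above, this gives S. Ignacimuthu two institutions and the other two authors one each.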

This particular example is from BioMed Central, but the same pattern appears throughout multiple journals: Nature (example), PLOS ONE (example), PeerJ (example), and probably many more, as these were just the first four I checked.

Prepend view-source: to those example URLs to quickly view the HTML source, with the meta tags.

The patterns are so similar (in the case of Nature and BMC, the authors even always come after a whole list of citation_reference tags) that, I thought, there must be some sort of library or service that generates them. This quest first led me to look up what kind of tags the citation_ tags are. The fact that the answer wasn’t easy to find, and the number of unanswered questions I found along the way, quickly made it clear what kind of quest this was going to be.

First of all, the tags: they’re called HighWire Press tags. Normally I would link a website, but I don’t think there is one. They’re Google Scholar’s preferred method of metadata tagging (its documentation lists 16 tags), and they’re also the preferred format of Mendeley, which points to the Google Scholar documentation. And yet the only thing I find when searching for a canonical list is people asking where that list could be, and getting no answers (1, 2).

Even with the 16-tag list at hand, each of the examples mentioned above contains at least two non-trivial tags that aren’t on that list (e.g. citation_reference and citation_author_institution). And again, those examples weren’t chosen deliberately; they were picked semi-randomly.

Luckily, I’m not the first one to run into this problem. Someone previously compiled a list of 39 citation_ tags based on observations, which is very useful if I want to write a crosswalk for Citation.js sometime, but doesn’t really help with finding a generator.
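Such a crosswalk could start out as a plain mapping from tag names to CSL-JSON fields. The sketch below covers only a handful of commonly seen citation_ tags, not the observed list of 39; the name toCsl, the article-journal type and the choice to keep author names as literal strings (splitting them into given and family names is the error-prone part) are all assumptions of this sketch.

```javascript
// Hypothetical crosswalk: [name, content] meta tag pairs → a CSL-JSON item.
function toCsl (metaTags) {
  var csl = { type: 'article-journal', author: [] }
  var firstPage, lastPage

  metaTags.forEach(function (tag) {
    var name = tag[0]
    var content = tag[1]
    switch (name) {
      case 'citation_title': csl.title = content; break
      case 'citation_journal_title': csl['container-title'] = content; break
      case 'citation_volume': csl.volume = content; break
      case 'citation_issue': csl.issue = content; break
      case 'citation_doi': csl.DOI = content; break
      case 'citation_firstpage': firstPage = content; break
      case 'citation_lastpage': lastPage = content; break
      // names kept as-is instead of being parsed into given/family
      case 'citation_author': csl.author.push({ literal: content }); break
      case 'citation_publication_date':
        // dates appear as e.g. 2013/05/14 or 2013-05-14; keep year, month, day
        csl.issued = { 'date-parts': [content.split(/[-\/]/).slice(0, 3).map(Number)] }
        break
    }
  })

  if (firstPage) csl.page = lastPage ? firstPage + '-' + lastPage : firstPage
  return csl
}
```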

Back to HighWire: they claim Nature is one of their customers. BMC and Springer Open, however, aren’t, and yet they share the same system, or a common standard that can’t be found anywhere else. That they share a system makes sense, but what system and/or standard are they using? I asked, and will report back when I get an answer.

Monday, April 30, 2018

Debugging: Deja Vu

I’m going to share some debugging stories with you, because I’m proud of them. After writing up the first one, I’m no longer particularly proud, but I still want to share it. Here’s the first: a bug that seemed quite familiar.

After some trouble with mocking API requests, I decided that supporting mocking in the browser isn’t as important as supporting mocking at all, so I installed mock-require (I didn’t get proxyquire to work). Now, to confirm that the test bundle script actually omitted the API mocking code from the bundle, I loaded the test suite in the browser. Guess what? Errors everywhere! Or rather, error everywhere. Every test case gave the same error:

TypeError: Cannot assign to read only property 'length' of function 'function () {
          old.apply(self, arguments);
        }'
    at new Assertion (/build/test.citation.js:1810:30)
    at new Assertion (/build/test.citation.js:1799:25)
    at new Assertion (/build/test.citation.js:1799:25)
    at new Assertion (/build/test.citation.js:1799:25)
    at expect (/build/test.citation.js:1775:12)
    at Context._callee2$ (/build/test.citation.js:3582:35)
    at tryCatch (/build/citation.js:7891:17)
    at Generator.invoke [as _invoke] (/build/citation.js:8064:22)
    at Generator.prototype.(anonymous function) [as next] (/build/citation.js:7934:21)
    at step (/build/test.citation.js:3542:221)

Naturally, like a good programmer, I immediately googled the error message in conjunction with the various tools and frameworks I used for this bundle, instead of looking at the stack trace showing that something’s wrong with expect.js. Anyway, after some time, I found a GitHub issue describing exactly the problem I was having. Scrolling through the responses, I was stunned:

larsgw commented on Jul 29, 2017
+1: Same issue, but for all assertions

Somehow, I had reported the same issue 10 months earlier, but the site, running a two-month-old bundle, didn’t have the problem. Apparently, I had fixed the issue before but forgotten about the solution, and I couldn’t figure out what that solution was. So I started debugging. First of all, the offending code wasn’t different from any other working bundle. It registered a bunch of functions as properties on a function, and it choked on the length property. That makes sense: the length property isn’t writable on functions. But running some simple test code showed this shouldn’t be the problem:

let f = function () {}
console.log(f.length) // 0
f.length = 3   // (no error)
console.log(f.length) // 0

Sure, it didn’t actually do anything, but it didn’t throw an exception either. Besides, these were the exact same lines of code as in the GitHub repo, so how could they be the issue? The only thing I could do was compare against the working examples: diffing my bundle against the published one, which was difficult because it’s generated code, and checking out commits until I found one that worked, which was a pain because I repeatedly forgot to reinstall dependencies, something that could be critical.

Sidebar (minor spoilers): the previous time I ‘solved’ the issue, it was much easier, because it was the first time setting the system up, so instead of referring to older commits as working examples, I looked at the docs.

After a long while, I realised what the earlier workaround was: instead of bundling expect.js with Browserify, I included it as a script (as recommended) and created a wrapper module standing in for expect.js, which simply grabbed and exported the global expect variable exposed by the script. I thought this was because the order of requiring the scripts mattered, but some testing proved this wasn’t the case. No: including expect.js in a Browserify bundle with the Babelify transform enabled, or even just running it through the Babel compiler, caused the error.

Back to diffing bundles, I guess: what are the differences between a Babel-compiled file and its source code if there isn’t really any syntax that needs to be transformed, or any APIs that need to be polyfilled?

Turns out, not much. Between those files, the only real differences were the location of comments (or, with comments: false, their existence) and some style differences caused by Babel’s code generator.

And 'use strict'.

Apparently, 'use strict' makes assignments that would otherwise fail silently throw an error. If I had read that documentation earlier, or if I had paid proper attention to the Function.prototype.length docs (linked above), I would have known. Now it’s just a boring ending to a long journey. But hey, at least I learned some stuff.
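A minimal illustration of the difference, runnable in Node.js or a browser console. The two helper names are made up for the demo; new Function is used for the non-strict version because a Function constructor body is only strict if it contains 'use strict' itself, regardless of the file around it.

```javascript
// Non-strict ("sloppy") code: the assignment to the read-only length is silently ignored.
const sloppyAssign = new Function('f', 'f.length = 3; return f.length')

// Strict code: the same assignment throws a TypeError.
function strictAssign (f) {
  'use strict'
  f.length = 3
  return f.length
}

const twoParams = function (a, b) {}
console.log(sloppyAssign(twoParams)) // 2

try {
  strictAssign(twoParams)
} catch (error) {
  console.log(error instanceof TypeError) // true
}
```

This is the situation in the bundle: the same expect.js code, but once compiled with 'use strict' on top, the silent no-op becomes the error from the stack trace.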


Solving this issue requires either a big change in the internals of a toolkit that hasn’t had an update in 4(!) years, or a workaround. Now that I know what is causing the issue, I don’t want to use the workaround of including two extra files anymore, but the other workaround turns out to be a problem in itself, involving outdated documentation and a bug in Babelify. More on that later.

On a related note: I’m working on a new release for Citation.js, improving the parsing plugin system. The API should be pretty stable now, apart from the namespaces being prone to change, so I might change the schedule to one update with all current API changes instead.

Sunday, February 4, 2018

Microscopic photography: Part 2

I promised more photos in the previous post, so here they are.

Penicillium:

Penicillium
Detail

Penicillium with conidiophores
With conidiophores

Penicillium (conidiophore)
Detail of conidiophores

More fungi:
Fungus cells

Cat brains:
Cat brain cells


This post is part of a series.