Measuring Innovativeness
Introduction
This blog post contains a literature review of studies that attempt to measure the innovativeness of various descriptions. Our strategy:
1. Uniqueness of embeddings compared to all other embeddings
2. Comparison of description embeddings with embedded patents/scientific text
3. Preprocess into concepts, classify descriptions according to those concepts, then score each concept as innovative or not, either empirically (by counting the first occurrence of the concept) or a priori.
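As a minimal sketch of strategy 1, assuming we already have description embeddings as plain numeric vectors (the embedding model and the `uniqueness_scores` helper below are my own illustration, not from any of the reviewed papers), a description can be scored by how dissimilar it is from its nearest neighbor:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def uniqueness_scores(embeddings):
    """Score each embedding by 1 minus its maximum cosine similarity
    to any other embedding: higher score = more unique description."""
    scores = []
    for i, e in enumerate(embeddings):
        sims = [cosine(e, other) for j, other in enumerate(embeddings) if j != i]
        scores.append(1.0 - max(sims))
    return scores
```

Using the maximum (rather than the mean) similarity makes the score robust to corpus size: a description is only "non-unique" if at least one close neighbor exists.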
Seminal Article
"Measuring Technological Innovation over the Long Run" by Bryan Kelly, Dimitris Papanikolaou, Amit Seru, and Matt Taddy. Published in American Economic Review: Insights (2021).
The study uses textual analysis on millions of patent documents.
It computes the textual overlap (similarity) of a patent with two distinct “corpora” of writings:
- Past Patents (Prior Art): To measure novelty. A patent is “novel” if it has low overlap with previous patents (it uses different words/concepts).
- Future Patents (Subsequent Work): To measure impact/influence. A patent is “impactful” if it has high overlap with future patents (subsequent inventors start using its language).
They define a “breakthrough innovation” as a patent that is textually distinct from the past but textually similar to the future. This effectively treats the “future corpus of patents” as the “innovative writings” that validate the original patent’s importance.
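A toy version of this "distinct from the past, similar to the future" score can be sketched as follows. Note the hedge: Kelly et al. use a retrospective TF-BIDF weighting scheme over millions of patents; here plain Jaccard word overlap stands in for their similarity measure, purely to show the structure of the metric:

```python
def jaccard(a, b):
    """Word-set overlap between two texts (stand-in for TF-BIDF similarity)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def breakthrough_score(patent, past, future):
    """Kelly et al.-style breakthrough score: average forward similarity
    divided by average backward similarity. High values mean the patent
    is textually distinct from prior art but echoed by later patents."""
    backward = sum(jaccard(patent, p) for p in past) / len(past)
    forward = sum(jaccard(patent, f) for f in future) / len(future)
    return forward / backward if backward else float("inf")
```

A score above 1 means the patent shares more language with its successors than with its predecessors, which is exactly the breakthrough pattern the paper looks for.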
Other Similar Studies
- "New Ideas in Invention" (Packalen & Bhattacharya, 2015/2020)
- Method: They index every 1-3 word phrase (“concepts”) in the entire patent corpus.
- Innovation Measure: They determine the “age” of a patent’s ideas based on the first appearance of those concepts in the corpus. A patent is considered more innovative if it uses “newer” concepts (words that just appeared in the “innovative” lexicon).
- Something we could also use: first have an LLM identify "concepts" in each description, then apply this method.
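The concept-age idea above reduces to two small steps, sketched here under the assumption that concepts have already been extracted (by n-gram indexing as in the paper, or by an LLM as proposed; the five-year "recent" window is an arbitrary choice for illustration):

```python
def first_occurrence_years(corpus):
    """Map each concept to the year it first appears.
    `corpus` is a list of (year, [concepts]) records."""
    first = {}
    for year, concepts in sorted(corpus, key=lambda rec: rec[0]):
        for c in concepts:
            first.setdefault(c, year)
    return first

def novelty_score(year, concepts, first):
    """Packalen & Bhattacharya-style novelty: the share of a document's
    concepts first coined within the last 5 years (unseen concepts count
    as brand new)."""
    recent = sum(1 for c in concepts if year - first.get(c, year) <= 5)
    return recent / len(concepts) if concepts else 0.0
```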
- “A Text-Based Analysis of Corporate Innovation” (Bellstam, Bhagat, & Cookson, 2021)
- Method: This study flips the direction. It measures the innovation of firms (not just patents) by comparing the text of Financial Analyst Reports to the text of Patent Grants.
- Connection: This is likely the study to look at for comparing "writings" (analyst reports) against "patents" to measure innovation.
- Studies Linking Patents to Scientific Literature (Science-Technology Linkage)
- Example: "Measuring Science and Innovation Linkage Using Text Mining" (Motohashi, 2018) or "The Dual Frontier" (Ahmadpoor & Jones, 2017).
- Method: These studies measure how “scientific” or “novel” a patent is by computing the text overlap or citation distance between the patent and a corpus of scientific research papers (e.g., Web of Science, PubMed). If the patent overlaps heavily with basic science journals, it is often considered more “radical” or “science-based.”
- ["The Diffusion of Disruptive Technologies"](https://www.nber.org/papers/w28999) (Nicholas Bloom, Tarek Hassan, Aakash Kalyani, Josh Lerner, and Ahmed Tahoun; NBER Working Paper (2021) / Quarterly Journal of Economics (2025))
- Corpus: They use three distinct text corpora: Patents (technical invention), Earnings Call Transcripts (corporate strategy discussions), and Job Postings (labor demand).
- Method: They first identify novel technical terms (unique bigrams like “touch screen” or “cloud computing”) that appear in patent documents.
- Comparison: They measure the “disruptiveness” and diffusion of these technologies by tracking when these patent-derived terms start appearing in Earnings Calls (implying the technology is now commercially relevant to CEOs) and Job Postings (implying the technology is reshaping the workforce).
- Why it fits: It validates the “innovativeness” of a patent not just by looking at other patents, but by checking if the “writings” of corporate executives and HR managers begin to overlap with the technical language of the patent.
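The diffusion-tracking step can be sketched as a first-appearance lookup across corpora. This is my own simplification (the paper works with millions of documents and normalized term frequencies); each corpus is assumed to be a list of `(year, text)` pairs:

```python
def diffusion_lags(term, patents, earnings_calls, job_postings):
    """For a patent-derived bigram, find the year it first appears in each
    corpus and return the lag (in years) relative to its patent debut."""
    def debut(corpus):
        years = [year for year, text in corpus if term in text]
        return min(years) if years else None

    t0 = debut(patents)
    lags = {}
    for name, corpus in (("earnings", earnings_calls), ("jobs", job_postings)):
        d = debut(corpus)
        lags[name] = d - t0 if d is not None and t0 is not None else None
    return lags
```

Large positive lags mean the technology took years to travel from patents to boardrooms and hiring, which is the diffusion pattern the paper measures.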
- "The Impact of Artificial Intelligence on the Labor Market" (Michael Webb; Stanford University Working Paper (2020), widely cited in top journals such as the AER)
- Corpus: Patents (specifically AI-related patents) and O*NET job descriptions (a government corpus describing the tasks involved in every occupation).
- Method: He extracts "verb-object" pairs from the patent text (e.g., "diagnose disease," "recognize image") to capture what the technology does, and does the same for job descriptions to capture what workers do.
- Comparison: He computes the overlap (quantified via text similarity) between the patent tasks and the job tasks.
- Why it fits: It classifies the impact of an innovation (specifically AI) by measuring how much its text overlaps with the "corpus of work" (job descriptions). A high overlap implies the patent is innovative enough to potentially automate or augment human labor.
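Once the verb-object pairs are in hand, the comparison itself is a set-overlap computation. The extraction step is the hard part (Webb uses dependency parsing); the sketch below assumes pairs are already extracted and simply measures what share of a job's tasks appear among patent tasks:

```python
def task_overlap(patent_pairs, job_pairs):
    """Webb-style exposure score: the fraction of a job's verb-object task
    pairs that also appear among patent task pairs. Pairs are assumed to be
    pre-extracted, e.g. ("diagnose", "disease")."""
    patent_set = set(patent_pairs)
    hits = sum(1 for pair in job_pairs if pair in patent_set)
    return hits / len(job_pairs) if job_pairs else 0.0
```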
- ["The Dual Frontier: Patented Inventions and Prior Scientific Advance"](https://www.science.org/doi/10.1126/science.aam9527) (Mohammad Ahmadpoor and Benjamin F. Jones; Science, 2017; highly influential in the economics of innovation)
- Corpus: USPTO Patents and the Web of Science (millions of academic/scientific research articles).
- Method: The study links patents to scientific articles primarily through citations, effectively mapping the "distance" between a patent and the corpus of basic science.
- Comparison: A patent is considered more "science-based" (and often more fundamental/innovative) if it has a direct overlap (citation) with the scientific corpus. They calculate a "distance to science" metric for every patent.
- Why it fits: It measures the nature of the innovation by comparing the patent against the "corpus of very innovative writings" (academic science) rather than just other commercial patents.
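The "distance to science" metric is a shortest-path length over the citation graph, which a breadth-first search captures directly. A minimal sketch, assuming the citation graph is given as a dict from each document to the documents it cites:

```python
from collections import deque

def distance_to_science(start, cites, is_science):
    """Ahmadpoor & Jones-style 'distance to science': the length of the
    shortest citation path from a patent to any scientific article,
    or None if no such path exists (BFS over citation links)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if is_science(node):
            return depth
        for cited in cites.get(node, []):
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, depth + 1))
    return None
```

A distance of 1 means the patent cites science directly; larger distances mean it builds on science only indirectly, through chains of intermediate patents.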
- "Text-Based Network Industries and Endogenous Product Differentiation" (Gerard Hoberg and Gordon Phillips; Journal of Political Economy, 2016)
- Corpus: 10-K filings (business descriptions filed by US public firms).
- Method: They compute the textual cosine similarity between every pair of firms in the US economy based on their product descriptions.
- Comparison: Instead of looking for overlap to prove "fit," they look for lack of overlap to measure differentiation. A firm is considered to have a unique or innovative product positioning if its text is distinct from the "corpus" of its competitors' descriptions.
- Why it fits: It uses text overlap (or the lack thereof) to classify the economic position and innovativeness of a firm's products relative to the market.
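The Hoberg-Phillips comparison boils down to pairwise cosine similarity over bag-of-words vectors of product descriptions. A minimal sketch (raw word counts stand in for the vocabulary filtering and normalization the paper applies; `differentiation` is my own wrapper, not their exact metric):

```python
from collections import Counter
from math import sqrt

def bow_cosine(text_a, text_b):
    """Cosine similarity of bag-of-words count vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def differentiation(firm_text, rival_texts):
    """Hoberg-Phillips-style product differentiation: one minus the firm's
    highest cosine similarity to any rival's description."""
    return 1.0 - max(bow_cosine(firm_text, r) for r in rival_texts)
```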