Does a change in news sentiment predict a change in the stock price?

This is the holy grail question of algorithmic trading. As an engineer moving into the AI space, I wanted to test this empirically. My initial plan was simple: build a pipeline, plug in the industry-standard Financial BERT model (“FinBERT”), and watch the insights roll in.

But before deploying this to production, I decided to run a stress test in my personal R&D sandbox. I called it “The Reality Check.”

The results forced me to rethink my entire architecture.

The Hypothesis: “One Model Fits All”

In the world of Financial NLP, models like FinBERT (ProsusAI) are the gold standard. They are pre-trained on massive corpora of financial news, earnings calls, and analyst reports.

My hypothesis was straightforward: If a model is trained on “financial language,” it should work equally well on a Bloomberg headline and a Reddit thread.

To test this, I built a benchmarking framework in Python to pit 5 different models against two very different datasets:

  1. Formal News: Financial Phrasebank (Clean, editorialized text).
  2. Social Media: Twitter Financial News (Messy, sarcastic, slang-heavy).

The Experiment

I used the Hugging Face transformers library to load a diverse collection of models, ranging from specialized financial experts to generalist transformers:

  • ProsusAI/finbert (The Banker)
  • cardiffnlp/twitter-roberta (The Socialite)
  • distilbert-base-uncased (The Generalist)

The challenge wasn’t just running the models; it was normalizing them. Some models output [Positive, Negative, Neutral], others output [Label_0, Label_1]. I wrote a normalization engine to map every output to a standard schema so I could compare apples to apples.
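
A normalization step like this can be sketched in a few lines of Python. This is a minimal illustration, not the actual engine from the benchmark: the scheme names (`named`, `indexed`), the three-class target schema, and the index-to-sentiment order are all assumptions, and the real order has to be checked against each model's card.

```python
# Hypothetical label maps. The index-to-sentiment order below is an assumption;
# verify it against each model's documentation before relying on it.
LABEL_MAPS = {
    "named": {
        "negative": "negative",
        "neutral": "neutral",
        "positive": "positive",
    },
    "indexed": {
        "label_0": "negative",
        "label_1": "neutral",
        "label_2": "positive",
    },
}


def normalize_label(raw_label: str, scheme: str) -> str:
    """Map a model-specific label onto the shared lowercase schema."""
    key = raw_label.strip().lower()
    try:
        return LABEL_MAPS[scheme][key]
    except KeyError:
        # Fail loudly: a silently mis-mapped label would corrupt the benchmark.
        raise ValueError(
            f"Unknown label {raw_label!r} for scheme {scheme!r}"
        ) from None
```

Raising on unknown labels, rather than defaulting to neutral, keeps a mapping mistake from quietly distorting the accuracy numbers.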

The Result: The Accuracy Gap

When I visualized the data using Plotly, the “One Model Fits All” hypothesis fell apart.

  • On Formal News: FinBERT was a genius. It achieved ~97% accuracy, correctly identifying that “Profit rose by 5%” is positive.
  • On Social Media: FinBERT crashed. Its accuracy dropped to ~30%.
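
For reference, per-dataset accuracy of this kind is just the share of predictions that match the gold label within each dataset. The sketch below assumes predictions have already been normalized to a shared schema; the function name and the toy rows are illustrative only and are not the benchmark's data.

```python
from collections import defaultdict


def accuracy_by_dataset(records):
    """Return {dataset: accuracy} for (dataset, predicted, expected) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dataset, predicted, expected in records:
        total[dataset] += 1
        if predicted == expected:
            correct[dataset] += 1
    return {name: correct[name] / total[name] for name in total}


# Toy rows for illustration only, not real benchmark results.
toy_records = [
    ("news", "positive", "positive"),
    ("news", "negative", "negative"),
    ("social", "negative", "positive"),
    ("social", "neutral", "neutral"),
]
scores = accuracy_by_dataset(toy_records)  # {'news': 1.0, 'social': 0.5}
```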

Why? Because FinBERT doesn’t speak “Internet.”

When a user on Twitter says, “My portfolio is bleeding but I have diamond hands 💎,” a traditional financial model sees the word “bleeding” and predicts Negative. But any crypto trader knows “diamond hands” implies a stubborn, bullish conviction (Positive).

The Lesson: We Need an Ensemble

This experiment proved that in AI engineering, domain expertise is not enough; context expertise matters.

A model trained on the Wall Street Journal cannot navigate r/WallStreetBets. This “Reality Check” saved me from deploying a flawed system that would have misread 50% of the market’s signals.

What’s Next?

The failure of the single-model approach led me to design a Multi-Agent Quorum. Instead of relying on one brain, I am now building an architecture where:

  1. “The Banker” Agent handles news.
  2. “The Socialite” Agent handles tweets.
  3. A “Meta-Agent” resolves the conflicts.
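
How the Meta-Agent resolves conflicts is still being designed, so the following is only one possible rule, sketched under stated assumptions: each agent returns a label plus a confidence score, unanimous votes pass through, and disagreements go to the most confident agent. The `Vote` type and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    agent: str         # e.g. "banker" or "socialite" (hypothetical names)
    label: str         # normalized sentiment label
    confidence: float  # assumed to be comparable across agents


def resolve(votes):
    """Return the agreed label, or the most confident agent's label on conflict."""
    if not votes:
        raise ValueError("at least one vote is required")
    labels = {vote.label for vote in votes}
    if len(labels) == 1:
        return votes[0].label
    # Assumption: raw confidences are calibrated well enough to compare.
    return max(votes, key=lambda vote: vote.confidence).label
```

A real meta-agent would probably also weigh the text's source, for example trusting the Socialite more on tweets, which this sketch deliberately leaves out.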

You can check out the code for this benchmark and follow the development of the Agentic Quorum (see 02_Agent_Quorum_POC.ipynb) in my GitHub repository.

A Philosophy of Software Design, by John Ousterhout, is a great read for anyone who wants to understand what actually causes systems to become complex and, in turn, how to improve their own designs. On page 169, the author notes that the book is about one thing: complexity, how it arises and how to avoid it.
Increased complexity in a codebase makes it difficult to make changes without breaking features. It also makes it difficult to understand all the moving parts and how they work together.
The book identifies two main causes of complexity, dependencies and obscurity, and goes into detail about how to minimize or isolate them.
Another great point that I learned from this book is that development is not about building features but about building abstractions. Building the right abstractions also makes systems scalable, changeable, and obvious.
This book is very easy to read and understand. The author explains the concepts in a simple and direct way, and at only about 170 pages it might take you just a week to read. So, give it a try!

Asynchronous calls allow client applications to react to changes on the server without disrupting the user’s experience and without requiring the user to interact with the interface to receive those updates.

They allow the system to process the result of a request as soon as the information is received. The application does not lock up while it waits, because the code that handles the result is deferred until the response arrives.

Two ways to perform requests asynchronously in JavaScript are callbacks and promises. Note that the two are not interchangeable: for a given call, you use either promises or callbacks, not both.

JavaScript promises vs. callbacks: which is better? Let’s discuss.


I have been spending a large portion of my time reviewing my design for an application I’m working on… and part of it involves deciding whether what I wrote made sense.
When I was reading through my code, I saw that certain portions had comments. They seemed innocent enough, basically explaining my thought process and the purpose of a given method or the next few lines of code. I thought that I was providing good information to whoever would need to read it (including myself).
However, as I’ve been learning more about good software development practices, I realized that I had written those comments purely because the code itself wasn’t intuitive. Without the comments, the lines of code were unclear. I wasn’t quite sure why I did what I did, especially since it had been weeks since I first wrote it.

So, I found the following two actions to greatly help improve the readability of my code.


I watched a fantastic talk from J.B. Rainsberger called “Integrated Tests are a Scam.” It was an insightful discussion about the types of tests that we write during software development and the types of tests that actually help you ensure proper code coverage.


Software engineering can be incredibly complex. There are a variety of tools, software patterns, architectural decisions, and process flows. This can be daunting for a new engineer who wants to make an impact. It can help to take a step back and look at the bigger picture. Building software isn’t just about writing code.

Here, I write about simple ways to make an impact, even when you are starting out in your software development career.


Unit testing is a great idea. It provides code coverage, serves as documentation, and, paired with TDD, is a vehicle for good design. There are a lot of articles and blogs about why unit tests are important; however, it’s hard to know how to write good ones. This blog will talk about how to build a suite of robust unit tests that still allow for refactoring.

Here are some tips and tricks that have allowed me to leverage the value of unit tests while still having the flexibility to refactor during the development process.


Dependency injection is a form of inversion of control. This means that a component’s dependencies are not part of its internals; instead, they are defined at the public-facing seam, or interface, of the component.

Dependency injection allows for a decoupled design and makes testing easier. It fosters clear boundaries between components and allows for the simple substitution of one concrete implementation of a dependency with another concrete implementation. It also allows for substitution of a dependency with a mock, or fake, during testing.
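
The idea is not specific to any framework. As a minimal, language-agnostic illustration (written in Python rather than React, with hypothetical names such as `Clock`, `Greeter`, and `FixedClock`), the component below receives its dependency through its constructor, so a test can swap in a fake:

```python
from datetime import datetime
from typing import Protocol


class Clock(Protocol):
    """The seam: anything with now_hour() can be injected."""

    def now_hour(self) -> int: ...


class SystemClock:
    """Production implementation backed by the real system time."""

    def now_hour(self) -> int:
        return datetime.now().hour


class FixedClock:
    """Test fake that always reports the same hour."""

    def __init__(self, hour: int) -> None:
        self._hour = hour

    def now_hour(self) -> int:
        return self._hour


class Greeter:
    """Depends on a Clock supplied from outside, not created internally."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def greeting(self) -> str:
        return "Good morning" if self._clock.now_hour() < 12 else "Good afternoon"
```

In production code one would construct `Greeter(SystemClock())`; in a test, `Greeter(FixedClock(9))` makes the result deterministic.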

This blog details some techniques for using dependency injection in React.


Yup is a schema validation library. One would use it to add property validations that execute prior to form submission. It is pretty easy to use; it’s just difficult to find concrete examples to model your own code on.

Because good examples that showcase its syntax are hard to find, I’ve compiled some common ones in JavaScript.
