
Data Science in Finance

Opportunities in Finance Data Science

The Promise of Big Data

There has been an explosion in the velocity, variety and volume of financial data. Social media activity, mobile interactions, server logs, real-time market feeds, customer service records, transaction details, information from existing databases – there’s no end to the flood.

To make sense of these giant data sets, companies are increasingly turning to data scientists for answers. These numbers gurus are:

  • Capturing and analyzing new sources of data, building predictive models and running live simulations of market events
  • Using technologies such as Hadoop, NoSQL and Storm to tap into non-traditional data sets (e.g., geolocation, sentiment data) and integrate them with more traditional numbers (e.g., trade data)
  • Finding and storing increasingly diverse data in its raw form for future analysis

They’ve been aided in this quest by the development of cloud-based data storage and the surge of sophisticated (and sometimes free or open-source) analytics tools. A serendipitous confluence of circumstances is leading to a host of new financial applications.

Sentiment Analysis

Sentiment analysis (aka opinion mining) applies natural-language processing, text analysis and computational linguistics to source material to discover what folks really think.

Businesses like MarketPsy Capital, Think Big Analytics and MarketPsych Data are using it to:

  • Build algorithms around market sentiment data (e.g., Twitter feeds) that can short the market when disasters (e.g., storms, terrorist attacks) occur
  • Track trends, monitor the launch of new products, respond to issues and improve overall brand perception
  • Analyze unstructured voice recordings from call centers and recommend ways to reduce customer churn, up-sell and cross-sell products and detect fraud

Some data companies are even acting as intermediaries, collecting and selling sentiment indicators to retail investors.
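As a rough illustration of the idea, the sketch below scores short messages against a tiny word list. The word lists, the `score_sentiment` function and the sample messages are all invented for this example; real systems rely on far richer language models and much larger vocabularies.

```python
import re

# Hypothetical mini-lexicon; production systems use thousands of weighted terms.
POSITIVE = {"gain", "beat", "strong", "upgrade", "rally"}
NEGATIVE = {"loss", "miss", "weak", "downgrade", "crash"}


def score_sentiment(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Messages with no lexicon words are treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


messages = [
    "Strong quarter, earnings beat expectations",
    "Analysts downgrade the stock after weak guidance",
    "Shares unchanged at the open",
]
for message in messages:
    print(f"{score_sentiment(message):+.2f}  {message}")
```

A word-count approach like this misses sarcasm, negation and context, which is one reason sentiment scores are usually combined with other signals.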

Automated Credit Risk Management

Even in places where credit scoring is paltry or poor, Internet finance companies are finding ways to approve loans and manage risk. In Regulation Must Adapt to Big Data Revolution, Lei Yao and Chen Wei discuss the case of Alibaba’s Aliloan.

Aliloan is an automated online system that provides flexible micro-loans to entrepreneurial online vendors. Due to their lack of collateral, many of these vendors have difficulty obtaining funds through traditional channels.

  • To gauge whether a vendor is creditworthy, Alibaba collects data from its e-commerce and payment platforms and analyzes transaction records, customer ratings, shipping records and a host of other info.
  • These findings are confirmed by third-party verification and cross-checked against external data sets (e.g., customs, tax data, electricity records, etc.).
  • Once the loan is granted, Alibaba continues to monitor the use of funds and assess the business’s strategic development.

Entrepreneurs in emerging markets are also reaping the benefits. Like Aliloan, companies such as Kreditech and Lenddo provide automated small loans based on innovative credit scoring techniques. In these cases, much of the score is calculated from applicants’ online social networking data.

Real-Time Analytics

In days of yore, financial institutions were hampered by the lag-time between data collection and data analysis. Real-time analytics short-circuits this problem and provides the industry with new ways to:

  • Fight Financial Fraud: Banks and credit card companies routinely analyze account balances, spending patterns, credit history, employment details, location and a load of other data points to determine whether transactions are above board. If suspicious activity is detected, they can immediately suspend the account and alert the owner.
  • Improve Credit Ratings: A continuous feed of online data means credit ratings can be updated in real time. This provides lenders with a more accurate picture of a customer’s assets, business operations and transaction history.
  • Provide More Accurate Pricing: Progressive Insurance already tailors its policies to account for a customer’s changing financial situation. In the Internet of Things, data from automobile sensors will also help insurance companies issue their policyholders warnings about accidents, traffic jams and weather conditions. That makes for safer drivers and fewer payouts.
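To make the fraud-detection idea concrete, here is a minimal sketch that flags a transaction when its amount sits far above a cardholder's usual spending. The `is_suspicious` function, the three-standard-deviation threshold and the sample amounts are assumptions made for illustration; real systems weigh many more signals than amount alone.

```python
from statistics import mean, stdev


def is_suspicious(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag `amount` if it is more than `threshold` standard deviations
    above the mean of the cardholder's past transaction amounts."""
    if len(history) < 2:
        return False  # too little history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount > mu  # every past amount was identical
    return (amount - mu) / sigma > threshold


past_amounts = [20.0, 35.0, 25.0, 40.0, 30.0]  # made-up spending history
for amount in (45.0, 900.0):
    verdict = "suspend and alert" if is_suspicious(past_amounts, amount) else "ok"
    print(f"${amount:,.2f}: {verdict}")
```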

The Billion Prices project is an example of this phenomenon in action. Frustrated with the lag time on the U.S. Bureau of Labor Statistics’s consumer price index (CPI), MIT’s Alberto Cavallo and Roberto Rigobon turned to information from the web.

Every day, their software collected half a million prices of products sold in the U.S. and analyzed the results. In 2008, just after Lehman Brothers filed for bankruptcy, their tool was able to detect a deflationary swing in prices far earlier than the official CPI report did.

Today, banks and other major financial institutions use PriceStats – the project’s commercial spinoff – to analyze inflation trends around the world.
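One simple way to turn daily scraped prices into an index is to chain the geometric mean of day-over-day price changes for products seen on both days. The sketch below does exactly that; it is an illustrative simplification, not the project's published methodology, and the `daily_index` function and sample prices are invented.

```python
from math import exp, log


def daily_index(prices_by_day: list[dict[str, float]], base: float = 100.0) -> list[float]:
    """Chain a geometric mean of day-over-day price relatives."""
    index = [base]
    for prev, curr in zip(prices_by_day, prices_by_day[1:]):
        common = prev.keys() & curr.keys()  # products observed on both days
        if not common:
            index.append(index[-1])  # nothing comparable: carry the index forward
            continue
        mean_log_change = sum(log(curr[p] / prev[p]) for p in common) / len(common)
        index.append(index[-1] * exp(mean_log_change))
    return index


scraped = [  # made-up prices in dollars
    {"milk": 2.00, "bread": 4.00},
    {"milk": 1.90, "bread": 3.80},
    {"milk": 1.90, "bread": 3.80, "eggs": 3.00},
]
print([round(level, 2) for level in daily_index(scraped)])
```

Because the index only needs yesterday's and today's prices, it can be updated daily rather than waiting for a monthly survey.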

Customer Segmentation

Like every other industry on the planet, banks and financial institutions are hungry to know more about the people using their products and services. And though they already store a ton of data – from credit scores to day-to-day transactions – they’re not too proud to look for it elsewhere.

As Sushil Pramanick notes in Big Data Use Cases – Banking and Financial Services, they continue to purchase data from a host of retailers and service providers in an effort to create a 360-degree view of their customers.

This kind of customer segmentation allows them to:

  • Offer customized products and services
  • Improve existing profitable relationships and avoid customer churn
  • Create better marketing campaigns and more attractive product offerings
  • Tailor product development to specific customer segments
  • And more
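Segmentation is often done with a clustering algorithm. As a minimal sketch, the plain k-means below groups customers by two scaled features. The feature choice, the sample values and the naive "first k points" initialization are all assumptions made for this example.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means: return a cluster label for each point."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


# Hypothetical customers: (scaled balance, scaled monthly card spend)
customers = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25), (0.85, 0.90), (0.20, 0.10)]
print(kmeans(customers, k=2))
```

Each resulting label is a segment that can then be matched to its own offers, campaigns or retention efforts.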

Predictive Analytics

By combining segmentation with predictive analytics, companies can also cut down on risk. For example, to decide whether certain customers are likely to pay off their credit cards, some major banks use technology developed by the company Sqrrl. This analysis takes into account the demographic characteristics of customers’ neighborhoods and makes calculated predictions.

Similar strides have been made in forecasting market behavior. Once upon a time (say, 2009), high-frequency trading – the speedy exchange of securities – was hugely lucrative. With competition came a drop in profits and the need for a new strategy.

HFT traders adapted by employing strategic sequential trading, using big data analytics to identify specific market participants and anticipate their future actions. In a field of breakneck speed, this gives HFT traders an unmistakable advantage.

Predictive analytics can also be used to issue early warnings on the market. In their paper, Quantifying Trading Behavior in Financial Markets Using Google Trends, Tobias Preis, Helen Susannah Moat and H. Eugene Stanley focused on the behavior of search engine users.

By studying search volume data provided by Google Trends, they were able to identify online precursors for stock market moves. Their results suggest that increases in search volume for financially relevant search terms usually precede big losses in financial markets.
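A rule in the spirit of that finding can be sketched in a few lines: compare this week's search volume with its recent average, and treat rising interest as a warning sign. The `trends_signals` function, the three-week window and the sample volumes are assumptions for illustration, not the authors' code.

```python
def trends_signals(search_volume: list[float], window: int = 3) -> list[str]:
    """For each week t >= window, compare the week's volume with the mean
    of the previous `window` weeks.

    Rising interest -> "sell", falling interest -> "buy", no change -> "hold".
    """
    signals = []
    for t in range(window, len(search_volume)):
        baseline = sum(search_volume[t - window:t]) / window
        change = search_volume[t] - baseline
        signals.append("sell" if change > 0 else "buy" if change < 0 else "hold")
    return signals


weekly_volume = [10, 12, 11, 15, 9, 12]  # made-up weekly search volumes
print(trends_signals(weekly_volume))
```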

Data Risks and Regulations

The Importance of Data Humility

The lack of an interdepartmental data strategy, data trapped in awkward and inaccessible “silos,” an inability to handle overwhelming volumes of information – the financial industry has a long way to go before it can claim it has a handle on big data.

Another pitfall for data scientists is overestimating what the data can tell them. For example, Johan Bollen, Associate Professor at Indiana University’s School of Informatics and Computing, has estimated that even if social-sentiment trading signals achieve an accuracy of 80 percent, the margin of error is still enough to result in bankruptcy.

That calls for a certain amount of data humility. As Ron Bodkin points out in Big Data Opens New Doors For Financial Analysts, tools like sentiment analysis need to be combined with other factors if institutions are to gain an accurate picture of events. He says, “You want to see a combination of sentiment plus some other factors like fundamentals, trading activity, and trending over time.”

In fact, the market has already discovered that trusting faulty algorithms can lead to disastrous results:

  • In May 2010, a flash crash – the largest intraday decline in the history of the Dow Jones Industrial Average – threw the market into temporary panic. U.S. regulators laid the blame on aggressive high-frequency trading.
  • In April 2013, a fake tweet announced the White House had been subject to a terrorist attack. This faulty data point wiped out 1 percent of the Dow Jones in seconds.

And then, of course, there’s a little thing called privacy…

Fair Credit Reporting Act (FCRA)

Financial institutions, unsurprisingly, are subject to some of the U.S. government’s most stringent privacy laws and regulations. Many of these deal with consumer rights.

During the 1960s, the Retail Credit Company made a move to computerize its records. In response to consumer concern about the availability of information, the U.S. Congress held a series of hearings in 1970.

The result was the Fair Credit Reporting Act (FCRA), which set forth legal standards governing the collection, use, and communication of credit and other information about consumers. This includes information about a consumer’s credit worthiness, credit standing, credit capacity, character, general reputation, personal characteristics, or mode of living, that is to be used for these purposes.

The act applies to financial institutions and any business or individual who uses a consumer report for a business purpose.

In addition to the FCRA, the Gramm-Leach-Bliley Act (GLBA) contains restrictions for disclosure of nonpublic personal information to nonaffiliated third parties. All financial institutions are required to provide consumers with a notice and opt-out opportunity.

Equal Credit Opportunity Act (ECOA)

At the moment, financial institutions are at liberty to use predictive and behavioral analytics. Provided, of course, that they’re not breaking the law.

In 1974, the government passed the Equal Credit Opportunity Act (ECOA). The ECOA makes it unlawful for any creditor – including banks, retailers, bankcard companies, finance companies and credit unions – to discriminate against any applicant with respect to any aspect of a credit transaction:

  1. On the basis of race, color, religion, national origin, sex or marital status, or age (provided the applicant has the capacity to contract)
  2. Because all or part of the applicant’s income derives from any public assistance program
  3. Because the applicant has in good faith exercised any right under the Consumer Credit Protection Act

Predictive models that discriminate against applicants, even unintentionally, risk running afoul of the law.
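One basic check a modeling team might run is to compare approval rates across applicant groups. The sketch below is only a monitoring aid under stated assumptions: the group labels, the sample decisions and the `rate_ratio` function are hypothetical, and no particular ratio is a legal test under the ECOA.

```python
from collections import Counter


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    applied, approved = Counter(), Counter()
    for group, ok in decisions:
        applied[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / applied[group] for group in applied}


def rate_ratio(decisions):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    rates = list(approval_rates(decisions).values())
    highest = max(rates)
    return 1.0 if highest == 0 else min(rates) / highest


# Made-up model outputs for two hypothetical applicant groups.
sample = ([("A", True)] * 8 + [("A", False)] * 2
          + [("B", True)] * 5 + [("B", False)] * 5)
print(approval_rates(sample))
print(f"ratio of lowest to highest approval rate: {rate_ratio(sample):.3f}")
```

A ratio well below 1.0 does not prove discrimination, but it is a signal that the model's inputs deserve a closer look.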

Keeping Data Safe

In addition to using consumer data ethically, financial institutions are legally bound to store and protect it from theft.

And theft is a huge problem. Verizon’s 2013 Data Breach Investigations Report noted that financial services accounted for the most data breaches in 2012.

Examples of federal rules and industry standards for keeping data safe include:

  • Bank Secrecy Act (BSA): This requires all U.S. financial institutions to keep records of cash purchases of negotiable instruments, file reports of cash transactions exceeding $10,000 (daily aggregate amount), and to report suspicious activity that might signify money laundering, tax evasion and other criminal activity.
  • Fair and Accurate Credit Transactions Act (FACTA): FACTA contains several provisions that require financial institutions, creditors, and other businesses that rely on consumer reports to detect and resolve fraud by identity theft.
  • FACTA Disposal Rule: The child of FACTA also states that any business or individual who uses a consumer report for a business purpose must properly dispose of the information in the consumer reports and records to protect against “unauthorized access to or use of the information.”
  • Payment Application Data Security Standard (PA-DSS): This data security standard applies to software vendors and others who develop applications that store, process, or transmit cardholder data as part of authorization or settlement, where these payment applications are sold, distributed or licensed to third parties.
  • Payment Card Industry Data Security Standard (PCI DSS): PCI DSS provides a baseline of technical and operational requirements designed to protect cardholder data, as well as requisites for compliance reporting and business certification for processors of cardholder data.
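The Bank Secrecy Act's daily aggregate rule described above lends itself to a short example: add up each customer's cash transactions per day and flag totals over $10,000. This is a minimal sketch; the `customers_to_report` function, the tuple layout and the sample transactions are assumptions, and real reporting involves many more conditions.

```python
from collections import defaultdict
from datetime import date

REPORTING_THRESHOLD = 10_000  # dollars, daily aggregate of cash transactions


def customers_to_report(cash_transactions):
    """Return {(customer_id, day): total} for daily cash totals above the threshold.

    cash_transactions: iterable of (customer_id, day, amount) tuples.
    """
    totals = defaultdict(float)
    for customer_id, day, amount in cash_transactions:
        totals[(customer_id, day)] += amount
    return {key: total for key, total in totals.items() if total > REPORTING_THRESHOLD}


transactions = [  # made-up data
    ("C1", date(2024, 1, 2), 6000),
    ("C1", date(2024, 1, 2), 4500),  # pushes C1 over the limit for the day
    ("C2", date(2024, 1, 2), 9999),
    ("C1", date(2024, 1, 3), 3000),
]
print(customers_to_report(transactions))
```

Aggregating by day matters because several smaller deposits can cross the threshold together even when no single one does.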

As storage moves to the cloud and data access approaches the speed of light, financial institutions must be careful to keep their sensitive information very safe indeed.

History of Data Analysis and Finance

“Money management has been a profession involving a lot of fakery — people saying they can beat the market and they really can’t.” – Robert Shiller

On the morning of March 22, 1899, in a rented office on the fifth floor of the Gould Building in Atlanta, a brand new company opened for business. Seated at their desks were two brothers: Cator and Guy Woolford. Printed on the door, in fresh black ink, was the sign, “Retail Credit Company.”

Now known as Equifax, the Woolfords’ venture marked a turning point in the history of finance. Data intelligence, the Woolfords realized, could be profitable.

The Rise of Credit Reports

It began as the “Merchant’s Guide” – a $15 hard-covered book containing a list of customers and information on their creditworthiness. This enabled merchants and retailers to decide who should be entrusted with personal charge accounts.

But the brothers quickly realized that 15-cent credit reports weren’t paying the bills. In 1901, they were saved from disaster by a request from a cashier from the Home Life of New York company. Could the Woolfords please supply information on three local applicants for life insurance?

From that point on, the company became a behemoth. To provide accurate credit and insurance reports, Equifax began to:

  • Collect data on the health, habits and morals of U.S. citizens
  • Examine employment records and investigate financial decisions
  • Accrue statistics on childhood, marriage, education and politics

Nor was it alone in this endeavor. In 1969, TransUnion acquired 3.6 million card files stored in 400 cabinets – the valuable assets of the Credit Bureau of Cook County (CBCC).

By the early 1970s, TransUnion had replaced this manual mess with automated tape-to-disc transfer.

By the 1980s, it was part of one of the largest conglomerates in the country.

A Revolutionary Concept: Credit Scores


Under the balmy skies of San Rafael in 1956, two alumni of the Stanford Research Institute (SRI) were setting up shop in a studio apartment on Lincoln Avenue.

Bill Fair was an engineer; Earl Isaac was a mathematician. Both were aware of the power of computers through their research for the Defense Department. Both were enthralled with the potential of applying data analytics to solve business problems.

In 1958, Fair, Isaac and Company (FICO) sent a letter to fifty of the largest U.S. credit grantors offering to demonstrate a new tool: credit scoring.

As Larry E. Rosenberger and John Nash explain in The Deciding Factor: The Power of Analytics to Make Every Decision a Winner, this predictive analytics model:

“…was the first to use historical data being captured by finance companies to predict a person’s creditworthiness based on their past behavior. The model produced a score, based on analysis of specific sets of numbers related to variables such as a person’s bank balance and payment records.”

It was a revolutionary concept. With this simple score in hand, major lenders could instantly determine an applicant’s credit risk.

Just one company responded to their letter.
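The scoring idea quoted above can be sketched as a points-based scorecard, where each attribute adds points toward a total. Every cut-off, point value and field name below is invented for illustration and has nothing to do with Fair Isaac's actual models.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    bank_balance: float       # current balance in dollars
    late_payments: int        # late payments on record
    years_of_history: float   # length of credit history in years


def scorecard(applicant: Applicant) -> int:
    """Add up points per attribute (all values are hypothetical)."""
    points = 300  # made-up base score

    # Bank balance: larger cushions earn more points.
    if applicant.bank_balance >= 5000:
        points += 120
    elif applicant.bank_balance >= 1000:
        points += 60

    # Payment record: a clean record earns the most.
    if applicant.late_payments == 0:
        points += 200
    elif applicant.late_payments <= 2:
        points += 80

    # Credit history: 10 points per full year, capped at 100.
    points += min(int(applicant.years_of_history) * 10, 100)
    return points


print(scorecard(Applicant(bank_balance=6000, late_payments=0, years_of_history=12)))
print(scorecard(Applicant(bank_balance=500, late_payments=5, years_of_history=1.5)))
```

In a real scorecard the point values come from statistical analysis of how past borrowers with each attribute actually performed.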

Making It Work

Nevertheless, Fair Isaac’s model signaled the future of finance. Credit cards were taking the place of cash. Mainframe computers were becoming more ubiquitous. Data was going electronic.

All of this information needed to be organized and put to good use:

  • 1960s: Conrad Hilton installed an IBM computer system for Carte Blanche that performed a daily check on the state of accounts and sent reminders to delinquent cardholders.
  • 1972: Isaac’s software for the Automated Strategic Applications Processing (ASAP) system debuted at Wells Fargo. Built on analytics models, this was the first automated loan application-processing system in the country.
  • 1975: Fair Isaac developed the first behavior scoring system to predict the credit risk of existing customers.

To Market, To Market

Savvy economists were equally excited by the potential of applying large-scale data analytics to the financial market.

Take 1973 (the same year folks were gripped by the scandal of Watergate and dancing to reggae in the streets). Not many noticed when the Journal of Political Economy published a paper by Fischer Black and Myron Scholes entitled, “The Pricing of Options and Corporate Liabilities”. Nor did many care to read their descriptions of stochastic partial differential equations.

Yet the creation of the Black-Scholes Model (as it would come to be known) was a key event in data science. Thanks to Black and Scholes, along with the subsequent work of Robert Merton, this model allowed traders to estimate a theoretical fair price for stock options over time. It sliced risk off the buying and selling of underlying assets, prompted a boom in options trading and netted Merton and Scholes a Nobel Prize in Economics.
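For readers curious about what the model actually computes, here is the standard Black-Scholes formula for a European call option on a stock that pays no dividends. The function and parameter names and the sample inputs are chosen for this example.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, years: float,
                       rate: float, vol: float) -> float:
    """Price of a European call on a non-dividend-paying stock.

    spot: current stock price; strike: exercise price; years: time to expiry;
    rate: continuously compounded risk-free rate; vol: annualized volatility.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt(years))
    d2 = d1 - vol * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)


# Sample inputs: $100 stock, $100 strike, one year, 5% rate, 20% volatility.
print(round(black_scholes_call(100, 100, 1.0, 0.05, 0.20), 2))
```

The only input that cannot be observed directly is volatility, which is why so much trading effort goes into estimating it.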

Harvard Meets Yale

During the 1980s, a Harvard graduate and Vietnam vet named Karl Case was absorbed in an economics project. To study the ebb and flow of home pricing trends, Case had accrued several years of data on Boston house sales and was developing a rudimentary index to compare repeat sales of the same homes.

In 1985, Case met Robert Shiller, a Yale economist interested in behavioral aspects of economic bubbles. Working together, Case and Shiller added housing data from other cities and refined Case’s work into the Case-Shiller index – a tool that could track the relative changes in the price of real estate over time.

In 1989, they produced the first empirical paper on housing bubbles. Analysis of big data, they demonstrated, could be used for the greater good. Shiller went on to predict the stock market bubble of 2000 and issue early warnings about the Great Recession. In 2013, he too won the Nobel Prize in Economics.

The World Goes Online

Then things got really fast.

When the world came online in the late 20th century, a new economy sprang up overnight. The exchange of financial information increased exponentially. E-commerce companies grew like weeds. Investors heard the siren call of Silicon Valley. In 1999, there were 457 IPOs, most of which were technology-related.

  • 1995: Security First Network Bank, the first Internet bank in the world, was born.
  • 1998: PayPal launched its service for transferring payments through the Internet.
  • 2000: The bubble reached its limit on March 10. The NASDAQ peaked at 5132.52 in intraday trading and closed at 5048.62.

The Internet also fundamentally changed how the financial industry conducted business. In the first decade of the 21st century:

  • Investors from every corner of the planet could watch the leaps and plunges of the market unfold in real-time.
  • Thanks to the widespread availability of market data, financial education tools and expert commentary, all users had the ability to educate themselves about the industry.
  • Bank accounts, brokerages, investment management, insurance, credit cards, securities, futures – all these and more made a steady migration to online settings.
  • Social media began to supply companies with an unfiltered view of consumer opinion.
  • With the arrival of mobile devices, finance took to the streets, providing 24/7 access for every participant.

