Splitting Names into First and Last Without Delimiters: A SQL Solution

Splitting Names into First and Last Without Delimiters

=====================================================

In this article, we will explore how to split a field of mixed names into first and last names where no delimiter exists.

The Problem


We have a dataset with 1 million records, which includes both personal and business names. The column Last contains all the names, including both types, without any delimiters. Our goal is to split these names into first and last names.

Example Data


Here’s an example of what our data might look like:

LastFirst
James Brown
Dicks Sporting Goods
Charlie Chaplin
Five Guys Burgers and Fries
Bob Newhart
Big Lots

As you can see, some of the names are clear (James Brown) while others are not (Dicks Sporting Goods).

Solution


After analyzing the problem, we found that a simple CASE statement can be used to split the names into first and last.

with 
names (
  FullName
) as (
  select *
  from (
    values 
      ('James Brown')
    , ('Dicks Sporting Goods')
    , ('Charlie Chaplin')
    , ('Five Guys Burgers and Fries')
    , ('Bob Newhart')
    , ('Big Lots')
    , ('George Walker Bush')
  ) q (FullName)
)

select 
  FullName
, case
    when len(replace(FullName, ' ', '')) = len(FullName) - 1
      then substring(FullName, 1, charindex(' ', FullName) - 1)
    else FullName
  end as 'First'
, case
    when len(replace(FullName, ' ', '')) = len(FullName) - 1
      then substring(FullName, charindex(' ', FullName) + 1, len(FullName) - charindex(' ', FullName))
  end as 'Last'
from names

This query first creates a temporary table names with the full names. Then it selects from this table and uses two CASE statements to split the names into first and last.

How It Works


The CASE statement works by checking if the length of the name minus the length of the spaces equals 1. If so, it means that there is no space in between the first name and the last name, therefore we can use charindex(' ', FullName) to get the position of the first space and then use substring(FullName, 1, charindex(' ', FullName) - 1) to get the first name. Otherwise, it uses charindex(' ', FullName) + 1 as the start position for the last name.

Limitations


However, this solution has a limitation: without an indicator identifying a name as an organization name, we have no way to correctly distinguish between person names and organization names. For example, in our data, Big Lots is a business name but it’s split into first and last while George Walker Bush is not.

Conclusion


In this article, we explored how to split a field of mixed names into first and last where no delimiter exists using a simple CASE statement. While this solution works for many cases, there are limitations such as distinguishing between person and organization names without additional indicators.

If you have any questions or need further clarification, feel free to ask in the comments below.


Last modified on 2024-09-11