One way to strip leading zeros from a string (varchar) column in Spark with Scala is the regexp_replace function. Here's an example:

import org.apache.spark.sql.functions._
import spark.implicits._ // enables toDF and the $"col" syntax; already in scope in spark-shell

val df = Seq("000123", "004567", "010987", "987654").toDF("num_string")
df.show()

// Output:
// +----------+
// |num_string|
// +----------+
// |    000123|
// |    004567|
// |    010987|
// |    987654|
// +----------+

val df2 = df.withColumn("num_string_trimmed", regexp_replace($"num_string", "^0*", ""))
df2.show()

// Output:
// +----------+------------------+
// |num_string|num_string_trimmed|
// +----------+------------------+
// |    000123|               123|
// |    004567|              4567|
// |    010987|             10987|
// |    987654|            987654|
// +----------+------------------+

In this example, the regular expression ^0* matches zero or more occurrences of the digit 0 at the beginning of the string: ^ anchors the match at the start of the string, and * means "zero or more." regexp_replace replaces the match with an empty string (""), which removes the leading zeros. The resulting DataFrame has a new column, num_string_trimmed, containing each num_string value without its leading zeros.
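
One edge case the sample data does not cover: a value made up entirely of zeros, such as "0000", matches ^0* in full and becomes an empty string. If at least one digit should survive, a possible variant (a sketch, not part of the original answer) is the pattern ^0+(?!$), where the negative lookahead (?!$) prevents the match from ending at the end of the string. The sample values, the names edgeCases and keepLastZero, and the local SparkSession setup are assumptions for illustration; in spark-shell, spark already exists and the builder lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

// Local session for illustration only (assumption); spark-shell provides `spark` already.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("leading-zeros-sketch")
  .getOrCreate()
import spark.implicits._

// "0000" and "0" are the edge cases: with ^0* both would become empty strings.
val edgeCases = Seq("000123", "0000", "0", "987654").toDF("num_string")

// ^0+ matches one or more leading zeros; (?!$) rejects a match that would
// reach the end of the string, so the final zero is left in place.
val keepLastZero = edgeCases.withColumn(
  "num_string_trimmed",
  regexp_replace($"num_string", "^0+(?!$)", "")
)
keepLastZero.show()
```

With this pattern, "000123" still becomes "123", while "0000" and "0" both come out as "0" rather than an empty string. Whether an all-zero value should map to "0" or to an empty string depends on what the column represents, so treat this as one option rather than the default.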