baleen - Library for Validating Legacy Data - Kotlin Resources


Baleen is a library for validating streams of data (XML, CSV, …). It is especially useful for legacy data because a schema can be introduced gradually, one rule at a time.

Because it is written in Kotlin, it can easily be used from any JVM language.

Getting Started


Binaries and dependency information for Maven, Ivy, Gradle and others can be found at

You will need the baleen core module to describe the data, plus one or more parsers: baleen-csv or baleen-xml.

Example for Maven:


and for Gradle:

compile 'com.shoprunner:baleen:x.y.z'
compile 'com.shoprunner:baleen-csv:x.y.z'
compile 'com.shoprunner:baleen-xml:x.y.z'


See CSV example

Getting Help

Join the Slack channel

Core Concepts

  • Tests are great

    There are a lot of great libraries for testing code. We should use those same concepts for testing data.

  • Performance and streaming are important

    A data validation library should be able to handle large amounts of data quickly.

  • Invalid data is also important

    Warnings and errors need to be treated as first-class objects.

  • Data Traces

    Similar to a stack trace being used to debug a code path, a data trace can be used to debug a path through data.

  • Don’t map data to types too early

    Type-safe code is great, but if the data hasn’t been sanitized then it isn’t really typed.
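
The concepts above can be sketched in plain Kotlin without Baleen itself. The example below is only an illustration: `Trace`, `Result`, and `validateRow` are hypothetical names invented here, not Baleen's API. It shows validation results as ordinary values, a trace that records where in the data a problem was found, rows kept as untyped maps until they have been checked, and a lazy `Sequence` so results can be produced while streaming.

```kotlin
// Hypothetical types for illustration only; these are not Baleen's classes.
data class Trace(val path: List<String> = emptyList()) {
    // Returns a new trace with one more step appended, much like a stack frame.
    operator fun plus(step: String) = Trace(path + step)
    override fun toString() = path.joinToString(" > ")
}

// Warnings and errors are plain values that can be collected, filtered, or counted.
sealed class Result {
    data class Warning(val trace: Trace, val message: String) : Result()
    data class Error(val trace: Trace, val message: String) : Result()
}

// Validates one untyped row lazily; nothing is computed until the sequence is consumed.
fun validateRow(trace: Trace, row: Map<String, Any?>): Sequence<Result> = sequence {
    val sku = row["sku"]
    if (sku !is String || sku.isEmpty()) {
        yield(Result.Error(trace + "attribute sku", "sku must be a non-empty string"))
    }
    if (row["department"] == null) {
        yield(Result.Warning(trace + "attribute department", "department is missing"))
    }
}

fun main() {
    val rows = sequenceOf(
        mapOf("sku" to "A-1", "department" to "Mens"),
        mapOf("sku" to "", "department" to null)
    )
    rows.forEachIndexed { index, row ->
        validateRow(Trace(listOf("file products.csv", "line ${index + 1}")), row)
            .forEach(::println)
    }
}
```

Each result carries its own trace, so a message can be tied back to a file, a line, and an attribute without re-reading the input.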

Example Schema Definition

import com.shoprunner.baleen.Baleen.describeAs
import com.shoprunner.baleen.ValidationError
import com.shoprunner.baleen.dataTrace
import com.shoprunner.baleen.types.StringType

val departments = listOf("Mens", "Womens", "Boys", "Girls", "Kids", "Baby & Toddler")

val productDescription = "Product".describeAs {

    "sku".type(StringType(min = 1, max = 500),
          required = true)

    "brand_manufacturer".type(StringType(min = 1, max = 500),
          required = true)

    "department".type(StringType(min = 0, max = 100))
         .describe { attr ->

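        // Report an error when a department is present but not in the known list.
        attr.test { dataTrace, value ->
            val department = value["department"]
            if (department != null && !departments.contains(department)) {
                sequenceOf(ValidationError(dataTrace, "Department ($department) is not a valid value.", value))
            } else {
                // Nothing to report for this value.
                emptySequence()
            }
        }
    }
}
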
  • Baleen does not assume that an attribute that is not set and an attribute that is set to null are the same thing.
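
The difference between "not set" and "set to null" is easy to lose when data is held in a map, because a plain lookup returns null in both cases. The sketch below uses only the Kotlin standard library; `describeAttribute` is a hypothetical helper written for this illustration, and Baleen's own data model may expose the distinction differently.

```kotlin
// Hypothetical helper for illustration; not part of Baleen.
fun describeAttribute(row: Map<String, Any?>, name: String): String = when {
    name !in row -> "$name is not set"
    row[name] == null -> "$name is set to null"
    else -> "$name is set to ${row[name]}"
}

fun main() {
    val withNull = mapOf<String, Any?>("sku" to "A-1", "department" to null)
    val withoutKey = mapOf<String, Any?>("sku" to "A-1")

    // Both lookups return null, so a plain get cannot tell the cases apart.
    println(withNull["department"] == withoutKey["department"]) // prints "true"

    // Checking for the key first keeps the two cases distinct.
    println(describeAttribute(withNull, "department"))   // prints "department is set to null"
    println(describeAttribute(withoutKey, "department")) // prints "department is not set"
}
```

A validation rule written against such data should decide explicitly which of the two cases it accepts.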

Similar Projects
