Supplying Info to Containers

Introduction

When running a container in Docker or Kubernetes, it is usually necessary to pass some configuration information or other reference data into the container. In what follows, we describe various types of information that frequently need to be passed as well as methods you can use to do so.

Supplying Environment Variables to a Container

Images can publish a set of environment variables that can be set at container startup time. These environment variables can be defined in the Dockerfile (or equivalent).

When starting an image, you can set values for these (or any) environment variables. The values can be consumed in the COMMAND or ENTRYPOINT script.
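As a minimal sketch (the default path /etc/krb5.conf is an illustrative assumption, not necessarily what the real images use), an ENTRYPOINT script might consume such a variable, falling back to a default when it is not set:

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: prefer the value supplied at
# container startup, otherwise fall back to a baked-in default
KRB5_CONF="${SQLSTREAM_JAVA_SECURITY_KRB5_CONF:-/etc/krb5.conf}"
echo "Using Kerberos config: $KRB5_CONF"
```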

For example, SQLstream supplies three standard images:

  • sqlstream/micro
  • sqlstream/slim
  • sqlstream/complete

For each of these, you can set values for environment variables like the following:

  • SQLSTREAM_JAVA_SECURITY_KRB5_CONF
  • SQLSTREAM_JAVA_SECURITY_AUTH_LOGIN_CONFIG

For more information about those parameters, see Appendix: SQLstream s-Server Authentication.

Note: Whether relying on default values or passing revised settings into the container, you must ensure that the files to which they point are present on (or accessible from) that container. See the next section, Supplying Files to a Container.

Docker Run

In this example we have the credentials files in a local directory, /path/to/credentials, and we mount them as a volume. The environment variables are set using the container’s view of the file system.

docker run -d -p 5580:5580 -p 5570:5570 \
   -e SQLSTREAM_JAVA_SECURITY_KRB5_CONF=/mnt/credentials/mykrb5.conf \
   -e SQLSTREAM_JAVA_SECURITY_AUTH_LOGIN_CONFIG=/mnt/credentials/myjaas.conf \
   -v /path/to/credentials:/mnt/credentials \
   sqlstream/slim

Docker Compose

For Docker Compose, you can define environment variables either in the default file .env or in another file referenced using docker-compose --env-file myfile.env.

These variables are expanded wherever they are used in the YAML file. Inside the YAML file, you can also explicitly set environment variables for the container:

sqlstream:
  environment:
    - SQLSTREAM_JAVA_SECURITY_KRB5_CONF=/data/krb5.cnf
    ...

For more information, see Docker’s article: Environment variables in Compose.
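To illustrate the expansion (CREDS_DIR is a hypothetical variable name), a value defined in .env can be referenced anywhere in the Compose file:

```yaml
# .env contains, for example:  CREDS_DIR=/path/to/credentials
# docker-compose.yml fragment; ${CREDS_DIR} is expanded by Compose
services:
  sqlstream:
    image: sqlstream/slim
    environment:
      - SQLSTREAM_JAVA_SECURITY_KRB5_CONF=/mnt/credentials/mykrb5.conf
    volumes:
      - ${CREDS_DIR}:/mnt/credentials
```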

Kubernetes

In the pod specification we can use the env section to set environment variables. Once again, you must ensure the referenced files are actually provisioned where the variable values say they are.

  ...
  spec:
    containers:
    - name: sqlstream
      image: path/to/image/sqlstream/slim:8.0.8
      env:
      - name: SQLSTREAM_JAVA_SECURITY_KRB5_CONF
        value: "/secrets/krb5.cnf"
      - name: SQLSTREAM_JAVA_SECURITY_AUTH_LOGIN_CONFIG
        value: "/secrets/myjaas.conf"

Helm

In a Helm chart, the pod spec will likely be included in deployment.yaml. It will look like the Kubernetes example except that the explicit values shown above may be templatized:

  ...
  spec:
    ...
    containers:
    - name: sqlstream
      image: {{ .Values.sqlstream.image }}:{{ .Values.sqlstream.tag }}
      imagePullPolicy: {{ .Values.sqlstream.imagePullPolicy | default "Always" | quote }}
      env:
      - name: SQLSTREAM_JAVA_SECURITY_KRB5_CONF
        value: {{ .Values.sqlstream.krb5 | default "/secrets/krb5.conf" | quote }}
      - name: SQLSTREAM_JAVA_SECURITY_AUTH_LOGIN_CONFIG
        value: {{ .Values.sqlstream.jaas | default "/secrets/jaas.conf" | quote }}

The actual values will be defined in the values.yaml file or on the helm install command line.
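As a sketch, the corresponding values.yaml might look like the following (all values are illustrative):

```yaml
# values.yaml (illustrative values, matching the template above)
sqlstream:
  image: path/to/image/sqlstream/slim
  tag: "8.0.8"
  imagePullPolicy: IfNotPresent
  krb5: /secrets/krb5.conf
  jaas: /secrets/jaas.conf
```

Any of these can be overridden at install time, for example with helm install myrelease . --set sqlstream.krb5=/secrets/mykrb5.conf (release and chart names are hypothetical).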

Using Kubernetes ConfigMap

Environment variables can be set based on files passed as part of a ConfigMap object. A ConfigMap can be mounted as a volume on any or all containers within a pod. For more information, see the section below entitled “What is a ConfigMap?”.

Supplying Files to a Container

Many types of files may need to be passed to a container. Some common cases:

  • JNDI property files: set the value of OPTIONS for SQL/MED SERVERs, FOREIGN STREAMs, and FOREIGN TABLEs. Note: These files must be located in (or symbolically linked from) $SQLSTREAM_HOME/plugin/jndi.

  • Other s-Server plugin property files: can be referenced by the Kafka10 plugin using kafka.producer.config and kafka.consumer.config. These files can be located anywhere; the path may be absolute or relative to s-Server’s current working directory. See Integrating Kafka.

  • s-Server runtime properties: aspen.custom.properties may be dropped into (or symbolically linked from) $SQLSTREAM_HOME. See Configuring s-Server Parameters.

  • SQL for session variables: can be set at run time and used anywhere in a SQL statement where a literal is allowed. See User-Defined Session Variables.

  • Secret files: used for authentication. See Appendix: SQLstream s-Server Authentication.

  • Seed data files and “IB” files: for larger data sets, lookup FOREIGN TABLEs can be created based on source files. See Integrating the File System.

Any of the file types listed above can be supplied to a container. Various techniques can be used:

  1. Bake files into an image as a kind of “early binding”. This is done by placing the files into $SQLSTREAM_HOME/plugin/jndi.

    • Early binding makes sense for application-specific files that never or rarely change - especially smaller files.

    • If a set of larger data files is included, consider compression. Note: Currently only gzip is supported. Either of two approaches can be used:

      • Compress the whole set of files, and inflate it at container start up time.

      • Compress each file separately and inflate them when reading, using the appropriate option of the file plugin (FILE_COMPRESSION for FILE_SERVER and FILE_TYPE for FILE_VFS_SERVER).

  2. Mount a local volume onto the container, and link the files to the $SQLSTREAM_HOME/plugin/jndi directory. This is a kind of “late binding”.

    • Mounting local volumes works well during development, but it is less useful in production environments, because the Kubernetes cluster will typically be running across several servers.

  3. Mount some kind of shared storage volume, such as a Storage Area Network (SAN), ensuring that the files are available there.
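Option 1 (early binding) might be sketched as a Dockerfile like the following; the file names and the /home/sqlstream location for $SQLSTREAM_HOME are illustrative assumptions:

```dockerfile
# Hypothetical "early binding" build: bake property files and
# compressed seed data into the image at build time
FROM sqlstream/slim:8.0.8
COPY jndi/*.properties /home/sqlstream/plugin/jndi/
COPY data/seed-data.csv.gz /data/
# The data can be inflated at startup, or read as-is using the
# file plugin's compression option described above
```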

Docker Run

This can be done in either of two ways:

  • Create a persistent local volume using docker volume.

  • Create a container-specific volume as part of docker run, using either the -v (--volume) switch or the --mount switch.

See the Docker article on volumes for a comparison.
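For example (volume and host path names are illustrative), the two approaches look like this:

```shell
# 1. Create a persistent named volume, then mount it
docker volume create sqlstream-data
docker run -d --mount type=volume,source=sqlstream-data,target=/data sqlstream/slim

# 2. Or create a container-specific bind mount with -v (host path : container path)
docker run -d -v /path/to/files:/data sqlstream/slim
```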

Docker Compose

As with Docker run, you can reference a persistent volume that has been defined using docker volume, or you can define a specific volume as part of the docker-compose.yml. (See Compose file version 3 reference).

services:
  sqlstream:
    image: sqlstream/slim:8.0.8
    ports:
      - "5590:5590"
      - "5580:5580"
    volumes:
      - $HOME/my_test_data:/data
    networks:
      # ... etc

Kubernetes

Kubernetes allows storage to be provisioned to a container. There are several ways to pass files:

  1. Bake files into the image, as with Docker.

  2. Have the image pull files from some remote repository (e.g. Artifactory or Git). This can be done at container startup, or you can have a generic startup container do it.

    • The startup container executes first and simply pulls the required files into an emptyDir local volume before exiting and allowing the SQLstream container to start.

    • The SQLstream container will mount the emptyDir volume (either directly at $SQLSTREAM_HOME/plugin/jndi or through symbolic links from there to wherever the volume is mounted).

  3. Bake the site-specific property files into a startup container that runs in the same pod as SQLstream. This is similar to the previous option. Note: A different version of the startup container can be baked for each deployment environment.

  4. Mount a shared volume.

    • If you are using Kubernetes in the cloud, you can make a volume based on any of the following:

      • Amazon EBS - awsElasticBlockStore

      • Azure - azureFile

      • GCE Persistent Disk - gcePersistentDisk

      • AWS S3 - Several third-party tools allow you to present an AWS S3 bucket as a mounted file system (particularly for reading)

    • On-premise you can use:

      • GlusterFS

      • NFS

  5. Access remote files using a standard API.

    • You may store your files in the cloud (AWS S3 or Azure Blob Storage). SQLstream can read from either of these.

    • You may store your files in a MinIO service (compatible with S3), which could be running in your local Kubernetes cluster. (See this article by MinIO.)

  6. If you are using external storage, then you will need to:

    • Provision the required files onto that storage (for example, into a site-specific S3 bucket).

    • Pass the access keys and parameters into the SQLstream container, either as a small properties file or as environment variables.

Note: Keeping the data separate from the application code’s image greatly simplifies dependency management. In this way, changes to the data (based on site or customer) can be managed independently from any changes to the application code.

For more information see the Kubernetes articles on Storage.
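The startup-container approach (option 2 above) can be sketched with a Kubernetes init container; the Git URL, image name, and mount paths are illustrative assumptions:

```yaml
spec:
  initContainers:
  - name: fetch-config
    image: alpine/git
    # clone the required files into the shared emptyDir before s-Server starts
    args: ["clone", "--depth=1", "https://example.com/app-config.git", "/work"]
    volumeMounts:
    - name: workdir
      mountPath: /work
  containers:
  - name: sqlstream
    image: sqlstream/slim:8.0.8
    volumeMounts:
    - name: workdir
      mountPath: /mnt/config   # or link from $SQLSTREAM_HOME/plugin/jndi
  volumes:
  - name: workdir
    emptyDir: {}
```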

Helm

Any kind of storage that is configured in Kubernetes YAML files can also be configured in Helm.

Access keys and specific access parameters will be provided either in the values.yaml file or as command line parameters to helm install.

Entire files can be directly included in the Helm chart as long as they are small. These are typically configuration files (placed in a ConfigMap) or credential files (added to a Secret). Both ConfigMaps and Secrets are discussed in sections below.

In the following example, three config files are included as data in the ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  {{- $files := .Files }}
  {{- range tuple "config1.toml" "config2.toml" "config3.toml" }}
  {{ . }}: |-
        {{ $files.Get . }}
  {{- end }}

The next example maps credential files from the bar directory to Secrets:

apiVersion: v1
kind: Secret
metadata:
  name: very-secret
type: Opaque
data:
{{ (.Files.Glob "bar/*").AsSecrets | indent 2 }}

For more information, see Helm | Accessing Files Inside Templates.

What is a ConfigMap?

According to Kubernetes.io:

A ConfigMap is a Kubernetes API object that lets you store configuration for other objects to use. Unlike most Kubernetes objects that have a spec, a ConfigMap has data and binaryData fields. These fields accept key-value pairs as their values. Both the data field and the binaryData are optional. The data field is designed to contain UTF-8 byte sequences while the binaryData field is designed to contain binary data as base64-encoded strings.

It continues:

You can write a Pod spec that refers to a ConfigMap and configures the container(s) in that Pod based on the data in the ConfigMap. The Pod and the ConfigMap must be in the same namespace.

Once a ConfigMap has been defined, it can be included in a pod or other deployment object and mounted as a volume in one or more containers.
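For example, a ConfigMap can be created directly from local files (file and ConfigMap names are illustrative):

```shell
# Each file becomes a key in the ConfigMap; the file content becomes the value
kubectl create configmap sqlstream-config \
  --from-file=kcc.properties \
  --from-file=kpc.properties
```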

The SQLstream Kubernetes Operator uses a ConfigMap to pass sharding information into the container. The ConfigMap is mounted as a volume, and the entrypoint.sh picks out the files based on the configuration keys it is expecting.

Here is an example ConfigMap definition:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sqlstream-config
data:
  # property-like keys; each key maps to a simple value
  checkpoint.interval: "10000"
  kafka_consumer_config: "kcc.properties"
  kafka_producer_config: "kpc.properties"

  # file-like keys
  kcc.properties: |
    bootstrap.servers=source-kafka:9092
    isolation.level=read_committed    
  kpc.properties: |
    seed_brokers=target-kafka:9092
    compression.type=snappy
    batch.size=500    

The ConfigMap can be associated with a SQLstream deployment. In the next example, it configures a pod.

apiVersion: v1
kind: Pod
metadata:
  name: sqlstream-pod
spec:
  containers:
    - name: sqlstream
      image: sqlstream/slim:8.0.8
      # etc ...
      env:
        # Define the environment variable
        - name: SQLSTREAM_CHECKPOINT_INTERVAL
          valueFrom:
            configMapKeyRef:
              name: sqlstream-config # The ConfigMap this value comes from.
              key: checkpoint.interval # The key to fetch.
        - name: SQLSTREAM_KAFKA_CONSUMER_PROPERTIES_FILE_NAME
          valueFrom:
            configMapKeyRef:
              name: sqlstream-config
              key: kafka_consumer_config
        - name: SQLSTREAM_KAFKA_PRODUCER_PROPERTIES_FILE_NAME
          valueFrom:
            configMapKeyRef:
              name: sqlstream-config
              key: kafka_producer_config
      volumeMounts:
      - name: config
        mountPath: "/config"
        readOnly: true
  volumes:
    # You set volumes at the Pod level, then mount them into containers inside that Pod
    - name: config
      configMap:
        name: sqlstream-config  # the name of the ConfigMap to mount
        # An array of keys from the ConfigMap to create as files
        items:
        - key: "kcc.properties"
          path: "kcc.properties"
        - key: "kpc.properties"
          path: "kpc.properties"

The env section pulls simple values from the ConfigMap. The volumeMounts section references the config volume, which the volumes section backs with the sqlstream-config ConfigMap. The items array lists the file-like keys, which are mapped to files in the mounted volume. The other details of the container spec are omitted for brevity.

What is a Secret?

According to Kubernetes.io:

A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a Pod specification or in a container image. Using a Secret means that you don’t need to include confidential data in your application code.

Because Secrets can be created independently of the Pods that use them, there is less risk of the Secret (and its data) being exposed during the workflow of creating, viewing, and editing Pods. Kubernetes, and applications that run in your cluster, can also take additional precautions with Secrets, such as avoiding writing confidential data to nonvolatile storage.

Secrets are similar to ConfigMaps but are specifically intended to hold confidential data.

Secrets would be used to supply any of the following to a pod:

  • AWS or Azure access keys

  • Kerberos credential files

For a description of how to manage Kerberos tokens using a sidecar container, see this RedHat article.
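As a sketch (names and keys are illustrative), a Secret holding cloud access keys can be defined and then consumed as environment variables:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-access
type: Opaque
stringData:                  # plain text; Kubernetes stores it base64-encoded
  AWS_ACCESS_KEY_ID: "REPLACE_ME"
  AWS_SECRET_ACCESS_KEY: "REPLACE_ME"
---
# In the pod spec, reference the Secret much like a ConfigMap:
#   env:
#   - name: AWS_ACCESS_KEY_ID
#     valueFrom:
#       secretKeyRef:
#         name: aws-access
#         key: AWS_ACCESS_KEY_ID
```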

Session Variables

Session Variables can be used to modify the behavior of a SQLstream pipeline. See User Defined Session Variables. Note: Session Variables can only be set using DDL (CREATE PUMP, ALTER PUMP). Currently they cannot be supplied as JNDI property files.

You can write one or more DDL scripts that set the values needed for session variables, calling these scripts as part of application startup. At execution time, these will typically look something like the following:

ALTER PUMP myschema.* SET v1 = 'value1';
ALTER PUMP myschema.* SET v2 = 'value2';
-- etc
ALTER PUMP myschema.* START;

You can plug this into entrypoint.sh or your application startup script (which should normally be baked into the image). For more information, see Supplying Environment Variables to a Container above.

# Part of a shell script to start the application
# Use a "here" (inline) document to embed the SQL into the shell script
# The shell evaluates the environment variables and can use defaults if the vars are blank or undefined:

sqllineClient <<EOF
    
    -- set the required session variables
    -- either per pump, or across all pumps as shown here
    
    ALTER PUMP myschema.* SET v1 = '${MY_V1:-value1}';
    ALTER PUMP myschema.* SET v2 = '${MY_V2:-value2}';
    -- etc
    
    -- finally start all the pumps together
    ALTER PUMP myschema.* START;
EOF    

Alternatively you can pass in a properties file and then use it in the entrypoint.sh script. For the following example, let’s assume the file is called session_variables.properties and stored in /path/to/props. In practice, this would be accomplished using one of the techniques from the section Supplying Files to a Container above.

The session_variables.properties file looks like this:

MY_V1=value1
MY_V2=value2
# and so on

This file can be sourced within the entrypoint.sh script, which sets an environment variable for each property. The script then executes the ALTER PUMP SET statements and finally ALTER PUMP START:

# evaluate the props file to set environment variables
source /path/to/props/session_variables.properties

# and execute DDL supplying environment variables where needed

sqllineClient <<EOF
    
    -- set the required session variables
    -- either per pump, or across all pumps as shown here
    
    ALTER PUMP myschema.* SET v1 = '${MY_V1:-value1}';
    ALTER PUMP myschema.* SET v2 = '${MY_V2:-value2}';
    -- etc
    
    -- finally start all the pumps together
    ALTER PUMP myschema.* START;
EOF