Adding External Data to OPA
Overview
The OPA documentation describes several options for obtaining external data.
This blog post covers all of the scenarios except ‘Synchronous Pull’.
With synchronous pull through ‘http.send’, you reference the data directly from the response, so you don’t need either the ‘data’ or ‘input’ global variable.
Prerequisites
Configure VSCode OPA Plugin
If you don’t have Visual Studio Code, download and install it from the VSCode website. Next, install the OPA plugin for VSCode, either from the Marketplace or by searching for the ‘OPA policy agent’ extension directly within VSCode.
Install OPA in Windows Environment
From the Windows Git Bash command line, download OPA from the OPA website.
#download OPA agent, 0.54.0 is the latest version as of Jul. 2023, please locate the latest installation if required
$ curl -L -o opa_windows_amd64.exe https://openpolicyagent.org/downloads/v0.54.0/opa_windows_amd64.exe
#download checksums
$ curl -L -o opa.sha256 https://github.com/open-policy-agent/opa/releases/download/v0.54.0/opa_windows_amd64.exe.sha256
#verify the binary checksum
$ shasum -c opa.sha256
opa_windows_amd64.exe: OK
#rename file
$ mv opa_windows_amd64.exe opa.exe
Add the directory that contains ‘opa.exe’ to your PATH environment variable. You can confirm the installation by running the ‘opa version’ command.
$ opa version
Version: 0.54.0
Build Commit: 292288cb074662ca5d2fc9841175a22b01233e19
Build Timestamp: 2023-06-29T19:14:17Z
Build Hostname: 9198f5afb78d
Go Version: go1.20.5
Platform: windows/amd64
WebAssembly: available
A Basic Policy Example
Write a Policy ‘policy.rego’
package rules
default allow = false
myusers := data.users
allow {
input.path == ["fake1"]
}
data.json
{
"users" : {
"name1": {"manager": "charlie", "title": "salesperson"},
"name2": {"manager": "charlie", "title": "salesperson"},
"name3": {"manager": "dave", "title": "manager"},
"name4": {"manager": null, "title": "ceo"},
"dummy": {"manager": "dummy", "title": "dummy"}
}
}
input.json
{
"method": "PUT",
"path": ["fake1"],
"user": "name1"
}
Evaluate a Policy
You can evaluate the policy through the VSCode extension: ‘Command Palette’ -> ‘OPA: Evaluate Package’ or ‘OPA: Evaluate Selection’.
You can also use the OPA CLI to evaluate a policy. (All base and virtual documents start with ‘data.’)
opa eval -d data.json -d policy.rego 'data.rules.myusers'
{
"result": [
{
"expressions": [
{
"value": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
},
"text": "data.rules.myusers",
"location": {
"row": 1,
"col": 1
}
}
]
}
]
}
When we omit data.json from the OPA CLI command, the result does not include the ‘myusers’ document, because ‘myusers’ needs the data from data.json.
Let’s run the evaluation with input.json. Now the value of the ‘allow’ document is true, because ‘input.path == [“fake1”]’ evaluates to true.
opa eval -d policy.rego 'data.rules' -i input.json
{
"result": [
{
"expressions": [
{
"value": {
"allow": true
},
"text": "data.rules",
"location": {
"row": 1,
"col": 1
}
}
]
}
]
}
When we include data.json and input.json in the OPA CLI, all documents in the package are returned in the result.
opa eval -d policy.rego -d data.json 'data.rules' -i input.json
{
"result": [
{
"expressions": [
{
"value": {
"allow": true,
"myusers": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
},
"text": "data.rules",
"location": {
"row": 1,
"col": 1
}
}
]
}
]
}
Synchronous Push: Overload Input
In the overload input option, we assign ‘input.users’ to ‘myusers’, so the ‘myusers’ document reflects the value from ‘input.json’. Note that OPA accepts only a single input JSON file. The ‘input’ global variable is available in every module, so you don’t need to import it.
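The following is a minimal Python sketch of the overload-input idea, assuming the caller fetches the user records itself and nests them under a ‘users’ key before querying OPA. The helper name ‘build_overloaded_input’ is hypothetical, and the policy is assumed to contain ‘myusers := input.users’.

```python
import json


def build_overloaded_input(request_attrs, users):
    """Return the JSON body for an OPA query with external data in 'input'.

    The 'users' records travel inside the input document, so a policy
    can read them as input.users (an assumption of this sketch).
    """
    overloaded = dict(request_attrs)  # copy so the caller's dict is untouched
    overloaded["users"] = users
    return json.dumps({"input": overloaded})


request_attrs = {"method": "PUT", "path": ["fake1"], "user": "name1"}
users = {"name1": {"manager": "charlie", "title": "salesperson"}}
body = build_overloaded_input(request_attrs, users)
print(body)
```

Because everything is merged into one document, this approach fits the single-input restriction, at the cost of sending the external data with every query.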
Build OPA Bundle and Validate
In this section, we will build an OPA bundle to demonstrate asynchronous pull (data.json in the bundle) and asynchronous push (OPA data API PUT).
We will first enhance policy.rego, add ‘pushdata.json’ and manifest, then we will build an OPA bundle. There is no change to ‘input.json’ and ‘data.json’.
Manifest
Bundle files may contain an optional .manifest file that stores bundle metadata. ‘roots’ is one of the fields in the metadata file.
- roots: If you expect to load additional data into OPA from outside the bundle (e.g., via OPA’s HTTP API) you should include a top-level roots field containing path prefixes that declare the scope of the bundle. If the roots field is not included in the manifest it defaults to [""] which means that ALL data and policy must come from the bundle.
Here is the directory structure:
Because an OPA bundle mandates the ‘data.json’ filename, we cannot name the file anything else. (data.yaml is also supported.)
Because we use the remote bundle and the ‘roots’ field, data.json and policy.rego must be under the same base package.
policy.rego
We don’t need to import any document from data.json, because the policy and the data are under the same base package in the bundle. The ‘policy.rego’ module is able to access any document in ‘data.json’.
# package or namespace for 'policy.rego' module
# policy.rego file can be in any directory, not necessarily in the exact same directory as the package naming
# In this example, 'policy.rego' file is in 'com\example' directory
# It's not in 'com\example\marketing\rules' folder
package com.example.marketing.rules
# Optional, you only need this import in case you want an alias
import data.com.example as base
default allow = false
# 'local' document 'users'
users := {
"alice": {"manager": "charlie", "title": "salesperson"},
"bob": {"manager": "charlie", "title": "salesperson"},
"charlie": {"manager": "dave", "title": "manager"},
"dave": {"manager": null, "title": "ceo"}
}
# obtain data from data.json when evaluating the policy directly from the JSON file
# we will provide an option for consistency at the end
myusers := data.users
# obtain data from data.json when evaluating the policy against the bundle
myusers_base := base.datafolder.users
# obtain data from pushdata.json or through HTTP PUT
mytest := data.test
# OPA evaluates input.path and assigns true or false to the virtual document 'allow'
allow {
input.path == ["fake1"]
}
pushdata.json
{
"test": "this is a test"
}
.manifest
Even though package names use dots, the bundle path prefix uses slashes. So the setting is ‘com/example’, not ‘com.example’.
{
"roots": ["com/example"]
}
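The dot-to-slash conversion can be sketched in a few lines of Python. The helper names ‘package_to_root’ and ‘make_manifest’ are hypothetical, and the sketch assumes simple package names whose segments contain no dots or quoting.

```python
import json


def package_to_root(package):
    """Convert a dotted Rego package prefix into a bundle root path."""
    return package.replace(".", "/")


def make_manifest(packages):
    """Build the .manifest content for the given package prefixes."""
    return {"roots": [package_to_root(p) for p in packages]}


manifest = make_manifest(["com.example"])
print(json.dumps(manifest))  # {"roots": ["com/example"]}
```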
Build bundle file
opa build -b .
Inspection
In the tar file, OPA places the ‘data.json’ file in the root directory, while the ‘policy.rego’ file remains in its sub-directory. This shows that the built bundle contains a single ‘data.json’ file.
‘pushdata.json’ is not present in the bundle, since OPA does not include any data file other than ‘data.json’ or ‘data.yaml’.
$ tar -tf bundle.tar.gz
tar: Removing leading `/' from member names
/data.json
tar: Removing leading `/\' from member names
/\\com\\example\\policy.rego
/.manifest
What happened to ‘data.json’, and why was it moved to the root directory?
data.json in the bundle
The ‘data.json’ file was automatically updated during the OPA bundle build process. The entire root package path (com/example) and the sub-directory (datafolder) were added to the JSON object.
{
"com": {
"example": {
"datafolder": {
"users": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
}
}
}
}
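The rewritten ‘data.json’ can be modelled as wrapping the original document in one JSON object per directory segment. The Python sketch below reproduces only the result observed above; it is an illustration, not OPA’s actual implementation, and the helper name ‘nest_under’ is hypothetical.

```python
def nest_under(directory, document):
    """Wrap 'document' in one object per segment of 'directory'."""
    segments = [s for s in directory.split("/") if s]  # ignore empty segments
    nested = document
    for segment in reversed(segments):
        nested = {segment: nested}
    return nested


source = {"users": {"name1": {"manager": "charlie", "title": "salesperson"}}}
bundled = nest_under("com/example/datafolder", source)
print(bundled["com"]["example"]["datafolder"]["users"]["name1"]["title"])
```

This is why the policy reads the users from ‘data.com.example.datafolder.users’ once the bundle is built.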
I would prefer to place ‘data.json’ in the root directory from the very beginning and add the appropriate hierarchy to the JSON data during the development phase.
Run OPA Evaluations against Bundle
When we evaluate the policy using the bundle file, we no longer pass ‘data.json’, because it is already present in the bundle file.
opa eval -d .\async_push\pushdata.json -b bundle.tar.gz -i input.json 'data'
{
"result": [
{
"expressions": [
{
"value": {
"com": {
"example": {
"datafolder": {
"users": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
},
"marketing": {
"rules": {
"allow": true,
"mytest": "this is a test",
"myusers_base": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
},
"users": {
"alice": {
"manager": "charlie",
"title": "salesperson"
},
"bob": {
"manager": "charlie",
"title": "salesperson"
},
"charlie": {
"manager": "dave",
"title": "manager"
},
"dave": {
"manager": null,
"title": "ceo"
}
}
}
}
}
},
"test": "this is a test"
},
"text": "data",
"location": {
"row": 1,
"col": 1
}
}
]
}
]
}
The ‘data.test’ document is sourced from the ‘pushdata.json’ file. ‘pushdata.json’ is a separate data source that is NOT loaded through the OPA remote bundle; it is loaded from the file system.
Data loaded from outside the OPA remote bundle keeps its own flat structure directly under the global variable ‘data’; here it appears as ‘data.test’.
More about Asynchronous Push
In the previous section, we made the other data source available to OPA through a ‘pushdata.json’ file in the local file system. It counts as a separate data source because the ‘data.test’ document is not under the bundle root path ‘data.com.example’.
In practice, we need to update this data source dynamically without restarting OPA. Let’s explore the OPA data API PUT method.
Start OPA in Server Mode
We start OPA in server mode so that we can interact with it through the REST API. This time we do not pass any JSON files.
opa run --addr :19001 -b bundle.tar.gz -s
{"addrs":[":19001"],"diagnostic-addrs":[],"level":"info","msg":"Initializing server.","time":"2023-07-14T20:41:53-04:00"}
{"client_addr":"127.0.0.1:62434","level":"info","msg":"Received request.","req_id":1,"req_method":"GET","req_path":"/v1/data","time":"2023-07-14T20:42:12-04:00"}
{"client_addr":"127.0.0.1:62434","level":"info","msg":"Sent response.","req_id":1,"req_method":"GET","req_path":"/v1/data","resp_bytes":525,"resp_duration":1.0574,"resp_status":200,"time":"2023-07-14T20:42:12-04:00"}
What documents are loaded into the OPA server? Let’s call OPA data API to find out.
In the response, ‘data.test’ does not exist, because pushdata.json is not part of the remote bundle. All documents from the remote bundle show up except ‘data.com.example.marketing.rules.mytest’, since this document needs the value from ‘data.test’.
Inspection
Issue an HTTP GET request.
$ curl http://localhost:19001/v1/data
Let’s take a look at the response. The value of ‘allow’ document is false, because we didn’t pass input.json when calling the OPA data API.
{
"result": {
"com": {
"example": {
"datafolder": {
"users": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
},
"marketing": {
"rules": {
"allow": false,
"myusers_base": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
},
"users": {
"alice": {
"manager": "charlie",
"title": "salesperson"
},
"bob": {
"manager": "charlie",
"title": "salesperson"
},
"charlie": {
"manager": "dave",
"title": "manager"
},
"dave": {
"manager": null,
"title": "ceo"
}
}
}
}
}
}
}
}
Here is another API call with the input.json.
curl --request POST \
--url http://127.0.0.1:19001/v1/data \
--header 'Content-Type: application/json' \
--data '{"input":
{
"method": "PUT",
"path": [
"fake1"
]
}
}'
Document ‘allow’ is evaluated as true.
{
"result": {
"com": {
"example": {
"datafolder": {
"users": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
},
"marketing": {
"rules": {
"allow": true,
"myusers_base": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
},
"users": {
"alice": {
"manager": "charlie",
"title": "salesperson"
},
"bob": {
"manager": "charlie",
"title": "salesperson"
},
"charlie": {
"manager": "dave",
"title": "manager"
},
"dave": {
"manager": null,
"title": "ceo"
}
}
}
}
}
}
}
}
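The same query can be issued from a program. Here is a minimal Python sketch using only the standard library; the helper names ‘build_query’ and ‘read_allow’ are hypothetical, and the address assumes the OPA server started earlier in this section.

```python
import json
import urllib.request


def build_query(base_url, input_doc):
    """Prepare (but do not send) a POST to the OPA data API."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/v1/data",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_allow(response_text):
    """Pull the 'allow' decision out of a response shaped like the one above."""
    result = json.loads(response_text)["result"]
    return result["com"]["example"]["marketing"]["rules"]["allow"]


req = build_query("http://127.0.0.1:19001", {"method": "PUT", "path": ["fake1"]})
# To send it (requires the OPA server to be running on port 19001):
#   with urllib.request.urlopen(req) as resp:
#       print(read_allow(resp.read().decode("utf-8")))
```

Wrapping the request attributes in an ‘input’ key mirrors the curl call; without it, the policy sees no ‘input’ document and ‘allow’ falls back to its default.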
Update Other Data Source
We will perform multiple tests to make the document ‘data.test’ available in the OPA server. Let’s call ‘PUT http://localhost:19001/v1/data/test’.
curl --location --request PUT 'http://localhost:19001/v1/data/test' \
--header 'Content-Type: text/plain' \
--data '"push dynamic data"'
Let’s query the documents again.
curl http://localhost:19001/v1/data
Both the ‘data.test’ and ‘data.com.example.marketing.rules.mytest’ documents are updated with the new value ‘push dynamic data’. Previously, when we ran the OPA CLI for interactive policy evaluation, we placed the ‘data.test’ document in the ‘pushdata.json’ file. After configuring the OPA bundle and running OPA in server mode, we can use the OPA data API to dynamically update the document and make it available to the OPA server.
{
"result": {
"com": {
"example": {
"datafolder": {
"users": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
}
},
"marketing": {
"rules": {
"allow": true,
"mytest": "push dynamic data",
"myusers_base": {
"dummy": {
"manager": "dummy",
"title": "dummy"
},
"name1": {
"manager": "charlie",
"title": "salesperson"
},
"name2": {
"manager": "charlie",
"title": "salesperson"
},
"name3": {
"manager": "dave",
"title": "manager"
},
"name4": {
"manager": null,
"title": "ceo"
}
},
"users": {
"alice": {
"manager": "charlie",
"title": "salesperson"
},
"bob": {
"manager": "charlie",
"title": "salesperson"
},
"charlie": {
"manager": "dave",
"title": "manager"
},
"dave": {
"manager": null,
"title": "ceo"
}
}
}
}
}
},
"test": "push dynamic data"
}
}
Let’s update ‘data.test’ again and check whether the change is effective dynamically without recycling OPA server.
Here is the request for updating the ‘data.test’ document:
curl --location --request PUT 'http://localhost:19001/v1/data/test' \
--header 'Content-Type: text/plain' \
--data '"push dynamic data for second time"'
Here is the request for getting the document:
curl --location 'http://localhost:19001/v1/data/test'
Here is the HTTP response. ‘data.test’ has been successfully updated.
{
"result": "push dynamic data for second time"
}
Can We Update a Document Under the Bundle?
curl --location --request PUT 'http://localhost:19001/v1/data/com/example/pushdata' \
--header 'Content-Type: text/plain' \
--data '"push dynamic data for second time"'
We got a ‘400’ bad request response with the following error.
{
"code": "invalid_parameter",
"message": "path com/example/pushdata is owned by bundle \"bundle.tar.gz\""
}
We are not allowed to push any document under ‘data.com.example’, but we are good to push to other paths, such as ‘data.com.pushdata’ or ‘data.pushdata’.
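The prefix rule behind this behaviour can be sketched in Python. This is a simplified model of the ownership check as described here, not OPA’s actual code; the function name ‘owning_root’ is hypothetical, and paths are written relative to ‘/v1/data’.

```python
def owning_root(path, roots):
    """Return the bundle root that covers 'path', or None if no root does."""
    path = path.strip("/")
    for root in roots:
        root = root.strip("/")
        # An empty root owns everything; otherwise match whole path segments.
        if root == "" or path == root or path.startswith(root + "/"):
            return root
    return None


roots = ["com/example"]
for candidate in ["com/example/pushdata", "com/pushdata", "pushdata", "test"]:
    owner = owning_root(candidate, roots)
    print(candidate, "-> rejected" if owner is not None else "-> allowed")
```

With the default roots of [""], every path is covered, which is why a bundle without a ‘roots’ entry leaves no room for pushed data.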
Conclusions
There are multiple options for making dynamic data available to the OPA server.
A policy or data included in a bundle can’t be overwritten by queries to the OPA REST API, since that policy or data is considered to be owned by that bundle. You can still upload new policies or data via the REST API, but not on a path owned by a bundle.
A remote bundle is a common approach for making policies and data available to OPA. By default, the root directory ‘/’, that is, the entire ‘data’ namespace, is owned by the bundle. If you need to push dynamic data asynchronously through the OPA data API, you will need to declare scopes or path prefixes for the bundle.
‘data.json’ should be placed in the root folder, with the appropriate hierarchy added to the JSON data during the development phase. If you need multiple ‘data.json’ files during the development and test phases, you can find more details in my next blog post at https://cloudjourney.medium.com/opa-json-data-desgin-104b4e4b46c1.
Here is the summary of folder structure.
Here is the ‘data.json’.
{
"com": {
"example": {
"datafolder": {
"users": {
"alice": {
"manager": "charlie",
"title": "salesperson"
},
"bob": {
"manager": "charlie",
"title": "salesperson"
},
"charlie": {
"manager": "dave",
"title": "manager"
},
"dave": {
"manager": null,
"title": "ceo"
},
"fake": {
"manager": "1",
"title": "2"
}
}
}
}
}
}
References
https://academy.styra.com/enrollments
Open Policy Agent | External Data
Open Policy Agent | Philosophy
https://github.com/open-policy-agent/opa/issues/3118