
Commit dd54012

Merge branch 'main' into dependabot/pip/tutorials/genai-chatbot/langchain-0.0.325
2 parents 8559850 + c3026f7 commit dd54012


97 files changed, +25418 -69 lines


README.md

Lines changed: 26 additions & 5 deletions
Lines changed: 26 additions & 5 deletions

---

## Blogs

- [Online Feature Store](./blogs/feast-aws-credit-scoring/) - Building online feature stores on AWS with Amazon ElastiCache for Redis to support mission-critical ML use cases that require ultra-low latency and high throughput. See the blog post [Build an ultra-low latency online feature store for real-time inferencing using Amazon ElastiCache for Redis](https://aws.amazon.com/blogs/database/build-an-ultra-low-latency-online-feature-store-for-real-time-inferencing-using-amazon-elasticache-for-redis/).

## Hands-On Tutorials

The following are tutorials covering various use cases for [Amazon ElastiCache](https://aws.amazon.com/elasticache/).

- [Database Caching](./database-caching/) - Learn how to create a query cache for a relational database using Redis. In this tutorial, we take you through the process of deploying an [Amazon Relational Database Service](https://aws.amazon.com/rds/) (RDS) MySQL database and placing an Amazon ElastiCache Redis cluster in front of the RDS instance to reduce query latency for frequently run MySQL queries.

- [Session Store](./session-store/) - Discover how to manage user sessions in a web-based application using Redis. In this tutorial, you will learn how to use ElastiCache for Redis as a distributed cache for session management. You will also learn the best practices for configuring your ElastiCache nodes and how to handle sessions from your application.

- [Lambda Feature Store](./lambda-feature-store/) - Learn how Amazon ElastiCache can serve as the focal point for a custom-trained ML model that presents recommendations to application and web users. Lambda functions facilitate the interactions between ElastiCache for Redis and Amazon S3. Then review how AWS Lambda interacts with Amazon ElastiCache for Redis using insights loaded from a custom-built ML recommendation engine.

## Webinars

- [Generative AI Virtual Assistant](./webinars/genai-chatbot/) - Build a generative AI virtual assistant with [Amazon Bedrock](https://aws.amazon.com/bedrock/), [LangChain](https://github.com/langchain-ai/langchain) and [Amazon ElastiCache](https://aws.amazon.com/elasticache/). See the [YouTube video](https://www.youtube.com/watch?v=yWxDmQYelvg).

- [Flask Redis Session Management](./webinars/flask-redis-session/) - A simple Flask web application that demonstrates user session management using Redis as the session store.

## DevOps

Deploy infrastructure as code.

### AWS CDK (Cloud Development Kit)

#### TypeScript

- [Amazon ElastiCache Serverless for Memcached](devops/aws-cdk/typescript/elasticache-serverless-memcached-minimal/README.md)
- [Amazon ElastiCache Serverless for Redis](devops/aws-cdk/typescript/elasticache-serverless-redis-minimal/README.md)
- [Amazon ElastiCache for Redis Cluster Mode Disabled](devops/aws-cdk/typescript/elasticache-redis-cmd/README.md)

---

## Security

blogs/.gitignore

Lines changed: 3 additions & 0 deletions
Homestead.yaml
Homestead.json
/.vagrant
.phpunit.result.cache

*.zip
*.json
Lines changed: 11 additions & 0 deletions
feature_repo/registry.db
feature_repo/online_store.db
*.bin
infra/*.tfvars.json
**__pycache__**
.idea/
.idea/*
*.terraform/
*.tfstate*
*.terraform*
generate_data.py
Lines changed: 16 additions & 0 deletions
MIT No Attribution

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Lines changed: 152 additions & 0 deletions
# Real-time Credit Scoring with Feast on AWS

## Overview

![credit-score-architecture@2x](https://user-images.githubusercontent.com/6728866/132927464-5c9e9e05-538c-48c5-bc16-94a6d9d7e57b.jpg)

This tutorial demonstrates the use of Feast as part of a real-time credit scoring application.

* The primary training dataset is a loan table. This table contains historic loan data with accompanying features. The dataset also contains a target variable, namely whether a user has defaulted on their loan.
* Feast is used during training to enrich the loan table with zipcode and credit history features from files in S3. The S3 files are queried through Redshift.
* Feast is also used to serve the latest zipcode and credit history features for online credit scoring using DynamoDB.
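For orientation, the two access patterns above look roughly like the following minimal sketch. The `FeatureStore` calls and the feature references (for example `zipcode_features:city`) come from this tutorial's feature repository; the entity values and the driver code around them are illustrative assumptions only.

```python
import pandas as pd
import feast

# Open the feature store defined in feature_repo/ (assumes `feast apply` has already run).
store = feast.FeatureStore(repo_path="feature_repo")

# Offline (training): enrich a loan entity DataFrame with historical features via Redshift/S3.
# The entity DataFrame is assumed to carry zipcode, dob_ssn and event_timestamp columns.
loans = pd.DataFrame(
    {
        "zipcode": [76104],                    # placeholder entity value
        "dob_ssn": ["19630621_4278"],          # placeholder entity value
        "event_timestamp": [pd.Timestamp.now(tz="UTC")],
    }
)
training_df = store.get_historical_features(
    entity_df=loans,
    features=["zipcode_features:city", "credit_history:credit_card_due"],
).to_df()

# Online (serving): read the latest feature values for one applicant from DynamoDB.
feature_vector = store.get_online_features(
    entity_rows=[{"zipcode": 76104, "dob_ssn": "19630621_4278"}],
    features=["zipcode_features:city", "credit_history:credit_card_due"],
).to_dict()
```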
## Requirements

* Terraform (v1.0 or later)
* AWS CLI (v2.2 or later)

## Setup

### Setting up Redshift and S3

First we will set up your data infrastructure to simulate a production environment. We will deploy Redshift, an S3 bucket containing our zipcode and credit history parquet files, IAM roles and policies for Redshift to access S3, and create a Redshift table that can query the parquet files.
Initialize Terraform
```bash
cd infra
terraform init
```

Make sure the Terraform plan looks good
```bash
terraform plan -var="admin_password=thisISyourPassword1"
```

Deploy your infrastructure
```bash
terraform apply -var="admin_password=thisISyourPassword1"
```

Once your infrastructure is deployed, you should see the following outputs from Terraform
```
redshift_cluster_identifier = "my-feast-project-redshift-cluster"
redshift_spectrum_arn = "arn:aws:iam::<Account>:role/s3_spectrum_role"
credit_history_table = "credit_history"
zipcode_features_table = "zipcode_features"
```

Next we create a mapping from the Redshift cluster to the external catalog
```bash
aws redshift-data execute-statement \
    --region us-west-2 \
    --cluster-identifier [SET YOUR redshift_cluster_identifier HERE] \
    --db-user admin \
    --database dev --sql "create external schema spectrum from data catalog database 'dev' iam_role \
    '[SET YOUR redshift_spectrum_arn here]' create external database if not exists;"
```

To see whether the command was successful, please run the following command (substitute your statement id)
```bash
aws redshift-data describe-statement --id [SET YOUR STATEMENT ID HERE]
```

You should now be able to query actual zipcode features by executing the following statement
```bash
aws redshift-data execute-statement \
    --region us-west-2 \
    --cluster-identifier [SET YOUR redshift_cluster_identifier HERE] \
    --db-user admin \
    --database dev --sql "SELECT * from spectrum.zipcode_features LIMIT 1;"
```
The results can then be printed by running
```bash
aws redshift-data get-statement-result --id [SET YOUR STATEMENT ID HERE]
```

Return to the root of the credit scoring repository
```bash
cd ..
```
### Setting up Feast

Install Feast using pip

```bash
pip install feast
```

We have already set up a feature repository in [feature_repo/](feature_repo/). It isn't necessary to create a new feature repository, but it can be done using the following command
```bash
feast init -t aws feature_repo # Command only shown for reference.
```

Since we don't need to `init` a new repository, all we have to do is configure
[feature_store.yaml](feature_repo/feature_store.yaml) in the feature repository. Please set the fields under
`offline_store` to the values output when you deployed your Redshift cluster and S3 bucket.
Deploy the feature store by running `apply` from within the `feature_repo/` folder
```bash
cd feature_repo/
feast apply
```

This should produce output similar to the following
```
Registered entity dob_ssn
Registered entity zipcode
Registered feature view credit_history
Registered feature view zipcode_features
Deploying infrastructure for credit_history
Deploying infrastructure for zipcode_features
```

Next we load features into the online store using the `materialize-incremental` command. This command will load the
latest feature values from a data source into the online store.

```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

Return to the root of the repository
```bash
cd ..
```
## Train and test the model

Finally, we train the model using a combination of loan data from S3 and our zipcode and credit history features from Redshift
(which in turn queries S3), and then we test online inference by reading those same features from DynamoDB.

```bash
python run.py
```

The script should then output the result of a single loan application
```
loan rejected!
```
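`run.py` itself is not shown in this commit view. Conceptually, a driver built around the `CreditScoringModel` class included later in this commit might look like the hypothetical sketch below; the module name `credit_model`, the local loan-table path, and the request fields are assumptions for illustration, not the actual script.

```python
import pandas as pd

from credit_model import CreditScoringModel  # assumed module name for the class in this commit

model = CreditScoringModel()

# Train on the historic loan table if no trained model has been persisted yet.
# The local parquet path is a placeholder; the tutorial's loan data lives in S3.
if not model.is_model_trained():
    loans = pd.read_parquet("loan_table.parquet")
    model.train(loans)

# Score a single, hypothetical loan application using online features served from DynamoDB.
loan_request = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "person_age": [33],
    "person_income": [59000],
    "person_home_ownership": ["RENT"],
    "person_emp_length": [12.0],
    "loan_intent": ["PERSONAL"],
    "loan_amnt": [35000],
    "loan_int_rate": [16.02],
}
result = model.predict(loan_request)

# Assumes loan_status == 1 means the applicant defaulted, i.e. the loan is rejected.
print("loan rejected!" if result == 1 else "loan approved!")
```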
## Interactive demo (using Streamlit)

Once the credit scoring model has been trained, it can be used for interactive loan applications using Streamlit.

Simply start the Streamlit application
```bash
streamlit run streamlit_app.py
```
Then navigate to the URL on which Streamlit is being served. You should see a user interface through which loan applications can be made:

![Streamlit Loan Application](streamlit.png)
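Like `run.py`, the Streamlit app is not reproduced in this commit view. As a rough, hypothetical idea of what such a front end involves, the sketch below collects a few applicant fields and calls the model's `predict` method; the field names and widget choices are illustrative assumptions, not the actual app.

```python
import streamlit as st

from credit_model import CreditScoringModel  # assumed module name

model = CreditScoringModel()

st.title("Loan Application")

# Illustrative subset of applicant fields; the real app may collect more or different inputs.
zipcode = st.number_input("Zipcode", value=76104)
dob_ssn = st.text_input("Date of birth and SSN key", "19630621_4278")
home_ownership = st.selectbox("Home ownership", ["RENT", "OWN", "MORTGAGE", "OTHER"])
loan_intent = st.selectbox("Loan intent", ["PERSONAL", "EDUCATION", "MEDICAL", "VENTURE"])
loan_amount = st.number_input("Loan amount", value=35000)

if st.button("Apply"):
    request = {
        "zipcode": [int(zipcode)],
        "dob_ssn": [dob_ssn],
        "person_home_ownership": [home_ownership],
        "loan_intent": [loan_intent],
        "loan_amnt": [loan_amount],
    }
    result = model.predict(request)
    st.write("Loan rejected!" if result == 1 else "Loan approved!")
```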
Lines changed: 136 additions & 0 deletions
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from pathlib import Path

import feast
import joblib
import pandas as pd
from sklearn import tree
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import OrdinalEncoder
from sklearn.utils.validation import check_is_fitted


class CreditScoringModel:
    categorical_features = [
        "person_home_ownership",
        "loan_intent",
        "city",
        "state",
        "location_type",
    ]

    feast_features = [
        "zipcode_features:city",
        "zipcode_features:state",
        "zipcode_features:location_type",
        "zipcode_features:tax_returns_filed",
        "zipcode_features:population",
        "zipcode_features:total_wages",
        "credit_history:credit_card_due",
        "credit_history:mortgage_due",
        "credit_history:student_loan_due",
        "credit_history:vehicle_loan_due",
        "credit_history:hard_pulls",
        "credit_history:missed_payments_2y",
        "credit_history:missed_payments_1y",
        "credit_history:missed_payments_6m",
        "credit_history:bankruptcies",
    ]

    target = "loan_status"
    model_filename = "model.bin"
    encoder_filename = "encoder.bin"

    def __init__(self):
        # Load model
        if Path(self.model_filename).exists():
            self.classifier = joblib.load(self.model_filename)
        else:
            self.classifier = tree.DecisionTreeClassifier()

        # Load ordinal encoder
        if Path(self.encoder_filename).exists():
            self.encoder = joblib.load(self.encoder_filename)
        else:
            self.encoder = OrdinalEncoder()

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="feature_repo")

    def train(self, loans):
        train_X, train_Y = self._get_training_features(loans)

        self.classifier.fit(train_X[sorted(train_X)], train_Y)
        joblib.dump(self.classifier, self.model_filename)

    def _get_training_features(self, loans):
        training_df = self.fs.get_historical_features(
            entity_df=loans, features=self.feast_features
        ).to_df()

        self._fit_ordinal_encoder(training_df)
        self._apply_ordinal_encoding(training_df)

        train_X = training_df[
            training_df.columns.drop(self.target)
            .drop("event_timestamp")
            .drop("created_timestamp")
            .drop("loan_id")
            .drop("zipcode")
            .drop("dob_ssn")
        ]
        train_X = train_X.reindex(sorted(train_X.columns), axis=1)
        train_Y = training_df.loc[:, self.target]

        return train_X, train_Y

    def _fit_ordinal_encoder(self, requests):
        self.encoder.fit(requests[self.categorical_features])
        joblib.dump(self.encoder, self.encoder_filename)

    def _apply_ordinal_encoding(self, requests):
        requests[self.categorical_features] = self.encoder.transform(
            requests[self.categorical_features]
        )

    def predict(self, request):
        # Get online features from Feast
        feature_vector = self._get_online_features_from_feast(request)

        # Join features to request features
        features = request.copy()
        features.update(feature_vector)
        features_df = pd.DataFrame.from_dict(features)

        # Apply ordinal encoding to categorical features
        self._apply_ordinal_encoding(features_df)

        # Sort columns
        features_df = features_df.reindex(sorted(features_df.columns), axis=1)

        # Drop unnecessary columns
        features_df = features_df[features_df.columns.drop("zipcode").drop("dob_ssn")]

        # Make prediction
        features_df["prediction"] = self.classifier.predict(features_df)

        # return result of credit scoring
        return features_df["prediction"].iloc[0]

    def _get_online_features_from_feast(self, request):
        zipcode = request["zipcode"][0]
        dob_ssn = request["dob_ssn"][0]

        return self.fs.get_online_features(
            entity_rows=[{"zipcode": zipcode, "dob_ssn": dob_ssn}],
            features=self.feast_features,
        ).to_dict()

    def is_model_trained(self):
        try:
            check_is_fitted(self.classifier, "tree_")
        except NotFittedError:
            return False
        return True
21.4 MB binary file not shown.
