
Commit dd54012

Merge branch 'main' into dependabot/pip/tutorials/genai-chatbot/langchain-0.0.325
2 parents 8559850 + c3026f7 commit dd54012


97 files changed, +25418 -69 lines


README.md

Lines changed: 26 additions & 5 deletions
Lines changed: 26 additions & 5 deletions

---

## Blogs

- [Online Feature Store](./blogs/feast-aws-credit-scoring/) - Building online feature stores on AWS with Amazon ElastiCache for Redis to support mission-critical ML use cases that require ultra-low latency and high throughput. See the blog post [Build an ultra-low latency online feature store for real-time inferencing using Amazon ElastiCache for Redis](https://aws.amazon.com/blogs/database/build-an-ultra-low-latency-online-feature-store-for-real-time-inferencing-using-amazon-elasticache-for-redis/).

## Hands-On Tutorials

The following are tutorials covering various use cases for [Amazon ElastiCache](https://aws.amazon.com/elasticache/).

- [Database Caching](./database-caching/) - Learn how to create a query cache for a relational database using Redis. In this tutorial, we take you through the process of deploying an [Amazon Relational Database Service](https://aws.amazon.com/rds/) (RDS) MySQL database and placing an Amazon ElastiCache Redis cluster in front of the RDS instance to reduce query latency for frequently run MySQL queries.

- [Session Store](./session-store/) - Discover how to manage user sessions in a web-based application using Redis. In this tutorial, you will learn how to use ElastiCache for Redis as a distributed cache for session management. You will also learn the best practices for configuring your ElastiCache nodes and how to handle sessions from your application.

- [Lambda Feature Store](./lambda-feature-store/) - Learn how Amazon ElastiCache can serve as the focal point for a custom-trained ML model that presents recommendations to application and web users. Lambda functions facilitate the interactions between ElastiCache for Redis and Amazon S3. Then review how AWS Lambda interacts with Amazon ElastiCache for Redis using insights loaded from a custom-built ML recommendation engine.

## Webinars

- [Generative AI Virtual Assistant](./webinars/genai-chatbot/) - Build a generative AI virtual assistant with [Amazon Bedrock](https://aws.amazon.com/bedrock/), [LangChain](https://github.com/langchain-ai/langchain) and [Amazon ElastiCache](https://aws.amazon.com/elasticache/). See the [YouTube video](https://www.youtube.com/watch?v=yWxDmQYelvg).

- [Flask Redis Session Management](./webinars/flask-redis-session/) - A simple Flask web application that demonstrates user session management using Redis as the session store.

## DevOps

Deploy infrastructure as code.

### AWS CDK (Cloud Development Kit)

#### TypeScript

- [Amazon ElastiCache Serverless for Memcached](devops/aws-cdk/typescript/elasticache-serverless-memcached-minimal/README.md)
- [Amazon ElastiCache Serverless for Redis](devops/aws-cdk/typescript/elasticache-serverless-redis-minimal/README.md)
- [Amazon ElastiCache for Redis Cluster Mode Disabled](devops/aws-cdk/typescript/elasticache-redis-cmd/README.md)

---

## Security

blogs/.gitignore

Lines changed: 3 additions & 0 deletions
Homestead.yaml
Homestead.json
/.vagrant
.phpunit.result.cache

*.zip
*.json
Lines changed: 11 additions & 0 deletions
feature_repo/registry.db
feature_repo/online_store.db
*.bin
infra/*.tfvars.json
**__pycache__**
.idea/
.idea/*
*.terraform/
*.tfstate*
*.terraform*
generate_data.py
Lines changed: 16 additions & 0 deletions
MIT No Attribution

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Lines changed: 152 additions & 0 deletions
# Real-time Credit Scoring with Feast on AWS

## Overview

![credit-score-architecture@2x](https://user-images.githubusercontent.com/6728866/132927464-5c9e9e05-538c-48c5-bc16-94a6d9d7e57b.jpg)

This tutorial demonstrates the use of Feast as part of a real-time credit scoring application.

* The primary training dataset is a loan table. This table contains historic loan data with accompanying features. The dataset also contains a target variable, namely whether a user has defaulted on their loan.
* Feast is used during training to enrich the loan table with zipcode and credit history features from files in S3. The S3 files are queried through Redshift.
* Feast is also used to serve the latest zipcode and credit history features for online credit scoring using DynamoDB.
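For orientation, the two access patterns above look roughly like the following minimal sketch. The `FeatureStore` calls and the feature references (for example `zipcode_features:city`) come from this tutorial's feature repository; the entity values and the driver code around them are illustrative assumptions only.

```python
import pandas as pd
import feast

# Open the feature store defined in feature_repo/ (assumes `feast apply` has already run).
store = feast.FeatureStore(repo_path="feature_repo")

# Offline (training): enrich a loan entity DataFrame with historical features via Redshift/S3.
# The entity DataFrame is assumed to carry zipcode, dob_ssn and event_timestamp columns.
loans = pd.DataFrame(
    {
        "zipcode": [76104],                    # placeholder entity value
        "dob_ssn": ["19630621_4278"],          # placeholder entity value
        "event_timestamp": [pd.Timestamp.now(tz="UTC")],
    }
)
training_df = store.get_historical_features(
    entity_df=loans,
    features=["zipcode_features:city", "credit_history:credit_card_due"],
).to_df()

# Online (serving): read the latest feature values for one applicant from DynamoDB.
feature_vector = store.get_online_features(
    entity_rows=[{"zipcode": 76104, "dob_ssn": "19630621_4278"}],
    features=["zipcode_features:city", "credit_history:credit_card_due"],
).to_dict()
```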
## Requirements

* Terraform (v1.0 or later)
* AWS CLI (v2.2 or later)

## Setup

### Setting up Redshift and S3

First we will set up your data infrastructure to simulate a production environment. We will deploy Redshift, an S3 bucket containing our zipcode and credit history parquet files, IAM roles and policies for Redshift to access S3, and create a Redshift table that can query the parquet files.
Initialize Terraform
```bash
cd infra
terraform init
```

Make sure the Terraform plan looks good
```bash
terraform plan -var="admin_password=thisISyourPassword1"
```

Deploy your infrastructure
```bash
terraform apply -var="admin_password=thisISyourPassword1"
```

Once your infrastructure is deployed, you should see the following outputs from Terraform
```
redshift_cluster_identifier = "my-feast-project-redshift-cluster"
redshift_spectrum_arn = "arn:aws:iam::<Account>:role/s3_spectrum_role"
credit_history_table = "credit_history"
zipcode_features_table = "zipcode_features"
```

Next we create a mapping from the Redshift cluster to the external catalog
```bash
aws redshift-data execute-statement \
    --region us-west-2 \
    --cluster-identifier [SET YOUR redshift_cluster_identifier HERE] \
    --db-user admin \
    --database dev --sql "create external schema spectrum from data catalog database 'dev' iam_role \
    '[SET YOUR redshift_spectrum_arn here]' create external database if not exists;"
```

To see whether the command was successful, please run the following command (substitute your statement id)
```bash
aws redshift-data describe-statement --id [SET YOUR STATEMENT ID HERE]
```

You should now be able to query actual zipcode features by executing the following statement
```bash
aws redshift-data execute-statement \
    --region us-west-2 \
    --cluster-identifier [SET YOUR redshift_cluster_identifier HERE] \
    --db-user admin \
    --database dev --sql "SELECT * from spectrum.zipcode_features LIMIT 1;"
```
The results can then be printed by running
```bash
aws redshift-data get-statement-result --id [SET YOUR STATEMENT ID HERE]
```

Return to the root of the credit scoring repository
```bash
cd ..
```
### Setting up Feast

Install Feast using pip

```bash
pip install feast
```

We have already set up a feature repository in [feature_repo/](feature_repo/). It isn't necessary to create a new feature repository, but it can be done using the following command
```bash
feast init -t aws feature_repo # Command only shown for reference.
```

Since we don't need to `init` a new repository, all we have to do is configure
[feature_store.yaml](feature_repo/feature_store.yaml) in the feature repository. Please set the fields under
`offline_store` to the values output when you deployed your Redshift cluster and S3 bucket.
Deploy the feature store by running `apply` from within the `feature_repo/` folder
```bash
cd feature_repo/
feast apply
```

This should produce output similar to the following
```
Registered entity dob_ssn
Registered entity zipcode
Registered feature view credit_history
Registered feature view zipcode_features
Deploying infrastructure for credit_history
Deploying infrastructure for zipcode_features
```

Next we load features into the online store using the `materialize-incremental` command. This command will load the
latest feature values from a data source into the online store.

```bash
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

Return to the root of the repository
```bash
cd ..
```
## Train and test the model

Finally, we train the model using a combination of loan data from S3 and our zipcode and credit history features from Redshift
(which in turn queries S3), and then we test online inference by reading those same features from DynamoDB.

```bash
python run.py
```

The script should then output the result of a single loan application
```
loan rejected!
```
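`run.py` itself is not shown in this commit view. Conceptually, a driver built around the `CreditScoringModel` class included later in this commit might look like the hypothetical sketch below; the module name `credit_model`, the local loan-table path, and the request fields are assumptions for illustration, not the actual script.

```python
import pandas as pd

from credit_model import CreditScoringModel  # assumed module name for the class in this commit

model = CreditScoringModel()

# Train on the historic loan table if no trained model has been persisted yet.
# The local parquet path is a placeholder; the tutorial's loan data lives in S3.
if not model.is_model_trained():
    loans = pd.read_parquet("loan_table.parquet")
    model.train(loans)

# Score a single, hypothetical loan application using online features served from DynamoDB.
loan_request = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "person_age": [33],
    "person_income": [59000],
    "person_home_ownership": ["RENT"],
    "person_emp_length": [12.0],
    "loan_intent": ["PERSONAL"],
    "loan_amnt": [35000],
    "loan_int_rate": [16.02],
}
result = model.predict(loan_request)

# Assumes loan_status == 1 means the applicant defaulted, i.e. the loan is rejected.
print("loan rejected!" if result == 1 else "loan approved!")
```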
## Interactive demo (using Streamlit)

Once the credit scoring model has been trained, it can be used for interactive loan applications using Streamlit.

Simply start the Streamlit application
```bash
streamlit run streamlit_app.py
```
Then navigate to the URL on which Streamlit is being served. You should see a user interface through which loan applications can be made:

![Streamlit Loan Application](streamlit.png)
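Like `run.py`, the Streamlit app is not reproduced in this commit view. As a rough, hypothetical idea of what such a front end involves, the sketch below collects a few applicant fields and calls the model's `predict` method; the field names and widget choices are illustrative assumptions, not the actual app.

```python
import streamlit as st

from credit_model import CreditScoringModel  # assumed module name

model = CreditScoringModel()

st.title("Loan Application")

# Illustrative subset of applicant fields; the real app may collect more or different inputs.
zipcode = st.number_input("Zipcode", value=76104)
dob_ssn = st.text_input("Date of birth and SSN key", "19630621_4278")
home_ownership = st.selectbox("Home ownership", ["RENT", "OWN", "MORTGAGE", "OTHER"])
loan_intent = st.selectbox("Loan intent", ["PERSONAL", "EDUCATION", "MEDICAL", "VENTURE"])
loan_amount = st.number_input("Loan amount", value=35000)

if st.button("Apply"):
    request = {
        "zipcode": [int(zipcode)],
        "dob_ssn": [dob_ssn],
        "person_home_ownership": [home_ownership],
        "loan_intent": [loan_intent],
        "loan_amnt": [loan_amount],
    }
    result = model.predict(request)
    st.write("Loan rejected!" if result == 1 else "Loan approved!")
```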
Lines changed: 136 additions & 0 deletions
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

from pathlib import Path

import feast
import joblib
import pandas as pd
from sklearn import tree
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import OrdinalEncoder
from sklearn.utils.validation import check_is_fitted


class CreditScoringModel:
    categorical_features = [
        "person_home_ownership",
        "loan_intent",
        "city",
        "state",
        "location_type",
    ]

    feast_features = [
        "zipcode_features:city",
        "zipcode_features:state",
        "zipcode_features:location_type",
        "zipcode_features:tax_returns_filed",
        "zipcode_features:population",
        "zipcode_features:total_wages",
        "credit_history:credit_card_due",
        "credit_history:mortgage_due",
        "credit_history:student_loan_due",
        "credit_history:vehicle_loan_due",
        "credit_history:hard_pulls",
        "credit_history:missed_payments_2y",
        "credit_history:missed_payments_1y",
        "credit_history:missed_payments_6m",
        "credit_history:bankruptcies",
    ]

    target = "loan_status"
    model_filename = "model.bin"
    encoder_filename = "encoder.bin"

    def __init__(self):
        # Load model
        if Path(self.model_filename).exists():
            self.classifier = joblib.load(self.model_filename)
        else:
            self.classifier = tree.DecisionTreeClassifier()

        # Load ordinal encoder
        if Path(self.encoder_filename).exists():
            self.encoder = joblib.load(self.encoder_filename)
        else:
            self.encoder = OrdinalEncoder()

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="feature_repo")

    def train(self, loans):
        train_X, train_Y = self._get_training_features(loans)

        self.classifier.fit(train_X[sorted(train_X)], train_Y)
        joblib.dump(self.classifier, self.model_filename)

    def _get_training_features(self, loans):
        training_df = self.fs.get_historical_features(
            entity_df=loans, features=self.feast_features
        ).to_df()

        self._fit_ordinal_encoder(training_df)
        self._apply_ordinal_encoding(training_df)

        train_X = training_df[
            training_df.columns.drop(self.target)
            .drop("event_timestamp")
            .drop("created_timestamp")
            .drop("loan_id")
            .drop("zipcode")
            .drop("dob_ssn")
        ]
        train_X = train_X.reindex(sorted(train_X.columns), axis=1)
        train_Y = training_df.loc[:, self.target]

        return train_X, train_Y

    def _fit_ordinal_encoder(self, requests):
        self.encoder.fit(requests[self.categorical_features])
        joblib.dump(self.encoder, self.encoder_filename)

    def _apply_ordinal_encoding(self, requests):
        requests[self.categorical_features] = self.encoder.transform(
            requests[self.categorical_features]
        )

    def predict(self, request):
        # Get online features from Feast
        feature_vector = self._get_online_features_from_feast(request)

        # Join features to request features
        features = request.copy()
        features.update(feature_vector)
        features_df = pd.DataFrame.from_dict(features)

        # Apply ordinal encoding to categorical features
        self._apply_ordinal_encoding(features_df)

        # Sort columns
        features_df = features_df.reindex(sorted(features_df.columns), axis=1)

        # Drop unnecessary columns
        features_df = features_df[features_df.columns.drop("zipcode").drop("dob_ssn")]

        # Make prediction
        features_df["prediction"] = self.classifier.predict(features_df)

        # return result of credit scoring
        return features_df["prediction"].iloc[0]

    def _get_online_features_from_feast(self, request):
        zipcode = request["zipcode"][0]
        dob_ssn = request["dob_ssn"][0]

        return self.fs.get_online_features(
            entity_rows=[{"zipcode": zipcode, "dob_ssn": dob_ssn}],
            features=self.feast_features,
        ).to_dict()

    def is_model_trained(self):
        try:
            check_is_fitted(self.classifier, "tree_")
        except NotFittedError:
            return False
        return True
21.4 MB binary file not shown.
