docs/docs/core/settings.mdx
5 additions & 5 deletions
@@ -28,7 +28,7 @@ See [List of Environment Variables](#list-of-environment-variables) for specific
 
 :::tip
 
-You can consider place a `.env` file in your directory.
+You can consider placing a `.env` file in your directory.
 The [CLI](cli#environment-variables) will load environment variables from the `.env` file (see [CLI](cli#environment-variables) for more details).
 From your own main module, you can also load environment variables with a package like [`python-dotenv`](https://github.com/theskumar/python-dotenv).
 
@@ -69,7 +69,7 @@ cocoindex.init(
 For example, you can call it in the main function of your application.
 Once the `cocoindex.init()` is called with a `cocoindex.Settings` dataclass object as argument, the `@cocoindex.settings` function and environment variables will be ignored.
 
-This is more flexible, as you can more easily construct `cocoindex.Settings` based on other stuffs you loaded earlier.
+This is more flexible, as you can more easily construct `cocoindex.Settings` based on other objects or instances you loaded earlier.
 But be careful that if you call `cocoindex.init()` only under the path of main (e.g. within `if __name__ == "__main__":` guard), it won't be executed when you're using CocoIndex CLI, as it won't execute your main logic.
 
 :::info
@@ -123,7 +123,7 @@ If not set, all flows are in a default unnamed namespace.
 
 :::tip
 
-Please be careful that all values in `url` needs to be url-encoded if they contain special characters.
+Please be careful that all values in `url` need to be URL-encoded if they contain special characters.
 For this reason, prefer to use the separated `user` and `password` fields for username and password.
 
 :::
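A minimal sketch of the encoding concern in this tip, using Python's standard `urllib.parse.quote`. The user name, password, host, and database name below are made-up placeholders, and the snippet only illustrates percent-encoding; it is not part of the CocoIndex API.

```python
from urllib.parse import quote

# Hypothetical credentials, for illustration only.
user = "cocoindex"
password = "p@ss:word/1"

# safe="" makes quote() percent-encode every reserved character, including "/".
encoded_password = quote(password, safe="")

# Hypothetical host and database name.
url = f"postgres://{user}:{encoded_password}@localhost:5432/cocoindex"
print(url)
```

Passing the raw values through the separate `user` and `password` fields, as the tip recommends, avoids this step entirely.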
@@ -140,7 +140,7 @@ If not set, all flows are in a default unnamed namespace.
 
 If you use the Postgres database hosted by [Supabase](https://supabase.com/), please click **Connect** on your project dashboard and find the following URL:
 
-* If you're on a IPv6 network, use the URL under **Direct connection**. You can visit [IPv6 test](https://test-ipv6.com/) to see if you have IPv6 Internet connection.
+* If you're on an IPv6 network, use the URL under **Direct connection**. You can visit [IPv6 test](https://test-ipv6.com/) to see if you have IPv6 Internet connection.
 * Otherwise, use the URL under **Session pooler**.
 Note that Supabase has a pool size limit of 15 by default, while CocoIndex's default `max_connections` value is 25.
 You can adjust either value to make sure Supabase's pool size limit is greater than CocoIndex's `max_connections` value.
@@ -156,7 +156,7 @@ If you use the Postgres database hosted by [Supabase](https://supabase.com/), pl
 * `source_max_inflight_rows` (type: `int | None`, default: `1024`): The maximum number of concurrent inflight rows for all source operations.
 * `source_max_inflight_bytes` (type: `int | None`, default: `None`): The maximum number of concurrent inflight bytes for all source operations.
 
-See also [flow definition docs](/docs/core/flow_def#control-processing-concurrency) about why it's necessary to control processing concurrency, and how to configure it on per-source basis.
+See also [flow definition docs](/docs/core/flow_def#control-processing-concurrency) about why it's necessary to control processing concurrency, and how to configure it on a per-source basis.
 
 If both global and per-source limits are specified, both need to be satisfied to admit additional source rows.
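One way to read the rule above, as a rough sketch: a new row is admitted only when it fits under the global limit and under the source's own limit. The `Limit` class and `can_admit` function are hypothetical names used for illustration and do not reflect CocoIndex's internals; `None` is treated as "no limit", matching the `int | None` types listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limit:
    """Hypothetical counter for one concurrency limit (global or per-source)."""

    max_inflight_rows: Optional[int]  # None means "unlimited"
    inflight_rows: int = 0

    def has_room(self) -> bool:
        # An unset limit never blocks; otherwise there must be a free slot.
        return (
            self.max_inflight_rows is None
            or self.inflight_rows < self.max_inflight_rows
        )


def can_admit(global_limit: Limit, source_limit: Limit) -> bool:
    # Both limits must have room before another source row is admitted.
    return global_limit.has_room() and source_limit.has_room()
```

For example, with a global limit of 1024 and a per-source limit of 5, a source that already has 5 rows in flight is blocked even though the global budget is mostly unused.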
-This example shows how to create a query handler that lets you search HackerNews threads and comments stored in CocoIndex.
+This example shows how to create a query handler that lets you search HackerNews threads and comments stored in CocoIndex.
 - The handler looks up the correct database table, then uses PostgreSQL’s full-text search functions (`to_tsvector` and `plainto_tsquery`) to find entries that match your search terms.
-- Matching results are sorted by their relevance (`ts_rank`) and by creation time, then converted to dictionaries.
+- Matching results are sorted by their relevance (`ts_rank`) and by creation time, then converted to dictionaries.
 - Finally, these results are returned in a `cocoindex.QueryOutput` object—making it easy to perform fast, ranked searches across your indexed HackerNews content.
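The steps in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the handler from the post: the column names (`id`, `text`, `created_at`), the `'english'` text search configuration, and the `%(name)s` placeholder style (as used by psycopg) are all assumptions, and the final wrapping in `cocoindex.QueryOutput` is left out.

```python
import re
from typing import Any, Sequence

# Table names cannot be bound as query parameters, so validate them first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_search_sql(table_name: str) -> str:
    """Build a ranked full-text search query for a (hypothetical) table."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return (
        "SELECT id, text, created_at, "
        "ts_rank(to_tsvector('english', text), "
        "plainto_tsquery('english', %(q)s)) AS rank "
        f"FROM {table_name} "
        "WHERE to_tsvector('english', text) @@ plainto_tsquery('english', %(q)s) "
        "ORDER BY rank DESC, created_at DESC "
        "LIMIT %(limit)s"
    )


def rows_to_dicts(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> list[dict[str, Any]]:
    """Convert result tuples into dictionaries keyed by column name."""
    return [dict(zip(columns, row)) for row in rows]
```

The search text and limit are passed as bound parameters rather than interpolated, so only the validated table name is ever formatted into the SQL string.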
 
 ## Running Your HackerNews Custom Source
@@ -480,7 +480,7 @@ cocoindex update -L main
 
 ## 3. Troubleshoot & Inspect with CocoInsight
 
-CocoInsight lets you **visualize and debug your flow**, see the lineage of your data, and understand what’s happening under the hood.
+CocoInsight lets you **visualize and debug your flow**, see the lineage of your data, and understand what’s happening under the hood.
 
 Start the server:
 
@@ -491,9 +491,9 @@ cocoindex server -ci main
 Then open the UI in your browser: [`https://cocoindex.io/cocoinsight`](https://cocoindex.io/cocoinsight)
 
 > CocoInsight has zero pipeline data retention — it’s safe for debugging and inspecting your flows locally.
->
+>
 
-Note that this requires QueryHandler setup in previous step.
+Note that this requires QueryHandler setup in previous step.
 
 
 ## What You Can Build Next
@@ -547,8 +547,8 @@ Custom Sources extend this model to *any* API — internal, external, legacy, or
 This unlocks a simple but powerful pattern:
 
 > If you can fetch it, CocoIndex can index it, diff it, and sync it.
->
+>
 
 ## ⭐ Try It, Fork It, Star It
 
-If you found this useful, a **star on [GitHub](https://github.com/cocoindex-io/cocoindex)** means a lot — it helps others discover CocoIndex and supports further development.
+If you found this useful, a **star on [GitHub](https://github.com/cocoindex-io/cocoindex)** means a lot — it helps others discover CocoIndex and supports further development.
 In the age of information overload, understanding what's trending—and why—is crucial for developers, researchers, and data engineers. HackerNews is one of the most influential tech communities, but manually tracking emerging topics across thousands of threads and comments is practically impossible.
 
-What if you could automatically index HackerNews content, extract topics using AI, and query trending discussions in real-time? That's exactly what CocoIndex enables through its [**Custom Sources**](https://cocoindex.io/blogs/custom-source) framework
+What if you could automatically index HackerNews content, extract topics using AI, and query trending discussions in real-time? That's exactly what CocoIndex enables through its [**Custom Sources**](https://cocoindex.io/blogs/custom-source) framework
 combined with [LLM-powered extraction](https://cocoindex.io/docs/ai/llm).
 
 In this post, we'll explore the **HackerNews Trending Topics** example, a production-ready pipeline that demonstrates some of the most powerful concepts in CocoIndex: incremental data syncing, LLM-powered information extraction, and queryable indexes.
@@ -120,7 +120,7 @@ class _HackerNewsThreadKey(NamedTuple):
 thread_id: str
 ```
 
-CocoIndex recommends to use stable keys. HN thread IDs are perfect keys.
+CocoIndex recommends using stable keys. HN thread IDs are perfect keys.
 
 #### Value Type
 
@@ -319,7 +319,7 @@ The genius of Custom Sources is the **separation of discovery from fetching**.
-This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
+This block sets up a CocoIndex flow that fetches HackerNews stories and prepares them for indexing. It registers a flow called **HackerNewsTrendingTopics**, then adds a `HackerNewsSource` that retrieves up to 200 stories and refreshes every 30 seconds, storing the result in `data_scope["threads"]` for downstream steps.
@@ -502,7 +502,7 @@ with data_scope["threads"].row() as thread:
 )
 ```
 
-This block processes each HackerNews thread as it flows through the pipeline. Inside `data_scope["threads"].row()`, each `thread` represents a single story record.
+This block processes each HackerNews thread as it flows through the pipeline. Inside `data_scope["threads"].row()`, each `thread` represents a single story record.
 
 - We use an LLM (`gpt-5-mini`) to extract semantic **topics** from the thread's text by applying `ExtractByLlm`, which returns a list of `Topic` objects.
 - We use `message_index` to collect relevant metadata for this thread.
-This function finds all HackerNews threads related to a given topic and ranks them. It looks up the topic table in the database, calculates a **score** for each thread (higher for main threads, lower for comments), and finds the most recent mention.
+This function finds all HackerNews threads related to a given topic and ranks them. It looks up the topic table in the database, calculates a **score** for each thread (higher for main threads, lower for comments), and finds the most recent mention.
 
 It then returns a list of threads with their URL, score, and latest mention time, so you can see which threads are most relevant or trending for that topic.
 
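The scoring described above might look something like this in plain Python. This is a sketch, not the post's SQL-backed query handler: it assumes the 5:1 weighting means 5 points for a mention in the main thread and 1 point for a mention in a comment, and the record fields (`thread_id`, `content_type`, `created_at`) are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime

# Assumed weights: a mention in the main thread counts 5, in a comment 1.
THREAD_WEIGHT = 5
COMMENT_WEIGHT = 1


def rank_threads(mentions: list[dict]) -> list[dict]:
    """Aggregate topic mentions per thread and rank threads by score."""
    scores: dict[str, int] = defaultdict(int)
    latest: dict[str, datetime] = {}
    for mention in mentions:
        thread_id = mention["thread_id"]
        is_thread = mention["content_type"] == "thread"
        scores[thread_id] += THREAD_WEIGHT if is_thread else COMMENT_WEIGHT
        # Track the most recent mention time for each thread.
        if thread_id not in latest or mention["created_at"] > latest[thread_id]:
            latest[thread_id] = mention["created_at"]
    results = [
        {
            "url": f"https://news.ycombinator.com/item?id={thread_id}",
            "score": score,
            "latest_mention": latest[thread_id],
        }
        for thread_id, score in scores.items()
    ]
    # Highest score first; ties are broken by the most recent mention.
    results.sort(key=lambda r: (r["score"], r["latest_mention"]), reverse=True)
    return results
```

A thread whose title mentions the topic and which has one matching comment scores 6, while a thread with a single matching comment scores 1.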
@@ -681,7 +681,7 @@ By weighting 5:1, the system prioritizes original discussions over tangential me
 
 ### get_trending_topics(limit) → Top 20 topics by score
 
-This function finds the **most popular topics** on HackerNews.
+This function finds the **most popular topics** on HackerNews.
 
 ```python
 @hackernews_trending_topics_flow.query_handler()
@@ -755,7 +755,7 @@ cocoindex server -ci main
 Then open the UI in your browser: [**`https://cocoindex.io/cocoinsight`**](https://cocoindex.io/cocoinsight)
 
 > CocoInsight has zero pipeline data retention — it’s safe for debugging and inspecting your flows locally.
->
+>
 
 Note that this requires QueryHandler setup in previous step.
 
@@ -823,23 +823,23 @@ By integrating with an AI agent framework, the indexed topics and threads become
 ## Why Use CocoIndex for This?
 
 1. **Incremental Sync**
-
+
 Traditional pipelines fetch everything repeatedly. CocoIndex fetches only new or updated threads, dramatically reducing API calls and latency.
-
+
 2. **Declarative & Modular**
-
+
 Flows, collectors, and query handlers are modular. You can add new sources (Reddit, Twitter, internal chat logs) or new transformations (summaries, sentiment analysis, embeddings) without rewriting the entire pipeline.
-
+
 3. **LLM Integration is Seamless**
-
+
 CocoIndex treats LLMs as first-class citizens for structured extraction. You don’t need complex glue code — the framework handles transformation, type enforcement, and storage.
-
+
 4. **Queryable Structured Index**
-
+
 Topics and messages are stored in Postgres, ready for SQL queries or API-based search. You can serve both analytics dashboards and AI agents from the same structured store.
-
+
 5. **Supports Continuous Workflows**
-
+
 CocoIndex pipelines can run live, with real-time updates every few seconds. Combine this with AI agents, and you have a **self-updating knowledge system** that reasons over the latest information.
 
 
@@ -849,4 +849,4 @@ CocoIndex is designed for systems that need to continuously monitor, detect, and
 
 ## Support Us ❤️
 
-Enjoying CocoIndex? Give us a [⭐️ on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with your peers. Every star helps more developers discover the project and strengthens the community.
+Enjoying CocoIndex? Give us a [⭐️ on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with your peers. Every star helps more developers discover the project and strengthens the community.